Mastering Pandas: 10 Essential Functions Every Data Scientist Should Know
Machine Learning Libraries (Part 12)
📚Chapter: 2 - Pandas
Introduction
Pandas is one of the most powerful and versatile libraries in the Python ecosystem, especially when it comes to data manipulation and analysis. If you are working with data, understanding and leveraging Pandas is crucial. While basic operations like filtering, sorting, and grouping are widely discussed, there are numerous other essential functions that can significantly boost your productivity and efficiency. Let’s dive into some of these lesser-known but equally important Pandas functions.
Sections
Sample
Memory usage
Data Selection
PandasAI
cudf.pandas
df.assign()
df.query()
pd.pivot_table()
df.transpose()
df.to_csv()
Conclusion
1- Sample
df.sample(n) returns n randomly selected rows from the DataFrame.
Example
df.sample(6)
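Because the rows are drawn at random, two calls normally return different rows. The sketch below, built on a small made-up DataFrame (the data and variable names are assumptions for illustration), shows how random_state makes the draw repeatable and how frac samples a fraction of the rows instead of a fixed count.

```python
import pandas as pd

# Hypothetical data; column names are for illustration only.
df = pd.DataFrame({"SALES": range(100, 110), "PRICEEACH": range(10, 20)})

# random_state fixes the seed, so the same rows come back on every run.
sample_a = df.sample(n=3, random_state=42)
sample_b = df.sample(n=3, random_state=42)

# frac draws a fraction of the rows instead of a fixed number.
half = df.sample(frac=0.5, random_state=0)

print(sample_a)
print(len(half), "rows in the 50% sample")
```

Fixing the seed is handy when a sampled subset has to be reproduced later, for example in a notebook that others will rerun.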
2- Memory usage
df.memory_usage() reports how many bytes each column consumes, plus the index. Pass deep=True to also count the contents of object (for example, string) columns.
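As a rough illustration, the following sketch compares the default report with deep=True on a small made-up DataFrame (the data is an assumption, not the article's dataset).

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "country": ["France", "Germany", "Italy"],
    "gdp": [2411255037952, 3435817336832, 1745433788416],
})

shallow = df.memory_usage()        # bytes per column, plus an "Index" entry
deep = df.memory_usage(deep=True)  # for object columns, also counts the stored values

print(shallow)
print(deep)
print(deep.sum(), "bytes in total")
```

The numeric column reports the same size either way, while a string column can report more with deep=True, depending on how the strings are stored.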
3- Data Selection
You can also select the data of any specific row, column, or even multiple columns.
df.iloc[row_num]: Selects a single row by its integer position (starting at 0), not by its index label.
Example
df.iloc[0]
df[col_name]: Selects a single column and returns it as a Series.
Example
df["SALES"]
df[['col1', 'col2']]: Selects several columns at once and returns a DataFrame.
Example
df[["SALES", "PRICEEACH"]]
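The difference between position and label is easiest to see when the index is not 0, 1, 2. The sketch below uses a made-up DataFrame with custom index labels (an assumption for illustration) and adds df.loc, the label-based counterpart of df.iloc.

```python
import pandas as pd

# Hypothetical data; the index labels deliberately differ from the positions.
df = pd.DataFrame(
    {"SALES": [2871.0, 2765.9, 3884.34], "PRICEEACH": [95.7, 81.35, 94.74]},
    index=[10, 20, 30],
)

first_row = df.iloc[0]               # by position: the first row
same_row = df.loc[10]                # by label: the row whose index label is 10
sales = df["SALES"]                  # one column -> Series
subset = df[["SALES", "PRICEEACH"]]  # list of columns -> DataFrame
cell = df.iloc[1, 0]                 # row position 1, column position 0

print(first_row)
print(type(sales).__name__, type(subset).__name__)
print(cell)
```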
4- PandasAI
PandasAI is a library that makes data analysis conversational and fun again. It combines pandas DataFrames with large language models (LLMs) so that users can explore their data by asking questions in natural language.
To get started, install the latest version of PandasAI.
!pip install pandasai
SmartDataframe
A SmartDataframe is a pandas (or polars) dataframe that inherits all the properties and methods from the pd.DataFrame, but also adds conversational features to it.
from pandasai import SmartDataframe
import pandas as pd
df = pd.DataFrame({
    "country": [
        "United States",
        "United Kingdom",
        "France",
        "Germany",
        "Italy",
        "Spain",
        "Canada",
        "Australia",
        "Japan",
        "China",
    ],
    "gdp": [
        19294482071552,
        2891615567872,
        2411255037952,
        3435817336832,
        1745433788416,
        1181205135360,
        1607402389504,
        1490967855104,
        4380756541440,
        14631844184064,
    ],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})
Since PandasAI is powered by an LLM, you should import the LLM you’d like to use for your use case.
By default, if no LLM is provided, it will use BambooLLM.
import os
os.environ['PANDASAI_API_KEY'] = "API_KEY"
sdf = SmartDataframe(df)
5- cudf.pandas
cudf.pandas is a module within the cudf library, part of the RAPIDS AI suite developed by NVIDIA. It is designed as a drop-in accelerator for pandas: existing pandas code runs on the GPU where cuDF supports the operation and falls back to regular pandas on the CPU otherwise, which can yield significant speedups.
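In a notebook, the accelerator is typically enabled with %load_ext cudf.pandas before importing pandas, and a script can be run with python -m cudf.pandas script.py. The sketch below shows the in-script variant under the assumption that cudf may not be installed on the machine, in which case it simply continues with plain pandas; the accelerated flag and the sample data are illustrative names, not part of the library.

```python
# Assumption: cudf (and a supported NVIDIA GPU) may or may not be available.
try:
    import cudf.pandas
    cudf.pandas.install()  # should run before pandas is imported
    accelerated = True
except ImportError:
    accelerated = False

import pandas as pd

# The pandas code itself stays unchanged either way.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("key")["value"].sum()

print("GPU acceleration enabled:", accelerated)
print(totals)
```

The point of the design is that no pandas call needs rewriting; only the setup lines at the top differ.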
6- df.assign()
Pandas’ .assign() method adds new columns to a DataFrame, typically computed from existing columns. It does not modify the original DataFrame; instead, it returns a new DataFrame with the added columns.
Here is an example of how you can use it:
df_new = df.assign(count_plus_5=df['Count'] + 5)
df_new.head()
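For a version that runs on its own, here is a minimal sketch on a made-up DataFrame with a Count column (the data and the doubled column are assumptions for illustration). It also shows that a callable passed to .assign() can refer to a column created earlier in the same call.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Ethnicity": ["A", "B", "C"], "Count": [10, 20, 30]})

df_new = df.assign(
    count_plus_5=df["Count"] + 5,
    # A callable receives the DataFrame being built, so it can see count_plus_5.
    doubled=lambda d: d["count_plus_5"] * 2,
)

print(df_new)
print(list(df.columns))  # the original DataFrame is unchanged
```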
7- df.query()
Pandas’ .query() method filters a DataFrame with a Boolean expression written as a string, similar to the WHERE clause of an SQL query. It returns a new DataFrame containing only the rows that satisfy the expression.
Here is an example of how you can use it:
# Select rows where gender is Male
df_query = df.query("Gender == 'MALE'")
df_query.head()
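Query strings can combine conditions and reference Python variables with the @ prefix. The sketch below uses a made-up DataFrame (the data and the min_count variable are assumptions) and compares the result with the equivalent boolean-mask filter.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Gender": ["MALE", "FEMALE", "MALE", "FEMALE"],
    "Count": [12, 40, 55, 7],
})

min_count = 10

# @min_count refers to the Python variable defined above.
result = df.query("Gender == 'MALE' and Count > @min_count")

# The same filter written with a boolean mask.
mask_result = df[(df["Gender"] == "MALE") & (df["Count"] > min_count)]

print(result)
```

Both forms select the same rows; the query string tends to read better once several conditions are chained.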
8- pd.pivot_table()
pd.pivot_table() creates a spreadsheet-style pivot table from a DataFrame. A pivot table summarizes data by grouping it on one or more columns used as the index (rows), optionally one or more columns used as the column headers, and aggregating one or more value columns.
In the example below, we create a pivot table with Ethnicity as the index and sum the Count column, which gives the total count for each ethnicity in the dataset.
pivot_table = pd.pivot_table(df, index='Ethnicity', values='Count', aggfunc='sum')
pivot_table.head()
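To make the roles of index, columns, and values concrete, here is a small self-contained sketch. The data is made up for illustration, and the second table adds a Gender breakdown that the original example does not include.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Ethnicity": ["HISPANIC", "HISPANIC", "ASIAN", "ASIAN"],
    "Gender": ["MALE", "FEMALE", "MALE", "FEMALE"],
    "Count": [10, 20, 5, 15],
})

# One row per ethnicity, with Count summed.
by_ethnicity = pd.pivot_table(df, index="Ethnicity", values="Count", aggfunc="sum")

# Ethnicity as rows, Gender as column headers; fill_value replaces missing cells.
by_both = pd.pivot_table(
    df, index="Ethnicity", columns="Gender", values="Count",
    aggfunc="sum", fill_value=0,
)

print(by_ethnicity)
print(by_both)
```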
9- df.transpose()
df.transpose() swaps the rows and columns of a DataFrame: the rows become columns and the columns become rows. df.T is a shorthand for the same operation.
# Transpose the DataFrame
df_transposed = df.transpose()
# Print the transposed DataFrame
df_transposed.head()
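A tiny made-up DataFrame (the data and index labels are assumptions) makes the effect easy to check: the shape flips, and the old column names become the new index.

```python
import pandas as pd

# Hypothetical data: 3 rows, 2 columns.
df = pd.DataFrame(
    {"Rank": [1, 2, 3], "Count": [30, 20, 10]},
    index=["row_a", "row_b", "row_c"],
)

df_t = df.transpose()

print(df.shape, "->", df_t.shape)
print(df_t)
```

Transposing is most useful for wide tables with few rows, where reading values down a column is easier than scanning across.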
10- df.to_csv()
df.to_csv() exports a DataFrame to a CSV file. CSV stands for “Comma Separated Values”, a popular file format for storing tabular data.
For example, to export a DataFrame named df, call df.to_csv() and pass the file name as a string:
df.to_csv('data.csv')
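You can also write only specific columns by passing a list of column names to the columns parameter. Note that the index parameter does not select rows: it is a boolean that controls whether the row labels are written (index=False omits them). To save only specific rows, filter the DataFrame first, for example with a boolean mask, and then call to_csv() on the result.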
df.to_csv('path/to/data.csv', columns=['Rank','Count'])
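The sketch below combines these options on a made-up DataFrame: it filters rows with a boolean mask, writes two columns, and omits the row labels. An in-memory buffer stands in for a file path so the result can be inspected directly; the data and the threshold are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Count": [30, 20, 10],
    "Name": ["a", "b", "c"],
})

buffer = io.StringIO()  # stands in for a file path such as 'data.csv'

# Filter rows first, then choose columns and drop the row labels on export.
mask = df["Count"] >= 20
df[mask].to_csv(buffer, columns=["Rank", "Count"], index=False)

csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False is a common choice when the index is just the default row numbers, since otherwise they are written as an unnamed first column.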
Conclusion
Pandas is an incredibly versatile library, and mastering these essential functions can noticeably improve your data manipulation and analysis skills. Whether you’re sampling and inspecting data, deriving new columns, filtering rows, or summarizing results with pivot tables, these functions will help you handle data with ease and efficiency. The more you explore Pandas, the more you’ll uncover its hidden gems, making it an indispensable tool in your data science toolkit.
Source
1- 10 Essential Pandas Functions Every Data Scientist Should Know