Data Visualization in Python: A Beginner’s Guide to Creating Stunning Plots with Pandas
Machine Learning Libraries (Part 11)
📚Chapter: 2 - Pandas
If you want to read more articles about Machine Learning Libraries, don’t forget to stay tuned :)
Introduction
In the world of data science, visualization is key to understanding and communicating insights. One of the most powerful tools for data visualization in Python is Pandas. This versatile library not only excels at data manipulation and analysis but also offers robust capabilities for creating informative and visually appealing charts. In this blog, we’ll explore the basics of data visualization with Pandas and demonstrate how to create various types of plots.
Sections
Why Use Pandas for Data Visualization
Getting Started with Pandas
Loading Data
Creating Basic Plots
Customizing Plots
Conclusion
Section 1- Why Use Pandas for Data Visualization?
Pandas is a widely used data manipulation library whose plotting methods (Series.plot and DataFrame.plot) are built on Matplotlib, a comprehensive library for creating static, animated, and interactive visualizations in Python. These methods draw with Matplotlib by default and return a Matplotlib Axes object, so you can generate plots directly from Pandas DataFrames and then refine them with ordinary Matplotlib calls. This integration simplifies data visualization for both beginners and experienced data scientists.
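To make that integration concrete, here is a minimal sketch on a small, hypothetical DataFrame (the columns x and y are made up). The .plot call returns an ordinary Matplotlib Axes, so Matplotlib methods such as set_ylabel can be applied to it afterwards. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; in a notebook you can skip this line
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data: any DataFrame with numeric columns behaves the same way
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 15, 25]})

# Pandas draws the chart with Matplotlib and hands back the Axes it drew on
ax = df.plot(x="x", y="y", title="Drawn by Pandas, rendered by Matplotlib")

# Because ax is a Matplotlib Axes, Matplotlib methods work on it directly
ax.set_ylabel("y value")
print(isinstance(ax, Axes))
```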
Section 2- Getting Started with Pandas
Before diving into data visualization, ensure you have the required libraries installed. You can install them using pip (Seaborn is used here only to load the sample dataset):
pip install pandas matplotlib seaborn
Once installed, you can start by importing the necessary libraries:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Section 3- Loading Data
Let’s begin by loading a sample dataset. Pandas can read many formats, such as CSV files with pd.read_csv, and you can use any dataset of your choice. For this blog, we’ll use the Titanic dataset from Seaborn’s example data: sns.load_dataset downloads it (an internet connection is needed the first time) and returns it as a Pandas DataFrame:
# Load the Titanic dataset
titanic = sns.load_dataset('titanic')
print(titanic.head())
The Titanic dataset includes columns such as survived, pclass, sex, age, fare, and embarked.
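If you would rather load your own CSV file, pd.read_csv is the usual entry point. The minimal sketch below stays self-contained by reading a few hypothetical Titanic-style rows from an in-memory string; in practice you would pass a file path instead. Note how the empty age field becomes a missing value.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = """survived,pclass,sex,age,fare,embarked
0,3,male,22,7.25,S
1,1,female,38,71.2833,C
1,3,female,26,7.925,S
0,3,male,,8.05,S
"""

# read_csv accepts a path, a URL, or a file-like object; StringIO plays the file here
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())               # first rows, like titanic.head() above
print(df.dtypes)               # age is float64 because it contains a missing value
print(df["age"].isna().sum())  # number of missing ages
```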
Section 4- Creating Basic Plots
Line Plot
A line plot is useful for visualizing trends, typically over time. The Titanic data has no time column, so here we connect the average age of each passenger class in class order (1, 2, 3):
# Plotting average age by passenger class
titanic.groupby('pclass')['age'].mean().plot(kind='line', title='Average Age by Passenger Class')
plt.xlabel('Passenger Class')
plt.ylabel('Average Age')
plt.show()
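Calling titanic.plot() with no arguments is the generic form: it draws one line per numeric column against the row index.
titanic.plot()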
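To see what the groupby step hands to .plot, here is a minimal sketch on a few hypothetical passengers (the ages are invented). It shows that the group labels 1, 2, 3 become the x-coordinates of the line and the group means become its y-coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical passengers: class and age
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3],
    "age":    [40, 50, 30, 34, 20, 24],
})

# One mean age per class, indexed by the class label
mean_age = df.groupby("pclass")["age"].mean()
print(mean_age)

ax = mean_age.plot(kind="line", marker="o", title="Average Age by Passenger Class")
ax.set_xticks(mean_age.index)  # show only the real class labels 1, 2, 3
ax.set_xlabel("Passenger Class")
ax.set_ylabel("Average Age")

# The plotted line uses the index as x and the means as y
line = ax.get_lines()[0]
print(list(line.get_xdata()), list(line.get_ydata()))
```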
Bar Plot
Bar plots are great for comparing categorical data. For example, to compare the number of passengers who survived (1) and did not survive (0):
# Plotting the number of survivors vs non-survivors
titanic['survived'].value_counts().plot(kind='bar', title='Survival Counts', color=['skyblue', 'salmon'])
plt.xlabel('Survived')
plt.ylabel('Count')
plt.show()
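The same chart can be drawn with the plot.bar() accessor, or with plot.barh() for horizontal bars:
titanic['survived'].value_counts().plot.bar()
titanic['survived'].value_counts().plot.barh()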
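One detail worth knowing: value_counts() sorts by frequency, not by label, so the bar order depends on the data. This minimal sketch, using a hypothetical survived column, calls sort_index() to fix the order, uses rot=0 for horizontal tick labels, and relabels the ticks as "No"/"Yes" (an illustrative choice).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical 0/1 survival flags
survived = pd.Series([0, 1, 0, 0, 1, 0, 1, 0], name="survived")

# value_counts() orders by count; sort_index() puts 0 before 1 regardless of counts
counts = survived.value_counts().sort_index()

ax = counts.plot(kind="bar", color=["skyblue", "salmon"], rot=0, title="Survival Counts")
ax.set_xticks(range(len(counts)))
ax.set_xticklabels(["No", "Yes"])  # readable labels instead of 0/1
ax.set_ylabel("Count")

# One rectangle per category; its height is the count
heights = [bar.get_height() for bar in ax.patches]
print(heights)
```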
Histogram
Histograms are useful for understanding the distribution of a numerical variable. Let’s plot the distribution of passenger ages (missing ages are skipped):
# Plotting the distribution of passenger age
titanic['age'].plot(kind='hist', title='Distribution of Passenger Age', bins=20, color='skyblue')
plt.xlabel('Age')
plt.show()
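Or, to overlay the histograms of several numeric columns on one chart:
titanic[['age', 'fare']].plot(kind='hist', bins=20, alpha=0.5)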
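To see what bins controls, here is a minimal sketch on a handful of hypothetical ages. The bar heights are the per-bin counts; the missing age is dropped before plotting, and the counts agree with numpy.histogram computed with the same number of bins.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value
ages = pd.Series([2, 15, 22, 24, 29, 35, 41, 58, 63, None], name="age")

ax = ages.plot(kind="hist", bins=4, color="skyblue", title="Distribution of Passenger Age")
ax.set_xlabel("Age")

# Each bar's height is the number of ages that fall inside that bin
bar_heights = [int(p.get_height()) for p in ax.patches]

# Same computation done directly with NumPy on the non-missing values
expected, edges = np.histogram(ages.dropna(), bins=4)

print(bar_heights, list(expected), edges)
```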
Scatter Plot
Scatter plots help in identifying relationships between two numerical variables. To visualize the relationship between fare and age:
# Plotting the relationship between fare and age
titanic.plot(kind='scatter', x='fare', y='age', title='Fare vs Age')
plt.xlabel('Fare')
plt.ylabel('Age')
plt.show()
or, with the accessor form (note the lowercase x):
titanic.plot.scatter(x='fare', y='age')
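A scatter plot can also encode a third variable with color. This minimal sketch on hypothetical rows passes c='survived' so each point is colored by its survival flag; colormap and colorbar are Pandas plotting keywords, while the data and the "coolwarm" colormap are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical passengers with no missing values
df = pd.DataFrame({
    "fare":     [7.25, 71.28, 7.93, 53.10, 8.05, 8.46],
    "age":      [22, 38, 26, 35, 35, 27],
    "survived": [0, 1, 1, 1, 0, 0],
})

ax = df.plot.scatter(
    x="fare",
    y="age",
    c="survived",         # color each point by this column's value
    colormap="coolwarm",  # any Matplotlib colormap name
    colorbar=True,        # draw the value-to-color scale next to the plot
    title="Fare vs Age, colored by survival",
)

# The scatter is a single collection with one (x, y) offset per row
points = ax.collections[0].get_offsets()
print(len(points))
```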
Area Plot
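Area plots are useful for showing how a total (often over time) is made up of several quantities: by default Pandas stacks the columns, so the top edge is their running total. Here, the number of passengers who died (0) and survived (1) in each class:
titanic.groupby(['pclass', 'survived']).size().unstack().plot.area()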
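Area plots in Pandas are stacked by default, so each column is drawn on top of the previous ones and the top edge shows their total. This minimal sketch uses hypothetical per-class counts of passengers who died and survived; the top line's y-values equal the row totals.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts per passenger class, one column per outcome
counts = pd.DataFrame(
    {"died": [10, 12, 30], "survived": [15, 9, 8]},
    index=pd.Index([1, 2, 3], name="pclass"),
)

ax = counts.plot.area(alpha=0.6, title="Passengers per Class by Outcome")
ax.set_ylabel("Passengers")

# With stacking, the last line drawn sits at died + survived for each class
top_line = ax.get_lines()[-1]
print(list(top_line.get_ydata()))
print(list(counts.sum(axis=1)))
```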
Box Plot
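Box plots summarize a distribution with its five-number summary: the box spans the first to third quartile with a line at the median, and the whiskers reach toward the minimum and maximum. By default, points more than 1.5 × IQR (interquartile range) beyond the box are drawn individually as outliers.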
titanic.plot.box()
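The box itself is built from quartiles. This minimal sketch computes the five-number summary with Series.quantile on hypothetical values and applies the 1.5 × IQR rule that Matplotlib uses by default to decide which points are drawn as outliers beyond the whiskers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values with one obvious outlier
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 40], name="value")

# Minimum, Q1, median, Q3, maximum (linear interpolation, the pandas default)
summary = values.quantile([0, 0.25, 0.5, 0.75, 1.0])
q1, q3 = summary[0.25], summary[0.75]
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # Matplotlib's default whisker limit above the box

print(summary.tolist())
print("points beyond the upper fence:", values[values > upper_fence].tolist())

ax = values.plot.box(title="Box Plot with an Outlier")
```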
Hexagonal binning Plot
Hexbin plots are useful for visualizing the density of points in a scatter plot; the x and y columns are required arguments:
titanic.plot.hexbin(x='fare', y='age', gridsize=20)
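Besides counting points, hexbin can summarize a third column inside each hexagon. In this minimal sketch the data are randomly generated and the link between fare and survival is a made-up rule for illustration; passing C='survived' with reduce_C_function=np.mean colors each cell by the survival rate of the points inside it, and gridsize sets how many hexagons span the x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 300

# Hypothetical passengers generated at random
df = pd.DataFrame({
    "fare": rng.uniform(5, 100, n),
    "age": rng.uniform(1, 70, n),
})
# Made-up rule: higher fares survive more often (illustration only)
df["survived"] = (rng.uniform(0, 1, n) < df["fare"] / 100).astype(int)

ax = df.plot.hexbin(
    x="fare",
    y="age",
    C="survived",               # value summarized inside each hexagon
    reduce_C_function=np.mean,  # mean of 0/1 flags = survival rate
    gridsize=10,                # number of hexagons across the x-axis
    title="Survival rate by fare and age",
)

# One value per non-empty hexagon
rates = np.asarray(ax.collections[0].get_array())
print(rates.min(), rates.max())
```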
Density Estimate Plot
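A kernel density estimate (KDE) plot draws a smooth curve approximating a variable’s distribution. Pandas computes it with SciPy, so SciPy must be installed:
titanic['age'].plot.kde()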
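For intuition about what plot.kde draws: a kernel density estimate places a smooth bump on each observation and averages them. With the Gaussian kernel used by SciPy’s gaussian_kde (which Pandas calls for plot.kde), the estimate at x from observations x_1 to x_n with bandwidth h is:

```latex
\hat{f}(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}
```

A larger h gives a smoother curve. Pandas exposes the bandwidth through the bw_method argument of plot.kde; SciPy’s default is Scott’s rule.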
Pie Plot
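A pie plot shows each category’s share of a whole, here the number of passengers in each class:
titanic['pclass'].value_counts().plot.pie(autopct='%1.1f%%')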
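Under the hood, each wedge’s size is that category’s share of the total. This minimal sketch, on hypothetical class labels, computes those shares with value_counts(normalize=True) and draws the pie with autopct so the percentages appear on the wedges.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical passenger classes
pclass = pd.Series([1, 1, 2, 3, 3, 3, 3, 2], name="pclass")

counts = pclass.value_counts().sort_index()                # passengers per class
shares = pclass.value_counts(normalize=True).sort_index()  # same counts as fractions of the total
print(shares.tolist())

# autopct writes each wedge's percentage on the chart
ax = counts.plot.pie(autopct="%1.1f%%", title="Passengers by Class")
ax.set_ylabel("")  # otherwise the Series name is shown as a y-axis label
```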
Save the plot as an image
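Call plt.savefig() before plt.show(), passing the file name to write:
titanic.groupby('pclass')['age'].mean().plot(kind='line')
plt.savefig('lineplot.png')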
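In scripts, save before calling plt.show(): once the window is closed, pyplot’s current figure can be a new, empty one, and saving then produces a blank image. This minimal sketch, with hypothetical mean ages, saves through the Figure that owns the Axes and writes to an in-memory buffer so it runs anywhere; passing a path such as 'lineplot.png' instead writes a file. The dpi and bbox_inches values are illustrative.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mean ages per class
mean_age = pd.Series([45.0, 32.0, 22.0], index=[1, 2, 3], name="age")

ax = mean_age.plot(kind="line", title="Average Age by Passenger Class")
fig = ax.figure  # the Figure that owns this Axes

# Save explicitly from the Figure; a file path works in place of the buffer
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
plt.close(fig)  # release the figure once it has been saved

png_bytes = buffer.getvalue()
print(len(png_bytes))
```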
Section 5- Customizing Plots
Pandas plots can be customized to enhance readability and visual appeal. You can modify colors, add gridlines, set figure sizes, and more. Here are a few examples:
Customizing Colors and Adding Gridlines
# Customizing the bar plot for survival counts
ax = titanic['survived'].value_counts().plot(kind='bar', title='Survival Counts', color=['skyblue', 'salmon'])
ax.set_xlabel('Survived')
ax.set_ylabel('Count')
ax.grid(True)
plt.show()
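A few more Matplotlib options can be passed through .plot or applied to the returned Axes. This minimal sketch, on hypothetical counts, adds black bar outlines, keeps tick labels horizontal with rot=0, and draws dashed grid lines on the y-axis only, behind the bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import to_rgba

counts = pd.Series([5, 3], index=[0, 1], name="count")  # hypothetical survival counts

ax = counts.plot(
    kind="bar",
    color=["skyblue", "salmon"],  # one color per bar for a single Series
    edgecolor="black",            # passed through to Matplotlib's bar()
    rot=0,                        # keep tick labels horizontal
    title="Survival Counts",
)
ax.set_xlabel("Survived")
ax.set_ylabel("Count")
ax.grid(axis="y", linestyle="--", alpha=0.5)  # horizontal grid lines only
ax.set_axisbelow(True)                        # draw the grid behind the bars

print([bar.get_facecolor() for bar in ax.patches])
```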
Setting Figure Size
# Setting figure size for the scatter plot
titanic.plot(kind='scatter', x='fare', y='age', title='Fare vs Age', figsize=(10, 6))
plt.xlabel('Fare')
plt.ylabel('Age')
plt.show()
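figsize is given in inches as (width, height). When you want several charts in one figure, a common pattern is to create the Axes with plt.subplots and hand each one to Pandas through the ax= argument. The minimal sketch below does this with hypothetical passenger rows.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical passengers
df = pd.DataFrame({
    "fare": [7.25, 71.28, 7.93, 53.10, 8.05],
    "age":  [22, 38, 26, 35, 35],
    "survived": [0, 1, 1, 1, 0],
})

# figsize applies to the whole figure: 10 inches wide, 4 inches tall
fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# ax= tells Pandas which existing Axes to draw on
df.plot(kind="scatter", x="fare", y="age", ax=left, title="Fare vs Age")
df["survived"].value_counts().sort_index().plot(kind="bar", ax=right, rot=0, title="Survival Counts")

fig.tight_layout()
print(fig.get_size_inches())
```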
Adding Annotations
# Adding annotations to the line plot
ax = titanic.groupby('pclass')['age'].mean().plot(kind='line', title='Average Age by Passenger Class')
ax.set_xlabel('Passenger Class')
ax.set_ylabel('Average Age')
for x, y in titanic.groupby('pclass')['age'].mean().items():
    ax.text(x, y, f'{y:.1f}', fontsize=9, ha='center', va='bottom')
plt.show()
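ax.text places a plain label at a data coordinate; ax.annotate can also draw an arrow from the label to the point. This minimal sketch, on hypothetical mean ages, highlights the class with the highest average age; the label text and offsets are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mean ages per class
mean_age = pd.Series([45.0, 32.0, 22.0], index=pd.Index([1, 2, 3], name="pclass"))

ax = mean_age.plot(kind="line", marker="o", title="Average Age by Passenger Class")
ax.set_xticks(mean_age.index)

# Find the class with the highest mean age and point an arrow at it
top_class = mean_age.idxmax()
top_age = mean_age.max()
ax.annotate(
    f"oldest: class {top_class}",
    xy=(top_class, top_age),                # the point being labeled
    xytext=(top_class + 0.4, top_age - 5),  # where the label text sits
    arrowprops={"arrowstyle": "->"},        # arrow from the text to the point
)

print(ax.texts[0].get_text())
```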
Conclusion
Pandas, in combination with Matplotlib, provides a powerful toolkit for data visualization in Python. With just a few lines of code, you can create a variety of plots to explore and present your data. Whether you’re analyzing trends, comparing categories, or uncovering relationships, Pandas makes it easy to visualize your data effectively.
By mastering data visualization with Pandas, you’ll be better equipped to communicate your findings and make data-driven decisions. So, dive in, experiment with different plots, and unlock the full potential of your data!
🚀 Elevate Your Data Skills with Coursesteach! 🚀
Ready to dive into Python, Machine Learning, Data Science, Statistics, Linear Algebra, Computer Vision, and Research? Coursesteach has you covered!
🔍 Python, 🤖 ML, 📊 Stats, ➕ Linear Algebra, 👁️🗨️ Computer Vision, 🔬 Research — all in one place!
Don’t miss out on this opportunity to enhance your skill set! Enroll today 🌟 in the Machine Learning libraries Course.
🔍 Explore Tools, Python libraries for ML, Slides, Source Code, Free online Courses and More!
Stay tuned for our upcoming articles, where we will explore specific topics related to Machine Learning libraries end to end and in more detail!
Remember, learning is a continuous process. So keep learning, keep creating, and keep sharing with others! 💻✌️
Ready to dive into data science and AI but unsure how to start? I’m here to help! Offering personalized research supervision and long-term mentoring. Let’s chat on Skype: themushtaq48 or email me at mushtaqmsit@gmail.com. Let’s kickstart your journey together!
Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to a course, or if you have any suggestions for improving any Coursesteach content, feel free to get in touch and follow along.
Together, let’s make this the best AI learning Community! 🚀