📊 Getting Started with Matplotlib for Exploratory Data Analysis (EDA)
ML-Libraries (Part 15)
📚 Chapter 4: Exploratory Data Analysis (EDA)
If you want to read more articles about Machine Learning Libraries, don’t forget to stay tuned :)
Introduction
In the world of data analysis, visualizing data is a crucial step in uncovering patterns, trends, and insights. One of the most powerful and widely used libraries for data visualization in Python is Matplotlib. Whether you’re new to Python or an experienced data scientist, Matplotlib offers a flexible, easy-to-use interface for creating static, animated, and interactive plots. It provides a MATLAB-like plotting environment for preparing high-quality figures and charts for publications, notebooks, web applications, and more, and it is just as useful for displaying images as for plotting data.
In this blog post, we’ll dive into the basics of Matplotlib, learn how to create common plots, and explore customization options to make your visualizations more insightful and visually appealing.
Sections
Why Use Matplotlib?
Getting Started with Matplotlib
Line Plot: The Foundation of Data Visualization
Bar Chart: Comparing Categories
Scatter Plot: Visualizing Relationships Between Variables
Histogram: Understanding Data Distributions
Pie Chart: Showing Proportions
Customizing Plots
Dark Background
Conclusion
Why Use Matplotlib?
Matplotlib is the foundation of Python’s data visualization ecosystem. Its versatility and simplicity make it a go-to library for creating:
Line plots
Bar charts
Scatter plots
Histograms
Pie charts
And much more
The library integrates seamlessly with other popular Python libraries such as pandas and NumPy, making it easy to visualize data from these sources.
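As a small sketch of the NumPy integration mentioned above, the example below passes NumPy arrays straight to plt.plot. The sine curve is only illustrative data chosen for this sketch, not something from the original post.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: 100 evenly spaced points on [0, 2*pi] and their sines
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# NumPy arrays can be passed directly; no conversion to Python lists is needed
lines = plt.plot(x, y)
plt.title('Sine Wave from NumPy Arrays')
plt.xlabel('x (radians)')
plt.ylabel('sin(x)')
plt.show()

# plt.plot returns a list of Line2D objects that keep the data they were given
line = lines[0]
```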
Getting Started with Matplotlib
Before we begin, let’s make sure you have Matplotlib installed. You can install it via pip if you haven’t already:
pip install matplotlib
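To confirm the installation worked, a quick check like the one below (a minimal sketch) imports Matplotlib and prints its version and the backend it selected. The backend name you see depends on your environment, for example a GUI backend on a desktop, the inline backend in Jupyter, or the non-interactive agg backend on a headless server.

```python
# Confirm that Matplotlib is importable and check which version is installed
import matplotlib

print(matplotlib.__version__)

# Show which backend Matplotlib selected for rendering figures
print(matplotlib.get_backend())
```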
Now, let’s walk through some basic examples.
1. Line Plot: The Foundation of Data Visualization
A line plot is one of the simplest and most common visualizations. It shows the relationship between two variables by connecting consecutive data points with straight line segments.
Here’s how to create a basic line plot:
import matplotlib.pyplot as plt
%matplotlib inline
# Sample data
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
# Create a line plot
plt.plot(x, y)
# Add title and labels
plt.title('Line Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# Show the plot
plt.show()
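The %matplotlib inline magic command displays plots inline within a Jupyter Notebook. It is IPython syntax rather than Python, so remove that line if you run the code as a regular Python script.
The same plot can also be drawn from the columns of a pandas DataFrame. This version also enlarges the default figure size: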
import matplotlib.pyplot as plt
import pandas as pd
# Same sample data as above
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
# Create a Pandas DataFrame
df = pd.DataFrame({'x': x, 'y': y})
# Set the default figure size (width, height) in inches for all later figures
plt.rcParams['figure.figsize'] = (14, 10)
plt.plot(df['x'], df['y']) # Plot the 'x' and 'y' columns of the DataFrame
plt.title('Line Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
rcParams is a dictionary-like object in Matplotlib that stores the default configuration settings for your plots. You can use it to customize many aspects of your plots, such as figure size, line styles, colors, and fonts. A change made through rcParams applies to every figure created afterwards in the same session, not only to the next plot.
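To make the dictionary-like behavior concrete, here is a minimal sketch that reads and writes a few rcParams keys, uses matplotlib.rc_context to apply settings only temporarily, and sets the size of a single figure without touching the global default. The specific keys and values are just examples.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# rcParams behaves like a dictionary: read a default value by key
print(mpl.rcParams['figure.figsize'])

# Assigning to rcParams changes the default for every plot created afterwards
mpl.rcParams['lines.linewidth'] = 2.5
mpl.rcParams['axes.grid'] = True

# rc_context applies settings only inside the with-block, then restores them
with mpl.rc_context({'lines.linestyle': '--', 'font.size': 14}):
    inside_style = mpl.rcParams['lines.linestyle']
    plt.plot([1, 2, 3], [1, 4, 9])
    plt.show()
outside_style = mpl.rcParams['lines.linestyle']

# For one figure only, pass figsize directly instead of changing the global default
fig = plt.figure(figsize=(6, 4))

# Restore Matplotlib's built-in default settings
mpl.rcdefaults()
```

The rc_context approach is handy in notebooks, where a global rcParams change made in one cell silently affects every later cell.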
2. Bar Chart: Comparing Categories
Bar charts are useful when comparing quantities across different categories. For instance, we can compare the sales of different products or the frequency of different events.
import matplotlib.pyplot as plt
# Sample data
categories = ['A', 'B', 'C', 'D']
values = [3, 7, 2, 5]
# Create a bar chart
plt.bar(categories, values)
# Add title and labels
plt.title('Bar Chart Example')
plt.xlabel('Categories')
plt.ylabel('Values')
# Show the plot
plt.show()
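For the product-sales comparison mentioned above, it often helps to print each bar's value on the chart. The sketch below uses hypothetical product names and sales numbers (illustrative only) and plt.bar_label, which requires Matplotlib 3.4 or newer.

```python
import matplotlib.pyplot as plt

# Hypothetical sales figures for four products (illustrative only)
products = ['Product A', 'Product B', 'Product C', 'Product D']
sales = [120, 340, 90, 210]

bars = plt.bar(products, sales, color='steelblue')

# Write each bar's height above it (bar_label needs Matplotlib 3.4+)
labels = plt.bar_label(bars)

plt.title('Sales by Product')
plt.xlabel('Product')
plt.ylabel('Units sold')
plt.show()
```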
3. Scatter Plot: Visualizing Relationships Between Variables
Scatter plots are useful for exploring the relationships between two continuous variables. For instance, we can visualize the relationship between height and weight, or hours studied and exam scores.
import matplotlib.pyplot as plt
# Sample data
x = [5, 7, 8, 9, 10]
y = [12, 15, 14, 18, 20]
# Create a scatter plot
plt.scatter(x, y)
# Add title and labels
plt.title('Scatter Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# Show the plot
plt.show()
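A scatter plot shows the relationship visually; to quantify it, one common option (an addition here, not part of the original example) is to overlay a least-squares line from np.polyfit and report the Pearson correlation from np.corrcoef. This sketch reuses the sample data above.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same sample data as the scatter plot above
x = np.array([5, 7, 8, 9, 10])
y = np.array([12, 15, 14, 18, 20])

# Fit a straight line y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

plt.scatter(x, y, label='data')
plt.plot(x, slope * x + intercept, color='red', label=f'linear fit (r = {r:.2f})')
plt.title('Scatter Plot with Linear Trend')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
```

A value of r close to 1 indicates a strong positive linear relationship; it says nothing about non-linear patterns, which the scatter plot itself still reveals.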
4. Histogram: Understanding Data Distributions
A histogram is used to visualize the distribution of numerical data by grouping it into bins. It’s particularly useful for understanding the frequency distribution of data points in a dataset.
import matplotlib.pyplot as plt
import numpy as np
# Generate random data
data = np.random.randn(1000)
# Create a histogram
plt.hist(data, bins=30, alpha=0.75, color='blue')
# Add title and labels
plt.title('Histogram Example')
plt.xlabel('Value')
plt.ylabel('Frequency')
# Show the plot
plt.show()
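plt.hist does more than draw: it also returns the count in each bin, the bin edges, and the drawn bar patches, which is useful when you want the numbers behind the plot. This sketch seeds NumPy's random generator (an addition, so the sample is reproducible) and checks those return values.

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded generator so the random sample is reproducible
rng = np.random.default_rng(42)
data = rng.standard_normal(1000)

# plt.hist returns the count in each bin, the bin edges, and the drawn patches
counts, edges, patches = plt.hist(data, bins=30, alpha=0.75, color='blue')
plt.title('Histogram Example (seeded)')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

# 30 bins are delimited by 31 edges, and every sample falls in exactly one bin
print(len(counts), len(edges))  # 30 31
print(counts.sum())             # 1000.0
```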
5. Pie Chart: Showing Proportions
Pie charts are great for displaying the proportions of categories within a whole. Here’s how to create one in Matplotlib:
import matplotlib.pyplot as plt
# Sample data
labels = ['A', 'B', 'C', 'D']
sizes = [20, 30, 25, 25]
# Create a pie chart
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
# Add title
plt.title('Pie Chart Example')
# Show the plot
plt.show()
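The autopct argument controls the percentage labels. Each wedge's share is its size divided by sum(sizes), and the format string '%1.1f%%' prints that share with one decimal place followed by a literal percent sign. When autopct is set, plt.pie returns the percentage text objects as well, so you can inspect them as in this sketch.

```python
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']
sizes = [20, 30, 25, 25]

# With autopct set, plt.pie returns the wedges, the label texts and the
# percentage texts, in the same order as sizes
wedges, texts, autotexts = plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title('Pie Chart Example')
plt.show()

print([t.get_text() for t in autotexts])  # ['20.0%', '30.0%', '25.0%', '25.0%']
```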
Customizing Plots
Matplotlib allows you to customize your plots in a variety of ways, such as changing colors, adding grids, or styling markers. Here’s a simple example to show how you can customize a line plot with more detail:
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
# Customize the line plot
plt.plot(x, y, color='red', linestyle='--', marker='o', markersize=8)
# Add grid
plt.grid(True)
# Add title and labels
plt.title('Customized Line Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# Show the plot
plt.show()
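The color, line style, and marker keywords above can also be written as a compact format string, such as 'r--o' for a red dashed line with circle markers. The sketch below draws one line each way (the second shifted up by 2 so both are visible) and confirms they end up with the same style.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# Keyword arguments, as in the customized example above
(line_kw,) = plt.plot(x, y, color='red', linestyle='--', marker='o', markersize=8)

# Same style as a format string: color 'r', dashes '--', circle markers 'o'
(line_fmt,) = plt.plot(x, [v + 2 for v in y], 'r--o', markersize=8)

plt.grid(True)
plt.title('Keyword Arguments vs. Format String')
plt.show()

# Both lines end up with the same color, line style and marker
print(to_rgba(line_kw.get_color()) == to_rgba(line_fmt.get_color()))  # True
print(line_fmt.get_linestyle(), line_fmt.get_marker())  # -- o
```

Format strings are convenient for quick exploration; keyword arguments are easier to read and support more options, such as markersize.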
Dark Background
A simple way to make a line plot look more striking is to draw it on a dark background. Matplotlib ships with a built-in 'dark_background' style that you can apply temporarily with plt.style.context:
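import matplotlib.pyplot as plt
# Apply the built-in 'dark_background' style only inside this block
with plt.style.context('dark_background'):
    # 'r-o': red solid line with circle markers
    plt.plot([1, 12, 4, 10, 3, 11, 2], 'r-o')
    # 'g-v': green solid line with downward-pointing triangle markers
    plt.plot([3, 9, 2, 7, 6, 14, 4], 'g-v')
    plt.show()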
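'dark_background' is one of several built-in style sheets. This sketch lists the available styles (the exact list depends on your Matplotlib version) and shows that plt.style.context restores the previous settings when the with-block ends, whereas plt.style.use would keep the style for the rest of the session.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# List the built-in style sheets
print(plt.style.available)

# Inside the context manager the style's settings are active...
with plt.style.context('dark_background'):
    facecolor_inside = plt.rcParams['axes.facecolor']
    plt.plot([1, 12, 4, 10, 3, 11, 2], 'r-o')
    plt.show()

# ...and afterwards the previous settings are restored automatically
facecolor_after = plt.rcParams['axes.facecolor']
print(to_rgba(facecolor_inside))  # (0.0, 0.0, 0.0, 1.0)
```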
Conclusion
Matplotlib is a versatile and powerful tool for data visualization in Python. Whether you’re just getting started or looking to create highly customized plots, Matplotlib offers the functionality you need to explore and present your data effectively. With its wide variety of plot types and its tight integration with libraries such as NumPy and pandas, Matplotlib remains a go-to solution for data visualization.
🎯 Call to Action
Liked this tutorial?
👉 Subscribe to our newsletter for more Python + ML tutorials
👉 Follow our GitHub for code notebooks and projects
👉 Leave a comment below if you’d like a tutorial on vectorized backpropagation next!
👉 Machine Learning Libraries: Enroll in the full course to find notes, repository, etc.
👉 Deep Learning and Neural Networks: Enroll in the full course to find notes, repository, etc.
🎁 Access exclusive Supervised Learning with sklearn bundles and premium guides on our Gumroad store. From sentiment analysis notebooks to fine-tuning transformers: download, learn, and implement faster.
Source
1. Matplotlib: Visualization with Python