For Developers

How Does Data Visualization Work With Python Using Matplotlib?

Exploring Data Visualization with Python Using Matplotlib

Data visualization plays a key role in any data science project. Businesses and organizations worldwide collect vast amounts of data almost every day and use it to make essential business decisions. But not every decision-maker has the technical proficiency to understand data in its raw form. This is where data visualization comes in. Today, data visualization using Matplotlib, a prominent library for Python, is one of the most popular avenues that analysts and data scientists take.

In this article, we will explore how to work with the different visualizations available in Matplotlib.

What is Matplotlib?

Matplotlib was created by John Hunter during his post-doctoral research in neurobiology and released in 2003. It is the go-to Python library for graphs and visualizations. It offers a variety of plots like Line, Scatter, Bar, Histogram, Box, etc. It also supports 2D and 3D plotting.

Data Visualization using Matplotlib.webp Image source: Matplotlib

Data visualization using Matplotlib

Installation and loading

Matplotlib can installed directly from Jupyter Notebook by running the command:

!pip install matplotlib

Or, by running this command in cmd:

conda install -c conda-forge matplotlib

Importing the necessary libraries


import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline

The %matplotlib inline command is used to embed static .png images of the plot directly into the IPython Notebook.

Customizing plots using Matplotlib

Line styles

Let’s plot a simple line graph using sample data. We can do so using plt.plot():


x = [2,4,7,8,10,15,19,22]
plt.plot(x);

Data visualization in python.webp

Matplotlib offers a variety of linestyles that can be customized using the ls or linestyle argument in the plot(). Here’s a list of all the available options:

import matplotlib
matplotlib.lines.lineStyles

output:

{'-': '_draw_solid',
 '--': '_draw_dashed',
 '-.': '_draw_dash_dot',
 ':': '_draw_dotted',
 'None': '_draw_nothing',
 ' ': '_draw_nothing',
 '': '_draw_nothing'}

Let’s try out a few linestyles and some other arguments:

plt.plot(x,linestyle=':',color='red');

Linestyle in a line graph.webp

Linewidth is used to change the thickness of the plot:

plt.plot(x,linestyle='dashdot',color='green',linewidth=5)

Data visualization with matplotlib.webp

Markers

Markers are used to highlight points on the graph. Like linestyle, there’s a long list of selections of linemarkers. Here’s how they can be viewed, along with a few examples:

matplotlib.lines.lineMarkers
{'.': 'point',
 ',': 'pixel',
 'o': 'circle',
 'v': 'triangle_down',
 '^': 'triangle_up',
 '<': 'triangle_left',
 '>': 'triangle_right',
 '1': 'tri_down',
 '2': 'tri_up'}

plt.plot(x, marker='X');

Linemarkers in a line graph.webp

We can change the size of the markers using the argument markersize.

plt.plot(x, marker='|',markersize=15);

Markersize argument to change size of markers.webp

Markerfacecolor is used to change the color of the marker to highlight it more, and markeredgecolor is used to change the borders:

plt.plot(x, marker='o', markersize=10, markeredgecolor='black', 
markerfacecolor='yellow');

Python data visualization.webp

Title and axes labels

Most times, it’s necessary to add texts or labels to the axes of the graphs to help viewers understand what the plot is actually about. In Matplotlib, we do this using xlabel() and ylabel().

Here’s our sample data to show the monthly sales of a company:


months = ['Jan', 'Feb', 'March', 'Apr', 'May', 'June']
sales = [10789, 12897, 13554, 12650, 14320, 15236]

plt.plot(months,sales,linewidth=3)
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales');

Title and axes labels in Matplotlib.webp

Legend

When plotting multiple lines in a graph, legends are used to describe the different elements using matplotlib.pyplot.legend().

Loc is used to specify the location of the legend index. The values can be ‘upper left’, ‘upper right’, ‘lower left’, and ‘lower right’ of the corresponding graph.

salesA = [10789, 12897, 13554, 13650, 14320, 15236]
salesB = [10364, 13326, 14569, 14987, 15005, 15241]
salesC = [9891, 11369, 11659, 12458, 12430, 13985]

plt.plot(months,salesA,linewidth=2)
plt.plot(months,salesB,linewidth=2)
plt.plot(months,salesC,linewidth=2)

plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales Comparison')
plt.legend(labels=['Company A', 'Company B', 'Company C'], loc='upper left');

Data visualization with matplotlib in python.webp

Grid

The plt.grid() function is used to add a grid to the plots. Let’s add one to the Monthly Sales Comparison Plot:

plt.plot(months,salesA,linewidth=2)
plt.plot(months,salesB,linewidth=2)
plt.plot(months,salesC,linewidth=2)

plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales Comparison')
plt.legend(labels=['Company A', 'Company B', 'Company C'], loc='upper left')
plt.grid(True);

Adding a grid to the plot.webp

This is the default grid that gets added if we don’t use any customization. A few common attributes we can use are:

  • Color: To change the color of the grid lines.

  • Alpha: To change the visibility of the grid lines.

  • Linewidth: To alter the thickness of the grid lines.

  • Linestyle: To change the line style of the grid lines.

plt.plot(months,salesA,linewidth=2,marker='o')
plt.plot(months,salesB,linewidth=2,marker='o')
plt.plot(months,salesC,linewidth=2,marker='o')

plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales Comparison')
plt.legend(labels=['Company A', 'Company B', 'Company C'], loc='upper left')
plt.grid(color='red', alpha=0.2, linewidth=2);

Grid attributes in Matplotlib.webp

Plot dimensions

We can change the dimensions of the graph using the figsize argument in plt.figure().

x = np.linspace(0, 2, 200)

fig = plt.figure(figsize=(12, 6))

plt.plot(x, x, label='linear',linewidth=3)
plt.plot(x, x**2, label='quadratic',linewidth=3)
plt.plot(x, x**3, label='cubic',linewidth=3)

plt.grid(True)
plt.title("Simple Plot")
plt.legend();

Matplotlib For Data Visualization.webp

Now that we’ve learned the basics of customization using line graphs, we will now cover the other different types of plots and graphs that assist with data visualization.

We will use a dataset to simulate data visualization of a real-life project. For this article, we will use the Titanic Dataset.

Loading data

df = pd.read_csv('../input/titanic/train.csv')
df.head()

Loading dataset for data visualization.webp

Before beginning data visualization work with Python using Matplotlib, let’s get familiar with all the columns of the dataset:

  • PassengerId: A unique number assigned to each passenger

  • Survived: Whether the passenger survived the sinking (0 = No, 1 = Yes)

  • Pclass: Passengers’ cabin class that can be used as an indicator of their socioeconomic background. (1 = first, 2 = second, 3 = third)

  • Name: Name of the passenger

  • Sex: Gender of the passenger

  • Age: The passenger’s age

  • SibSp: # of siblings or spouses aboard

  • Parch: # of parents or children aboard

  • Ticket: Ticket number of the passenger

  • Fare: Ticket fare

  • Cabin: Cabin number of passengers

  • Embarked: Port passengers embarked from (S = Southampton, C = Cherbourg, Q = Queenstown).

Types of plots

Scatter plot

In a scatter plot, the data points are represented individually using dots or circles. This type of plot is generally used to represent the relationship between two variables.

Function used - plt.scatter()

age_m = df[df['Sex'] == 'male']['Age']
fare_m = df[df['Sex'] == 'male']['Fare']
age_f = df[df['Sex'] == 'female']['Age']
fare_f = df[df['Sex'] == 'female']['Fare']

fig = plt.figure(figsize=(12, 6))
plt.scatter(x=age_m, y=fare_m,s=25)
plt.scatter(x=age_f, y=fare_f,s=25)
plt.legend(['male', 'female']);

Scatter plot in Python using Matplotlib.webp

We can also represent a third axis using different shades for the data points. This can be achieved using cmap.

pclass = df[df['Sex']=='male']['Pclass']

fig = plt.figure(figsize=(12, 6))
plt.scatter(x=age_m, y=fare_m,s=25, c=pclass, cmap='cool')
plt.colorbar(ticks=[1, 2, 3])

plt.xlabel('Age - Male')
plt.ylabel('Fare');

Visualization with Matplotlib.webp

Histogram

A histogram is a bar graph-like plot used to represent the frequency distribution of a single variable. The whole range of frequencies is divided into equal units called ‘bins’.

Function used - plt.hist()

age = df['Age'] #distribution of passenger’s age
plt.hist(age);

Plotting a histogram using Matplotlib.webp

We can divide the distribution into any number of bins by using:

plt.hist(age, bins=20)

plt.xlabel('Age')
plt.ylabel('No. of Passengers')
plt.title('Distribution of Passenger Ages')
plt.grid(True);

Guide to Data Visualization in Python.webp

Bar graph

A bar graph is used to compare categorical data, where the height of the bars represents its value.

Function used - plt.bar()

Let's see how many people embarked from each port:

emb = df['Embarked'].value_counts()

plt.bar(x=emb.index, height=emb.values, width=0.5);

Plotting a bar graph using Matplotlib.webp

We can customize the bars using additional arguments.

Let’s compare the number of people who survived and those who did not:

survivor = df['Survived'].value_counts()

plt.bar(x=[0,1], height=survivor.values, width=0.2, color='yellow', edgecolor='red',linewidth=2);
print('Thus the majority didn’t survive.')

Customizing a bar graph.webp

Thus the majority did not survive.

Horizontal bar graph

We can plot bar graphs horizontally too.

Let’s compare the male and female survivors using two subplots:

m_survivors = df[df['Sex']=='male']['Survived'].value_counts()
f_survivors = df[df['Sex']=='female']['Survived'].value_counts()

fig, axes = plt.subplots(1, 2) #for subplots
fig.set_size_inches(8,4) 

axes[0].barh(y=m_survivors.index, width=m_survivors.values, height = 0.2, color=['orange', 'blue'])
axes[0].set_title('Male Survivors')


axes[1].barh(y=f_survivors.index, width=f_survivors.values, height = 0.2, color=['blue', 'orange'])
axes[1].set_title('Female Survivors');

Plot In Python using Matplotlib.webp

Stacked bar graph

A stacked bar graph further divides the variable we are comparing into smaller parts to represent the different categories of the variable.

Let’s compare the ratio of men and woman who boarded the ship from different ports:

embarked_m = df[df['Sex']=='male']['Embarked'].value_counts()
embarked_f = df[df['Sex']=='female']['Embarked'].value_counts()

plt.bar(x=embarked_m.index, height=embarked_m.values)
plt.bar(x=embarked_f.index, height=embarked_f.values, bottom=embarked_m.values, color='orange')

plt.xlabel('Port')
plt.ylabel('Passengers')
plt.title('Embarking Ports by Sex')
plt.legend(labels=['Male', 'Female']);

Plotting a stacked bar graph.webp

Pie chart

A pie chart is a circle divided into slices, where each slice represents the different categories under that variable. The size of the slices depends on the relative percentage or count of the categories.

Function used - plt.pie()

Let’s compare the number of men and women present onboard:

plt.pie(df['Sex'].value_counts(),labels=['Male','Female']);

Plotting a pie chart to represent categories.webp

We can make further specifications to make it more readable:

plt.pie(df['Sex'].value_counts(),labels=['Male','Female'], startangle=90, 
autopct='%.1f%%', explode=[0.1, 0]);

Specifications represented in the pie chart.webp

Box plot

A box plot is used to display a summary of the distribution, such as maximum, minimum, first quartile, median, third quartile, and outliers, if present.

Here’s the anatomy of a box plot:

Anatomy of a box plot.webp

Image source

Function used: plt.boxplot()

age = df['Age'].dropna()

fig = plt.figure(figsize=(8, 6))
plt.boxplot(age, labels=['Age']);

Box plot using Matplotlib.webp

From the above box plot, we can infer that the median age of the passengers was around 28; the oldest person onboard was around 80; and the youngest wasn't even a year old.

Area chart

An area chart is essentially a line graph with the area under the curve colored or shaded. It is used to represent changes in quantity over time.

Function used - plt.fill_between()

x = range(20)
y = [i**2 for i in x]

plt.figure(figsize=(8,6))
plt.fill_between(x,y);

Change in quantity represented by Area Chart.webp

Contour plot

A contour plot is used to plot 3D surfaces on a 2D plane using color intensities to represent the third axis.

Function used - plt.contour(), plt.contourf()

M = np.random.rand(10,20)
plt.contour(M);

Data visualization in python using matplotlib examples.webp

To fill the contour plot:

plt.contourf(M);

Filling a contour plot using color intensities.webp

3D plot

Three axes are used to find relationships between three variables. There’s no separate function for this, but we can add a third variable and Matplotlib will automatically plot it on a 3D plane.

Let’s make a scatter plot to find the relationship between Age Fare and Pclass variables:

age = df['Age']
fare = df['Fare']
pclass = df['Pclass']

fig = plt.figure(figsize=(10,8))

X = plt.axes(projection='3d')
X.scatter(xs=age, ys=fare, zs=pclass)

X.set_xlabel('Age')
X.set_ylabel('Fare')
X.set_zlabel('Class');

Scatter plot plotted on a 3D plane.webp

Subplots

Matplotlib offers a way to chart multiple data visualizations near each other using the plt.subplot() function.

plt.subplot(1,2) means one row with two plots.

Similarly, if we want to plot four different graphs where each row has two plots in it, we’ll use plt.subplot(2,2).

sex = df['Sex'].value_counts()
emb = df['Embarked'].value_counts()

fig, axes = plt.subplots(1, 2)
fig.set_size_inches(8, 4)

axes[0].pie(sex, labels=['Male', 'Female'], startangle=90, autopct='%.1f%%')
axes[0].set_title('Sex')

axes[1].pie(emb, labels=['S', 'C', 'Q'], startangle=90, explode=[0, 0.1, 0])
axes[1].set_title('Embarked');

Pie chart for multiple data visualizations using subplot.webp

We can merge different types of plots together too.

fig, axes = plt.subplots(1, 3)
fig.set_size_inches(12, 4)

axes[0].pie(s_vals, labels=['Male', 'Female'], startangle=90, autopct='%.1f%%')
axes[0].set_title('Sex')

age = df['Age'].dropna() 
axes[1].boxplot(age, labels=['Age']);

age = df['Age']
fare = df['Fare']
axes[2].scatter(x=age, y=fare);

Merging of different plots together.webp

Text and annotations

Sometimes, titles, legends, and axes’ names are not enough to convey all the information we want to deliver.

We can add text directly in our plot using plt.text():

X = np.linspace(start=-np.pi, stop=np.pi, num=200)
Y = np.sin(2*x)

fig = plt.figure(figsize=(8, 5))
plt.plot(X,Y)
plt.text(-1.80, 0.5, 'y = sin(2x)', fontsize=14);

Adding text to a plot.webp

We can also add arrows for more accurate annotations:

fig = plt.figure(figsize=(8, 5))
plt.plot(X, Y, color='gray')

plt.plot([0], [0], 'o')

pointer = dict(facecolor='black', shrink=0.1, width=3)
plt.annotate('origin point',        
             xy=(0, 0),             #location of marker
             xytext=(-1.80, 0.5),   #location of text
             arrowprops=pointer ,   #pointer arrow
             fontsize=10            
            );

Adding an arrow in a plot.webp

We’ve explored data visualization using Matplotlib in depth and examined the important features available. Although Matplotlib has many more functionalities, the ones laid out in this article will help you get started. Try them out before moving on to more complex visualizations.

Author

  • Author

    Turing

    Author is a seasoned writer with a reputation for crafting highly engaging, well-researched, and useful content that is widely read by many of today's skilled programmers and developers.

Frequently Asked Questions

Data visualization’s primary role is to help non-technical people consume and understand large quantities of data. When presented properly, it helps draw actionable conclusions more easily and readily.

Matplotlib is the most popular Python module (or library) to make plots and visualizations. Since it’s the first ever Python visualization library, a number of other visualization tools are built and extended from it, for example, Seaborn, Astropy, mplfinance, PyART, etc.

Not every decision-maker in an organization has the necessary technical skills to understand vast amounts of data. Data visualization, therefore, helps leaders make more informed decisions to further their organizations’ progress.

Matplotlib is a collection of different visualization functions that work like MATLAB under the hood.

Python has an extensive collection of data visualization tools with domain-specific packages like Seaborn for statistical graphics, Cartopy for geospatial data processing, DNA features viewer, WCSAxes for plotting astronomical data, etc.

Python is the go-to programming language for any type of data analysis since it’s easy to learn and fast to implement. The Python script is intuitive and easy to read which makes it the best choice for a visualization tool.

View more FAQs
Press

Press

What's up with Turing? Get the latest news about us here.
Blog

Blog

Know more about remote work.
Checkout our blog here.
Contact

Contact

Have any questions?
We'd love to hear from you.

Hire and manage remote developers

Tell us the skills you need and we'll find the best developer for you in days, not weeks.

Hire Developers