How to Create a Boxplot in Python

Data visualization is the process of converting information into a visual format, such as charts, images, and pictures, so that insight can be derived from data more easily. It supports data analytics and business intelligence.

One useful data visualization technique for deriving insight is the Boxplot, which can be created in Python with several libraries.

What is a Boxplot?

A Boxplot, also called a box-and-whisker plot, is a data visualization technique that shows how the values in a dataset are distributed. It summarizes the data with five numbers, known as the five-number summary: the minimum, the first quartile, the median, the third quartile, and the maximum. It then draws these five numbers as a single diagram. In other words, it gives a compact summary of the data's distribution or variation.
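As a quick illustration, the sketch below computes the five-number summary of a small, made-up series with Pandas. Pandas `describe()` reports the five numbers as its `min`, `25%`, `50%`, `75%`, and `max` entries.

```python
import pandas as pd

# Small made-up dataset
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# describe() also returns count, mean and std; keep only the five-number summary
summary = s.describe()[["min", "25%", "50%", "75%", "max"]]
print(summary.tolist())  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

These are the raw minimum and maximum. When a dataset contains outliers, the whiskers of a plotted Boxplot can stop short of them, as explained in the Understanding the Boxplot section.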

Libraries used to create a Boxplot in Python

In this article, we will create Boxplots in three different ways, using the following libraries:

  1. Pandas library
  2. Matplotlib library
  3. Seaborn library

How to create a Python Boxplot

We start by importing the libraries we need and reading the data. In this article, we use a phone price dataset obtained from Kaggle.

[Image: code importing the libraries and reading the phone price dataset]

Next, we explore the data further to find the numerical columns to use in the Boxplots.

[Images: code and output used to inspect the dataset's columns and data types]

From this output, we can identify the numerical columns to use. Now we'll learn how to create Boxplots from them using the three different methods.
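The dataset file and its exact column names are not given here. The following sketch therefore builds a small synthetic DataFrame as a stand-in, with the hypothetical columns `battery_capacity`, `ram`, `weight`, and `touch_screen`. It then uses `select_dtypes` to pick out the numerical columns. With the real data, you would load the downloaded CSV with `pd.read_csv` instead.

```python
import numpy as np
import pandas as pd

# With the real data you would do something like:
#   df = pd.read_csv("phone_prices.csv")   # hypothetical file name
# Here a synthetic stand-in with hypothetical column names is used instead.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "battery_capacity": rng.integers(2000, 4001, size=200),
    "ram": rng.integers(256, 4097, size=200),
    "weight": rng.integers(80, 201, size=200),
    "touch_screen": rng.choice(["yes", "no"], size=200),
})

df.info()  # prints column names, dtypes and non-null counts

# Keep only the numerical columns for the Boxplots
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['battery_capacity', 'ram', 'weight']
```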

How to create a Boxplot Using Pandas

Creating a Boxplot in Pandas is quite easy, and the syntax is very similar to Matplotlib's. This is because Pandas uses Matplotlib as its default plotting backend and wraps it to make plotting easier.

Single plot

To create a single plot you can use the following syntax:

[Image: Pandas code for a single Boxplot]
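Here is a minimal sketch of a single Pandas Boxplot. It assumes a DataFrame `df` with a hypothetical `battery_capacity` column, filled with synthetic values instead of the Kaggle data. The figure is saved to a placeholder file name so the snippet also runs without a display.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the phone dataset (hypothetical column name)
rng = np.random.default_rng(0)
df = pd.DataFrame({"battery_capacity": rng.integers(2000, 4001, size=200)})

# Series.plot(kind="box") draws one box for the column and returns a Matplotlib Axes
ax = df["battery_capacity"].plot(kind="box")
ax.set_ylabel("Battery capacity")
plt.savefig("battery_boxplot.png")  # or plt.show() in an interactive session
```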

Here, battery capacity is a column in the DataFrame. This produces the following result:

[Image: Boxplot of battery capacity]

From this plot, we can read that battery capacity lies roughly between 2000 and 4000.

Categorical plot

In a categorical plot, we plot a numerical column grouped by a categorical column to compare its distribution across the categories. This is done with the following syntax:

[Image: Pandas code for a Boxplot of battery capacity by touch screen]
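Here is one possible version of the grouped plot. It assumes hypothetical `battery_capacity` and `touch_screen` columns filled with synthetic values. `DataFrame.boxplot` with the `by` argument draws one box per category. An existing Axes is passed in so the code does not depend on what the call returns.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical column names)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "battery_capacity": rng.integers(2000, 4001, size=200),
    "touch_screen": rng.choice(["yes", "no"], size=200),
})

fig, ax = plt.subplots()
# One box of battery_capacity for each value of touch_screen
df.boxplot(column="battery_capacity", by="touch_screen", ax=ax)
ax.set_ylabel("Battery capacity")
fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." title
fig.savefig("battery_by_touch_screen.png")  # placeholder file name
```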

As you can see, the boxplot method can be called directly on the DataFrame. This produces the following result:

[Image: Boxplots of battery capacity for touchscreen and non-touchscreen phones]

Plotting battery capacity against touch screen, we see that touchscreen phones tend to have a higher battery capacity than phones without a touch screen.

Multiple plots

Several numerical columns can be plotted at once with the following syntax:

[Image: Pandas code for Boxplots of the numerical columns]
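This sketch plots several numerical columns side by side. It assumes the hypothetical columns `battery_capacity`, `ram`, and `weight` alongside a non-numeric `touch_screen` column, all filled with synthetic values. `select_dtypes` keeps only the numerical columns before plotting.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical column names)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "battery_capacity": rng.integers(2000, 4001, size=200),
    "ram": rng.integers(256, 4097, size=200),
    "weight": rng.integers(80, 201, size=200),
    "touch_screen": rng.choice(["yes", "no"], size=200),
})

# One box per numerical column, drawn side by side on a shared y-axis
numeric_cols = df.select_dtypes(include="number").columns
ax = df[numeric_cols].plot(kind="box", figsize=(8, 5))
plt.savefig("numeric_boxplots.png")  # placeholder file name
```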

[Image: Boxplots of the numerical columns]

The boxes are squeezed because the columns have very different ranges and some contain large outliers. We can make them easier to read by using a logarithmic y-axis:

[Image: Pandas code with a logarithmic y-axis]
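One way to switch to a logarithmic y-axis is sketched below with synthetic values in hypothetical columns. Draw the Boxplots as usual, then call Matplotlib's `Axes.set_yscale("log")` on the returned axes. A logarithmic scale only works for strictly positive values.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical column names, all values positive)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "battery_capacity": rng.integers(2000, 4001, size=200),
    "ram": rng.integers(256, 4097, size=200),
    "weight": rng.integers(80, 201, size=200),
})

ax = df.plot(kind="box", figsize=(8, 5))
# A log scale spreads out columns whose ranges differ by orders of magnitude
ax.set_yscale("log")
plt.savefig("numeric_boxplots_log.png")  # placeholder file name
```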

This gives the result:

[Image: Boxplots of the numerical columns on a logarithmic y-axis]

This makes the boxes more visible.

How to create a Boxplot using Matplotlib

Matplotlib is a data visualization library used to create graphs for analyzing and visualizing data. Its syntax can be verbose and low-level, which is why Pandas and Seaborn build on top of it to make plotting easier.

Single plot

You can create a single plot using matplotlib.pyplot with the following syntax:

[Image: Matplotlib code for a single Boxplot]
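Here is a minimal Matplotlib sketch. It assumes a hypothetical `battery_capacity` column filled with synthetic values. `Axes.boxplot` takes the values as an array and returns a dictionary of the drawn artists.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical column name)
rng = np.random.default_rng(0)
df = pd.DataFrame({"battery_capacity": rng.integers(2000, 4001, size=200)})

fig, ax = plt.subplots()
# The returned dict holds the artists: "boxes", "medians", "whiskers", "caps", "fliers"
parts = ax.boxplot(df["battery_capacity"].to_numpy())
ax.set_ylabel("Battery capacity")
fig.savefig("battery_boxplot_mpl.png")  # placeholder file name
```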

This gives a diagram similar to the one created with Pandas:

[Image: Matplotlib Boxplot of battery capacity]

Categorical plot

For a categorical plot, the syntax is:

[Image: Matplotlib code for a Boxplot of battery capacity by touch screen]
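This sketch builds the categorical plot in plain Matplotlib. It assumes hypothetical `battery_capacity` and `touch_screen` columns filled with synthetic values. The values are split into one array per category, and the category names are set as tick labels by hand.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical column names)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "battery_capacity": rng.integers(2000, 4001, size=200),
    "touch_screen": rng.choice(["yes", "no"], size=200),
})

# Matplotlib does not group by category itself, so build one array per category
categories = sorted(df["touch_screen"].unique())
groups = [df.loc[df["touch_screen"] == c, "battery_capacity"].to_numpy() for c in categories]

fig, ax = plt.subplots()
parts = ax.boxplot(groups)  # boxes are drawn at x = 1, 2, ...
ax.set_xticks(range(1, len(categories) + 1))
ax.set_xticklabels(categories)
ax.set_xlabel("Touch screen")
ax.set_ylabel("Battery capacity")
fig.savefig("battery_by_touch_screen_mpl.png")  # placeholder file name
```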

Matplotlib does not group a DataFrame column by category on its own. This code therefore splits the values into one array per category first and then plots the arrays together. This gives the diagram:

[Image: Matplotlib Boxplots of battery capacity by touch screen]

Multiple plots

Creating multiple plots is more complicated, because you have to specify the numerical columns manually and turn each one into an array. This can be done as follows:

[Image: Matplotlib code for Boxplots of several numerical columns]
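This sketch shows the multiple-column version. It assumes the hypothetical numerical columns `battery_capacity`, `ram`, and `weight`, filled with synthetic values. The last plotting line applies an optional logarithmic y-axis.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical column names)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "battery_capacity": rng.integers(2000, 4001, size=200),
    "ram": rng.integers(256, 4097, size=200),
    "weight": rng.integers(80, 201, size=200),
})

# Specify the numerical columns manually and turn each one into an array
cols = ["battery_capacity", "ram", "weight"]
arrays = [df[c].to_numpy() for c in cols]

fig, ax = plt.subplots(figsize=(8, 5))
parts = ax.boxplot(arrays)
ax.set_xticks(range(1, len(cols) + 1))
ax.set_xticklabels(cols)
ax.set_yscale("log")  # optional: log y-axis for columns with very different ranges
fig.savefig("numeric_boxplots_mpl.png")  # placeholder file name
```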

This gives the figure:

[Image: Matplotlib Boxplots of the numerical columns]

With a logarithmic y-axis, it gives:

[Images: Matplotlib code with a logarithmic y-axis, and the resulting Boxplots]

How to create a Boxplot using Seaborn

Seaborn is one of the most popular data visualization libraries. Although it is built on Matplotlib, it makes creating plots much easier.

Single plot

The following syntax creates a single Boxplot with Seaborn:

[Image: Seaborn code for a single Boxplot]
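Here is a minimal Seaborn sketch. It assumes a hypothetical `battery_capacity` column filled with synthetic values. `sns.boxplot` takes the DataFrame as `data` and the column name as `y`, and returns the Matplotlib Axes it drew on.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (hypothetical column name)
rng = np.random.default_rng(0)
df = pd.DataFrame({"battery_capacity": rng.integers(2000, 4001, size=200)})

# Seaborn labels the axis with the column name automatically
ax = sns.boxplot(data=df, y="battery_capacity")
plt.savefig("battery_boxplot_sns.png")  # placeholder file name
```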

By specifying the column and the data to use, we get:

[Image: Seaborn Boxplot of battery capacity]

Seaborn's default colors make the plot more visible and easier to understand.

Categorical plot

To plot a numerical column against a categorical column, you can use:

[Image: Seaborn code for a Boxplot of battery capacity by touch screen]
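This sketch shows the categorical version. It assumes hypothetical `battery_capacity` and `touch_screen` columns filled with synthetic values. Passing the categorical column as `x` and the numerical column as `y` gives one box per category.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (hypothetical column names)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "battery_capacity": rng.integers(2000, 4001, size=200),
    "touch_screen": rng.choice(["yes", "no"], size=200),
})

# x = categorical column, y = numerical column: one box per touch_screen value
ax = sns.boxplot(data=df, x="touch_screen", y="battery_capacity")
plt.savefig("battery_by_touch_screen_sns.png")  # placeholder file name
```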

The code above produces:

[Image: Seaborn Boxplots of battery capacity by touch screen]

Multiple plots

Creating multiple plots with Seaborn is done with the code:

[Image: Seaborn code for Boxplots of the numerical columns]

This automatically selects the numerical columns and creates one Boxplot for each:

[Image: Seaborn Boxplots of the numerical columns]

This can also be adjusted with Matplotlib's logarithmic y-axis:

[Images: Seaborn code with a logarithmic y-axis, and the resulting Boxplots]

Understanding the Boxplot

The Boxplot consists of 5 main points of distribution:

1. The minimum point: the end of the lower whisker, which marks the lower range of the distribution. In the common convention (the default in Matplotlib, Pandas, and Seaborn), the lower whisker extends to the smallest data point that is no lower than Q1 - 1.5*IQR, where the interquartile range is IQR = Q3 - Q1.

2. The first quartile or 25th percentile: the bottom edge of the box. A quarter of the data points lie below it. It is the lower quartile, Q1, and the lower bound of the interquartile range.

3. The median or 50th percentile: the line drawn inside the box, which marks the middle point of the distribution.

4. The third quartile or 75th percentile: the top edge of the box. Three quarters of the data points lie below it. It is the upper quartile, Q3, and the upper bound of the interquartile range.

5. The maximum point: the end of the upper whisker. Under the same convention, it extends to the largest data point that is no higher than Q3 + 1.5*IQR.

In addition, many Boxplots show small dots that indicate outliers. Outliers are points in the data that fall beyond the whiskers, abnormally far from the rest of the distribution.
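To make the whisker rule concrete, the sketch below works through a small, made-up array with NumPy. It computes the quartiles, the 1.5*IQR fences, the whisker ends, and the outliers. It then checks the results against `matplotlib.cbook.boxplot_stats`, the helper that Matplotlib's `Axes.boxplot` uses to compute its statistics.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles (NumPy's default linear interpolation) and the interquartile range
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points still inside the fences
lower_whisker = data[data >= lower_fence].min()
upper_whisker = data[data <= upper_fence].max()
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3, iqr)           # 3.25 5.5 7.75 4.5
print(lower_whisker, upper_whisker)  # 1 9
print(outliers)                      # [30]

# Matplotlib's own statistics for the same data
stats = boxplot_stats(data)[0]
print(stats["whislo"], stats["whishi"], stats["fliers"])
```

Here the maximum value, 30, is drawn as an outlier dot, and the upper whisker stops at 9.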

Uses of a Boxplot

Python Boxplots are useful for the following:

  1. Getting an overview of the whole dataset and how it is distributed
  2. Seeing whether the data is positively or negatively skewed
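Skew shows up in where the median sits inside the box. One way to put a number on this is Bowley's quartile skewness, (Q3 + Q1 - 2 * median) / (Q3 - Q1). This measure is an illustration added here, not a method from this article. The sketch below computes it with NumPy on two made-up lists.

```python
import numpy as np


def quartile_skewness(values):
    """Bowley (quartile) skewness: > 0 when the median is closer to Q1, < 0 when closer to Q3."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return (q3 + q1 - 2 * median) / (q3 - q1)


symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13]

print(quartile_skewness(symmetric))     # 0.0
print(quartile_skewness(right_skewed))  # 0.4
```

A positive value matches a Boxplot whose median line sits near the bottom of the box, with a longer upper part (positive skew).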

Conclusion

This article introduced the Boxplot in Python and showed how to create Boxplots using three different libraries: Pandas, Matplotlib, and Seaborn. For each library, we covered three kinds of plots: a single plot, a categorical plot, and multiple plots.

Author

    Ezeana Michael

    Ezeana Michael is a data scientist with a passion for machine learning and technical writing. He has worked in the field of data science and has experience working with Python programming to derive insight from data, create machine learning models, and deploy them into production environments.

Frequently Asked Questions

A Boxplot is used to get a five-point summary of the distribution of data.

Seaborn is arguably the most convenient library for Boxplots, since it produces plots that are easy to understand with only a few lines of code.

In the Boxplot, the upper and lower lines protruding from the box are the whiskers, which show the spread of values outside the box. By default, each whisker extends to the most extreme data point within 1.5 times the interquartile range beyond the box. The box covers the range of the distribution between the lower quartile and the upper quartile, and the line drawn inside the box is the median line.

The upper and lower lines protruding from the box are called the whiskers. The box itself is the interquartile range, which spans from the lower quartile (Q1) to the upper quartile (Q3), and the line that divides the box is the median.

You can make a Boxplot in seaborn by using the command:

import seaborn as sns

sns.boxplot(data=data)

Or

sns.boxplot(x=df['col1'], y=df['col2'])

A Boxplot is best used for numerical data.

Since Pandas uses Matplotlib for plotting, you can create a Boxplot directly from a DataFrame with the syntax data.plot(kind="box").
