We'll now take an in-depth look at the Matplotlib package for visualization in Python. Matplotlib is a multi-platform data visualization library built on NumPy arrays, and designed to work with the broader SciPy stack. It was conceived by John Hunter in 2002, originally as a patch to IPython for enabling interactive MATLAB-style plotting via gnuplot from the IPython command line. IPython's creator, Fernando Perez, was at the time scrambling to finish his PhD, and let John know he wouldn’t have time to review the patch for several months. John took this as a cue to set out on his own, and the Matplotlib package was born, with version 0.1 released in 2003. It received an early boost when it was adopted as the plotting package of choice of the Space Telescope Science Institute (the folks behind the Hubble Telescope), which financially supported Matplotlib’s development and greatly expanded its capabilities.

One of Matplotlib’s most important features is its ability to play well with many operating systems and graphics backends. Matplotlib supports dozens of backends and output types, which means you can count on it to work regardless of which operating system you are using or which output format you wish. This cross-platform, everything-to-everyone approach has been one of the great strengths of Matplotlib. It has led to a large user base, which in turn has led to an active developer base and Matplotlib’s powerful tools and ubiquity within the scientific Python world.

In recent years, however, the interface and style of Matplotlib have begun to show their age. Newer tools like ggplot and ggvis in the R language, along with web visualization toolkits based on D3js and HTML5 canvas, often make Matplotlib feel clunky and old-fashioned. Still, I'm of the opinion that we cannot ignore Matplotlib's strength as a well-tested, cross-platform graphics engine. Recent Matplotlib versions make it relatively easy to set new global plotting styles (see Customizing Matplotlib: Configurations and Style Sheets), and people have been developing new packages that build on its powerful internals to drive Matplotlib via cleaner, more modern APIs—for example, Seaborn (discussed in Visualization With Seaborn), ggpy, HoloViews, Altair, and even Pandas itself can be used as wrappers around Matplotlib's API. Even with wrappers like these, it is still often useful to dive into Matplotlib's syntax to adjust the final plot output. For this reason, I believe that Matplotlib itself will remain a vital piece of the data visualization stack, even if new tools mean the community gradually moves away from using the Matplotlib API directly.

Just as we use the `np` shorthand for NumPy and the `pd` shorthand for Pandas, we will use some standard shorthands for Matplotlib imports:

In [1]:

```
import matplotlib as mpl
import matplotlib.pyplot as plt
```

We will use the `plt.style` directive to choose appropriate aesthetic styles for our figures.
Here we will set the `classic` style, which ensures that the plots we create use the classic Matplotlib style:

In [2]:

```
plt.style.use('classic')
```

Throughout this section, we will adjust this style as needed. Note that the stylesheets used here are supported as of Matplotlib version 1.5; if you are using an earlier version of Matplotlib, only the default style is available. For more information on stylesheets, see Customizing Matplotlib: Configurations and Style Sheets.
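As a small sketch of how style sheets can be inspected and applied, the snippet below lists the styles shipped with the installed Matplotlib and applies `classic` only within a `with` block via `plt.style.context`. The `Agg` backend line is an assumption added so the example also runs outside a notebook; the data plotted is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs outside a notebook
import matplotlib.pyplot as plt

# Names of the style sheets that ship with the installed Matplotlib
available_styles = sorted(plt.style.available)
print(available_styles[:5])

# Apply a style only inside the `with` block; global settings are restored afterwards
with plt.style.context('classic'):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])  # arbitrary illustrative data

plt.close(fig)
```

Using a context manager rather than `plt.style.use` is handy when only one figure should deviate from the style chosen for the rest of the session.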

Before we dive into the details of creating visualizations with Matplotlib, there are a few useful things you should know about using the package.

The IPython notebook is a browser-based interactive data analysis tool that can combine narrative, code, graphics, and HTML elements in a single document.

Plotting interactively within an IPython notebook can be done with the `%matplotlib` command, and works in a similar way to the IPython shell.
In the IPython notebook, you also have the option of embedding graphics directly in the notebook, with two possible options:

- `%matplotlib notebook` will lead to *interactive* plots embedded within the notebook
- `%matplotlib inline` will lead to *static* images of your plot embedded in the notebook

For this book, we will generally opt for `%matplotlib inline`:

In [3]:

```
%matplotlib inline
```

After running this command (it needs to be done only once per kernel/session), any cell within the notebook that creates a plot will embed a PNG image of the resulting graphic:

In [11]:

```
import numpy as np
x = np.linspace(0, 10, 100)
fig = plt.figure()
plt.plot(x, np.sin(x), '-')
plt.plot(x, np.cos(x), '--'); # pay attention to the last ";" to ignore the Line2D object
```

In [8]:

```
type(_[0])
```

Out[8]:

matplotlib.lines.Line2D

One nice feature of Matplotlib is the ability to save figures in a wide variety of formats.
Saving a figure can be done using the `savefig()` command.
For example, to save the previous figure as a PNG file, you can run this:

In [12]:

```
fig.savefig('my_figure.png')
```

We now have a file called `my_figure.png` in the current working directory:

In [14]:

```
!ls -lh my_figure.png
```

-rw-r--r-- 1 mn staff 26K Apr 8 12:10 my_figure.png

In `savefig()`, the file format is inferred from the extension of the given filename.
Depending on what backends you have installed, many different file formats are available.
The list of supported file types can be found for your system by using the following method of the figure canvas object:

In [15]:

```
fig.canvas.get_supported_filetypes()
```

Out[15]:

{'eps': 'Encapsulated Postscript', 'jpg': 'Joint Photographic Experts Group', 'jpeg': 'Joint Photographic Experts Group', 'pdf': 'Portable Document Format', 'pgf': 'PGF code for LaTeX', 'png': 'Portable Network Graphics', 'ps': 'Postscript', 'raw': 'Raw RGBA bitmap', 'rgba': 'Raw RGBA bitmap', 'svg': 'Scalable Vector Graphics', 'svgz': 'Scalable Vector Graphics', 'tif': 'Tagged Image File Format', 'tiff': 'Tagged Image File Format'}
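When there is no filename extension to infer from, the format can also be given explicitly through the `format` keyword of `savefig()`. The following is a minimal sketch that writes the same figure into in-memory buffers as PNG and SVG; the buffer names and the `dpi` value are illustrative choices, and the `Agg` backend line is an assumption so the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# A buffer has no filename extension, so the format is passed explicitly
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format='png', dpi=150)

svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format='svg')

png_bytes = png_buffer.getvalue()
svg_bytes = svg_buffer.getvalue()
print(len(png_bytes), len(svg_bytes))  # sizes in bytes of the two encodings
plt.close(fig)
```

Writing to a buffer instead of a file is useful when the image is to be sent over a network or embedded in a report rather than stored on disk.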

A potentially confusing feature of Matplotlib is its dual interfaces: a convenient MATLAB-style state-based interface, and a more powerful object-oriented interface. We'll quickly highlight the differences between the two here.

Matplotlib was originally written as a Python alternative for MATLAB users, and much of its syntax reflects that fact.
The MATLAB-style tools are contained in the pyplot (`plt`) interface.
For example, the following code will probably look quite familiar to MATLAB users:

In [18]:

```
fig = plt.figure() # create a plot figure
# create the first of two panels and set current axis
plt.subplot(2, 1, 1) # (rows, columns, panel number)
plt.plot(x, np.sin(x))
# create the second panel and set current axis
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x), '--');
```

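It is important to note that the MATLAB-style interface is *stateful*: it keeps track of the "current" figure and axes, to which all `plt` commands are applied. This is fast and convenient for simple plots, but becomes awkward once a figure has several panels, for example when you want to go back and add something to the first panel after creating the second.
The object-oriented interface is available for these more complicated situations, and for when you want more control over your figure.
Rather than depending on some notion of an "active" figure or axes, in the object-oriented interface the plotting functions are *methods* of explicit `Figure` and `Axes` objects.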
To re-create the previous plot using this style of plotting, you might do the following:

In [19]:

```
# First create a grid of plots
# ax will be an array of two Axes objects
fig, ax = plt.subplots(2)
# Call plot() method on the appropriate object
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x), '--');
```
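To make the notion of an "active" axes concrete, here is a minimal sketch that inspects the pyplot state with `plt.gcf()` and `plt.gca()`, switches the current axes with `plt.sca()`, and then draws on the other panel through its explicit `Axes` object. The variable names `top` and `bottom` are illustrative, and the `Agg` backend line is an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig = plt.figure()
top = plt.subplot(2, 1, 1)
bottom = plt.subplot(2, 1, 2)

# pyplot tracks the most recently created figure and axes as "current"
current_figure_is_fig = plt.gcf() is fig
current_axes_is_bottom = plt.gca() is bottom
print(current_figure_is_fig, current_axes_is_bottom)

# Stateful route: make the first panel current again, then plot into it
plt.sca(top)
plt.plot(x, np.sin(x))

# Object-oriented route: no "current" axes needed, just call the method
bottom.plot(x, np.cos(x), '--')

n_top, n_bottom = len(top.lines), len(bottom.lines)
plt.close(fig)
```

Both routes end with one line in each panel; the object-oriented call simply avoids having to remember which axes is active.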

Perhaps the simplest of all plots is the visualization of a single function $y = f(x)$. Here we will take a first look at creating a simple plot of this type. As with all the following sections, we'll start by setting up the notebook for plotting and importing the packages we will use:

In [20]:

```
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # renamed 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
```

For all Matplotlib plots, we start by creating a figure and an axes. In their simplest form, a figure and axes can be created as follows:

In [21]:

```
fig = plt.figure()
ax = plt.axes()
```

In Matplotlib, the *figure* (an instance of the class `plt.Figure`) can be thought of as a single container that contains all the objects representing axes, graphics, text, and labels.
The *axes* (an instance of the class `plt.Axes`) is what we see above: a bounding box with ticks and labels, which will eventually contain the plot elements that make up our visualization.
Throughout this book, we'll commonly use the variable name `fig` to refer to a figure instance, and `ax` to refer to an axes instance or group of axes instances.

Once we have created an axes, we can use the `ax.plot` method to plot some data. Let's start with a simple sinusoid:

In [13]:

```
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
```

Alternatively, we can use the pyplot interface and let the figure and axes be created for us in the background:

In [14]:

```
plt.plot(x, np.sin(x));
```

If we want to create a single figure with multiple lines, we can simply call the `plot` function multiple times:

In [22]:

```
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x), '--');
```

The first adjustment you might wish to make to a plot is to control the line colors and styles.
The `plt.plot()` function takes additional arguments that can be used to specify these.
To adjust the color, you can use the `color` keyword, which accepts a string (or an RGB tuple) representing virtually any imaginable color.

The color can be specified in a variety of ways (if no color is specified, Matplotlib will automatically cycle through a set of default colors for multiple lines):

In [23]:

```
plt.plot(x, np.sin(x - 0), '--', color='blue') # specify color by name
plt.plot(x, np.sin(x - 1), color='g') # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75') # Grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44') # Hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0,0.2,0.3)) # RGB tuple, values 0 to 1
plt.plot(x, np.sin(x - 5), color='chartreuse'); # all HTML color names supported
```
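To see what these different specifications resolve to, the sketch below passes each one through `matplotlib.colors.to_rgba`, which returns the RGBA tuple Matplotlib would draw with, and then prints the default color cycle used when no color is given. The names `specs`, `resolved`, and `default_cycle` are illustrative, and the `Agg` backend line is an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs outside a notebook
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

# The same kinds of color specification used in the plot calls above
specs = ['blue', 'g', '0.75', '#FFDD44', (1.0, 0.2, 0.3), 'chartreuse']

# Resolve every specification to an (R, G, B, A) tuple of floats in [0, 1]
resolved = {str(spec): mcolors.to_rgba(spec) for spec in specs}
for spec, rgba in resolved.items():
    print(spec, '->', rgba)

# Colors Matplotlib cycles through when no color is specified
default_cycle = plt.rcParams['axes.prop_cycle'].by_key()['color']
print(default_cycle)
```

Note that the default cycle depends on the active style sheet, so the printed list may differ between `classic` and other styles.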

Similarly, the line style can be adjusted using the `linestyle` keyword:

In [25]:

```
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed') # '--'
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
# For short, you can use the following codes:
plt.plot(x, x + 4, linestyle='-') # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':'); # dotted
```

If you would like to be extremely terse, these `linestyle` and `color` codes can be combined into a single non-keyword argument to the `plt.plot()` function:

In [18]:

```
plt.plot(x, x + 0, '-g') # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r'); # dotted red
```
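As a quick check that the terse form and the keyword form describe the same thing, the following sketch draws one line each way and compares the properties stored on the resulting `Line2D` objects. The variable names are illustrative, and the `Agg` backend line is an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs outside a notebook
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()

# Terse format string: dashed line style plus the short color code for cyan
short_line, = ax.plot(x, x + 0, '--c')
# Equivalent call spelled out with keywords
long_line, = ax.plot(x, x + 1, linestyle='dashed', color='c')

short_style = short_line.get_linestyle()
same_style = short_style == long_line.get_linestyle()
same_color = (mcolors.to_rgba(short_line.get_color())
              == mcolors.to_rgba(long_line.get_color()))
print(short_style, same_style, same_color)
plt.close(fig)
```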

Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it's nice to have finer control.
The most basic way to adjust axis limits is to use the `plt.xlim()` and `plt.ylim()` functions:

In [19]:

```
plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
```

If for some reason you'd like either axis to be displayed in reverse, you can simply reverse the order of the arguments:

In [20]:

```
plt.plot(x, np.sin(x))
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
```

A useful related function is `plt.axis()` (note here the potential confusion between *axes* with an *e*, and *axis* with an *i*).
The `plt.axis()` function allows you to set the `x` and `y` limits with a single call, by passing a list which specifies `[xmin, xmax, ymin, ymax]`:

In [21]:

```
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5]);
```

The `plt.axis()` function goes even beyond this, allowing you to do things like automatically tighten the bounds around the current plot:

In [22]:

```
plt.plot(x, np.sin(x))
plt.axis('tight');
```

It allows even higher-level specifications, such as ensuring an equal aspect ratio so that on your screen, one unit in `x` is equal to one unit in `y`:

In [23]:

```
plt.plot(x, np.sin(x))
plt.axis('equal');
```
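The limits set by these calls can also be read back, which helps when checking what a plot is actually showing. Here is a minimal sketch using the `Axes` methods `ax.axis()`, `ax.set_xlim()`, `ax.get_xlim()`, and `ax.xaxis_inverted()`; the variable names are illustrative, and the `Agg` backend line is an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Set all four limits in one call; the new limits are returned
limits = ax.axis([-1, 11, -1.5, 1.5])
print(limits)

# Passing the x limits in descending order reverses the axis
ax.set_xlim(10, 0)
x_reversed = ax.xaxis_inverted()
final_xlim = ax.get_xlim()
print(final_xlim, x_reversed)
plt.close(fig)
```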

As the last piece of this section, we'll briefly look at the labeling of plots: titles, axis labels, and simple legends.

Titles and axis labels are the simplest such labels—there are methods that can be used to quickly set them:

In [24]:

```
plt.plot(x, np.sin(x))
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");
```

When multiple lines are being shown within a single axes, it can be useful to create a plot legend that labels each line type.
Again, Matplotlib has a built-in way of quickly creating such a legend.
It is done via the (you guessed it) `plt.legend()` function.
Though there are several valid ways of using this, I find it easiest to specify the label of each line using the `label` keyword of the plot function:

In [25]:

```
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.axis('equal')
plt.legend();
```
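One of the other valid ways is to pass the line objects and their labels to the legend explicitly, which allows the order or wording to differ from the `label` keywords. The sketch below shows both the labels collected automatically (via `ax.get_legend_handles_labels()`) and an explicit legend; the replacement labels `'cosine'` and `'sine'` are illustrative, and the `Agg` backend line is an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
sin_line, = ax.plot(x, np.sin(x), '-g', label='sin(x)')
cos_line, = ax.plot(x, np.cos(x), ':b', label='cos(x)')

# What a bare ax.legend() would pick up from the label keywords
handles, labels = ax.get_legend_handles_labels()
print(labels)

# Explicit handles and labels: here the order is swapped and the text changed
legend = ax.legend([cos_line, sin_line], ['cosine', 'sine'], loc='upper right')
legend_texts = [text.get_text() for text in legend.get_texts()]
print(legend_texts)
plt.close(fig)
```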

While most `plt` functions translate directly to `ax` methods (such as `plt.plot()` → `ax.plot()`, `plt.legend()` → `ax.legend()`, etc.), this is not the case for all commands.
In particular, functions to set limits, labels, and titles are slightly modified.
For transitioning between MATLAB-style functions and object-oriented methods, make the following changes:

- `plt.xlabel()` → `ax.set_xlabel()`
- `plt.ylabel()` → `ax.set_ylabel()`
- `plt.xlim()` → `ax.set_xlim()`
- `plt.ylim()` → `ax.set_ylim()`
- `plt.title()` → `ax.set_title()`

In the object-oriented interface to plotting, rather than calling these functions individually, it is often more convenient to use the `ax.set()` method to set all these properties at once:

In [26]:

```
ax = plt.axes()
ax.plot(x, np.sin(x))
ax.set(xlim=(0, 10), ylim=(-2, 2), xlabel='x', ylabel='sin(x)', title='A Simple Plot'); # focus on metadata only
```
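Each keyword given to `ax.set()` corresponds to one of the `set_*` methods, and the matching `get_*` methods read the values back. The following sketch collects them into a dictionary named `summary`, which is an illustrative helper and not part of Matplotlib; the `Agg` backend line is an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# One call instead of set_xlim, set_ylim, set_xlabel, set_ylabel, set_title
ax.set(xlim=(0, 10), ylim=(-2, 2),
       xlabel='x', ylabel='sin(x)', title='A Simple Plot')

# Read the properties back with the corresponding getters
summary = {
    'xlim': ax.get_xlim(),
    'ylim': ax.get_ylim(),
    'xlabel': ax.get_xlabel(),
    'ylabel': ax.get_ylabel(),
    'title': ax.get_title(),
}
print(summary)
plt.close(fig)
```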

Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being joined by line segments, here the points are represented individually with a dot, circle, or other shape. We’ll start by setting up the notebook for plotting and importing the functions we will use:

In [27]:

```
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # renamed 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
```

In the previous section we looked at `plt.plot`/`ax.plot` to produce line plots.
It turns out that this same function can produce scatter plots as well:

In [30]:

```
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');
```

The third argument in the function call is a character that represents the type of symbol used for the plotting. Just as you can specify options such as `'-'`, `'--'` to control the line style, the marker style has its own set of short string codes. The full list of available symbols can be seen in the documentation of `plt.plot`, or in Matplotlib's online documentation. Most of the possibilities are fairly intuitive, and we'll show a number of the more common ones here:

In [31]:

```
rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker, label="marker='{0}'".format(marker))
plt.legend(numpoints=1)
plt.xlim(0, 1.8);
```
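The marker codes can also be listed programmatically. As a sketch, the `Line2D.markers` dictionary maps each code to a short description; the filtering below, which keeps only string codes and drops the "nothing" placeholders, is an illustrative choice, as is the name `marker_names`.

```python
from matplotlib.lines import Line2D

# Line2D.markers maps marker codes to short descriptions.
# Keep the string codes and skip the entries that mean "draw no marker".
marker_names = {
    code: name
    for code, name in Line2D.markers.items()
    if isinstance(code, str) and name != 'nothing'
}

for code, name in marker_names.items():
    print(repr(code), name)
```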

For even more possibilities, these character codes can be used together with line and color codes to plot points along with a line connecting them:

In [32]:

```
plt.plot(x, y, '-ok');
```

Additional keyword arguments to `plt.plot` specify a wide range of properties of the lines and markers:

In [31]:

```
plt.plot(x, y, '-p', color='gray',
         markersize=15, linewidth=4,
         markerfacecolor='white',
         markeredgecolor='gray',
         markeredgewidth=2)
plt.ylim(-1.2, 1.2);
```

A second, more powerful method of creating scatter plots is the `plt.scatter` function, which can be used very similarly to the `plt.plot` function:

In [33]:

```
plt.scatter(x, y, marker='o');
```

The primary difference of `plt.scatter` from `plt.plot` is that it can be used to create scatter plots where the properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to data.

Let's show this by creating a random scatter plot with points of many colors and sizes.
In order to better see the overlapping results, we'll also use the `alpha` keyword to adjust the transparency level:

In [36]:

```
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
plt.colorbar();  # show the color scale, much as plt.legend() labels line styles
```
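The per-point control comes from the fact that a scatter call returns a single collection object holding one position, size, and color value per point. Here is a minimal sketch of the same plot through the object-oriented interface, inspecting that collection afterwards; the variable names are illustrative, and the `Agg` backend line is an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
fig.colorbar(points, ax=ax)  # the colorbar is tied to this specific collection

# Every point carries its own position, size, and color value
n_points = points.get_offsets().shape[0]
n_sizes = len(points.get_sizes())
n_color_values = len(points.get_array())
n_axes = len(fig.axes)  # the main axes plus the one added for the colorbar
print(n_points, n_sizes, n_color_values, n_axes)
plt.close(fig)
```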