ggplot

ggplot is a plotting system for Python based on R's ggplot2 and the Grammar of Graphics. It is built for making professional-looking plots quickly with minimal code.


1.

from ggplot import *

ggplot(aes(x='date', y='beef'), data=meat) +\
    geom_line() +\
    stat_smooth(colour='blue', span=0.2)


2.

ggplot(diamonds, aes(x='carat', y='price', color='cut')) +\
    geom_point() +\
    scale_color_brewer(type='diverging', palette=4) +\
    xlab("Carats") + ylab("Price") + ggtitle("Diamonds")

3.

ggplot(diamonds, aes(x='price', fill='cut')) +\
    geom_density(alpha=0.25) +\
    facet_wrap("clarity")


How It Works


Making plots is very repetitive: draw this line, add these colored points, then add these, etc. Instead of making you re-use the same code over and over, ggplot wraps these repeated steps in a high-level but very expressive API. The result is less time spent creating your charts, and more time interpreting what they mean.


ggplot is not a good fit for people trying to make highly customized data visualizations. While you can make some very intricate, great-looking plots, ggplot sacrifices a high degree of customization in favor of generally doing "what you'd expect".


Data

ggplot has a symbiotic relationship with pandas. If you're planning on using ggplot, it's best to keep your data in DataFrames. Think of a DataFrame as a tabular data object. For example, let's look at the diamonds dataset which ships with ggplot.

from ggplot import *
diamonds.head()
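
If you have not worked with DataFrames before, the sketch below builds a tiny one by hand so you can see the tabular shape ggplot expects. It assumes pandas is installed. The column names follow the diamonds examples above, but the rows are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only; not values from the real diamonds dataset.
df = pd.DataFrame({
    "carat": [0.23, 0.31, 0.70],
    "cut": ["Ideal", "Good", "Premium"],
    "price": [326, 335, 2757],
})

print(df.head())          # first rows, the same call used on diamonds above
print(df.dtypes)          # each column carries its own type
print(list(df.columns))   # column names are what aesthetics refer to
```

Each column name is a string you can later hand to an aesthetic such as x or y.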


Aesthetics

Aesthetics describe how your data relates to your plot: each aesthetic maps a column of your data to a visual property. Some common aesthetics are x, y, and color. Aesthetics are specific to the type of plot (or layer) you're adding to your visual. For example, a scatterplot (geom_point) and a line (geom_line) will share x and y, but only a line chart has a linetype aesthetic.
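
To make the idea of layer-specific aesthetics concrete, here is a toy sketch in plain Python. It is not how ggplot is implemented: the SUPPORTED table and the check_aesthetics helper are hypothetical names, and the aesthetic sets are deliberately short illustrations rather than the library's full lists.

```python
# Toy model of per-layer aesthetics; illustrative only, not ggplot internals.
SUPPORTED = {
    "geom_point": {"x", "y", "color"},
    "geom_line": {"x", "y", "color", "linetype"},
}

def check_aesthetics(layer, mapping):
    """Split an aesthetic mapping into the part a layer uses and the part it ignores."""
    allowed = SUPPORTED[layer]
    used = {name: column for name, column in mapping.items() if name in allowed}
    ignored = sorted(name for name in mapping if name not in allowed)
    return used, ignored

# Aesthetic name -> column name, in the style of aes(x='carat', y='price', ...).
mapping = {"x": "carat", "y": "price", "linetype": "cut"}

print(check_aesthetics("geom_point", mapping))
print(check_aesthetics("geom_line", mapping))
```

In this toy model the point layer has no use for linetype, while the line layer accepts all three entries.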



Layers

ggplot lets you combine or add different types of visualization components (or layers) together. I think this is easiest to understand with an example.

Start with a blank canvas.

p = ggplot(aes(x='date', y='beef'), data=meat)
p

Add some points.

p + geom_point()


Add a line.

p + geom_point() + geom_line()


Add a trendline.

p + geom_point() + geom_line() + stat_smooth(color='blue')
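
If you are curious how adding layers with + can work at all, the sketch below shows the general technique using Python's __add__ method. The Plot and Layer classes are hypothetical stand-ins written for this example; they are an assumption about the pattern, not ggplot's actual code.

```python
# Toy sketch of layering with "+"; hypothetical classes, not ggplot's internals.
class Layer:
    def __init__(self, name):
        self.name = name


class Plot:
    def __init__(self, layers=()):
        self.layers = tuple(layers)

    def __add__(self, layer):
        # Return a new Plot so a base plot can be reused across steps.
        return Plot(self.layers + (layer,))

    def describe(self):
        return [layer.name for layer in self.layers]


p = Plot()
q = p + Layer("geom_point") + Layer("geom_line")

print(p.describe())  # the base plot still has no layers
print(q.describe())  # layers appear in the order they were added
```

Because each + returns a new object in this sketch, the same starting plot can be extended in several different ways.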


Installation


Dependencies

I realize that these are not fun to install. My best luck has always been using brew if you're on a Mac or just using the binaries if you're on Windows. If you're using Linux then this should be relatively painless. You should be able to apt-get or yum all of these.

  • matplotlib

  • pandas

  • numpy

  • scipy

  • statsmodels
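
Before going further, it can help to see which of these are already available. The snippet below is a small sketch using only the Python standard library; missing_dependencies is a hypothetical helper name, and the check only tells you whether a package can be found, not whether its version is suitable.

```python
from importlib.util import find_spec

# Dependency names taken from the list above.
DEPENDENCIES = ["matplotlib", "pandas", "numpy", "scipy", "statsmodels"]

def missing_dependencies(names=DEPENDENCIES):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_dependencies()
if missing:
    print("Still to install:", ", ".join(missing))
else:
    print("All dependencies found.")
```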


Installing

OK, the hard part is over. Installing ggplot is really easy. Just use pip!

$ pip install ggplot