Data Visualization: Graphics with ggplot2

By:  DSC00302 - Version 2

Basic plots in R using standard packages like lattice work for most situations where you want to see trends in small data sets, such as your simulation variables, which makes sense considering lattice began with Bell Labs’ S language.  However, when we need to summarize and communicate our work to those primarily interested in the “forest” perspective, we use tools like ggplot2.  In other words, the difference between lattice and ggplot2 is the difference between understanding data and drawing pictures.

You can learn all about ggplot2 by downloading the R package and reading the documentation, but even Hadley Wickham, author of ggplot2, thinks going through the R help documentation will “drive you crazy!”  To alleviate stress, we’ve compiled references, examples, documentation, blogs, books, groups, and commentary from practitioners who use ggplot2 regularly.  Enjoy.

ggplot2 is an actively maintained open-source chart-drawing library for R based upon the principles of the “Grammar of Graphics”, thus the “gg”.  The Grammar of Graphics was written for statisticians, computer scientists, geographers, research and applied scientists, and others interested in visualizing data.  A ggplot2 graphic can be described as layers composed of a data set, mappings and aesthetics (position, shape, size, color), statistical transforms, and scales.  To better wrap our minds around how this applies to ggplot2, we can take Hadley’s tour or attend one of his events.  The overall goal is to automate graphical processes and put more resources at our fingertips; below are some great works from practitioners.
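To make the components listed above concrete, here is a minimal sketch that labels each one in a single plot. It uses the mtcars data set that ships with R purely as a stand-in; the choice of variables and of a log scale is an assumption made for illustration, not something taken from the works featured below.

[code lang="R"]library(ggplot2)

# mtcars ships with R and is used here only as a stand-in data set
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +                 # data set + position mapping
  geom_point(aes(colour = factor(cyl), size = hp)) +        # aesthetics: color and size
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + # statistical transform
  scale_x_log10()                                           # scale

print(p)
[/code]

Each line after the first adds one component, so swapping the statistical transform or the scale means changing a single line rather than redrawing the whole chart.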

London Bike Routes

The London bike routes image is built with three layers: building polygons, waterways and lakes, and bike routes.  The route data consists of bike counts and their positions, shown as line thickness and color intensity in yellow, which is a nice contrast to the black and grey of the city map.  I enjoy this dataviz because you can imagine yourself trying to get around on a bicycle in London.
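The three-layer structure described above can be sketched roughly as follows. This is not the code behind the London map: the data frames buildings, water, and routes and every value in them are hypothetical toy data, and the linewidth aesthetic assumes ggplot2 3.4.0 or later.

[code lang="R"]library(ggplot2)

# Hypothetical toy data; none of these values come from the London map
buildings <- data.frame(x  = c(0, 1, 1, 0, 2, 3, 3, 2),
                        y  = c(0, 0, 1, 1, 2, 2, 3, 3),
                        id = rep(c("a", "b"), each = 4))
water  <- data.frame(x = c(0, 4, 4, 0), y = c(1.3, 1.6, 1.9, 1.6))
routes <- data.frame(x    = c(0, 1.5, 1.5), y    = c(1.2, 1.2, 1.2),
                     xend = c(1.5, 4, 1.5), yend = c(1.2, 1.2, 4),
                     bikes = c(120, 40, 300))

map <- ggplot() +
  # Layer 1: building polygons in dark grey
  geom_polygon(data = buildings, aes(x, y, group = id), fill = "grey30") +
  # Layer 2: water in a lighter grey
  geom_polygon(data = water, aes(x, y), fill = "grey50") +
  # Layer 3: routes, with bike counts mapped to thickness and intensity
  geom_segment(data = routes,
               aes(x = x, y = y, xend = xend, yend = yend,
                   linewidth = bikes, alpha = bikes),
               colour = "yellow", lineend = "round") +
  coord_fixed() +
  theme_void() +
  theme(panel.background = element_rect(fill = "black"),
        legend.position = "none")

print(map)
[/code]

Note that each layer carries its own data frame, which is what lets unrelated sources (buildings, water, routes) share one set of axes.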

Raman Spectroscopic Grading of Gliomas

The background of this work is the classification of tumour tissues using their Raman spectra. A detailed discussion can be found in C. Beleites et al.  Gliomas are the most frequent brain tumours, and astrocytomas are their largest subgroup. These tumours are treated by surgery. However, the exact borders of the tumour are hardly visible, hence the need for new tools that help the surgeon find the tumour border. A grading scheme is given by the World Health Organization (WHO).

TwitteR Package

Curious about your influence on Twitter?  Want to see how your messages resonate within and outside your network?  Here is a great website that goes through many examples of using the TwitteR package in R, with the following ggplot2 code that creates the chart on the right:

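[code lang="R"]library(ggplot2)

# Bar chart of counts for each value of df$rt, dropping missing values
ggplot(df[!is.na(df$rt), ], aes(x = rt)) +
  geom_bar() +
  # opts() and theme_text() were removed from ggplot2;
  # theme() and element_text() are their replacements
  theme(axis.text.x = element_text(angle = -90, size = 6)) +
  xlab(NULL)
[/code]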

The ggplot2 interface is interesting because you’re using the + operator, thus manifesting the Grammar of Graphics concept of layers.
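One way to see what the + operator does is to build a plot in steps and inspect it along the way. The sketch below again uses the built-in mtcars data as a stand-in; the object names base, bars, and final are made up for illustration.

[code lang="R"]library(ggplot2)

# Data and mapping only: nothing is drawn yet
base <- ggplot(mtcars, aes(x = factor(cyl)))

# Adding a geom attaches one layer to the plot object
bars <- base + geom_bar()

# Themes and labels adjust the plot but do not add layers
final <- bars +
  theme(axis.text.x = element_text(angle = -90, size = 6)) +
  xlab(NULL)

# Number of layers attached at each step
length(base$layers)
length(bars$layers)
length(final$layers)

print(final)
[/code]

Because every intermediate result is itself a plot object, a shared base can be reused with different geoms or themes without repeating the data and mapping.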

This final example, Sentencing Data for Local Courts, breaks the data down by age and sex across different offence types.  As above, the R code is very simple and follows the layering paradigm:

[code lang="R"]library(ggplot2)

# Counts by age, filled by sex, with one panel per offence type
ggplot(iw, aes(x = AGE, fill = sex)) +
  geom_bar() +
  facet_wrap(~ Offence_type)
[/code]


Sean Gonzalez

Co-Founder, Secretary at Data Community DC
Sean Gonzalez is a data scientist, data visualizer, co-founder of Data Community DC, and organizer of Data Visualization DC. Sean consults for organizations whose data has become complex over time, or organizations who have recently come into large complicated unintegrated data sets. Sean has expertise in machine learning and AI primarily from his DoD background.
