Data Visualization: Sweave

Sweave No No’s

So you’re a data scientist (statistician, physicist, data miner, machine learning expert, AI guy, etc.) and you have the enviable challenge of communicating your ideas and your work to people who have not followed you down your rabbit hole.  Typically this involves first getting the data, writing your code, honing the analysis, distilling the pertinent information and graphs/charts, then organizing it all into a presentable format (document, presentation, etc.).  Interactive visualizations are really cool, and if done right they allow the user to explore the data and the implications of your analysis on their own time.  Unfortunately, interactive visualizations require extra effort: once you’re done with your analysis you have to repurpose your functions so they work within a framework such as Shiny.  For those of us who simply want a nice, presentable document to compile once we’ve finished our work, I introduce you to Sweave.

Sweave is not built specifically for RStudio; it is built for R, to create LaTeX documents, but naturally RStudio has built it right in and created a great interface.  This is both a positive and a negative: it’s so easy that you don’t need to know precisely how the whole mechanism results in a PDF file, which becomes an issue when your document doesn’t compile and you need to debug it.  Sweave is its own language of sorts, with blocks for evaluating R code, blocks for plain English, and a markup style of its own (LaTeX commands, which play a role similar to HTML tags) that gives the document its formatting instructions (title, body, size, figure dimensions, etc.).  In principle it’s easy to understand, but as with any new language it comes with its own syntax, its own unwritten rules, plenty of Google searches despite well-written tutorials, videos, and books, compatibility issues between different versions of R, and of course moments of throwing your hands in the air in confusion.
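To make the three kinds of content concrete, here is a minimal sketch of a Sweave source file. The file name (say, report.Rnw), the title, the chunk labels “summary” and “scatter”, and the use of R’s built-in “cars” dataset are all placeholders chosen for illustration, not anything prescribed by Sweave.

```latex
\documentclass{article}
\title{A Minimal Sweave Report}
\author{Your Name}

\begin{document}
\maketitle

Plain English goes here as ordinary LaTeX text. Inline results work too:
the mean stopping distance is \Sexpr{round(mean(cars$dist), 1)} feet.

<<summary, echo=TRUE>>=
# An R code chunk: Sweave runs it and inserts both the code and its output
summary(cars)
@

<<scatter, fig=TRUE, echo=FALSE, width=6, height=4>>=
# fig=TRUE tells Sweave to capture the plot and include it in the document;
# echo=FALSE hides the code so the reader sees only the figure
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
@

\end{document}
```

Everything between “<<...>>=” and “@” is R; everything else is LaTeX.  Outside RStudio, running Sweave("report.Rnw") in R should produce report.tex, which you then compile with pdflatex.  Knowing that there are two steps helps when debugging, since an error can come from either the R step or the LaTeX step.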

Mathematica

Once you get the hang of it you can fit your normal data analysis into the framework Sweave provides, and you end up telling a story with your work as you work.  Good comments have always been a staple of writing code, whatever the language, but there is always pushback because the comments are mixed in with the code, requiring the reader to understand the flow of the code and how the functions and scripts work together.  Mathematica is a great example of code and presentation working together, but unfortunately it is not free.

Sweave will, however, change your style and make you break up your analysis into digestible chunks for the target reader.  For example, when I am analyzing details of some dataset and/or debugging my functions, I will produce many more graphs than are necessary for the end reader.  Perhaps the makers of Sweave worked similarly, and purposefully required an R code block in Sweave to print only the first plot from that block, forcing you to choose your plots carefully.  You can get around this by collecting the ggplot2 figure objects from your code in a list variable and plotting them together using the “grid.arrange()” function from the “gridExtra” package, but this is not something you might normally do.  This is how Sweave draws you into its style (don’t forget to resize your figure: “<<fig=TRUE, echo=FALSE, height=10>>=” and “\setkeys{Gin}{width=0.9\textwidth}”, the kittens will be fine).  The bottom line is that if you can make Sweave part of your routine, you can produce beautiful reports from your R comments and code.  Maybe it will even help me better remember, years from now, what that set of functions buried in my computer does, but I can only speculate.
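As a rough sketch of that workaround, the chunk below stacks two ggplot2 figures into a single Sweave figure.  The chunk label “allplots”, the built-in “mtcars” dataset, and the two example plots are my own placeholders; it assumes the ggplot2 and gridExtra packages are installed and that the fragment sits somewhere after \begin{document}.

```r
\setkeys{Gin}{width=0.9\textwidth}
<<allplots, fig=TRUE, echo=FALSE, height=10>>=
library(ggplot2)
library(gridExtra)

# Build the plots first and keep them in a list instead of printing each one
plots <- list(
  ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(),
  ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()
)

# Draw every plot in the list on a single page, stacked in one column
do.call(grid.arrange, c(plots, list(ncol = 1)))
@
```

Because grid.arrange() draws all of the plots onto one page, the chunk still produces a single figure as far as Sweave is concerned.  The taller “height=10” gives the stacked plots room, and the \setkeys line scales the included figure to 90% of the text width.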


Sean Gonzalez

Co-Founder, Secretary at Data Community DC
Sean Gonzalez is a data scientist, data visualizer, co-founder of Data Community DC, and organizer of Data Visualization DC. Sean consults for organizations whose data has become complex over time, or organizations that have recently come into large, complicated, unintegrated data sets. Sean has expertise in machine learning and AI, primarily from his DoD background.

