Visualizing geographic data is a task many of us face in our jobs as data scientists. Often, we must visualize vast amounts of data (tens of thousands to millions of data points) and we need to do so in the browser in real time to ensure the widest-possible audience for our efforts and we often want to do this leveraging free and/or open software.
Luckily for us, Google offered a series of fascinating talks at this year’s (2013) IO that show one particular way of solving this problem. Even better, Google discusses all aspects of this problem: from cleaning the data at scale using legacy C++ code to providing low latency yet web-scale data storage and, finally, to rendering efficiently in the browser. Not surprisingly, Google’s approach highly leverages **alot** of Google’s technology stack but we won’t hold that against them.
The first talk walks through an overview of where the data comes from and the collection of Google cloud services that compose the system architecture responsible for cleaning, storing, and serving the data fast enough to do real time queries. This video is very useful for understanding how the different technology layers (browser, database, virtual instances, etc) can efficiently interact.
Description: Tens of thousands of ships report their position at least once every 5 minutes, 24 hours a day. Visualizing that quantity of data and serving it out to large numbers of people takes lots of power both in the browser and on the server. This session will explore the use of Maps, App Engine, Go, Compute Engine, BigQuery, Big Store, and WebGL to do massive data visualization.
Description: Much if not most of the world’s data has a geographic component. Data visualizations with a geographic component are some of the most popular on the web. This session will explore the principles of data visualization and how you can use HTML5 – particularly WebGL – to supplement Google Maps visualizations.
As a bit of background, Brendan leverages a number of technologies that you might not be familiar with, including three.js and WebGL. Three.js is a nice wrapper for WebGL (among other things) and can greatly simplify the process of getting up and running with 3D in the browser. From the excellent tutorial here:
I have used Three.js for some of my experiments, and it does a really great job of abstracting away the headaches of getting going with 3D in the browser. With it you can create cameras, objects, lights, materials and more, and you have a choice of renderer, which means you can decide if you want your scene to be drawn using HTML 5′s canvas, WebGL or SVG. And since it’s open source you could even get involved with the project. But right now I’ll focus on what I’ve learned by playing with it as an engine, and talk you through some of the basics.
WebGL is one mechanism for rendering three dimensional data in the browser and is based on OpenGL 2.0 ES. Wikipedia describes it as:
Latest posts by Sean Murphy (see all)
- Is Statistics the Least Important Part of Data Science? - November 26, 2013
- PyDataNYC 2013 – A Summary of a Fantastic Conference for Data Community - November 25, 2013
- “Ten Simple Rules for Reproducible Computational Research” – An Excellent Read for Data Scientists - November 5, 2013