Visualizing Web Scale Geographic Data in the Browser in Real Time: A Meta Tutorial

Visualizing geographic data is a task many of us face in our jobs as data scientists. Often, we must visualize vast amounts of data (tens of thousands to millions of data points), and we need to do so in the browser in real time to reach the widest possible audience. We also often want to do this using free and/or open source software.

Luckily for us, Google offered a series of fascinating talks at this year’s (2013) Google I/O that show one particular way of solving this problem. Even better, the talks cover all aspects of the problem: from cleaning the data at scale using legacy C++ code, to providing low-latency yet web-scale data storage, and finally to rendering efficiently in the browser. Not surprisingly, Google’s approach leans heavily on Google’s own technology stack, but we won’t hold that against them.

All the Ships in the World: Visualizing Data with Google Cloud and Maps (36 minutes)

The first talk gives an overview of where the data comes from and of the collection of Google cloud services that make up the system architecture responsible for cleaning, storing, and serving the data fast enough to support real-time queries. This video is very useful for understanding how the different technology layers (browser, database, virtual instances, etc.) can interact efficiently.

Description: Tens of thousands of ships report their position at least once every 5 minutes, 24 hours a day. Visualizing that quantity of data and serving it out to large numbers of people takes lots of power both in the browser and on the server. This session will explore the use of Maps, App Engine, Go, Compute Engine, BigQuery, Big Store, and WebGL to do massive data visualization.

Google Maps + HTML5 + Spatial Data Visualization: A Love Story (60 minutes)

The second talk discusses in code-level detail how to render vast geographic data (up to a few million data points) using JavaScript in the browser. One of the keys to enabling such large-scale data visualization is to hand much of the heavy rendering work to the computer’s graphics processing unit (GPU) through the use of relatively simple vertex and fragment shaders. Brendan Kenny, the speaker, explains how he uses CanvasLayer, available from his GitHub (https://github.com/brendankenny), to sync a WebGL canvas containing the data with a map built on version 3 of the Google Maps JavaScript API. Basically, he renders one layer for the map and one layer for the data, and these two layers must move and scale in a synchronized fashion. He even dives into excellent examples showing the workings of individual shaders running on the GPU.
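To make the idea of keeping the two layers in sync a little more concrete, here is a minimal sketch (not Brendan’s actual code) of the Web Mercator math that lets a data layer line up with a tiled web map. It assumes the Google Maps convention of a 256 x 256 "world coordinate" square at zoom level 0; the function names latLngToWorld, worldToPixel, and packPoints are hypothetical and chosen only for illustration. On the GPU, a vertex shader would typically apply the equivalent scale and translation to every point; it is written in plain JavaScript here so that it runs anywhere.

```javascript
// Sketch only: Web Mercator projection as used by tiled web maps.
// Assumes a 256 x 256 world-coordinate square at zoom 0 (Google Maps convention).
const TILE_SIZE = 256;

// Convert latitude/longitude in degrees to world coordinates in [0, 256].
function latLngToWorld(lat, lng) {
  // Clamp the sine so latitudes near the poles do not produce Infinity.
  const sinLat = Math.min(Math.max(Math.sin((lat * Math.PI) / 180), -0.9999), 0.9999);
  const x = TILE_SIZE * (0.5 + lng / 360);
  const y = TILE_SIZE * (0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI));
  return { x, y };
}

// Scale world coordinates to pixel coordinates at a given zoom level.
// Each zoom level doubles the size of the world in pixels.
function worldToPixel(world, zoom) {
  const scale = Math.pow(2, zoom);
  return { x: world.x * scale, y: world.y * scale };
}

// Pack many points into one flat Float32Array (x0, y0, x1, y1, ...),
// the kind of layout a WebGL vertex buffer expects.
function packPoints(points) {
  const packed = new Float32Array(points.length * 2);
  points.forEach((p, i) => {
    const w = latLngToWorld(p.lat, p.lng);
    packed[2 * i] = w.x;
    packed[2 * i + 1] = w.y;
  });
  return packed;
}
```

For example, latLngToWorld(0, 0) returns the center of the world square, { x: 128, y: 128 }. Projecting the points once on the CPU and uploading them as a single typed array is one common design choice, because panning and zooming then only require changing the transform applied on the GPU rather than reprocessing millions of points.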

Description: Much if not most of the world’s data has a geographic component. Data visualizations with a geographic component are some of the most popular on the web. This session will explore the principles of data visualization and how you can use HTML5 – particularly WebGL – to supplement Google Maps visualizations.

Background

As a bit of background, Brendan leverages a number of technologies that you might not be familiar with, including three.js and WebGL. Three.js is a nice wrapper for WebGL (among other things) and can greatly simplify the process of getting up and running with 3D in the browser.  From the excellent tutorial here:

I have used Three.js for some of my experiments, and it does a really great job of abstracting away the headaches of getting going with 3D in the browser. With it you can create cameras, objects, lights, materials and more, and you have a choice of renderer, which means you can decide if you want your scene to be drawn using HTML 5’s canvas, WebGL or SVG. And since it’s open source you could even get involved with the project. But right now I’ll focus on what I’ve learned by playing with it as an engine, and talk you through some of the basics.

WebGL is one mechanism for rendering three-dimensional data in the browser and is based on OpenGL ES 2.0. Wikipedia describes it as:

WebGL (Web Graphics Library) is a JavaScript API for rendering interactive 3D graphics and 2D graphics within any compatible web browser without the use of plug-ins. WebGL is integrated completely into all the web standards of the browser allowing GPU accelerated usage of physics and image processing and effects as part of the web page canvas. WebGL elements can be mixed with other HTML elements and composited with other parts of the page or page background. WebGL programs consist of control code written in JavaScript and shader code that is executed on a computer’s Graphics Processing Unit (GPU). WebGL is designed and maintained by the non-profit Khronos Group.
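As a rough illustration of that split between JavaScript control code and GPU shader code, the sketch below defines a minimal vertex and fragment shader pair as strings, plus a helper that compiles and links them. The shader contents, the attribute and uniform names (worldCoord, mapMatrix), and the helper names compileShader and buildProgram are assumptions made for illustration and are not code from the talk. The gl argument is expected to be a WebGL rendering context obtained from a canvas element.

```javascript
// Vertex shader: runs once per data point on the GPU.
// `mapMatrix` (hypothetical name) maps point coordinates to clip space.
const VERTEX_SHADER_SOURCE = `
  attribute vec4 worldCoord;
  uniform mat4 mapMatrix;
  void main() {
    gl_Position = mapMatrix * worldCoord;
    gl_PointSize = 3.0;
  }
`;

// Fragment shader: runs once per pixel covered by a point and colors it red.
const FRAGMENT_SHADER_SOURCE = `
  precision mediump float;
  void main() {
    gl_FragColor = vec4(0.9, 0.2, 0.2, 1.0);
  }
`;

// Control code: compile one shader and surface any compiler error.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error("Shader compile failed: " + gl.getShaderInfoLog(shader));
  }
  return shader;
}

// Control code: link both shaders into a program the GPU can run.
function buildProgram(gl) {
  const program = gl.createProgram();
  gl.attachShader(program, compileShader(gl, gl.VERTEX_SHADER, VERTEX_SHADER_SOURCE));
  gl.attachShader(program, compileShader(gl, gl.FRAGMENT_SHADER, FRAGMENT_SHADER_SOURCE));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error("Program link failed: " + gl.getProgramInfoLog(program));
  }
  return program;
}
```

In a browser, this could be called with the context returned by a canvas element’s getContext("webgl"). Everything inside the two template strings runs on the GPU, while the surrounding functions are ordinary JavaScript.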

Sean Murphy

Senior Scientist and Data Science Consultant at JHU
Sean Patrick Murphy, with degrees in math, electrical engineering, and biomedical engineering and an MBA from Oxford, has served as a senior scientist at Johns Hopkins University for over a decade, advises several startups, and provides learning analytics consulting for EverFi. Previously, he served as the Chief Data Scientist at a Series A funded health care analytics firm and as the Director of Research at a boutique graduate educational company. He has also cofounded a big data startup and Data Community DC, a 2,000-member organization of data professionals. Find him on LinkedIn and Twitter.