This page lists resources that Data Community DC and its board members use and genuinely recommend. The list will keep evolving as we find more products worth mentioning.
Please note that some of these links are affiliate links. All commissions earned go to Data Community DC to support our efforts to promote data science in the capital region.
Web Hosting and Design
Lightning Base – DataCommunityDC.org is currently hosted at Lightning Base, which offers fast, managed hosting for WordPress at an affordable rate. We chose Lightning Base because they offer excellent customer service and are one of the few managed WordPress hosts that offer the Varnish HTTP accelerator (which speeds the delivery of content by 300-1000x) out of the box. If you have a growing WordPress blog, we highly recommend Lightning Base as your web host.
Principles of Big Data by Jules J. Berman – In this book, local author Jules J. Berman covers basic themes of big data that are ignored or downplayed by other authors. Specifically, he explains why identifiers and metadata are crucial in building a good data resource, and why big data must be immutable and accessible to the public to be trusted and have scientific value. Read the announcement and in-depth description of this book in our guest post from the author.
Python for Data Analysis by Wes McKinney - Python is one of the most popular languages for data analysis thanks to its wide range of data science libraries. This is a practical introduction to the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. Specifically, it covers NumPy, matplotlib, pandas, SciPy, and IPython. For more on Python for data science, see our post on The Landscape of Tutorials.
Learning IPython for Interactive Computing and Data Visualization by Cyrille Rossant - Interactive programming is essential in exploratory tasks such as data analysis, and IPython is the perfect tool for that. Once you’ve learnt it, you won’t be able to live without it. This book is a practical tutorial to improve your productivity during interactive Python sessions, and shows you how to effectively use IPython for interactive computing and data analysis.
Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper - This book is the definitive guide to the Natural Language Toolkit (NLTK), which is arguably the most widely used Natural Language Processing (NLP) library out there. Not only does the book provide an extensive guide to NLTK, but it also has a concise and effective primer on Python, as well as a solid introduction to the field of NLP woven into the hands-on introduction to NLTK. For more on NLTK, check out our series of posts on NLTK and Hadoop.
Foundations of Statistical Natural Language Processing by Christopher D. Manning and Hinrich Schütze - Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods.
MongoDB in Action by Kyle Banker - MongoDB in Action is a comprehensive guide to MongoDB for application developers. The book begins by explaining what makes MongoDB unique and describing its ideal use cases. A series of tutorials designed for MongoDB mastery then leads into detailed examples for leveraging MongoDB in e-commerce, social networking, analytics, and other common applications.
Designing for Behavior Change by Stephen Wendel - Local author and Data Community DC supporter Stephen Wendel gives step-by-step guidance on how to design, build, and test products that help people change their daily behavior and routines. The goal is to help people take actions that they want to take, but have struggled with in the past: from exercising more, to taking control of their finances, to spending less on utilities. Read the announcement of the book in our guest post here.
Analyzing the Social Web by Jennifer Golbeck – When local author Jennifer Golbeck presented her work at our March 2013 Data Science meetup, her introductory slide said “I develop methods for discovering things about people online. I never want anyone to use those methods on me.” This book provides a framework for uncovering hidden information from analyzing data that is publicly available via social media. Read the review of her presentation here.
Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran - This book will take you from knowing nothing about data science to having a good understanding of how a lot of things work, faster than any other data book we know of. The book covers collaborative filtering, recommendation systems, machine learning, optimization, genetic algorithms, and genetic programming in a uniquely easy-to-read way, with code examples in Python.
Machine Learning for Hackers by Drew Conway and John Myles White - If you’re an experienced programmer interested in crunching data, this book will get you started with machine learning. It features a series of hands-on case studies instead of a traditional math-heavy presentation. Using R, you’ll learn how to write simple algorithms for problems in machine learning including classification, prediction, optimization, and recommendation.
The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t by Nate Silver – Nate Silver of the New York Times and fivethirtyeight.com is largely responsible for propelling data science into the popular spotlight. In this book, he shows how massive amounts of data can cloud our decisions, because there tends to be more noise than signal. According to Silver, when generating predictions from data we should take probabilities into consideration, be conscious of biases, and avoid predicting certainties, because we usually get them wrong.
The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century by David Salsburg - At a summer tea party in Cambridge, England, a guest states that tea poured into milk tastes different from milk poured into tea. Her notion is shouted down by the scientific minds of the group. But one man, Ronald Fisher, proposes to scientifically test the hypothesis. There is no better person to conduct such an experiment, for Fisher is a pioneer in the field of statistics.
Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran - Thirty-nine contributors explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video. Beautiful Data explores the opportunities and challenges involved in working with the vast number of datasets made available by the Web, visualizing trends in urban crime, and building the massive infrastructure required to create, capture, and process DNA data.
How to Measure Anything: Finding the Value of Intangibles in Business by Douglas W. Hubbard - Building up from simple concepts to illustrate the hands-on yet intuitively easy application of advanced statistical techniques, How to Measure Anything reveals the power of measurement in our understanding of business and the world at large. This book shows you how to measure those things in your business that until now you may have considered “immeasurable,” including technology ROI, organizational flexibility, customer satisfaction, and technology risk.
Leap Motion Gesture Motion Control - One of the first commercial gesture controllers available, Leap Motion brings you one step closer to Tony Stark’s laboratory. By tracking hand and finger movement, the controller’s software follows your every move, allowing you to control your computer with 3D gestures.