The following was first posted on O’Reilly’s Strata blog, to coincide with the release of the report by Harlan Harris (author of this post), Sean Murphy, and Marck Vaisman. See also our earlier post about the work that led to this report. Thoughts? Comments welcome!
Analyzing the Analyzers: An Introspective Survey of Data Scientists and their Work is the result of applying the methods of data science to our own professional community. My co-authors (Sean Murphy and Marck Vaisman) and I run professional Meetup groups for statistical and analytics professionals in the Washington, DC area. In the course of organizing Data Science DC, Data Business DC, and Statistical Programming DC, and serving on the board of Data Community DC, we meet a lot of people, many of whom either call themselves “data scientists” or aspire to do so. But these people have substantially different education, experiences, aptitudes, and attitudes. Why are they all using the same label?
We believe that this new job title or career path of “data science” came about because people were dissatisfied with existing ways of describing their roles and their work. But is everyone converging on the title “data scientist” a sign of progress, or is it just a source of confusion?
In the spring of 2012, we observed that this new, vaguely defined career, although tremendously exciting and fulfilling for all of us, was impaired by unclear communication, unrealistic expectations, and missed opportunities. Something had to be done. As data scientists, we thought that a natural way to bring more clarity to the issue would be to collect some data, so we developed a survey and recruited hundreds of participants. Our analysis focused on finding underlying explanatory structure in the results that would help us improve communication, expectations, and opportunities for and about data scientists.
Our primary result is that we were able to identify four major categories of data scientist, based on clustering the ways that our respondents viewed themselves and their careers. We created new titles for these categories and studied the common patterns among our respondents. Here are the categories and some highlights:
- Data Businesspeople are the product and profit-focused data scientists. They’re leaders, managers, and entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an MBA.
- Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may think of themselves as artists or hackers, and excel at visualization and open source technologies.
- Data Developers are focused on writing software to do analytic, statistical, and machine learning tasks, often in production environments. They tend to have computer science degrees and frequently work with so-called “big data”.
- Data Researchers apply their scientific training, and the tools and techniques they learned in academia, to organizational data. They may have PhDs, and their creative applications of mathematical tools yield valuable insights and products.
Furthermore, we were able to show how these categories correlate with varying levels of skill in five general areas. A figure in the report shows the relationships between the four categories and the five skill groups.
Want to read more about the survey, our interpretation of the results, and how T-shaped skills fit in? Want to learn how our results might apply to organizations looking for data scientists, and to individuals looking for their next steps in professional development? Download Analyzing the Analyzers: An Introspective Survey of Data Scientists and their Work for free.