Confire: A new Python library

Announcing the release of a new open source library: Confire is a simple but powerful configuration scheme that builds on the configuration parsers of Scapy, elasticsearch, Django and others. The basic scheme is a configuration search path that looks for YAML files in standard locations. The search path is hierarchical (meaning that system configurations are overridden by user configurations, etc.). The values from these YAML files are then layered on top of a default, class-based configuration scheme that allows for easy development.
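To make the layering idea concrete, here is a minimal sketch in plain Python. It is not Confire's actual API: the names `DefaultSettings`, `class_defaults`, `merge`, and `load_settings` are hypothetical, and the two dictionaries stand in for what a YAML parser would return for a system-level and a user-level file.

```python
from copy import deepcopy


class DefaultSettings:
    """Hypothetical class-based defaults (illustrative, not Confire's API)."""
    debug = False
    database = {"host": "localhost", "port": 5432}


def class_defaults(cls):
    """Collect the public class attributes as the lowest-priority layer."""
    return {
        key: deepcopy(value)
        for key, value in vars(cls).items()
        if not key.startswith("_")
    }


def merge(base, override):
    """Return a copy of base with override applied; nested dicts merge key by key."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def load_settings(defaults_cls, layers):
    """Apply layers in search-path order, so later layers win."""
    settings = class_defaults(defaults_cls)
    for layer in layers:
        settings = merge(settings, layer)
    return settings


# Stand-ins for parsed YAML files: system-wide first, then per-user.
system_layer = {"database": {"host": "db.internal"}}
user_layer = {"debug": True, "database": {"port": 6543}}

settings = load_settings(DefaultSettings, [system_layer, user_layer])
print(settings)
```

In this sketch the user layer overrides the system layer, which in turn overrides the class defaults, while keys that a layer does not mention keep their earlier value.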

Full documentation can be found here: http://confire.readthedocs.org/

Confire on PyPI

In a fit of procrastination, I put my first project on PyPI (the Python Package Index): Confire, a simple app configuration scheme using YAML and class-based defaults. It was an incredible lesson in how much work goes into letting Python developers simply pip install something! I wanted to go the whole nine yards and set up documentation on Read the Docs and an open source repository on GitHub, and even though it took a while, it was well worth the effort!

Continue reading

Posted in Announcements, Python | 1 Comment

Announcing Discussion Lists! First up: Deep Learning

Data Community DC is pleased to announce a new service to the area data community: topic-specific discussion lists! In this way we hope to extend the successes of our Meetups and workshops by providing a way for groups of local people with similar interests to maintain contact and have ongoing discussions. Our first discussion list will be on the topic of Deep Learning. What follows is a guest post from John Kaufhold. Dr. Kaufhold is a data scientist and managing partner of Deep Learning Analytics, a data science company based in Arlington, VA. He presented an introduction to Deep Learning at the March Data Science DC Meetup.

A while back, there was this blog post about Deep Learning. At the end, we asked readers about their interest in hands-on Deep Learning tutorials.

ELEVEN

The results are in, and the survey went to 11. And as in all data science, context matters–and this eleven is decidedly less inspiring than Nigel Tufnel’s eleven. That said, ten out of eleven respondents wanted a hands-on Deep Learning tutorial, and eight respondents said they would register for a tutorial even if it required hardware approval or enrollment in a hardware tutorial. But interest in practical hands-on Deep Learning workshops appears to be highly nonuniform. One respondent said they’d drive from hundreds of miles away for these workshops, but of the 3000+ data scientists in DC’s data and analytics community, presumably more local, only eleven total responded with interest.

In short, the survey was a bust. Continue reading

Posted in Announcements, Community, GuestPost, Resources

Natural Language Processing in Python and R

This is a guest post by Charlie Greenbacker and Tommy Jones.

Data comes in many forms. As a data scientist, you might be comfortable working with large amounts of structured data nicely organized in a database or other tabular format, but what do you do if a customer drops 10,000 unstructured text documents in your lap and asks you to analyze them?

Some estimates claim unstructured data accounts for more than 90 percent of the digital universe, much of it in the form of text. Digital publishing, social media, and other forms of electronic communication all contribute to the deluge of text data from which you might seek to derive insights and extract value. Fortunately, many tools and techniques have been developed to facilitate large-scale text analytics. Operating at the intersection of computer science, artificial intelligence, and computational linguistics, Natural Language Processing (NLP) focuses on algorithmically understanding human language.

Interested in getting started with Natural Language Processing but don’t know where to begin? On July 9th, a joint meetup co-hosted by Statistical Programming DC, Data Wranglers DC, and DC NLP will feature two introductory talks on the nuts & bolts of working with NLP in Python and R. Continue reading

Posted in Announcements, Data Wranglers DC, Events, GuestPost, Languages, Meetup, Python, R, Statistical Programming DC

Event Recap: DC Energy and Data Summit

This is a guest post by Majid al-Dosari, a master’s student in Computational Science at George Mason University.

I recently attended the first DC Energy and Data Summit organized by Potential Energy DC and co-hosted by the American Association for the Advancement of Science’s Fellowship Big Data Affinity Group. I was excited to be at a conference where two important issues of modern society meet: energy and (big) data!

There was a keynote and plenary panel. In addition, there were three breakout sessions where participants brainstormed improvements to building energy efficiency, the grid, and transportation. Many of the issues raised at the conference could be either big data or energy issues (separately). However, I’m only going to highlight points raised that deal with both energy and data.
Continue reading

Posted in Events, GuestPost, Reviews

Event Recap: DSDC June Meetup

This is a guest post by Alex Evanczuk, a software engineer at FiscalNote.

Hello DC2!  My name is Alex Evanczuk, and I recently joined a government data startup right here in the nation’s capital that goes by the name of FiscalNote. Our mission is to make government data easily accessible, transparent, and understandable for everyone. We are a passionate group of individuals and are actively looking for other like-minded people who want to see things change. If this is you, and particularly if you are a software developer (front-end, with experience in Ruby on Rails), please reach out to me at alex@fiscalnote.com and I can put you in touch with the right people.

The topics covered by the presenters at June’s Data Science DC Meetup were varied and interesting. Subjects included spatial forecasting in uncertain environments, cell phone surveys in Africa (GeoPoll), causal inference models for improving the lives and prospects of Children and Youth (Child Trends), and several others.

Continue reading

Posted in Data Science DC, GuestPost, Reviews

Event Recap: Tandem NSI Deal Day (Part 2)

This is the second part of a guest post by John Kaufhold. Dr. Kaufhold is a data scientist and managing partner of Deep Learning Analytics, a data science company based in Arlington, VA. He presented an introduction to Deep Learning at the March Data Science DC Meetup.

Tandem NSI is a public-private partnership between Arlington Economic Development and Amplifier Ventures. According to the TNSI website, the partnership is intended to foster a vibrant technology ecosystem that combines entrepreneurs, university researchers and students, national security program managers and the supporting business community. I attended the Tandem NSI Deal Day on May 7; this post is a summary of a few discussions relevant to DC2. Continue reading

Posted in Events, GuestPost, Reviews

Data Visualization DC Workshops

Data Visualization DC workshops have been a long time coming, but why did they take so long to organize? Organizing education in the DC area is a competitive enterprise, and the biggest challenge is finding good teachers. Public speaking can be frightening; public speaking while demonstrating expertise is even harder (as opposed to politicians who can dance around a subject without providing independently verifiable results). As a result, finding good teachers isn’t just about a willingness to run the gauntlet–the individual must also have a strong motivation to teach.

Existing approaches focus on teaching as a business. General Assembly and statistics.com share a significant portion of profit with their teachers, but money can be a catch-22. Once teaching becomes about money there is a cascading effect on expectations, starting with the teacher and ending with the students. For a three-hour workshop a teacher can spend 20 hours preparing material. Considering they make $40-$80/hr (or more!) in their day jobs, that’s $800-$1600 in teacher labor costs per class. If proceeds are split 50/50 with the organizers, the class needs to bring in at least $1600-$3200 in total, or about $107/student with 15 students at the low end and 30 at the high end, and that’s without considering anything else. Of course it’s expected that teachers teach more than once, but teaching even once a week is like adding an extra day of work to your schedule, something many people can’t do. In addition, students paying $100-$150/class have high expectations and are happy to be critical. So what’s the alternative? Continue reading
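The cost estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted in the post (20 preparation hours, $40-$80/hr, a 50/50 split, 15 to 30 students); the function names are illustrative assumptions.

```python
PREP_HOURS = 20          # hours of preparation per three-hour workshop
HOURLY_RATES = (40, 80)  # day-job rates in dollars per hour (low, high)
TEACHER_SHARE = 0.5      # teacher keeps half of the proceeds


def required_revenue(hourly_rate, prep_hours=PREP_HOURS, teacher_share=TEACHER_SHARE):
    """Revenue a class must bring in for the teacher's share to cover prep labor."""
    labor_cost = hourly_rate * prep_hours
    return labor_cost / teacher_share


def price_per_student(revenue, students):
    """Ticket price needed to reach the given revenue with this many students."""
    return revenue / students


for rate in HOURLY_RATES:
    revenue = required_revenue(rate)
    for students in (15, 30):
        price = price_per_student(revenue, students)
        print(f"${rate}/hr, {students} students: ${revenue:.0f} total, ${price:.0f} per student")
```

Under these assumptions the per-student price ranges from roughly $53 (low rate, 30 students) to roughly $213 (high rate, 15 students), with the $107 figure falling out of both the low-rate/15-student and high-rate/30-student cases.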

Posted in Announcements, Community, Data Visualization DC, Events, Management, Newsletter, Uncategorized

Event Recap: Tandem NSI Deal Day (Part 1)

This is a guest post by John Kaufhold. Dr. Kaufhold is a data scientist and managing partner of Deep Learning Analytics, a data science company based in Arlington, VA. He presented an introduction to Deep Learning at the March Data Science DC Meetup.

Tandem NSI is a public-private partnership between Arlington Economic Development and Amplifier Ventures. According to the TNSI website, the partnership is intended to foster a vibrant technology ecosystem that combines entrepreneurs, university researchers and students, national security program managers and the supporting business community. I attended the Tandem NSI Deal Day on May 7; this post is a summary of a few discussions relevant to DC2. Continue reading

Posted in Events, GuestPost, Reviews

DIDC Lean Data Product Development with the US Census Bureau – Debrief and Video

Thank you

I want to thank everyone for attending DIDC’s May Meetup event, Lean Data Product Development with the US Census Bureau. This was our first attempt at helping bring potential data product needs to our audience and, based on audience feedback, it will not be our last. That being said, we would love your thoughts on how we could improve future events like this one.

I want to add a massive thanks not only to our in-person and online panelists, but also to Logan Powell, who was a major force in organizing this event and who acted as emcee and guided the conversation.

Continue reading

Posted in Data Innovation DC, Events

Win Free eCopies of Social Media Mining with R

This is a sponsored post by Richard Heimann. Rich is Chief Data Scientist at L-3 NSS and recently published Social Media Mining with R (Packt Publishing, 2014) with co-author Nathan Danneman, also a Data Scientist at L-3 NSS Data Tactics. Nathan has been featured at recent Data Science DC and DC NLP meetups.

Nathan Danneman and Richard Heimann have teamed up with DC2 to organize a giveaway of their new book, Social Media Mining with R.

Over the next two weeks, five lucky winners will each win a digital copy of the book. Please keep reading to find out how you can be one of the winners and learn more about Social Media Mining with R.

Continue reading

Posted in Announcements, Community, GuestPost, R, Sponsored | 16 Comments