Marc Joffe is the founder of Public Sector Credit Solutions (PSCS), which applies open data and analytics to rating government bonds. Before starting PSCS, Marc was a Senior Director at Moody’s Analytics. You can contact him at email@example.com. Marc is also one of the winners of Sunlight Foundation’s OpenGov Grants.
Extracting useful information from PDFs is a problem as old as … PDFs. Too often, we focus on extracting information from a specific set of documents instead of looking at the bigger picture. If you’ve ever struggled with this problem, join us for Sunlight’s PDF Liberation Hackathon, dedicated to improving open source tools for PDF extraction.
Instead of focusing on one set of documents, coders will come together to add features, extensions and plugins to existing PDF extraction frameworks, making them more flexible, useful and sustainable. Sunlight’s PDF Liberation Hackathon will tackle real-world PDF data extraction problems by building on existing open-source solutions such as Tabula and Ashima’s PDF Table Extractor, which is built on Poppler. Hackers will also have the option of using licensed PDF software libraries, as long as the implementation cost of these libraries is less than $1,000. If there is a library you want to use, please mention it in your sign-up form and we will try to work out the licensing ahead of time so that things run smoothly.