Tuesday, August 14, 2012

Data Scientists – building open data capability


Following on from the previous post, an emerging community of “farmers” is sharing practices and constructing a series of helpful guides on how to approach data management. If you want to move beyond being a hunter-gatherer of data, you should look at the guides that are under construction and contribute your experience to the community. Here are three sources that you may find useful.


Semantic Community is a wiki dedicated to using and promoting Data Science - “It is not just where you put your data (cloud), but how you put it there.” A good entry point is the section on free data visualisation tools. I was fortunate that one of the drivers behind the Semantic Community, Brand Niemann, was willing to be online at 4am to deliver a presentation to a W3C eGov conference call, which helped me connect to this rich data source.


The Data Wrangling handbook is a crowdsourced “textbook” from the School of Data, supported by the good folk at the Open Knowledge Foundation. Last month the OKFN blog published Managing Expectations by Rufus Pollock, which described the long-term evolution of open knowledge; it promised to be the first of two posts, so watch out for the sequel.


The Guardian data blog has been doing some great work on visualising data about the Olympics over the last two weeks. Last year Tim O’Reilly wrote a short piece for Forbes on the “World’s 7 Most Powerful Data Scientists.” More interesting than the fact that the list actually contains ten names is the fact that they are all from the USA – just like “World Series” baseball. In the W3C discussion, Brand Niemann confirmed my view that the Guardian data blog is leading the application of data science to data journalism; maybe Simon Rogers should be at the top of the list.




