Tuesday, August 14, 2012

Data Scientists – building open data capability

Following on from the previous post, an emerging community of “farmers” is sharing practices and constructing a series of helpful guides on how to approach data management. If you want to move beyond being a hunter-gatherer of data, you should look at the guides that are under construction, and contribute your experience to the community. Here are three sources that you may find useful.

Semantic Community is a wiki dedicated to using and promoting Data Science – “It is not just where you put your data (cloud), but how you put it there.” A good entry point is the section on free data visualisation tools. I was fortunate that one of the drivers behind the Semantic Community, Brand Niemann, was willing to be online at 4am to deliver a presentation to a W3C eGov conference call, which helped me connect to this rich data source.

The Data Wrangling handbook is a crowdsourced “textbook” from the School of Data, supported by the good folk at the Open Knowledge Foundation. Last month the OKFN blog published Managing Expectations by Rufus Pollock, which described the long-term evolution of open knowledge; it promised to be the first of two posts, so watch out for the sequel.

The Guardian data blog has been doing some great work on visualising data about the Olympics over the last two weeks. Last year Tim O’Reilly wrote a short piece for Forbes on the topic of the “World’s 7 Most Powerful Data Scientists.” More interesting than the fact that the list actually contains ten names is the fact that they are all from the USA – just like “World Series” baseball. In the W3C discussion, Brand Niemann confirmed my view that the Guardian data blog is leading the application of data science to data journalism; maybe Simon Rogers should be at the top of the list.

Wednesday, August 1, 2012

The anthropology of Open Government Data – moving beyond Hunter-Gatherer

RAW DATA NOW was the rallying cry issued by Rufus Pollock from the Open Knowledge Foundation in November 2007. Sir Tim Berners-Lee picked up the call in his landmark TED talk from February 2009, and now, nearly five years on, Open Government and Open Data have become part of government operations for many countries around the world.

In this post, I propose that we are still at an early stage of Open Government Data, and use the stages of evolution of our species as a framework for thinking about the future of Open Data.

Hunter Gatherer

For over 100,000 years, Homo sapiens was a hunter-gatherer and generally nomadic, hunting and foraging for food and moving constantly in search of sustenance.

The open data community is essentially a hunter-gatherer world – finding food (data) and providing it to our families in the best way possible.  The tribes have ways of sharing information on where good food can be found (#opendata on Twitter is a good source), but in some terrains (governments) food is hard to find, and it takes skill, experience, and cunning to be an effective data hunter.

Fortunately, hunters are willing to share their findings and provide signposts to help others find easy-to-gather food, although a lot of it has tough skin (PDF) and is of low (calorific) value.

Maps are emerging, but are not authoritative. Interoperability amounts to sharing information on the design of the bow and arrow, through channels such as ScraperWiki and G_Refine.


Agriculture

Humans first began the systematic cultivation of plants and animals between 7,000 and 10,000 years ago, and the relative security of agriculture gave most humans the incentive to live as farmers in permanent settlements.

The idea of farming and harvesting data is emerging in a few areas; two notable examples are OpenCorporates and the World Government Data Store, where farmers have planted crops from many terrains in one place.

The emergence of data geo-coding and Spatial Data Infrastructure initiatives suggests that more facilities will be available to support agriculture.

Cities, states and empires

The next phase of anthropological evolution saw the establishment of governments, complex economic and social structures with increasing specialisation, sophisticated language and writing systems, and distinct cultures and religions.  The rise and fall of these cities, states and empires has happened across the world for the last 2,000 years.

Open Government Data has not yet moved into this stage, although some people are thinking about what it might bring. Government is the world’s largest information business – from global organisations such as the UN to national, regional and local governments. How will the supply chain change as the internet deconstructs the management and distribution of government information? Other industries have tried to preserve their old business models but have been unsuccessful – artificial scarcity is met by abundance.

What are the specialised roles that will emerge to support this more complex environment – data retailers, data wholesalers, distributors, quality control inspectors, curators and regulators – and what are the new operational models? Can we expect to see the emergence of the farmers’ market, specialist stores, department stores and hypermarket chains?


Industrial revolution

Just over 200 years ago, the industrial revolution replaced human and animal labour with machines, which led to major new modes of mass production and the related social and economic changes that are the foundation of modern society.

The promise of the semantic web, interoperability, and the 5 star scale of open data may be a pointer to the future. Many have commented that the semantic web is too complex for today’s operational needs; the open data ecosystem may need to evolve through different phases. The experience of early pioneers can help to ensure that all parts of the ecosystem develop to support high levels of automation – the journey will be much shorter than the evolution of the human race, but will still take decades to reach full potential.