The personal data ecosystem: Out of Trust, Out of Control
1. The pre-digital social contract was straightforward: I chose to disclose my secrets to others based on the level of trust in our relationship. I chose to reveal personal data when the value I received outweighed the risk of harm, judged by the integrity of the other party.
2. Trust has been reduced by ubiquitous sensors and monitoring (such as mobile phone location sensing, CCTV with face recognition, and embedded systems[i]), people choosing to be “always on”, and the power of big data analytics to bring together data from different sources. In a post-digital world we have no ability to assess the integrity of those making judgments on our personal data.
3. Trust can be restored by integrity, reputation, and transparency – qualities that are increasingly absent in the public world.[ii] The asymmetry of power and knowledge breaks personal trust, so we must think laterally about how to rebalance it.
4. The personal information ecosystem is so complex that it cannot be categorized, controlled or managed. There are endless articles about the volume of data created by and about people, but there is no such thing as “too much data”; there is what there is.[iii] There are small patterns in big data (a very low signal-to-noise ratio), so exponential increases in data create more opportunity for inferences. Control must be embedded in different parts of the ecosystem, and must cover user-generated data (name, registration details, email), observed data (location, search behavior, social connections), and inferred data (looking for a new home, medical condition).
5. Inferred data can be crass, spooky, revealing and sometimes just plain wrong. The data algorithms are programmed and the results interpreted without context, often without any sensitivity or respect for the individual. Correlation does not mean causation. But, once data has been interpreted, it becomes real, and is reinforced without the subject having any opportunity to correct errors.[iv] Inferred data is the toxic zone of the personal data ecosystem.
6. The boundary between public and private knowledge is rapidly and dramatically changing as the personal data ecosystem expands. This can be illustrated by using the Johari grid[v] to divide personal data into four categories.
7. Two changes relate to public information about the individual: (1) more personal information, previously known only to the individual and disclosed to trusted others, is becoming “public”; and (2) more personal information that was previously “blind” is also being shared, and thereby becoming “public”. Both changes stem from the transparent and pervasive nature of digital media (in particular social media). Individuals have some control over the extent to which they share and read personal information online, which makes these changes relatively benign.
8. The third change – to inferred data – is more challenging. Data from the “unknown” is moving into the “blind”, so that others know things about me that I do not know myself, based on inferences from personal data harvested from the personal data ecosystem. This is the honeypot, where a lot of money can be made in a domain with no rules.[vi]
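The three changes described above can be sketched as movements between the four quadrants of the grid. This is only an illustrative sketch: the names Quadrant, classify and SHIFTS are hypothetical, and the label "private" for the quadrant known only to the individual is an assumption, since the text names only the "public", "blind" and "unknown" quadrants.

```python
from enum import Enum


class Quadrant(Enum):
    PUBLIC = "public"    # known to me and known to others
    PRIVATE = "private"  # known to me, not known to others (assumed label)
    BLIND = "blind"      # not known to me, known to others
    UNKNOWN = "unknown"  # known to neither


def classify(known_to_self: bool, known_to_others: bool) -> Quadrant:
    """Place an item of personal data in one of the four quadrants."""
    if known_to_self and known_to_others:
        return Quadrant.PUBLIC
    if known_to_self:
        return Quadrant.PRIVATE
    if known_to_others:
        return Quadrant.BLIND
    return Quadrant.UNKNOWN


# Each change is a (from, to) pair of quadrants.
SHIFTS = {
    "change 1: disclosure via digital media": (Quadrant.PRIVATE, Quadrant.PUBLIC),
    "change 2: blind information shared": (Quadrant.BLIND, Quadrant.PUBLIC),
    "change 3: inference from harvested data": (Quadrant.UNKNOWN, Quadrant.BLIND),
}

for label, (before, after) in SHIFTS.items():
    print(f"{label}: {before.value} -> {after.value}")
```

Only the third shift ends in a quadrant the individual cannot see, which is why it is the hardest of the three to control.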
9. Personal data is the new currency of the internet. It costs half a cent to collect and is worth around $1,200. People are clustered and their data is traded. Why is this bad? What is wrong with producing a list of left-handed dentists who have visited Disneyland in the last 5 years? What about selling a list of rape victims at $0.05 per name? What about selling data on suspected alcoholics, HIV sufferers, or people inquiring about abortion?[vii]
10. Privacy as a social norm has been replaced by privacy as a political norm, malleable by media and controllable by the powerful. Polls in many countries report that the majority agree with the statement “it is worth losing some personal privacy in order to keep us safe from terrorist attacks”, but this view is not shared by minorities. The level of support is less about surveillance and more about trust in the organisation that is doing the surveilling.
11. Many commentators are suggesting mechanisms (such as laws & regulations, or user education) to build a new post-digital social contract based on shared values on how personal data should be used. This approach is dangerous because intelligent and well-intentioned people who understand the critical issues are diverted from building a robust post-digital social contract, and meanwhile power asymmetry increases.
12. There are three major reasons that this approach will not work for the global personal data ecosystem: (1) there is no effective jurisdiction to create a regulatory and compliance regime; (2) there is no agreement on shared values across different cultures;[viii] (3) the approach will not control “bad actors”.[ix]
13. In other domains, money is often used to crystallize the balance between competing claims, but personal data is different – ethically and economically. Ethically, there are generally agreed no-go areas for trading personal assets – kidneys, blood, babies – but for personal data there are no bright lines, only fuzzy edges. Economically, a personal data asset is a non-rival good – it can be shared without losing value. A new calculus is needed.
14. The strategic risk is homogeneity, because homogeneity reduces resilience. The personal data ecosystem needs diversity to be innovative and sustainable. When individuals are clustered by profile, the profiled become the profile: predictable and therefore exploitable. Diversity has intrinsic value because it creates and maintains the variety of personal data. When patterns tell me who I am, I become what they tell me. The personal data ecosystem is a public good, and its future must not be viewed through a lens of property rights.
[ii] Adam Curtis summed it up perfectly: “Nobody trusts anyone in authority today. It is one of the main features of our age. Wherever you look there are lying politicians, crooked bankers, corrupt police officers, cheating journalists and double-dealing media barons, sinister children's entertainers, rotten and greedy energy companies and out-of-control security services.” Suspicious Minds.
[v] The Johari Window was developed in the 1950s as a framework for understanding interpersonal relationships; its use in the personal data ecosystem was suggested to me by Kaliya Hamlin.
[vi] Examples of inferences include targeted marketing of products (“people who bought this, also bought this”), genome sequencing to identify predisposition to health events, and law enforcement (“has a family history of criminal behavior”). John Podesta, who led the Big Data review for the White House, commented that “One significant finding of our review was the potential for big data analytics to lead to discriminatory outcomes and to circumvent longstanding civil rights protections in housing, employment, credit, and the consumer marketplace.”
[vii] Sue Halpern has estimated that the personal data industry is worth $120 billion and talks about data sales in this video.
[viii] While there could be general agreement with the statement ‘personal data collection is necessary to catch the terrorists’, this statement presents two problems: (1) how to decide who is a terrorist, recognizing that yesterday’s terrorists can become tomorrow’s governments; and (2) how to decide someone is a terrorist without gathering their personal data.
[ix] Bad actors (in both the public and private sectors) decide to ignore regulations to achieve other goals that they consider more important. If you trust an organisation with your data, you believe it will act in good faith (the good actor), and the personal data ecosystem can make this more efficient and effective. If you do not trust an organisation, no personal data ecosystem will protect you; for example, you can set a policy requiring a company to confirm that it has deleted your data, but you cannot verify that the data has actually been deleted.