Data Is Not The New Oil
We all know that. But what is it?
Dateline: Woking, 12th February 2022.
I was sitting on a webinar the other day, having a cup of tea and listening to a discussion about the future of retail banking. That’s about as exciting as it gets for me. But then someone on the panel said “data is the new oil” and sat back as if they had delivered a revelation. That was great, because it gave me the opportunity to say something at last. That’s because not only is it not a revelation (since management consultants say it all the time), but it’s not true.
I’ve said this before. In an article for Forbes, talking about the options for regulating technology companies in the new economy, I said explicitly that data is not the new oil (I actually said data is not the new West Texas Intermediate and Facebook is not the new Standard Oil), thus driving my stake into the ground. So I thought perhaps I should back up my position by asking: if data is not the new oil, then what is it?
The “new oil” aphorism is commonly attributed to Clive Humby, who rather famously helped to create the then-revolutionary Tesco Clubcard loyalty scheme many years ago. In Humby’s view from 2006, data resembled oil because “it’s valuable, but if unrefined it cannot really be used”. He thought that just as oil has to be refined into gas, plastic, chemicals and so on to create the actual resources needed to power the economy, so must data be broken down and refined (ie, analysed) to create value.
All because of refining.
It’s a useful, albeit limited, analogy but as James Bridle (author of the brilliant “The New Dark Age”) points out, the emphasis on the work that is required to make information useful has been lost over the years, aided by processing power and machine intelligence, to be replaced by pure speculation, the hoarding of data for potential future use. As he puts it, “in the process of simplification, the analogy's historical ramifications - as well as its present dangers and its long-term repercussions - have been forgotten”.
The data-is-oil meme continued to spread for another decade. In 2014 Wired magazine restated it with further historical context by saying that data is like oil back in the 18th century, a recently discovered and untapped asset of great value. Those who learn to “extract and use it” will strike it rich.
A few years later, in 2018, they returned to this theme and called the data-as-oil meme “one of those deceptively simple mantras for the modern world”. They attributed its spread to the power of the metaphors around the wildcatting nature of oil exploration and the extractive exploitation of a trapped asset, but that doesn’t work for me. Facebook and Google aren’t out on a frontier drilling for data, they are forming environments where data is created, managed and ultimately farmed to their exclusive benefit. They are more like feudal barons, allowing the serfs to fertilise the fields and then harvesting the fruit for themselves.
(Maybe data is the new water rather than new oil. It’s all around us, raining down and pooling and forming rivers and lakes and seas. Unfortunately, it often gets polluted and then has to be expensively purified before it can be used. No, that doesn’t seem quite right either. We don’t recycle data to use it again, because the original data is still there even after we’ve made a purified version.)
So if data isn’t the new oil, the new water or the new black, then what exactly is it? Maybe it isn’t like anything else at all, and that’s why it is so difficult to think about how to regulate it and the businesses that depend on it. The crucial point about data that makes it different from oil, water, lithium or time is that it simply does not get used up. Far from it: the more data you process the more data you produce and the volumes of data being produced are exploding towards infinity on a seemingly unstoppable journey.
At around the same time as that Wired piece talking about extraction, Bridle put forward a much better metaphor, saying that data “more closely resembles atomic power than oil - an effectively unlimited resource that still contains immense destructive power and that's even more explicitly connected to histories of violence”. From this perspective data isn’t the new oil, it’s the new plutonium and when you put it in an AI-powered fast breeder reactor, the fun starts.
I do not mean this in a trivial or superficial sense. A recent paper in Nature examines the value of applying new data analysis to past publications, noting the key finding that “latent knowledge regarding future discoveries is to a large extent embedded in past publications”. The authors illustrate their point by using AI to predict the development of a key thermoelectric material from papers that were published years before the actual discovery!
“Data is the new plutonium” works much better as a narrative. If data is the new nuclear fuel, then personal data is the new nuclear waste that no-one wants to handle because of the cost of managing it, which certainly works as a metaphor for Personally Identifiable Information (PII) in the post-GDPR, post-COVID, all-online business of the future.
The point about the cost of personal data was well made by Bernard Marr when writing about Shell (an actual oil company) becoming a data-driven organisation. As he put it, data is an asset but also a potential liability, which is why I am so curious about the next generation of business models built around credentials rather than attributes, proofs instead of facts, information about data rather than data itself. That is a practical narrative for the future and should be central to corporate strategies.
Here is a non-financial use case that I often use to explain why this is the way forward. Online dating apps and websites provide a rich and practical environment for exploring different notions of identity, so they are useful for developing this particular narrative.
I go to a dating site and create an account. As part of this process the dating site asks me to log in via my bank account. At this point it bounces me to my bank, where I carry out the appropriate two-factor authentication to establish my identity to the bank's satisfaction. The bank then returns an appropriate cryptographic token to the dating service, which tells them that I am over 18, resident in the UK and have funds available for them to bill against. These are provided as cryptographic proofs, not as raw data (eg, my date of birth).
One thing that is not provided is my “real” identity, which is safely locked up back in the bank vault. My real identity has been linked by the bank (but no-one else) to a virtual identity used in online interactions. So my Internet dating persona contains no PII, but if I use that persona to get up to no good then the dating site can provide the persona to the police, the police can see that the token comes from my bank and my bank will tell them that it belongs to Dave Birch.
This seems to me a very appropriate distribution of responsibilities. When the Internet dating site gets hacked, as such sites inevitably are, all the criminals will obtain is a meaningless token: they have no idea who it belongs to and the bank won’t tell them. No nuclear waste, no half-life, no exclusion zones for future generations.
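For the technically minded, here is a rough sketch of that flow in Python. It is an illustration under loud assumptions, not a description of any real bank's system: the Bank class, its method names and the customer "Alice Example" are all made up, and the token is protected with an HMAC that only the bank can check, standing in for the public-key signatures or zero-knowledge proofs a real scheme would more likely use. What it does show is the division of labour: the dating site holds yes/no claims and a random persona, while the name and date of birth never leave the bank.

```python
import hashlib
import hmac
import json
import secrets
from datetime import date


class Bank:
    """Hypothetical bank: knows the customer, hands out claims, keeps the facts."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # secret key that never leaves the bank
        self._personas = {}                  # persona id -> real name (the "vault")

    def _tag(self, payload):
        return hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()

    def issue_token(self, name, born, country, balance, today):
        # Turn raw data into yes/no claims; the raw data is not put in the token.
        had_birthday = (today.month, today.day) >= (born.month, born.day)
        age = today.year - born.year - (0 if had_birthday else 1)
        persona = secrets.token_hex(8)
        self._personas[persona] = name
        claims = {
            "persona": persona,
            "over_18": age >= 18,
            "uk_resident": country == "UK",
            "funds_available": balance > 0,
        }
        payload = json.dumps(claims, sort_keys=True)
        return payload + "." + self._tag(payload)

    def verify(self, token):
        # Return the claims if the token is genuine and unaltered, else None.
        payload, _, tag = token.rpartition(".")
        if not hmac.compare_digest(tag, self._tag(payload)):
            return None
        return json.loads(payload)

    def reveal(self, token, has_warrant):
        # Only the bank can map a persona back to a person, and only with a warrant.
        claims = self.verify(token)
        if claims is None or not has_warrant:
            return None
        return self._personas[claims["persona"]]


bank = Bank()
token = bank.issue_token("Alice Example", date(1990, 5, 1), "UK", 250,
                         today=date(2022, 2, 12))

# The dating site stores only the token and asks the bank whether it is genuine.
claims = bank.verify(token)
```

A thief who steals the dating site's database gets the token and nothing else: calling `bank.reveal(token, has_warrant=False)` returns `None`, and editing the claims invalidates the tag.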