There IS such a thing as data, Benedict Evans

Once again I was drawn to Benedict Evans’ emphatic statement that there is no such thing as data (There’s no such thing as data — Benedict Evans (ben-evans.com)). In this essay, Benedict challenges the present infatuation with data, claiming that, in practice, data’s value is negligible, even bordering on irrelevant.

He first succeeded at baiting my click with an episode of the same title on Another Podcast, his show with Toni Cowan-Brown (11 January 2021). Back then, I surmised that their argument was that data gets complicated by ownership and differing source systems, so it isn’t worth worrying about too much. In this more recent essay, though, the crux of the argument is perhaps simpler.

I was actually hoping that this topic would be in a similar vein to Professor Tom Wilson’s 2002 academic paper, The nonsense of ‘knowledge management’, which was very formative in my early days of data and information management. In that paper, the professor argued quite convincingly that the term knowledge management (KM) was little more than a blurring of the lines with information management, and that this blurring happened because the information field was no longer sexy enough for management consultants and platform roadmaps.

Mr Evans, though, comes from very different stock: largely telco market analysis, tech venture capital and industry trends. So it’s unsurprising that he (quite sensibly) may never have thought of the discipline of data and information management as sexy.

I didn’t pick up on this point at the time. Maybe, after half a dozen years of appreciating Benedict Evans’ content on Twitter, and a couple of years as a subscriber to his newsletter, I assumed that I would always agree with everything he says. Until now, this has held true for topics I don’t remotely understand. And perhaps this is why I immediately bit at this apparent reheating of the position that data (a topic I’ve been involved with for two decades) is all nonsense.

https://twitter.com/benedictevans/status/1532632592658440194

Benedict was clearer, and correct, in his reply. I was hung up on how things were phrased, rather than on the accuracy of the claim. Opening an essay with such a dismissive premise was actually a wonderful way to catch the attention of a student and practitioner of data, information and architecture. The master stroke, however, was to go on to say that it isn’t worth anything. I naturally took this to mean the personal, intrinsic and ongoing value that our data retains. On reflection, though, it most probably refers to the kind of returns that a venture capitalist would expect to see in a portfolio.

The essay develops this point further, claiming that our Instagram posts mean very little. A quick learner, I tried re-reading this as “very little commercially”. But people aren’t interested in the commercialisation of their data. Quite the opposite. (Although we’ll all have a problem with non-viable platforms if no one is profiting.) Benedict views Instagram likes as “not [being] ‘my’ data or ‘your’ data alone, and it’s not worth much without the context of all the other likes and follows.” This doesn’t sound like a case of data not existing, or of data being nonsense. It sounds like much more data exists than we originally conceived, and that its ownership and management are complicated.

The same goes for likes on other social media platforms. Bringing TikTok and PageRank into the same discussion, he argues that “the value isn’t in the ‘data’ at all but in the flow of activity around it”. Yet the essay somehow omits that this flow of activity is, of course, captured in data. It then goes a step further, suggesting that these data streams of human interaction need not be restricted to the world of the living. He challenges us to see these phenomena as mechanical Turks. I read this as: data represents human activity, and therefore, like other human processes, that activity can be automated without humans, and at scale. I worry about what kind of future that will bring. They are systems, as the essay correctly highlights, but they are human systems. By default, they will always compose and present human data.

But back to the definitions used. I’m not sure the essay starts from a valid foundation when it opens with “‘data’ is not one thing, but innumerable different collections of information.” Data is generally about one thing, and collections of it become information once context is added. It’s through context that we can understand, not the other way around. There is little to no value in the isolated values of a spreadsheet column, but if we know that the rows represent a highly sensitive context, the resulting information asset has a clearer value and can certainly be leveraged to produce greater insights.

The counterexample the essay uses here is combining wind turbine telemetry with specific public transport events. That the two won’t usefully correlate is pretty obvious. That’s not the fault of data, but of juxtaposing two completely different contexts. Data relates to things (or events and entities), so the data of very different “things” will rarely have a useful relationship. What is notable, and perhaps undersold here (while oversold in plenty of industries), is the potential of advances in AI to infer and identify causal relationships between disparate data. Such links may be inconceivable to human analysis, and beyond its capacity.

Not stopping there, Evans illustrates the “uselessness of common assertions” with an interesting example: routing insights gained from delivering large volumes of restaurant orders may not help missile guidance systems. I hope not! (Although I think we’ve been through the idea of borrowing military hardware to deliver food.) My view, again, is that data is merely an atomic representation of a thing. Building a single, understandable (let alone actionable) pool of everything we know about everything is neither a useful nor an achievable goal. For the purposes of analysis, many (but not all) related data sets can be brought together for wider insights. At the enterprise level, data lakes aim to be that comprehensive repository of respective insight. I say respective because it will still be bound to the context of how, and for what purpose, it was collected, and thereby how it may and may not be used. Even climate change won’t boil the ocean quickly enough for arbitrary links to be made between everything and everything. And although Benedict highlights the difficulty and nonsense of such an exercise, I’m not sure anyone is explicitly claiming that they can, or will, do it.

The essay ends by comparing current concerns about AI and data with those of a previous generation during the early adoption of databases. It argues that the risks didn’t live up to the fears of that time, so we shouldn’t worry now about national or strategic data. Maybe Benedict’s position is indeed accurate, but the question will remain of who is extracting the most value from key data sets. Data exists everywhere, and vast arrays of data, combined with advanced analytics at scale, can tell us things we didn’t know before.

Any new insights that are generated can be exploited before regulators catch up. Surely this should all be handled with care, which is best done by appreciating its true value. So I like to think that, even at a non-macro level, data is rather more than nonsense, and certainly not non-existent.

To conclude, I really like the macro-level Tim O’Reilly quote referenced in the essay, that ‘data isn’t oil – it’s sand’. But I also like a competing value proposition from children’s author and broadcaster Michael Rosen, in the form of a poem called Words Are Ours.