“Data is the new oil.” You may have heard this analogy before. It was coined by Clive Humby in 2006, who said: “Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.”

This analogy has been widely used, and it is great because it gives materiality to a very vague concept: data. However, since this now famous analogy was made, no one has really elaborated on it. This is what I intend to do here: to draw a realistic landscape of data.

While this concept was first stated in 2006, it’s still very relevant today, especially with the explosion of the internet, mobile, IoT, and the multitude of tools that let people generate content and carry out transactions of all types.

For the last few weeks, I have tried to understand who the main actors working in data are, what they do and how they do it. A few things became apparent very quickly:

1) Many companies don’t fall neatly into a specific category.
2) Many companies are in stealth mode, and we only have a few sentences to describe them.

Data is an essential resource that powers the information economy in much the way that oil has fueled the industrial economy. Data is the fuel that powers decision-making. The more data you have, the greater your potential to make good, informed decisions. In this way, data is power, and thus very valuable.
Data flows like oil. One must “drill down” into data to extract value from it.
As far as analogies go, I think this one is well suited to help us frame our thinking around how to gain value from data. Where oil is composed of the compressed bodies of long-dead micro-organisms, personal data is made from the compressed fragments of our personal lives. It is a dense condensate of our human experience. Like oil, data has many forms. It can be words, videos, images, numbers, sounds.

The main agents interested in data are companies, countries and individuals. In my mental framework, all of these agents have territories (the amount of data determines the relative size of the territory). All of these territories have exploited or unexploited oil (data) inside them. In this way, gargantuan companies such as Apple, Facebook, Google or Amazon would be empires: they own a lot of land with a lot of data inside it. Industries (banking, retail, agriculture) are continents, and companies are countries inside them.

Then you have all types of agents that work with the oil (data) from your territory. Some of them:
- extract oil
- store oil
- refine oil (from oil to fuel, from data to insights)
- Data Aggregators. Their job is to find data all over the world, using different techniques (Android apps, scraping, open datasets), and to give context to that data. Examples: Premise (aggregating for themselves), Diffbot (aggregating for others), DueDil, Enigma, Skymind.
- Data Extractors. Extractors already have organized data and apply different models to literally extract the meaning out of it. This is by far the most crowded category, operating in almost every vertical. Some Extractors specialize in specific types of data (computer vision, audio, NLP), while others specialize in specific extraction techniques (deep learning).
- Data Refiners.
- Data Displayers. These companies already have the insights from the Data Extractors; their job is to present those insights in a relevant way to analysts or managers. They usually describe themselves as a “dashboard for your data”. A trendy new interface uses chatbots instead of a dashboard. Examples: Facebook M.
- Data Forecasters. They are like Data Displayers, but one step further. While Displayers focus on past and present (real-time) insights, Forecasters focus on the future. This is the closest to the popular idea of an AI. Displayers will shift to become Forecasters over the years.
- Frontier Companies. They work on the core technology used by Extractors, Aggregators, etc. Fun characteristic: these companies are often made up of PhDs or PhD dropouts and are acquired for ridiculously high amounts. Examples: DeepMind.
- Data Platformers. They are the guys who are there throughout the entire process: they aggregate, extract, display and predict.
For each step of the transformation from oil to fuel (data to insights), there are companies working on it. Some companies are in charge of the whole process (like Google). The amazing thing that’s unique to the refining process of data is that the systems get smarter and smarter with quantity and time. Data can thus be refined more precisely over time (for example, neural nets trained on ImageNet that boost recognition rates).

Refining can be done in many different ways: with deep learning, other machine learning techniques, neural nets, human labour. Each of these refining processes works best with different types of data. Some companies specialize in refining a specific type of data (Clarifai, using computer vision for images and videos).

This, then, is how I would categorize companies operating in machine intelligence.

The most successful company in human history will be the one with all the data (oil) of the world, in charge of the entire process from data to insights. With the reinforcement loop (the refining process getting smarter and smarter), this company would be able to measure and predict almost everything. The most successful company in the world would be Big Brother. Turns out we need to watch out for Big Brother; monopoly ain’t good, they say. To prevent the creation of such a Big Brother-like company, one thing we can do is democratize public datasets.

Data fusion is to retailing as big data is to wholesaling. 90% of big data technology has been designed by Google.

People need to understand and experience data ownership. While everyone in our society is producing vast quantities of data, individuals rarely see or interact with any of it. When people are given tools to store, visualize, and explore their own data, they gain an understanding of the worth and utility of this information.
Applied on a broad scale, this improved understanding of data could lead to better decisions by individuals, both in cases where their data is being misused and in cases where data can be applied to solve important problems like disaster response, cancer diagnosis or disease spread.

Companies must begin treating data as an enterprise-wide corporate asset. Data fusion at large scale. Scanning > extract (when data is unstructured) > refine (when data is structured) > store > funnel > predict > display.

To conclude with the internet: it's the underground web that links all oil sources. That's where Datract is heading.
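
To make the scanning > extract > refine > store > funnel > predict > display chain a bit more concrete, here is a minimal Python sketch of it. Everything in it is a hypothetical stand-in made up for illustration: the `Record` class, the function names, using a word count as the "insight", and using a plain average as the "prediction". It is not how any of the companies mentioned above actually work; it only shows the order of the steps and the two conditions (extract only when data is unstructured, refine only when it is structured).

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Record:
    payload: object                             # raw text, or an already structured dict
    steps: list = field(default_factory=list)   # names of the stages applied so far


def scan(raw):
    """Scanning: take in a raw item and wrap it."""
    return Record(payload=raw, steps=["scan"])


def extract(rec):
    """Extract: only runs when the data is unstructured (assumed here: anything that is not a dict)."""
    if not isinstance(rec.payload, dict):
        rec.payload = {"words": len(str(rec.payload).split())}
        rec.steps.append("extract")
    return rec


def refine(rec):
    """Refine: only runs when the data is structured (assumed here: a dict); drops empty fields."""
    if isinstance(rec.payload, dict):
        rec.payload = {k: v for k, v in rec.payload.items() if v is not None}
        rec.steps.append("refine")
    return rec


def run_pipeline(raw_items):
    # store: keep every record once it has been scanned, extracted and refined
    store = []
    for raw in raw_items:
        rec = refine(extract(scan(raw)))
        rec.steps.append("store")
        store.append(rec)

    # funnel: narrow down to the records that carry the field we care about
    funnel = [rec for rec in store if "words" in rec.payload]
    for rec in funnel:
        rec.steps.append("funnel")

    # predict: a deliberately naive forecast, the average of the values seen so far
    forecast = mean([rec.payload["words"] for rec in funnel]) if funnel else None

    # display: turn the result into something a human can read
    report = f"{len(funnel)} records, forecast words: {forecast}"
    return store, forecast, report


if __name__ == "__main__":
    items = ["data is the new oil", {"words": 3, "source": None}]
    store, forecast, report = run_pipeline(items)
    print(report)
    for rec in store:
        print(rec.steps, rec.payload)
```

The free-text item goes through every stage, while the item that is already a dict skips extraction and goes straight to refining, which is the distinction the chain above makes between unstructured and structured data.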