Blog

Data = Oil analogy

“Data is the new oil.” You may have heard this analogy before. It was coined by Clive Humby in 2006, who said: “Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.”

This analogy has been widely used, and it is a great one because it gives materiality to a very vague concept: data. However, since this now-famous analogy was made, no one has really elaborated on it. This is what I intend to do here: draw a realistic landscape of data.

While the concept was first stated in 2006, it is still very relevant today, especially with the explosion of the internet, mobile, IoT, and the multitude of tools that let people generate content and enable transactions of all types.

For the last few weeks, I have tried to understand who the main actors working in data are, what they do and how they do it. A few things became apparent very quickly: 1) many companies don’t fall neatly into a specific category; 2) many companies are in stealth mode, and we only have a few sentences to describe them.

Data is an essential resource that powers the information economy in much the way that oil has fueled the industrial economy. Data is the fuel that powers decision-making. The more data you have, the greater your potential to make good, informed decisions. In this way, data is power, and that makes it very valuable.

Data flows like oil. One must “drill down” into data to extract value from it. 

As far as analogies go, I think this one is well suited to help us frame our thinking about how to gain value from data. Where oil is composed of the compressed bodies of long-dead micro-organisms, personal data is made from the compressed fragments of our personal lives. It is a dense condensate of our human experience. Like oil, data has many forms: words, videos, images, numbers, sounds.

The main agents interested in data are companies, countries and individuals. In my mental framework, all of these agents have territories (the amount of data determines the relative size of the territory), and every territory has exploited or unexploited oil (data) inside it. In this way, gargantuan companies such as Apple, Facebook, Google or Amazon are empires: they own a lot of land with a lot of data inside it. Industries (banking, retail, agriculture) are continents, and companies are countries inside them.

Then you have all types of agents that work with the oil (data) from your territory. Some of them:

  • extract oil
  • store oil
  • refine oil (from oil to fuel, from data to insights)
  • Data Aggregators. Their job is to find data all over the world, using different techniques (Android apps, scraping, open datasets), and to give context to that data. Examples: Premise (aggregating for themselves), Diffbot (aggregating for others), Duedil, Enigma, Skymind.
  • Data Extractors. Extractors already have organized data and just apply different models to literally extract the meaning out of it. This is by far the most crowded category, operating in almost every vertical. Some Extractors specialize in specific types of data (computer vision, audio, NLP), while others specialize in specific data extraction techniques (deep learning).
  • Data Refiners.
  • Data Displayers. These companies already have the insights from the Data Extractors; their job is to present those insights in a relevant way to analysts or managers. They usually describe themselves as a “dashboard for your data”. A new, trendy interface uses chatbots instead of a dashboard. Examples: Facebook M.
  • Data Forecasters. They are like Data Displayers, but one step further. While Displayers focus on past and present (real-time) insights, Forecasters focus on the future. This is the closest thing to the common idea of an AI. Over the years, Displayers will shift to become Forecasters.
  • Frontier Companies. They work on the core technology used by Extractors, Aggregators, etc. Fun characteristic: these companies are often made up of PhDs or PhD dropouts and are acquired for ridiculously high amounts. Examples: DeepMind.
  • Data Platformers. They are there throughout the entire process: they aggregate, extract, display and predict.

For each step of the transformation from oil to fuel (data to insights), there are companies working on it. Some companies are in charge of the whole process (like Google). What is amazing and unique about the refining process of data is that the systems get smarter and smarter with quantity and time. Data can thus be refined more precisely over time (for example, neural nets trained on ImageNet improve their recognition rate). This refining can be done in many different ways: deep learning, other machine learning techniques, neural nets, human labour. Each of these refining processes works best with different types of data, and some companies specialize in refining a specific type of data (Clarifai uses computer vision for images and videos).

This is how I would categorize companies operating in Machine Intelligence.

The most successful company of humankind will be the one with all the data (oil) of the world, in charge of the entire process from data to insights. With the reinforcement loop (the refining process getting smarter and smarter), this company would be able to measure and predict almost everything. The most successful company in the world would be Big Brother. It turns out we need to watch out for Big Brother: monopoly ain’t good, they say. To prevent the creation of a Big Brother-like company, what we can do is democratize public datasets.

Data fusion is retailing, as big data is wholesaling. 90% of big data technology has been designed by Google.

People need to understand and experience data ownership. While everyone in our society is producing vast quantities of data, individuals rarely see or interact with any of it. When people are given tools to store, visualize, and explore their own data, they gain an understanding of the worth and utility of this information. Applied on a broad scale, this improved understanding of data could lead to better decisions by individuals, both in cases where their data is being misused and in cases where data can be applied to solve important problems like disaster response, cancer diagnosis or disease spread.

Companies must begin treating data as an enterprise-wide corporate asset, with data fusion at a large scale: scan > extract (when data is unstructured) > refine (when data is structured) > store > funnel > predict > display.

To conclude with the internet: it is the underground web that links all the oil sources. That's where Datract is heading.

Exploring the internet

The Internet

The Internet has become an integral part of everyday life and of much of the modern economy in developed countries. Through the Internet we can communicate with our loved ones, buy what we need and even create a business.

From the inside, at its core, the Internet is amazing: it's a massive network of connections between organizations all around the world, with submarine cables, satellites and tons of hardware and software forming the biggest communication network known to mankind, covering the entire world.

At Datract we have spent the last few months collecting data about how the Internet is interconnected, in order to do some basic data exploration and create some basic visualizations of how the Internet is constructed, which countries are best connected, how many IP addresses are assigned to each country, etc.

How the Internet is organized

This massive network of connections has no central authority; it's just companies and organizations creating agreements to exchange data. They use BGP (the Border Gateway Protocol) to coordinate and collaborate to create the entire Internet.

Companies create agreements that let other companies go through them to reach other parts of the Internet, either in exchange for money (transit) or in exchange for the other party doing the same (peering).

The only regulation comes from ICANN, a not-for-profit public-benefit corporation with participants from all over the world, dedicated to keeping the Internet secure, stable and interoperable. Basically, ICANN assigns Internet addresses, among other things.

Organizations operating on the Internet can use different strategies to reach the entire Internet. Since connecting directly to every organization on the Internet is extremely costly and inefficient in practice, most organizations make agreements with other, well-connected organizations that, in exchange for money, let them reach any part of the Internet. They then seek (free) peering agreements with local organizations that could be interested in the mutual benefit of not going through a paid intermediary.
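
As a rough illustration of the strategy above, the sketch below models one organization's agreements and picks how it would reach a destination: directly through a peer when one exists, otherwise through a paid transit provider. The `Organization` class, the `next_hop` function and the organization names are all hypothetical, and real routing decisions are made by BGP policies that are far richer than this.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    """Hypothetical model of one network and its agreements."""
    name: str
    peers: set[str] = field(default_factory=set)              # free, mutual exchange
    transit_providers: set[str] = field(default_factory=set)  # paid upstreams


def next_hop(org: Organization, destination: str) -> tuple[str, str]:
    """Return (neighbour, agreement type) used to reach `destination`."""
    # Prefer the free peering link when the destination is a direct peer.
    if destination in org.peers:
        return destination, "peering"
    # Otherwise pay a transit provider, which can reach the whole Internet.
    if org.transit_providers:
        # sorted() only makes the choice deterministic for this sketch.
        return sorted(org.transit_providers)[0], "transit"
    raise LookupError(f"{org.name} has no way to reach {destination}")


local_isp = Organization(
    name="LocalISP",
    peers={"LocalUniversity"},
    transit_providers={"BigCarrier"},
)

print(next_hop(local_isp, "LocalUniversity"))  # ('LocalUniversity', 'peering')
print(next_hop(local_isp, "FarAwayHost"))      # ('BigCarrier', 'transit')
```

The point of the sketch is only the ordering: traffic that can stay on a peering link never touches the paid intermediary.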

Years of agreements, cables and lots of economic and political interests have created what the Internet is today and determined how each organization reaches the rest of the Internet.

Our first goal in exploring the Internet using publicly available data was to create an interactive graph visualization representing all the interconnections between Internet companies and organizations all around the world.

We created a graph where the size of each node represents how many connections the node has, and the color of the edges represents the country of the organization.

So basically, it's a snapshot of the Internet as it is now.
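
To make the node sizing concrete, here is a minimal sketch of how the number of connections per node could be counted from a list of links. The edge list is made up (the AS numbers come from the range reserved for documentation), and the square-root scaling in `node_size` is an assumption for readability, not necessarily what the visualization uses.

```python
import math


def node_degrees(edges):
    """Count distinct neighbours for every node in an undirected edge list."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def node_size(degree, min_size=2.0, scale=3.0):
    """Map a degree to a drawing radius (square root keeps hubs from dominating)."""
    return min_size + scale * math.sqrt(degree)


# Hypothetical links between organizations; duplicates are counted once.
edges = [
    ("AS64496", "AS64497"),
    ("AS64497", "AS64498"),
    ("AS64497", "AS64496"),
    ("AS64498", "AS64499"),
]

degrees = node_degrees(edges)
sizes = {node: node_size(d) for node, d in degrees.items()}
```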

Warning: this visualization requires lots of CPU power and memory to work.

The entire internet graph
Click to open the interactive internet graph

Who can become part of the Internet network

Each country has different laws on who can become part of the Internet. For example, more economically liberal countries just ask you to register a company or organization, pay some taxes or fees and follow some rules in order to become directly connected to the Internet. Later you can use this connection to sell Internet access to consumers, or for lots of other purposes.

However, in more controlled countries, like China, only the government or government-related organizations can connect directly to the Internet. Other organizations simply pay these government organizations for Internet access, but are not allowed to be directly connected to the rest of the Internet or to establish peering or transit agreements with other organizations.

We have created an interactive map where you can select a country and view its connections to other countries. You can check the percentage of the world connected to your country. We have found very interesting things using this interactive map!
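
One plausible way to compute that percentage is sketched below: collapse organization-level links into country-level links, then divide the number of directly linked countries by the number of other countries. This reading of "percentage of the world connected" is an assumption, and the organizations, country mapping and function names are hypothetical.

```python
def country_links(edges, country_of):
    """Collapse organization-level links into links between countries."""
    links = {}
    for a, b in edges:
        ca, cb = country_of[a], country_of[b]
        if ca == cb:
            continue  # domestic links do not connect two countries
        links.setdefault(ca, set()).add(cb)
        links.setdefault(cb, set()).add(ca)
    return links


def percent_of_world_connected(country, links, all_countries):
    """Share of the other countries that `country` is directly linked to."""
    others = set(all_countries) - {country}
    if not others:
        return 0.0
    connected = links.get(country, set()) & others
    return 100.0 * len(connected) / len(others)


# Made-up sample data for illustration only.
country_of = {
    "AS64496": "FR", "AS64497": "DE", "AS64498": "FR",
    "AS64499": "JP", "AS64500": "BR",
}
edges = [
    ("AS64496", "AS64497"),  # FR - DE
    ("AS64496", "AS64498"),  # FR - FR (domestic)
    ("AS64498", "AS64499"),  # FR - JP
    ("AS64497", "AS64499"),  # DE - JP
]

links = country_links(edges, country_of)
all_countries = set(country_of.values())
france = percent_of_world_connected("FR", links, all_countries)
brazil = percent_of_world_connected("BR", links, all_countries)
```

In this sample, FR is linked to two of the three other countries, and BR to none.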

Map of internet country interconnections
Click to open the interactive map of Internet countries

To better show which countries are best interconnected with the rest of the world, we have created an interactive visualization with a graph including all countries. The size of each node represents the number of countries connected to it. Better-connected countries are near the center, while countries with few external connections are drawn far from the center.

Map of internet country interconnections
Click to open the interactive map of Internet countries

Internet addresses: a scarce resource

Each computer connected to the Internet uses an IP address, the number that identifies a computer on the entire Internet. There are 2^32 possible (IPv4) Internet addresses, that is, at most 4,294,967,296 devices directly connected to the Internet. Blocks of IP addresses are allocated by ICANN (through IANA and the five regional Internet registries) and end up registered to organizations in each country, so every block can be attributed to a country.
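
The arithmetic behind that number can be checked with Python's standard `ipaddress` module. The helper name `addresses_in` is just for this sketch, and the blocks used are well-known reserved ranges, not real country assignments.

```python
import ipaddress

# An IPv4 address is 32 bits long, so there are 2**32 possible values.
total_addresses = 2 ** 32  # 4294967296

# The network 0.0.0.0/0 covers the whole IPv4 address space.
whole_space = ipaddress.ip_network("0.0.0.0/0")


def addresses_in(block: str) -> int:
    """Number of addresses in a block written in CIDR notation."""
    return ipaddress.ip_network(block).num_addresses


small_block = addresses_in("192.0.2.0/24")  # 2**(32 - 24) = 256
large_block = addresses_in("10.0.0.0/8")    # 2**(32 - 8) = 16777216

# A /8 block is 1/256 of the entire address space.
blocks_of_that_size = total_addresses // large_block
```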

Each country decides how to use its addresses. Most countries sell the IP addresses to Internet companies or organizations. Other countries decide to control all the addresses available to them, so the Internet can only be accessed through or by the government.

Internet organizations are collaborating to update their software and hardware to be able to use longer addresses, called IPv6 addresses, in the future. This change is not trivial given the size and decentralized organization of the Internet.
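
To give a sense of what "longer" means, this short sketch compares the two address families using Python's standard `ipaddress` module. The sample address comes from the 2001:db8::/32 range reserved for documentation.

```python
import ipaddress

ipv4_space = ipaddress.ip_network("0.0.0.0/0")
ipv6_space = ipaddress.ip_network("::/0")

# Address length in bits for each family.
ipv4_bits = ipv4_space.max_prefixlen  # 32
ipv6_bits = ipv6_space.max_prefixlen  # 128

# How many times larger the IPv6 space is: 2**128 / 2**32 = 2**96.
growth_factor = ipv6_space.num_addresses // ipv4_space.num_addresses

# An IPv6 address written in short form and in full.
sample = ipaddress.ip_address("2001:db8::1")
full_form = sample.exploded  # '2001:0db8:0000:0000:0000:0000:0000:0001'
```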

For the moment, there are 2^32 IP addresses, and they are becoming a very scarce resource…

We have created an interactive visualization to see which countries get the most addresses and in what proportion. Bogus addresses are addresses that, for technical reasons, cannot be assigned to any country.
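
As a sketch of how such proportions could be computed, the code below adds up the addresses in each country's blocks and expresses them as a share of the 2^32 total. Everything in the sample is made up: the blocks are documentation ranges, "AA" and "BB" are placeholder country codes, and the `SPECIAL_PURPOSE` list is only a partial, assumed stand-in for what the visualization calls bogus addresses.

```python
import ipaddress
from collections import defaultdict

TOTAL_IPV4 = 2 ** 32

# A few well-known special-purpose blocks (private, loopback, multicast,
# reserved). Assumption: the real "bogus" set may be defined differently.
SPECIAL_PURPOSE = [
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4", "240.0.0.0/4",
]


def address_shares(assignments):
    """Return the percentage of the IPv4 space per country, plus 'bogus'."""
    totals = defaultdict(int)
    for block, country in assignments:
        totals[country] += ipaddress.ip_network(block).num_addresses
    totals["bogus"] = sum(
        ipaddress.ip_network(block).num_addresses for block in SPECIAL_PURPOSE
    )
    return {name: 100.0 * count / TOTAL_IPV4 for name, count in totals.items()}


# Hypothetical (block, country) pairs for illustration only.
sample_assignments = [
    ("192.0.2.0/24", "AA"),
    ("198.51.100.0/24", "AA"),
    ("203.0.113.0/24", "BB"),
]

shares = address_shares(sample_assignments)
```

With this sample, "AA" holds twice the share of "BB", and the two /4 blocks alone put the bogus share above 12.5%.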

Graph with addresses per country
Click to open the interactive graph of IP address assignments per country

You can click on any continent to see the number of IP addresses inside that continent. Countries with very few addresses have been removed from the graph.
