Wednesday, 3 July 2013

Beyond Data Mining: Knowledge Mining


We live in the digital age. The flow of data is truly impressive. Statistics inform us that each year we generate more data than past generations did in decades. But the problem is that Information Technology (IT) has concentrated on simply automating old ways of thinking, creating bottlenecks and problems we didn't even imagine, rather than inventing new processes or approaches. There is little innovation going on in the IT arena. But one thing is certain: we are drowning in data and we're thirsty for knowledge. The problem is not simply storage (disk space is cheap). The big deal is how to extract workable knowledge out of all this data. A few examples:

Every 60 seconds there are:

  • 170 million emails
  • 700,000 search queries
Every day there are:

  • 15 petabytes of new information
  • 7 trillion text messages
The amount of data generated by everything that forms part of the economy is astronomical. People are speaking of Big Data. But a new problem emerges. We must resort to new means of analyzing this mass of data and turning it into something useful. Simply storing huge amounts of numbers is not equivalent to progress. As the state of our global economy shows, conventional means of data analysis, coupled with simulation (the so-called Business Analytics), are a bit outdated, to say the least. Old paradigms don't scale. What is the alternative?

Turn data into structure

Structure is the overture to knowledge. But what is knowledge? What is a "body of knowledge"? Setting aside ontological hairsplitting, we could say that a body of knowledge is equivalent to a structured and dynamic set of inter-related rules. The rules can be crisp or fuzzy or both. But the key here is structure. Structure is the skeleton upon which a certain body of knowledge can be further expanded, refined and modified (this is why we say "dynamic"). One could say that structure forms the basis of a model or of a theory. Today there exist many ways of extracting structure from data. Statistics is one way. Building models based on data is another. But because building models and mishandling statistics have contributed to the destruction of a big chunk of our economy, we have invented a new method of identifying structure in data - a model-free method, which is free of statistics and of model building. A method which is "natural" and unbiased.

Consider a piece of ordinary data (such as that managed by accountants in any corporation or a bank, or by an investor):

The data (only a portion is illustrated) has this structure, known as a Business Structure Map:

What the map illustrates is the complete set of dependencies (we like the word "relations") between the business parameters. These relations are, de facto, rules. Rules of the type "if A increases then B decreases". So the above map does represent a body of knowledge. In this particular example it represents the knowledge of the functioning of a large multinational software firm as reflected in the financials it publishes on a quarterly basis. In order to understand how to navigate such maps read here. It is easy and intuitive.
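To make the idea of "relations as rules" concrete, here is a minimal sketch of how such a map could be held in memory as a graph with signed links. It does not show how the relations are extracted from data (that method is not described in this post). The parameter names, the rule list and the helper functions `build_map`, `hubs` and `describe` are all hypothetical, and treating every relation as symmetric is an assumption made only for illustration.

```python
from collections import defaultdict

# Hypothetical rules, invented for illustration: (parameter A, parameter B, sign).
# sign = +1 reads "if A increases then B increases",
# sign = -1 reads "if A increases then B decreases".
RULES = [
    ("revenue", "net_income", +1),
    ("operating_costs", "net_income", -1),
    ("revenue", "operating_costs", +1),
    ("net_income", "cash", +1),
]


def build_map(rules):
    """Return an adjacency map: parameter -> {related parameter: sign}.

    Assumption: relations are symmetric, so each rule is stored in both directions.
    """
    graph = defaultdict(dict)
    for a, b, sign in rules:
        graph[a][b] = sign
        graph[b][a] = sign
    return dict(graph)


def hubs(graph):
    """Rank parameters by number of relations, most connected first (ties by name)."""
    return sorted(graph, key=lambda p: (-len(graph[p]), p))


def describe(graph, a, b):
    """Spell out the rule linking a and b in plain words."""
    direction = "increases" if graph[a][b] > 0 else "decreases"
    return f"if {a} increases then {b} {direction}"


structure_map = build_map(RULES)
print(hubs(structure_map))
print(describe(structure_map, "operating_costs", "net_income"))
```

Even in this toy form, the map answers questions a table of numbers cannot: which parameters are hubs, and which way a change is likely to propagate.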

The point now is this. When you make decisions based on data, such as the example illustrated above, what do you do about its structure? Do you take it into account? Probably not. Many managers do have a feeling of how their companies work. But intuition is one thing, science is another. In turbulence, intuition, even when backed by years of successful practice, can fail. See the global economy meltdown. We say that:

There should be a Business Structure Map on the table during every Board Meeting

The above map has around two dozen parameters and about a hundred rules. In our quarterly analysis of the Eurozone economy, we analyze a total of 648 parameters (24 parameters for each of the 27 member states). The corresponding Business Structure Map has approximately forty thousand rules! That gives an idea of the immense difficulty which fixing the EU will entail.
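The numbers above can be put in perspective with a little arithmetic. The sketch below assumes that each rule links exactly one pair of parameters (the post does not state this explicitly) and takes "approximately forty thousand" as 40,000. The variable names are ours.

```python
from math import comb

PARAMETERS_PER_STATE = 24  # from the text
MEMBER_STATES = 27         # from the text
RULES_IN_MAP = 40_000      # "approximately forty thousand", rounded

# Total number of parameters in the Eurozone analysis.
parameters = PARAMETERS_PER_STATE * MEMBER_STATES

# Number of distinct parameter pairs, i.e. the most rules the map could hold
# under the one-rule-per-pair assumption.
possible_pairs = comb(parameters, 2)

# Fraction of all possible pairs that are actually linked by a rule.
density = RULES_IN_MAP / possible_pairs

print(parameters)        # 648
print(possible_pairs)    # 209628
print(f"{density:.1%}")  # 19.1%
```

Under that assumption, roughly one in five of all possible parameter pairs is linked by a rule.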

Knowledge Mining, NOT Data Mining!

The big deal, however, is not just knowledge through structure. It is also about mapping gigabytes of data onto the megabytes needed to store a Business Structure Map. The degree of condensation is phenomenal.
But there is more. Because model-free methods, which are employed to build Structure Maps, don't require you to build math models on top of your data, they take you to the next level. What you get is this:

  • understanding of the structure of data - relationships, topology, hubs, information flow patterns, etc. Structure, not hundreds of pie charts, plots or surfaces.
  • new means of parameter ranking
  • transformation of terabytes of data into megabytes of knowledge
  • measures of complexity and critical complexity
  • measures of resilience and fragility
  • global patterns

The most important of these is understanding. In order to understand Nature better we must analyze the data it provides us with in its pure form, not using methods which warp and distort the information it carries. With statistical (and other) techniques it is incredibly easy to destroy information. Model-free methods preserve information in its original form and shape. Building models is NOT the only way to proceed.