Collecting large amounts of data for no specific purpose is generally a waste of both time and money. Only when you combine sources of data (both inside and outside the organisation's firewalls) with analytic applications can you give the information context and make it meaningful for the business. Without this context it is impossible to gain any intelligence, and the value of the information drops: plain and simple "infonomics".
Everyone is talking about "big data". A few minutes ago, a Google search on "big data" gave me over 820,000,000 hits. Initially we had three Vs (volume, velocity and variety), and now, thanks to Forrester, we have a fourth V (variability). Whenever I think of big data, I invariably settle on "Information Optimisation".
From a geek's perspective, let me explain what this really means. We are on a quest to derive insights from data that grows with every passing minute (volume). This data is created and manipulated at the speed of thought (velocity). It spans formats ranging from structured and semi-structured data to unstructured data and multimedia files (variety). Most importantly, it needs to be handled in a way that preserves its context and primes it for easier distribution (variability).
I would argue that there are several possible scenarios, each with a different level of scale and complexity across these four Vs, and they need closer analysis to help define the architectural and implementation strategies for big data:
- Data can be both big and poly-structured. Consider, for example, the classic Hadoop log-collection use case, MarkLogic's databases, or even the dynamic-schema parts of the relational data warehouses built by Zynga and eBay.
- Data can be big and yet simply structured. I think most of Teradata's and Vertica's petabyte-scale installations would fit that description, as would countless legacy data warehouses.
- Data can be not-so-big but poly-structured. Consider, for example, traditional business applications and the structured and unstructured data they handle.
- Data can be not-so-big and simply structured. Consider, for example, most of the traditional RDBMS world.
This is what I have advised clients who are puzzled and worried about the implications of big data, thanks to the non-stop chatter around it and the constant flow of marketing collateral from product vendors.
I think it is high time we moved beyond the business of defining big data and focused on its implications and implementation. The big problem is not just one of data volume and data management; it is actually an information optimisation problem.
By Soumendra Mohanty, Global Lead - Information Management, Accenture