It’s time to revisit the original July 4th, 2011 post on the Three V’s of big data. Here’s the recap:
Traditionally, big data describes data that’s too large for existing systems to process. Over the past three years, experts and gurus in the space have added further characteristics to define big data. As big data enters the mainstream language, it’s time to revisit the definition.
- Volume: This original characteristic describes the size of data relative to the processing capability. Today a large number may be 10 terabytes. In 12 months, 50 terabytes may constitute big data, if we follow Moore’s Law. Overcoming the volume issue requires technologies that store vast amounts of data in a scalable fashion and provide distributed approaches to querying or finding that data. Two options exist today: Apache Hadoop-based solutions and massively parallel processing databases such as Calpont, EMC Greenplum, EXASOL, HP Vertica, IBM Netezza, Kognitio, ParAccel and Teradata Kickfire.
- Velocity: Velocity describes the frequency at which data is generated, captured and shared. The growth in sensor data from devices and in web-based clickstream analysis now creates demand for more real-time use cases. The velocity of large data streams powers the ability to parse text, detect sentiment and identify new patterns. Real-time offers in a world of engagement require fast matching and immediate feedback loops so promotions align with geo-location data, customer purchase history and current sentiment. Key technologies that address velocity include stream processing and complex event processing. NoSQL databases are used when relational approaches no longer make sense. In addition, in-memory databases (IMDB), columnar databases and key-value stores help improve retrieval of pre-calculated data.
- Variety: A proliferation of data types from social, machine-to-machine and mobile sources adds new data types to traditional transactional data. Data no longer fits into neat, easy-to-consume structures. New types include content, geo-spatial, hardware data points, location-based, log data, machine data, metrics, mobile, physical data points, process, RFID, search, sentiment, streaming data, social, text and web. The addition of unstructured data such as speech, text and language increasingly complicates the ability to categorise data. Some technologies that deal with unstructured data include data mining, text analytics and noisy text analytics.
Contextual scenarios require two more Vs
In an age where we shift from transactions to engagement and then to experience, the forces of social, mobile, cloud and unified communications add two more big data characteristics that should be considered when seeking insights. These characteristics highlight the importance of context in big data and the complexity required to solve for it.
- Viscosity: Viscosity measures the resistance to flow in the volume of data. This resistance can come from different data sources, friction from integration flow rates, and the processing required to turn the data into insight. Technologies to deal with viscosity include improved streaming, agile integration buses and complex event processing.
- Virality: Virality describes how quickly information gets dispersed across people-to-people (P2P) networks. It measures how quickly data is spread and shared to each unique node. Time is a determining factor, along with rate of spread.
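As a rough illustration of the virality measure described above, here is a minimal Python sketch that tracks how many unique nodes a piece of data has reached over time. The function names (`spread_curve`, `time_to_reach`), the event format and the sample data are all hypothetical and chosen for illustration only; they are not drawn from any particular product or from the original post.

```python
from datetime import datetime, timedelta


def spread_curve(events, origin):
    """Return [(elapsed_seconds, unique_nodes_reached), ...] for one item.

    events: iterable of (timestamp, sender, receiver) tuples (assumed format).
    origin: the node that first held the item.
    """
    ordered = sorted(events, key=lambda e: e[0])
    if not ordered:
        return []
    start = ordered[0][0]
    reached = {origin}
    curve = []
    for ts, sender, receiver in ordered:
        # Count a share only if the sender already holds the item
        # and the receiver is a new unique node.
        if sender in reached and receiver not in reached:
            reached.add(receiver)
            curve.append(((ts - start).total_seconds(), len(reached)))
    return curve


def time_to_reach(curve, target_nodes):
    """Seconds until at least target_nodes unique nodes hold the item, or None."""
    for elapsed, count in curve:
        if count >= target_nodes:
            return elapsed
    return None


# Hypothetical sample data: node "a" shares an item across a small network.
t0 = datetime(2012, 1, 1, 12, 0, 0)
events = [
    (t0, "a", "b"),
    (t0 + timedelta(seconds=10), "a", "c"),
    (t0 + timedelta(seconds=20), "b", "c"),  # "c" already reached, not counted
    (t0 + timedelta(seconds=30), "c", "d"),
    (t0 + timedelta(seconds=45), "x", "e"),  # "x" never held the item, ignored
]

curve = spread_curve(events, "a")
print(curve)                    # [(0.0, 2), (10.0, 3), (30.0, 4)]
print(time_to_reach(curve, 4))  # 30.0
```

Here the curve captures both factors named above: the rate of spread (unique nodes reached) and the time it took to get there. A steeper curve would indicate higher virality under this simplified definition.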
The bottom line: Big data provides the key element in moving from real time to right time
Context represents the next frontier as we move to intelligent systems. Big data systems and techniques will provide the key infrastructure in delivering context within business processes, across relationships, by geo-spatial position and within a time spectrum.
As engagement systems make the shift to experiential systems, expect context to provide the key filter in improving signal to noise ratios. Big data provides the context required to move from real time to right time.
By R "Ray" Wang
* Not responsible for any factual errors or omissions. However, happy to correct any errors upon email receipt.
Copyright © 2001-2012 R Wang and Insider Associates, LLC. All rights reserved.