Big trends in big data analytics

Bill Loconzolo, vice president of data engineering at Intuit, jumped into a data lake with both feet. Dean Abbott, chief data scientist at Smarter Remarketer, made a beeline for the cloud. The leading edge of big data and analytics, which includes data lakes for holding vast stores of data in its native format and, of course, cloud computing, is a moving target, both say. And while the technology options are far from mature, waiting simply isn't an option.

"The reality is that the tools are still emerging, and the promise of the [Hadoop] platform is not at the level it needs to be for business to rely on it," says Loconzolo. But the disciplines of big data and analytics are evolving so quickly that businesses need to wade in or risk being left behind. "In the past, emerging technologies might have taken years to mature," he says. "Now people iterate and drive solutions in a matter of months -- or weeks." So what are the top emerging technologies and trends that should be on your watch list -- or in your test lab? Computerworld asked IT leaders, consultants and industry analysts to weigh in. Here's their list.

Big data analytics in the cloud

Hadoop, a framework and set of tools for processing very large data sets, was originally designed to work on clusters of physical machines. That has changed. "Now an increasing number of technologies are available for processing data in the cloud," says Brian Hopkins, an analyst at Forrester Research. Examples include Amazon's Redshift hosted BI data warehouse, Google's BigQuery data analytics service, IBM's Bluemix cloud platform and Amazon's Kinesis data processing service. "The future state of big data will be a hybrid of on-premises and cloud," he says.

Smarter Remarketer, a provider of SaaS-based retail analytics, segmentation and marketing services, recently moved from an in-house Hadoop and MongoDB database infrastructure to Amazon Redshift, a cloud-based data warehouse. The Indianapolis-based company collects online and brick-and-mortar retail sales and customer demographic data, as well as real-time behavioral data, and then analyzes that information to help retailers create targeted messaging that elicits a desired response from shoppers, in some cases in real time.

Redshift was more cost-effective for Smarter Remarketer's data needs, Abbott says, especially since the service has extensive reporting capabilities for structured data. And as a hosted offering, it's both scalable and relatively easy to use. "It's cheaper to expand on virtual machines than buy physical machines to manage ourselves," he says.

For its part, Mountain View, Calif.-based Intuit has moved cautiously toward cloud analytics because it needs a secure, stable and auditable environment. For now, the financial software company is keeping everything within its private Intuit Analytics Cloud. "We're partnering with Amazon and Cloudera on how to have a public-private, highly available and secure analytic cloud that can span both worlds, but no one has solved this yet," says Loconzolo. However, a move to the cloud is inevitable for a company like Intuit that sells products that run in the cloud. "It will get to a point where it will be cost-prohibitive to move all of that data to a private cloud," he says.

Hadoop: The new enterprise data operating system

Distributed analytic frameworks, such as MapReduce, are evolving into distributed resource managers that are gradually turning Hadoop into a general-purpose data operating system, says Hopkins. With these systems, he says, "you can perform many different data manipulations and analytics operations by plugging them into Hadoop as the distributed file storage system."

What does this mean for the enterprise? As SQL, MapReduce, in-memory, stream processing, graph analytics and other types of workloads are able to run on Hadoop with adequate performance, more businesses will use Hadoop as an enterprise data hub. "The ability to run many different kinds of [queries and data operations] against data in Hadoop will make it a low-cost, general-purpose place to put data that you want to be able to analyze," Hopkins says.
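To make the "enterprise data hub" idea concrete, here is a minimal sketch in plain Python rather than on Hadoop itself. It assumes a small in-memory list, `SHARED_RECORDS`, standing in for data stored once in a shared file system, and a toy `map_reduce` helper that imitates the map, group and reduce steps of the MapReduce model. All names and sample records are hypothetical and are not drawn from Intuit, Smarter Remarketer or the Hadoop API.

```python
from collections import defaultdict

# Hypothetical sample data: stands in for records stored once in a shared
# file system and read by several different workloads.
SHARED_RECORDS = [
    {"store": "online", "customer": "a", "amount": 20.0},
    {"store": "retail", "customer": "b", "amount": 35.0},
    {"store": "online", "customer": "a", "amount": 5.0},
]


def map_reduce(records, mapper, reducer):
    """Toy, single-process imitation of the MapReduce pattern.

    mapper(record) yields (key, value) pairs; values are grouped by key,
    then reducer(values) collapses each group to one result.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def sales_by_store(records):
    # Workload 1: total sales amount per store.
    return map_reduce(records, lambda r: [(r["store"], r["amount"])], sum)


def orders_by_customer(records):
    # Workload 2: number of orders per customer, over the same records.
    return map_reduce(records, lambda r: [(r["customer"], 1)], sum)


if __name__ == "__main__":
    print(sales_by_store(SHARED_RECORDS))
    print(orders_by_customer(SHARED_RECORDS))
```

Both functions read the same records without copying them into a separate system for each question, which is the property Hopkins describes. A real cluster would distribute the map and reduce steps across many machines.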

Intuit is already building on its Hadoop foundation. "Our strategy is to leverage the Hadoop Distributed File System, which works closely with MapReduce and Hadoop, as a long-term strategy to enable all types of interactions with people and products," says Loconzolo.
