Hadoop company concentrates on data integration

A data integration app will be formally released this quarter as part of the overarching Cloudera Data Platform.


Cloudera is tweaking its business model. The company started life as a Red Hat for Hadoop: a provider of paid support for the open source data management platform.

Last autumn, the startup released its first product, Cloudera Desktop, a management console.

Since then, it has also quietly released a proprietary data integration app. It "doesn't replace an Informatica or Ab Initio," says Cloudera CEO Mike Olson, but it does provide extract and transform features.

The data integration app will be formally released this quarter as part of the overarching Cloudera Data Platform. No price has been determined yet, said Olson.

It's only one of the capabilities that Cloudera is feverishly working on (analytics and BI dashboards are another) to make its version of Hadoop as easy for mainstream corporate workers to use as SQL-based business intelligence tools.

"MicroStrategy, Business Objects, Oracle, IBM DB2 Parallel Edition, these products are all powerful and wonderfully easy to use for the business analyst," Olson said. By contrast, Hadoop remains something that tends to intimidate all but "hardcore Java hackers."

"Hadoop needs to be made easier. It's powerful, but requires a fair bit of programming," he said.

Cloudera counts 30 customers today, most of them in government, financial services and retail, said Olson. They include LinkedIn, eHarmony, JP Morgan Chase, and many of the other companies that presented at the inaugural HadoopWorld conference last fall.

"Our goal in 2010 is to demonstrate to enterprises who haven't seen Hadoop before how you can get more value out of data already collected in your relational databases, which you would leave in place, by combining it with new data types," he said.

While Olson grants that SQL is an easier and more powerful environment for many users today, he says Hadoop will soon catch up because its developers "are innovating much faster."

"Why don't we see how long it takes for Oracle to make another major release?" he said.

Hadoop is better at crunching disparate data types than relational-based data marts or data warehouses, which force you to create a schema for the data upfront.

Hadoop's scalability is another advantage, argues Olson, who says there are a number of Hadoop clusters storing data "well-known to be multiple petabytes in size." He declined to name the companies running them or to say whether they are Cloudera customers. Despite the potential of the Hadoop technology to serve as a scalable, universal data store, Olson sees it complementing, not competing with, relational databases.

"It kinda sucked to compete with Larry Ellison," said Olson, referring to his former firm, SleepyCat Software, maker of the embedded database BerkeleyDB, which was acquired by Oracle in 2006. "I finally managed to sell the guy a company. So I don't want to [compete with] him again."

Cloudera also works closely with Vertica Systems to enable users to connect data stored in Vertica's SQL-based data warehouse with Cloudera, and vice versa. Olson differentiated Cloudera's offering from those of relational data warehouse vendors such as Greenplum and Aster Data Systems, which have introduced MapReduce/Hadoop features.

"What Aster Data and Greenplum have is not MapReduce in my view... it's tied only to relational data, not general data," he said. "The reason you would choose Greenplum [MapReduce] is because you'd already be a Greenplum customer, not because you wanted MapReduce."
