Greenplum and Aster Data Systems, two startups involved in large-scale data analysis, announced this week that their products will support MapReduce, a programming technique originally developed by Google for parallel processing of large data sets across commodity hardware.
Software developers tend to be more comfortable with languages such as Java and C++ than with the database language SQL, said Mayank Bawa, cofounder and CEO of Aster, maker of a cluster database system that splits workloads into multiple discrete tiers.
"Most developers struggle with the nuances of making a database dance well to their directions," he wrote in a blog post. "Indeed, a SQL maestro is required to perform interesting queries for data transformations (during ETL processing or Extract-Load-Transform processing) or data mining (during analytics)."
Enter MapReduce, the goal of which was to provide a "trivially parallelisable framework so that even novice developers (aka interns) could write programs in a variety of languages (Java/C/C++/Perl/Python) to analyse data independent of scale," Bawa wrote.
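For readers unfamiliar with the model, the following is a minimal single-process sketch of the pattern in Python, assuming the classic word-count task. The function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) and the sample input are hypothetical, chosen for illustration; they are not taken from Google's, Aster's or Greenplum's implementations. The developer supplies only the map and reduce functions, while a real framework would handle the grouping step and spread the calls across many machines.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key; a real framework does this between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: collapse all values emitted for one word into a total."""
    return word, sum(counts)


def run_job(lines):
    """Run map, shuffle and reduce in sequence (hypothetical local driver)."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(word, counts)
                for word, counts in shuffle(pairs).items())


# Illustrative input only.
lines = ["sql meets mapreduce", "mapreduce scales out", "sql scales too"]
counts = run_job(lines)
print(counts["mapreduce"], counts["sql"], counts["out"])
```

Because each call to `map_words` sees only one line and each call to `reduce_counts` sees only one word, the calls do not depend on one another, which is what lets a framework run them in parallel.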
Meanwhile, Greenplum, maker of a database it says can scale to a petabyte of information, said this week that a MapReduce framework will be part of its dataflow engine as of September.
The twin announcements brought a nod of approval from one close observer of the database world.
"On its own, MapReduce can do a lot of important work in data manipulation and analysis. Integrating it with SQL should just increase its applicability and power," wrote Curt Monash of Monash Research on the DBMS2 blog.
"MapReduce isn't needed for tabular data management. That's been efficiently parallelized in other ways," he added. "But if you want to build non-tabular structures such as text indexes or graphs, MapReduce turns out to be a big help."