Open Source Cloud Computing Made Easy


Creating a business around free software is hardly a new idea: Cygnus Solutions, built around Stallman's GCC, was set up in 1989. But here's one with a trendy twist: a company based on the open source *cloud computing* framework Hadoop, an Apache project, which describes itself as follows:

Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.

Here's what makes Hadoop especially useful:

Scalable: Hadoop can reliably store and process petabytes.

Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.

Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.

Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.
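To make the MapReduce idea a little more concrete, here is a minimal single-process sketch in Python of the classic word-count job. It is not Hadoop's API (Hadoop itself is written in Java), and the names map_phase, shuffle, reduce_phase and word_count are hypothetical, chosen purely for illustration. Each string in chunks stands in for a data block that a real cluster would store, and process, on a separate node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Assumption: each chunk represents one data block. A real framework
    # would run the map calls in parallel on the nodes holding the blocks;
    # this sketch simply runs them one after another in a single process.
    emitted = []
    for chunk in chunks:
        emitted.extend(map_phase(chunk))
    return dict(reduce_phase(key, values) for key, values in shuffle(emitted).items())


chunks = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(chunks)
print(counts)
```

Because every map call only looks at its own chunk, the work can be split across as many machines as there are blocks, which is what lets the approach scale out on commodity hardware.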

Hadoop has been demonstrated on clusters with 2,000 nodes. The current design target is 10,000-node clusters.

As you can see from this, Hadoop's original emphasis was on clusters, not clouds. But the new company Cloudera has shrewdly decided to tap into the current buzz around the latter, not least with its name (well, it certainly beats “Clustera”...). Here's how it explains the thinking behind its creation:

One of the recurring themes we have heard while working with our customers and the community is that Hadoop configuration and deployment is a pain. Oftentimes, Hadoop is the first truly distributed system that administrators encounter, and the problem is made worse by the lack of standardized packages and deployment tools. And some releases are buggy. And upgrades are hard. And the list goes on.