Eucalyptus: the unsung hero of Open Source?


Eucalyptus is an open-source infrastructure for the implementation of cloud computing on computer clusters. Its name is an acronym for "Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems".

The current interface is compatible with Amazon's EC2 cloud computing interface. Tom Callway speaks to Rich Wolski, the project's director, about how Eucalyptus can be leveraged by enterprises and where it sits alongside proprietary alternatives like Windows Azure.

1) Can you tell us what led you and your team to develop Eucalyptus?

Yes, certainly. Eucalyptus was developed to support research we have been doing with the VGrADS project. The goal of VGrADS has been to develop a programming and execution environment that makes large-scale grid programs easier and more efficient to write.

As part of this effort, we have been working with the LEAD project and its large-scale weather forecasting applications. We had been having good success in developing software middleware that enabled LEAD workflows to self-schedule across heavily used batch systems on the NSF TeraGrid. We decided that we also wanted to include cloud resources in the resource mix available to LEAD. To do that we needed a development and execution environment for ourselves and the various partner institutions on the project. The idea we had was to run LEAD on the VGrADS execution system (called vgES), and then to alter vgES so that it could include resources from multiple clouds.

The only cloud we could use from the commercial sector, at the time we were architecting the system, was Amazon's AWS. Thus we decided to build an AWS work-alike for the Universities participating in VGrADS, port vgES to AWS, and then run the application on the TeraGrid, Amazon AWS, and the various University machines while they were emulating the AWS cloud. Eucalyptus was the infrastructure we built to support the effort at the University sites.

Because we couldn't change AWS and we didn't have the resources to specialise vgES for more than one cloud API, the quality of the AWS mimicry had to be pretty high. Our goal was to have vgES run in exactly the same manner on AWS and the Eucalyptus clouds, without modification or special-purpose code for either.

BTW, we demonstrated the effort at SC08 in November on the TeraGrid, Amazon AWS, and four partner Universities/research sites (UCSB, Rice University, RENCI, and University of Tennessee). It was really a great moment to see it all work.

2) Does Eucalyptus sit within or outside UCSB Computing Department's academic curriculum?

Both. UCSB is a research University, which means we have a mandate (as UCSB professors) to blend the fruits of our research efforts with our curricular activities. Thus when I teach a graduate operating systems class or a class on scalable systems, Eucalyptus plays a central role. In addition, I have plans for using it as the basis for a graduate class on cloud computing. Development of a new class is quite a time-consuming endeavor, however, so it might be a little while before I feel comfortable testing it out on the students.

However, we also have a pedagogical obligation to ensure that what we teach is "worth" the students' time. Many excellent research ideas advance science, but are quickly superseded by better ideas that build upon them, or are in an area that will take several years to mature. The students' time is better spent on concepts and systems that have proven longevity and impact. Several aspects of the research we are doing with Eucalyptus have this "not ready for teaching" quality and hence remain outside the curriculum but inside the research lab (for now).