IT professionals in the UK will get the chance to hear the details of eBay's remarkable grid computing infrastructure at the Open Grid Forum in May. If you cannot make the event, read on to find out how the online auction site stitched together 12,000 servers.

Do you use eBay? Most of us have at least bought something off it at some point, and many have tried selling – the scale of its operations is phenomenal.

With 222m registered users at the last count, 610m listings each quarter, and 6m new items added every day, it takes a lot of computing power to keep eBay operating as fast and reliably as users have come to expect.

So what’s keeping all this running?

There are over 12,000 servers working hard, using software that eBay has had to create for itself because no vendor is offering anything that can cope.

When the company was created, it tried using off-the-shelf software, but this quickly "broke", unable to cope with the volume, says eBay distinguished research scientist Paul Strong. eBay is now keen to encourage vendors to improve their offerings, both to help other companies moving into grid and to take some development work off its own shoulders.

“Building management tools isn’t eBay’s core business. Ultimately we want things to be developed by vendors, so that we can get on with building an e-commerce platform,” he says.

Three levels of infrastructure

There are three main aspects to eBay’s grid infrastructure.

The auction platform, “a very large piece of Java code”, runs on around 7,000 servers in the western USA. “We have about 12,000 to 15,000 instances of the site, running on about half that number of servers, because we direct the traffic from different parts of the world to the same machines,” Strong says.

Those servers take care of all of the code for the transactions that people associate with eBay. “We consider that to be a grid, a highly distributed transactional system,” he says.

“And we also have the search infrastructure, which allows people to actually find stuff, and that’s a massively parallelised scatter/gather application, running on another 2,000-plus servers. That allows us to meet the latency requirements. Really, we’re using network distributed computing for all the same reasons everyone else is – we couldn’t get the scale any other way, or meet our latency requirements, either.

“People expect an instant response from a search when they want to do something. And it brings resilience, too – if you lose one server, you don’t lose the whole service,” he says.
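Strong does not describe how eBay's search code is written, but the scatter/gather pattern itself is easy to illustrate. The Python below is a minimal sketch under stated assumptions, not eBay's implementation: the `SHARDS` table, the `search_shard` and `scatter_gather` functions and the sample listings are all hypothetical, and threads in one process stand in for separate servers. The query is scattered to every shard in parallel, the partial results are gathered, and the best matches are merged into one ranked list. A shard that raises an error is skipped, which illustrates the resilience point: losing one server trims the results rather than taking down the whole service.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical shards: each maps a listing title to a relevance score.
# In a real system each shard would live on a separate server.
SHARDS = {
    "shard-0": {"vintage camera": 0.9, "camera strap": 0.4},
    "shard-1": {"digital camera": 0.8, "camera bag": 0.5},
    "shard-2": None,  # simulates a server that has failed
}


def search_shard(name, query):
    """Return (score, title) pairs from one shard whose title contains the query."""
    index = SHARDS[name]
    if index is None:
        raise ConnectionError(f"{name} is unavailable")
    return [(score, title) for title, score in index.items() if query in title]


def scatter_gather(query, top_k=3):
    """Scatter the query to every shard in parallel, then gather and merge the top hits."""
    hits = []
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        # Scatter: one task per shard, all running concurrently.
        futures = [pool.submit(search_shard, name, query) for name in SHARDS]
        # Gather: collect whatever each shard returns.
        for future in futures:
            try:
                hits.extend(future.result())
            except ConnectionError:
                # A lost shard costs some results, not the whole service.
                continue
    # Merge: keep only the highest-scoring titles across all shards.
    return [title for score, title in heapq.nlargest(top_k, hits)]


if __name__ == "__main__":
    print(scatter_gather("camera"))
    # → ['vintage camera', 'digital camera', 'camera bag']
```

Here the query still returns results even though one shard is down. A real deployment would also put a deadline on each shard, since in a scatter/gather design the slowest shard sets the response time for the whole query.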

As well as the auction site and the search facility, there is continual development work to improve the eBay customer experience. “The environment we use to build is what you would think of as a traditional kind of grid – users are able to submit jobs for building the site. By which I mean compiling Java, compiling C++, optimising and compressing things like HTML, JavaScript and XML.

“In the past we used to do that on individual machines and it would take hours. We now do it on a grid of some 300-400 machines and it takes in the region of 30 minutes to build the whole of eBay.

“We’ve done 2.5 million builds since the grid was introduced about two years ago. We roll new code to all of the auction platforms every two weeks, and add 300 new features to the site every quarter,” says Strong.
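The build grid is described as a traditional job-submission grid. As a rough, hypothetical illustration of why farming jobs out shortens a build, the Python below submits a handful of invented asset-processing jobs (`JOBS`, `build_asset` and `run_build` are not eBay names) to a pool of workers. It only mimics the “optimising and compressing” step, using naive whitespace stripping plus gzip, and a thread pool on one machine stands in for the 300-400 build machines; compiling Java and C++ is left out.

```python
from concurrent.futures import ThreadPoolExecutor
import gzip

# Hypothetical build jobs: (asset name, source text) pairs to be
# shrunk and compressed. Real jobs would also compile Java and C++.
JOBS = [
    ("index.html", "<html>\n  <body>\n    <p>Hello</p>\n  </body>\n</html>\n"),
    ("app.js", "function add(a, b) {\n    return a + b;\n}\n"),
    ("feed.xml", "<items>\n  <item>camera</item>\n</items>\n"),
]


def build_asset(job):
    """Strip indentation and line breaks (a naive 'optimise'), then gzip the result."""
    name, source = job
    # Naive: this would break code that relies on line breaks.
    minified = "".join(line.strip() for line in source.splitlines())
    return name, gzip.compress(minified.encode("utf-8"))


def run_build(jobs, workers=4):
    """Submit every job to a worker pool and collect the artefacts by asset name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(build_asset, jobs))


if __name__ == "__main__":
    for name, artefact in run_build(JOBS).items():
        print(name, len(artefact), "bytes")
```

Because the jobs are independent of one another, adding workers lets more of them run at the same time, which is the property that makes this kind of speed-up possible.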

What is “grid”?

Strong is keen to stress that “grid” is a broad term that covers many types of computer system, and that it’s something all companies should be paying attention to.

eBay’s view of grid is that it’s a natural extension of the trend towards distributed computing in the data centre, he says.

“I would argue that almost all enterprises, if they run multi-tiered applications, are already running a prototypical grid, a primordial grid. They’re already running applications that are network distributed, and leveraging the network to scale and achieve resilience and all of those good things. They just typically haven’t recognised that they should be treating the whole of their infrastructure holistically, as opposed to in discrete silos. It’s almost a philosophical thing, about viewing your architecture as a whole.

“The reason we want people to really recognise this is that once you do, you recognise sets of technologies and the problems that come along with it.”

eBay has taken something of a leadership role in grid, almost by default.

“We believe we are, in many ways, on the crest of a wave and everyone else is going to have the same problems we’re having, even if they don’t have them today. Managing these very large systems built, ultimately, on commodity building blocks, means we’re at the extreme edge of what everyone else is beginning to do and where everyone is heading in their infrastructure.

“The main problems we have are around manageability – we have tools that allow us to manage the site and to be very effective, but the majority [of them] had to be hand-made. What we want is for these tools to be made by vendors, to let us get on with our own business.”

Strong has been heavily involved in the Open Grid Forum (OGF) in the hope of pushing this sort of development work forward.

“Developing our own stuff is an unattractive prospect in the long term. We’d like to replace our expensive homemade stuff with commercial off-the-shelf solutions – solutions that are interoperable, so you have a choice of vendor. So how do you get proprietary, interoperable stuff out there? You deliver that by driving standards,” he says.

It really is time for all businesses to become aware of grid and to look at their infrastructure in a new light, Strong says.

“We view grid as being more or less the whole of distributed computing in the data centre. Grid is the context for everything. It’s really about treating your data centre as a system. A grid-centric view does bring a different perspective to things, a holistic view of your environment, where you’re not managing applications and services in silos, but thinking of how they interact together as part of this big system.”

Get up to speed with grid computing

Grid Computing Now is the DTI-funded group promoting grid computing in the public and private sectors.

Open Grid Forum 20 in Manchester from 7 to 11 May will be the world's largest gathering of grid computing users and experts. The event will focus on developing grid standards, showcasing real-world applications, and discussing large-scale grid infrastructure techniques and applications.