Q&A: Dead servers, bad engineering and data centres

The cost of running servers now exceeds the cost of the hardware. Is there anything we can do? The Uptime Institute's Ken Brill thinks there is.


The business value arising from Moore's Law, which says the number of transistors on a chip will double about every two years, is being turned on its head by the cost of providing power, cooling and other facility support for servers.

Those costs now exceed the price of the computing hardware, according to Ken Brill, founder and executive director of The Uptime Institute. In an interview, he talked about those escalating costs and outlined what IT managers can do to improve data centre energy efficiency, including the elimination of dead servers and more efficient cooling.

Q: What's the biggest threat facing data centres? A: The economic breakdown of Moore's Law.

Q: What do you mean by that? A: Historically, facilities costs have been 3 per cent of IT's total budget, but the economic breakdown of Moore's Law means that facilities costs (including power consumption) are going to be climbing to 5 per cent, 10 per cent and higher.

And that will change the economics of IT. The business question becomes: will IT get more money so the increasing portion of the budget that facilities represents doesn't crowd out other IT initiatives? Or will the increasing facilities percentage result in curtailing other things that IT is doing? That's the economic truncation of Moore's Law.

Q: Can you illustrate that? A: A company [in an example featuring an unnamed client] is going to implement a blade server application, for instance, that requires $22 million of hardware. The business justification based on $22 million of hardware is that you expense over three years and there is a positive cash flow.

What's missing from the justification is the $54 million in facility costs [over three years]. It was not a $22 million decision, it was a $76 million decision.

Infrastructure upgrades include data centre build-out, cooling and electrical capacity to support the hardware and the network. It's an invisible price. Those expenses don't typically show up in IT. They show up indirectly, or they show up after the fact. When the decision was made to implement the blade servers, the facility people were not at the table.
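The arithmetic behind Brill's example can be sketched in a few lines. This is only an illustration: the two dollar figures come from the interview, while the function names and the facility-share calculation are assumptions added here to make the point concrete.

```python
# Figures quoted in the interview, in millions of US dollars over three years.
HARDWARE_COST_M = 22
FACILITY_COST_M = 54


def total_cost(hardware_m, facility_m):
    """Return the full cost of the decision: hardware plus facilities."""
    return hardware_m + facility_m


def facility_share(hardware_m, facility_m):
    """Return the fraction of the full cost that is facility spending."""
    return facility_m / total_cost(hardware_m, facility_m)


total = total_cost(HARDWARE_COST_M, FACILITY_COST_M)
share = facility_share(HARDWARE_COST_M, FACILITY_COST_M)
print(f"Total decision: ${total} million")  # Total decision: $76 million
print(f"Facility share: {share:.0%}")       # Facility share: 71%
```

On these numbers, roughly seven-tenths of the real cost sits outside the figure used in the business justification.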

Q: What's the business cost? A: The business cost is that the return on investment that people think they are going to get is not going to be there.

Q: Is there a way to get business and facilities representatives involved in this? A: The application justification process needs to change so it includes all the costs. Typically, you are looking at just the IT cost of the hardware and the cost of running that hardware.

Q: The larger and denser servers aren't going away, so how do companies change the economics of this? A: First, when buying equipment, look not only at performance per dollar but also at performance per watt. Be sharper on buying. IT has to become conscious of energy efficiency and put pressure on the manufacturers to be more energy aware. That's going to benefit everybody in the long term.

A second thing people can do is to kill dead servers -- servers that are still running but not actively doing anything.

Q: Are dead servers really an issue? A: From 10 per cent to 30 per cent of the load in a data centre is represented by servers that aren't doing anything. By turning off those servers, you can cut your energy consumption. The problem is there is no incentive -- there is risk -- but no incentive to turn those servers off.
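To put that range in perspective, here is a rough sketch of the load involved. The 10 per cent and 30 per cent figures are Brill's; the 1,000 kW total load and the function name are hypothetical, chosen only for illustration, and the sketch ignores any knock-on saving in cooling.

```python
# Range quoted in the interview: share of data centre load drawn by dead servers.
DEAD_SHARE_LOW = 0.10
DEAD_SHARE_HIGH = 0.30


def dead_server_load_kw(total_load_kw, dead_share):
    """Return the load, in kW, drawn by servers doing no useful work."""
    if not 0.0 <= dead_share <= 1.0:
        raise ValueError("dead_share must be between 0 and 1")
    return total_load_kw * dead_share


# HYPOTHETICAL input: a 1,000 kW data centre load, not a figure from the interview.
example_load_kw = 1000.0
low = dead_server_load_kw(example_load_kw, DEAD_SHARE_LOW)
high = dead_server_load_kw(example_load_kw, DEAD_SHARE_HIGH)
print(f"Load drawn by dead servers: {low:.0f} kW to {high:.0f} kW")
```

For a site of that assumed size, the dead servers alone would account for somewhere between 100 kW and 300 kW of continuous draw.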

Q: The incentive to turn off unused servers would seem apparent. A: These costs aren't linked. Who has to turn the server off? The data centre manager. He's measured on availability; he's not measured on costs. You discover the 10 per cent to 30 per cent of dead servers whenever you move a data centre because that's the only time you have to turn stuff off.

Other things that users can do: consolidation of multiple servers onto a bigger platform, which will be more energy efficient. [And] IT can enable the power-saving features that are now built into many new servers. For instance, a laptop comes set [to] not take advantage of power saving. If you are not using the laptop, you turn off the screen, then you turn off the disk. The chip manufacturers, AMD and Intel, have these features built into their chip set, but the default is off.

However, this again involves risk, because someone has to judge whether the server or chip will come back up to full speed fast enough to meet the service level agreement for the application. That evaluation has to come from the technical group.

Finally, IT managers can reduce bloatware -- software with inefficient code requiring a bigger processor to get through it.

Q: And what about cooling? A: Most data centres are consuming from 20 per cent to 40 per cent more energy than they should because the cooling systems are not well optimised. For instance, here is a common issue in a computing room with multiple cooling units: if you go up to the face plate of the cooling unit, you may see that one unit is dehumidifying and the unit immediately adjacent to it is humidifying, so you have duelling cooling units.

Q: In terms of cooling, what issues do users have with vendors? Are there standards problems? How mature is the technology? A: In 2000, at 500 watts to 1,000 watts per cabinet, you could do anything and successfully cool it. You could be totally incompetent in your engineering and you could successfully cool it. You may not have done it as energy-efficiently, but that was never measured, so nobody knew how badly it was done. As the density per cabinet increases, the mask is ripped off and a user's responsibility for doing the engineering in the computer room becomes apparent. For computer rooms with raised floors, the institute has promoted hot aisles and cold aisles for over 10 years. It's accepted as an optimal solution for up to 3kW to 4kW in a cabinet.

But you go into computer room after computer room and you see that the equipment is lined up facing in one direction. As a result, people have hot spots. And if you have hot spots, you go out and buy more air conditioning.
