Understanding the true performance of supercomputers means accounting for the time it takes to develop the software that runs on them, not just the raw power of the processors.

For years, the name of the game in supercomputing has been raw speed, with hardware and software designers striving to boost the number of floating-point operations per second -- FLOPS -- that could be crunched. Gigaflops computers gave way to teraflops machines, which are now yielding to petaflops models -- those able to execute 1 quadrillion floating-point operations per second.

But those performance ratings are misleading, because they ignore a huge portion of the time required to solve a problem with these multiprocessor computers -- the hours, weeks or even years it can take for software designers to formulate a solution and for programmers to code and test it.

That's why the US Defense Advanced Research Projects Agency in 2002 changed the name of its High Performance Computing Systems program to High Productivity Computing Systems (HPCS). DARPA hoped that its contractors -- Cray, IBM and Sun Microsystems -- could come up with programming languages and tools to improve software development productivity tenfold.

Sun recently lost its bid to go to the next phase of the DARPA job, but that hasn't stopped it from forging ahead with its HPCS programming language, called Fortress. In January, Sun released an early version of a Fortress interpreter. Similarly, Cray and IBM have released their own first-draft implementations of new languages.

The three languages, all available as open-source software, differ substantially when it comes to details, but they have this much in common:

  • They are aimed at a wide range of multiprocessor computers and clusters, from the "petascale" behemoths at national laboratories to the multicore processors now appearing on desktops. Similarly, they are intended for use in at least some mainstream, business-oriented applications, not just in science and engineering.
  • They try to make it easier for programmers to exploit the various levels of parallelism in application software: threads, multicores, multiprocessors and distributed clusters.
  • They employ techniques to relieve programmers of routine work and to reduce opportunities for coding errors. For example, all use "type inference," so programmers don't have to specify the type of every variable, which is tedious and error-prone. And they use techniques for synchronising operations without locks, so that common problems such as deadlock are avoided. (A rough sketch of both ideas follows this list.)
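
To make those two ideas concrete without reproducing any of the new languages' own syntax, here is a rough sketch in modern Java; the "var" keyword and the atomic compare-and-set loop are stand-ins chosen for illustration, not features the article attributes to Fortress, Chapel or X10:

    import java.util.concurrent.atomic.AtomicLong;

    public class Sketch {
        public static void main(String[] args) {
            // Type inference: the compiler works out that "total" is an AtomicLong,
            // so the programmer never has to write the type name twice.
            var total = new AtomicLong(0);

            // Lock-free update: compare-and-set retries until it succeeds, so no
            // thread ever holds a lock and deadlock cannot occur.
            long seen;
            do {
                seen = total.get();
            } while (!total.compareAndSet(seen, seen + 42));

            System.out.println(total.get()); // prints 42
        }
    }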

John Mellor-Crummey, a computer science professor at Rice University, salutes the productivity goal of the three languages, noting, "Programming of parallel systems is much too hard today."

But he says it won't be easy to evolve the nascent languages -- which now run on single, shared-memory systems -- to run efficiently on big, distributed-memory parallel systems. "Until then, these languages won't see much attention," Mellor-Crummey says.

Eric Allen, a co-leader of the Fortress project at Sun Labs, says the language is ideally suited for relatively static environments. But applications that do a lot of dynamic code-loading or Web accessing would probably still be coded in Java, he adds. He says a full-function Fortress compiler will be developed and will include optimisation features that have never existed in a language before.

Like Fortress, Cray's Chapel is a brand-new language. A few alpha users are working with an early Chapel compiler for serial code, but a production-grade compiler for parallel codes is several years away, according to Chief Technology Officer Steve Scott. He says Cray is also developing debugging and performance-analysis tools that, unlike existing tools, will be able to scale up to systems with 1 million processors.

Scott says Chapel will be well suited for machines with low communications overhead, globally addressable memory and many possible parallel threads of execution. He says the most important advance in Chapel is its separation of algorithm specification from machine-dependent structural considerations. That makes it possible for programmers first to code and debug algorithms in relatively simple programs, then later specify how the data is to be laid out in the machine for the most efficient access.
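
Chapel's own syntax for this isn't detailed in the article; purely as a conceptual analogy in Java, with invented names that are not Chapel or Cray APIs, the separation Scott describes amounts to writing the algorithm against an abstract view of the data and choosing the physical layout separately:

    // Conceptual sketch only: the names below are invented for illustration.
    interface Layout {
        double get(int i);             // how element i is fetched is a layout decision
        void set(int i, double v);
        int size();
    }

    // One possible layout: a plain local array. A blocked or distributed layout
    // could be swapped in later without touching the algorithm below.
    class LocalArray implements Layout {
        private final double[] a;
        LocalArray(int n) { a = new double[n]; }
        public double get(int i) { return a[i]; }
        public void set(int i, double v) { a[i] = v; }
        public int size() { return a.length; }
    }

    class Demo {
        // The algorithm is written once, with no knowledge of where the data lives.
        static void scale(Layout data, double factor) {
            for (int i = 0; i < data.size(); i++) data.set(i, data.get(i) * factor);
        }

        public static void main(String[] args) {
            Layout data = new LocalArray(8);
            for (int i = 0; i < data.size(); i++) data.set(i, i);
            scale(data, 2.0);                // debug the algorithm on a simple layout first
            System.out.println(data.get(3)); // prints 6.0
        }
    }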

IBM's entry, code-named X10, is a parallel, distributed, object-oriented language developed as an extension of Java. It is designed for systems built out of multicore symmetric multiprocessing chips -- such as IBM's Power processors -- interconnected in scalable cluster configurations.

X10 takes the advantages of object orientation in Java for serial code and adds language constructs for parallel and distributed processing, says Vijay Saraswat, a researcher at IBM. The early version of X10 simply translates X10 code into Java, but a full-function optimising compiler will be available to meet DARPA's 2010 deadline, he says.
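
The article doesn't spell out those constructs. As a hedged point of comparison, the kind of task decomposition they are meant to express concisely takes noticeably more scaffolding in plain Java's standard concurrency library; the pool size and chunk split below are arbitrary choices made for illustration:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    public class ParallelSum {
        public static void main(String[] args) throws Exception {
            double[] data = new double[1_000_000];
            java.util.Arrays.fill(data, 1.0);

            int chunks = 4; // arbitrary split for illustration
            ExecutorService pool = Executors.newFixedThreadPool(chunks);
            List<Future<Double>> parts = new ArrayList<>();

            int step = data.length / chunks;
            for (int c = 0; c < chunks; c++) {
                final int lo = c * step;
                final int hi = (c == chunks - 1) ? data.length : lo + step;
                // Each chunk is summed in its own task; this fork/join scaffolding
                // is the sort of thing the new languages' constructs abstract away.
                parts.add(pool.submit(() -> {
                    double s = 0;
                    for (int i = lo; i < hi; i++) s += data[i];
                    return s;
                }));
            }

            double total = 0;
            for (Future<Double> f : parts) total += f.get();
            pool.shutdown();
            System.out.println(total); // prints 1000000.0
        }
    }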

DARPA says it has "no plan" to pick a winner among the new languages, but it clearly hopes that at least one of them will be a commercial success. And, as multicore processor chips become ubiquitous, that would be a welcome outcome, says Mellor-Crummey.

"What we are seeing," he says, "is not a gradual shift but a cataclysmic shift from the sequential world to one in which every processor is parallel. In a small number of years, if your language does not support parallelism, that language will just whither and die."

Never enough: supercomputer users want more

Richard Barrett, a computer scientist at Oak Ridge National Laboratory in Tennessee, is trying out Chapel, Fortress and X10. He says the promised development productivity will be welcome, but "runtime performance is a concern." He notes that execution efficiency is the ultimate goal in his lab, where the motto is, "Bigger, better, faster, more -- and even that is not enough."

"The applications I'm familiar with will be used for several years, or even decades," Barrett explains. "A scientist who has an idea will... run a set of experiments, each of which may consume days, weeks or months of computer time when using the most powerful machines in the world. The few machines with these capabilities are quite popular, so an experiment may sit in the queue for hours, days or even weeks. So, easy-to-write code that runs significantly slower than harder-to-write code is not acceptable."