GPUs and the need for speed

It’s probably fair to say that the computing community is obsessed with speed. After all, people buy computers to solve problems, and generally the faster the computer, the faster the problem gets solved.

The earliest benchmark I have seen appears in “High-Speed Computing Devices” (Engineering Research Associates, McGraw-Hill, 1950). It cites the Marchant desktop calculator as achieving a best-in-class result of 1,350 digits per minute for addition, and the threshold problems of the day were figuring out how to break down Newton-Raphson equation solvers for maximum computational efficiency. And so the race begins…

Not much has changed since 1950. While our appetites are now expressed in GFLOPS per CPU and TFLOPS per system, users continue to push for escalation of performance in numerically intensive problems. Just as we settled down to a relatively predictable performance model with standard CPUs and cores glued into servers and aggregated into distributed computing architectures of various flavors, along came the notion of attached processors. First appearing in the 1960s and 1970s as attached mainframe vector processors and attached floating-point array processors for minicomputers, attached processors have always had a devoted and vocal minority of supporters within the industry.

My own brush with them was as a developer using a Floating Point Systems array processor attached to a 32-bit minicomputer to speed up a nuclear reactor core power monitoring application. When all was said and done, the 50X performance advantage of the FPS box had shrunk to about 3.5X for the total application. Not bad, but well short of expectations. Subsequent brushes with attempts to integrate DSPs with workstations left me a bit jaundiced about the future of attached processors as general-purpose accelerators.
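That gap between peak and delivered speedup is classic Amdahl's Law: only the fraction of runtime that actually runs on the accelerator gets faster. As a back-of-the-envelope sketch (the 50X and 3.5X figures are from the FPS anecdote above; the ~73% coverage figure is simply what the formula implies, not a measured value):

```python
def overall_speedup(accelerated_fraction, accelerator_speedup):
    """Amdahl's Law: the un-accelerated fraction of runtime limits total gain."""
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / accelerator_speedup)

# Even with a 50X accelerator, covering ~73% of total runtime
# yields only about 3.5X overall -- roughly the FPS result.
print(round(overall_speedup(0.73, 50), 1))  # -> 3.5
```

The lesson generalizes directly to GPUs: a 100X kernel speedup buys little unless the accelerated portion dominates total runtime.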

Fast forward to today. After a discussion with my colleague James Staten and a conversation with a large client in the petrochemical industry, I decided to do some digging into the use of GPUs as application accelerators. Originally designed to apply large numbers of parallel cores to an assortment of graphics rendering operations, these devices began, within the last decade, to attract developers looking to accelerate general-purpose computational problems as well. The last three years have seen significant and, even in the context of an industry where superlatives get worn out over time, startling progress. Major developments include:

  • Multiple generations of devices now optimized for use as computational accelerators, providing hundreds of cores with full floating point computation capability.
  • Architectures that remove many of the data transfer bottlenecks that reduced the effectiveness of earlier designs.
  • Most importantly, programming languages and development tools that bring effective use of GPUs within reach of moderately skilled programmers who understand the problems they wish to solve, as opposed to requiring very skilled specialists in parallel computational algorithms.
  • All of this reinforced by a growing body of user references across multiple industries and applications.

These changes have resulted in increased support from both the ISV and systems hardware communities: selected ISVs now support GPU acceleration, and mainstream server vendors IBM and Dell (can HP be far behind?), along with a roster of smaller specialized vendors such as Bull, Cray, Tyan, Appro and SuperMicro, offer servers with pre-integrated GPUs.

The current market leader is NVIDIA, with its Tesla line of accelerators. AMD, which purchased NVIDIA competitor ATI, has announced its Fusion integrated CPU/GPU architecture, which it calls an Accelerated Processing Unit (APU).

User testimonials abound, with claims of 50X to 150X performance improvements in important applications. Note that these results aren’t instant or free: expect to invest 10X to 20X the time needed to develop the same algorithm in a simple HLL, and expect to refine the solution incrementally, since simulation and analysis tools for these applications and systems are in their infancy.

So, while more and faster cores will continue to be an option, and one that Intel appears to be pursuing aggressively, GPUs present another option in the eternal quest for speed that has consumed our collective consciousness since some nameless craftsman sat down and polished the rods on his abacus to make the beads slide a bit faster.

So please tell us. Are you planning to use or are you interested in using GPU technology? What kind of applications? What kind of platform?

Blog post by Richard Fichera