Windows 7 quad core speed tests

Do multi-core chips really make a difference?


Microsoft's Windows 7 operating system is receiving raves in its pre-release testing. While much of the kernel that lies at the heart of the operating system is based on Vista code, several key advances have been made that get rid of Vista annoyances and greatly improve the user experience. Inside the kernel, one important change centres on how multithreaded applications are run. The threading advances provide benefits in energy reduction, scalability, and, in theory, performance.

To check out the benefits on the desktop, I ran tests that reflect the most common use case for heavily threaded desktop apps, namely, graphics-oriented software. Programs such as Adobe Photoshop and other graphical applications query a system's capabilities at startup and self-configure workloads accordingly. They typically use all the processor cores and as much RAM as they can get away with monopolising, which enables them to provide the fastest performance. So I checked how such programs perform using Viewperf (an omnibus graphics benchmark from SPEC, the Standard Performance Evaluation Corporation) and Cinebench, a pure rendering benchmark from Maxon Computer. Both are available at no cost, so you can download and run them on your own systems to see how your mileage varies.
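As a rough illustration of the startup self-configuration described above, the sketch below sizes a worker pool from the number of logical processors the operating system reports. The names `plan_workers`, `reserve`, `render_tile`, and `render_all` are hypothetical, and the policy is an assumption made for illustration; it is not how Photoshop or either benchmark actually configures itself.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def plan_workers(logical_cpus=None, reserve=0):
    """Pick a worker count from the logical CPU count (hypothetical policy)."""
    if logical_cpus is None:
        # os.cpu_count() may return None if the count cannot be determined.
        logical_cpus = os.cpu_count() or 1
    # Always keep at least one worker, even if reserve >= logical_cpus.
    return max(1, logical_cpus - reserve)


def render_tile(tile_id):
    """Stand-in for one unit of graphics work (illustrative only)."""
    return tile_id * tile_id


def render_all(tile_ids, workers):
    """Hand every tile to a pool that was sized once, at startup."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_tile, tile_ids))


if __name__ == "__main__":
    workers = plan_workers()
    print(workers, render_all(range(8), workers))
```

On a quad-core processor with SMT enabled, the operating system reports eight logical processors, so a program following this kind of policy would start eight workers.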

I ran the benchmarks on a Dell Precision T3500 workstation. The T3500 is an entry-level workstation that represents the kind of system that high-end graphics users who work on large images or complex projects are likely to employ. It sports a quad-core Xeon W3540 (Nehalem) processor running at 2.93GHz, an Nvidia Quadro FX 4800 graphics card, and 4GB of RAM. I expect that 12 to 18 months from now, its capabilities will represent the high end of the desktop (that is, subworkstation) market.

For this review, we used three identical hard drives, which Dell preloaded with Windows XP Professional, Vista Ultimate, and Windows 7 Ultimate respectively. All three were the latest 32-bit versions, with the latest drivers the company makes available. We then ran the benchmarks on each OS, swapping in the next disk when we were done with the previous operating system. This approach allowed us to see what benefits each version of Windows provided when run on identical hardware. The results for performance appear in the table below.

Performance benchmark results for three versions of Windows

Benchmarks (bigger is better) | Windows XP SP3 | Windows Vista SP2 | Windows 7 Ultimate
SPEC Viewperf 10 (SMT off)
SPEC Viewperf 10 (SMT on)
Cinebench 10 (SMT off)
Cinebench 10 (SMT on)

These results suggest that when considering Windows 7, performance should be viewed as a reasonable justification for upgrading from Windows XP, but not a driver for migration from Vista. The flat performance results against Vista are reasonable given that, as we noted earlier, Windows 7 is based on the Vista kernel.

What might be surprising is that Windows 7's multithreading changes did not deliver more of a performance punch. The explanation for this lies in what changed in how Windows 7 manages threads. The principal changes consist of increased processor affinity and changes to the Windows kernel dispatcher lock. This eye-glazing term refers to a core aspect of modern operating systems: how the kernel prevents two threads from accessing the same data or resource at the same time.

Anytime a thread wants to access an item that might be claimed by another thread, it must use a lock to make sure that only one thread at a time can modify the item. Prior to Windows 7, when a thread needed to get or access a lock, its request had to go through a global locking mechanism. This mechanism, the kernel dispatcher lock, would handle the requests. Because it was unique and global, it handled potentially thousands of requests from all processors on which Windows ran. As a result, this dispatcher lock was becoming a major bottleneck. In fact, it was a principal gating factor that kept Windows Server from running on more than 64 processors.

New locking mechanism

Windows 7 includes a wholly new mechanism that gets rid of the global locking concept and pushes the management of lock access down to the locked resources. This permits Windows 7 to scale up to 256 processors without performance penalty. On systems with only a few processors, however, the old kernel dispatcher lock was not overburdened, so this new mechanism provides no noticeable improvement in threading performance on desktops and small servers.
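The difference between the two designs can be sketched in ordinary application code. The example below is only an analogy and not Windows kernel code: `GlobalLockTable` funnels every access through one shared lock, while `PerResourceLockTable` gives each resource its own lock so that threads contend only when they touch the same item. All class and function names are hypothetical. In CPython the interpreter's own global lock means this shows the structure of the two approaches rather than any speed difference.

```python
import threading


class GlobalLockTable:
    """Every resource is guarded by one shared lock (the global style)."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._values = {name: 0 for name in names}

    def increment(self, name):
        # All threads queue on the same lock, whichever resource they want.
        with self._lock:
            self._values[name] += 1

    def value(self, name):
        with self._lock:
            return self._values[name]


class PerResourceLockTable:
    """Each resource carries its own lock (the finer-grained style)."""

    def __init__(self, names):
        self._locks = {name: threading.Lock() for name in names}
        self._values = {name: 0 for name in names}

    def increment(self, name):
        # Threads contend only when they touch the same resource.
        with self._locks[name]:
            self._values[name] += 1

    def value(self, name):
        with self._locks[name]:
            return self._values[name]


def hammer(table, names, threads_per_name=4, rounds=1000):
    """Run several threads against each resource and wait for them all."""
    def work(name):
        for _ in range(rounds):
            table.increment(name)

    threads = [threading.Thread(target=work, args=(n,))
               for n in names for _ in range(threads_per_name)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {n: table.value(n) for n in names}


if __name__ == "__main__":
    names = ["timer", "event", "mutex"]
    print(hammer(GlobalLockTable(names), names))
    print(hammer(PerResourceLockTable(names), names))
```

Both tables produce the same counts. What changes is how many threads wait on any single lock, which is why the finer-grained design matters mainly when the number of processors is large.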

The improved processor affinity does not show up in the performance results. On runs with SMT disabled, this was expected: the benchmarks use all available resources, leaving no idle cores for Turbo Mode (the Nehalem feature that raises the clock speed of busy cores when others are idle) to exploit. When we ran the four-thread Viewperf benchmark with SMT enabled (giving the benchmark eight processing pipelines), the results were essentially unchanged. The differences were immaterial, which suggests that Turbo Mode works best in narrowly constrained settings rather than in typical threaded applications like those we tested. Despite several requests, Microsoft would not comment on these results.

The Cinebench score is a ratio that measures how much faster the benchmark runs with multiple threads than with one thread. It is a true measure of how threading scales, as measured by rendering performance. Cinebench showed negligible differences in performance across the three operating systems, both with SMT disabled and with SMT enabled. However, unlike with Viewperf, the results for all three versions of Windows were distinctly better with SMT enabled: Cinebench rendering ran nearly 20 percent faster on eight threads (SMT on) than on four (SMT off), regardless of the version of Windows. This divergence between the two benchmarks regarding SMT's benefit underscores the need for testing its effect on your existing applications before deciding whether to enable it.
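The two quantities discussed above, the multithreaded scaling ratio and the relative gain from enabling SMT, are simple to compute from your own runs. The sketch below assumes you have recorded the rendering scores yourself. The function names are hypothetical, and the numbers in the usage section are placeholders chosen for illustration, not measurements from this review.

```python
def scaling_ratio(single_thread_score, multi_thread_score):
    """How many times faster the multithreaded run is than the one-thread run."""
    if single_thread_score <= 0:
        raise ValueError("single-thread score must be positive")
    return multi_thread_score / single_thread_score


def smt_gain_percent(score_smt_off, score_smt_on):
    """Percentage improvement of the SMT-on score over the SMT-off score."""
    if score_smt_off <= 0:
        raise ValueError("SMT-off score must be positive")
    return 100.0 * (score_smt_on - score_smt_off) / score_smt_off


if __name__ == "__main__":
    # Placeholder scores, NOT results from the article.
    print(scaling_ratio(1000.0, 3800.0))
    print(smt_gain_percent(100.0, 120.0))
```

Running the same calculation on each of your own applications is one way to decide whether SMT is worth enabling for them.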

Energy consumption

Windows 7 performs several tricks to keep threads running on the same execution pipelines so that the underlying Nehalem processor can turn off transistors on lesser-used or inactive pipelines. The primary benefit of this feature is reduced energy consumption. To quantify this benefit, I ran the four-thread version of Viewperf with SMT enabled. This configuration meant that roughly half the pipelines would see little or no activity. I expected, therefore, to see some power savings for Windows 7. My results appear below.

Watts consumed at three points in Viewperf benchmark

Viewperf energy consumption | Windows XP SP3 | Vista Ultimate SP2 | Windows 7 Ultimate
Watts (average of three test points) | 247 W | 248 W | 207 W

The Windows 7 advantage is indeed significant. Note that this decrease in power consumption of roughly 16 percent is for the exact same software running unchanged on the same machine. Only the versions of Windows are different. That's a substantial saving, and there is every reason to believe that other software will similarly benefit from Windows 7's ability to leverage Intel's processor magic.
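The size of the saving can be checked directly from the wattage figures in the table. The short sketch below uses only those three readings; the function name `percent_decrease` and the dictionary layout are illustrative choices.

```python
def percent_decrease(before_watts, after_watts):
    """Percentage drop in power draw going from 'before' to 'after'."""
    if before_watts <= 0:
        raise ValueError("baseline wattage must be positive")
    return 100.0 * (before_watts - after_watts) / before_watts


# Average watts from the Viewperf energy table.
READINGS = {
    "Windows XP SP3": 247,
    "Vista Ultimate SP2": 248,
    "Windows 7 Ultimate": 207,
}

if __name__ == "__main__":
    win7 = READINGS["Windows 7 Ultimate"]
    for name in ("Windows XP SP3", "Vista Ultimate SP2"):
        saving = percent_decrease(READINGS[name], win7)
        print(f"Windows 7 vs {name}: {saving:.1f}% less power")
```

Measured against either older version of Windows, the reduction works out to a little over 16 percent.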

Wrapping it all up

Tight integration between Intel processors and Microsoft operating systems has been a constant thread in the history and evolution of the PC. This linkage has been dubbed by some a virtuous circle, although not every iteration of the cycle has produced substantial end-user benefits. This time, though, the cycle indeed delivers key advantages: Nehalems are much more powerful than their predecessors, and they provide, as we have seen, considerable energy savings when teamed up with an OS that leverages them effectively. Among Microsoft offerings, Windows 7 is the software that does this best.