Cramming in core values
Does packing more power onto enterprise CPUs help your apps run any faster?
Chips are changing
The general-purpose CPU is undergoing its first revolution since the advent of the silicon chip in the 1970s, by embracing multiple cores operating in parallel. Many earlier attempts at a revolution, such as SIMD (single instruction, multiple data), failed to make an impact in mainstream computing; but this time there is no turning back, for there is no other way of satisfying the continuing thirst for ever greater performance.
Bluntly, it is no longer feasible to continue increasing clock speeds at a time when escalating energy costs are conspiring with environmental concerns to make the greater power consumption unacceptable. What's more, "it is not possible to clock a single processor at 4 or 5GHz", says Giuseppe Amato, EMEAI marketing director of the value proposition group at AMD, one of the big makers of x86 processors for the Microsoft world. "The industry realised enough was enough."
Since the launch of dual core processors for x86 in 2006, the trend in clock speeds has been gently downwards, for the first time in the history of the silicon chip. As Amato notes, this enables the voltage to be reduced, cutting power leakage, saving energy. But for the moment, one familiar aspect of chip evolution will not change - transistor density will continue to increase, which means that Moore's Law will still be obeyed.
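The trade-off behind slower clocks can be put into rough numbers using the standard approximation that a chip's dynamic (switching) power is proportional to the square of the supply voltage multiplied by the clock frequency. The sketch below is illustrative only: the 15 per cent voltage cut and 20 per cent clock cut are hypothetical figures, not AMD's, and it ignores leakage and assumes work divides perfectly between two cores.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to a baseline core, using P proportional to V^2 * f."""
    return voltage_scale ** 2 * frequency_scale


# Hypothetical scaling factors, chosen only for illustration.
per_core = relative_dynamic_power(voltage_scale=0.85, frequency_scale=0.80)

dual_core_power = 2 * per_core      # two slower cores on one die
dual_core_throughput = 2 * 0.80     # ideal case: the workload splits perfectly

print(f"power per core:       {per_core:.3f} of baseline")
print(f"dual-core power:      {dual_core_power:.3f} of baseline")
print(f"dual-core throughput: {dual_core_throughput:.2f} of baseline")
```

Under these assumptions each slower core draws a little under 60 per cent of the original power, so a pair of them costs only modestly more energy than one fast core while offering more potential throughput, provided the software can actually use both.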
During 2007, for example, major chip makers began to migrate from 65nm to 45nm processes, a shrink that in the ideal case roughly doubles the number of transistors that fit in a given area. The continuing transistor density increase is no longer being exploited to build faster single processors, but to install multiple cores on a single chip or die.
At some point an even more radical revolution will be required to prevent Moore's Law being broken, for the more fundamental laws of physics will prevent transistor sizes sinking much closer to the size of a silicon atom, which is about 0.25nm across, only 180 times smaller than the latest process size - but that's another story.
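How much room is left can be estimated from the two figures quoted above. The sketch below assumes, purely for illustration, that each new process generation shrinks linear dimensions to about 0.7 times the previous size; that factor is an assumption, not a figure from the chip makers, and real devices will stop working long before features reach the width of a single atom.

```python
import math

process_nm = 45.0             # latest process size cited in the text
atom_nm = 0.25                # approximate silicon atom size cited in the text
shrink_per_generation = 0.7   # assumption: ~30% linear shrink per generation

# How many times larger the process size is than one atom.
headroom = process_nm / atom_nm

# Number of successive 0.7x shrinks needed to close that gap.
generations_left = math.log(headroom) / math.log(1 / shrink_per_generation)

print(f"headroom: {headroom:.0f}x, generations left: {generations_left:.1f}")
```

On these assumptions the gap closes in fewer than 15 further shrinks, which is an upper bound rather than a forecast.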
There is another factor in the mix: the role of dedicated silicon such as ASICs (application-specific integrated circuits) and FPGAs (field programmable gate arrays).
Use of ASICs was driven by the performance advantages and cost savings that could be achieved through using a design dedicated to a particular task, such as video encoding or data encryption. However, with increasing performance and bandwidth demands, such dedicated processors are becoming a liability, consuming too much power and becoming bottlenecks themselves.
"The penalty for having separate ASICs has become too severe," agrees Vivek Sarkar from Rice University in the US, a pioneer of programming for parallel architectures, who was senior manager of programming technologies at IBM before joining Rice University to research languages and compiler designs for parallel computing in 2007.
Graphics a go-go
As the number of cores increases, multicore processors will absorb ASICs and dedicated silicon, says Sarkar. The idea is that current homogeneous chips with just a few identical cores will evolve into heterogeneous designs including a variety of different core types optimised for specific tasks.
These cores are unlikely to be totally dedicated to highly specific tasks - such as a particular encryption algorithm - as this would constrain their utility too much, and have the effect of rendering part of the chip obsolete or redundant. The objective instead is to have cores that are programmable but optimised for certain categories of task, such as video encoding, that will be required in some capacity on a large number of systems and will not become obsolete since the logic or software can be upgraded.
Such cores will be built to execute particular types of problem, such as vector manipulation or matrix multiplication, that are common to a number of tasks, but that are not needed by most general purpose business applications.
Rather than ASICs, or even FPGAs, such cores would resemble the GPUs (graphics processing units) that provide visual processing for PCs and game consoles, with a highly parallel structure making them more efficient for a range of algorithms manipulating vectors or rows of symbols than general purpose CPUs.
GPUs are already expanding in both directions, taking over some functions, for example in financial modelling, from the general CPU, and performing specialist tasks such as encoding, with leading chip vendors such as AMD planning to incorporate them in their multicores. "We are now working with some leading software vendors to take advantage of GPUs," says Amato.
"Soon you will hear about software that will allow the GPU to do high-definition encoding." Amato reckons consumer software will start to exploit GPUs for tasks such as HD encoding, for example in games applications, by the end of 2008.
Chips have just a few cores, up to four at present, but the number will increase rapidly to reach 1,000 by 2015, according to Sarkar. Already IBM has demonstrated a 256-core design. This proliferation of cores will bring extensive challenges for both bandwidth and programming, which only really emerge as the number of cores starts to exceed current levels. On the bandwidth front, the main problem is that the number of connectors within a chip cannot scale with the number of cores within a two dimensional package, so new approaches are required.
Intel has been promoting PCI Express, which it introduced in 2004 to replace the old PCI expansion bus, adding extensions to make it suitable for multicore; but, ultimately, more radical therapy will be required to transport data between thousands of cores within a chip, with options including optical and RF (radio frequency) interconnects.
Stacked and hacked
IBM has been trying to solve the communication problem by going into the third dimension: stacking cores on top of each other increases the number of interconnects within a given package. In such a chip, all the two dimensional interconnections can still be there, plus additional ones between successive layers. But this introduces a new problem: removing heat from the cores buried within the stack. IBM has tackled this by reintroducing water cooling (first used for mainframes 40 years ago). Whether this will prove economical remains to be seen, although it has the advantage of collecting the heat efficiently, so that energy can be recovered from it.
The even greater challenge, though, lies within the whole software development cycle, from compilers up to high level languages and the re-engineering of legacy applications. Bill Gates observed as much recently when he declared of multicore computing: "This is the one which will have the biggest impact on us - we have never had a problem to solve like this… A breakthrough is needed in how applications are done on multicore devices." This means that, for the first time in Microsoft's history, the software industry is being asked to play an equal part in keeping up with Moore's Law, rather than assuming that hardware will continue to support ever larger, more bloated applications.
"The revolution in hardware technology has got to be followed by an evolution in programming languages," observes John Stewardson, Hewlett-Packard's product marketing manager for multi-processor industry standard servers in EMEA.
According to Gates, Microsoft is now aware of its full responsibilities, and is working with the chip makers to bring about this evolution. AMD is one such partner, and is helping ensure that Windows 7, the sequel to Vista, has much better support for multicores, according to Amato. "We are also working with Microsoft on heterogeneous cores, to ensure that the new operating system can take advantage of different types of central units, and be able to interact better with people," says Amato.
As Amato hints here, there are two aspects to software development for multicore architectures. The first involves splitting applications up into smaller components for execution in parallel across multiple identical processors. On this front there is already a reasonable body of expertise accumulated in the scientific programming arena (and high-performance computing platforms, such as IBM's Blue Gene). Although actual programming languages used in these arenas will not transfer to the commercial sphere, many of the development tools will, so the industry does not have to start from a clean slate.
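The first of these aspects, splitting work across identical cores, follows a familiar pattern: divide the data, process each piece independently, then combine the partial results. The following is a minimal sketch in Python using the standard library's concurrent.futures module; the function names and the sum-of-squares workload are hypothetical stand-ins for real application logic, not anything described by AMD or IBM.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(values, parts):
    """Split a list into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The independent unit of work that each worker runs on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, workers=4, pool_cls=ProcessPoolExecutor):
    """Fan the chunks out to a pool of workers, then combine the partial sums."""
    chunks = split_into_chunks(values, workers)
    with pool_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

With the default process pool, each chunk can run on a separate core. When run as a script, the call should sit under an `if __name__ == "__main__":` guard, which some platforms require. The pattern only pays off when the pieces are genuinely independent and large enough to outweigh the cost of distributing them, which is why much existing commercial code cannot be parallelised this simply.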
But heterogeneous cores add an extra level of complexity because they require an application or process to be partitioned into functional units suited to different types of core, which would be particularly painful when it comes to re-engineering existing software. It has proved hard enough for developers in the mobile and ARM embedded controller sectors, since different processors tend to have their own set of development tools, operating systems, and compilers.
Amato urges the x86 software community to rise to the challenge and develop a common operating system that can operate across heterogeneous multicores and dispatch tasks to any of them, rather than requiring programmers to cope with the different environments.
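What such a dispatcher might do can be sketched in a few lines. The example below is a toy model, not a description of any real operating system scheduler: the core names, task kinds and least-loaded placement rule are all hypothetical, chosen only to show tasks being routed to a core optimised for their category, with a fallback to general-purpose cores.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    kinds: set                                  # categories this core is optimised for
    queue: list = field(default_factory=list)   # names of tasks assigned so far


@dataclass
class Task:
    name: str
    kind: str                                   # e.g. "vector", "encode", "general"


def dispatch(tasks, cores):
    """Send each task to the least-loaded core optimised for its kind,
    falling back to a general-purpose core when no specialist matches."""
    for task in tasks:
        suited = [c for c in cores if task.kind in c.kinds]
        if not suited:
            suited = [c for c in cores if "general" in c.kinds]
        if not suited:
            raise ValueError(f"no core can run {task.name}")
        target = min(suited, key=lambda c: len(c.queue))
        target.queue.append(task.name)
    return {c.name: c.queue for c in cores}


cores = [
    Core("cpu0", {"general"}),
    Core("cpu1", {"general"}),
    Core("gpu0", {"vector", "encode"}),
]
tasks = [
    Task("payroll", "general"),
    Task("hd-encode", "encode"),
    Task("risk-model", "vector"),
    Task("web-request", "general"),
    Task("spell-check", "text"),    # no specialist core: falls back to general
]
placement = dispatch(tasks, cores)
print(placement)
```

The point of the design is that the programmer labels the work rather than choosing the hardware; the routing decision, and any knowledge of which cores exist, stays inside the dispatcher.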
Yet this will still not spare programmers the pain of migrating their own skill sets to multicore, as Rob Gibson, solutions manager at IBM's Industry Systems Division, observes. "Development tools will provide acceptable code, but as with any architecture, programmers must learn new techniques to optimise code for multicore systems."
There is no choice though, for while uniprocessor performance can still improve some more, it will not be at historical rates. "Multicore is a necessity if we are to continue to deliver performance improvements to our customers at the same rate as in the past," Gibson concludes.