vol 7, issue 1

Processors go slow to save energy

23 January 2012
By Chris Edwards
Chip designers now need to waste more transistors on processors that are busy doing nothing

Warren East, ARM: "Does the software care about which processor it runs on? Mostly, no."

Sloth style: smart processors appear workshy until they are called upon to get busy

In the Nvidia Tegra 3 processor, the low-energy ARM core sits apart in its own separately powered section

Intel CTO Justin Rattner presents the ‘near-threshold’ processor

The IBM Cell architecture was one of the first consumer-grade processors to exploit parallel processing

Processors are working too hard and draining the battery too fast, so could they learn some lessons in energy conservation from the sagacious sloth?

Ask about the best way for power-hungry microprocessors to save energy, and one probably wouldn't expect to hear the answer: "Use more energy". That, however, is happening to the mobile phones, tablets and computers now being designed: every spare square millimetre of space is being filled up with computer processors.

There is a caveat, though: they will never all be employed at the same time. For the IT industry has a big problem. Software is consuming available compute power almost as fast as it can be developed, all in the name of giving us machines that can give us answers when we need them and, increasingly, before we actually pose the question.

Given that chipmakers can double the number of transistors on a device every couple of years, on the face of it this does not seem a big problem; but the batteries inside mobile devices cannot keep up. Avner Goren, director of strategic marketing at Texas Instruments, says companies want to make the most of the 'always on' nature of the phone to provide more services on the move "but we are still constrained by the limitations of the battery: if you look at cellphones and tablets, the battery life is really disappointing".

Power has become such a critical issue that Intel has changed the shape of the transistors it intends to use on its forthcoming 22nm process in an attempt to deal with the problem. Although the FinFET was conceived at the University of California at Berkeley more than a decade ago, only recently have the reasons emerged to actually put the more complex structure into production.

Serge Biesemans, vice president of technology at Belgian research institute IMEC, says: "Finally, the FinFET is here – and it's likely to be the mainstream device adopted by the major manufacturers. The one key reason to use it is for leakage."

Strained silicon

All transistors leak current when they are supposed to be turned off, but ten years ago you could more or less neglect it as a source of power drain unless you were designing specialist low-energy products such as utility meters, where even tiny amounts of leakage are critical because these systems are expected to run for ten years on the same battery.

The CMOS logic used in practically all microprocessors relies on the ability to charge and discharge the capacitance in the circuitry to pass logic signals from one gate to the next. Doing this quickly demands that the device has a high drive-current to deliver the required electrical charge. One way to achieve this is through 'strained silicon'.

Says IMEC's Biesemans: "Before 2000, strain was a negative thing: we had to reduce strain." The technique was not needed for speed, as simply making transistors shorter made them faster. Strain simply reduced reliability – but once the transistor channel was shorter than 100nm, the speed increases from further reductions in length fell away. "Since 2002 the call for more strain came in," Biesemans adds.

Strain by itself is not enough to hit the cycle times needed in higher-speed processors. The other knob that process engineers can tweak is the threshold voltage at which the device switches, which they reduce by boosting the number of dopant atoms in the strained channel. There are side-effects, however. As the size of the transistor and the threshold voltage fall, the proportion of energy lost to current leaking into the silicon substrate shoots up, even when the device is meant to be turned off. The team behind the International Technology Roadmap for Semiconductors, which maps out the progress of silicon into the future, estimates that leakage has increased 10,000-fold since 1990. Reducing the threshold voltage to improve speed worsens the situation exponentially.
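To give a feel for that exponential relationship, here is a minimal sketch using the textbook subthreshold approximation, in which off-state current rises by a factor of ten for every fixed step down in threshold voltage. The reference threshold (400mV) and the swing (90mV per decade) are illustrative assumptions, not figures from the ITRS or from any vendor's process.

```python
def relative_leakage(vth_mv, ref_vth_mv=400.0, swing_mv_per_decade=90.0):
    """Off-state leakage relative to a reference transistor.

    Uses the common approximation I_off ~ 10 ** (-Vth / S), where S is the
    subthreshold swing. The default values are assumptions for illustration.
    """
    return 10 ** ((ref_vth_mv - vth_mv) / swing_mv_per_decade)


if __name__ == "__main__":
    # Each 90 mV cut in threshold voltage multiplies leakage by ten
    # under the assumed swing.
    for vth in (400, 310, 220):
        print(f"Vth = {vth} mV: {relative_leakage(vth):.0f}x reference leakage")
```

Under these assumptions, shaving less than a fifth of a volt off the threshold raises leakage a hundredfold, which is why designers are reluctant to use fast, low-threshold transistors where they are not needed.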

The FinFET and competing technologies such as silicon-on-insulator structures help, but they cannot reduce the leakage to zero: even Intel's move is just a sticking plaster to partially stem the flow. The good news is the power needed to switch between logic levels does keep falling. Engineers achieved this through a combination of reductions in the supply voltage, and through the lower capacitance of the smaller circuits.
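The effect of lower supply voltage and lower capacitance on switching power can be sketched with the standard first-order CMOS relation, P = a·C·V²·f, where a is the fraction of the circuit that switches each cycle. The function names and the 0.7 scaling factors below are assumptions chosen for illustration; they are not measurements of any real process.

```python
def dynamic_power_watts(capacitance_farads, supply_volts, frequency_hz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * capacitance_farads * supply_volts ** 2 * frequency_hz


def scaling_ratio(cap_scale, volt_scale, freq_scale=1.0):
    """Switching power after scaling, as a fraction of the original."""
    return cap_scale * volt_scale ** 2 * freq_scale


if __name__ == "__main__":
    # Assumed example: capacitance and supply voltage both shrink to 70 per cent
    # of their previous values while the clock frequency stays the same.
    print(f"Power ratio: {scaling_ratio(0.7, 0.7):.3f}")
```

Because voltage enters as a square, a modest cut in supply voltage does more for switching power than the same proportional cut in capacitance.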

A further factor is the huge number of additional transistors added with each new process. That increases the degrees of freedom that chip-design engineers have. Tom Cronk, executive vice president of ARM's processor division, explains: "Moving down in process geometry and new processor architectures help a bit but the consumer is demanding more than the combination of those things."

Sloth strategy

The answer the industry has found is the so-called 'Sloth strategy': develop a new generation of slow-moving processors that spend most of their time idle. It is the same strategy that the makers of hearing aids and energy meters adopted years ago.

If you want to cut leakage, there is only one reliable way to do it: cut the power to the circuit. If you look at the display on a smart meter, it will seem to be running continually; but, for 99 per cent of its life, the processor inside is 'asleep'. It wakes up for a matter of milliseconds, perhaps every few tenths of a second, takes a measurement and then promptly goes back to sleep, telling the power circuitry to stop powering it for a while.

This slashes power consumption to a fraction of what it would be if the processor just idled, running a loop of instructions that do nothing while it waited for work. Generally, the microcontrollers in meters are designed so that even when they run code, they do so slowly, using high-threshold-voltage transistors. They consume far less energy than the same algorithms running on an Intel-based desktop machine. The PC's processor will use power-hungry low-threshold devices.
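The arithmetic behind duty cycling is simple enough to sketch. The average power is the active power weighted by the fraction of time spent awake, plus the sleep power weighted by the rest. The wattages and timings below are made-up values for illustration, chosen so that the processor is asleep 99 per cent of the time as in the meter example.

```python
def average_power_uw(active_uw, sleep_uw, awake_ms, period_ms):
    """Average power of a processor that wakes for awake_ms every period_ms."""
    if not 0 <= awake_ms <= period_ms:
        raise ValueError("awake_ms must lie between 0 and period_ms")
    duty = awake_ms / period_ms
    return duty * active_uw + (1.0 - duty) * sleep_uw


if __name__ == "__main__":
    # Assumed figures: 3000 uW when running, 1 uW asleep,
    # awake for 3 ms in every 300 ms (a 1 per cent duty cycle).
    always_on = average_power_uw(3000.0, 1.0, 300.0, 300.0)
    duty_cycled = average_power_uw(3000.0, 1.0, 3.0, 300.0)
    print(f"Always on: {always_on:.2f} uW, duty-cycled: {duty_cycled:.2f} uW")
```

With these assumed numbers the sleeping design draws roughly a hundredth of the power of one that idles in a loop, and the sleep-state leakage becomes a visible share of what remains.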

Designers of mobile phones and tablets, however, face a dilemma. They want the devices to be able to run games and video that, if run on a single processor core, demand the use of fast, leaky transistors. Texas Instruments' Goren says: "One step in the right direction was the introduction of multicore processing." Marco Cornero, a fellow in ST-Ericsson's advanced computing group, agrees. At the recent Multicore Challenge II, held in Bristol, UK, he quantified some of the power savings that his team has achieved so far by splitting the mobile phone's workload across more than one processor. The team demonstrated that, on average, a 600MHz dual-core system consumed half the energy of a system based on a single 1GHz processor. This is despite the dual-core design having more than double the number of transistors; the saving stems from the exponential relationship between threshold voltage and leakage.

When that level of performance is no longer needed, firmware can turn off the extra processors – and it is possible to go further. Goren points out that a problem for typical mobile devices is that they "are always running tasks in the background. We tried to use more dedicated hardware units to perform some of those tasks, but even then we still had the fundamental issue of what to do with the main CPU".

If all the remaining processor is doing is simple housekeeping functions, it does not have to run close to the clock speed needed for video processing. Some researchers proposed using a variety of electrical techniques to shift the threshold voltage of transistors dynamically, putting the processor into slothful mode for quiet moments. These are tricky and expensive to put into action. It has turned out to be much more straightforward to simply add yet another processor.

Big-little processing

Processor vendors ARM and Nvidia have opted to use this approach: high-speed processors are only called upon to run code if more energy-efficient processors run out of oomph. In the autumn, graphics chipmaker Nvidia unveiled the Kal-El architecture that it developed for the Tegra 3 system-on-chip (SoC). ARM followed weeks later with what it calls the 'big-little' processing model, based around its Cortex-A7 and Cortex-A15 processors.

In both cases, even the operating system does not see which type of processor is being used at any one time. Instead a virtualisation layer works with hardware sensors to determine which processors need to be running. "Does the software care about which processor it runs on? Mostly no," argues Warren East, president of ARM.

The makers of chips for handsets have found, to their cost in the past, that software writers are slow to keep up with developments in the hardware. Processors that should be more power-efficient turn into battery munchers in commercial handsets simply because their low-power modes are not put into action properly.

Alasdair Rawsthorne of the University of Manchester claims hardware is now easier to modify than software. "We always expect hardware to be more flexible than software. Banks, supermarkets and airlines all use software that was architected decades ago but it all runs on much newer hardware," he says.

ARM and Nvidia aim to avoid the problems of this software 'fossilisation' by not requiring operating systems and programs to explicitly decide where their functions should run. ST-Ericsson's Cornero argues that the trend for the future is that software writers will focus on the function of the application source code. Layers of hardware and firmware underneath will determine at runtime how to optimise its performance.

However, the phone SoC makers reckon they will eventually be able to improve power consumption by giving the software writers a little more control. "We have a very good idea of how to partition tasks and assign them to the right processor cores, and once the system architecture is defined I believe there is still an opportunity to optimise further to find the right operating mode," admits Jae Cheol Son, vice president of the SoC platform development team at electronics giant Samsung.

The Nvidia Kal-El processor uses five ARM Cortex-A9 processors, with one of them tuned using high-threshold transistors so that it runs more slowly and consumes less power. This is the core that is used for basic background tasks. As soon as activity increases beyond the capability of that slower processor, one or more of the others start up. For its big-little programme, ARM decided to develop a new processor core rather than retrofit its existing Cortex-A9 designs, launching the Cortex-A7 last autumn.
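The decision described here, staying on the slow core until demand outgrows it and then waking only as many fast cores as needed, can be sketched as a simple policy function. Everything in this sketch is hypothetical: the capacities are invented units, the names `Cluster` and `cores_needed` are illustrative, and the choice to hand work over entirely to the fast cores once they wake is an assumption of the sketch rather than a description of Nvidia's or ARM's firmware.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Cluster:
    """Hypothetical chip: one slow companion core plus several fast cores."""
    companion_capacity: float = 1.0  # work units per tick the slow core can absorb
    fast_capacity: float = 4.0       # work units per tick for each fast core
    fast_cores: int = 4


def cores_needed(load: float, cluster: Cluster) -> Tuple[bool, int]:
    """Return (companion_on, number_of_fast_cores_on) for a given load."""
    if load <= cluster.companion_capacity:
        # Background work only: the slow, low-leakage core copes on its own.
        return (True, 0)
    # Demand exceeds the slow core, so wake just enough fast cores to cover it.
    wanted = math.ceil(load / cluster.fast_capacity)
    return (False, min(cluster.fast_cores, wanted))


if __name__ == "__main__":
    chip = Cluster()
    for load in (0.5, 1.5, 9.0, 100.0):
        print(load, cores_needed(load, chip))
```

A real governor would also need hysteresis so that cores are not switched on and off on every small fluctuation in load; that is left out here to keep the policy itself visible.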

The A7 is designed to act as the shadow for the faster A15. The A7's simpler instruction pipeline coupled with some optimisations learned from experience with the older A8 and A9 processors means it uses less energy per instruction than the larger A15 core.

Says Samsung's Son: "The beauty of the A7 is that it is the default processor. The A7 will be the processor that handles most of the software tasks." In response to the argument that the die space allocated to the larger A15 core is going to waste, Son replies: "If we don't have the A15, we can't answer the demand for an increase in performance. It's better to have both."


Speed versus voltage?

Intel chief technology officer Justin Rattner demonstrated a technique at the Intel Developer Forum in the autumn of 2011 that could extend the idea of the slothful processor far further: the 'near-threshold' processor.

Intel is far from being the first to experiment with the technique. Engineers have known for years that transistors pass measurable amounts of current close to, and even below, the critical threshold voltage. UK-based Toumaz Technology has used the approach in its low-energy medical chips. However, leading circuit-design researchers such as Professor Mark Horowitz of Stanford University have expressed scepticism about it being a useful technique because of one important drawback: circuits built using sub- or near-threshold transistors are extremely slow. Given that they have very low drive-current, it takes a long time to build up a charge that can be used by successive logic stages.

Engineer Ram Krishnamurthy, who was part of an Intel team that designed cryptographic circuits using near-threshold voltage transistors, explained at the Design Automation Conference last year: "There are also problems with noise, temperature effects and ageing."

Common circuit design techniques used to improve speed, such as stacking transistors on top of each other in a gate so that multiple inputs can be processed in one hit, cannot be used in near-threshold devices because the stack causes the already low voltage to drop to dangerous levels. The workaround in the case of wide multiplexers is to break them into a series of narrower gates. "Clearly you are adding more transistors to the logic path so things slow down," said Krishnamurthy, "but it allowed the AES engine to cope with a wider spectrum of voltages."
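The multiplexer workaround can be illustrated at the logic level. The sketch below builds a wide multiplexer out of successive stages of two-input multiplexers, so that no single gate has to handle more than two inputs. The choice of two-input stages is an assumption made for illustration; the article does not say how narrow Intel's gates were.

```python
def mux2(a, b, sel):
    """Two-input multiplexer: returns b when sel is set, otherwise a."""
    return b if sel else a


def mux_tree(inputs, select_bits):
    """Pick one of 2**n inputs using n stages of two-input multiplexers.

    select_bits[0] is the least significant bit of the input index.
    """
    level = list(inputs)
    if len(level) != 2 ** len(select_bits):
        raise ValueError("need exactly 2**n inputs for n select bits")
    for sel in select_bits:
        # Each stage halves the number of candidate signals.
        level = [mux2(level[i], level[i + 1], sel) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    signals = ["d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7"]
    # Index 5 is binary 101, given least significant bit first.
    print(mux_tree(signals, [1, 0, 1]))
```

An eight-input selection now passes through three gates in series rather than one, which is the slowdown Krishnamurthy describes: more transistors in the logic path in exchange for gates that still work at low voltage.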

As a result, near-threshold devices turn out to be even slower than an initial analysis suggests, which is why they have rarely been used in practice. As slow-running background processors in chips with transistors to spare, these devices could find a niche alongside much faster logic; and, using extensive parallelism, the AES engine could achieve reasonable speeds at low power, achieving a performance versus power level of 2.2Tb/W.

ARM is also working on near-threshold architectures in collaboration with researchers at the University of Michigan at Ann Arbor. The latter has warned that it is not always the answer to low-energy performance. In one experiment, a group led by David Blaauw found that increasing the voltage in cache memories could result in lower power because it would allow two processors to share access to the same array, dramatically halving the number of memory cells that need to be powered during operation.

Defining the 'fin'

The term FinFET was coined to describe a nonplanar, double-gate transistor built on an SOI substrate, based on the earlier DELTA (single-gate) transistor design. The distinguishing characteristic of the FinFET is that the conducting channel is wrapped by a silicon 'fin', which forms the body of the device. The thickness of the fin (measured in the direction from source to drain) determines the effective channel length of the device. The term FinFET now has a less precise definition. AMD, IBM, and Motorola describe double-gate development efforts as FinFET development; Intel avoids using the term to describe its closely related tri-gate architecture.
