Want electronics you could drive from a piece of fruit? So does the semiconductor business.
The electronics industry has been obsessed with driving down power consumption for more than a decade, not so much from a green perspective as to stop laptops and phones going flat at critical moments. Admittedly, it's going to be some time before you can drive anything more complex than a watch off the voltage from a decaying piece of fruit, but the emphasis on carbon footprint is lending new impetus to the question: how do we get the power down and how far can we go?
To the world at large, semiconductor design is fundamental to extending the battery life of the portable devices on which so many people depend, particularly since improvements in battery capacity are so slow to emerge. An increase of 4 per cent per decade since the start of the 20th century is practically negligible when you compare it with the reduction in energy per operation that Moore's Law made possible over the past 40 or so years.
However, even before we saw the first GSM handset that could fit into a jacket pocket, chip companies were already key players in the industrial controls and engine management sectors, attacking power on a much grander scale (see 'Saints and sinners', p34).
Silicon, the thinking goes, can provide the intelligence inside all kinds of devices that will not only enable sophisticated applications on a smartphone but also make any kind of hardware use power in the most sophisticated possible way. And that's largely true. But chip design is itself also at something of a power crossroads. The reason why your laptop - which boasts the equivalent power of multiple 1970s mainframes - does not need its own subgenerator is down largely to the way that smaller transistors can run on lower voltages and with much less current. The bad news is that, when it comes to voltage savings, there isn't much left in the bank.
At ARM's recent US developer conference, TechCon3, CTO Mike Muller captured the problem neatly.
"It looks great. Everything's scaling. However, most people miss off the 'power' line. And if you look at the predictions for the power saving as you go from [the] 45nm [process node] to 11nm, it's only expected to be 0.3," said Muller.
"So, I'll do the maths for you. According to ITRS, you've got 16 times as many transistors, going 2.4 times as fast taking 0.3 of the power. And that means that if you've got a fixed power budget today and you don't do anything about it, you'll end up so you can only use 10 per cent of the transistors. You can make them. You can afford them. But you can't actually power them up."
The factors limiting projected power savings are already many and varied. A further problem is that, as each node brings still smaller feature sizes, the number of flaws and failures attributable to different kinds of manufacturing variation is also increasing, and they are sometimes impossible to foresee.
However, there is one issue that perhaps stands above all others, and that is the relationship between the supply voltage, the threshold voltage and - of increasing interest - the subthreshold operation.
Alan Gibbons, principal engineer with leading EDA vendor Synopsys, explains: "As we shrink from one node to another, we want to bring the supply voltage, Vdd, down, but in order to maintain enough gate overdrive, enough performance, we have to bring the threshold voltage, Vt, down as well. That brings in all kinds of problems. You become more susceptible to noise, short channel effects and all kinds of other variation effects. It's not quite as simple as just bringing down Vdd.
"And we've got to get smarter still. Bringing down Vdd brings down Vt but it also means we increase the subthreshold voltage. And as we get to 28nm and 22nm, there is a school of thought - and some maths to back it up - that in devices which are very active for a period of time and then idle for another period, the residual subthreshold leakage will be significant. In certain cases, it could be greater than the dynamic power when the device is actually 'on'."
So, simple power scaling isn't sufficient - and the increasing controversy about how much power hardware consumes in its standby state doesn't look like going away soon either.
The good news, however, is that Synopsys and most other EDA vendors believe that some well-established techniques can still be brought to bear on this and related problems. First, there is power gating - in its simplest sense, this involves powering down blocks of the IC that are not in use. Second, there is clock gating - turning off clocks that are not needed because of their high consumption of dynamic power.
There has been some debate as to whether the end is in sight for these familiar concepts - power gating, for example, has been around more than 20 years - but Gibbons believes that what matters is how directly they address the main problems, not how long they have existed, and that the challenge is to apply them with matching sophistication.
"These are very much as concurrent techniques that will both be with us for many years," he says. "For clock gating, you are dealing directly with the dynamic power dissipation, and with power gating you are dealing directly with the standby power dissipation."
This does not mean that there are not also some other options now maturing or only just becoming available. In his speech, ARM's Muller cited six areas of interest: system architectures, programming methodologies, application-specific accelerators, run-time adaptive power control, 3D and multi-chip modules, and early adoption of new process nodes.
However, these all present significant challenges. Even a technique that is now maturing, dynamic voltage and frequency scaling (DVFS), is seen as facing limits according to how much voltage overhead a design may have, depending on, for example, whether power or performance is the priority.
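The limit on DVFS can be seen in a simple selection routine. The sketch below assumes a table of voltage/frequency operating points and a job with a known cycle count and deadline, and picks the feasible point with the lowest dynamic energy (taken as C x Vdd squared per cycle). The operating points, the capacitance and the function names are hypothetical, and leakage is left out for brevity.

```python
# Assumed (Vdd in volts, frequency in Hz) pairs; not taken from any real part.
OPERATING_POINTS = [(1.0, 1.0e9), (0.9, 0.8e9), (0.8, 0.6e9), (0.7, 0.4e9)]


def dynamic_energy(cycles, vdd, cap_farads=1e-9):
    """Dynamic energy in joules for a job: C * Vdd^2 per cycle."""
    return cap_farads * vdd ** 2 * cycles


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the lowest-energy (Vdd, f) that still meets the deadline.

    Returns None if even the fastest point is too slow.
    """
    feasible = [(vdd, freq) for vdd, freq in points
                if cycles / freq <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda point: dynamic_energy(cycles, point[0]))


# The same 400-million-cycle job under a relaxed and a tight deadline.
for deadline in (1.5, 0.45):
    print(deadline, pick_operating_point(4e8, deadline))
```

With a relaxed deadline the routine can drop to the lowest voltage; with a tight one only the fastest point qualifies and DVFS saves nothing. That is the sense in which the available voltage headroom depends on whether power or performance is the priority.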
From Muller's list, smarter software is becoming available within the power management units (PMUs) of a silicon system that handle dynamic and predetermined gating tasks. Tools are also available that handle more sophisticated clock tree synthesis for today's larger designs. Vendors here include Azuro. But as for the applications themselves, that is a very different story.
The current trend in silicon is towards multicore designs, and translating legacy programs designed for single-threaded silicon is no trivial task. As developers focus on their own efforts as well as hypervisor and other 'middleman' software that aims to optimise the performance of existing code, getting something to run and take modest advantage of the extra performance offered by multicore chips is the priority. Taking that code and making it more power-efficient will have to wait.
And this points to one of the other critical issues: complexity. At the 28/32nm node, estimates for the cost of designing a single chip range from $60m to $75m. Chip designers are notoriously conservative when it comes to adopting new design techniques and methodologies. And the same applies to their customers, particularly if a device might require programming. Cost and time-to-market are as sensitive as the low-power process is challenging.
Geoff Lees, vice president and general manager of NXP Semiconductors' microcontroller division, has experience of both sides of the issue. One of his company's recent drives has been to take 32bit ARM-based MCUs into the sector dominated by lower resolution 8bit devices.
"We had already been offering ARM solutions down at the $1 price level and yet we were still encountering a huge amount of resistance," he says. "You could do a spider chart for what was happening and you would end up with about 10 things on it - tools, ecosystem, and so on - but it really came down to three: price, power and ease-of-use. And while there was a significant number of customers were in the power segment, there was a significantly higher one in the ease-of-use segment."
Since that initial launch, NXP has built a range of products around the still further simplified ARM Cortex M0 core and, in the process of developing them, claims to have achieved a code size that is 30 to 40 per cent lower than that of its 8bit rivals, largely because, for code sizes over 64KB, the 8bit parts are forced to use complex memory-extension techniques such as paging, overlays or segmentation. More efficient code does typically mean better power performance, but for Lees the 'killer' aspect here remains how that improvement addresses the complexity issue.
Lees again cites what his company's primary EDA supplier - in that case, Cadence Design Systems - has been able to do with the core power and clock gating techniques as central to how NXP has addressed low power. "It's much better now than it was even two years ago - there's a lot of mileage in there."
There is a temptation to say that everything old is new again. Familiar design techniques. De-integrating what there was once pressure to integrate. Constraining code. However, well-tried techniques feed into the complexity debate to the further extent that they help both designers and their tool suppliers to make challenging processes invisible.
Synopsys' Gibbons has collaborated with colleagues at ARM to develop a widely adopted 'Low Power Methodology Manual', which has been refined and extended over the last few years. The adoption strategy there does seem to point silicon's way through its low power challenge.
"It is critical that adding power management to your design is as seamless as possible, and that is a massive challenge," he says. "And there are two ways of approaching the job. You can define a methodology for power management that is revolutionary to what designers have had in the past. And it will work. But it won't get widely adopted, it won't be widely accepted, it'll be a real struggle. You need to make low power design evolutionary, to make it almost seamless even though you are adding a constraint to your design is the real challenge we face.
Gibbons concludes: "All the techniques that we're looking at are not really new from a semiconductor physics perspective: power gating, turning blocks off when you don't need them, voltage rails and moving thresholds around in standby - all have been around a while. But the challenge is how do we deploy those techniques in a way that does not impact the user's performance, does not impact his methodology, and does not touch his time-to-complete."