Comment

We need ‘self-aware’ silicon that can ‘react to variations on the fly’, says Intel researcher

Image credit: Dreamstime

Chips need greater awareness of how they are performing, says Intel researcher.

To make sure they can operate over their expected lifetime and not demand a new battery every month, system-on-chip devices are going to have to be a lot more adaptable. They have to continually adjust their clock speed depending on what they are doing at any point – something microprocessors already do – and take account of how their own circuitry ages over time. At the same time, this adaptability may make them less secure.

In a keynote at the Custom Integrated Circuits Conference (CICC), held online-only for the first time in the wake of the Covid-19 outbreak, Vivek De, director of circuit technology research at Intel, explained how the combination of factors is changing how chips are put together, especially if they go into IoT devices.

The concepts are not new. More than a decade ago, Arm sponsored research into adaptive microprocessor circuits at the University of Michigan. Then as now, one of the approaches was to have processors operate so close to their energy limit that they would make mistakes and be forced to redo calculations to get back on track.

In a digital processor’s pipeline, there is a constant race against time in the calculation circuitry. Each stage of the pipeline has to finish its work before the next clock tick. If that is not the case, then the wrong results will be written to memory and the system will most likely fail. Normally, processors are tested to check that they will meet all their deadlines over their entire clock range. You see this in PC processors where devices are “binned” according to how far their clock rate can be pushed. The phenomena of overclocking and the Turbo Boost ratings you now see are testaments to the additional margins that AMD and Intel put into their testing regime: they tend to assume worst-case temperatures before slapping a clock speed on the label. Turbo Boost is a nod to the fact that a cooler-running processor can often work faster. By cooling further than this, gamers can often get several hundred extra megahertz out of their purchases.

With Intel’s approach, as with Michigan’s Razor in the past, the designers remove the margins completely and use self-test circuitry to determine when the clock rate is too fast. In an IoT device, the problem is not so much temperature as fluctuations in voltage. If the processor does too much at once, the supply voltage will most likely drop. This tends to slow execution down and increases the probability of an error. So, in a system like one of Intel’s experimental processors designed to handle TCP/IP packets, a control loop continually pushes and pulls at the clock rate to keep it on the edge of failure. Occasionally, when it gets too ambitious, the instruction fails and it has to be rerun. If it fails too often, the clock rate gets forced down.
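The push-and-pull behaviour of such a control loop can be sketched in a few lines of Python. This is only an illustration of the idea, not Intel's design: the `ClockController` class, the step sizes, the ten-instruction window and the notion of a "safe" frequency that falls as the supply voltage droops are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ClockController:
    """Toy margin-free clock governor (all parameters are hypothetical)."""
    freq_mhz: float = 400.0
    step_up_mhz: float = 5.0     # small nudge upwards after a clean window
    step_down_mhz: float = 25.0  # larger cut when errors pile up
    max_errors: int = 2          # errors tolerated within one window
    window: int = 10             # instructions per observation window
    errors_in_window: int = 0
    seen_in_window: int = 0

    def _reset(self) -> None:
        self.errors_in_window = 0
        self.seen_in_window = 0

    def record(self, timing_error: bool) -> None:
        """Update the clock after one instruction completes or is replayed."""
        self.seen_in_window += 1
        if timing_error:
            self.errors_in_window += 1
        if self.errors_in_window > self.max_errors:
            # Failing too often: force the clock rate down.
            self.freq_mhz -= self.step_down_mhz
            self._reset()
        elif self.seen_in_window >= self.window:
            # Window finished with few errors: push the clock rate up.
            self.freq_mhz += self.step_up_mhz
            self._reset()


def run(controller: ClockController, safe_mhz_trace: list[float]) -> int:
    """Feed a trace of 'safe' frequencies (assumed to track supply voltage)
    through the controller and return how many instructions were rerun."""
    replays = 0
    for safe_mhz in safe_mhz_trace:
        failed = controller.freq_mhz > safe_mhz  # stage missed its deadline
        if failed:
            replays += 1  # the instruction has to be rerun
        controller.record(failed)
    return replays


if __name__ == "__main__":
    ctrl = ClockController()
    reruns = run(ctrl, [450.0] * 1000)
    print(f"final clock: {ctrl.freq_mhz} MHz, instructions rerun: {reruns}")
```

With a steady "safe" frequency, the clock in this sketch creeps upwards until instructions start to fail, drops back, and then climbs again, so it hovers just around the point of failure rather than sitting a fixed margin below it.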

One side effect of this adaptability is that devices may slow down over time because of the way that stress on the transistors subtly affects how quickly they can switch. De pointed to the need to account for aging effects in future designs and how hardware and software may need to compensate for this. “Degradation changes based on usage,” he said. This may lead to software performing load balancing across multiple cores to try to reduce the stress that any one element suffers or move work away from a core that has previously been used too intensely – much like the way that flash memories perform wear-levelling. “We need systems to be self-aware and react to variations on the fly.”
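As a rough illustration of wear-levelling applied to processor cores, the Python sketch below hands each task to whichever core has accumulated the least stress so far. The function name `assign_tasks` and the use of a single "cost" number per task as a stand-in for stress are assumptions for the example; real degradation depends on factors such as voltage and temperature, which are not modelled here.

```python
import heapq


def assign_tasks(task_costs, num_cores):
    """Greedy wear-levelling: give each task to the least-stressed core.

    task_costs: per-task stress estimates (hypothetical units).
    Returns (placement, stress_by_core), where placement[i] is the core
    chosen for task i.
    """
    # Min-heap of (accumulated stress, core id); ties go to the lower id.
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)

    placement = []
    for cost in task_costs:
        stress, core = heapq.heappop(heap)
        placement.append(core)
        heapq.heappush(heap, (stress + cost, core))

    stress_by_core = [0.0] * num_cores
    for stress, core in heap:
        stress_by_core[core] = stress
    return placement, stress_by_core


if __name__ == "__main__":
    placement, stress = assign_tasks([4, 3, 2, 1], num_cores=2)
    print(placement)  # [0, 1, 1, 0]
    print(stress)     # [5.0, 5.0]
```

A scheduler along these lines trades a little bookkeeping for a more even spread of wear, in the same spirit as a flash controller rotating writes across blocks.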

There is a catch with this adaptability, De warned: “Some techniques that improve energy efficiency and performance can increase the vulnerability to attack.”

The firmware’s reaction to changes may provide hackers with important clues as to what it is doing. That may make it possible for them to extract information that is supposedly secret. Side-channel attacks that monitor electromagnetic emissions or supply-voltage changes are surprisingly effective in pulling secret encryption keys out of devices that do not have effective countermeasures. De described techniques that try to hide this information from hackers and, in some cases, even detect the probes they use to monitor a target chip. These countermeasures push up power consumption and, though it seems it may be possible to activate them only when necessary, there is clearly going to be a balancing act between adaptability and vulnerability. Security may mean systems cannot be as energy- and cost-efficient as you would expect, unless you can pack them away inside a locked safe.
