IBM’s vice president of innovation Dr Bernard Meyerson wants to change the future. Not simply the future of IT, but using computers to sidestep traffic gridlock, deadly diseases and energy blackouts.
The thing that might bring it about is, paradoxically, the death of the law that has propelled IT for almost half a century – and a harder look at other ways to speed up computation. It’s a shift that mirrors Meyerson’s own esteemed career; he will expand on the kinds of innovation IT needs in his IET & BCS Turing Lecture tour, which starts at the Royal Institution in London on 24 February 2014.
When he stepped into his current role at IBM in 2009, Meyerson moved from the computer maker’s microelectronics group. He had joined the company in 1980, where he led the development of processes based on silicon germanium, a material now widely used in RF circuits as well as being key to practically all microprocessors made since the middle of the past decade. However, it is the need to incorporate high-performance materials beyond silicon germanium that signals the death knell of Moore’s Law – the observation made by Intel co-founder Gordon Moore that the number of transistors that can economically be put on a chip doubles roughly every two years, halving the cost per function.
“What most people don't know is that Gordon [Moore] himself wrote half the answer,” says Meyerson, referring to Moore's use of observations of trends and the underlying technology of the time to predict that the number of functions on a chip would double every year. “DRAM inventor Bob Dennard wrote the other half: for how you do it.”
The problem, says Meyerson, is that scaling using Dennard’s rules, which the engineer developed in the mid-1970s at IBM, broke down early in the past decade. For about three decades, chip companies were able to follow the path that Dennard scaling set out for them. In general, as long as you were able to reduce key dimensions of the transistor in the correct ratio, you would be rewarded not only with more transistors on the same-sized piece of silicon, but with transistors that switched more quickly and used less power.
Towards the end of the 1990s, companies such as Intel even put their foot on the accelerator in terms of speed. Clock speeds began to rise quickly because improvements in lithography, particularly in reducing the effects of diffraction, made it possible to scale the length of the transistor’s gate more quickly than other dimensions. And it is the gate length that determines transistor switching speed; or at least it was.
“In the late 1990s, I sat down and looked at the trajectory for each element in the transistor,” says Meyerson. “What I discovered was that if you follow Moore’s Law you would incinerate the transistor. This was the beginning of the end for scaling.”
Into the first decade of the millennium, companies found that they could not continue increasing clock speed, and clock rates have stayed flat for about ten years. To overcome the inability to use conventional Dennard scaling, companies such as IBM and Intel added techniques such as ‘strained silicon’, which improves the ability of electrons to move through the transistor and compensates for the lack of benefit from simply making the device smaller. Insulators such as cheap silicon dioxide have given way to exotic high-k dielectrics; and now even the transistor has changed shape, into a finFET, to try to overcome problems introduced by scaling down a conventional planar transistor.
“But you end up using the entire periodic table of elements to make a transistor, and it gets increasingly expensive,” says Meyerson. “When I started in the industry, it cost hundreds of thousands of dollars to make the next generation of technology. For the last generation, 22nm, to get the first transistor to work you had to spend billions… And you get a transistor that’s getting slower, burns too much power and costs too much. All the meritorious things of scaling are going away. The only thing you really get is that you are shrinking the transistors. But the costs don’t go down anymore.”
Consumers have to a large degree been insulated from the decaying benefits of Moore’s Law, which, according to Meyerson, really expired a decade ago. The one change obvious to outsiders has been the trend towards processors with increasing numbers of cores running at a fixed clock speed, without the accompanying benefit of the rising clock speeds they enjoyed in the 1990s. Increased wafer costs have, to a large extent, been absorbed by the supply chain. This has led to recrimination in the industry. Meyerson points to companies such as nVidia that have complained about the rising cost of production as they hop from one process to the next. In 2012, CEO Jen-Hsun Huang questioned the economic viability of the 20nm process node.
“The cost per function always used to move forward – but that won’t happen going forward. Silicon itself has become a limitation,” says Meyerson.
Is it 'game over' for an industry predicated on rampant, long-term deflation where consumers can reliably expect to pay the same price for something better if they wait a little longer? Meyerson says he believes there won’t, in fact, be a hiatus in improvement: the means of acceleration will change dramatically once people get their heads around the idea that the base technology is meaningless.
Is conventional IT going to slow down? Quite the opposite, Meyerson claims: “It will improve dramatically”. One of the things about being stuck in a rut, he adds, “is your brain tends to shut off. You keep doing the same thing, and accept it. We are putting in these tremendous efforts to move 10 per cent – and hold a celebratory party when we achieve it! We really need to look at improving computing by a factor of a hundred or more.
"What’s more, we have to think about different ways of building systems so that they can scale over time without relying on Moore’s Law scaling. And we have to waste a lot less."
One option is less reliance on conventional microprocessors, which burn a lot of energy in simply working out what to do next. IBM is looking harder at accelerator technologies using devices such as field-programmable gate arrays (FPGAs), where, in Meyerson’s words, “You burn the computation into the metal”.
Meyerson uses the example of the Black-Scholes algorithm, a common way of computing the value of financial options. “An FPGA accelerates Black-Scholes 10-to-100 times compared with a modern microprocessor because you have preconfigured the algorithm. The problem historically is that the programming of these things has been challenging… But that has come a long way now through languages such as OpenCL. They are trying to now make it autonomic,” he says, the idea being that software tools would profile applications automatically and determine the best substrate on which to run the code. If a compute kernel would benefit from an FPGA or specialised coprocessor, the software finds an appropriate accelerator and relinks the code.
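For readers unfamiliar with the workload Meyerson cites, the closed-form Black-Scholes formula for a European call option can be sketched in a few lines of Python. This is a generic textbook implementation, not IBM’s or any vendor’s code; an FPGA version would hard-wire this same arithmetic into logic rather than execute it as instructions, which is where the 10-to-100-times speed-up comes from when pricing millions of options.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot, strike, t, rate, vol):
    """Black-Scholes price of a European call option.

    spot: current asset price; strike: exercise price; t: years to expiry;
    rate: risk-free interest rate; vol: annualised volatility.
    """
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)

# At-the-money call, one year out, 5% rate, 20% volatility:
print(round(black_scholes_call(100.0, 100.0, 1.0, 0.05, 0.2), 2))  # 10.45
```

The kernel is pure floating-point arithmetic with no branching, which is exactly the shape of computation that maps well onto preconfigured hardware.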
“Imagine you had intelligent software, an agent, that monitors what computing resources are available and understands what the underlying operations are. It can recompile to produce the optimised system without you touching it. This will give you way more advantages than what you might get out of a few generations of chips.”
Accelerators alone will not solve the problem. Meyerson says: “You start to worry about different aspects of performance that you would not think about before. The key one is the speed of light. It’s too slow, at least at the speed our processors run, 5GHz in some of ours.” Light can travel only a few centimetres within a clock cycle.
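The arithmetic behind that claim is straightforward: at 5GHz one clock cycle lasts 0.2ns, and even light in a vacuum covers only about six centimetres in that time (signals in copper or fibre travel slower still). A quick sanity check:

```python
c = 299_792_458            # speed of light in a vacuum, m/s
clock_hz = 5e9             # a 5GHz processor clock
cycle_s = 1.0 / clock_hz   # one clock period: 0.2 ns
distance_m = c * cycle_s   # how far light travels in one cycle

print(f"{distance_m * 100:.1f} cm per clock cycle")  # prints "6.0 cm per clock cycle"
```

A server on the far side of a 100m data centre is therefore thousands of clock cycles away at best, before any switching or protocol overhead is counted.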
“When you have a data centre the size of a football field, you have the situation where a system could wait hundreds or thousands of machine cycles because the data has to come from elsewhere in the centre. Everything is spread out. You lose a lot of time communicating, and that also takes a tremendous amount of power. So, one of the things people are looking at after 40, 50 years of scaling-out is to scale-in,” Meyerson argues. “The strategy of data-centre design has been for 40 years the ‘Australian strategy’: throw another server on the barbecue.”
Adding more servers simply adds to the distances packets of data need to cover, which is not helping: “You want to put things in close proximity,” Meyerson contends.
For those 40 years, Moore’s Law has been helping to put things in close proximity, but only in two dimensions and for similar technologies. Processors sit next to processors – but they are linked to memory by a comparatively long-distance bus. “What if you were able to make three-dimensionally integrated chips work? Thin each chip down to a thickness of 50µm, stack one on top of the other, drill through them and interconnect them vertically. So the chip becomes a combination of 50 chips,” Meyerson explains. “These chips will constitute entire systems, but look for all the world like a single chip.”
3D integration will not solve the problem of the speed of light on its own. Architecture will emphasise integration at the system level not chip level, Meyerson predicts: “This is the era of big data. You have to think differently about the architecture of the entire system. In the past you shipped all the data to the processor. And it spat out a set of metadata derived from all the data that it looked at.”
The problem, again, is one of scale: “Before, you had a few million lines of code and a few thousands of bits of data. Now you have a few million lines of code and billions of data items.”
The quantity of data in areas such as medicine is growing fast, outstripping the speed at which networks can move it. Even for smaller databases, the latency of transferring a gigabyte database may mean that some of the information is ‘stale’ before it can be processed.
“There is a physical limit about how much I can ship. You can't move a petabyte of data,” says Meyerson, adding that legal restrictions add a further reason for data needing to stay where it lies. “Medical data can’t be brought out of countries like those in the European Union. You have to think about shipping the compute to the data.
“This is a very recent thing we have been doing: moving compute to the data. We quietly announced some capabilities in that area. Let’s say you are a bank that operates out of Singapore. You have clients who can't move data out of the EU… But you have corporate compliance rules to satisfy and you have to run them from the central office. Normally you would put four analysts on a plane and have them crawl through the local data. It’s expensive, slow, and if someone knows that you are coming and they are running fraud, by the time the analysts come to investigate you can hide some of the data.”
What happens if you have an appliance that's pretty much empty in this remote site, Meyerson asks: “You could call out to the appliance to reach out for the analytics software. You then cut off its connections to the outside world and reach into the bank where it's sitting. You run the analytics there to search for issues and, once it's finished, it erases all the local data, just sending the metadata back to Singapore. Anyone who has tried to end-run the rules gets caught because it can happen quickly.”
“You also want to look at the new things that you want to do with compute. What does analytics do? You have discovery of the underlying facts. But, in Big Data what you really want to do is change the future. It's not even getting the understanding – it’s about altering the future,” Meyerson believes, using Singapore traffic as an example of what can be done.
“You can take data on where cars are at different locations in the city over time. You have traffic patterns for the entire year. How it behaves in the rain and in the sun. You build a model. If I take this model can I build an accurate predictor from that? We find we can predict the future with astounding accuracy. For example, 20 minutes from now we might predict that there will be a massive traffic build-up at a certain intersection... But because you have a detailed model you could go further: mathematically solve for the traffic jam and that way alter the future.”
To influence behaviour so that traffic is pushed the right way, the system could make use of dynamic road pricing. “You might decrease the cost of going to different entrances to Singapore – reducing one from $5 to $2. It might not divert everyone, but it will divert enough to stop the jam. You've gone from 'there is a traffic jam' to where you were going to have one, but you've prevented it.”
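The predict-then-intervene loop Meyerson describes can be caricatured in a few lines: forecast demand at an entrance from historical counts, and if the forecast exceeds capacity, discount the toll at an alternative entrance until enough drivers are expected to divert. Everything here – the traffic figures, the linear ‘20 per cent of excess diverted per dollar of discount’ response, the capacity threshold – is invented purely for illustration, not drawn from IBM’s Singapore work.

```python
def predict_flow(history, slot):
    """Predict cars/min for a time slot as the mean of past observations."""
    observations = history[slot]
    return sum(observations) / len(observations)

def price_to_divert(predicted, capacity, base_toll, divert_per_dollar=0.2):
    """Return a discounted toll for an alternative entrance.

    Assumes (hypothetically) that each dollar of discount diverts a fixed
    fraction of traffic; discounts just enough to absorb the predicted excess.
    """
    excess = predicted - capacity
    if excess <= 0:
        return base_toll                 # no jam predicted: leave pricing alone
    needed_fraction = excess / predicted # share of drivers that must divert
    discount = needed_fraction / divert_per_dollar
    return max(base_toll - discount, 0.0)

# Invented history: cars/min at one entrance, 08:40 slot on rainy days
history = {"08:40-rain": [118, 125, 131, 122]}
predicted = predict_flow(history, "08:40-rain")          # 124.0 cars/min
toll = price_to_divert(predicted, capacity=100, base_toll=5.0)
print(predicted, round(toll, 2))
```

A real deployment would use a far richer model than a historical mean, but the structure – model, prediction, priced intervention – is the point.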
Compressing servers into a tiny cube of chips and moving them to where the data lives will get the industry some of the way to where it needs to go. Cognitive computing will help improve computers’ overall energy efficiency, says Meyerson. Referring to the supercomputer that won against human opponents on the US TV game show ‘Jeopardy!’, he says the competition was won at some cost: “The Power systems that were used in Watson consumed 80kW of power; human beings about 20W. That 4,000-fold offset is due to the use of a very different compute architecture to what we now have [in 2014]. The question is how do we mimic the functionality of the brain. That's where cognitive computing comes in.”
As Meyerson tells it, chips will be laid out more like the human brain, forcing a rethink of some of the basic premises of computing: “That whole notion of doing cognitive computing is a dramatic step forward.”
Work is continuing at IBM Research, in places such as its Almaden labs, on low-level circuits that emulate brain function. These can be made using techniques similar to those used for conventional silicon chips but, being designed for cognitive computing, should be far denser and more energy-efficient than the software-based simulations used in a machine like Watson. It’s one more area where a rethinking of design can overcome the juddering halt to which Moore’s Law has come.
The IET & BCS Turing Lecture 2014 tour starts on the evening of 24 February at the Royal Institution in London – see http://conferences.theiet.org/turing for full details