Within a decade or less, physical limits will stop silicon scaling. A move into the third dimension is the answer
In the years that followed the punk movement's brief summer of 1977, forlorn stragglers with hair carved into bright green and pink Mohicans would wander London's King's Road long after the shops that briefly made them fashionable had shut or simply moved on to the Next Big Thing. Their T-shirts carried the plea: "Punk's dead but we're still dying". Moore's Law as we have come to know it has similarly entered its Mohican phase.
At the Design Automation Conference in San Francisco in June 2014, engineers talked openly about the likelihood that the scaling trend for silicon that used to see transistor density double every two years was coming to a juddering halt. Their comments echoed those of IBM chief technologist Bernie Meyerson, speaking the previous year in an interview ahead of delivering the IET/BCS Turing Lecture: "The truth is, Moore's Law died in 2003. It's been effectively dead for ten years but we've continued to struggle with it."
The question now is whether artificial life support can keep the benefits of making integrated circuits denser alive and, to a lesser extent, what Moore's Law actually means. Its meaning has shifted over the years. When Gordon Moore published the first set of graphs that demonstrated what became known as his Law, they were simply parabolic curves that showed how experience with making integrated circuits led to them halving in cost every generation. In the white heat of 1960s technology, this happened every year. Exactly how it happened was not important, just the result.
Ten years later in a speech at the International Electron Device Meeting (IEDM), Moore explained how the trend would continue, albeit on a two-year cycle. He identified three factors that he saw as vital. In practice, only one has mattered for most of the 40 years since then.
One factor was chip size, another was improvement to design techniques and the third was lithography scaling itself. Within less than a decade, increases to chip size more or less stopped. Although some very expensive parts went bigger, chips for the most part wound up being locked to no more than 100 square millimetres. Above that size, the probability of particles landing on a key point and breaking the entire chip soared. Furthermore, the reticle that holds the mask has been limited to 600 square millimetres for several decades, providing a further physical as well as an economic limit. The first chip to hit that limit, a neural-network emulator, appeared in 1990. So, chip size made little contribution to density for more or less 30 years.
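Why yield collapses above a certain die size can be illustrated with the classic Poisson yield model, in which random defects land independently across the wafer. This is a back-of-the-envelope sketch; the defect density figure is an illustrative assumption, not a number from the article:

```python
import math

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies that escape all random defects,
    assuming defects land independently (Poisson model)."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)

# Illustrative defect density of 0.5 defects per square centimetre
for area in (50, 100, 300, 600):
    print(f"{area:4d} mm^2 die: {poisson_yield(area, 0.5):.0%} yield")
```

At this notional defect density, yield falls from roughly three-quarters of dies at 50mm² to a few per cent at the reticle limit, which is why large dies have always been reserved for very expensive parts.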
Improvements to design smarts didn't come as naturally as Moore expected. In fact, lithographic scaling did so much of the lifting that people invented design techniques that were less space-efficient than the early custom layouts but which massively increased engineer productivity. Now lithographic scaling has more or less run out of steam.
Too many problems
Chipmakers are still forced to use ultraviolet light with a wavelength close to 200nm to draw features that are less than 20nm. Doing this requires a number of optical tricks, the latest of which is to use multiple masks in sequence to try to overcome the final physical limitations of using such long-wavelength light. The additional masks have increased the cost of production to the point where improvements in device size no longer provide an economic benefit. The many restrictions on design freedom caused by the multiple-mask tricks are sometimes denying designers the opportunity to reduce circuit size at all. In some cases, the trend has gone into reverse.
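How extra masks can cancel out the economic benefit of a shrink can be seen in a crude cost model: divide the wafer's processing cost by the transistors it yields. Every number below is an illustrative assumption, chosen only to show the mechanism:

```python
def cost_per_transistor(base_wafer_cost, litho_steps, step_cost, transistors):
    """Total wafer cost divided by the transistors yielded per wafer."""
    return (base_wafer_cost + litho_steps * step_cost) / transistors

# Hypothetical single-patterned node vs a multi-patterned shrink:
old = cost_per_transistor(3000, 40, 50, 2.0e9)  # baseline density
new = cost_per_transistor(3000, 60, 50, 2.3e9)  # 15% denser, 50% more masks
print(f"old node: {old:.2e} per transistor, new node: {new:.2e}")
```

With these figures the denser node actually costs more per transistor, which is exactly the reversal of the trend the article describes.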
Although the 14nm and 16nm processes offered by the foundries sound as though they should be denser than the previous 20nm generation, all of them are based on the same grid of interconnections. The big change is to follow Intel by producing three-dimensional finFETs rather than the conventional planar transistors of the 20nm generation. To some extent, the finFETs are better at switching, which allows designers to use smaller logic blocks to do the same job as older transistors. But this is not the same as classic scaling.
Lars Liebmann, IBM distinguished engineer and the company's primary expert on lithography, says: "We want to scale power, performance, area and cost at the same time. But even meeting one of those is very, very difficult."
In his keynote at DAC, Karim Arabi, vice president of engineering at Qualcomm, said the cost issue is central: "We are very cost-sensitive. Moore's Law has been great [up to now]. Although we are still scaling down it's not cost-economic anymore. That's creating a big problem for us."
ARM fellow Rob Aitken likens the situation to a glacial valley with steep paths leading up into the mountains: "Progressively people give up or fall in the crevasse of doom."
One way out of the cost problem is to move to a much smaller wavelength of light - extreme ultraviolet (EUV). But this is a technology that has been in development for close to 20 years and still cannot support the throughput needed for high-volume wafer fabs. IBM said in July it had managed to sustain more than 30 wafers per hour in experiments using EUV, but that is still a long way short of the 100 per hour needed to cut costs compared with current techniques. It could take another three years at least before EUV is ready for production even on an optimistic timescale.
Instead of continuing to implement ever more expensive, byzantine technology, the alternative is to harness the spirit of punk and move in a completely different direction. Instead of trying to shrink the features on a 2D surface, just build more 2D surfaces. In effect, it's a return to Moore's original idea that die size should have an impact on silicon scaling. It's just that in this case the die space comes from adding layers.
Some manufacturers have already started down this path. IBM's server operation sees 3D integration as vital to the development of its largest computers. Apple's iPhone managed to increase the density of the memories it used by having them stacked and then packaged together. Micron Technology and Samsung are going further by building 3D memory stacks that offer massive bandwidth by drilling holes for electrical connections through the stack. This reduces parasitic interference from the extra capacitance and inductance of long bond wires that are normally used to connect chips within a stack to the package.
A cube of chips could contain the entire brains and memory of a tablet and an array of other embedded systems devices. But there are significant drawbacks to the approach. The extra processed silicon area inside the cube does not come for free. Each layer has to be made on a silicon wafer as before. And although through-silicon vias (TSVs) improve memory throughput, they eat into the area that can be used for circuitry because they are orders of magnitude larger than the transistors that surround them. This further increases cost.
Going over the wall
Aitken says: "People may think they need to go to a 3D stack but then discover that the TSV technology doesn't get them the transistor density that they want."
For this reason, the shift to vertical scaling depends intensely on what happens to the costs of 2D scaling. For the moment, the cost decisions remain finely balanced. Liebmann says: "3D offers an alternative to standard scaling. But first we would have to say 'we are done with standard scaling'."
Gerd Teepe, director of design enablement at IC foundry GlobalFoundries, says: "If the [2D] shrink path is open and moving forward 3D will be put off for five to 20 years. If the shrink path continues to get harder, we will see a huge shift into 3D."
The 3D IC might get a boost if technologists can cut the number of chips that need to be stacked. Arabi says: "We are looking at true monolithic 3D. This is a technology for the end of the decade, but it can give us an advantage of one process node, with a 30 per cent power saving and a 40 per cent gain in performance."
The idea is to put more than one layer of transistors on to a single chip. Memories are beginning to appear that already use similar techniques. Samsung and Toshiba are among those that have developed ways to arrange the transistors needed for flash memories into vertical strings, massively reducing the area each bit covers.
Logic circuits are less easy to make vertically, which will delay the introduction of monolithic 3D techniques. A further problem is lack of R&D. Only a few organisations have so far invested in developing the technology although they have demonstrated devices that have more than one layer of transistors on them, built using techniques similar to those used to fabricate the wires that link them together. The incremental cost of adding layers should be much lower than that of bonding chips from different wafers together, although each additional layer could add cost by reducing the overall number of good chips that can be carved from the final, processed wafer.
Liebmann says: "Monolithic 3D has much more of the flavour of what Gordon Moore intended us to be doing. But monolithic 3D is not well funded. Where will that R&D funding come from?"
For the moment, the major chipmakers and foundries have pinned their hopes on EUV finally making it to production. But people working in these companies believe there are other reasons than pure cost scaling for moving to 3D ICs.
Arabi points to the wiring between transistors as an important driver of a change in the approach to scaling. Some 2km of ultra-thin wires criss-cross a typical PC or mobile-phone processor and they have to keep thinning down to be squeezed in, which increases resistance. "Interconnect [parasitics are] inching up as we go to deeper and deeper technologies. That is a major problem because designs are becoming interconnect-dominated. Something has to be done about interconnect."
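The resistance penalty Arabi describes follows directly from the wire geometry: resistance is inversely proportional to the cross-sectional area, so halving both width and height quadruples it. A minimal sketch (the copper resistivity is the real bulk value; the wire dimensions are illustrative):

```python
def wire_resistance(rho_ohm_m, length_m, width_m, height_m):
    """DC resistance of a rectangular interconnect: R = rho * L / (W * H)."""
    return rho_ohm_m * length_m / (width_m * height_m)

RHO_CU = 1.7e-8  # bulk copper resistivity, ohm-metres
# Shrinking a 1 mm wire's cross-section from 100 x 200 nm to 50 x 100 nm:
r_old = wire_resistance(RHO_CU, 1e-3, 100e-9, 200e-9)
r_new = wire_resistance(RHO_CU, 1e-3, 50e-9, 100e-9)
print(f"{r_old:.0f} ohms -> {r_new:.0f} ohms ({r_new / r_old:.0f}x)")
```

In practice the penalty at these dimensions is worse than this simple model suggests, because electron scattering at the wire surfaces pushes the effective resistivity above the bulk value.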
For Arabi, that points to 3D. Shekhar Borkar, director of the microprocessor technology laboratory at Intel, has for a number of years proposed a shift to 3D integration to not only reduce the average interconnect distance but the energy cost of transferring data. "Communications energy increases with distance," says Borkar. Assuming current trends continue "soon, the energy of on-die data movement will dominate everything else".
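Borkar's point can be put in rough numbers with a per-bit energy model in which on-die transfer cost scales linearly with distance. The energy-per-millimetre figure here is a notional assumption for illustration only:

```python
def transfer_energy_pj(bits, distance_mm, pj_per_bit_mm):
    """Energy to move data across a die, assuming linear scaling with distance."""
    return bits * distance_mm * pj_per_bit_mm

# Moving a 64-byte (512-bit) cache line at a notional 0.1 pJ per bit per mm:
near = transfer_energy_pj(512, 1.0, 0.1)   # to a neighbouring block
far = transfer_energy_pj(512, 10.0, 0.1)   # across a large die
print(f"{near:.1f} pJ nearby vs {far:.1f} pJ across the die")
```

Stacking layers vertically shortens the worst-case distance, which is why Borkar sees 3D integration as an energy play as much as a density one.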
Although a move to 3D, particularly the monolithic form, may help cut average wire length and therefore energy, the technology could easily make other things worse. A potential major problem is heat. Transistors produce heat every time they switch, so heat output climbs steeply at high frequencies. Intel's 'turbo' mechanism in its processors is a recognition that they can only run at top speed for short periods of time before they run the risk of overheating.
Professor Michael Taylor of the University of California at San Diego says: "If you look at graphs of long-term growth in transistor performance, you would think we would be able to operate them at 15GHz by now."
Instead, even the processors in large servers do not run at much more than 3GHz. Taylor adds: "Transistors have this inherent capability but we can't use it because they have to stay within a power envelope."
The situation has reached the point where, to be able to remove enough heat from the chip to stop individual transistors from cooking themselves to death, a large fraction of the overall device needs to be doing nothing. Mike Muller, ARM CTO, coined the term 'dark silicon' in 2009 to describe the problem.
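The size of the dark fraction falls out of simple arithmetic: if lighting up every transistor would blow past the package's power envelope, the excess must sit idle. A sketch with illustrative numbers (the per-transistor power and the 3W budget are assumptions, not figures from the article):

```python
def dark_fraction(power_budget_w, transistors, watts_per_active_transistor):
    """Fraction of the die that must sit idle ('dark') to stay within
    the power envelope, assuming every active transistor dissipates
    the same average power."""
    full_on = transistors * watts_per_active_transistor
    if full_on <= power_budget_w:
        return 0.0
    return 1.0 - power_budget_w / full_on

# Hypothetical 3 W mobile SoC with a billion transistors at 10 nW each:
print(f"{dark_fraction(3.0, 1_000_000_000, 10e-9):.0%} of the die stays dark")
```

On these assumptions the chip would dissipate 10W fully lit, so 70 per cent of it must be dark at any moment, and the fraction grows with every generation that adds transistors faster than the power budget rises.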
3D integration is no help, says Mohamed Sabry, researcher at Swiss institute EPFL. "3D could make it darker," he says, referring to the additional problems that stacking causes for both power delivery to and heat removal from transistors buried in the stack. Today's PCBs and heatsinks are based on the idea that heat can be efficiently conducted and radiated from flat surfaces. A hot processor insulated from airflow by other devices will see much higher thermal resistance, forcing it to go slower or even power down for long intervals between bursts of activity.
However, recent research by Qualcomm has suggested that for mobile SoCs, 3D could help with the heat problem. Because the average length of interconnects reduces with 3D stacking, particularly for monolithic 3D, it is possible to use smaller logic cells that, in turn, produce less heat overall.
The heat-removal problem strongly influenced the ideas put forward by researchers such as Borkar for creating 3D stacks. Although the TSVs needed slash the useful die area that could otherwise hold storage bits, it makes more sense to put memories underneath processors, with the bus connections to the PCB running right through them, just so that the processor can be directly attached to a heatsink on top.
Borkar argues that the long-term direction of computer architecture, which places far more emphasis on energy consumption than in the past, will be forced by dark silicon to change radically. "We need a revolutionary software stack," he claims. "Communications energy will far exceed that of computation. Programs will need to make dynamic decisions to move computation to the data. Data locality will predominate."
Some ideas along these lines have started to come to market. The Automata architecture developed by Micron Technology pushes computational elements into the memory chips themselves. But they force programmers to completely rethink the way they write software to be able to take advantage of them.
The cool out
EPFL and IBM propose something even more revolutionary: take a leaf from the structure of the human brain, which already combines memory with computation in its network of neurons. "Biological systems are far superior to computers in terms of computational density," says Sabry.
IBM had part of the answer decades ago when it pioneered liquid-cooled mainframes, which were based on transistors far more power-hungry than the CMOS devices practically all machines use today. Liquid coolant could be injected into tiny pipes that permeate a 3D IC stack. But those pipes would compete with the wiring needed not just for circuitry but power delivery. The brain has the advantage of combining power delivery with cooling, says Sabry. A computer based on the same principles would immerse chip stacks in a chemical mix, using micro fuel cells within each layer to generate current directly. "And you can exploit the liquid to remove heat from the system."

Simulations on a virtual IBM Power7-based server have indicated that existing electrochemical techniques can deliver enough juice to run the memories if not the somewhat thirstier processors. The researchers are working on improving energy delivery.
The benefit in terms of cooling is more dramatic. The simulations indicated that, given enough energy from the chemical bath, temperature in the chip stack falls from 85°C to less than 40°C. "We have cooled it down completely to the point where we could look at removing dark silicon," says Sabry.
IBM's researchers have proposed using the liquid bath to actually increase internal temperature and pipe the spent liquid under homes to provide cheap heating. "There is also a positive relationship between the heat extracted and the power you can deliver. With increased inlet temperature, we can increase power by 23 per cent. We will need optimisation methodologies to find the sweet spot as we don't want such a high thermal gradient that it undermines reliability."
If researchers can find a convenient electrochemistry mixture that converts easily to energy on-chip and which does not also eat away the delicate circuitry, the shift to 3D and a successor to the currently accepted version of Moore's Law could get under way in style. But, for the moment, less elegant, even punkier, alternatives are needed to keep the silicon industry on track to deliver at the same rate of progress it has managed for 40 years together with the cost reductions to which the market has become accustomed. Say no to no future.