Prohibitive costs threaten the future scaling of silicon technology, but it is power that is forcing the break towards chips stacked several layers deep.
Shekhar Borkar's decade-long hunt may soon be over. 'I've been searching for a killer application for 3D for the last ten years,' Intel's director of microprocessor technology told delegates at the Design Automation Conference (DAC) in San Diego in June.
Borkar was not talking about 3D in video or sound, but in the world of chip design: an industry that has remained resolutely 2D for 50 years. The economics of the chip business has for all those years relied on the ability to form complete circuits on a single wafer of silicon using just chemical processes; and then on how well those processes have worked. Countless predictions of the demise of silicon scaling, along with the end of Moore's Law, have been greeted with derision by an industry able to maintain its punishing two-year cycle that ratchets up the density by a factor of two each time.
Silicon scaling faces big challenges now – the 20nm process technology threatens to be much more expensive to deploy than expected – but even those may not be enough to force a shift into the third dimension and start stacking circuits on top of each other. The 3DIC is not an easy thing to make, as companies have discovered in the past. The last time scaling seriously seemed to be running out of impetus, back in the early 1990s, companies such as Intel dallied with what were known then as multichip modules; but as soon as they could, they switched back to monolithic integration.
Mary Olsson, chief analyst at Gary Smith EDA, is under no illusions as to how ready the industry is for 3D right now: 'It's very much like the overhyped multichip-module market. The final demise of the multichip module was the lack of known good die, among other issues.'
The question of yield – the proportion of good chips made per batch – dominates chipmaking. On a wafer full of chips, there is a very high probability that some of them will be duds. Perhaps a speck of dust got onto the wafer and disrupted a connection. Or the tungsten needed to fill a via that forms a conductive connection between two layers of copper metal does not quite fill up the hole. The resulting void could easily cause the rest of the chip to fail.
As long as more than 80 or 90 per cent of the devices that are chopped out of the wafer work, a manufacturer will do okay. It can weed out the inoperative or barely functional chips during test, sell the rest at low prices, and still expect to make a profit. There are two levels of test. The first is called the wafer probe: a quick test of the chips before a saw cuts up the wafer. This screens out the weakest devices. Yet to test a chip properly, it usually needs to be sitting inside a package, as this results in better electrical connections without the risk of damaging the delicate circuitry.
The challenge with a multichip module is that you need to take each chip, or die, and connect it to the others before the final test – which means you do not necessarily have a good idea of whether or not it works. When you combine dice in this way the final package yield can tumble. If individual die yield is 90 per cent, when you put together three in a stack, you can expect less than three-quarters of the packaged parts to work.
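The compounding effect is simple multiplication: if die failures are independent and you cannot weed out bad dice before stacking, the package yield is the individual die yield raised to the power of the number of dice. A minimal sketch:

```python
def stack_yield(die_yield: float, dies_in_stack: int) -> float:
    """Probability that every die in a stack works, assuming
    independent die failures and no pre-stack screening."""
    return die_yield ** dies_in_stack

# With 90 per cent individual die yield, a three-die stack:
print(f"{stack_yield(0.90, 3):.1%}")  # → 72.9%
```

At 0.9³ ≈ 73 per cent, more than a quarter of finished packages would fail even though each die passes the 90 per cent bar on its own.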
The memory industry has found a way to make die-stacking work. Most mobile phones have one or more stacked memories inside them – it is the only way you can get 8GB of flash and another gigabyte or two of DRAM into the system. But memories can use redundancy to improve effective yield after packaging – using spare lines of memory cells to stand in for those that have been found to fail; and the memory industry has become good at screening out bad dice at the wafer-probe stage.
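The effect of spare rows can be captured with a simple binomial model: a die still works as long as no more rows fail than there are spares to replace them. The row counts and failure rates below are illustrative assumptions, not figures from the article:

```python
from math import comb

def die_yield_with_spares(rows: int, row_fail_prob: float, spares: int) -> float:
    """Yield of a memory die with spare rows, assuming independent
    row failures: the die works if at most `spares` rows fail.
    Illustrative model; parameter values are assumptions."""
    return sum(
        comb(rows, k) * row_fail_prob**k * (1 - row_fail_prob) ** (rows - k)
        for k in range(spares + 1)
    )

# A 1,024-row die with a 0.1 per cent per-row failure rate:
# with no spares, most of a wafer is scrap; a handful of spare
# rows recovers nearly all of the loss.
print(f"{die_yield_with_spares(1024, 0.001, 0):.1%}")
print(f"{die_yield_with_spares(1024, 0.001, 4):.1%}")
```

This is why redundancy, combined with aggressive wafer-probe screening, let memory makers stack dice profitably while logic chips could not follow.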
The 3DIC plan, however, aims to extend stacking to chips that cannot use redundancy in the same way and are not so easy to test. There is a further problem. Current memory stacks use an extension of the wire-bonding connection technology that has been used for decades in chip production. Tomorrow's 3DICs call for the connections to be made through the silicon itself – thick through-silicon vias (TSVs) will extend down to the bottom of the chip so that when the devices align correctly, they form a solid connection. To stop the stack from taking up too much space, each die needs to be lapped until it is only tens of micrometres thick – a point at which silicon turns transparent, and also more than a little floppy. TSVs have been used to put image sensors on top of signal processors but have a long way to go before becoming mainstream.
During a panel at DAC, Suk Lee, director of design infrastructure for Taiwanese foundry TSMC's marketing division, held up a 300mm wafer of devices intended for use in experimental 3DICs. It had to be supported by a plastic backing to stop the silicon wafer from curling over. Flexing causes problems for manufacture. Lee says: 'We need to work on ways to stop the wafer cracking during processing.'
Mike Gianfagna, vice president of marketing for design-tool supplier Atrenta, concurs with Lee up to a point: 'I think stresses will be a problem for 3D. We have not lapped the silicon paper-thin, and then drilled holes in it before. I think there will be a lot of challenges around that.'
Despite the difficulties of making working 3DIC stacks, companies such as IBM, Intel and Qualcomm have pressed ahead with development. James Warnock, distinguished engineer at IBM, reminded DAC delegates of two previous shifts in technological direction, both of which were driven not by problems with scaling but by power: '3D interconnects do have the potential to offer us benefits into the future, providing a paradigm shift in the technology.' Paradigm shifts are caused by 'the build-up of problems', Warnock adds: 'The move to CMOS was caused by the build-up of power problems. Then people moved to multicore and multithreaded systems.'
CMOS logic did not become a mainstream technology until the late 1980s, some 20 years after metal-oxide-semiconductor (MOS) transistors became the norm in ICs. Up to that point, n-channel MOS or NMOS was the main form of transistor in use because of its higher logic density. Using electrons as majority carriers, NMOS transistors are generally smaller than the complementary p-channel transistors that use the quasiparticles known as holes as their majority carriers. Holes have lower mobility than electrons, so PMOS devices need to be larger to supply a similar amount of current to their NMOS counterparts. The problem with either NMOS- or PMOS-only logic is that it draws current all of the time. By using the complementary forms of transistor together, CMOS overcomes that. Like a canal lock gate, when an NMOS transistor is on, its PMOS complement is off, blocking the current flow from the supply rail to ground. Other than a small leakage component, CMOS only draws power when it switches from one state to the other – the PMOS and NMOS transistors are then both partially on.
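The practical upshot is captured by the standard switching-power formula: because a CMOS gate only draws supply current while charging or discharging its load, its dynamic power scales with activity, load capacitance, the square of the supply voltage and the clock frequency. A rough sketch, with purely illustrative component values:

```python
def cmos_dynamic_power(activity: float, capacitance_f: float,
                       vdd: float, freq_hz: float) -> float:
    """Dynamic switching power of CMOS logic: P = a * C * V^2 * f.
    Static (leakage) power is ignored here for simplicity."""
    return activity * capacitance_f * vdd**2 * freq_hz

# Illustrative numbers: 10% switching activity, 1nF of total switched
# capacitance, 1.0V supply, 2GHz clock.
print(cmos_dynamic_power(0.1, 1e-9, 1.0, 2e9), "W")  # → 0.2 W
```

The V² term is why the industry could not simply keep raising voltage and frequency: an always-on NMOS gate pays its current bill constantly, while CMOS pays only per transition.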
The drawback to CMOS was size. It more than halved density. As a result, CMOS did not become the prevalent circuit technology for silicon ICs overnight. The manufacturers swallowed the cost hit over time. The transition had to happen or else integration would have been forced to stop by the rampant growth in power consumption.
Processor design saw a sudden change in approach due to a rise in power consumption in the past decade. According to Intel's Shekhar Borkar, for a period of 20 years in which performance increased 1,000-fold, two orders of magnitude of that increase came from running the transistors faster. The increase in clock speed was accompanied, however, by a rapid rise in not just active power consumption but leakage. So companies such as Intel found they could not push clock speed past 3GHz, and switched to more processor cores running more slowly, trying to increase aggregate performance that way.
The shift in processor architecture has exposed a problem. Feeding hundreds of processors sitting on a single chip takes a lot of memory bandwidth. 'Buses have become wider to deliver the necessary bandwidth, but they take a lot of power,' says Borkar, who estimates the energy today at roughly 25pJ/bit at 100Gbps. As memory bandwidth heads into the terabyte-per-second area, a desktop processor could draw more than 35W just accessing DRAM – a sizeable slice of its total power budget.
The memories are, in 2011, too far from the processor; so the parasitic capacitance and inductance are high. Bringing everything closer together by stacking memories and processors together can drastically reduce the parasitic effects. You can also place many more connections between the processor and memory and reduce the clock speed for the interface, which helps cut energy usage. Intel researchers demonstrated this by stacking a memory chip on top of its experimental 80-core processor called Polaris. 'The total system delivered 1TBps for an energy consumption in the order of 1pJ/bit,' Borkar explains. 'This experiment gave us a pretty good idea of how to reduce interconnect power.'
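The two energy figures Borkar quotes can be turned into a back-of-envelope power estimate, since interconnect power is simply energy per bit multiplied by bit rate. This is a rough sanity check rather than a reproduction of the article's own sums; the 1TB/s bandwidth point is illustrative:

```python
def interface_power_w(energy_pj_per_bit: float, bandwidth_gbytes_s: float) -> float:
    """Interconnect power = energy per bit x bit rate.
    Illustrative arithmetic using the per-bit energies quoted in the text."""
    bits_per_second = bandwidth_gbytes_s * 1e9 * 8
    return energy_pj_per_bit * 1e-12 * bits_per_second

# 1 TB/s of memory traffic at the two energy points Borkar cites:
print(interface_power_w(25, 1000))  # ~25pJ/bit, today's off-chip links
print(interface_power_w(1, 1000))   # ~1pJ/bit, the stacked Polaris demo
```

Dropping from tens of picojoules per bit to around one is the difference between a memory interface that dominates the power budget and one that barely registers, which is the core of the case for stacking.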
There is bad news for desktop processors with this plan, says Borkar. If you put the DRAM on top of the processor, the heat from the processor has to flow through it to get to the heatsink. 'DRAM does not like high temperature,' the Intel engineer notes, 'but if you put the DRAM underneath, you have to take signals through the DRAM die, so it has to be made bigger; but it is the most promising solution for memory bandwidth.'
Mobile systems could see processor-memory stacks before the desktop world as those devices run much cooler. The need to slash memory-access power is also more pressing. Qualcomm is developing stacks for possible use in its future mobile processors. Riko Radojcic, leader of design-for-technology initiatives at Qualcomm, said: 'We think 3D is inevitable, especially for wide I/O and high bandwidth memory. The question is when and how. But, one day, I will be able to say I told you so.'