Sani Nassif, the man in charge of tools and technologies at IBM Microelectronics, has a way of putting into perspective the looming problem for chip designers.
“You want a billion people to form a human chain. But many of them have had a drink and can’t stand up too well. That’s how it is with transistors,” said Nassif at the recent ESSCIRC conference in Munich. “We have one billion transistors that are all different. The problem is predicting how long the chain will be and how much food and beer is required to make it happen.”
The variation in the transistors that are crammed into the densest chips is causing headaches for engineers. “It is one of the most critical issues for the industry and it is arriving very quickly,” said Professor Asen Asenov of the University of Glasgow, talking to Electronics ahead of the International Conference on CMOS Variability in London organised by the National Microelectronics Institute (NMI).
Chip designers have been living with variability for years. It is just that the digital engineers did not notice. Analogue circuits, in particular, have demanded multiple simulation runs using Monte Carlo techniques to try to harden them against changes in the process. Changes that might not be visible in the digital domain, but which would kill an analogue part.
Marcel Pelgrom, a researcher at NXP Semiconductors, told delegates at the NMI’s conference on variability in London in October how increasing variability with decreasing geometries demanded new circuits. He described how altering the width-length ratio of transistors could reduce noise. But that simple technique started running out of steam as early as 1985. “We were seeing huge mismatches, so we started to use offset compensation on a per-transistor basis,” he explained. These are techniques that digital designers are beginning to adopt to make static random-access memory (SRAM) less prone to variations in the process.
In older processes, variability was more of a problem in the early days – it took time for the engineers at the fab to learn how to stop parts of the process going out of control. “Usually, when you ramp up a process, you improve the control,” said Asenov.
Fabs are now working to remove those sources of variability from their processes more quickly so that they can get to full production in a matter of months once the first full wafers have been run. The companies making fab equipment have worked on ways to get much more even conditions inside even comparatively simple tools such as ovens and deposition chambers.
The work has paid off. TSMC claimed that it has cut the time needed for the nominal yield of the chips it produces to pass the 95 per cent point to less than two years for the 90nm process, compared with more than three years for the 0.25µm process. According to the latest figures from the company, the 65nm process is following roughly the same trend as the 90nm process, hitting the 80 per cent point in a little over one year. Yet, even as the fab operators improve their ability to control traditional sources of variability, new sources of variability have emerged.
“The second kind of variability came, most probably, with the 90nm node. It is what we call deterministic variability. This is the main focus of design-for-manufacturing efforts and is mainly related to the way that lithography is done,” said Asenov.
The problem is that lithography using ultraviolet is being pushed to its absolute limit by chipmakers. Over the last ten years, lithography-tool suppliers and fabs have found ever more exotic ways to get light with wavelengths in the 200nm range to produce features down to 30nm wide. They use the wave properties of light to produce interference effects that make it possible to define such tiny features. Optical proximity correction (OPC), for example, adds notches and lumps to features on the mask to try to print the basic shapes that were defined by the circuit layout.
“OPC introduces variation in the shape of the devices, for example,” said Asenov. “Two structures that used to be similar before are now broadly similar but they are each unique. Each structure becomes unique because of the environment in which it is used,” explained Anantha Sethuraman, vice president for design-for-manufacturing at Synopsys.
“There is no such thing as a typical device anymore. And it is compromising our ability to calibrate device models, so that the hardware-to-model mismatch is increasing,” said Nassif. “Some of the things we understand. Others we just have an empirical understanding. We don’t know why they happen.”
Some of the sources of variability now getting attention from tools suppliers and foundries have always been there: they were buried under other effects and were not considered worth designing around, at least until now. They are also not as unpredictable as people believe – just computationally expensive to predict.
“Some effects are deterministic from a physics perspective. But because we have difficulty dealing with them, we treat them as random,” Pelgrom said, pointing to effects such as negative bias temperature instability (NBTI), substrate noise and, one that is getting a lot of attention recently from tools developers, mechanical stress. “There is a role for CAD to handle these effects and handle them properly.”
Pelgrom pointed to stress as something that is often misinterpreted as a statistical effect, largely because it seems to speed up some transistors while slowing down others unpredictably. “People report 10 per cent modulation of current from stress. It can be modelled as a statistical effect but it’s not,” he claimed, adding that the effect is becoming more prominent at 45nm because of the wholesale move to strained silicon.
By deliberately stretching or compressing the silicon lattice through the use of silicon germanium layers or other techniques, it is possible to improve the mobility of carriers in the transistor channel and boost its current-carrying capability. “The variability is enhanced by strain silicon and the strain depends on the layout,” said Pelgrom. Rather than deal with the effects analytically, the response from the industry has been to take account of the potential for variability through ever more rigid design rules. But the rules are becoming more unmanageable with each new process.
For the last 30 years, design rules have made it possible for design teams to generate incredibly complex designs without much knowledge of the underlying physics.
“In the beginning when design was relatively simple, they were small designs with small design teams. They were exemplified by the early Intel processors: just 2,500 devices,” explained Nassif. “One person knew what all those transistors were meant to do. We were in the age of chip engineering. Everyone knew what the parameters for trade-off were.
“Later on, the performance became so good that we could abstract some of the design away. We got into chip computer science and we developed rules,” Nassif added. The number of rules grew gradually up to the end of the 1990s, then they began to become much more prescriptive.
“The number of rules to represent what any given technology looks like will be an unbearable number of rules: 32nm looks like it will be exponential compared with 45nm,” said Nassif. “We have rules for wire spacing. Now the spacing between the lines depends on their width. What used to be just the intersection of poly and diffusion is now a lot more. And transistor behaviour depends on the layout so we have to look at stress. We have to look at strain. There is a large array of possible behaviours based on the layout,” Nassif claimed.
“Design rules used to be pretty open. They forbade you to do things but you could do anything else. Now, they’re moving to where they say you can do A or B but not anything else. Gates have to be aligned. You have to have gates of certain fixed lengths and no empty space between them,” said David Frank, staff researcher at IBM’s TJ Watson research centre.
Just as designers get used to the idea of being able to predict the results of layout effects, new sources of variability have emerged. Again, like NBTI, they have always been there but they were very much second- or third-order effects.
“Much of what we observe today is due to variability. You need to pay attention to the difference between two words: variability and uncertainty. I know, if I change this piece of layout, what will happen. I call that variability and I think of it as being systematic,” explained Nassif.
“Then there is uncertainty. I don’t know why it changes just that it changes. You see this in dopant fluctuations in the transistor channel. It is something that is random. And, if I have uncertainty I have to worst-case it,” Nassif added.
Asenov remarked: “At the 45nm node, it becomes a very serious problem: we have substantial variability. And it is completely statistical variability. It is not coming from not being able to control the process well, or from lithography. It comes from the granularity of the way that the devices are manufactured.
“We see random effects from lithography such as line-edge roughness. It is very difficult to avoid because, to deal with line-edge roughness, you would have to change the chemistry of the photoresist. And the roughness is a consequence of the wavelength of light that you use,” Asenov explained.
Although advanced masking techniques such as OPC have made it possible to produce tiny gates, they produce comparatively blurry images on chip, which resolve out to gates with edges that are far from straight lines. Along the gates, the width can vary by as much as 5nm over a distance of 20 to 30nm.
“You can predict the average shape. But can’t say what the precise shape of the device will be. The shape of the device becomes unique. In some places there is a reduction in the length of the channel. And there are other places where it increases,” said Asenov.
“Another important source of variability we will see is the granularity of the polysilicon. You are starting to see variations that depend on the structure of the interface. But the main source of variability is random dopant fluctuations,” Asenov added. Frank said the random variability chipmakers are beginning to see comes down to the quantum nature of matter and light. “We are pretty much up against the quantum world.”
Pelgrom said individual atoms now play a part in the randomness: it is reaching the point where each one matters to the behaviour of the transistor. He said that a minimum-size transistor made using a 0.25µm process would have around 1,200 dopant atoms in the channel. “For 65nm, it is 60 to 80 atoms.”
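Pelgrom's atom counts can be turned into a rough feel for the size of the effect. The sketch below is not from the conference: it assumes the number of dopant atoms in a channel follows Poisson statistics, so the one-sigma spread in the count is the square root of the mean. On that assumption, 1,200 atoms give a relative spread of roughly 3 per cent, while 60 to 80 atoms give 11 to 13 per cent. The function name is illustrative only.

```python
import math


def relative_dopant_spread(mean_atoms: float) -> float:
    """Relative one-sigma spread of the dopant count.

    Assumption (not from the article): the count is Poisson distributed,
    so sigma = sqrt(mean) and sigma / mean = 1 / sqrt(mean).
    """
    if mean_atoms <= 0:
        raise ValueError("mean_atoms must be positive")
    return 1.0 / math.sqrt(mean_atoms)


# Atom counts quoted by Pelgrom for minimum-size transistors.
quoted_counts = [
    ("0.25um process", 1200),
    ("65nm process, low end", 60),
    ("65nm process, high end", 80),
]

for label, atoms in quoted_counts:
    sigma_atoms = math.sqrt(atoms)
    spread = relative_dopant_spread(atoms)
    print(f"{label}: {atoms} atoms, sigma of about {sigma_atoms:.1f} atoms "
          f"({100 * spread:.1f} per cent)")
```

The point of the arithmetic is only that the relative fluctuation grows as the count shrinks; how that fluctuation maps onto threshold voltage depends on the device and is not modelled here.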
A lot of the dopants are used in the pocket implants that lie close to the transistor’s source and drain areas. These implants help keep under control the short-channel effects that would otherwise stop sub-micron transistor channels from working effectively.
Process engineers are now seeing a statistical consequence of what is an otherwise predictable phenomenon: the well proximity effect. “You get more doping at the edge than you would like,” said Ric Borges, technical marketing manager at Synopsys. “Analogue designers really care about that. What happens is that the atoms bounce off the photoresist.”
The result, Borges said, is a change in device properties, putting a small kink in the current-voltage curve of the transistor. Pelgrom said the threshold voltage can change by up to 30mV depending on how far the transistor channel is from the well-edge.
“That is why line-edge roughness is an unpleasant effect. Pocket implants often use the poly layers as a mask. So that roughness is partially imprinted in the pocket implant,” said Pelgrom. “But, by rearranging and adding a few steps in the process, these effects can be addressed.”
Designers are not completely powerless in the face of these new effects. “The fix for systematic defects is regularity,” Nassif advised. “And the fix for random defects is resilience. Adaptability and resilience need to become first-order concerns.”
A lot depends on how far you want to push the process. “What variability makes you do is raise the threshold and not scale the device as quickly,” said Frank. SRAM designers have first-hand experience of variability as the transistors in those macros generally push the process as far as it can go. The increasing density of on-chip memory is the primary reason why Moore’s Law has done as well as it has in recent years: the parts with the largest areas of SRAM cache tend to be the densest. And SRAMs use redundancy extensively.
Yves Laplanche, head of research into silicon-on-insulator libraries for ARM, said it is possible to focus attention in a few places to avoid having to build resilience into every part of the die. “SRAM has lots of bit cells: it has high impact but you can use redundancy. For standard cells, it is only those in critical timing paths that have high impact.”
Ultimately, latches as well as SRAM bit cells may need redundancy, or entire blocks may need to be duplicated so that the chip can continue to work if transistors within one block suffer from variations so extreme they stop operating.
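Why redundancy works so well for memory can be seen with a simple binomial model. The sketch below is an illustration, not a description of any vendor's repair scheme: it assumes each row of an SRAM array fails independently with the same probability, that spare rows fail at the same rate, and that the array is usable as long as enough good rows remain. The row count and failure probability are made-up numbers.

```python
import math


def array_yield(rows_needed: int, spare_rows: int, p_row_fail: float) -> float:
    """Probability that at least `rows_needed` rows are good.

    Assumptions (illustrative only): rows fail independently with
    probability `p_row_fail`, and spare rows fail at the same rate.
    The array works if no more than `spare_rows` rows fail in total.
    """
    total = rows_needed + spare_rows
    p_ok = 1.0 - p_row_fail
    return sum(
        math.comb(total, k) * p_row_fail**k * p_ok**(total - k)
        for k in range(spare_rows + 1)
    )


# Hypothetical array: 1,024 rows, one row in a thousand fails.
for spares in (0, 1, 2, 4):
    print(f"{spares} spare rows: yield {array_yield(1024, spares, 0.001):.3f}")
```

With these hypothetical figures, an array with no spares works only about a third of the time, while a handful of spare rows brings the figure close to certainty. That is the sense in which a block with many identical cells can tolerate variation that would sink irregular logic.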
“We have gone a long time with the same architectures,” said Asenov. “We have single processors with millions of transistors. Most of them are not delivering performance. So, we could move to massively parallel architectures. The main problem with massively parallel is the software. But if we can deal with that, it becomes beneficial to go to massively parallel. You could accept that one of the processors is not working.”
But, in the meantime, attention is focusing on statistical design techniques and better modelling, although some changes to transistor architecture could help. “We could minimise variability with a back-gate FET,” Frank said, but added that no commercial design for such a transistor yet exists. If you employ highly conservative design – that is, use large guard bands – then even the biggest variation in device parameters such as threshold voltage will not stop the chip from working.
“You may find that the margins are so big that they have huge penalties in terms of power or performance,” said Asenov. “Designers need to know how devices are fabricated. We are moving to the next stage where tools can’t help us deterministically: you have to think about circuit design statistically.
“Some tools are starting to appear. Statistical timing analysis is already available off-the-shelf. But more than that will be needed. Different types of devices will call for different statistical techniques.”
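The gap between guard-banded and statistical design can be shown on a single timing path. The sketch below is a toy, not the method used by any commercial statistical timing tool: it assumes every gate delay on the path is an independent Gaussian with the same mean and sigma, which real, correlated silicon does not obey. Under that assumption the worst-case corner adds three sigma to every gate, while the statistical bound adds three sigma of the path total, which grows only with the square root of the gate count. All numbers are hypothetical.

```python
import math
import random


def corner_delay(gates: int, mean_ps: float, sigma_ps: float, k: float = 3.0) -> float:
    """Guard-banded bound: every gate sits at its k-sigma slow corner."""
    return gates * (mean_ps + k * sigma_ps)


def statistical_delay(gates: int, mean_ps: float, sigma_ps: float, k: float = 3.0) -> float:
    """k-sigma bound on the path total, assuming independent Gaussian gates."""
    return gates * mean_ps + k * sigma_ps * math.sqrt(gates)


def monte_carlo_path_delays(gates, mean_ps, sigma_ps, trials, seed=0):
    """Sample path delays by summing independently drawn gate delays."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(mean_ps, sigma_ps) for _ in range(gates))
        for _ in range(trials)
    ]


# Hypothetical path: 20 gates, 50 ps mean delay, 5 ps sigma per gate.
GATES, MEAN_PS, SIGMA_PS = 20, 50.0, 5.0
samples = monte_carlo_path_delays(GATES, MEAN_PS, SIGMA_PS, trials=2000)
stat_bound = statistical_delay(GATES, MEAN_PS, SIGMA_PS)
over = sum(d > stat_bound for d in samples) / len(samples)

print(f"worst-case corner bound: {corner_delay(GATES, MEAN_PS, SIGMA_PS):.0f} ps")
print(f"statistical 3-sigma bound: {stat_bound:.0f} ps")
print(f"sampled paths slower than the statistical bound: {100 * over:.2f} per cent")
```

The difference between the two bounds is the margin Asenov describes as a penalty in power or performance. If gate delays are strongly correlated, as they are for systematic effects, the statistical bound moves back towards the corner figure.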
Designers might also have to think about circuits in a different way, said Asenov: “Think not just about performance and power but yield. If you want a very optimal device you may sacrifice yield. But it is a tradeoff that designers are not accustomed to.”
To get the most out of the process, engineers who could previously ignore low-level effects will have to become acquainted with them. Those who only need to achieve a certain level of density, mainly through memory, may be able to avoid those considerations.
The changes mean that tools once isolated to process engineers will become important for circuit specialists. Sethuraman said such technology computer-aided design (TCAD) tools are no longer just for the fabs. “They are also for the rest of us,” he said. “It is becoming a must-do at 45nm. And anyone not doing this at 32nm does not have a prayer. With 32nm it gets much tougher.”
Borges added: “You can use TCAD to go to key parameters and insert variability into those parameters and understand the impact at the Spice-model level. They are what we call process-aware models.”
Companies are going further, said Asenov: “A lot of the fabless and chipless companies are starting to look at this. They are manufacturing control chips in foundries to get the transistors to measure variability for themselves. So the problem is coming into the open.
“And the foundries are beginning to understand that it is in their best interests to make designers aware of variability. It means that they have to share much more information.
“Variability is here to stay and it will be increasing, whatever we do,” concluded Asenov. “We may have to change the way we design circuits.”
“In the past, you could extract Spice models from a standard device and the designers could get on with their work,” claimed Ric Borges, technical marketing manager at Synopsys. “Now the stress in the NMOS and PMOS transistors is affected by what the designers do later.” In turn, the stress in the transistor channel affects its performance.
The main factor in determining stress is how far the transistor channel is from the edge of the diffusion well in which it sits. Since the mid-1990s, to get unrelated transistors to pack together more tightly, process engineers slot in trenches of silicon oxide insulation. “The shallow trench isolation pushes on the active regions so that mobility in PMOS transistors is enhanced. With NMOS it isn’t,” said Professor Andrew Kahng of the University of California at San Diego.
“The distance from gate to isolation and also the contacts alters the stress,” said Marcel Pelgrom, a researcher at NXP Semiconductors.
“AMD did modelling with Synopsys Sentaurus to capture the width effects. You immediately see these 4 to 6 per cent impacts on timing once you are cognisant of the stress impact,” said Kahng.
“The structure will determine the stress a device is exposed to. That needs to be understood by the fabless community: open spaces cause problems,” said Anantha Sethuraman, vice president of design-for-manufacturing at Synopsys.
Borges added: “It turns out that polysilicon pitch is a very important parameter for the PMOS devices.” The effect is boosted by the use of silicon germanium layers used to strain silicon for extra performance. If you don’t like the stress around the devices you could add dummy diffusion to get the stress you like. It is analogous to what people do with dummy metal.
“If I want to speed up PMOS I should move things away. If I don’t want to slow down NMOS, I want to put in dummies so it is not affected. You can get a 5 per cent clear win that way,” Kahng claimed.
“The range of the stress can be as much as 2µm,” claimed Borges. “That is a lot of neighbourhood that has to be taken into account. There can be cases where you are over-conservative by not looking at stress.”
Stress is something that will affect design tools that are used quite early in the layout process, said Borges. “There is a cell placement issue. It will matter what comes to the left or right of a cell. Placement has to be done in a smart way.”
What can you predict?
How good is your layout? Up to now, people have only been concerned about it working or not. Now chipmakers are thinking of scoring designs for manufacturability, using process-simulation tools to work out how well a particular block will yield.
Philippe Magarshack, group vice president at STMicroelectronics, said at the Medea+ Design Automation Conference earlier this year that this is what the company is doing. “We score for the quality of the layout, for the effects that can happen on silicon. Some effects you can correct. Some you can’t. But you should correct where you have the area to do so.”
One of the first areas to be covered was the metal stack. This is because chemical-mechanical polishing erodes isolated areas of copper, an effect called dishing. More regular spacing of copper wires reduces the effect. “This can be modelled and predicted. It is compute-intensive but, because it is systematic, it is essential that it is simulated and predicted. You can also look at lithography variations. Why not take advantage of these OPC variations to make them your friend?”
Professor Asen Asenov of the University of Glasgow said: “All this is predictable. So you can hope that in a good set of tools you have a good lithography simulator. Then you can take into account factors such as strain. Based on your layout, you can predict the behaviour of your devices.”
Magarshack said it is possible to push decisions further back into the design flow with techniques such as manufacturing-aware synthesis, taking advantage of the blank space that is often present on standard-cell designs. “When you have a utilisation ratio of 75 per cent, you have blank space that can be used better. Why not use a cell with better DFM quality to improve yield? Because you keep the same area it does not cost anything.”
Asenov said: “The other approach you can take is that you can make everything very uniform. A lot of people now design standard cell libraries where you introduce features that are more uniform.”
ST has worked with more regular cells, using a technique developed by Fabbrix, recently acquired by PDF Solutions. “Instead of simulating all these different shapes for different cells, you can use a more regular shape. You use bricks which are a bit larger than standard cells and are more accurate in terms of timing variability. However, you lose in area, so there is a tradeoff to make.”
Professor Andrew Kahng of the University of California at San Diego said restrictions on design caused by more rigid design rules can work for the circuit designer. “We realised we could make an excursion and jiggle cells around in the white space. And there is typically about 30 per cent white space. This kind of juggling can improve leakage. Leakage depends on pitch because CD [critical dimension] changes based on the pattern context. Doing that, I can save 5 per cent leakage. That is maybe not that interesting a result [in terms of leakage saving] but it shows what I can get for free.”
Magarshack said even random variation provides the engineer with trade-offs that can be exploited through analysis. Variations based on dopant profiles can be modelled statistically. “You can work out the distribution of good and bad dice and use that to make soft decisions. If the analysis says you are going to lose only a small percentage of your die through variation you can still decide to go to production.”
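The kind of soft decision Magarshack describes can be sketched in a few lines. This is an illustration under stated assumptions, not ST's flow: it treats a single device parameter, here threshold voltage, as Gaussian across dice and counts a die as good if the parameter falls inside a specification window. The target, sigma, window and minimum acceptable yield are all invented for the example.

```python
from statistics import NormalDist


def parametric_yield(mean: float, sigma: float, low: float, high: float) -> float:
    """Fraction of dice whose parameter lies between `low` and `high`.

    Assumption (illustrative only): the parameter is Gaussian across dice.
    """
    dist = NormalDist(mean, sigma)
    return dist.cdf(high) - dist.cdf(low)


def go_to_production(expected_yield: float, minimum_yield: float) -> bool:
    """Soft decision: accept the design if the predicted loss is tolerable."""
    return expected_yield >= minimum_yield


# Hypothetical numbers: 0.30 V target threshold, 20 mV sigma,
# dice accepted within 50 mV either side of the target.
predicted = parametric_yield(mean=0.30, sigma=0.02, low=0.25, high=0.35)
print(f"predicted parametric yield: {100 * predicted:.1f} per cent")
print("go to production" if go_to_production(predicted, 0.95) else "rework the design")
```

In practice several parameters vary at once and are not independent, so a real analysis would need a joint distribution rather than a single Gaussian.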