Cerebras wafer architecture

Smaller chips, bigger packages, even bigger headaches

Image credit: Cerebras

Chiplets provide a way to pack more silicon into a device but the technology has issues.

When OpenAI unveiled its GPT-3 engine and showed how its AI model could generate sensible text seemingly out of nothing based on a few cues, it set a new limit in how much processing and memory a language model could use. At 175 billion trainable parameters, GPT-3 is enormous. But its time at the top of the supermodels did not last long. Google, which introduced the Transformer concept on which many of today’s AI language models are based, soared past it earlier this year with the 1.6 trillion-parameter Switch.

At his company’s Spring Processor Conference in April, analyst Linley Gwennap claimed language models are currently growing at the rate of around 40 times a year. Apparently, the rapid growth is worth it and not just in natural language processing. The results from other areas, such as image recognition, show size improves overall accuracy though the rate of growth is a mere doubling each year for those types of machine-learning model.

Short of a major change in attitude to compute efficiency in AI, chip designers are left trying to keep pace with the growth in demand. Though Moore’s Law is just about able to keep up with image-processing models, as long as you can rely on simply adding more processors to handle the extra workload, those language models need a lot, lot more. On the AI training side, the processing gap led to the arrival of wafer-sized processor arrays of the kind sold by Cerebras.

The larger models are pushing the creation of bigger chips for inferencing as well as distributed arrays for training. But there is a limit to how big a chip can go. One hard limit is the reticle, the useable area of a lithographic mask that is used to define features on the silicon wafer. This area has remained pretty much fixed for decades at a little over 8.5 square centimetres.

To get to its wafer-sized processor array, Cerebras and foundry TSMC had to develop a way of overlapping exposures to define the metal wires used to route signals between the chips. As they are not trying to build wafer-sized processors, most companies are breaking out of the reticle through the use of multiple-chip packages. They have chiplets made independently and then soldered or compressed onto a large silicon interposer or an organic plastic substrate that is able to deal with far finer wires than a regular PCB.

Unlike Cerebras, they are also using somewhat smaller chiplets than they could. They will have to reduce chip size eventually anyway, as the next generation of lithographic scanners is likely to halve the maximum reticle size because the scanners cannot keep everything in the larger area in focus. But there is another reason why they need larger quantities of smaller chips. That is yield.

As has been the case for many years, random defects lead to dud chips. The problem for big chips is that the bigger they get, the higher the probability of each one failing because of one of these defects. Unless you are able to deploy a lot of redundancy, which is what Cerebras does, the only economic answer is to try to make the chips as small as you can. Breaking a design into multiple chiplets provides a way to do that.
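The link between die area and yield can be sketched with a simple Poisson defect model, a common first approximation that the article itself does not name. The defect density below is a hypothetical figure chosen only for illustration, and the function name is likewise an assumption.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of dice with zero random defects.

    Assumes defects land independently and uniformly across the wafer
    (a Poisson model), which is a simplification of real fab behaviour.
    """
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical defect density, for illustration only.
DEFECT_DENSITY = 0.1  # defects per square centimetre

# Compare a near reticle-sized die with progressively smaller chiplets.
for area in (8.0, 4.0, 2.0, 1.0):
    share = poisson_yield(area, DEFECT_DENSITY)
    print(f"{area:4.1f} cm^2 die: {share:.1%} expected yield")
```

Under these assumptions the expected yield falls exponentially with area, which is why splitting one large die into several small ones can recover so much good silicon.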

There is no free lunch, explained Intel principal engineer Robert Munoz in a recent online conference on chiplets organised by the trade group MEPTEC. “There are tiling overheads. You need die-to-die interfaces, which consume area and power. And they can cost performance versus a hypothetical monolithic alternative, though that’s not always fair because you might not get viable yields with the monolithic or even fit on a reticle.”

There are other costs that can leave a monolithic design as the more cost-effective option. “Inventory management can be more costly and complex. If you are integrating die from third party partners, you have to worry about margin stacking and inventory carrying costs,” Munoz said.

A bigger problem is making sure the yield you gained from using smaller chips does not go up in smoke because what seemed to be working chips on the wafer do not work properly once the entire package is finished. In a multichip module that, at the high end, could have more than 100 individual dice once you’ve added up all the processors, memory chips and I/O controllers, individual yields of more than 90 per cent do not look that good. You are easily looking at getting less than half the total out of the packaging plant intact. Those will be expensive losses.
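The arithmetic behind that warning can be sketched in a few lines. This assumes every die in the package must work, that die failures are independent, and that assembly itself adds no further losses; the function names and the example yields are illustrative, not figures from any manufacturer.

```python
def package_yield(die_yield: float, die_count: int) -> float:
    """Probability that all dice in a package are good.

    Assumes independent failures and no redundancy.
    """
    return die_yield ** die_count


def required_die_yield(target_package_yield: float, die_count: int) -> float:
    """Per-die yield needed to hit a target package yield."""
    return target_package_yield ** (1.0 / die_count)


# Illustrative per-die yields for a hypothetical 100-die module.
for die_yield in (0.90, 0.99, 0.995):
    print(f"die yield {die_yield:.1%} -> package yield "
          f"{package_yield(die_yield, 100):.2%}")

print(f"per-die yield needed for 50% of packages to work: "
      f"{required_die_yield(0.5, 100):.2%}")
```

Even a per-die yield of 99 per cent leaves fewer than half of such 100-die packages fully working under these assumptions, which is why screening out bad dice before assembly matters so much.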

Redundancy is one way forward. As a lot of the proposed chiplet-based designs are multiprocessor arrays, redundancy is reasonably easy to achieve. But there is still a big testing problem facing chiplet users.
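How much redundancy helps can be estimated with a binomial model. This is a minimal sketch that assumes identical, independently failing dice and that any spare can stand in for any failed die, which real routing constraints may not allow; the counts and yield used are hypothetical.

```python
from math import comb


def yield_with_spares(die_yield: float, needed: int, spares: int) -> float:
    """Probability that at least `needed` dice work out of needed + spares.

    Assumes independent, identical dice and fully interchangeable spares.
    """
    total = needed + spares
    return sum(
        comb(total, good) * die_yield ** good * (1.0 - die_yield) ** (total - good)
        for good in range(needed, total + 1)
    )


# Hypothetical module: 100 working processor dice required, 99% per-die yield.
for spares in (0, 1, 2, 4):
    print(f"{spares} spare dice -> package yield "
          f"{yield_with_spares(0.99, 100, spares):.2%}")
```

In this simplified model a handful of spare dice lifts the package yield sharply, at the cost of the extra silicon and the logic needed to route around failed units.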

For the most part, it’s hard to tell whether a chip is going to work before it is bonded into a package. The I/O pads on the top of the chip when it is sitting alongside its siblings on a wafer are tiny and extremely hard to test directly. To make a good contact with a probe, you need to use a fair amount of pressure and that risks damaging the contacts and the electronics underneath. Though foundries do probe wafers, the tests are far from exhaustive and really just exist to screen out the obvious duds.

One option that Professor Abhijit Dasgupta of the University of Maryland proposes is to give up some of the wafer area to large sacrificial pads at the side of the chip that provide access to on-chip test logic. Another option that David Hiner, senior director of advanced package technology integration at Amkor, sees as potentially viable is to put a temporary layer of copper pillars on top of the chip to provide a usable set of contacts and then etch them away before the chip is assembled into the target package.

Without good screening for duds, chip integrators have to take a chance on yield being some way from 100 per cent and either rely on redundancy to obtain enough good parts or try to find ways to remove partially assembled packages from the manufacturing line as soon as they trigger a warning. This is an approach used by AMD, which has worked with its suppliers on ways to test partially finished packages as they move through the factory.

A greater focus on statistical analysis may help improve overall yields, says Phil Nigh, distinguished member of technical staff at Broadcom. Very often, wafers or batches of them have bad zones that can act as warnings of poorer quality. In principle, software could watch for changes in early testing that signal whether some wafers need more extensive testing than others before their chips go into the final package. That raises other issues with the supply chain: will suppliers trust customers with such in-depth data? Or will this mean chiplet users will only be able to rely on the chiplets that are designed by others in the same organisation?

The problems of test and supply that surround chiplet-based design will likely mean that, for some years to come, it remains a viable option only for the larger chipmakers or those with high enough margins to absorb the R&D cost. We might reach the point where it becomes much more like the PCB-assembly environment, one in which you go and buy chiplets from a catalogue and tell the suppliers which packaging plant to send them to. But people such as Nigh expect that, for a decade, chiplet users will work in small clubs that are willing to share the sensitive data about yield and test quality with each other.
