Cerebras wafer architecture

Breaking up is hard to do

Image credit: Cerebras

When you’re trying to find ways to avoid putting everything on one chip.

You never used to hear much from the companies that put chips into packages: the outsourced semiconductor assembly and test (OSAT) providers, in market-analyst parlance. That is beginning to change as the mismatch grows in some sectors between what designers want to put on a chip and how much they can squeeze on. The OSATs are now central to how the chipmaking industry is going to evolve.

A big driver is the artificial intelligence (AI) business. It’s a sector that has, for the moment at least, an insatiable thirst for processors and memory, preferably all on one chip. If you’re called Cerebras, one chip is nowhere near enough. As pictured above, you need almost an entire wafer to power one of the company’s AI-training accelerators.

Other suppliers, such as AMD, which launched a crop of server processors in the past few days, are keen to keep costs down. At the VLSI conference last year, senior engineer Samuel Naffziger showed a graph that illustrates the problem many chipmakers have. Although scaling is predicated on the idea that cost per transistor goes down with each generation of silicon, the cost per square millimetre has been going up, and in recent years the climb has become much steeper. In the five generations from 45nm to 14nm, the cost per square millimetre merely doubled. As you get at least ten times more transistors out of the same area at 14nm compared to 45nm, that seems a pretty sweet deal.

There is bad news for chipmakers, though, that should make the OSATs happier. The cost per square millimetre of 7nm is about double that of 14nm, with just one generation of silicon in between. Though there is now little doubt that foundries can push technology to a notional 2nm, they have made no promises that it will be a cheaper process. As 2nm probably means stacking transistors on top of each other, you can look forward to even more dollars per square millimetre.
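The arithmetic behind those figures is worth spelling out. A quick back-of-the-envelope sketch, using only the ratios quoted above (the values are normalised to 45nm, not real dollar figures), shows why the 45nm-to-14nm run was such a good deal:

```python
# Cost-per-transistor arithmetic from the ratios quoted in the text.
# All values are relative to 45nm; absolute costs are not real figures.

def cost_per_transistor(relative_area_cost, relative_density):
    """Cost per mm^2 divided by transistors per mm^2, both vs. 45nm."""
    return relative_area_cost / relative_density

c45 = cost_per_transistor(1.0, 1.0)    # 45nm baseline
c14 = cost_per_transistor(2.0, 10.0)   # cost/mm^2 doubled, density up >=10x

print(f"14nm vs 45nm cost per transistor: {c14 / c45:.2f}x")
# 0.20x: transistors got roughly five times cheaper over five generations
```

At 7nm the cost per square millimetre doubles again after only one intervening generation, so for the per-transistor cost even to hold level, density would have to double over that same short stretch; the article quotes only the cost side of that ratio.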

There is a second problem. The ability to squeeze in more transistors really only means minimum-size transistors for digital circuitry. Analogue circuitry has not benefited much from scaling for many generations. The inability to push more than a couple of volts into today’s advanced silicon has made the job of the analogue designer far harder. Even in the digital domain, designers are finding the benefits of moving to the next node are limited. If the design is limited by the number of connections a processing element needs to make to its neighbours, it can make more sense to stick with an older process. If you have elements that need to talk to external I/O and memory, they also often work out to be more cost-effective in older processes.

Like Cerebras, AMD has found an answer: split the functions up so that only the bits that can justify the fancy new process get it. The other elements, such as memory and I/O managers, called "uncore" in the world of AMD and Intel, can go onto silicon made on older processes. You then take those bits and assemble them in one package so the whole thing can be sold as one device. This has the spinoff benefit of making it easier to offer lots of different options, such as a range of core counts in the case of AMD's multicore processors.

Other companies are looking seriously at the chiplet option, where you do not necessarily design all your own bits but buy in components such as processor complexes and I/O interfaces. It is much like ordering parts for a PCB, conceptually. Practically, things get tricky when dealing with chips shorn of their usual packaging.

At the recent Technology Unites Global Summit, organised by trade group SEMI, Yin Chang, senior vice president of OSAT specialist ASE Group, went through a list of the options chipmakers have to put multiple chips into one package. Even foundry TSMC, which started a sideline in packaging technology some years ago, has three different approaches. Which technique to use is a far from straightforward choice.

Intel’s programmable-logic unit, which competes with Xilinx, the company that AMD decided to buy last year, likes to use its parent’s EMIB technology. This puts tiny chips that are blank except for circuit traces underneath two adjoining chips. This provides a cleaner signal path between them than trying to use copper wiring drawn onto the surface of the surrounding plastic package. These bridge chiplets avoid the need to put high-speed I/O interfaces onto the core digital device. EMIB works well here because the connections to those bridges can easily be placed along either side of the core device.

When you look at something like a multicore processor, an approach like EMIB does not work so well because signals may have to cross several chips to get to their destination. One answer might seem to be to use one big EMIB-type layer: a silicon interposer. This has two drawbacks. It is expensive, given that you need to make blank wafers and use them as glorified PCBs. The second is that electrical signals cannot easily cross a lump of silicon that big. You either need active repeaters on the interposer or you redesign the I/O on the chips that sit on top. By that point, you might as well bite the bullet and design the I/O to work over the copper traces used in conventional packages. That is more or less what AMD does.

Another problem is how much area these multichip packages take up. You could stack chips on top of each other, which is already a common approach in the processor-and-memory devices that go into mobile phones. This is only a cheap option if you have the wiring go around the outside and, again, design the I/O to cope with PCB-style routing. That increases power consumption. One way to avoid that is to drill holes through the chips themselves and use through-silicon vias (TSVs) to route signals from the bottom to the top. The bad news is that TSVs chew up a lot of area because they create stress around each little channel. And they are expensive. AI customers are looking to use a 3D-stacked memory format called HBM, but even they balk at the cost.

On top of these choices, chipmakers are faced with the problem of how to test their devices before they go into these more complex packages. Right now, the answer is they don’t. The kind of testing you can do before a device is packaged is pretty limited. If a chip is a dud, it will render the combo useless, but there isn’t a good way to tell for sure if it is a dud before it goes in. So they do a bit of basic screening and try to exploit techniques such as redundancy in the hope that any flaws are not complete chip killers.
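The stakes of skipping pre-package test can be put into numbers. A hedged sketch, assuming each bare die independently has the same probability of being good (the yield figures here are illustrative, not industry data):

```python
# Known-good-die arithmetic: if each unpackaged die is good with
# probability y, a package combining n dice works only if all n are good.
# The 95 per cent yield below is an illustrative assumption.

def package_yield(die_yield: float, num_dice: int) -> float:
    """Probability that every die in an n-die package is good."""
    return die_yield ** num_dice

print(package_yield(0.95, 1))  # 0.95: one die, one chance to fail
print(package_yield(0.95, 8))  # ~0.66: a third of 8-die packages are scrap
```

Each extra die multiplies in another chance of scrapping the whole package, which is why redundancy and better pre-screening matter so much more here than they do for single-chip products.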

These issues mean that packaging is becoming a much more important part of the picture, and it will take some years before winners emerge. They will be the companies that can combine low cost with performance, screen incoming devices effectively, and perhaps whittle down the currently bewildering array of packaging choices.
