Optical interconnect technologies are moving into the heart of the supercomputer.
How do you keep computers communicating? In offices and homes, the answer is usually to check the network cabling or reboot a router. In data centres and supercomputers the issue goes much deeper. These systems produce and consume vast amounts of data and need careful engineering if they're going to avoid the perils of data feast or data famine, in which costly computing resources stand idle because they are waiting to ingest raw information or excrete digested results.
Bert Jan Offrein, manager of the photonics research group at IBM's Zurich Research Laboratory, puts it this way: supercomputer performance is rising tenfold every four years, so over ten years we can expect to see a thousand-fold increase in performance - from petaflops today to exaflops in 2020 (where a petaflop is 1,000 trillion floating-point calculations per second, and an exaflop is 1000 times more than that). If computing performance is going to increase a thousand-fold in ten years, then interconnect performance will have to do the same - and that's going to take some big changes to the way computers are designed.
'It's hard to see us doing that within the electrical domain,' says Offrein.
Electricity vs light
According to Offrein, a single wire can now carry about 10Gbit/s of data between slots in a data-centre rack, or between racks, over distances of up to about 10m. Given that there can be hundreds of connections between racks and thousands of connections between cards in a rack, physical limitations mean that this approach won't scale up to match the demands for connections and bandwidth in 2020 - there's just not the space. The energy involved in driving so many long lines so fast is also a factor.
Offrein and his colleagues in Zurich and elsewhere in IBM's global research network have been working on using optical communications in computing for some time. They've shown that they can create much denser optical interconnections than electrical, for example by squeezing 144 optical 10Gbit/s connections into the same sectional area as 21 electrical connections running at the same rate.
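Those figures imply roughly a seven-fold gain in bandwidth density at the same per-line rate. A quick sketch, using only the numbers quoted above:

```python
# Bandwidth-density gain implied by IBM's demonstration:
# 144 optical vs 21 electrical links, both at 10Gbit/s,
# in the same cross-sectional area.
optical_links = 144
electrical_links = 21
rate_gbps = 10  # per link, both technologies

print(f"Optical aggregate:   {optical_links * rate_gbps} Gbit/s")
print(f"Electrical aggregate: {electrical_links * rate_gbps} Gbit/s")
print(f"Density advantage:   {optical_links / electrical_links:.1f}x")
```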
'And there's no issue at all to continue to scale this by a factor of 10 or 100,' he said.
His colleagues in the systems division, which actually builds computers and supercomputers, didn't show much interest in his work at first, but are now beginning to come around: 'The vision within the systems division has changed. The project started in research and was not well accepted in the system division until about one or two years ago.'
But back in 2008 IBM took an important step towards optical interconnect as part of building Roadrunner, the first supercomputer to sustain a petaflop calculation rate, for Los Alamos National Laboratory. The machine uses 55 miles of 'active optical cables', which include optoelectronics and waveguides, so its signals are carried over long distances using light but presented at each end as an electrical signal on a standard connector. The machine designers get the benefit of optical communications while maintaining a familiar electrical interface.
Backplanes and boards
Supercomputer makers may soon have to give up that comfortable compromise if they want their machines to keep improving as expected. IBM has estimated that to build an exaflops computer in 2020, optics will have to replace the electrical backplane in each cabinet by 2012 and the electrical PCB by 2016, and that optical interconnections will have to be used on the processor chips, or at least between them and their memory, by 2020. The same analysis reiterates the staggering scale of the communications challenge in these supercomputers: the optical bandwidth used in Roadrunner comes in at 1.2 × 10⁵Gbit/s, consumes 12kW and cost $2.4m. For an exascale computer, the aggregate bandwidth provided by optical communications is expected to be 4 × 10⁹Gbit/s, the optical power consumption will be 8MW and the cost of all that optical interconnect technology will be $200m.
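Dividing those headline numbers through shows how sharply the energy and cost per bit must fall between Roadrunner and an exascale machine. A back-of-envelope calculation, using only the figures quoted above:

```python
# Energy and cost per unit of optical bandwidth.
# Roadrunner (2008): 1.2e5 Gbit/s, 12kW, $2.4m.
# Exascale estimate (2020): 4e9 Gbit/s, 8MW, $200m.

def per_gbit(bandwidth_gbps, power_w, cost_usd):
    """Watts and dollars per Gbit/s of optical bandwidth."""
    return power_w / bandwidth_gbps, cost_usd / bandwidth_gbps

rr_power, rr_cost = per_gbit(1.2e5, 12e3, 2.4e6)
ex_power, ex_cost = per_gbit(4e9, 8e6, 200e6)

print(f"Roadrunner: {rr_power * 1e3:.0f} mW and ${rr_cost:.2f} per Gbit/s")
print(f"Exascale:   {ex_power * 1e3:.0f} mW and ${ex_cost:.4f} per Gbit/s")
print(f"Required efficiency gain: {rr_power / ex_power:.0f}x in energy, "
      f"{rr_cost / ex_cost:.0f}x in cost")
```

In other words, energy per bit has to drop roughly fifty-fold, and cost per bit several hundred-fold, for the exascale numbers to work.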
IBM can see the opportunity and has been working on each of these steps for a while. For example, it has designed and built an optical backplane for linking different levels in the same rack. The backplane is made up of multiple layers of a flexible material, with each layer including multiple waveguides. The layers are stacked on top of each other and terminated in four plugs at either end, each of which aligns the embedded waveguides with fibres in the corresponding sockets. The result is a light and flexible connector that can carry hundreds of gigabits per second, using less energy than its electrical equivalent.
IBM has also looked at ways of optically connecting individual cards on each level of a rack to the backplane on that level, using self-aligning butt joints so that waveguides on the card can be pushed up against transceivers in the backplane and still make a good optical connection.
Once data is on a processor card in an optical form, what can you do with it? The bandwidth density problem just gets worse as you get closer to the processor, which may have more than 1,400 pins running at high speed and so could be sloughing off terabits of data per second. Even 10Gbit/s fibre connections will not have the bandwidth density to handle this much data.
'If you get terabit-per-second from the processor but can only put 10Gbit/s down a single fibre, you'll need thousands of fibres, which means we actually have to go to a PCB-like optical interconnect scheme,' said Offrein.
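The fibre-count arithmetic behind that quote is simple: at 10Gbit/s per fibre, every terabit per second of off-chip bandwidth needs another hundred fibres, so a processor shipping tens of terabits per second would indeed need thousands. A quick sketch:

```python
# Fibres needed at 10Gbit/s per fibre, for a range of
# off-chip bandwidths (illustrative values, not from the article).
fibre_gbps = 10  # one fibre, as quoted by Offrein

for tbps in (1, 5, 10, 40):
    fibres = tbps * 1000 / fibre_gbps
    print(f"{tbps:>3} Tbit/s off-chip -> {fibres:.0f} fibres")
```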
IBM has been experimenting with a couple of approaches, building PCBs that include embedded or surface-mounted optical waveguides that can create very dense high-bandwidth connections.
The embedded version of the technology involves depositing a layer of polymer on a substrate, often an FR4 electrical PCB, and then using lithography or direct writing with laser beams to differentially polymerise the material, creating 50 × 50µm waveguides within the depth of the material. The resulting waveguides lose about 5dB per metre when carrying 850nm light, making them useful for board-crossing connections. As with electrical PCBs, extra layers can be added to accommodate more waveguides and ease routing. Waveguides can also turn corners and even intersect each other without suffering excessive losses.
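Decibel loss converts to a power ratio as 10^(-dB/10), so the quoted 5dB per metre means roughly a third of the launched optical power survives a full metre of waveguide. A quick check of what the figure implies:

```python
def transmitted_fraction(loss_db):
    """Fraction of optical power remaining after a given dB loss."""
    return 10 ** (-loss_db / 10)

loss_db_per_m = 5.0  # quoted figure for the polymer waveguides at 850nm

for length_m in (0.1, 0.5, 1.0):
    frac = transmitted_fraction(loss_db_per_m * length_m)
    print(f"{length_m:.1f} m: {frac * 100:.0f}% of power transmitted")
```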
IBM worked with Varioprint and IntexyS Photonics to create a prototype optical interconnect system that buried 12 waveguides in a board, routing the waveguides from an edge connector to an electro-optical transceiver connected to 10Gbit/s electrical links. The approach was used to build a 120Gbit/s link in much less space than its electrical equivalent would have taken.
Chip to chip
The point of bringing optical signals on to a board is to bring high-density, high-bandwidth communications close to the processor and memory systems so that they don't have to wait to consume data or produce results.
The question then is how to make the link from the optical to the electrical domain. Do you use a hybrid approach, in which an optical transceiver mounted on the PCB intercepts the optical signals and routes their electrical equivalents over the last few centimetres of PCB traces? Do you go for a multichip module, where the optical transceivers sit next to the bare processor die on a shared substrate? Or do you even hope for full electro-optical integration, with optical transceivers built on the same die as the processor and memory?
This is complex systems engineering, and each approach has advantages and disadvantages. In semiconductors, for example, it's possible to integrate many functions (think of microelectromechanical systems, say, or RF circuits) but not necessarily desirable from a cost and yield point of view. On the other hand, building complex stacks of photonic and electronic circuits on a variety of substrates, using a variety of connection methods, to create a '3D IC' has its challenges too.
Those challenges may have to be faced sooner rather than later. Offrein argues that 10Gbit/s per waveguide won't offer enough bandwidth and that engineers will have to increase the data-rates they carry to keep up with demand. That means moving to silicon photonics in order to use techniques borrowed from the telecommunications industry, such as wavelength division multiplexing, to increase bandwidths.
'It's still an open question,' he said. 'Do we integrate III/V materials [widely used in photonics because of their bandgap properties] on silicon or do we have separate sources a little way away?'
That remains to be seen. In the meantime, IBM and the US universities Columbia, Cornell and UCSB have already forecast the state of the art in optoelectronic communications for supercomputers. By 2018, when 22nm CMOS chip manufacturing processes should be available, their forecast suggests that a single chip could carry 10TFLOPS of processing power, the equivalent of 36 of the high-performance Cell chips used in the Sony PlayStation 3. A second silicon layer would carry around 30GB of embedded memory. And a third layer, of optical interconnect and routing, would connect all the cores to each other and to the outside world, with an aggregate on-chip bandwidth of more than 70Tbit/s and an aggregate off-chip bandwidth of 70Tbit/s.
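One way to read that forecast is as a balance ratio, bits of off-chip bandwidth per floating-point operation, a standard way of judging whether a machine can keep its cores fed. Dividing the quoted figures through:

```python
# Balance of the forecast chip stack, from the figures quoted above.
tflops = 10          # forecast processing layer
offchip_tbps = 70    # forecast off-chip optical bandwidth
cell_equivalents = 36

bits_per_flop = (offchip_tbps * 1e12) / (tflops * 1e12)
implied_cell_gflops = tflops * 1e12 / cell_equivalents / 1e9

print(f"Implied per-Cell rate: {implied_cell_gflops:.0f} GFLOPS")
print(f"Off-chip balance: {bits_per_flop:.0f} bit/FLOP "
      f"(~{bits_per_flop / 8:.2f} byte/FLOP)")
```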
Producing such a chip, or stack of chips, would represent a substantial engineering achievement on its own. But as with today's supercomputers, that would still leave the challenge of getting data into and out of such a highly concentrated processing core. In supercomputing, it seems, the challenge always is to keep tackling the bottlenecks, wherever they migrate to next.