
Programmable logic seeks a home in cloud computing

Image credit: Dreamstime

C programmers are the targets for a new generation of hardware accelerator. But it's far from being the first attempt to capture them.

Two decades ago, a group of researchers from the University of Oxford created a spinout company, Celoxica, to commercialise work aimed at giving software programmers a way to design hardware circuitry without having to deal with traditional hardware tools.

Celoxica’s management took the view that programmers and software-oriented architects would respond much better to building systems using C rather than learning to use a conventional hardware-description language (HDL) like Verilog or VHDL. They saw a big potential market being opened up by a switch to service-based business models in electronics that would sweep traditional OEMs aside. A set-top box, for example, would just be a box of reconfigurable logic into which service providers would upload functions. Having those functions implemented in hardware circuits would reduce power and boost performance beyond what was possible with software running on microprocessors. A bunch of start-ups along with Altera and Xilinx – makers of programmable-logic devices that had previously been focused on providing custom circuitry around much denser processors based on hardwired logic – struck up partnerships with Celoxica and similar tools providers.

At the end of the 1990s, concepts like reconfigurable computing looked to be viable alternatives to classic fixed-function hardware tied to microprocessors. But the dot-com crash put paid to a lot of the short-term interest in network-delivered services. Added to that, the cost of reconfigurable computing was way too high to compete with microprocessors. At the time, mainstream processors were still ramping up in terms of performance. It would be another four or five years before clock speeds hit the wall, topping out at around 3GHz. The ten-fold or higher silicon-area tax that reconfigurability imposed was difficult to swallow at a time when considerations like profitability suddenly regained their importance.

The first wave of reconfigurability start-ups wound up being bought for their IP by larger companies that still had money, though the FPGA makers were able to fall back on their traditional markets. As the world economy geared up for the credit crunch, programmable accelerators started to look more attractive once again. This time, the focus was on supercomputers and, to a lesser extent, financial trading itself. Supercomputer users found there were limits to how much parallelism they could exploit from using multiple standard processor cores and they had no realistic way to push clock speeds higher. Banks and brokers wanted to exploit low-latency computing techniques in a high-frequency trading arms race and programmable hardware looked a good bet.

But before the idea could get much headway, another crash beckoned. Despite the chaos in finance, Celoxica opted to go with financial trading and sold off the tools used to create hardware from C. It would continue to use programmable logic but in accelerator cards designed to front-end trading-analysis computing systems. High-frequency trading itself took a pasting in the week of the May 2010 Flash Crash but it retreated into the background rather than disappearing completely. However, once again, reconfigurable accelerators and concepts like software-defined hardware took a back seat across the technology industry.

Fast-forward a decade or so and we can see the concept is back with a vengeance. This time, one of the big driving forces is cloud computing. As supercomputer users found in the mid-2000s, general-purpose processors have almost run their course. Adding more of them pushes the electricity bill ever higher. For the applications the so-called hyperscalers such as Amazon, Facebook, Google and Microsoft Azure now want to run, the payback on Intel Xeon-class processors is getting worse. They are getting more performance for the dollar out of accelerator cards. They quickly leapt on graphics processing units (GPUs) when they were shown to be able to crunch through AI training software much more aggressively.

Five years ago, Baidu and Microsoft Azure took turns at the Hot Chips conference in Silicon Valley to describe how they could achieve even better power-efficiency from replacing GPUs with FPGAs for a lot of machine-learning tasks. The following winter, Intel decided to buy FPGA-maker Altera to combine its parts with Intel’s Xeon processors.

Since the early experiments with AI, cloud-computing operators have been looking more closely at what else they can accelerate with FPGAs. Tools like Apache Spark, Hadoop MapReduce and Hive are important targets. As Celoxica found with its financial systems, FPGAs are highly suited to streaming data. Using circuits and lookup tables, wrangling data as it floods in from a network or storage unit is a quick operation. If you do it in a processor, you have to keep shuffling data in and out of registers: a slow and wasteful process. As FPGAs can run these and AI operations in a pipeline, the efficiency gains can be significant compared to processors coupled to GPUs.

Celoxica’s language was not normal C but a dialect the researchers named Handel-C. This came with a bunch of extensions that could address the things that vanilla C leaves out but are pretty essential in hardware development: things like timing and parallelism. The technology wound up inside Mentor, now a Siemens subsidiary, but was effectively displaced in favour of a different approach to defining hardware behaviour in C. This relies on the use of 'pragma' directives, which are part of the core C language but are there to allow vendor-defined instructions to be baked into the source code.
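To give a flavour of the pragma-based style, here is a minimal sketch in plain C. The `#pragma HLS PIPELINE II=1` line follows the spelling used by Xilinx's HLS tools; other vendors spell their directives differently. The function name `scale_and_clip` and its parameters are invented for illustration. An ordinary C compiler does not recognise the directive and skips it, possibly with a warning, so the same source still builds and runs as software.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: scale each sample and clip it to a ceiling.
 * The name and parameters are illustrative, not from any vendor library.
 * Inputs are assumed small enough that in[i] * gain does not overflow. */
void scale_and_clip(const int32_t *in, int32_t *out, size_t n,
                    int32_t gain, int32_t ceiling)
{
    for (size_t i = 0; i < n; i++) {
        /* Vendor directive (Xilinx-style spelling, assumed here): asks an
         * HLS tool to start a new loop iteration every clock cycle.
         * A normal C compiler ignores pragmas it does not recognise. */
#pragma HLS PIPELINE II=1
        int32_t v = in[i] * gain;
        out[i] = (v > ceiling) ? ceiling : v;
    }
}
```

The point of the approach is that the timing and parallelism hints live in directives rather than in new language syntax, so the code stays legal C.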

Mentor’s Catapult is mainly used for converting software algorithms into fully hardwired custom silicon. The job of taking C and implementing it in programmable logic has remained with the FPGA suppliers themselves, which have over the years launched a variety of tools. Altera had C2H in the mid-2000s; Xilinx added C-based high-level synthesis (HLS) to its Vivado tools. Intel refreshed its tools with an HLS compiler for the Altera parts in 2017. In early November, Xilinx launched a reworked HLS tool called Vitis that is aimed squarely at software users.

Rob Armstrong, director of technical marketing for AI and software acceleration at Xilinx, says: “We are appealing to a new market of developers who don't have a hardware background or RTL skills. We are trying to present data to them that isn't overwhelming.” At the same time, he adds: “The developers we are targeting are not unsophisticated. Engineers who use GPUs for acceleration understand things like cache behaviour. They aren't the JavaScript guys.”

For the low-level control they offer, Xilinx is sticking with C and C++ as the main input languages rather than attempting to move into the many languages that are now routinely applied in cloud computing. The reason, says Victor Peng, Xilinx’s CEO, is that most of the customisation work is going to be isolated to support functions that can be buried inside packages like Spark and Hive. The FPGAs simply provide accelerated functions to read, write and search data. The software itself just calls them as black-box operations with no idea as to whether they run on a processor or dedicated hardware.
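The black-box arrangement Peng describes can be pictured with a small C sketch. Everything in it is hypothetical: `count_matches`, `count_matches_cpu` and the `accel_backend` pointer are invented names, and no real Xilinx or framework API is shown. The caller uses one function; whether the work runs in software or is handed to an accelerator is decided behind it.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every backend (hypothetical). */
typedef size_t (*count_fn)(const int32_t *data, size_t n, int32_t key);

/* Plain software implementation, always available. */
static size_t count_matches_cpu(const int32_t *data, size_t n, int32_t key)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] == key) {
            hits++;
        }
    }
    return hits;
}

/* Would point at a hardware-backed routine when an accelerator is present.
 * Left NULL in this sketch, so calls fall back to the CPU version. */
static count_fn accel_backend = NULL;

/* The only function the calling software sees. */
size_t count_matches(const int32_t *data, size_t n, int32_t key)
{
    count_fn impl = accel_backend ? accel_backend : count_matches_cpu;
    return impl(data, n, key);
}
```

Because the choice of backend sits behind a single entry point, code written against `count_matches` would not need to change if an accelerated version were plugged in.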

“Most of the libraries are being created in C/C++. I see frameworks driving usage. Spark, Hive, Cassandra, FFmpeg and TensorFlow: these drive usage rather than the software languages. In almost every application area there are new frameworks coming in. It’s when we connect to them that we succeed,” Peng claims.

As with the earlier attempts to get programmable logic into mainstream computing, much depends on how well cloud computing can stay afloat. An economic slowdown may once again cause operators to think twice about making massive investments in hardware. But, 20 years on, designing hardware using software tools still looks a more realistic option because the other options are running out of steam.
