Startup tries to solve reconfigurability problem
A UK-based startup has launched its attempt to square the circle of reconfigurable processing: providing reconfigurability without massively increasing logic cost, a goal that has eluded previous efforts.
Colin Dente, CEO of Akya, said: “The past history of reconfigurable logic has been companies who try to produce completely general-purpose devices.”
The problem is that a general-purpose architecture tends to demand a large number of hardware gates for each configurable gate it provides. A typical field-programmable gate array (FPGA), the most general-purpose architecture in common use, has a ratio of between ten and twenty to one. Dente argued that Akya’s ART2 architecture provides reconfigurability for application-specific processing, which reduces the overhead.
“Our architecture is only just reconfigurable enough and no more. Our customers design a fabric using our technology that is specific to an application area,” said Dente. The fabric is meant to form part of a larger system-on-chip (SoC), either one that has to address a number of markets or one where implementing multiple dedicated hardware engines would push cost to a prohibitive level.
Dente cited the rise of internet TV as an example. The proliferation of video codecs in common use on the internet makes it tough to put support for all of them on a low-end TV that might be fitted with a WiFi or Ethernet port. By designing a combination of processing units that can be reconfigured on the fly to support different codec variants, a TV chipmaker could potentially save costs, Dente argued.
A fabric contains comparatively coarse-grained elements, such as adders, Boolean-logic units and finite state machines. The designer picks the bit-width needed for an adder at design time, rather than relying on configuration logic to combine narrow arithmetic logic units (ALUs) at runtime. This saves area at the cost of flexibility.
In a manner akin to the microcode programming of the complex instruction set computers (CISC) of the 1970s and 1980s, a sequencer program makes and breaks connections between processing elements. In contrast to conventional microcode engines, data moves directly between the registers on the processing elements themselves. This resembles the direct connections often used in FPGA-accelerated computers and the stream-processor approach developed by Professor Bill Dally of Stanford University, who recently joined graphics chipmaker nVidia as chief scientist. By cutting out unnecessary moves to and from main memory or cache, direct forwarding typically saves energy and improves performance at the cost of greater programming complexity.
“We have a fairly powerful address sequencer that defines the connectivity on each cycle,” said Dente. “In some ways it is very similar to a microcode sequencer. Our sequencer looks a little like a 2710 but the difference is that what it is controlling is not your father’s ALU.”
Getting a processing unit onto silicon is a two-stage process. The first stage uses a high-level hardware description language to define the elements that will be present on the fabric. The second stage uses a form of assembly language to define how data moves between the processing elements. Dente said a typical sequence involves fewer than 200 lines of code.
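The two-stage split can be illustrated with a toy software model. The Python sketch below is not Akya's hardware description language, assembly syntax or tooling, none of which are described in detail here; every name in it (Element, make_fabric, run, PROGRAM_A, PROGRAM_B, in0 to in2) is a hypothetical invented for illustration. It assumes a fabric of two 8-bit elements whose widths are fixed up front, and a sequencer program that lists, for each cycle, which registers or inputs feed which element.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Element:
    """Hypothetical coarse-grained processing element with one output register."""
    name: str
    width: int                     # bit-width chosen when the fabric is defined
    op: Callable[[int, int], int]  # two-input operation the element performs
    reg: int = 0                   # output register, read directly by other elements

    def fire(self, a: int, b: int) -> None:
        # Truncate the result to the element's fixed width.
        self.reg = self.op(a, b) & ((1 << self.width) - 1)


# Stage 1 (assumed): decide which elements exist on the fabric and how wide they are.
def make_fabric() -> Dict[str, Element]:
    return {
        "add0": Element("add0", 8, lambda a, b: a + b),
        "and0": Element("and0", 8, lambda a, b: a & b),
    }


# Stage 2 (assumed): a sequencer program. Each cycle is a list of connections
# written as (destination element, source A, source B). A source is either
# another element's register or an external input.
Cycle = List[Tuple[str, str, str]]

PROGRAM_A: List[Cycle] = [
    [("add0", "in0", "in1")],   # cycle 1: add the two inputs
    [("and0", "add0", "in2")],  # cycle 2: forward add0's register straight into and0
]

PROGRAM_B: List[Cycle] = [
    [("and0", "in0", "in1")],   # cycle 1: AND the two inputs
    [("add0", "and0", "and0")], # cycle 2: double the result using the adder
]


def run(fabric: Dict[str, Element], program: List[Cycle],
        inputs: Dict[str, int]) -> Dict[str, int]:
    for cycle in program:
        # Snapshot registers first so every connection in this cycle sees the
        # values left by the previous cycle.
        values = {name: el.reg for name, el in fabric.items()}
        values.update(inputs)
        for dest, src_a, src_b in cycle:
            fabric[dest].fire(values[src_a], values[src_b])
    return {name: el.reg for name, el in fabric.items()}


if __name__ == "__main__":
    stimulus = {"in0": 200, "in1": 100, "in2": 0x0F}
    print(run(make_fabric(), PROGRAM_A, stimulus))
    print(run(make_fabric(), PROGRAM_B, stimulus))
```

In this model the same two elements perform different computations depending only on which sequencer program is loaded, while results pass from one element's register to the next without a trip through memory. A real fabric would differ in many respects, including timing, parallelism and the way connections are physically switched.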
Recognising that algorithms for hardware are now being defined in C or SystemC and compiled directly using tools, Akya is working with a tools supplier on a flow that will use a SystemC description to target its architecture instead of custom hardware. Dente explained that most tools of this type generate a set of processing elements controlled by a state machine. The ART2 target would add flexibility. One research project aims to take several different algorithms and have the tool define an architecture that can support all of them, using the sequencer code to switch between personalities.