ARM adds DSP to Cortex microcontrollers
ARM has developed the Cortex-M4, a version of the Cortex-M3 core now used in a variety of 32bit microcontrollers. It adds digital signal processing (DSP) instructions found on the more expensive and complex R4, in a bid to encourage microcontroller users over from architectures such as Atmel’s AVR.
Shyam Sadasivan, product manager in ARM’s processor division, said the M4 is a significantly simpler core than the R4. “The R4 is much higher speeds and deeper pipeline. It has caches and tightly coupled memories and supports ARM instructions as well as Thumb,” he said.
But the M4 borrows many of the integer DSP instructions from the R4. “Cortex is upward compatible. The DSP instructions in the M4 are a subset of the instructions in the R4. The instructions are equal to the performance of the ARM11: they are the instructions that were on the ARM11.”
As well as integer DSP, ARM has made it possible to add a floating-point unit to the M-series processors.
“A lot of this is market-driven. The underlying requirement is signal processing. There are some customers who look more at the integer than the floating point. Take the MP3 decode in audio. A lot of that is still integer. But in industrial automation, a number of the tools, such as Matlab, are designed for floating-point execution.”
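The integer-versus-floating-point split Sadasivan describes can be illustrated with a minimal C sketch. It assumes the common Q15 fixed-point convention (a signed 16bit value representing a fraction between -1 and just under 1), which the article does not name; the function names `q15_mul` and `float_mul` are hypothetical and are not taken from any ARM library.

```c
#include <stdint.h>

/* Multiply two Q15 fractions (assumed format: value = raw / 32768).
 * The 32bit product is scaled back by 15 bits and saturated to 16 bits.
 * Note: right-shifting a negative value is implementation-defined in C;
 * mainstream compilers perform an arithmetic shift. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;
    if (p > INT16_MAX) p = INT16_MAX;   /* -1.0 * -1.0 would overflow */
    if (p < INT16_MIN) p = INT16_MIN;
    return (int16_t)p;
}

/* The single-precision equivalent needs no scaling or saturation. */
float float_mul(float a, float b)
{
    return a * b;
}
```

In this sketch, 0.5 is held as 16384 in Q15, so `q15_mul(16384, 16384)` yields 8192, or 0.25. The floating-point version is simpler to write, which is one plausible reason tool-generated control code favours it.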
Geoff Lees, general manager of NXP’s microcontroller division, said: “We do see both. Unlike the R4 and the R4F where the floating-point unit was a significant add-on, we see this for the time being universally included. Floating-point is more and more what programmers and engineers are looking to us to provide. I know some control engineers who are interested in 16bit and even 8bit floating-point.
“We are seeing some remarkable improvements through adding the DSP extension. We have put together some audio demos. One is a graphic equaliser that needs 90MIPS without using the DSP extension. That drops to 21MIPS or so with it. So there are clear implications for freeing up CPU times with control algorithms,” Lees explained. “And that is with a very modest increase in core size.”
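As a rough check on those equaliser figures, the arithmetic can be written out in a few lines of C. This is only a sketch: the helper names `speedup` and `fraction_freed` are hypothetical, and the inputs are the approximate 90MIPS and 21MIPS numbers Lees quotes.

```c
#include <assert.h>

/* Ratio of processing load without and with the DSP extension. */
double speedup(double mips_before, double mips_after)
{
    assert(mips_after > 0.0);
    return mips_before / mips_after;
}

/* Share of the original load that becomes available for other tasks. */
double fraction_freed(double mips_before, double mips_after)
{
    assert(mips_before > 0.0);
    return (mips_before - mips_after) / mips_before;
}
```

With the quoted numbers, `speedup(90.0, 21.0)` comes to roughly 4.3 and `fraction_freed(90.0, 21.0)` to roughly 0.77, so about three-quarters of the cycles spent on the equaliser would be released.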
Sadasivan added: “We see a lot of audio-accessory manufacturers looking at our M3 and the M4 is very attractive to them for systems such as Bluetooth headsets with audio processing.”
Lees noted: “The appeal of having the M4 is having a single unified architecture with one core. There have been control cores that have merged capabilities, such as the Infineon Tricore, and DSPs that have added support for control, such as the 56K from Freescale. But none of those have the ecosystem that the ARM microcontroller architecture has today. TI has done a great job making DSP accessible to college students but it is a proprietary ecosystem.”
“Our main differentiator is bringing the software ecosystem along,” Sadasivan claimed. “We are by no measure calling the M4 a DSP. But with dual MACs we could be in the same performance range. But the main difference will be the software ecosystem.”
Sadasivan pointed to the Cortex Microcontroller Software Interface Standard (CMSIS) as an example of the firmware the company will use to help promote the M4. “We are supporting the M4 through CMSIS. We have support for some of the more complex instructions, which need to be used with intrinsic functions. They are in the CMSIS set,” said Sadasivan.
The M4 floating-point unit only supports single-precision numbers to limit the size of the core. “The R4F has full double-precision as well, which is the main contributor to its size,” said Sadasivan.
Sadasivan said the floating-point unit on its own takes 25,000 gates and can despatch a maximum of one instruction per clock. The integer DSP, however, is a single-instruction, multiple-data (SIMD) unit, with support for parallel 8 and 16bit operations and single 32bit operations.
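What a parallel 16bit operation means in practice can be sketched with a portable C model of a dual multiply-accumulate, in which two 16bit samples share one 32bit word and both products are added to an accumulator in a single step. The helper names `pack16x2` and `dual_mac16` are hypothetical; on real silicon the work would be done by one SIMD instruction reached through a compiler intrinsic, and the exact mapping to a particular instruction is an assumption rather than something the article states.

```c
#include <stdint.h>

/* Pack two signed 16bit samples into one 32bit word:
 * 'lo' occupies bits 0-15, 'hi' occupies bits 16-31. */
uint32_t pack16x2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Portable model of a dual 16bit multiply-accumulate:
 * acc + x.lo * y.lo + x.hi * y.hi.
 * Overflow of the accumulator is not handled in this sketch.
 * Converting values above 32767 to int16_t is implementation-defined
 * in C; mainstream compilers wrap to the two's-complement value. */
int32_t dual_mac16(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(x >> 16);
    int16_t y_lo = (int16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(y >> 16);
    return acc + (int32_t)x_lo * y_lo + (int32_t)x_hi * y_hi;
}
```

For example, `dual_mac16(pack16x2(1, 2), pack16x2(3, 4), 10)` returns 21: the two products 3 and 8 are added to the starting value of 10. A filter loop built this way touches two samples per iteration, which is where the cycle savings in 16bit audio code would come from.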
“We have been working really hard to get to silicon as fast as possible,” said Lees. “We expect to have a test chip very soon. I hope to make an announcement in the coming months. If it goes like the M0 then we will be able to go to volume by the end of the year or the start of 2011.”
Lees said the 90nm process is a likely target for the M4 to take advantage of better memory density rather than performance. “The 90nm lower-power process doesn’t give you the clock improvement you would expect because of the concentration on low leakage.
“But the core is small enough to retrofit to 0.13µm, maybe for very specific motor controllers, possibly even embedded in the motor body itself.”
“In the 90nm generation, we are moving to an even wider flash memory bus than we had before, based on the knowledge that we are moving up in the frequency range. What we have seen since the version of the M3 is a very much improved memory path. And we get much better performance than ARM7,” said Lees. “We are getting 40 to 50 per cent better cycle times from memory at the same clock speed.”
Sadasivan said the M4 adds about 30 per cent to the core area compared to the M3; the floating-point unit adds a further 40 per cent to the space taken up by the M4. “All told, it is about 2x the M3,” he said.
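The area figures can be checked with a short C sketch. It assumes the two increases compound, that is, the floating-point unit's share is measured against the M4 rather than the M3, which is one reading of Sadasivan's wording. The function name `relative_area` is hypothetical.

```c
/* Relative area after applying successive percentage increases
 * to a baseline of 1.0 (here, the M3 core). */
double relative_area(const double *increase_pct, int count)
{
    double area = 1.0;
    for (int i = 0; i < count; i++)
        area *= 1.0 + increase_pct[i] / 100.0;
    return area;
}
```

Feeding in increases of 30 and 40 gives 1.3 × 1.4 = 1.82 times the M3 area, consistent with the "about 2x" quoted. If both percentages were instead measured against the M3, the total would be 1.7 times.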
Lees said there is a push towards greater concentration on analogue circuitry at NXP that will probably result in the greater use of multichip packages: “As part of the company’s migration to high-performance mixed-signal since the divestment of the mobile and set-top box units, we have been developing pipelined ADCs moving up to 20Msample/s. You need an appropriate processing engine to go with that.
“We see high-performance analogue becoming much more mainstream. The industrial market has always been very conservative but is becoming more focused on energy efficiency. With all the legislation and drive to reduce energy consumption, this is becoming one of the big drivers,” said Lees.
“The challenge in 90nm is that the analogue circuitry doesn’t shrink and nor does the power. You have to really consider whether to go multi-die.”
“I think multicore with different architectures and supporting extreme DSP implementations would be very much more difficult for the customers in the microcontroller world. It’s OK for SoC but not viable with the support required across an entire market. But multiple cores based on the same architecture: it’s attractive to combine the M0 with one of these higher-performance implementations,” said Lees, pointing to the use of the M3 in a set-top box chip designed by NXP that used a faster ARM processor for the main applications. The M3 acts as a power-management controller.
“The CoreSight debug system is fully compatible across the full range so there is one coherent development environment,” said Lees.