Monsters incapacitated

Multicore processors provide unprecedented compute density, but have wound up being the parts used to scare designers about the future. Are these things really beasts to program or sheep in wolves' clothing? E&T finds out.

Coregeddon, Corezilla, Coreblimey - the arrival of multicore processing in computing and embedded systems has often recalled the marketing for a B-movie. Indeed, what makes the analogy quite apt is that we may be promised the terrors of the Earth, but reality turns out to be an inoffensive stuntman in a cheaply assembled rubber suit.

There are, nevertheless, significant issues here, not made any simpler by the facts that (a) we are using a 'hardware' term to describe what is primarily a 'software' problem and (b) the concept still covers a whole range of possible use-cases.

Consider the two most frequent use-cases. First, there are those instances where a company wants to consolidate four separate processes that may have each occupied their own printed circuit boards within a system. It wants to bring all these functions onto one piece of silicon for reasons such as power efficiency, size and cost.

"Here, we already have the hypervisor technology that can perform the partitioning and isolation of the software so that the tasks run independently," says Kent Fisher, chief systems engineer at Freescale Semiconductor. Indeed, many of the other necessary components to realise this kind of project are already in place, as we will see below.

The second use case is the more challenging. Here, the goals are performance rather than technology driven. The software community would have loved ultimately to get single-core 10GHz processors, but it has now become clear that physics will force it to live instead with 10x1GHz processors.

"This is where you are getting more of the fear and uncertainty," says Fisher, "and that's particularly true where you have OEMs with a lot of legacy code."

In this second case, you can leverage symmetric multi-processing (SMP) technologies that are already in place for operating systems such as Linux. In return, you will get incremental improvements in performance and this is happening today.

But the legacy code issue is a nightmare. It will almost certainly have been written for a single-threaded, single-core environment. A multicore environment needs software that is multi-threaded, that is parallelised, that is - and here you should immediately see pound and dollar signs - rewritten and reoptimised for the new processor architecture.

US company Tensilica has been a long-time player in the configurable processor space, although one of its problems has arguably been that, despite some novel technology, its customers would prefer it to do the configuring for them - and for free. The shift to multicore, though, may see Tensilica's offerings come of age.

According to chief scientist Grant Martin, the company is looking at opportunities in the 'data plane', which manages the data flow in an embedded system, as opposed to the 'control plane', which manages its communications. More specifically, this means the kind of heterogeneous subsystem that might be found within a smartphone, embracing video, camera, MP3 and GPS functions in addition to the RF.

"There are a couple of initial things here," says Martin. "You have loosely coupled cores where what matters is the interface between them. Well, when you have configurability, you can better control that. But also you're not getting trapped into trying to do everything at a homogeneous general purpose processing level.

"For example, if you can do MP3 at 5.7MHz on an application-specific instruction-set processor, why go to general purpose, where you can't hope to get that efficiency? If you have a smartphone customer, he's going to want efficiency first."

Today, hypervisor technology is offered by many chip and operating-system vendors to manage this kind of system. But there is a concern: it can cope with today's multicore devices - typically four or eight cores, up to about 16 - but should Moore's Law translate directly into a 'Cores' Law', systems with tens, hundreds and even thousands of cores are not far off. At that point, managing task allocation could become more onerous than the tasks themselves.

"What is trivial here and what isn't? When you have four applications mapping to four cores and a need for communication and synchronisation within that system, it's not necessarily trivial but it is a relatively straightforward task. However, once the number of cores starts to multiply, it starts to become more complicated," says Markus Levy, president of The Multicore Association (TMA).

Martin points out here that Tensilica has TIE, its Instruction Extension language, which allows users to define inputs and outputs more closely rather than imposing a straitjacket based on a particular operating system.

TMA has also published specifications for MCAPI, a dedicated multicore application programming interface for communication and synchronisation between closely distributed cores or processors in embedded systems.

Don't panic!

"The big message is like the front page of the 'Hitchhiker's Guide To The Galaxy': 'Don't panic'," Martin says. "A lot of the concentration in multicore is at the general purpose level, at the control plane. What you need to ask yourself, though, is, 'Can we break that down, can we split things into a multi-plane world, and does that help us?'."

Consolidation is important and in many cases it is taking place today in instances where reducing power consumption or lengthening battery life are the most important goals for a design team. However, boosting performance will still normally be on the agenda, and it is in the pursuit of such boosts that we begin to see the greater problems facing designs for multicore silicon.

Most applications have been coded sequentially - that is, they have been written to be executed in a single thread on a single processor or core. In a multicore world, however, the greatest performance stands to be gained by having a program execute simultaneously across as many cores as are available - the program should be written to be executed in multiple threads in parallel. Or, to be more precise, the program should be rewritten. This is no simple task in an economic, practical or even realisable sense. Consider the views of three executives struggling with this problem.

"It varies a lot depending on the application, but projects are typically looking to reuse 60 per cent to 70 per cent of the existing code as they move from one generation to the next," says Frank Schirrmeister, director of product marketing for system-level solutions at EDA vendor Synopsys. "If you look at that in terms of the big telecoms infrastructure jobs, you could be looking at millions of lines of code, but you are looking at huge amounts of legacy code across the board."

And those millions of lines do not just pose a volume problem. "Very often, you are looking at a code base that has built up over years and years. Many of the people who wrote the original code will no longer be with the company or the records will be very vague," adds David Stewart, CEO of Scottish embedded software tools specialist Critical Blue.

"Another important thing here is that you don't just say, 'OK, I'm going to sit down and rewrite this as parallel code,'" says Max Domeika, senior staff software engineer in the developer products division at Intel. "It is a discipline in itself. We have been using multicore processors for some time now, and the high-performance computing space has been working with parallelism in programming for many years, but it is still a comparatively small number of people who are experienced in the area."

There are some techniques available. "You can leverage SMP with Linux, but it won't give you 2x - it won't give the customers what they may have been used to or expect, and they are definitely looking to us to help them out here," says Freescale's Fisher.

Critical Blue

In this discussion, Critical Blue often comes up: in particular, its Prism software, which was publicly launched earlier this year. A simple description of how it works is that it analyses existing code and gives the user an idea of what can be done and what will happen under various scenarios, and then helps to parallelise the program and verify the process.

"The important thing that we do is to help the user see what his ROI [return-on-investment] may well be," says Stewart. "He will have his constraints, and they are going to be a mix of cost, performance, time-to-market and so on. Companies have gone into the process pretty blind on those points. OK, we'll work on the software but we have no idea how much it will cost, what our options are, what we need to do or even if we have the people to do the job."

At tools supplier Synopsys, Schirrmeister is a fan. "I really like what they [Critical Blue] are doing because it does come at this from the perspective that it really is all about the programming and you have to ask those questions: 'Can I afford this?' 'Can I manage this?'

"What I think that we also bring to this, though, is the expertise in virtual platforms. It's all about the fact that nobody can afford to write a blank cheque on all this. You need to have the pre-silicon platforms to look at the software development and what it means before you really start to roll up the hardware costs as well."

So far, it does appear that an infrastructure is beginning to develop around multicore to take the pain away. Indeed, software analysis combined with the kind of virtual hardware prototype long envisioned, but not that often exploited, by ESL design strategies seems an eminently practical way forward. However, there still remains the issue of getting people who can write the code.

TMA launched a strand dedicated to best practices in multicore programming a little over a year ago, headed up by Intel's Domeika and Critical Blue's Stewart. "The good news is that all the time more people are having to come to terms with these issues so that means that the skills base is expanding and there are more people who've learned things that can go into our work," says Domeika. The group expects to publish more before the end of the year.

Meanwhile, mainstream computing is drawing down on the existing body of knowledge about parallel programming that has built up in the world of high-performance computing.

"There, the tradition has long been more multiprocessor than multicore, but the need for parallelism in the programming is still there," says Stewart. "Although we are learning about one limitation. If you look at the big, long-term players in HPC, they are areas like financial services, and what they have been after are systems that perform very specific, very targeted tasks.

"There is a viable comparison with embedded [computing] there, but it is also a very constrained, even bespoke pot of examples, so you need to be careful."

Meanwhile, where are the monsters in the bad make-up? Even if there are no unified solutions, it sounds as though there is now at least a clear path forward in multicore, notwithstanding some important limitations.

Certainly, Stewart thinks one beast has been slain, or is, at least, looking a bit poorly: "I'm hoping that one thing we have got past are the exotic reference designs," he says, "because they might have pushed the performance of the chips, but the fact is that people were sick and tired of seeing them. Your typical user's code looked nothing like them, and because this is so much of a software issue, they really did not help.

"But what you are now seeing is multicore breaking into new markets. We've obviously had it on the desktop, but with the [ARM] dual core Cortex A9, it's now going into wireless handsets. So the confidence is spreading."

However, as with all good monster movies, there is something in the shadows.

"I think we're doing a lot to help the customers and to understand what they need here. I think we're seeing more customers who know what they want to do with the technology rather than looking for us to hold their hands, but you do need to remember that there is still a lot of variation across applications and within them," says Freescale's Fisher.

"The thing to remember is that you never say never, but even 5, 10, 15 years from now, there won't be a single solution. There ain't gonna be a silver bullet."

Oh dear, here comes Corewolf - but with a bit of work he will probably turn out to be White Fang.
