The great builders
The amount of hardware needed to design digital silicon keeps spiralling upward. Can the tools cope?
As we move further into the multicore processing age, it is tempting to see EDA as a vanguard sector. It helped design the chips. Perhaps its natural participation in designing the generations and refinements that follow will help us solve the accompanying multicore programming problem.
Economics do push EDA to the latest hardware platforms. Chips do not just have multiple cores. The first two-billion-transistor design has been publicly reported - Intel's Tukwila server chip for the Itanium family. Silicon with more than 100 million gates is set to become commonplace. All this means that there are vast amounts of data to crunch for any design, particularly during stages of a flow such as verification, and that the work must be done under tight time-to-market constraints. Moreover, while the hardware is relatively cheap, the costs of maintaining and even just cooling it are on the rise. Combined, this all puts serious pressure on the software to meet high performance standards.
"The good news is that this is not a new problem. You could even say it's been around for a decade," says Duaine Pryor, high performance computing architect for the Calibre product line at EDA vendor Mentor Graphics. "Quite a few design tasks exceeded the computational capacity of a desktop workstation some time ago and went out to these increasingly large compute farms - verification is one obvious example. As they made that shift, we obviously had to start thinking about how tools would exploit parallelism."
With this in mind, EDA has made some impressive advances already in delivering tools that are well-suited to farms of the latest quad-core server processors. Mentor's Calibre suite uses a variety of multicore techniques and extends along a design flow from physical verification to mask data preparation. It does this so well and so efficiently that it is also one of the big drivers behind Cadence Design Systems' current hostile bid for the company.
Rival vendor Synopsys has also recently revealed that it has added multicore capabilities to a number of its tools or introduced entirely new ones, and launched a major multicore initiative earlier this year. Its experiences are instructive in that they show that different tasks in a design flow often demand emphasis on different techniques from within the multicore toolbox: 'horses for courses' is a good maxim to keep in mind. Here are a couple of examples.
First, take the Proteus back-end tool the company announced in February. Traditionally, mask synthesis has sequentially preceded mask data preparation. In Proteus, the two tasks have been pipelined so that they can be undertaken concurrently, taking a good chunk out of the design cycle time.
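The general idea of pipelining two stages so that they overlap can be sketched in a few lines of Python. This is only an illustration of the technique, not a description of how Proteus is built: the `synthesize` and `prepare` functions, the tile names and the queue hand-off are all hypothetical stand-ins.

```python
import queue
import threading


def synthesize(tile):
    # Hypothetical stand-in for mask synthesis on one region of a layout.
    return f"synth({tile})"


def prepare(mask):
    # Hypothetical stand-in for mask data preparation on one synthesized region.
    return f"prep({mask})"


def run_pipeline(tiles):
    """Run both stages concurrently, handing work across a queue."""
    handoff = queue.Queue()
    results = []
    done = object()  # sentinel telling the second stage to stop

    def stage_one():
        for tile in tiles:
            handoff.put(synthesize(tile))
        handoff.put(done)

    def stage_two():
        while True:
            item = handoff.get()
            if item is done:
                break
            results.append(prepare(item))

    workers = [
        threading.Thread(target=stage_one),
        threading.Thread(target=stage_two),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


print(run_pipeline(["tile0", "tile1", "tile2"]))
```

The second stage starts on the first tile as soon as it is available, rather than waiting for the whole first stage to finish, which is where the saving in cycle time would come from.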
Now, take the example of ZRoute, the new router Synopsys is adding to its IC Compiler suite. It makes more use of multi-threading. The company also took the decision to develop ZRoute from the ground up, and is claiming excellent results: a three- to four-fold basic performance improvement, boosted a further 3X, for an overall gain in the region of 10X when running on quad-core CPUs.
"What can you learn from this?" asks Saleem Haider, Synopsys' senior director of marketing for physical design and DFM. "In part, it is that multicore is naturally driven to the most computationally intensive parts of the design flow, to the bottlenecks. This is where the customers need it most. You are not going to change an entire flow overnight - it will be a gradual process.
"Second, innovation is another area that will benefit from multicore sooner. ZRoute was done from scratch because it addresses the needs of the 45nm node. There is innovation to deal with emerging problems. However, I do not think that any vendor has the wherewithal to take every single tool in a flow and completely rearchitect it to run on multicore systems overnight - it simply wouldn't make commercial sense. Again, this is something that will happen over time."
At Mentor, Sudhakar Jilla, director of marketing for the company's place-and-route products, agrees with this combination. "If you look at P&R, a lot more is being asked of it because of the issue around timing closure and signal integrity, so the computational demands really have risen. There are big pressures to bring down the runtimes and also to bring in new techniques such as multi-corner, multi-mode analysis," he says.
Moreover, in the case of the tools Jilla is discussing, it is also significant that they are, like ZRoute, comparatively new. Mentor has them in its line-up following its acquisition of Sierra Design Automation last year. This is the new stuff, folks - it was written when multicore was already on the agenda. So far, we have seen that multicore has had a large upside for the EDA business. Better tools, running faster. All very good, thank you.
However, there are some significant tensions here. The first of those is a caution among the vendors as to just how much they can sell to their customers. This is because performance scaling with multicore is not consistently linear. "OK, so let's say you throw 50 CPUs at the problem and you get 50X improvement, but if you throw 100 CPUs at it, you get only 80X - and that pretty much describes the kind of scenario we're seeing," notes Mentor's Pryor. "I think that there are then two questions we need to communicate to the customer. One - up to what point do we still get linear scaling and what are the economic implications of that? Two - how far beyond that point where linear scaling stops does it remain economically viable to keep squeezing out what improvements you can?
"And those answers are going to vary. If you're talking to an IDM, he's concerned about everything from the start of place-and-route right through to mask data preparation. But a fabless guy has his focus on functional verification, and the mask house is just concerned with the data prep. Also, there are stages where you need to get the full linear improvement for the sums to add up, but also those where just a minor improvement is still worth a lot of value because you're still addressing cost and complexity.
"What matters is that we don't oversell this, but get it to match what the users need. That's better for us in terms of the R&D investment and better for them for tool cost and performance."
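Pryor's illustrative figures (50 CPUs for 50X, 100 CPUs for 80X) can be turned into the two numbers a customer would want to weigh: how efficiently each CPU is being used, and what each extra CPU buys beyond the linear range. The sketch below is a back-of-envelope calculation only; the function names are ours and no cost data is assumed.

```python
def efficiency(cpus, speedup):
    # Fraction of ideal (linear) scaling actually achieved.
    return speedup / cpus


def marginal_gain(cpus_a, speedup_a, cpus_b, speedup_b):
    # Extra speedup obtained per additional CPU between two configurations.
    return (speedup_b - speedup_a) / (cpus_b - cpus_a)


print(efficiency(50, 50))              # 1.0
print(efficiency(100, 80))             # 0.8
print(marginal_gain(50, 50, 100, 80))  # 0.6
```

On these figures, each of the second 50 CPUs contributes 0.6X rather than 1X. Whether that is still worth paying for is exactly the economic question Pryor raises, and the answer depends on the stage of the flow.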
This acknowledgement that different parts of a design flow may stand to gain by different degrees then points to another tension. The last few years have seen all the major EDA vendors move away from selling tools to do specific jobs - so-called 'point tools' - and look more to sell integrated flows. As complexity has grown and the need for closer relationships between various parts of the flow has become clear, this flow-led approach has been broadly accepted.
The problem is that while some tools might be parallelisable to good effect for multicore, overall flows are a great deal more stubborn. "If you are going to successfully parallelise an operation then you have to control the overhead," says Patrick Groeneveld, chief technologist for Magma Design Automation. "You need to keep down the dependencies and interactions between the threads, avoid bottlenecks and reduce the burden of partitioning and then re-assembling. You want tasks that are 100 per cent independent."
Thus, he adds, you can fairly conclude that analysis is a relatively straightforward task to parallelise, and that routing something is easier than then optimising it in the light of relationships between various elements in the design.
"Another way of looking at things is that parallelisation is something that lends itself more to graphics processing. There, you have a great deal of floating-point activity relative to a small amount of data. However, with EDA, you are looking at a lot less floating-point, but across a much larger amount of data," Groeneveld says.
Finally, another problem lies in the naturally algorithmic nature of EDA tools and the limits set for high performance computing by Amdahl's Law. The relevant issue here is that it defines the need for sequential processing as one of the main limits on the improvements you can achieve.
"And, if your tool or your strategy is based on sequences of algorithms, then the limitations on what you can do to parallelise that part of the flow are intrinsic," says Groeneveld.
So while one might be able to secure a 10X improvement for a single element in a flow by optimising for a multicore platform, Groeneveld's argument is that the 'brakes' might hold back overall improvement to closer to just 2X - and that the more difficult elements in that flow could potentially hold the performance boost to this level for some time to come.
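The arithmetic behind that gap follows directly from Amdahl's Law: if a fraction p of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below applies it to a 10X local gain. The runtime fractions used are hypothetical, chosen only to show the shape of the effect; the article does not say what share of a real flow any one tool accounts for.

```python
def overall_speedup(accelerated_fraction, local_speedup):
    """Amdahl's Law: the untouched share of the runtime still runs at 1X."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / local_speedup)


# Hypothetical shares of total flow runtime taken by the accelerated element.
for fraction in (0.5, 0.9):
    print(fraction, round(overall_speedup(fraction, 10), 2))
# 0.5 1.82
# 0.9 5.26
```

If the element sped up by 10X accounts for about half of the flow's runtime, the flow as a whole gains a little under 2X, which is in line with the figure Groeneveld cites. Even at 90 per cent, the sequential remainder holds the overall gain to roughly 5X.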
A helping hand
Magma is talking about the problem at the flow level because it believes it has an approach that will work. Its Hydra technology allows for micro- and macro-partitioning of a design as appropriate within the progress of the flow, retaining and then re-inserting the information that will allow these elements to ultimately be stitched back together. And yet partitioning, however smart it is, is still a performance limiter relative to the ideal.
The lessons from EDA, then, are that there remain many specifics to resolve. The industry is attacking its lower-hanging fruit - although given the complexity of even this software task it seems mealy-mouthed to call it such - but will over time still have to address more points.
Skynet isn't here yet - the process still needs a human guiding hand to help it make smart workarounds.