Many compute-intensive applications have a short-term need for supercomputing power, but lack the deep pockets to pay for it long-term. Cloud-based high-performance computing is set to change all that.
Supercomputing has long been regarded as a high-end technology for the big users with big applications and big processing needs. However, suppliers of supercomputers - or high-performance computing (HPC) - are now targeting a much wider audience, using a mixture of cheaper and more powerful hardware, sophisticated new system architectures, and software that allows resources to be shared or even rented online. Their overall aim is to reach users who might use one occasionally, but cannot afford a supercomputer of their own.
The suppliers are not just reaching out to smaller versions of the current HPC user base, either; they hope to enable new markets and new groups of users. From cloud supercomputing through shared systems, to distributed HPC architectures, vendors such as Bull, Fujitsu, and SGI are offering to solve key computational problems and bring high-end power to bear for even quite small manufacturers, designers, and research groups. "HPC is pervasive now, whether it is in design or manufacturing," says Rob Maskell, Intel UK director of HPC. "Fluid dynamics, computer-aided engineering (CAE), destructive testing, supply chain modelling and healthcare are all sectors seeing take-up." He adds that, for Intel and others, the big target is "the missing middle - those middle-sized but complicated users who are not using HPC yet".
Those users are the target market for cloud HPC, says Pascal Barbolosi, vice president of the extreme computing business unit at France-headquartered IT giant Groupe Bull. He claims that Bull's Extreme Factory service, launched in 2010, gives companies of any size access to high-end computer-aided engineering applications, such as computational fluid dynamics (CFD) and virtual prototyping tools. It offers 18 applications, some commercial and some free, on 175 teraflops of its own Bullx supercomputing power.
In contrast with business-grade cloud computing, it is a full supercomputing experience and able to take on tasks that require much more communication between the processors. While Amazon's EC2 service, for example, is great for parallel tasks such as generating 3D renderings, the relatively slow links between the participating processors, which can be in different data centres or even on different continents, would make real-time 3D visualisations of a highly complex model less practical.
Amazon's 2011 launch of a new Cluster Compute EC2 instance, comprising two eight-core Xeon processors on a 10Gb network, demonstrates that HPC requires you to build your cloud in a different way. In particular, it is a lot less practical to virtualise your compute resources, says Tony DeVarco, senior manager for strategic partners and cloud computing at SGI, which launched an on-demand cloud HPC service called Cyclone in 2010.
"We decided at the beginning that Cyclone would not be virtualised," he explains. "Traditional cloud environments are completely virtualised - it is a great model for business applications, and in the HPC realm it is great for embarrassingly parallel applications, such as movie production or genome sequencing.
"But virtualisation is not so good if the problem you're trying to solve involves communication between processors, for example crash simulations, CFD and weather simulation - anything where something that happens in one part of the model needs to be transmitted to other parts. They can still be run as batch jobs, but they need communication, and that means fast interconnection."
Shrinking the supercomputer
Supercomputing has been getting more accessible and more standard for years. Small systems fit under a desk and are powered by Intel or AMD processors, often assisted by GPU (graphical processing unit) chips from the likes of Nvidia, which were originally developed for the highly-parallel task of processing 3D graphics. Open-source message-passing software also makes it possible to build highly-parallel clusters of Linux systems, called Beowulf clusters, and use those for HPC purposes.
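On a real Beowulf cluster the coordination between Linux nodes is handled by an open-source message-passing library such as MPI, running over the network. The idea can be hinted at on a single machine using only Python's standard library, with threads standing in for nodes and a queue standing in for the message-passing layer (a rough sketch, not a real MPI program):

```python
import threading
import queue

def worker(rank, outbox):
    # Each 'node' computes a partial sum over its own slice of the data,
    # then sends the result as a message to the collecting node.
    partial = sum(range(rank * 100, (rank + 1) * 100))
    outbox.put((rank, partial))

outbox = queue.Queue()
threads = [threading.Thread(target=worker, args=(r, outbox)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Gather the four messages and combine them into a single result - the
# message-passing equivalent of an MPI reduce operation.
total = sum(partial for _, partial in (outbox.get() for _ in range(4)))
```

Here the queue is instant; on a cluster, each `put` would cross the network, which is why interconnect speed matters so much for this style of computing.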
Compact supercomputers have their limitations though, and can also be noisy and power-hungry, notes DeVarco. Anything bigger remains expensive, either financially or in terms of the time needed to build and configure it. For real performance, by which we mean something that will get you into the upper reaches of the TOP500 List world supercomputer rankings, you need specialist hardware and software, as well as thousands, not dozens, of processor nodes.
"You can't just use a standard operating system on a system with thousands of nodes," Bull's Barbolosi points out. He adds that Bull not only has its own full software stack for HPC, but it has developed its own clustering application-specific integrated circuit - ASIC - and can therefore cluster 16 nodes, where Intel can cluster only eight. "We have the biggest team of HPC experts in western Europe," he boasts. "We have more than IBM, Hewlett-Packard, and so on."
He continues: "These days, supercomputing covers almost the entire spread of business, doing simulation and modelling in areas as diverse as weather forecasting, financial risk management, automotive and aerospace, defence, and high-energy physics. Video rendering and post production is a growing area."
Cloud HPC's proponents argue that it is a great way to bring this high-end technology to organisations or departments too small or under-resourced to buy it outright, as well as being a perfect solution for larger organisations that need only occasional access to a supercomputer.
One example was when automotive parts supplier Takata Corporation, which was normally happy running its own jobs using the simulation package LS-DYNA, urgently needed to run hundreds of simulations over a holiday weekend. Each would have typically taken four hours on four cores, but by uploading them to a cloud-based SGI Altix ICE cluster with 512 cores, the company was able to run the lot in 31 hours, reports Dan Long, Takata's program manager.
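The arithmetic behind that result is easy to check. The report does not give the exact job count, so the sketch below assumes a hypothetical 900 jobs; with 512 cores running 128 four-core jobs at a time, the work completes in eight four-hour "waves", close to the 31 hours reported (tighter scheduler packing plausibly accounts for the difference):

```python
import math

cores_total = 512
cores_per_job = 4
hours_per_job = 4
jobs = 900            # hypothetical - the article says only "hundreds"

concurrent = cores_total // cores_per_job     # 128 jobs running at once
waves = math.ceil(jobs / concurrent)          # 8 waves of jobs
cluster_hours = waves * hours_per_job         # 32 hours on the cluster
serial_hours = jobs * hours_per_job           # 3,600 hours back-to-back
```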
SGI's Tony DeVarco adds that the cloud can also be a good way to learn about HPC and see if it suits your needs. "Cyclone is a real completer for us," he says. "We have customers buying big systems, and they're not going to be delivered for two months, so they can use this in the meantime to get going. Or you can start small, learn how to use the system, then maybe buy a small system and come back when you get a big job." It can also help to cover peak loads, with extra work 'bursting' to the cloud once your own supercomputer is fully loaded, explains Barbolosi. That means you only have to buy enough in-house HPC capacity to meet your regular workload, and you only pay for the extra when it is needed.
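The bursting arrangement Barbolosi describes amounts to a simple overflow rule: fill the in-house cluster first, and send only the excess to the cloud, which is paid for per use. A minimal sketch, with invented job sizes and capacity figures:

```python
def place_jobs(job_core_counts, inhouse_capacity):
    # Fill the in-house cluster first; jobs that don't fit 'burst' to
    # the cloud, which is only paid for when it is actually used.
    inhouse, cloud = [], []
    free = inhouse_capacity
    for cores in job_core_counts:
        if cores <= free:
            inhouse.append(cores)
            free -= cores
        else:
            cloud.append(cores)
    return inhouse, cloud

inhouse, cloud = place_jobs([64, 128, 256, 64, 512], inhouse_capacity=256)
```

Under this rule the buyer sizes the in-house system for the regular workload (here, 256 cores) and pays cloud rates only for the two oversized jobs that overflow.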
Beyond that, there is the ability to move the cost from the capital budget to operating expenses, says DeVarco: "When we talk to customers, they do not want guaranteed cycles per month. They want to work for a week, then go away and come back a month or two later. Then there's OpEx and CapEx - if you have the budget for a project, but not the budget to buy equipment, it's a good way to manage that."
Who's for HPC?
An intriguing feature of cloud-based HPC is that its potential customers are relatively self-selecting, defined by the nature of their need. Says DeVarco: "For us, it is small and medium-sized manufacturing companies, square on. Eighty per cent of our customers are from manufacturing, plus some government, research, universities."
Andrew Carr, Bull's UK sales and marketing director, suggests that anyone who uses CAD workstations or high-end PCs to run simulations could potentially benefit from doing the job faster; another benefit is that it frees up in-house resources to tackle less-critical projects.
"We are now looking at markets that should be getting benefits from HPC but aren't, especially life sciences and manufacturing," Carr adds, noting that the first eight customers for Extreme Factory "were all manufacturers".
In automotive and aerospace there is also the potential customer base represented by the supply chain, Carr adds: "You might have a FTSE500 company at the front, but the supply chain is 20 to 30 organisations long. The cloud could give them access to HPC too. If I work in a large automotive company, say, I've invested significantly in HPC and I have an obligation to push that down to my supply chain."
Sharing the load
To be sure, it doesn't have to be a public cloud - a supercomputer could also be hosted as a private cloud, and shared out among small businesses, departments within a large organisation, or the residents of a science park, say.
One such system is Loughborough University's Hydra, a £1m Bullx cluster with 2,000 cores and 4TB of RAM. Martin Hamilton, Loughborough's head of Internet services, says the benefits of making supercomputing more widely available are clear: as soon as you offer people access to it, they find all sorts of new and interesting uses for it. As a result, Hydra is now used by departments right across the university, to the extent that it has already had to be expanded three times to meet increasing demand.
Much of this demand comes from applications like modelling jet engines and airflow to understand aircraft noise, for instance; but as Hamilton notes: "They couldn't do it unless they had something like a Bull supercomputer working in the background. Also we have civil engineers modelling the airflow in schools and using models for human thermal comfort, trying to get the ventilation right in new school designs.
"And we have a lot of work on renewables, such as a project with EON on wind flow and turbines design - what happens in a hurricane, say? That might not matter in Europe, but it is important if you're building offshore turbines for the US."
More unusual applications include nanotribology - modelling surface scratching - and the sports science department using it to research biomechanics, capturing gymnasts' movements in 3D to see the effects of small changes to their routines, for example. Hamilton adds: "A lot of our supercomputer use is not software developers, but people who have a large pile of data, and who just need to smear it across as many cores as possible, often using standard software packages."
There are challenges with this kind of approach. One is that contemporary systems such as Hydra are based around Nvidia GPUs, because that is the only real way to get high performance without reinventing the processor, even though it means rewriting software for the different architecture. Intel's Maskell says that this should change during 2012 as Intel gets its x86-compatible many integrated core (MIC) architecture into the marketplace. This is derived from the low-power Atom processors used in many netbooks, and will initially appear in the form of PCI cards.
Tony DeVarco says that there are also difficulties with software licensing in the cloud. "A big issue is where are the independent software vendors (ISVs) around cloud," he says. "For instance, if a customer wants to run LS-DYNA, I have to verify that they have at least one licence already, because they have to have a support contract. No-one - no-one I know, at least - allows their software to be run without at least one licence. There may be ISVs that will set up a service online, like Salesforce.com, but not yet. It means we do have some customers who are licensees of an ISV, but they don't want to run the software themselves - they upload the work to us. It lets a one-person consulting business bid for bigger jobs, say."
He adds that another issue is the question of IT skills - some potential customers already know about command lines, job schedulers, how to submit jobs to a Unix cluster, how to use SSH to connect in, and so on. "Then there are the people who call in and ask for, say, ANSYS Mechanical; they're currently on a Windows cluster, but they know a bigger HPC system would enable them to get more done, and faster."
Not having those skills is not a problem, he says, but it means that an HPC cloud service must also provide technical support and expertise. It is almost - not quite - the exact opposite of what commercial cloud operators such as Amazon and Rackspace do to keep costs down, which is to offer a small range of standard off-the-shelf virtualised services that customers configure for themselves.
There is also an issue common to all cloud services, which is that most enterprises and other organisations are unable to easily or accurately calculate what it costs to operate a computer system 24x7, and a cloud-based alternative can make all that explicit. "There is a sticker shock," according to DeVarco. "Some people want to move to the cloud, I run the numbers for them, and they're like, 'Wow, that's twice the cost'. I reply 'But do you pay the electricity bill?' - 'No.' - 'Do you pay for the admin?' - 'No, that's IT'."
Ultimately though, cloud's flexibility could make up for all that. Martin Hamilton says that while Loughborough can justify investing in additional HPC hardware to meet additional demand now, that could change. He says that cloud supercomputing has the potential to meet at least some of that extra demand, especially since the university's 10Gb SuperJANET backbone means that (for now) it has few bandwidth worries.
"The key advantage of the cloud is that your data is smeared across the Internet, not all in one rack," he observes. "I could see in maybe three or four years the university data centre being offloaded. It still needs standards though - the data centre equivalent of a 13A plug."