The gift of the lab
The scientific research sector and mainstream IT are reaping the rewards of a two-way exchange of ideas.
The relationship between scientific computing and enterprise IT has become much closer since the World Wide Web entered public consciousness during the 1990s. Before then, scientific computing was the fount of most innovations in IT, including the Internet itself, which initially evolved as a vehicle for exchanging information and ideas within research communities.
Scientific IT was the driving force for progress in high-performance computing, data analysis, database management, and real-time process control, with the business community gratefully picking up the crumbs some years later.
This relatively distant relationship served businesses well enough in the early days of IT when commercial applications were often mundane accounting packages and batch processes, and databases were manageably small. The data volumes then of a large corporation could today be contained and administered comfortably by a domestic laptop.
But increasing dependence on websites, coupled with the massive proliferation of data and the growing commoditisation of technologies for high-bandwidth networking, cluster computing, and statistical analysis, has brought IT to the coal face and turned it into a widespread force for competitive edge. Now enterprises in all sectors, and a growing number of SMEs, are seeking to exploit emerging information technologies more quickly than their competitors.
The objectives are various, ranging from the adept handling of huge amounts of data - as in digital rendering for video and film production - to intelligent mining for interesting or significant patterns within customer databases by retailers or telecommunications providers, or the incorporation of human expertise into real-time algorithmic trading systems where huge sums can change hands on the basis of decisions executed in a microsecond.
All these areas are still benefiting from technologies developed for scientific projects or applications, but with intellectual property transfer occurring much more quickly than used to be the case. The transfer is even starting to take place the other way, with some areas of science now gaining from technology or tools deployed first in commercial sectors where competitive forces are stronger and budgets larger.
Fat cat stats
An unlikely transfer in this direction is between econometric modelling and climate forecasting, which turn out to require similar kinds of statistical analysis. The surprise here is that climate forecasting - a field on which the future of the planet supposedly hinges, and on which decisions of huge economic and geopolitical importance are based - should lag behind the analysis of commodity prices and financial markets.
But this is the case, to the extent that statistical economists took part in a recent European Science Foundation workshop on climate science (‘Econometric time-series analysis applied to climate research') to make suggestions on how predictions of global warming and its regional consequences could be made more accurate. Both fields involve variables that change randomly within certain constraints.
Temperature, for example, can change over time, but only by so much, and is shaped by varying external factors, with the more likely direction being upwards in the event of rising atmospheric carbon dioxide levels. Similarly, the value of the pound changes randomly against the dollar, but again only within limits, and the most likely direction is also affected by factors that feed into the model, such as relative interest rates between the UK and US.
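Both the temperature and exchange-rate examples can be caricatured as a random walk with drift: each step is random and bounded in size, while an external factor biases the overall direction. A minimal Python sketch - purely illustrative, with invented function and parameter names, not taken from any model discussed at the workshop:

```python
import random

def random_walk_with_drift(start, drift, volatility, steps, seed=0):
    """Simulate a variable that moves randomly but with a preferred direction.

    A positive drift biases the walk upwards (e.g. temperature under rising
    CO2 levels, or a currency favoured by relative interest rates), while
    volatility sets the typical size of each random step.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    value = start
    path = [value]
    for _ in range(steps):
        value += drift + rng.gauss(0, volatility)
        path.append(value)
    return path

# e.g. a temperature-like series starting at 15.0 with a small upward bias
path = random_walk_with_drift(start=15.0, drift=0.02, volatility=0.1, steps=100)
```

The drift term is what the external factors feed into; the random term is why neither field can offer certainty, only probabilities.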
The subtext here is that climate modelling is not as accurate a science as the IPCC (and indeed the likes of former US Vice President Al Gore) would have us believe - but that is another matter. On the positive side, it shows that the climate forecasting community recognises that it needs the very best statistical modelling tools available, and is quite willing to look abroad if necessary.
Going the distance
According to Peter Thejll, senior scientist in the Atmosphere & Space Research Division at the Danish Meteorological Institute and co-chair of the ESF workshop, it also reflects the fact that economics is a dynamic field involving the real world, while until recently it was assumed that climate was relatively constant over time. That assumption has been blown away by rising carbon dioxide levels, which raise the possibility that climate could become as volatile as the global economy.
"Not until climate researchers started thinking about climate change did the need for these methods in that field really arise," said Thejll.
Yet, irrespective of how accurate climate forecasting models actually are, the spectre of global warming is now exerting a great influence on the shape of enterprise IT by focusing investment and attention on energy conservation, where many of the ideas evolved first in scientific computing. At the high end of performance, whether in supercomputers comprising a handful of CPUs or in highly parallel machines, the two constraints on processing speed have been heat dissipation and the latency caused by the total distance electrical signals have to travel in executing a task.
Much of the effort in scientific computing has gone into reducing both, with the only viable strategy being to tackle the two at the same time by bringing components closer together, whether on a single chip, on a circuit board, or at the level of multiple CPUs. As a general rule, the shorter the distance travelled, the less energy is dissipated as heat and the faster the task is completed.
Not all efforts on this front have been successful, and some have been abortive, such as the attempts at wafer-scale integration during the 1970s and 1980s. Instead, general advances in the design of conventional-size CPUs have in effect led to systems on a single chip, culminating in current production-line multi-core technology from Intel and AMD.
These chips are now re-incorporating graphics and maths functions - such as matrix manipulation - that were previously implemented on separate dedicated processors. But the point here is that, for enterprise data centres, the new accent on energy efficiency gains from the very same technologies that reduce heat dissipation in order to increase performance in scientific computing. The less heat produced, the lower the energy consumption.
Heat control hots up
The motives for reducing heat production are slanted differently for enterprise IT, with the accent on saving energy rather than pushing performance to the limit. Yet, while this difference is reflected in architectural design, the underlying technologies are similar, as Steve Bowden, chief technology officer for IBM's Green Computing initiative, points out. "The technology we've implemented in the scientific arena, even at the wafer level, is really critical now to get more performance per Watt out of every CPU we design," Bowden says.
At the wafer level, IBM now allows individual sections of the processor to be shut down while others are still running, delivering significant savings in power consumption.
But modern virtualised environments offer the potential to go much further by consolidating workloads on as few processors as possible dynamically at every stage of execution, and temporarily switching off whole CPUs when they are not needed.
Furthermore, the same economy can be extended dynamically to storage and networking components if they are virtualised as well. The key here is the ability to dynamically move applications between processors while they are executing, which again exploits techniques first developed in the scientific arena.
Additional savings can be achieved by permanently consolidating a data centre down to a smaller number of individual systems, to take advantage of superior energy management and higher component densities.
The quest for savings is accelerating demand for centralised virtual environments, drastically reducing both the footprint and number of servers, according to Bowden.
IBM has taken its own medicine, consolidating 4,000 disparate servers, many x86-based, down to just 33 mainframes, which according to Bowden has cut the company's energy bill for computing to just 20 per cent of its former level. Many large enterprises can expect to make similar 80 per cent savings, Bowden insisted, and even SMEs, with less scope for consolidation, should be able to cut bills by at least 40 per cent via virtualisation. The seeds were sown, again, in scientific environments, by technologies for intelligent execution of workloads across large clusters and parallel processors.
While hardware developments originating in science may be having a dramatic effect on the bottom line, the intellectual traffic at the level of software and algorithms is creating new business opportunities, and is also leading some enterprises to recruit IT development staff from more technical and academic backgrounds. The latter is happening particularly in the financial sector, where the level of analysis and statistical modelling required now matches the sophistication of many scientific problems, explains Aly Kassan, senior applications engineer at Mathworks, one of the world's leading providers of technical software for both scientific and commercial customers.
"Financial modelling is one area where more and more they're looking to civil engineering, physics and maths to recruit a new breed of quantitative analyst," Kassan says. "The mathematical models are becoming more sophisticated with each passing year, and are being used in a wider and wider context."
The increasing sophistication lies in seeking certain recurring patterns within markets that can be exploited to make money if they are spotted quickly enough. This can involve ‘statistical arbitrage', where a position arises in a particular market in which there is no certainty, but a better-than-normal probability, of gaining from a particular combination of prices.
This could be, for example, where the statistical analysis of recurring patterns suggests that the relative price of two currencies is likely to change in a particular way. This type of application also reflects the increased scope of statistical modelling in financial trading, as Kassan points out: "Before, modelling was used mainly for measuring risk and working out how exposed a position was. Now it is being used in algorithmic trading, looking for specific patterns to recur".
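The kind of signal involved can be caricatured in a few lines of Python. The following is a hypothetical sketch, not any real trading system: the function names, the use of a plain z-score, and the entry threshold of two standard deviations are all invented for illustration.

```python
import math

def zscore(series):
    """Standard score of the latest observation against the whole series."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    return (series[-1] - mean) / math.sqrt(var) if var else 0.0

def pairs_signal(prices_a, prices_b, entry=2.0):
    """Flag a statistical-arbitrage position when the price ratio of two
    instruments drifts unusually far from its historical mean."""
    ratio = [a / b for a, b in zip(prices_a, prices_b)]
    z = zscore(ratio)
    if z > entry:
        return "short A / long B"   # A looks expensive relative to B
    if z < -entry:
        return "long A / short B"   # A looks cheap relative to B
    return "no position"
```

There is no certainty in such a signal - only a better-than-normal probability that the ratio will revert towards its mean.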
The search for patterns of interest within financial trading data itself requires new analysis tools. It is now possible to capture huge amounts of market information containing a mine of potentially valuable data about the micro-movements of prices. A significant development here was Reuters' launch, in June 2005, of its Tick Capture engine, following a major development project to capture every single tick of the market.
This created an analysis challenge similar in scale and sophistication to gene sequencing and analysis, with both requiring recognition of patterns distributed at varying time scales across very large data sets.
To take an overly simplistic example in finance, a company's stock might exhibit a tendency to rise initially and then fall after reporting of sales figures that are poor but not as bad as expected, with initial relief being followed by a realistic reappraisal of prospects by investors. This pattern might show up repeatedly in historical data but not always over exactly the same time scale.
Similarly, after being given a drug, certain genes in mice and humans might respond via changes in levels of expression in a characteristic way, but not equally quickly. In both cases, in order to identify or compare such patterns, a statistical ‘non-linear' method that filters out the variations in time is needed.
Dynamic time warping
Such an algorithm, called Dynamic Time Warping (DTW), was developed in 1978 for a different application again - speech recognition, where the requirement was to identify individuals on the basis of a spoken word, matched against a database of recordings of that word by each person to be recognised. The algorithm had to allow for variations in the time taken to speak the word, since some people talk faster than others - exactly the same problem.
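The core of DTW is a short dynamic programme that permits either sequence to be locally stretched during alignment. A minimal Python sketch of the standard algorithm - illustrative only; production implementations add refinements such as windowing constraints:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Aligns sequences that follow the same shape at different speeds, so a
    pattern spoken (or traded, or expressed) slowly can still match the
    same pattern occurring quickly.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

slow = [0, 0, 1, 2, 3, 3, 2, 1, 0, 0]   # a pattern traced slowly
fast = [0, 1, 2, 3, 2, 1, 0]            # the same pattern, faster
```

Here the slowly traced pattern and the fast one align perfectly under DTW, even though a naive point-by-point comparison would report a large difference.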
This algorithm was then adapted for the comparison of gene expression early this decade, in a Java program called GenTxWarper, and has since been applied to a wide range of commercial problems whose common thread is recurring patterns that vary in duration.
For example, the Malaysian electricity supply industry has used DTW to identify theft of power - whether by meter tampering, meter bypassing, or simply not paying the bill - with each showing up as uncharacteristic patterns of customer behaviour over varying time scales.
DTW is also seen as something of a blessing for customer relationship management - a field that had become bogged down in rather uninspired and unenlightening static analyses. DTW allows the evolving behaviour of customers - and, crucially, potential customers too - to be analysed from a combination of internal and external data sources. This offers the potential to exploit tendencies towards recurring purchases, and to identify the most lucrative prospects not just on the basis of spending power but also of changing tastes or desires.
The real unifying theme between scientific and commercial IT, though, is information, according to Vernor Vinge, emeritus professor of computer science at San Diego State University and author of ‘Rainbows End', the 2007 Hugo Award-winning science fiction novel depicting the Internet of 2025.
"One of the most important technologies is simply the use of computers for gathering, storing, and analysing vastly greater bodies of data than in the past," says Vinge. However, as he added, the sciences are no longer ahead in terms of volume or complexity, and so we can expect more innovation and research in future to emanate from the worlds of finance, retailing, and services.