Legacy data - leaping the generation gap

Programmers are grappling with the convoluted nature of legacy data.

Computing legend speaks of legacy data as a heaving hydra with many heads, any one of which can give an enterprise a nasty gnaw - even when its fellows have all been bludgeoned. Legend tells us legacy data is immortal and practically indestructible, although it can be tamed - even harnessed, if properly treated.

Perhaps legacy data's true monstrousness lies in the definition, given that no two experts seem able to agree on what legacy data is. The only point of consensus is that most enterprises have data in many places, created at different times, that imposes varying burdens and costs on the IT function; and that data held in an obsolete format, or accessible only through archaic stored procedures, can undisputedly be called legacy.

There is little badinage over the big issue confronting all data, legacy or not, which is the advent of SOA (Service-Oriented Architecture). According to Anthony Swindells, senior program manager for data services at DataDirect Technologies: "SOA is coming out of research into production, and enterprises are starting to run into all sorts of issues."

DataDirect Technologies is a provider of software for data connectivity and mainframe integration, and it is from the mainframe that many legacy data issues emanate. Most of these stem from a failure to acknowledge the challenge posed by what Swindells calls 'legacy assets', comprising both the data and the software used to access it. Too many enterprises, Swindells argues, employ SOA architects from the more recent era of distributed computing, who have scant concept of legacy assets, which in many cases reside on a mainframe. By mainframe, we mean the large-enterprise centralised platforms from vendors such as IBM, now called the zSeries, whose evolutionary roots lie in the System/360 introduced in the 1960s.

Swindells urges enterprises not to rely solely on an SOA architect, who is likely to trivialise the legacy data issue, only for it to return as soon as the inevitable performance and reliability issues arise. "Enterprises need to employ the services of a data architect, focused on data aspects, and separate the data access from the business logic," avers Swindells. This side-steps any pedantry over definitions, for all data is then treated the same, assuming, in DataDirect's case, that it is on a mainframe.

DataDirect's main package for legacy integration is called Shadow. It has a long pedigree, and its key point is that it integrates Web or SOA processes with legacy applications using whatever native protocol was originally used to access the data. "We wrap legacy assets into Web services using native database connectivity, and separate out the data access," explains Swindells. This, he says, is essential for ensuring that performance targets are met when data sources serve far greater numbers of users or end-clients than was originally intended.
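The pattern Swindells describes, separating data access from business logic and exposing the legacy source through a service interface, can be pictured in a few lines of Python. The sketch below is purely illustrative and is not DataDirect's Shadow product: it uses only the Python standard library, with sqlite3 standing in for whatever native connectivity the legacy source actually requires, a minimal HTTP handler playing the part of the Web-service wrapper, and hypothetical table and file names.

import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


class LegacyOrderStore:
    """Data-access layer: all knowledge of the legacy schema and of the
    native connectivity used to reach it lives here, hidden from the
    business logic above."""

    def __init__(self, path="legacy.db"):  # hypothetical database file
        self.conn = sqlite3.connect(path)

    def orders_for_customer(self, customer_id):
        rows = self.conn.execute(
            "SELECT order_no, amount FROM orders WHERE customer_id = ?",
            (customer_id,),
        )
        return [{"order_no": r[0], "amount": r[1]} for r in rows]


class OrderService(BaseHTTPRequestHandler):
    """Service wrapper: translates Web requests into calls on the
    data-access layer and returns the results as JSON."""

    store = LegacyOrderStore()

    def do_GET(self):
        # e.g. GET /orders/42 returns the orders for customer 42
        customer_id = self.path.rstrip("/").split("/")[-1]
        body = json.dumps(self.store.orders_for_customer(customer_id)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), OrderService).serve_forever()

The point of the separation is that, if the legacy source is later migrated, only the data-access class needs to change; the service interface seen by SOA consumers stays put.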

This leads on to another point about legacy made by Colin Bannister, director of strategy at Computer Associates (CA). CA was briefly the world's largest software company in the 1980s and remains a major player in the enterprise arena; its multi-billion-dollar revenues suggest that there's still brass to be had from buffing up old data sets.

Bannister reckons that, as the principal repository of legacy software, the mainframe is gaining yet another lease of life from the SOA phenomenon by sucking in the necessary middleware: "SOA is one reason for the resurgence of the mainframe, which is still the best-performance platform known to man… Corporations are now putting their middleware layer on them as well, leading to real growth in the mainframe once again."

Cross-legacy functions

IBM does not pretend, though, that the mainframe represents the epicentre of the legacy universe, for its application data management package, Optim Data Growth Solution, runs on all leading platforms, according to Scott Ambler, practice leader for Agile Development at IBM. Agile here refers to emerging software development techniques relevant to SOA, in which applications are built as small components, a few at a time, to reduce risk and encourage interaction among developers.

"This is a unique offering in the market place precisely because it works across different platforms, databases and applications," Ambler says. This, he adds, was necessary for meeting the major challenges of legacy, which are knowing where the data exists, how to access it, and then assessing its value.

This last point is crucial in determining what strategy to adopt, according to Nick Rowley, managing director of Oceanus, a vendor of case management systems embracing documents and processes. Rowley cites the case of Oceanus customer O2, which typifies the mobile phone industry in having accumulated a huge amount of legacy data about customers, bills and location relatively rapidly. It was imperative for O2 to migrate its data to a consolidated platform in order to meet performance requirements, and to ensure that the data was readily accessible. Rowley also makes the point that strategies for legacy data should be determined by the length of the business cycle: "Making sure you understand the business cycle, how quickly things turn, is going to affect your data, where you store it, and the medium you store it on."

The nub here is that, although for some enterprises the best approach is to leave legacy data where it is and deal with any inconsistencies arising when accessing it, this may not be feasible when the data is highly critical.

This dichotomy is reflected in significant differences between industry sectors over the extent of the legacy data burden, believes Matthew Thomas, technical services director at Progress Software, a vendor of application integration software. "In the telecoms sector, the problem is diminishing as the industry is evolving into much more modern systems, and has been doing so for many years," Thomas says. "But in certain segments of the finance world the problem is increasing as more and more systems come online that desire or need access to the data."

Irrespective of the sector, or platform it resides on, a common problem with legacy data is simply the mass accumulation of junk or redundant information, according to Stewart Buchanan, research director specialising in IT asset management at the Gartner Group. "It can accumulate like toxic waste - a lethal cocktail of cost and legal liability," Buchanan warns. "Legacy data creates unmanaged cost, risk and liability. It can require unplanned spending to continue storing and maintaining data in an accessible way, spending to find and to retrieve data (e-discovery), and further spending on the consequences of failure."

Management and more

By Buchanan's definition, data is legacy when it is not properly managed, which means adopting a sound strategy for data retention as soon as data is created, avoiding duplication, and adhering to agreed standards so that access can be assured. For most enterprises, though, this happy situation does not exist, and investment is therefore needed, in essence, to clean up legacy data if it is to be of any value for emerging SOA applications, or any processes that require live access.

Such was the case for Thames Power, which, like other energy utilities, faces stiff competition in a market where wholesale prices have mushroomed. Thames Power decided it needed to review prices paid for goods and services in the past in order to obtain better value in future. Such records were in various legacy formats that would make regular access too slow and unreliable, with frequent errors likely. This meant that the data had to be migrated to a new platform, as part of a general clean-up process, with enterprise application vendor IFS hired for the job.

This brought several challenges that are fairly typical in such projects where legacy data is still being created at the time the migration takes place. "At a point in time the legacy system has to be frozen, and no new records should be added as these may be missed in the new system," says Colin Beaney, consulting group leader for enterprise asset management at IFS. "There have to be clear processes defined and documented for what happens after the data has been fixed."

Clean slates

But the biggest challenge lay in clearing out the rubbish. Cleaning the data accounted for 65 per cent of the cost, according to Beaney, compared with 25 per cent for extraction and 10 per cent for loading into the new system. The high cost was incurred largely because an overall review of the data cannot be automated and had to be done manually.

While some legacy characters and formatting can be removed or cleaned with macro automation tools, spelling mistakes and naming-convention errors can, at present, only be picked up by hand.
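Beaney's distinction between what can and cannot be automated is easy to picture. The fragment below is a hypothetical illustration in Python, not IFS's tooling: mechanical problems such as stray control characters, repeated whitespace and inconsistent date formats are fixed by rule, while anything that looks like a spelling or naming-convention error is merely flagged for a human to review.

import re
from datetime import datetime

# Hypothetical list of supplier names already approved in the new system.
KNOWN_SUPPLIERS = {"Acme Pumps Ltd", "Northern Valves plc"}


def clean_record(record):
    """Apply mechanical fixes automatically and return the cleaned record,
    plus a list of issues that still need manual review."""
    issues = []

    # Strip control characters and collapse repeated whitespace (automatable).
    name = re.sub(r"[\x00-\x1f]", "", record["supplier"])
    name = re.sub(r"\s+", " ", name).strip()

    # Normalise the date format (automatable), e.g. 31/12/99 becomes 1999-12-31.
    try:
        date = datetime.strptime(record["order_date"], "%d/%m/%y").date().isoformat()
    except ValueError:
        date = record["order_date"]
        issues.append("unrecognised date format")

    # Spelling and naming-convention problems can only be flagged, not fixed.
    if name not in KNOWN_SUPPLIERS:
        issues.append("supplier name '%s' not in approved list" % name)

    return {"supplier": name, "order_date": date}, issues


if __name__ == "__main__":
    cleaned, todo = clean_record({"supplier": "acme  pumps ltd\x07",
                                  "order_date": "31/12/99"})
    print(cleaned, todo)

The manual element was the expensive part in the Thames Power case precisely because a check like the approved-list test above only tells a human where to look; it cannot say what the correct name should be.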

At least Thames Power now has a system it can use to search for historical purchase or works orders relevant to prospective purchases, a capability that would not have existed without the migration, according to Ian MacDonald, senior control systems and IT engineer. "We now have all our legacy data available to our users without them having to open up other systems to extract data that they may require," says MacDonald. "I believe, without the legacy data being migrated, the project would have been a failure, as it would make the end-users' job more difficult, reducing their productivity and increasing their frustrations."

Yet the high cost of data cleaning does suggest it is worth preventing data from becoming sullied in the first place. This is the other side of legacy which, at last, enterprises are starting to consider, according to Gartner's Buchanan. "Data retention costs and options are now better understood, and starting to be factored into business models," says Buchanan.

"A few organisations still try to avoid data management spending at all costs, but by now most of them have found out the hard way - trying to keep legacy applications alive indefinitely can cost more in the long-term than confronting the problem and doing things properly in the first place. If data retention is a cost of doing business, denial does not pay the bill."

This is an old lesson to seasoned ears, perhaps, but in the world of enterprise IT, legacy has a habit of haunting those who fail to heed it.
