Can de-duplication stave off the data avalanche? By Nick Cater, Iron Mountain Digital.
A key way of reducing data is to examine how much information is simply replicated by multiple users: standardised operating systems and applications have put thousands of identical files on legions of computers. Add to that identical multi-megabyte attachments stored in multiple recipients' inboxes, and it's easy to see how much duplicated content adds to an organisation's storage requirements.
De-duplication deserves to score higher on the IT agenda as a way to reduce storage and power costs by streamlining the amount of information that needs to be backed up. It also helps to address issues such as business continuity, e-discovery, and compliance requests.
De-duplication technologies can take a number of forms, but the core methodologies are elimination of duplicate files across the network, incremental backups and file compression.
These techniques can strip out a huge amount of backed-up data that simply isn't needed. The technology also works across distributed data centres, ensuring one centralised version of a document is backed up, rather than several different versions held on different devices.
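One common way to eliminate duplicate files, as described above, is to index each file by a hash of its contents, so that identical copies are backed up once and later occurrences are recorded only as references. The sketch below is a minimal illustration of that idea, not any particular vendor's implementation; the function name and the list-based structures are assumptions for the example.

```python
import hashlib
from pathlib import Path

def dedupe_index(paths):
    """Index files by content hash so identical copies are stored once.

    Returns (index, dupes): index maps each content hash to the first
    path seen with that content (the copy actually backed up); dupes
    lists (duplicate_path, canonical_path) pairs backed up as references.
    """
    index = {}   # content hash -> canonical path
    dupes = []   # (duplicate path, canonical path it points at)
    for p in paths:
        digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
        if digest in index:
            dupes.append((p, index[digest]))  # store a pointer, not the bytes
        else:
            index[digest] = p                 # first copy: back up in full
    return index, dupes
```

A network of machines reporting into one such index is what lets distributed data centres keep a single canonical copy of a widely replicated file.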
Useful though it is, de-duplication only tackles the initial symptoms of the data deluge, and will not be able to keep pace with the growing average size of files as video and other media formats are increasingly used. Data reduction takes de-duplication one step further, moving it from a reactive to a proactive approach to data management. The technique automates data movement and deletion from the desktop, reducing the physical volume of data moving around the organisation.
Policy-driven, the technique 'tags' files that are deemed no longer required: this is established through a rules-based system that can be set up by administrators or IT managers. These files can then be extracted from their current position, and either moved to the archive, or deleted securely.
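A rules-based tagging system of the kind described above can be pictured as an ordered list of policies, each pairing a condition on a file's metadata with an action. The sketch below is a hypothetical illustration of that structure; the rule thresholds, field names and actions are invented for the example, not drawn from any real product.

```python
# Hypothetical retention rules an administrator might configure.
# Rules are checked in order; the first match decides the action.
RULES = [
    (lambda f: f["age_days"] > 365 * 7, "delete"),   # past retention: secure deletion
    (lambda f: f["age_days"] > 90,      "archive"),  # stale: move to the archive tier
]

def tag_file(meta):
    """Return the first matching action for a file, or None to leave it in place."""
    for predicate, action in RULES:
        if predicate(meta):
            return action
    return None
```

Because the rules live in one place, an IT manager can tighten or relax retention policy for the whole organisation without asking users to change their habits.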
Data reduction should, in theory, reduce the need to educate users about managing their own data storage effectively. Moving data management to an automated, policy-driven mechanism removes the need for colleagues to worry about where and when their data is backed up.
However, ensuring users understand why data reduction policies are in place - and how they can remove any blockages to the backup pipeline - will always help an organisation's long-term data strategies to succeed. Common practices such as using email inboxes as a secondary storage system for large documents will always continue, so IT managers should encourage users to take a robust and rigorous approach to their individual storage habits - before somebody else does.