A new vogue in optimised storage
Mustering for clustering
There was a time, not so long ago, when size was everything in the data storage world, but now capacity alone is not enough. Where a mere decade ago storage area networks (SANs) and network-attached storage (NAS) were all the rage, now it is the turn of clusters, exploiting commodity hardware to provide greater scalability and I/O performance for large unstructured files. Clustered storage also simplifies management, because there is just one file system holding all data in a single volume. The complexities of locating files are shielded from the data centre and its applications, as opposed to many SANs where files are distributed across multiple arrays that need to be managed.
There are as many definitions of clustered storage as there are of clustered servers, but several key practical criteria can be applied. First, a clustered system avoids I/O bottlenecks that may be present on NAS and SAN systems by distributing I/O bandwidth to each node.
This also enhances scalability, as I/O and storage capacity can be increased independently as required. For instance, Isilon's IQ family of cluster systems has two types of cluster, one comprising storage integrated with I/O, and the other I/O on its own, which becomes useful where multiple applications or users require access to a single file.
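The point about growing I/O and capacity independently can be shown with a toy model. This is a minimal sketch, not a description of any Isilon product: the `Node` type, the `cluster_totals` helper and every figure are hypothetical, chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One cluster node. All figures used below are hypothetical."""
    capacity_tb: float      # 0.0 for an I/O-only node
    bandwidth_mb_s: float   # I/O bandwidth this node contributes


def cluster_totals(nodes):
    """Return (total capacity in TB, aggregate bandwidth in MB/s)."""
    capacity = sum(n.capacity_tb for n in nodes)
    bandwidth = sum(n.bandwidth_mb_s for n in nodes)
    return capacity, bandwidth


# Assumed node types: storage integrated with I/O, and I/O on its own.
STORAGE_NODE = Node(capacity_tb=6.0, bandwidth_mb_s=200.0)
IO_ONLY_NODE = Node(capacity_tb=0.0, bandwidth_mb_s=200.0)

base = [STORAGE_NODE] * 3
more_io = base + [IO_ONLY_NODE] * 2   # add bandwidth without adding capacity

if __name__ == "__main__":
    print(cluster_totals(base))
    print(cluster_totals(more_io))
```

In this model, adding the two I/O-only nodes raises aggregate bandwidth while leaving capacity unchanged, which is the case where many users or applications need the same file.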
Although clustering is relatively new to storage, it was invented for servers over 20 years ago by Digital Equipment Corp (DEC) for its then state-of-the-art VAX systems. This was driven by the arrival of cheaper, more commoditised hardware combined with proliferating demand for CPU power. Similar forces lie behind clustered storage: in this case, rapid growth in unstructured data such as video and image files, together with a continuing dramatic fall in the price-per-MB of disk drives.
This makes clustered storage highly attractive economically for such data, although even Isilon, the field's pioneer, which decided to bet on clustered storage in 2001, admits that NAS and SAN may be better for email and for applications generating large numbers of small transactions. "SAN and NAS are arguably better-suited for small transactions and email," says Isilon's EMEA marketing VP Phil Crocker.
This is because small transactions do not benefit from the distributed I/O of clustered systems, which comes into its own for large files that can be accessed in parallel across multiple clustered nodes. Yet Crocker insists that some of Isilon's customers do use clustered storage for small transactions, which can be batched up into larger files and distributed across multiple nodes.
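A rough sketch of why large files benefit: the file is cut into stripes, the stripes are spread round-robin across nodes, and a read pulls from every node at once. The code below is an illustration under stated assumptions, not any vendor's implementation. The names `stripe`, `read_node` and `parallel_read` are hypothetical, the "nodes" are in-memory lists standing in for networked servers, and the stripe size is deliberately tiny.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_SIZE = 4  # bytes; unrealistically small, for demonstration only


def stripe(data: bytes, node_count: int, stripe_size: int = STRIPE_SIZE):
    """Split data into stripes and deal them round-robin across nodes.

    Each node holds a list of (stripe_index, payload) pairs.
    """
    nodes = [[] for _ in range(node_count)]
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        nodes[index % node_count].append((index, data[offset:offset + stripe_size]))
    return nodes


def read_node(node):
    """Stand-in for a network read of every stripe held by one node."""
    return list(node)


def parallel_read(nodes) -> bytes:
    """Read all nodes concurrently, then reassemble stripes in index order."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        parts = list(pool.map(read_node, nodes))
    stripes = sorted(pair for part in parts for pair in part)
    return b"".join(payload for _, payload in stripes)


if __name__ == "__main__":
    # Small transactions batched into one larger object before striping.
    small_records = [b"txn-a;", b"txn-b;", b"txn-c;"]
    batched = b"".join(small_records)
    nodes = stripe(batched, node_count=2)
    assert parallel_read(nodes) == batched
```

A single small record would fit inside one stripe on one node, so it would gain nothing from the parallel read. Batching records into a larger object, as in the usage lines at the end, is what lets them span several nodes.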
A major source of recent growth in unstructured data is video, which in turn brings a requirement for time-shifted access to the same content. Clustered storage is ideal for this because the data is distributed across multiple drives sharing the I/O burden, although even then the relatively slow read/write of disk storage is a potential bottleneck.
Isilon and other storage vendors are working on systems incorporating solid state ('flash') drives, although a new I/O interface will be needed to maximise the potential of flash storage for multiple reads and writes within single files.
Given these trends, Isilon faces competition as its rivals muscle in on its act. IBM acquired Israel-based XIV in January 2008, while HP (Hewlett-Packard) bought clustering technology from PolyServe in February 2007. "Our two-year relationship with PolyServe has convinced us that its technology will help accelerate HP's growth, and complement our HP StorageWorks, ProLiant and BladeSystem businesses," says Bob Schultz, senior vice president and general manager, StorageWorks Division, at HP.
Another contender, NetApp, has been trying to embrace full-blown clustering by converging its N-way GX clustering software with its file operating system, Data ONTAP, with the aim of enabling customers to handle mixed SAN and NAS environments on one platform. In September 2008, NetApp announced the availability of enterprise application development and test solutions for heterogeneous SAN storage environments, so that its customers can use NetApp storage solutions with multivendor SAN storage.
How much storage do we really need? See Viewpoint in this issue.