Is it the data centres' fault if their technology is used in a way that throws up some contentious issues? Here are six examples showing how these 'tech satanic mills' have ended up as the scapegoats of the digital society.
Data centres have come in for their share of stick in recent years. Whilst recognised as the global powerhouses of the online economy, technologically enabling everything from ecommerce and social networks to streamed media and cloud computing, data centres are also viewed by some as the digital equivalent of William Blake's 'dark satanic mills' of the industrial revolution.
Sometimes seen as sinister and secretive, often guarded by high-security fences and thick, bomb-proof walls to protect the sterilised environments within, the proliferation of data centres has elicited concern from environmental monitors and critics of the way the Internet is encroaching on many aspects of our physical lives. To conspiracy-seekers they are also the places of secrets, where organisations amass big data sets that inform surveillance measures and govern our human rights.
Yet the plain fact is that most people are unaware that the great majority of their online activity would simply have been impossible before multi-host data centres started to proliferate in the early 2000s. And before the arrival of the data centre, it's not as if all kinds of organisations weren't already storing data about us on their own computer systems.
Of course, very little happens in cyberspace without a data centre being involved at some point, and this means that they are liable to take flak for issues that are not really down to them. Take broadband roll-out in the UK, for instance. The government has been pushing for higher-speed broadband throughout the country, because this will better help smaller businesses compete online. However, faster broadband also allows for greater data throughput, and very often the biggest fans are in the consumer market rather than the enterprise market.
The result is often that the supercharged connections prove more popular with home users intent on using them to watch streamed video content, play high-performance games, or share huge files. All this extra traffic chugs through a data centre at some point, massively increasing the amount of energy it needs and also, arguably, shortening the life of its computing components, so hardware refresh cycles get shorter and new kit has to be installed more often. More resources turned into servers, more carbon expended to keep the LEDs blinking. It's not really the fault of the data centre, but it is at data centres' doors – not the portals of government, or those belonging to feckless end-users or the network operators and ISPs – that blame is most often laid.
So here we have pulled together six examples where the data centre can reasonably plead extenuating circumstances. Some of these resonate with trends in the IT biosphere, such as the accumulation of cloud-based services and enterprise IT models. Advocates may bang on about the advantages of migrating to the cloud, but are less voluble when it comes to describing the effect large-scale cloud uptake is having on data centres around the world. And when data centres have been in the eye of the storm of the green IT debates, it is often overlooked that they aren't generating the computing activity that's consuming all that energy – that's coming from the millions of users who are in effect using data centres as a sort of remote-central system where the bulk of the heavy-duty processing occurs.
The WAN/LAN bottleneck
'The stupid Internet's running slow again'
With so many consumers and businesses starting to rely on cloud services to host and process their applications and data, any period of downtime at the provider's data centre, however brief, is likely to generate negative headlines. While some of those outages can be blamed squarely on the cloud services provider itself for any number of system-based reasons (such as unforeseen software incompatibilities), culpability may in part also lie with the Internet service provider or telco connecting the hosting facility to the rest of the world.
Big players such as Amazon, Google, Microsoft, Sony, and others, have all been in the firing line in recent years, and have decided that as far as the customer is concerned, the buck stops with them, and unseemly mud-slinging bouts between them and their network partners should be confined behind tightly-closed doors.
There is, in fact, much that data-centre operators can do to mitigate, if not completely eliminate, network outages. On the wide area network (WAN) side, this includes installing multiple routers and fault-tolerant connectivity which divides the load between multiple ISPs and reroutes traffic to another provider or connection if one becomes unavailable for whatever reason. They can also try to limit potential bottlenecks on the local area networks (LANs) servicing multi-tenant data centre architectures, which support thousands of customer virtual workloads simultaneously. Here the options include deploying redundant network interface cards, switches, routers, and storage area network components over multiple connection paths, and using virtual network overlays such as VLAN (virtual LAN) or NV-GRE (Network Virtualisation using Generic Routing Encapsulation) to support more efficient transfer of virtualised traffic between multiple data centres for failover purposes.
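The WAN-side idea of dividing load between several ISPs and rerouting when one drops out can be sketched in a few lines of Python. This is only an illustration: the uplink names, the weights, and the `traffic_shares` function are hypothetical, and a real deployment would do this in routing equipment rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    """One WAN connection to an ISP (names and weights are illustrative)."""
    name: str
    weight: int          # relative share of traffic when the link is healthy
    healthy: bool = True


def traffic_shares(uplinks):
    """Split traffic across healthy uplinks in proportion to their weights.

    The share of any uplink that is down is redistributed to the rest.
    Returns a dict mapping uplink name to a fraction between 0 and 1.
    """
    live = [u for u in uplinks if u.healthy]
    if not live:
        raise RuntimeError("no healthy uplink available")
    total = sum(u.weight for u in live)
    return {u.name: u.weight / total for u in live}


uplinks = [Uplink("isp_a", 2), Uplink("isp_b", 1), Uplink("isp_c", 1)]
normal = traffic_shares(uplinks)

uplinks[0].healthy = False       # simulate isp_a becoming unavailable
failover = traffic_shares(uplinks)
```

With all three links up, the hypothetical `isp_a` carries half the traffic; once it is marked unhealthy, the remaining two links split the load evenly between them.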
The M2M data challenge
'They get devices to yack to each other'
Data centres often get blamed for 'generating' the information that streams in and out of them, but they are not 'data factories': what gets pumped out is governed by what gets pumped in; the data centre is merely a conduit. Machine-to-machine (M2M) communications is arguably the next major frontier for the data centre, set to give another twist to the data spiral, imposing new strains on processing, storage, and network bandwidth alike.
M2M data will be generated by a plethora of devices and things that will vary with industry sector, but include sensors, networking devices, and electronic equipment, generating a lot of 'big' data. According to storage vendor EMC, M2M will be responsible for a lot of the projected growth in the world's data, which it expects to increase from 2.8 zettabytes (ZB) to 40ZB by 2020. This data will be associated with a proliferating number of connected devices, set to double over the next eight years to 50 billion globally, according to the GSMA.
With M2M data already being generated in fast-growing volumes, data centres need to embrace it within their strategies, which boil down to two related requirements – scalability and distributed data analytics. A lot of the data will be streamed and will require, at the very least, acknowledgement in real time; its volume will grow, as will the number of actions surrounding it. As well as scale, the data centre will have to become more responsive to fluctuations in load, which all points towards cloud computing. But even then the cost of scalability, given the volume and diverse nature of the data being generated, would be prohibitive, as would the cost of transporting it all over the network. Distributed processing will be required, both to cull M2M data at source and to perform some of the more basic analytics. Some of the decisions based on M2M data can be made at source without involving the data centre, reducing overall operating and capital costs.
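To make 'culling at source' concrete, here is a minimal Python sketch of one way a device might thin out its own readings before anything crosses the network. The deadband approach, the function names, and the threshold values are assumptions chosen for illustration, not something the vendors quoted above prescribe.

```python
def cull_at_source(readings, deadband):
    """Forward a reading only when it differs from the last forwarded
    value by more than `deadband`; everything else is dropped locally."""
    forwarded = []
    last = None
    for value in readings:
        if last is None or abs(value - last) > deadband:
            forwarded.append(value)
            last = value
    return forwarded


def local_decision(value, limit):
    """A basic decision taken on the device, with no data-centre round trip.

    The action names are hypothetical placeholders.
    """
    return "shut_valve" if value > limit else "no_action"


# Seven sensor readings, of which only the significant changes are sent on.
readings = [20.0, 20.1, 20.2, 21.5, 21.6, 25.0, 25.1]
sent = cull_at_source(readings, deadband=1.0)
action = local_decision(readings[-1], limit=24.0)
```

In this example only three of the seven readings leave the device, and the simple threshold decision is made where the data originates.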
Re-architecting to save energy
'Pointless Web searches burn loads of carbon'
A study performed by Harvard University researcher Alex Wissner-Gross found that just two searches on the web consumed as much energy as boiling an average-capacity kettle. But most of that energy is not down to the server or the communications network – most of it is what the PC burns while sitting idle waiting for the user to make a decision.
The screen alone, thanks to its backlight, is a major culprit in energy consumption. Yet around the world of computing and IT the critical attention on this perceived wastefulness has fallen on the data centre. One answer that society has so far avoided is to push much of the data to the edge of the network instead of forcing everything through centralised systems, which would make the most of the architecture of the Internet.
The technology exists and, more than a decade ago, Ross Anderson, professor in security engineering at the University of Cambridge Computer Laboratory, proposed it as a way of keeping terabytes of data safe instead of being consigned to dedicated archives. Many of the network-attached storage drives consumers can buy off the shelf support it today. And it is widespread; it's just not very legal in most cases.
Peer-to-peer communications technology provides a way of breaking up data and storing it on hundreds or thousands of nodes such that, if one disappears, others will take its place. The technique can make use of the spare compute cycles in computers and NAS devices to serve, at the very least, large data files, perhaps using scaled-down servers as indexes. However, many industries are nowhere near the stage where they will consider moving to P2P because of the way in which it has become associated with copyright infringement.
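The principle of breaking data into pieces and spreading copies across many nodes can be sketched as follows. The placement scheme used here is rendezvous hashing, picked purely for illustration; it is not a technique the proposals mentioned above are known to use, and the node names and chunk size are made up.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int):
    """Break a blob into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(chunk: bytes, nodes, replicas):
    """Pick `replicas` nodes for a chunk by rendezvous hashing.

    Each node is ranked by the hash of its name combined with the chunk id,
    so a node's rank for a given chunk does not depend on which other
    nodes happen to exist.
    """
    chunk_id = hashlib.sha256(chunk).hexdigest()
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha256((n + chunk_id).encode()).hexdigest(),
    )
    return chunk_id, ranked[:replicas]


nodes = [f"node-{i}" for i in range(8)]      # hypothetical peers
data = bytes(range(10))
chunks = split_into_chunks(data, chunk_size=4)

before = [place_replicas(c, nodes, replicas=3)[1] for c in chunks]

# One peer disappears; the next-ranked node takes over its chunks.
survivors = [n for n in nodes if n != before[0][0]]
after = [place_replicas(c, survivors, replicas=3)[1] for c in chunks]
```

Because each node's rank for a chunk is independent of the others, losing a peer only affects the chunks it held: the surviving holders keep their copies and one further node is drafted in.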
Redesigning the server farm
'Server design is still stuck in the 1980s'
For decades, server design was the preserve of 'big iron' computer vendors. Customers had comparatively little choice in what they bought other than processor, memory, and disk capacity. The mass-produced PC overturned this hegemony, but for the last 15 years data-centre operators have largely had to use the inner architectures mandated by server makers. But techno-evolution has now reached the point where some data-centre operators have decided to cut out the middle man and go direct to the electronics companies who supply those computer makers.
The decision by Facebook and other data-centre operators to develop their own specifications for servers indicates the industry supplying them has not met their needs, particularly when it comes to maintainability and servicing. Facebook's Knox design for its servers largely focuses on mechanical rather than electronic issues, but the company made it open-source in the hope that other data-centre users would improve on it and provide novel approaches. AMD, Calxeda and Intel have joined the Open Compute Project, and have come up with motherboard designs that they are prepared to offer on an open-source basis. Google, on the other hand, keeps the internals of its self-tuned machines a closely-guarded secret – an indication of how important the user-customised server has become.
The trend is not likely to stop at the mechanical framework. Research at universities such as Stanford and EPFL in Lausanne has shown that the processor and memory combinations being used by new servers, particularly those in applications such as search and social media, are very energy-inefficient, and the researchers recommend a shift to techniques used in the mobile-phone space. By being able to tune applications and component hardware together, data-centre users now occupy an ideal position to remake the server.
Data ownership rows
'They're totally crammed with pirated content'
January 2012 saw US data-centre operator Carpathia Hosting come under fire for its part in storing copyrighted content accessed by users of the Megaupload file-sharing site, which was shut down by the FBI. Investigators alleged that the colocation company was more than an innocent third party left to foot the bill when its servers were seized by the authorities as part of the investigation.
Like other hosting companies, Carpathia is protected in the US by the Digital Millennium Copyright Act (DMCA), which shields it from liability for any copyright violations committed by users. But that does not stop lawyers for the major Hollywood studios looking to prepare the ground for civil litigation by claiming that Carpathia executives knew exactly what sort of content Megaupload users were sharing, based on the site's phenomenal traffic volume. Carpathia, along with Internet service provider and co-lo provider Cogent Communications, has always maintained that it "does not have, and has never had, access to the content on MegaUpload servers".
Like other data-centre hosting and public cloud service providers, it provides dedicated physical servers or virtual machines running on those servers, which are provisioned on demand by its customers in blocks of CPU, RAM, and per-GB storage instances. Introducing processes to monitor or approve the content that thousands of users upload to those instances would go some way to undermining a competitive infrastructure as a service (IaaS) business model which relies on high volumes of low-margin transactions (other co-lo companies like Equinix sell content protection and digital rights management services as a value-added service). Nobody expects self-storage companies to inspect every item stored in their containers or warehouses or check whether they are the subject of any ownership disputes – why should data be any different?
Broadcasting & Narrowcasting
'I just want to watch TV: why so much crashing?'
Broadcasting has transformed itself into one of the most challenging data-centre operations. A number of dedicated data centres now offer hosting and services for broadcasters, and video distribution is increasingly migrating to offsite cloud-based platforms. The proliferation of TV channels, a growing number of which are in HD, has increased dramatically the amount of video data that has to be stored and transmitted.
The most profound change is in the number of different platforms broadcasters have to support in addition to conventional telly: PCs, tablet PCs, smart phones, and games consoles, with many categories of each, so that some broadcasters are having to encode up to 18 different versions of the same content for distribution. Furthermore, they increasingly have to deliver via the public Internet – over which they have no control – and so have adopted adaptive bit-rate streaming (ABRS) to deliver the maximum video quality current network conditions will allow while avoiding pauses for buffering when the available bandwidth suddenly drops.
In that event the stream adapts to a lower bit-rate by reducing the picture quality temporarily. The onus of facilitating this digital grunt work is thrown back at data centres. But while ABRS has been hailed as a panacea for online video distribution over unmanaged networks, it is a headache for the data centre. ABRS delegates selection of the content quality to the device, which in principle is a good idea because it minimises delay in responding to changes in available bandwidth. But unfortunately not all devices behave properly, with some being greedy for bandwidth and grabbing more than is needed to receive video at the maximum resolution they can play back. The playout operation in the data centre has to cope with such vagaries, as well as having the scale to serve a fast-growing population of disparate TV viewing devices.
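How a device might choose between the encoded versions on offer can be sketched roughly as follows. The bit-rate ladder, the 80 per cent safety margin, and the `pick_bitrate` function are all assumptions made for illustration; real players use more elaborate heuristics that also take buffer levels into account.

```python
# Hypothetical ladder of encoded versions of one programme, in kbit/s.
LADDER_KBPS = [400, 1200, 2500, 5000]


def pick_bitrate(measured_kbps, ladder=LADDER_KBPS, safety=0.8):
    """Choose the highest version that fits within a safety fraction of
    the measured throughput, falling back to the lowest version when
    nothing fits so that playback continues at reduced quality."""
    budget = measured_kbps * safety
    fitting = [rate for rate in ladder if rate <= budget]
    return max(fitting) if fitting else min(ladder)


# Throughput measured by the device before each segment request (kbit/s).
throughput_samples = [6500, 6400, 2000, 900, 300, 4000]
choices = [pick_bitrate(t) for t in throughput_samples]
```

As the measured throughput falls away the device steps down the ladder rather than stalling, then climbs back once conditions recover. A 'greedy' device of the kind described above would, in effect, ignore the safety margin or request more than its screen can display.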