Big data at large - you may not see it but it will see you

When an E&T writer set off to attend an industry event in Amsterdam he discovered that his trip was being helped along at all stages by an unseen force.

As the importance of 'big data' continues to grow in 2013, it occurred to me that, as an IT writer who's followed the phenomenon for years, I should examine the extent to which all the trumpeting of big data 'changing the way we live' is justified.

Immediately this raises another key question: what is actually new here, when computer-based data has been growing steadily in volume and variety for decades? Does big data really provide, for the first time in the history of enterprise computing, the 'big picture'?

In a general context big data is an aggregation of data sets that are so large and complex that it becomes difficult to process using readily available database-management tools or traditional data-processing applications. This challenge also contains an opportunity for commercial organisations that are equipped to find ways to use it – or elements of it – to inform and enhance revenue drivers (see 'Puzzling out big data', E&T Vol 7 Issue 12).

The term 'big data' is partly a recognition of the growing relative weight and importance of unstructured data not amenable to conventional database analytics and reporting tools and techniques. Above all, though, it embodies an ambition to extract value from data, particularly for sales, marketing, and customer relations.

Taking this definition I decided to put big data to the test by measuring how often it 'touched' me in my professional and social existence. I decided to start with a somewhat atypical day in my life, journeying from London to Amsterdam to attend an industry conference, but also scrolling forward to the next day to take in some big-data contact points I'd missed while on the move.

I came to two somewhat opposing conclusions. First, big data really is now being applied everywhere, and touches our lives far more frequently than most non-techies appreciate, although most who are touched by it must be aware that something clever is at work behind the scenes. This is partly because big data sometimes operates automatically in background mode as a filter, as when executing security checks when applying online for a credit card, say.

On the other hand big data is very much a work in progress, and we are only beginning to scratch the surface of its possibilities. Some of the more exotic applications, burrowing deep within the vast repositories of machine data now being generated, are just starting to materialise, although I did brush against one example during my week of counting my big data touch points.

A further conclusion was that the application of big data is not always in our best interests, and can even be a facilitator for malevolent intent. Cyber-criminals, fraudsters, and hackers are all at it too, possibly operating on stolen or misappropriated big data, and, although I was not a victim of illegal activity during my journey, I have received some strange phone calls and emails that must have come from access to my digital profile.

Starting-off points

My first brush with big data came when I booked my hotel in Amsterdam. I had stayed before in a somewhat minimalist establishment called CitizenM, the room interiors of which might not be to everyone's taste (the bedroom loos are located in cylindrical clear glass cubicles in the centre of the room). In any case the hotel ensured it remained in my thoughts by popping-up little adverts all over the place, especially on YouTube, and again I sensed big data in action.

I was obviously on CitizenM's customer database, and my name had been linked with YouTube, possibly via my Google Apps for Business account, as I was accessing from the same computer and IP address. This was one example of a situation where one can see big-data analytics going much further toward paying its way, for example, by offering personalised terms and special rates for booking at off-peak times when the hotel wants to fill untaken rooms.

I felt the same when booking my flight with Easyjet, sensing that the airline too could be doing more here, especially as it boasts about being ahead of other airlines in exploiting big data. As a regular Easyjet user, I do receive offers by email, but I'm a bit surprised that they don't go further and send personalised invitations rather than generic approaches of the kind I also get from British Airways. A typical email message from Easyjet might read: "SALE – up to 25 per cent off for five days only", or "our biggest sale ever ends midnight tonight".

This is driven by the airline's own reservations systems to determine when it is worth trying to flog seats, and then how much to discount; but so far it has not harnessed personal customer information as far as is possible, from my perspective. Yet I note that the airline has recently been soliciting my feedback on the overall flying experience, and on how I rate its performance, which I imagine will provide fodder for further big-data-based operations and marketing activities.

Touching the void?

In the end I counted at least three other big-data touchpoints with Easyjet. The first was when I printed my boarding pass, knowing checks were being run in real time against several databases, including ones external to the airline, such as checking my name against international 'no fly' lists to ascertain that I was not a registered offender, for instance.

Then when I arrived at Gatwick Airport I found that Easyjet was checking-in passengers who didn't have boarding passes with portable scanners communicating with its own system. This system, called Halo, uses the airport's Wi-Fi network, backed up by 3G cellular, also providing greater flexibility around the departure gate.

It communicates with Easyjet's central reservation system via Microsoft's Azure cloud computing platform. A staff member told me that it would soon provide other facilities for passengers, such as taking chip-and-pin payments for any excess charges that need to be collected before boarding. That is probably convenient for the airline's cashflow, and I suppose should be acknowledged as another application where big-data shuffling in real time is showing benefits.

Easyjet calls Halo an 'application' of big data; one might argue that really the main innovation is in the use of speedy Wi-Fi and the Cloud, rather than any radically new algorithm. Big data genuinely does though come into play for Easyjet when it comes to turnaround times. The airline runs a very tight schedule, of course, and relies on being able to prepare a plane for the next lot of passengers within about 25 minutes of the last lot disembarking, with penalties to be paid if it runs late or if it is the cause of delay, as well as loss of customer loyalty.

Easyjet also collects data from a variety of sources that relate to turnaround time, particularly on factors that cause it to be late, which could be a delay in re-fuelling, or baggage handling. Not all factors are entirely within the airline's control, but by analysing them 'collectively' it has shaved turnaround time almost to the bone, and it is hard to see much room for further improvement. As it happened, the incoming flight was delayed by half an hour, but the airline managed to turn it round in barely over 20 minutes, and we touched down at Amsterdam Airport Schiphol virtually on time, aided by a 100mph tail wind that the pilot picked up with some turbulence.

It is, of course, quite likely that big data derived from meteorological agencies' supercomputer systems played a part in producing the weather and flight-conditions updates fed to the air crew during the course of their operating day – but that's been happening for ages.

Social media's buzz factor

During the flight I got chatting to the fellow seated next to me, and he told me how his employer, a direct marketing company, was experimenting with some internally-developed software to measure perceptions of the company's brand, even how this changes on a daily basis in response to successive news and advertising campaigns. It is well known that a TV spot advert during a popular programme or major sporting event can trigger a big spike in sales for a retailer. This particular software measures the associated impact of such events and others on how the brand is perceived, by analysing the 'buzz' on social networks, especially Facebook, YouTube, and Twitter.

New algorithms were created that extract the essence of this unstructured data and translate it into metrics that can be scored. While obviously a work in progress, with much more refinement to be done, this holds the potential for experimenting with campaigns, including relatively low-cost viral measures that attempt to generate social network activity, and correlate this with actual bottom-line profits or sales. We can assuredly expect to see more of this – lots more.

Taxi driver talks up tech

Back on the ground at Schiphol, I did not have long to wait for my next big-data encounter: it came in the taxi from the airport to my hotel. I noticed that my driver did not take the normal route, and I asked him why. Even I was a tad taken aback when he replied (in English) "big data"!

I should not have been in the least surprised, really... The driver did add that his TomTom satellite navigation service made use of live traffic data from various sources to calculate in real time the likely quickest route between two points at the time, and feed that data back to vehicles.
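For readers curious how such a route calculation might work in principle, here is a minimal sketch. It is not TomTom's method, which the driver did not describe; it simply applies Dijkstra's shortest-path algorithm to a road map whose journey times are scaled by live congestion factors. The function name `quickest_route`, the `ROADS` map, and the congestion figures are all invented for illustration.

```python
import heapq


def quickest_route(graph, traffic, start, goal):
    """Dijkstra's algorithm over travel times scaled by live congestion.

    graph:   {node: [(neighbour, free_flow_minutes), ...]}  (hypothetical map)
    traffic: {(node, neighbour): congestion_factor}, 1.0 means free-flowing
    Returns (total_minutes, path), or None if the goal cannot be reached.
    """
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if minutes > best.get(node, float("inf")):
            continue  # a quicker way to this node has already been found
        for neighbour, base_minutes in graph.get(node, []):
            factor = traffic.get((node, neighbour), 1.0)
            total = minutes + base_minutes * factor
            if total < best.get(neighbour, float("inf")):
                best[neighbour] = total
                heapq.heappush(queue, (total, neighbour, path + [neighbour]))
    return None


# Invented road map: journey times in minutes with no congestion.
ROADS = {
    "airport": [("motorway", 10), ("ring_road", 14)],
    "motorway": [("hotel", 8)],
    "ring_road": [("hotel", 9)],
}

# Normal conditions: the motorway is the usual route.
print(quickest_route(ROADS, {}, "airport", "hotel"))

# Live feed reports the motorway leg is taking twice as long as usual.
print(quickest_route(ROADS, {("motorway", "hotel"): 2.0}, "airport", "hotel"))
```

With no congestion the sketch picks the motorway; once the live feed doubles the time on the motorway leg it switches to the ring road, which is the sort of detour my driver took.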

The value of this assistance to drivers goes without saying. Travelling in the reverse direction such traffic intelligence could make the difference between making and missing a flight.

My technologically-informed taxi driver had not finished: he went on to say how he'd heard data was becoming "a tradable commodity around which a secondary market was developing" (I kid you not). There was the potential for an online trading exchange to connect those with data they want to sell with others prepared to pay for it. Part of the service would involve transforming the data into a useable form, as in the case of Twitter Tweets or online testimonies that, in their raw form, are of negligible value.

He and I speculated how aggregation services might arise where data is combined to increase its potential value even if more processing and integration was required to deliver an end-user service.

If I still harboured any residual doubt over the force of big data integration, I had only another half an hour to wait to have it dispelled in short order. My hotel – the CitizenM Amsterdam City on Prinses Irenestraat – may in truth have skipped the opportunity to conjure-up much in the way of big data acrobatics when I checked in, but Google soon made up for it when I tapped into the Android Voice Search App. This is the equivalent of Apple's Siri, and delivers results instead to the screen in the usual way, but with much greater accuracy than Siri.

Apart from any arguable differences over search capability, Android Voice Search is much better at recognising human speech, having at last achieved a significant advance in a field that had appeared somewhat stuck in a rut for decades, with only painfully-achieved incremental advances. This progress has less to do with Google's estimable research team, being more directly attributable, of course, to big data.

Google was, in fairness, one of the first tech players to appreciate that the traditional approach to voice recognition – based on artificial linguistic rules for processing speech components – was running up against a wall that no amount of computational power could surmount. Google spotted that the greater data bandwidth of 3G mobile networks would enable a massive speech learning effort to take place on banks of large networked computers, or, as we say now, "in the cloud". While the individual speech component recognition would take place still in the handset, tuning would be performed in the cloud to improve accuracy.

The real point is that 3G made it possible to collect data about human speech patterns and match it with search query content and context, all in vast quantities. Google has changed speech recognition into a 'big data'-driven cloud service that is self-training, and surely a model for many other future applications.

It also has interesting implications for competition: for while Google's rivals, especially Apple, are striving to catch up, they will do so only if they can match Google's scale, and certainly will have to emulate its geographical and mapping capabilities. And what's more, it's unlikely Google will be selling this data at any price...

Three Vs or four?

Early the next morning Google was also behind my next brush with big data, or rather lack of it. While browsing the Internet via the hotel's Wi-Fi I noticed the adverts placed against my searches were fewer and much less relevant than at home, where my identity can be determined by my PC's IP address. I had spotted Google's advert targeting getting smarter and while it still hadn't got me to make a purchase it was homing in on my credit facilities. It was going beyond pushing adverts merely related to my current or recent search activity to draw in my recent correspondence on Gmail.

Could this be construed as an invasion of my privacy? Not that I am particularly neurotic about such things. But no, Google was probably just tapping into the tone of my emails, and not perusing the content in detail. To Google I am just a number linked to an IP address that can be used to improve ad ratings; but even so, the sum total data involved is very big indeed.

This example highlights what lies at the heart of big data itself. There are two components – not three, as the likes of industry analyst Gartner would have us accept. Gartner talks about the three Vs: volume, velocity, variety; but these are merely necessary conditions for big data to deliver 'value', which also requires analysis and computational speed. The analysis makes use of the scale of big data to improve the quality of decisions, while speed ensures that those decisions are executed when they are relevant. Go-getting exponents of big data who have extensive budgets – mainly in the financial trading sector – are busy investigating the potential benefits of real-time analytics. Not all big-data decisions need to be made at speed though.

What is it you really like?

The next example of its pervasiveness came a few days later when I was browsing (for work purposes) at home and was amused to see an advert appear for an online dating site specialising in people aged around 50. Being happily married I was not in the least interested in its introductory services (honestly), but out of professional curiosity opened the link, and was actually impressed by the quality and good taste of the site, once again exhibiting the efficacy of big data to make connections between personal situations as divulged through incidental preferences pulled together to create a composite profile.

Soon after this incident I received a call from someone purporting to be an engineer from my broadband supplier, BT. He apologised for the poor Internet performance I had been experiencing recently, which was incorrect as my line had been fine, and it was quickly apparent this was a spoof call whose purpose was not quite clear. Usually such calls are either for 'phishing' details that might be used for identity theft, or eliciting calls to fraudulent premium-rate numbers.

I didn't bother to pursue this particular case, but it reminded me of a presentation I had just heard while in Amsterdam from CenturyLink, the third-largest US Telco, which has been telephoning its Voice-over-IP (VoIP) customers in the event of service problems, to apologise and offer immediate compensation. The aim is to nip churn in the bud (if you'll allow the mixed metaphor), to stop a disgruntled customer leaving.

The big-data angle is that CenturyLink has been analysing machine data produced by equipment such as routers and IP phones to identify when problems at the network level are occurring. It is digging deep into data that other operators have typically discarded as useless almost the moment it has come in. This is a good example of one major direction big data is now set to take: enabling enterprises to access information that appeared beyond analysis.
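To give a flavour of what mining such machine data could look like, here is a minimal sketch. It is not CenturyLink's system, whose workings were not detailed in the presentation; it assumes each device reports a packet-loss percentage at regular intervals and flags any device whose latest reading sits far above its own recent history. The device names, the readings, and the threshold of three standard deviations are all invented for illustration.

```python
from statistics import mean, pstdev


def flag_faulty_devices(readings, threshold=3.0):
    """Return devices whose latest reading is far above their own history.

    readings:  {device: [older values..., latest value]}  (hypothetical feed)
    threshold: how many standard deviations above the historical mean
               counts as a problem (an assumed figure, not an industry norm)
    """
    flagged = []
    for device, values in readings.items():
        history, latest = values[:-1], values[-1]
        if len(history) < 2:
            continue  # too little history to judge what is normal
        baseline = mean(history)
        spread = pstdev(history)
        if spread == 0:
            # Perfectly flat history: any rise at all is out of character.
            score = float("inf") if latest > baseline else 0.0
        else:
            score = (latest - baseline) / spread
        if score > threshold:
            flagged.append(device)
    return flagged


# Invented packet-loss percentages, oldest first, latest last.
SAMPLE = {
    "router-a": [0.5, 0.6, 0.4, 0.5, 4.8],
    "phone-b": [1.0, 1.2, 0.9, 1.1, 1.0],
}

print(flag_faulty_devices(SAMPLE))
```

In this sketch only `router-a` is flagged, since its latest reading jumps well clear of anything in its history. An operator could use such a list to decide which customers to call before they ring up to complain.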

I make no apology for the fact that my final example, occurring just as I was winding down toward bedtime, is also drawn from the Google empire, but this reflects that it has gone closer in my opinion to the big data ideal than say Amazon or Apple. I was browsing YouTube, and up popped several recommendations for tracks or albums from 20th century Jazz great Miles Davis, even though I had yet to listen to any online, having all I wanted in my own music collection. I then realised that other YouTube selections being displayed had, between them, included just about every musician who played at some time with Miles; one could have been forgiven for thinking that the list had been compiled by a human über-buff with an encyclopedic knowledge of his career. Impressive.

The thought of big data doing the thinking for me was sufficiently gratifying to ensure a good night's sleep in preparation for my return journey - who knows what further adventures the next big data-informed day would bring...
