
Alexa, Cortana, Siri et al: do our digital assistants hear more than we want them to?

The new generation of voice-activated digital assistants gives tech companies and hackers another way to collect sensitive personal information.

Voice-activated digital assistants are a fast and easy way to look up information, initiate web-based communications and keep on top of busy schedules. But users may need to start censoring what they say, or face the very real prospect of a digital spy leaking more information than they care to divulge.

Samsung’s newly acquired Viv artificial intelligence platform is about to join the growing number of voice-activated digital assistants available to consumers. These already include Apple Siri, Google Assistant, Microsoft Cortana and Amazon Alexa on smartphones, dedicated Wi-Fi-enabled smart speakers like Amazon Echo and Google Home, and Microsoft’s Xbox One entertainment console. Facebook added similar capabilities to its Messenger platform in 2015, while Google embedded an intelligent agent in its new messaging app, Allo, this year.

Our usage of those digital assistants is steadily increasing. Google estimates that one-fifth of the searches on Android phones in the US are by voice. Microsoft said in May this year that 141 million users a month were using Cortana while Apple claims it has reached two billion Siri interactions a week, with an estimated 41.4 million monthly active users in January.

The vast majority of interactions with digital assistants involve smartphones, laptops, PCs or entertainment consoles. But data collected by independent research company Verto Analytics from over 20,000 US consumers between May 2016 and May 2017 highlighted the growing popularity of voice-activated digital assistants like Amazon Alexa and Google Home.

Strategy Analytics estimates that the total number of digital home assistant devices shipped by Google and Amazon will reach about three million in 2017, a figure very much in the ‘early adopter’ rather than mass take-up ballpark.

Whatever the host device, the expected expansion of the digital assistant application and service ecosystem – whereby third-party providers integrate their own apps – will do much to drive their broader adoption. Ocado recently became the first UK supermarket to launch an app for Amazon Alexa, allowing customers to order their shopping using voice commands.

Digital assistants use machine learning (ML) software to process natural language requests for information, goods and services, which offers considerable time savings for internet users who no longer need to type the same requests into browsers and search engines.

But that convenience may come at a price in the form of greater risks to data security and individual privacy, particularly when digital assistants are pre-configured to record all of our conversations and send transcripts of them to remote servers.

“The difference is that you are choosing what you want to put into that keyboard,” said Simon Edwards, cyber security architect at Trend Micro. “The biggest problem with digital assistants like Alexa and Siri is that if you choose to let them do it they are listening all the time – do you really want Apple or Google to listen to all of your conversations?”

Verto’s research also offers detailed insight into which apps US consumers access via their digital assistant, dominated by web browsers, maps, app stores and social media sites – all of which record personal information, preferences and location.

While most of the information shouted at the digital assistant might appear mundane – suggestions for restaurants, planned meetings, recommended temperature settings to optimise home energy consumption, for example – the data offers important clues as to the user’s whereabouts.

Access to messages, contacts and photos makes it possible to snoop on other people’s phones and gather information that can be used to orchestrate other cyber attacks, giving criminals locational information (so they know your house might be empty) and identity details which can be used (or sold) to mount phishing attacks or commit fraud. Elsewhere, IP addresses and unique device identifiers can be harnessed to launch distributed denial of service (DDoS) attacks that bring down networks and websites by flooding them with spurious data traffic.

The unwary could inadvertently disclose usernames, passwords, national insurance or social security numbers and bank account details. There is a small chance that information could be accessed by hackers, either during activation or transmission, but those odds increase significantly if it is then stored for long periods on cloud-hosted servers.

Another concern is the integration of digital assistants with IoT-connected home appliances – primarily lighting and heating controls, but also door locks and thermostats, which could be used to gain access or start a fire. Research company Gartner predicts that digital assistants will serve as the primary interface to IoT-enabled connected home services within 25 per cent of households in developed economies by 2019 – food for thought given their remote control capabilities.

Apple, Google, Microsoft et al point out that nothing is recorded until the digital assistant is activated using the ‘hotword’ (Hey Siri, Hey Cortana etc), and that if no vocal match is recognised, the feature will not activate. In some cases (Apple Siri, Microsoft Cortana) you can simply choose not to enable it in the first place.
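The gating behaviour the vendors describe can be pictured with a minimal sketch. It is a toy model rather than any vendor's implementation: text chunks stand in for audio frames, and the class name `HotwordGate` and its `window` parameter are hypothetical, chosen purely for illustration.

```python
class HotwordGate:
    """Toy hotword gate: nothing is kept until a hotword is heard."""

    def __init__(self, hotwords=("hey siri", "hey cortana"), window=2):
        # Hotwords are the examples quoted in the article.
        self.hotwords = tuple(h.lower() for h in hotwords)
        # Hypothetical setting: how many chunks to keep after activation.
        self.window = window
        self.remaining = 0
        self.recorded = []

    def feed(self, chunk):
        """Process one chunk; return True if it was recorded."""
        if self.remaining > 0:
            # The gate is open, so this chunk is kept (and would be sent on).
            self.recorded.append(chunk)
            self.remaining -= 1
            return True
        if any(h in chunk.lower() for h in self.hotwords):
            # Hotword heard: open the gate for the next `window` chunks.
            self.remaining = self.window
        # Anything heard while the gate is closed is discarded.
        return False


gate = HotwordGate(window=1)
for chunk in ["my pin is 1234", "Hey Siri", "what's the weather", "private chat"]:
    gate.feed(chunk)
print(gate.recorded)
```

In this model only the request that follows the hotword is retained. The protection depends entirely on the detector, which still has to listen to everything in order to spot the hotword at all.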

The situation is very different for smart speakers and digital home assistants, where anybody can request information with no authentication. That means that if you have given Google Home access to your calendars, messages or other personal information, anyone can call them up. Even where the owner’s voice is the authentication mechanism, anybody that can successfully mimic the voice gets full access.

If a stranger is in your home or office you may be the victim of crime already. But the ‘inside job’ is not uncommon in cyber-security incidents, and theft of electronic data can prove a more valuable enterprise for a criminal than making off with the TV.

On any mobile device vulnerable to theft or accidental loss the situation is more perilous – one reason why Google has different policies for Google Home than for Assistant or Allo. Digital assistants on mobile devices and PCs are usually afforded some degree of protection by standard authentication processes, which involve account numbers, PINs, passwords and even biometric validation in the form of fingerprint readers and facial recognition.

Evidence suggests that it is easier to gain access to somebody else’s data through their digital assistant than the internet companies would have us realise – just because the device is turned off, for instance, does not mean it is not recording. By default, Cortana listens in and provides access to calendar, email, messages and other content even when the device is locked, although the feature can be disabled.

Security company Trend Micro found a passcode override for Apple Siri in 2015 when it discovered that certain questions and commands would return personal information even when the mobile device Siri was running on was password protected. These included “what’s my name”, “text name/number message”, “call name/number”, “post Facebook status message”, “first name”, “what’s my email address” and “show me date/timeframe schedule”.

More recently, researchers at China’s Zhejiang University hacked digital assistants running on mobile devices and PCs using voice commands outside the range of human hearing. Dubbed DolphinAttack, the technique allowed them to translate normal voice commands into ultrasonic signals, which were then tested against over a dozen voice assistant systems including Siri, Alexa, Google Assistant and Cortana. The commands – which included the activation phrases and an instruction prompting a MacBook to open a malicious website – were universally obeyed, even though humans could not hear the communication happening.

We do know that other smart devices with recording capabilities have been hacked in the past. Samsung, for example, was famously the victim of the Weeping Angel CIA hack that allowed the US government to compromise its F8000 smart TV and listen in to what its owners were saying.

While no demonstrable exploits for digital assistants have so far been discovered outside of the labs, that is not surprising at this stage. Hackers tend to maximise their chances of success by targeting operating systems and applications with the largest number of end users. Digital assistants are not yet on the cyber criminal’s radar, but that is certain to change as their popularity spreads.

We are all well accustomed to personal data being collected – it is nothing that we do not already type into browsers and search engines. But the risk of the digital assistant could be more sinister given that we may not always be aware our conversations are being recorded, processed, analysed and stored.

In a recent report, ‘The New Privacy: It’s All About Context’, research firm Forrester notes that companies often collect far more data than they actually need or use, promoting a culture of ‘collect if you can’ among online and mobile applications irrespective of whether they think they can use the data to good effect.

Apple has been careful to take a comparatively tough line when it comes to how much data Siri records or sends back to its servers. The information Siri transmits is anonymised using a random identifier rather than an Apple ID, email address or other personal data and deleted when the user turns Siri off. The company says that voice clips are only retained for the purposes of training or improving the accuracy of Siri’s voice recognition engine. Nevertheless they are retained for a full two years, leading to concerns about what happens to them if Apple’s hosting infrastructure is hacked.
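The idea of filing requests under a random identifier, rather than an account name, can be sketched in a few lines. This is an illustrative model only, not Apple’s implementation: the `AssistantClient` class, its method names and the dictionary standing in for remote storage are all hypothetical, and the sketch does not model the separate retention of voice clips.

```python
import uuid


class AssistantClient:
    """Toy client: the server only ever sees a random identifier."""

    def __init__(self, account_email, server):
        self.account_email = account_email  # never leaves the device here
        self.server = server                # dict standing in for remote storage
        self.random_id = None

    def enable(self):
        # A fresh random identifier is generated when the assistant is turned on.
        self.random_id = uuid.uuid4().hex
        self.server[self.random_id] = []

    def ask(self, request):
        if self.random_id is None:
            raise RuntimeError("assistant is switched off")
        # Requests are filed under the random identifier, not the account.
        self.server[self.random_id].append(request)

    def disable(self):
        # Turning the assistant off removes the data held under the identifier.
        self.server.pop(self.random_id, None)
        self.random_id = None


server = {}
client = AssistantClient("user@example.com", server)
client.enable()
client.ask("set a timer for ten minutes")
print(list(server.values()))
client.disable()
print(server)
```

A scheme like this only hides who made a request, not what was said. If the request itself contains a name, an address or a recognisable voice, it may still identify the speaker.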

Google Assistant, available in Android, Home and Allo versions, provides access to contacts, storage and calendar, name, search history, voice and audio activity and other information on user accounts. Google is honest about using people’s browsing history to deliver targeted advertising (which helps keep its services free) and also admits to storing conversations on servers in its own data centres.

The company insists users can view and delete past interactions with the Google Assistant in My Activity, but the data is only permanently deleted from the user’s Google Account – certain ‘service-related’ information concerning the use of Google products is kept, ostensibly to prevent spam and abuse and improve services. There is no time limit on how long the data is kept: it stays until the user chooses to delete it.

While Google does not sell personal information to anyone, it admits to sharing information with its affiliates and business partners about which the user requested information (e.g. a restaurant or airline).

Amazon’s Internet Privacy Policy explains the type of information it collects through Alexa, its website and browser extension software, and what Amazon does with it. In some cases, that information may be personally identifiable, though Amazon insists it does not take active steps to determine the identity of any Alexa user.

That data can include a name, email address, country of origin, nickname, telephone number, website, company and title for example, as well as browsing history, information about your operating system, a unique identifier enabling Alexa to identify the user’s device, and the date and time the information was logged. A history of the online advertisements displayed on the websites Alexa users visit is also retained, including text, source and URL, alongside the terms entered into search engines and their results.

Not only that, but Amazon can gather other information from third-party websites, such as social media sites, used to interact with Alexa, including much of the personal content posted there.

Given the type and volume of personal data involved, it is no surprise that companies collecting it occasionally clash with national and regional privacy regulation.

Some experts have warned that recording the voices of children could contravene the Children’s Online Privacy Protection Act (COPPA) in the US, for example, which was originally devised to protect young people from pervasive data collection. Video advertising from Amazon and Google frequently features images of children communicating with digital assistants, seemingly unaware or unconcerned that these interactions may contravene data protection laws.

COPPA precludes the storage of a child’s personal information, including recordings of their voice, without the explicit consent of their parents. The forthcoming EU General Data Protection Regulation (GDPR), compliance with which becomes mandatory in May next year, provides strict guidelines on obtaining parental consent before storing and processing personal details (including audio recordings) of EU minors.

Any organisation offering voice-activated services to UK citizens needs to comply with the Data Protection Act 1998, says the UK Information Commissioner’s Office (ICO).

“This means that users need to be informed of how their voice recordings will be used, particularly where those uses might not be expected, such as disclosures to any third parties,” said an ICO spokesperson. “Organisations that store voice recordings abroad will also need to ensure they have a proper lawful basis for doing so.”

Consumers should also consider whether conversations recorded by digital assistants will ever be made available to police and other enforcement authorities on request to help with investigations or be used as evidence in criminal cases.

Google makes it clear that it shares information for legal reasons – where disclosure is necessary to meet applicable laws, regulations, legal processes or government requests, for example, but also to help detect, prevent or otherwise address fraud, security or technical issues, or to protect against harm to the rights, property, safety of Google, other Google users or the public as required or permitted by law.

Earlier this year, Amazon was forced to hand over data recorded by its Echo smart speaker to the Arkansas police for use as evidence in a murder trial. The company initially refused, arguing that such a move would violate consumer rights and that investigators had not shown sufficient cause. The dispute ended prematurely when the defendant agreed that the audio files could be accessed.

Amazon still maintains that protection of free speech is enshrined in the First Amendment, and as such the police need to follow very specific legal procedures to gain access to any recording made by Alexa and prove that they have a compelling need for the information to be disclosed. Legislation will inevitably take time to catch up with new digital assistant technology, but in the meantime it is highly likely that similar cases will advance towards a definitive conclusion and that the feds will eventually get their way.

In 2010, Facebook chief executive Mark Zuckerberg argued that consumers’ social norms regarding privacy have “evolved over time”, which justified ongoing changes in his company’s approach to privacy. While privacy concerns are far from dead, it is true that people who have grown up sharing many details of their lives with social media sites appear more comfortable with IT companies’ collection, storage and processing of their data than others.

Wading through the privacy policies for Siri, Cortana, Alexa etc is an onerous task, one which few people bother to undertake when setting up digital assistants, or devices with digital assistants enabled by default, for the first time.

But as the ICO advises, it is well worth the time to familiarise ourselves with the exact terms of those policies to set expectations and help us understand exactly what information is being collected and how it is being used.

“If people have concerns about the privacy issues of such services they should ensure they have thoroughly read and understood any privacy policy before purchasing any device that includes such capability, or before deciding to use any such service that may be available in a product they already have,” it says.

Those who don’t mind the idea of their home becoming the equivalent of a station waiting room or café, where they have no idea who is listening at the next table or what those eavesdroppers could or would do with any snippets of private information overheard, will no doubt carry on regardless.

Otherwise maybe it’s time to either switch off the default ‘always on’ listening mode in the digital assistant or start to consciously censor what is said within its earshot.

“At the end of the day it is another incursion into people’s privacy and the more we get used to having these things around, the more lax we will be,” says Trend Micro’s Edwards. 

Google Home Mini caught eavesdropping on early adopters

Google has been forced to permanently disable a feature of its new smart speaker, the Home Mini, following reports that the gadget had been secretly recording and storing private conversations without its users’ consent.

Artem Russakovskii, an early adopter of the Home Mini, discovered that the device had been constantly recording him for days after noticing its “weird” behaviour and checking its logs, where he found evidence of extensive eavesdropping. According to Russakovskii, the Home Mini had taken thousands of recordings a day and sent these files to Google.

Google confirmed that the recording had taken place, and said that the fault was due to an oversensitive touch control mechanism, which caused the device to repeatedly activate itself autonomously.

