The not-so-private party
Image credit: Philips Electronics
As the public web marks its 30th anniversary, has our data now become too public? Can lives be private in the face of a massively expanded web of devices?
Though we are meant to be celebrating 30 years since the birth of the World Wide Web, the kind of web we have today still has a couple of years to go before its thirtieth birthday. The version proposed by Tim Berners-Lee at the end of the 1980s focused mainly on text organised through hyperlinks. The web’s inventor came up with the idea of the far more general universal resource identifier (URI) in June 1994, just at the point that the public was gaining access to the internet.
At the time, the web was just one of many contenders for the system that people would use on the internet to find stuff out. Gopher was far better known in the early days but quickly lost ground as Mosaic spread, only for Mosaic itself to be steamrollered by Netscape’s Navigator and Microsoft’s Internet Explorer. Though most people would think of the web as being about pages at URLs, the URIs in RFC 1630 covered just about anything that might publish data and information to the internet.
The concept of the URI underpins an attempt by Berners-Lee to fix some of the problems that have undermined confidence in online activity. The web was meant to be a vehicle for bidirectional communications. In contrast to the broadcast-centric models of traditional print, radio and TV publication, users were free to contribute their own content and opinions. Social-media companies have encouraged them to publish more and more, letting those companies mine the data for insights into behaviour. Spawning a start-up company on the way, Berners-Lee and researchers at MIT proposed the Solid protocol as a way to return control over their data to internet users.
With Solid, companies that need to identify users, process applications and obtain data from users for other applications would no longer pull that information into their own databases. Instead, the user data goes into ‘pods’ that provide links, in the form of URIs, to the data these organisations want to use. Users will, in principle, be able to see when organisations and their contractors access the data. They can also turn off access when they want to, instead of having to guess from the licence agreement, at the point of signing up for a service, how their information will be used.
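The pod idea can be illustrated with a minimal sketch. It is not the Solid protocol or its API: the `Pod` class, its methods and the example URI are all hypothetical, and the only aim is to show data staying with the user, access being granted and revoked per URI, and every read attempt being logged where the owner can see it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Pod:
    """Toy stand-in for a user-controlled data store; not the Solid API."""
    owner: str
    resources: dict[str, str] = field(default_factory=dict)    # URI -> data
    grants: dict[str, set[str]] = field(default_factory=dict)  # URI -> allowed agents
    access_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def grant(self, uri: str, agent: str) -> None:
        self.grants.setdefault(uri, set()).add(agent)

    def revoke(self, uri: str, agent: str) -> None:
        self.grants.get(uri, set()).discard(agent)

    def read(self, uri: str, agent: str) -> Optional[str]:
        allowed = agent in self.grants.get(uri, set())
        # Every attempt is logged, so the owner can see who asked and when.
        stamp = datetime.now(timezone.utc).isoformat()
        self.access_log.append((stamp, agent, uri, allowed))
        return self.resources.get(uri) if allowed else None


pod = Pod(owner="alice")
uri = "https://alice.example/pod/address"  # hypothetical URI
pod.resources[uri] = "1 Example Street"
pod.grant(uri, "insurer.example")
first = pod.read(uri, "insurer.example")   # access granted
pod.revoke(uri, "insurer.example")
second = pod.read(uri, "insurer.example")  # access now refused
```

In this sketch the organisation never holds a copy by design, but nothing stops it keeping what it read before access was revoked.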
Even if it takes off, there will inevitably be gaps in the Solid wall. Though users can retrospectively disable live access to personal data, there appears to be little to prevent content they upload being mined for insights as soon as it is made public in some way. Deep-learning AI models routinely mine pictures taken in public places that very often contain complete strangers, who may be tagged based on social-media records they have uploaded themselves. The problem for those strangers is they have no practical way to object to having their location at these times stored in a database for later analysis.
This inability to control third-party devices that intrude on their privacy concerns people such as Stephen Rainey, research fellow at the Oxford Uehiro Centre for Practical Ethics. He believes there should be limits on, if not a complete prohibition of, data scraping of this kind: fencing off public areas digitally. “Devices could take pictures of course: it’s up to people what they want to do,” he says.
There should, in Rainey’s view, be strict controls over what happens next. “The automatic broadcasting of images and data, and the ability to scrape data from, say, live-streamed videos of Leicester Square would be hampered,” Rainey argues. “More generally, if the idea were mooted it could prompt a discussion. Prompted by a proposed regulation, things could at least be articulated and either accepted or rejected by the public in an informed way.”
The issue of unwitting leakage of personal information goes way beyond just appearing in the background of a selfie at an amusement park. Local government agencies around the world want to use forests of sensors and cameras to create what they see as smart cities in an extension of a concept that started as a way of selling consumer electronics.
A decade after Berners-Lee published his ideas for the web, Philips and Palo Alto Ventures formulated the idea of ambient intelligence based on sensors around the home, a forerunner of the modern internet of things (IoT). In this type of system, each device publishes one or possibly a bunch of URIs that let remote computers extract the information from each one and apply a meaning to it.
In principle, you had control over these devices because you or other members of your family would own them. How the devices would take account of the conflicting opinions of the other occupiers or even visitors was left as an open exercise. Philips set up its HomeLab to demonstrate what was possible. The projects of the early 2000s predicted services similar to Amazon’s Alexa, though with less emphasis on the remote cloud computing that today powers these services.
Through smart-city and other infrastructure projects, ambient intelligence has moved way beyond the home. At the 6G World Symposium in May, Ian Oppermann of the University of Technology Sydney pointed to the growing use of roving information robots in airports and similar locations that are meant to guide visitors around. In this scenario, it would make sense for the robot to be able to find out where the visitor is meant to be flying. “It’s probably an inefficient process to try to understand the language I’m speaking,” he noted: it would be easier for the robot to access an online calendar or flight schedule.
The question is whether the robot gives you information you want or need or whether its behaviour is based on what the operator wants or needs. “What if the services offered to me are based on what’s learnt about me and delivered back to me? If my data forms part of a robot’s future knowledge base, can it be removed? What’s happening to my data? And how informed is my consent?”
This issue may seem to be simply one of individual privacy. We could, as with apps today that ask every once in a while if we want them to track us, tell the robot that it can only learn so much from us. But that does not cover the range of possibilities that a combination of AI and practically ubiquitous communication delivers.
“Privacy is important but it’s only part of the story. In the 5G world we are moving into, the way we are observed and surveilled by devices will be by devices we don’t control,” says Peter Leonard, principal of legal and business consultancy Data Synergies and a professor at the University of New South Wales’ business school. You can turn off tracking for apps on phones and other personal devices for the most part. “Things go out of the window when the device ceases to be a device that we control.”
Oppermann points to the increasing use of drones for surveillance. “What if the drone is above my personal space? And what if I don’t know that it’s there?” he asks.
The drone itself probably does not know or care that you are below it. What it needs to detect is a clear space in a garden for a delivery, the pattern of traffic movements or possible parking violations that alert a nearby warden to take a closer look. However, the drone’s operator may want to maximise revenues by collecting more data than is needed for the primary application. The drone may pass over the same location on a regular basis and so could build up a picture of how people are moving around. To help with that, it might harvest Bluetooth, Wi-Fi and other IDs to put into a database in the cloud that makes it possible to see whether the same people keep turning up. To gain some more revenue, the drone operator might then provide access to the database to other organisations that want to mine it for insights or even try to build profiles of the people it finds.
“There is a range of different business opportunities that start to become possible but we must sort out these issues of who owns, who controls, who has access to and who guarantees access to data,” Oppermann says.
Part of the issue is that people can easily feel the adverse effects of the data exchanges without even being fully identified. Operators that manage individual sensors may not implement privacy checks because they see the data as being inherently anonymous. It is only when data streams from different networks are aggregated that it becomes possible to identify groups or even individuals. On top of that, existing legislation such as the European Union’s General Data Protection Regulation (GDPR) does not readily fit the environment that is likely to evolve, an issue that was recognised more than a decade ago. In a 2011 paper, two US academics specialising in privacy law, Paul Schwartz and Daniel Solove, argued there is a need for a category of information that lies between completely anonymised and personally identifiable, as trying to handle it through privacy legislation would most likely be doomed to failure.
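A minimal sketch of why aggregation matters, using entirely invented data: each feed on its own contains no names against movements, yet joining them on place and time attaches a name to one device's whole trail. The field names and the `link` helper are assumptions made for illustration.

```python
# Two feeds that each look anonymous on their own (all values invented).
footfall = [  # street sensor: hashed device ID, place, hour of day
    {"device": "a91f", "place": "station", "hour": 8},
    {"device": "a91f", "place": "clinic", "hour": 12},
    {"device": "7c02", "place": "station", "hour": 8},
    {"device": "7c02", "place": "office", "hour": 9},
]
bookings = [  # separate system: named appointments
    {"name": "J. Smith", "place": "clinic", "hour": 12},
]


def link(footfall, bookings):
    """Return {device: name} where a sighting matches a booking uniquely."""
    linked = {}
    for b in bookings:
        matches = {f["device"] for f in footfall
                   if (f["place"], f["hour"]) == (b["place"], b["hour"])}
        if len(matches) == 1:  # only one device fits, so it is identified
            linked[matches.pop()] = b["name"]
    return linked


identified = link(footfall, bookings)
# The whole movement trail of a linked device now carries a name.
trail = [(f["place"], f["hour"]) for f in footfall if f["device"] in identified]
```

Neither operator needed to store anything it regarded as personal for the combined result to be personal.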
“The scope is broadening quickly,” said Kelsey Finch, senior counsel at the Future of Privacy Forum, in a seminar looking at the privacy issues facing smart cities. Many of the concerns are around discrimination at group level rather than the individual. “It is not just about a name being attached to an individual. We need to pay attention to profiling and tracking. If I can present a different pricing model, change access to services or provide a different vision of the world based on the data, that has an impact.”
One possible response to the issue is to call a halt to the trading of data. “Collection should be on a closed-circuit model, essentially: not one of broadcast, or networking. Data collected from the public shouldn’t be sold,” says Rainey. “I’m dead against a data economy. If you want to know about people, you should ask them, not monitor, digitise and try to predict their next move.”
Another response is to allow some level of data trading but to restrict the types of content that can be shared or how those data sources can be processed. In late June, the European Data Protection Board, set up as part of the GDPR, called for the European Commission to ban the use of facial recognition in public spaces as well as other uses of machine learning, such as emotion detection or social scoring. Individual devices, particularly those installed by businesses and local government, may be prevented from uploading exactly what they detect.
Finch points to the example of cameras on streetlights that are installed mainly for congestion tracking and management. “You can limit the amount of data that can be transferred off the device, which can alleviate a lot of concerns,” she says.
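One way to picture this kind of limit, as a sketch under assumed data formats rather than a description of any deployed system: the device reduces what it sees to counts per time bin and uploads only those. The `summarise_on_device` function and the detection tuple layout are hypothetical.

```python
from collections import Counter


def summarise_on_device(detections, bin_minutes=15):
    """Reduce raw detections to vehicle counts per time bin.

    `detections` is a list of (minute_of_day, plate_or_image_ref) tuples,
    a hypothetical format. Only the counts leave the device.
    """
    counts = Counter(minute // bin_minutes for minute, _ in detections)
    return [
        {"bin_start_minute": b * bin_minutes, "vehicles": n}
        for b, n in sorted(counts.items())
    ]


raw = [(481, "AB12 CDE"), (485, "XY34 ZZZ"), (497, "AB12 CDE")]
upload = summarise_on_device(raw)
```

The upload is enough for congestion management but carries no plate or image reference that could be used to follow a vehicle.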
Another route is one of greater transparency in the analysis of aggregated data that might be used to sort individuals into groups. Leonard says organisations would have a responsibility to handle the information in a demonstrably fair and responsible way. An important underpinning is the ability to bring cases independently of the individuals they affect, which is currently not possible with privacy laws. In the US, for example, lawyers have to demonstrate actual harms to individuals caused by privacy breaches even though the breaches were caused by negligence or bad practices.
“It must be sufficiently clear and transparent, that advocates of AI and privacy rights can hold them to account even if, as a consumer, I don’t understand or care enough to hold them to account,” Leonard says.
Backlashes as smart-city and other projects develop may force the hand of governments to act more strongly to define the boundaries between private and public data. But, 30 years on, the web remains a work in progress that could easily spring more surprises on populations trying to navigate the increasing digitisation of society as billions more URIs join the network.
Privacy on the road
Few expect autonomous vehicles to have to work in an environment where all they have to rely on are their built-in sensors and cameras. Vehicle-to-X (V2X) communication provides a way to give each automobile a much more rounded picture of what is happening around it. The X covers everything from pedestrians to lampposts, with other vehicles just being part of the overall data traffic. In principle, moving vehicles could recognise pedestrians crossing the road not just visually but also from beacon signals sent by their smartphones. Cameras on lampposts can update traffic signals on the flow of traffic, including all those vehicles that, in the early days, will not transmit any V2X data at all.
The most frequent signals, though, will most likely come from the cars that have onboard guidance systems and the V2X transceivers that go with them. Using them, the guidance computers will tell other nearby vehicles which lanes they will use and which turns they intend to take. This is why V2X could prove to be a major obstacle to staying private outdoors.
Perhaps luckily for the public, V2X sits in the intersection between two highly regulated industries: automotive safety and cellular communications. The work of the 5G Automotive Association (5GAA) has attempted to take privacy into account while trying to give vehicles a sense of each other’s direction.
The issue today is there is not yet a solution that guarantees privacy. The trade-off is not directly between privacy and accuracy, but boosting one can easily compromise the other. In the V2X system proposed by the 5GAA, vehicles do not have to confirm the identity of each sender to trust it: the messages just have to come from what the system considers a valid participant. To maintain some degree of anonymity, each vehicle or user can pick from a selection of pseudonym certificates and switch between them over time.
An eavesdropper can potentially link these pseudonyms together using other sources, such as street cameras or watching fixed addresses that users visit repeatedly. Countermeasures revolve around techniques to try to make the data fuzzier. One possibility is to try to avoid situations where there is just one vehicle in an area with a particular combination of attributes. This makes it harder to detect which vehicle has switched pseudonyms when a changeover occurs. Some researchers have gone to the extent of proposing that fake data be added to the streams to throw off eavesdroppers, though this has only been simulated in relatively simple two-car scenarios. The problem is that any approach that tries to confuse vehicle IDs, even for a short time, can just as easily confuse the navigation systems.
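The idea of changing pseudonym only when a vehicle does not stand out can be sketched as follows. This is an illustration under simplifying assumptions, not the 5GAA scheme: the `Vehicle` class, the attribute tuple, the threshold `k` and the pseudonym strings are all hypothetical, and real systems use signed certificates rather than plain labels.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Vehicle:
    pseudonyms: list[str]          # pre-issued pool (hypothetical IDs)
    attributes: tuple[str, str]    # e.g. coarse (type, colour)
    current: int = 0

    @property
    def pseudonym(self) -> str:
        return self.pseudonyms[self.current]

    def maybe_switch(self, neighbours, k=2, rng=random) -> bool:
        """Switch only if at least k-1 nearby vehicles share our attributes."""
        lookalikes = sum(1 for n in neighbours if n.attributes == self.attributes)
        if lookalikes + 1 < k or len(self.pseudonyms) < 2:
            return False  # we would stand out, so keep the current pseudonym
        others = [i for i in range(len(self.pseudonyms)) if i != self.current]
        self.current = rng.choice(others)
        return True


car = Vehicle(["p1", "p2", "p3"], ("hatchback", "red"))
alone = car.maybe_switch([Vehicle(["q1"], ("van", "white"))])
crowd = car.maybe_switch([Vehicle(["r1"], ("hatchback", "red"))])
```

With only a white van nearby the red hatchback keeps its pseudonym, because a change would be trivially attributable; once a lookalike is in range, the switch goes ahead.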
As the V2X messages contain references to vehicle size and weight, the Car-2-Car Communications Consortium proposed making the dimensions fairly coarse: to a precision of 10cm or more. Another technique involves changing transmission behaviour when an ID changes, picking different time slots to make the vehicle seem like a different one if the eavesdropper is unable to see what is happening on the road itself.
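Coarsening can be shown in a few lines. The sketch below assumes dimensions are reported in centimetres and rounded to the nearest 10cm; the function name and the sample lengths are invented for illustration.

```python
def coarsen_cm(value_cm: float, step_cm: int = 10) -> int:
    """Round a dimension to the nearest multiple of step_cm."""
    return int(round(value_cm / step_cm)) * step_cm


# Two different cars become indistinguishable by length once coarsened.
length_a = coarsen_cm(447)
length_b = coarsen_cm(452)
width_a = coarsen_cm(183)
```

The coarser the step, the more vehicles share the same reported dimensions, at the cost of less precise information for nearby guidance systems.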
As it stands, one potential weakness of the anonymisation approach used in the current protocols is the public-key infrastructure (PKI) that is used to check credentials. In an environment where millions of devices need to interact quickly, it is unwieldy as it involves frequent downloads of certificates from the cloud and places too strong a reliance on identification. This is where novel forms of cryptography may play a role.
A team based at the Surrey Centre for Cybersecurity proposed the use of a scheme called direct anonymous attestation (DAA) that uses mathematical zero-knowledge proofs to determine whether a sender is valid or not, avoiding the need for either side to learn the identity of the other. Though the Trusted Computing Group (TCG) adopted DAA a decade and a half ago for the trusted module that goes into servers and other computers, work is continuing on the protocols for real-world use. A particular problem is how to reliably exclude compromised nodes from participating. The question of how private automated driving will be remains open and may call for further work on the underlying encryption technologies.