How online facial-recognition systems could endanger our privacy
Image credit: Dreamstime for background; Cloaked image of Ben Heubl
E&T reverse-engineered an online facial-recognition system and revealed how the trend towards facial-recognition-supported online search could expose users to unexpected privacy risks.
When I first used a tool called FindClone, I found myself face-to-face with a fake social media profile that had stolen my headshot as its profile image. Someone had decided my face was worth stealing for a fake profile on VK, a social media platform dubbed the Russian answer to Facebook.
FindClone relies on facial-recognition software and compares your input image against millions of VK profile images. It’s free, but you must submit your phone number. I wasn’t shocked that someone stole my image – impersonation using another person’s headshot is scarily common; hackers, criminals and fake online daters (‘catfish’) do it all the time. What really concerned me was how easy it was to connect my image with copies of it swirling around in the depths of the internet.
A concerning example is Clearview. According to news reports, the New York-based tech start-up scraped three billion headshots and other facial images from social media pages. Scraping is the term developers use for automatically harvesting data and images from websites and saving them on local servers.
With this controversy, the firm hit the headlines around the world. Its CEO justifies the collection spree by claiming it is in accordance with applicable laws, arguing: “Individuals in these countries can opt-out.” So why is the company so hungry for our images?
The firm sells access to its database to law-enforcement agencies. The system simplifies their search, enabling them to compare a suspect’s image across Clearview’s database in a matter of moments. If they find a match, they can gather enough intelligence to make an arrest. The company’s marketing material celebrates helping to catch terrorists.
Critics say the company has changed little since its big coming-out in 2019, and it has continued to operate through the pandemic. In March, Josephine Wolff wrote in a New York Times opinion piece that the company’s product remains every bit as dangerous, invasive and unnecessary as it was before the spread of coronavirus.
What are we giving up if companies like Clearview can scrape billions of our personal social images? First, let’s look at the business. Clearview is not alone in the space. A recent investigation by Netzpolitik found that Polish company PimEyes also scrapes images from the web and sells services to law-enforcement agencies. PimEyes’ founders, Łukasz Kowalczyk and Denis Tatina, are reported to have amassed a database of 900 million faces.
How profitable these businesses are, we don’t know – start-ups like these don’t have to disclose their accounts publicly. But it does raise the question of whether new demand could encourage new supply. In other words, if serving law-enforcement agencies is lucrative, could we soon expect more firms to follow the examples of PimEyes and Clearview and broker deals around our faces?
To answer that, we must ask what Clearview and PimEyes actually do. How does their offering work? It is best explained by a simplified example. We found that almost anyone can replicate the basic concept of building a facial-recognition system.
E&T emulated their operation by building a small facial-recognition database and testing it with images of my face. It was straightforward – an endeavour you can easily replicate at home if you are a little tech savvy. It involved no more than a few lines of Python code (a popular programming language) to match images from a database against ones we wanted to test. The process is described step-by-step below.
How to build your own online facial-recognition system
To build such a simplified system, we must create a database of images to reflect the billions of headshots that Clearview collected. Then we run the algorithm to match those with our input images.
We will eventually explain what it takes to scale it up to dangerous levels – we don’t advocate this step, but it will help clarify the risks it poses to our online privacy.
Step 1: Installing the software
We installed a popular open-source Python library called FaceRecognition. There are others, but we used this one as it’s a popular choice praised for its simplicity and alleged accuracy. Its producers say it’s “the world’s simplest facial-recognition API for Python and the command line”.
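Assuming the package published on PyPI under the name `face_recognition` (the library behind the experiment described here), installation and a first comparison take a couple of shell commands; the folder names are illustrative:

```shell
# Install the library (it depends on dlib, which needs cmake and a C++ compiler)
pip install face_recognition

# The bundled command-line tool compares every image in ./unknown/
# against the reference images in ./known/ and prints any matches
face_recognition ./known/ ./unknown/
```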
E&T’s experiment compared faces of the same person and tested whether the facial-recognition algorithm can identify the person from other images.
According to its maintainers, FaceRecognition was built with dlib’s face-recognition model using deep learning – AI systems that rely on neural networks with multiple layers. The model’s accuracy is 99.38 per cent on Labelled Faces in the Wild, a public benchmark for face verification, making it a popular option and perfect for an experiment like ours.
E&T tested it on a few headshots of myself that have made it onto the internet over the course of several years. The results show the algorithm can make solid distinctions between various facial image types. It recognised me from a self-portrait, as well as from an image that hides my mouth (see graphic).
To get a more reliable sample and see what works and what doesn’t, we added more images. Image-heavy social media networks are a good place to start. Sources like Twitter or LinkedIn, both of which I have used for years, as well as online magazines and newspapers that display my image, offer enough pictures for experimentation.
Visual data such as video content may also serve as an input source. Despite the lower resolution, facial-recognition searches can work with videos if they are broken down into individual frames. When it comes to video, PimEyes’ partnership with Paliscope is noteworthy. Law-enforcement services use Paliscope’s facial-recognition capabilities to identify people in videos as well as documents.
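Breaking a clip into individual frames, the step described above, is commonly done with ffmpeg; the filename and sampling rate below are illustrative:

```shell
# Extract one frame per second from a clip into numbered PNG files,
# ready to be fed into a facial-recognition search
mkdir -p frames
ffmpeg -i clip.mp4 -vf fps=1 frames/frame_%04d.png
```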
So, why is this a concern for privacy? Let’s assume you went to a location that reveals personal information: a nightclub, a drug rehabilitation centre, an STI-testing clinic, or a regime-critical protest, for example. Now assume a stranger at the same location recorded a video or took photos and uploaded them to the internet. Theoretically, if this material reveals your face, it could expose you to anyone who has access to such software and wishes to investigate you. The risk of a user happening to see and recognise you online remains relatively low, particularly given how big the internet is. An automated computer process that looks for your face is far more effective.
However, if Clearview has a client searching for you, an algorithm that scrapes the relevant images could help them spot you in no time. Needless to say, any third party – including the government – with the power to link your identity to the location or people you are seen with, could reveal personal information that you might prefer to keep private.
Step 2: Collecting a database
Online, I copy-paste every image I find of my face and save it in a dedicated folder. Arguably, the process is more advanced for companies like Clearview: instead of copy-pasting each image one by one, professionals run automated scraper programs that accelerate the data-gathering process.
To establish a connection between identity and photos, the online images must have a reference. By this, we mean they need to be tagged with a name or information that links them to your identity. In order to be useful – or harmful, depending on your view – to anyone, including Clearview’s clients, the reference images must be indexed. We do this by giving the files meaningful names, such as ‘Ben_glasses.png’, ‘Ben_winterhat.png’ or ‘Ben_cap.png’ (see above).
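That naming convention can double as a crude index. A few lines of Python (a sketch, using the filenames from this article) recover the identity from each file:

```python
from pathlib import Path

def identity_from_filename(path: str) -> str:
    """Recover the person's name from an indexed filename,
    e.g. 'Ben_glasses.png' -> 'Ben'."""
    return Path(path).stem.split("_")[0]

for f in ["Ben_glasses.png", "Ben_winterhat.png", "Ben_cap.png"]:
    print(f, "->", identity_from_filename(f))  # each prints '... -> Ben'
```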
Governments have an edge when it comes to indexed personal images as they may have your image already on file. They know who you are. For instance, I have a passport and a driver’s licence with an image of myself, of which authorities have a copy. If a malicious government wanted to check whether you went to those aforementioned locations, it could use your indexed passport photos on an image database to compare online videos and images. To what extent European GDPR rules can prohibit companies like PimEyes remains largely unclear.
Note that we keep two folders for our system: one with indexed images – the ‘known’ folder – and another with images that aren’t known – the ‘unknown’ folder. We tell the open-source Python library to compare the two folders. When the system finds matches between the indexed and unindexed folders, it reports them via a note in the macOS Terminal window. You may also request a matching score. If the algorithm is too lenient – i.e., it returns too many matches – we can adjust a dial for how strictly it compares images.
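Under the hood, the library represents each face as a 128-number encoding and declares a match when the Euclidean distance between two encodings falls below a tolerance (0.6 by default) – that tolerance is the ‘dial’ described above. A self-contained sketch of just the matching step, with short made-up vectors standing in for real encodings:

```python
import math

TOLERANCE = 0.6  # the library's default; lower = stricter matching

def euclidean_distance(a, b):
    """Distance between two face encodings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_match(known_encoding, unknown_encoding, tolerance=TOLERANCE):
    """Declare a match when the encodings are close enough."""
    return euclidean_distance(known_encoding, unknown_encoding) <= tolerance

# Made-up four-number 'encodings' (real ones have 128 numbers)
ben_known = [0.1, 0.4, 0.2, 0.9]
ben_unknown = [0.12, 0.38, 0.21, 0.88]   # same face, slightly different photo
stranger = [0.9, 0.1, 0.7, 0.2]          # a different face

print(is_match(ben_known, ben_unknown))  # True
print(is_match(ben_known, stranger))     # False
```

Tightening the tolerance (say, 0.5) reduces false matches at the cost of missing some genuine ones.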
Step 3: Scaling it up
The final step involves scaling it all up. This means instead of me using only a handful of images, we collect and compare billions of online images – one reason Clearview now faces an international probe.
To see a scalable version working, you can try both FindClone and PimEyes. Both are freely accessible, which makes them more liable to abuse. PimEyes, however, only lets you upload an image shot live from your laptop’s camera, a restriction its operators hope will deter those who want to track down other people.
If you are based in a Western democracy, including the UK or the USA, PimEyes may give you better results as FindClone only operates on (mainly Russian) VK profiles.
We tested PimEyes and found the results to be astoundingly accurate. By uploading pictures of your face, results reveal where your visage appears on various platforms and which account was responsible for posting it. Out of five results for my lockdown look, which included new glasses, three were accurate. Two of the results surprised me because I completely forgot where and why I had taken the photos.
Are all image searches bad? Some point to tech giant Google, which still offers a reverse image search. You can upload a picture and Google’s search results may include images with similar colours, patterns or backgrounds, and sometimes the uploaded image itself. At the time of writing, though, it avoids running facial-recognition software on your search. How long until this changes? A powerhouse like Google may find it a trivial challenge to make facial images searchable, and the implications are far-reaching. Any image taken of a person in the street could suddenly become subject to reverse facial-image search. Results from social media profiles or documents could immediately reveal a person’s identity.
Google is constantly trying to improve searches, including those via images. You can already improve your odds of finding faces by appending “&imgtype=face” to the image-search results URL. But results remain mediocre at best and don’t rely on facial recognition.
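Assuming the URL parameter behaves as described above (Google can change it at any time), constructing such a face-filtered search URL is a one-liner; the query string here is illustrative:

```python
from urllib.parse import urlencode

# Build a Google Images search URL, then append the face filter
base = "https://www.google.com/search?" + urlencode({"q": "Ben Heubl", "tbm": "isch"})
face_url = base + "&imgtype=face"
print(face_url)
```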
Competition may also motivate Google to add more intrusive facial-recognition search features. Naturally, search engine operators aim to provide the best service as they try to avoid losing users. It is only logical to ship new features to produce better results. What does the competition do? Russian rival Yandex has already switched on facial-recognition features for its image search. That’s why its results are, some say, often superior to Google’s. Yandex also allows for searching images and text together.
If enough people switch away from Google to other search engines, could that push the company to make potentially unethical choices? Ethics is at the heart of the discussion. NtechLab, a facial-recognition company, was reported to have supplied the Russian government with mass-surveillance technology; today it serves the Russian state, supporting mass surveillance in Moscow.
In 2016 NtechLab launched FindFace, which offered something similar to what PimEyes and FindClone now offer for free. It was subsequently shut down for public use and survives only as a paid-for version, which E&T did not test.
Perhaps more controversially, since the pandemic began it has offered clients the ability to identify people who break Covid-19 lockdown rules. On its website, NtechLab claims: “we at NtechLab are hard at work on adjusting and implementing our outbreak and quarantine control system to fight the pandemic”. It promises the system can “recognise home-quarantined people and sends immediate notifications upon their appearance in the camera view even if the face is covered by a medical mask”. Privacy-rights activists may find the idea of using facial recognition to hunt down lockdown breakers irksome.
It’s not all bad, however, and online facial recognition can have some advantages. With improved search features, finding intelligence on other netizens can come in handy for the police, investigators and users. Let’s assume you are ‘blind dating’ a person from the online dating application Tinder. Checking whether the person is real and matches their description before you meet could help avoid nasty surprises and improve users’ safety.
There is a case for technical investigative journalists employing facial recognition for open-source intelligence work, and there have been occasions where FindClone or PimEyes proved useful for checking disinformation and verifying individual sources. In both cases, the question is whether the threat to privacy outweighs the benefit.
Facial recognition remains a controversial topic, and various governments have decided it is safer to outlaw it. Some concerns stem from the technology’s inaccuracy and from racial bias. Recent Black Lives Matter protests highlighting the lack of racial equality will only add pressure.
In January, the European Commission said it would consider banning facial recognition for up to five years until it finds acceptable ways to prevent abuse of the system.
Social media companies don’t like it either: PimEyes’ use of Instagram and YouTube content prompted those platforms to take legal action against the search engine, and PimEyes risks heavy fines for potentially breaching GDPR rules. Details remain unclear as to how high those fines could be, but similar breaches suggest they could be considerable: last year, a fine of €200,000 was imposed on a company for using personal data from public sources.
So, what’s the solution to the privacy conundrum? More extreme policy intervention might work. There are other, more technical, solutions. One is image cloaking, a technique that makes it harder for facial-recognition systems to identify people from images. By making tiny, pixel-level alterations invisible to the human eye, a personal image is rendered unrecognisable to a facial-recognition system, provided the model was trained on the altered image. Results from others, including tests run by the New York Times, confirm that it works on new algorithms trained on ‘cloaked’ images.
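Fawkes computes carefully targeted perturbations designed to mislead recognition models; as a purely illustrative stand-in, the sketch below only demonstrates the magnitude idea – pixel values nudged by far too little for the eye to notice, while staying within the valid 0–255 range:

```python
import random

def cloak(pixels, strength=2, seed=42):
    """Illustrative only: add a tiny random perturbation (at most
    `strength` out of 255) to each pixel value. Fawkes instead computes
    targeted changes optimised to confuse face-recognition models."""
    rng = random.Random(seed)
    return [max(0, min(255, p + rng.randint(-strength, strength)))
            for p in pixels]

original = [120, 121, 119, 200, 201, 199]  # a made-up strip of grey values
cloaked = cloak(original)

# Every pixel moves by at most 2/255 -- invisible to a human viewer
print(max(abs(a - b) for a, b in zip(original, cloaked)))
```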
One major drawback remains: “Cloaking photos with Fawkes does not make your photos difficult to recognise,” explains Ben Y Zhao, professor of computer science at the University of Chicago. E&T tested image cloaking first-hand by re-running the previous experiment on my face, this time using the Fawkes cloaking system on the input images. An open-source macOS software package makes it easy to run the tool on images locally. Zhao’s point explains why the cloaked images could still be matched by our DIY facial-recognition system. In short, cloaking doesn’t work immediately; it pays off only over time, as algorithms retrain on the cloaked images that I first have to make available on the web.
Systems like Fawkes still offer some hope in the fight for online privacy (all images in this article received a cloaking treatment). Perhaps one day we can go back to being anonymous netizens, something that made the internet a hit in the first place.