Is it time to make touchscreens contactless?
With health concerns driving the uptake of new ways of paying in shops and restaurants, simple but ingenious audio technology could reduce the risks involved in placing an order as well.
Contactless technology is proving invaluable during the current coronavirus pandemic by eliminating the need to exchange cash or press buttons on a chip-and-pin machine.
However, for many businesses, such as fast-food restaurants, the only way to take a customer's initial order without face-to-face contact is a touchscreen – and a screen touched by one customer after another risks spreading infection more widely.
Antibacterial touchscreens exist, but they miss an important factor. As well as being genuinely hygienic, kiosks have to be perceived as safe by customers. An anti-Covid coating isn't likely to cut it for the general public: many people won't trust it, no matter how good it is.
Even before the current health crisis, a study by London Metropolitan University reported finding gut and faecal bacteria on every fast-food touchscreen it swabbed.
Could the latest audio technology provide a contactless solution?
If we could simply ask for a burger and chips rather than navigate a complicated on-screen menu, we could keep our distance, talk to the kiosk, pay contactlessly and pick up our order from a service counter. Physical contact would be reduced to a single one-way hand-off, from the short-order chef to the customer.
It's also much quicker to say "Two burgers with chips, one with cola" than go through the laborious process of finding the right buttons to press on a touchscreen.
There are two problems with this audio scenario: the first is general background noise; the second is other customers at neighbouring kiosks, who will be making additional noise as they place their own orders.
Automatic speech recognition (ASR) has been around for decades and its reliability keeps improving, although it is most reliable on systems that need only a limited vocabulary. That's fine, as most kiosks only need a limited vocabulary anyway. However, add a noisy environment with lots of other people around and ASR accuracy suffers badly. And, of course, touchscreen kiosks are often in high-street shops with high customer footfall, so they are far from quiet.
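To make the limited-vocabulary point concrete, here is a minimal Python sketch of one simple way to exploit it, assuming the ASR engine has already produced a text transcript: each recognised word is snapped to the closest entry in a small, fixed kiosk vocabulary using the standard library's `difflib.get_close_matches`, and anything not close to a menu word is dropped. The word list, the `snap_to_vocabulary` helper and the 0.6 cutoff are hypothetical illustrations; real recognisers can also constrain the vocabulary inside the decoder itself (for example with a grammar) rather than after the fact.

```python
import difflib

# Hypothetical kiosk vocabulary: quantities plus menu items (illustrative only).
VOCABULARY = ["one", "two", "three", "burger", "chips", "cola", "nuggets", "milkshake"]


def snap_to_vocabulary(transcript, vocabulary=VOCABULARY, cutoff=0.6):
    """Map each recognised word to its closest in-vocabulary word,
    dropping words that are not close to anything the kiosk sells."""
    snapped = []
    for word in transcript.lower().split():
        # n=1: keep only the single best match scoring at least `cutoff`.
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if match:
            snapped.append(match[0])
    return snapped


print(snap_to_vocabulary("Two burgle with chip one with kola"))
# ['two', 'burger', 'chips', 'one', 'cola']
```

Because the list of possible words is so short, a misheard "burgle" or "kola" still lands on the intended item; the same trick becomes far less reliable as the vocabulary grows.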
There are also likely to be several kiosks installed. How could you ensure that your order for burger and chips doesn’t get mixed up with the person at the neighbouring kiosk ordering chicken nuggets? What if your daughter tries to tack on a sneaky cola or ice cream?
It's tempting to think that noise suppression is the answer; after all, there have been some amazing advances in the last few years, fuelled by the artificial intelligence revolution. The results can be great if the speech is already clearly intelligible, but if you listen to what a microphone picks up in a noisy shopping mall, it is hard for a human to understand the raw speech, let alone an ASR system. That's exactly where noise suppression falls down – it can't cleanly extract speech buried in such high levels of noise.
You might think beamforming holds the key – using an array of microphones to focus listening on the direction the customer is speaking from – but this is surprisingly difficult to do well unless you are willing to invest in a large number of expensive calibrated microphones. Unlike noise suppression, beamforming can improve intelligibility somewhat, but still nowhere near enough for a busy, high-volume kiosk.
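For comparison, the simplest beamformer is delay-and-sum: delay each microphone's signal so that sound arriving from the customer's direction lines up across the array, then average. The Python sketch below assumes a uniform linear array, a distant (plane-wave) talker and whole-sample delays; the function names and parameters are illustrative, not taken from any product. Rounding to whole samples is a crude approximation; practical beamformers use fractional delays or frequency-domain weights, which is where well-matched, calibrated microphones start to matter.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Whole-sample delays that line up a far-field plane wave arriving at
    angle_deg (0 = straight ahead) on a uniform linear array."""
    step_s = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    delays = [round(m * step_s * sample_rate) for m in range(num_mics)]
    base = min(delays)
    return [d - base for d in delays]  # shift so every delay is >= 0


def delay_and_sum(channels, delays):
    """Advance each microphone channel by its delay, then average.

    Sound from the steered direction adds coherently; sound from other
    directions is misaligned and partly cancels."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
            for n in range(length)]
```

With four microphones 5cm apart sampled at 16kHz, `steering_delays(4, 0.05, 90.0, 16000)` returns `[0, 2, 5, 7]`; feeding those delays to `delay_and_sum` reconstructs a signal arriving end-on, while sound from other directions is only partly cancelled.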
This is where blind source separation (BSS) comes into its own. It needs only four to eight off-the-shelf microphones, of the sort already found in mobile phones, with no calibration required. The array geometry is flexible: anything from 5cm to 30cm across. Ideally, there is a clear line of sight to the customer and, if space allows, a two-dimensional array is preferable, although a linear array also works.
BSS separates the incoming audio back into its constituent sources automatically, so not only is the customer's voice brought out of the background noise, it’s also clearly distinguished from the voice of the person at the neighbouring kiosk. It can even separate you from your daughter trying to add that sneaky ice cream.
This is all done with data-driven machine learning. The system is continuously analysing the sound field and can pick out the speech of the person in front of the kiosk, adapting automatically to the lunchtime rush or the quiet of a 2am motorway pitstop. Just like a human, but with no social distancing required.
Perhaps it’s time to put a 'Do not touch' sign on our touchscreens.
Dave Betts is chief science officer at blind audio signal separation specialists AudioTelligence.