vol 7, issue 10

Advances in eye tracking and speech synthesis

22 October 2012
By Christine Evans-Pughe
[Image: a close-up of a human eye]

E&T looks at developments in eye tracking and speech synthesis

[Image: Tobii IS-2 chip, with a pen for size reference] Eye-tracking technology like this Tobii IS-2 has come down in price

[Image: Quovadis's wearable brain interface] For those who cannot interact through gaze, Quovadis is developing a wearable brain interface

[Image: a young child using Eyegaze Edge] Children as young as 13 months have used Eyegaze Edge with some success

Eye tracking and speech synthesis are now able to give voice to those with even severely limited movement.

In 'Diamonds Are Forever' (1971) James Bond's nemesis Blofeld uses an electronic gadget to synthesise casino owner Willard Whyte's voice to fool our favourite spy. Fast-forward to today and we have technologies that can not only synthesise any voice or accent but that can generate speech from someone glancing at text on a screen. For Tony Nicklinson, who suffered from locked-in syndrome after a brain-stem stroke in 2005, the communication software The Grid 2 linked to an eye-tracker enabled him to argue eloquently for doctors to end his life, until his natural death in August 2012, shortly after he lost his High Court appeal.

Nicklinson used a southern English voice by Acapela called 'Graham' that comes with The Grid 2. 'Graham' is a modern text-to-speech engine built using strung-together snippets of real recorded speech that capture changes in intonation and frequency spectrum that make each human voice unique and expressive.

Edinburgh-based CereProc is another company whose high-quality British and Irish regional voices (including Scottish, Irish and Black Country) show how far speech synthesis has progressed since we first heard the flat robotic tones of physicist Professor Stephen Hawking, whose voice harks back to an earlier technology based on mathematical models of the human vocal tract.

Eye'm in control

However, authentic synthetic speech is merely the headline technology in breaking down the human isolation of severe disability. The decreasing cost of eye-tracking interfaces over recent years has arguably been more important. Stephen Murray, a professional BMX rider before he was paralysed in 2007, still has his voice but describes his eye-control system from Swedish firm Tobii Technology as like an antidepressant that has put him back in control of his life.

Eye-tracking systems use tiny infrared-sensitive video cameras positioned below a screen to enable users of assistive communication software to generate speech, control their environment (lights, call bells, television, etc), use email, the Web and social media, and even work full-time using only eye movements.

A technique called Pupil Centre Corneal Reflection (PCCR) captures the instances when the eye pauses on a specific area of the screen, and tracks the rapid movements of the eyes between pauses. PCCR works by filming (at 30 to 60 frames per second) the reflections on the cornea (the transparent front of the eye) and in the pupil from an infrared LED light source. Image processing algorithms estimate the position of the eye and the point of gaze by analysing the vectors between the pupil centre and corneal reflections. Bright pupil eye-tracking, where the infrared LED is placed close to the optical axis of the camera (causing the pupil to appear lit up) is the most widely used form of lighting.
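The geometry described above can be sketched in a few lines of code. This is a minimal illustration, not any vendor's actual algorithm: the affine calibration model, function names and numbers are all assumptions made for the example. Real systems fit the calibration by having the user look at known on-screen targets.

```python
# Sketch of the PCCR idea: the vector from the corneal reflection
# (glint) to the pupil centre changes as the eye rotates, and a
# per-user calibration maps that vector to a screen coordinate.

def gaze_vector(pupil_centre, glint):
    """Vector from the corneal reflection to the pupil centre (pixels)."""
    return (pupil_centre[0] - glint[0], pupil_centre[1] - glint[1])

def to_screen(vec, calibration):
    """Map a pupil-glint vector to a screen x-y point with an affine
    model whose six coefficients were fitted during calibration."""
    ax, bx, cx, ay, by, cy = calibration
    x = ax * vec[0] + bx * vec[1] + cx
    y = ay * vec[0] + by * vec[1] + cy
    return (x, y)

# Invented calibration coefficients for a 1920 x 1080 screen
calib = (100.0, 0.0, 960.0, 0.0, 100.0, 540.0)
v = gaze_vector(pupil_centre=(412.0, 303.0), glint=(410.0, 300.0))
print(to_screen(v, calib))  # (1160.0, 840.0)
```

In a real tracker this runs once per video frame (30 to 60 times a second), with the pupil centre and glint positions coming from the image-processing stage.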

Some systems are so accurate that babies can use them. Among the youngest users of the Eyegaze Edge made by LC Technologies (an American company that built its first systems in 1986) is a 13-month-old baby girl with spinal muscular atrophy. "She is smart, understands cause and effect, and is able to run picture-based programs in The Grid 2," says Nancy Cleveland, the company's medical director.

A tiny red dot serves as a screen cursor (in effect the x-y coordinate of the gaze-point) to show where the eye is pointing among the onscreen pictures of keys and symbols. These graphics make an audible click and flash when they activate, based on a gaze-time that can be set for each user. The shortest activation gaze-time is around one-fifth of a second.
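The dwell-activation mechanism can be sketched as a small state machine: a key fires once the gaze has rested on it for the configured gaze-time. The class, its name and the sample rate are illustrative assumptions, not the workings of any particular product.

```python
# Illustrative dwell-click detector: a key activates once the gaze
# point has stayed inside its rectangle for the configured dwell
# time (the article's "gaze-time", settable down to ~0.2 s).

class DwellDetector:
    def __init__(self, dwell_seconds=0.2):
        self.dwell = dwell_seconds
        self.enter_time = None   # when the gaze entered the key

    def update(self, inside, t):
        """Feed one gaze sample: inside = gaze is on the key,
        t = timestamp in seconds. Returns True on the sample that
        completes the dwell."""
        if not inside:
            self.enter_time = None   # gaze left: reset the timer
            return False
        if self.enter_time is None:
            self.enter_time = t
        if t - self.enter_time >= self.dwell:
            self.enter_time = None   # fire once, then re-arm
            return True
        return False

# 60 Hz samples: gaze lands on the key at t=0 and stays there
d = DwellDetector(dwell_seconds=0.2)
fired = [t / 60 for t in range(15) if d.update(True, t / 60)]
print(fired)  # [0.2] - one activation, 0.2 s after the gaze arrives
```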

Eyegaze Edge systems can work with only one eye, giving freedom of head position for the user, which means babies and adults who have to lie on one side with their head at an angle can use them. LC Technologies' previous youngest user was an 18-month-old who couldn't move or speak and was on a ventilator. "He figured out the system in no time at all and is now three years old and uses the system every day," says Cleveland. "A lot of what he understood at first was cause and effect. For example, looking at a cell with a picture of a lion would play a video of a lion. Now there is serious effort being made to teach him to use symbols to communicate."

Algorithms that interpret eye movements are the main patented IP behind these systems, most of which run on PC hardware with relatively low-cost software programs like The Grid 2. But eye-tracking systems still cost several thousand pounds because of the high cost of the camera hardware.

Mass market eye-gaze interaction

Tobii Technology started life in 2001 with eye-tracking systems for studying human behaviour and human-computer interaction. Tobii is now a leading seller of eye-controlled all-in-one computers for people with disabilities. Its recent projects include a concept eye-controlled laptop built by Lenovo; field tests of an eye-tracking system for driver drowsiness detection in cars; an Asteroids arcade game that works with both eye and head movements; a prototype eye-controlled television made by Haier; and, most recently, a concept tablet with embedded eye tracking for NTT Docomo.

In March 2012 the company received $21m from Intel Capital towards bringing its technology to the mass market. "Computer peripheral eye-trackers used in assistive communication cost around €4,000. To bring that price further down you need consumer volumes," says Sara Hylén, Tobii's marketing director.

As part of its strategy to bring gaze interaction into the mainstream Tobii now has a 3W single-board eye-tracking camera component that can be integrated into any product. It includes system-independent processing and measures 200 x 25 x 15mm.

Back to the voice of the future

Off-the-peg text-to-speech engines are generally bundled into the communications software and so are not costly. But the future takes us back to Willard Whyte. Today's version of 'voice transformation' means capturing a small speech sample to quickly produce a custom voice. "The goal over the next three years is to be able to produce any voice in this way," says Acapela's chief technology officer Fabrice Malfrère.

Voice transformation uses Hidden Markov Models that 'learn' from a small database of information relating to linguistics and prosody (the music of speech) rather like the databases in today's unit selection-based voices. From this material it generates parameters to create speech from a mathematical speech model (vocoder).

Eventually anyone may be able to connect to a website, record 100 sentences or so and automatically get a synthetic version of their voice. R&D systems already exist, but for the moment they require more recordings to produce commercially usable speech synthesis. Malfrère sees this technique as a way to quickly and cheaply add unique voices to all kinds of products as part of a brand identity, whether it is car GPS systems or voices that read the newspaper on your smartphone.

"Improving the quality of long pieces of text is the next challenge," says Malfrère. Building a text-to-speech synthesiser that could read a book (or this article) in a natural way is a task tied to computer understanding of meaning, which means using elements of language-context analysis, text pattern recognition, and sentiment and humour analysis.

Cloud computing would be one way to handle the complex processing, says Malfrère, allowing owners of smartphones, tablets and e-books to access reading services on demand from a smart server.

Perhaps the population of ageing and increasingly infirm baby boomers who enjoyed James Bond gadgetry first time round will be equally appreciative of the modern successors.


Speech synthesis: Cut and paste

The smallest components of computer-synthesised speech are phonemes, which are in effect vowels and consonants. So 'Hello' can be split into four phonemes: /h/, /eh/, /l/ and /ow/. More of a voice's character can be captured using transitions between phonemes, known as diphones (there are over 1,400 diphones for English). 'Hello' is made up of five diphones: /silence:H/, /H:EH/, /EH:L/, /L:OW/ and /OW:silence/.
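The diphone decomposition above can be generated mechanically: pad the phoneme sequence with silence at both ends and take adjacent pairs. A minimal sketch (the function name and notation are just for illustration):

```python
# Turn a phoneme sequence into its diphone sequence by pairing
# each phoneme with its neighbour, padded with silence.

def diphones(phonemes):
    padded = ["silence"] + phonemes + ["silence"]
    return [f"/{a}:{b}/" for a, b in zip(padded, padded[1:])]

print(diphones(["H", "EH", "L", "OW"]))
# ['/silence:H/', '/H:EH/', '/EH:L/', '/L:OW/', '/OW:silence/']
```

Note that n phonemes always yield n + 1 diphones, which is why 'Hello' has four phonemes but five diphones.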

Most modern text-to-speech synthesis programs use 'unit selection': a program called a linguistic module first converts text into phoneme sequences (making use of a lexicon or letter-to-sound rules). It then selects and strings together appropriate diphones that have previously been snipped out of real sentences recorded in a studio by a specific speaker.

These multiple examples of possible diphones have linguistic context tags that allow the unit-selection algorithm to choose those that best match the context of the words in the text. For instance, tags will indicate the part of speech (noun, verb, etc) and also mark the diphone's original position (beginning or end) in both the syllable and the sentence it was snipped from. "If you have the same context but one diphone originates from a noun and the other from a verb, the algorithm will prefer the diphone that comes from a noun," says Fabrice Malfrère, CTO of Acapela, a text-to-speech company formed as a spin-off from Mons Polytechnic in Belgium.

A good unit-selection algorithm also takes account of 'acoustic cost': it tries to match adjacent diphones by length, pitch and frequency spectrum. "For example, it will avoid putting a diphone with a rising pitch next to one with a falling pitch, because that produces an impossible pitch pattern with no continuity," says Malfrère.
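A toy version of this two-part scoring can make the idea concrete: each diphone slot has several recorded candidates, and the algorithm trades off context mismatch (target cost) against pitch discontinuity with the previous choice (join cost). Real systems search all combinations (e.g. with a Viterbi search) and use many more features; the greedy pass, weights and candidate data below are invented for illustration.

```python
# Toy unit selection: pick one candidate per diphone slot by
# combining a context-mismatch cost with a pitch-continuity cost.

def target_cost(candidate, wanted_pos):
    # Penalty when the candidate's original position tag doesn't match
    return 0.0 if candidate["pos"] == wanted_pos else 1.0

def join_cost(prev, candidate):
    # Penalty for a pitch jump between adjacent diphones (Hz, scaled)
    return abs(prev["end_pitch"] - candidate["start_pitch"]) / 50.0

def select(slots, wanted_pos):
    # Greedy left-to-right choice of the cheapest candidate per slot
    chain = []
    for candidates in slots:
        best = min(candidates, key=lambda c:
                   target_cost(c, wanted_pos)
                   + (join_cost(chain[-1], c) if chain else 0.0))
        chain.append(best)
    return [c["id"] for c in chain]

slots = [
    [{"id": "hEH_a", "pos": "initial", "start_pitch": 120, "end_pitch": 130},
     {"id": "hEH_b", "pos": "final",   "start_pitch": 120, "end_pitch": 180}],
    [{"id": "EHl_a", "pos": "initial", "start_pitch": 180, "end_pitch": 170},
     {"id": "EHl_b", "pos": "initial", "start_pitch": 132, "end_pitch": 170}],
]
print(select(slots, "initial"))  # ['hEH_a', 'EHl_b'] - the pitch-continuous chain
```

Note how EHl_b wins despite EHl_a having the same context tags: its starting pitch sits close to the previous unit's ending pitch, exactly the continuity Malfrère describes.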

Prosody - the music of speech - is also important. Humans would say 'Charlie went to the shop to buy some coffee' with a slight break after 'shop' because the sentence consists of two smaller units. And we put pitch accents on words that carry the most information such as 'Charlie', 'shop', and 'coffee'. Unit-selection systems use techniques such as grammar tags and statistical rules to label each syllable with high or low pitch or a transition between high and low to achieve a similar effect.
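A crude labeller in the spirit of that description would mark content words for a pitch accent and leave function words flat. Real systems use grammar tags and statistical models; the word list and H/L labels below are purely illustrative assumptions.

```python
# Toy pitch-accent labeller: content words (which carry the most
# information) get a high accent "H", function words get "L".

FUNCTION_WORDS = {"the", "a", "an", "to", "some", "went", "and", "of"}

def pitch_accents(sentence):
    return [(w, "L" if w.lower() in FUNCTION_WORDS else "H")
            for w in sentence.split()]

print(pitch_accents("Charlie went to the shop to buy some coffee"))
# 'Charlie', 'shop', 'buy' and 'coffee' come out accented "H"
```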

Software: Grid 2 & JayBee

Sensory Software's The Grid 2 program and Time Is Ltd's JayBee software were both developed in the UK by British engineers. The Grid 2 provides text or speech output from libraries of symbols, pictures and words. "A child born with cerebral palsy, for instance, could start to talk and use the eye-gaze system to make requests and build sentences from early on using symbol libraries," explains Dougal Hawes, business development manager for Sensory Software and Smartbox AT.

For literate users, there are grids that include a full keyboard with word prediction, phrase prediction, instant message cells and also ready-made grids (some of which can be downloaded as apps) for common computer tasks including Internet browsing, Facebook, SMS text messaging, Twitter and so on.

Ian Schofield developed JayBee with the text-to-speech company CereProc after two friends succumbed to motor neurone disease (MND). JayBee uses a pattern-matching algorithm developed for the satellite industry to learn the words and sentences a user commonly employs, so it can flag them up as the user starts typing and let them speak in almost real time. "We call our approach Predetermined Text as it adapts to the user's patterns of communication as it goes along," explains Schofield. "It means that the user almost never has to finish typing a word."
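The adaptive prediction idea can be sketched very simply: count the phrases the user actually produces, then offer the most frequent completions for whatever prefix has been typed so far. JayBee's real pattern-matching algorithm is proprietary and far more sophisticated; this naive frequency-based stand-in (with invented example phrases) just illustrates the principle.

```python
# Naive adaptive phrase predictor: learns phrase frequencies and
# completes a typed prefix with the user's most common phrases.

from collections import Counter

class PhrasePredictor:
    def __init__(self):
        self.counts = Counter()

    def learn(self, phrase):
        self.counts[phrase] += 1

    def predict(self, prefix, n=3):
        # Rank matching phrases by frequency, then alphabetically
        matches = [(c, p) for p, c in self.counts.items()
                   if p.startswith(prefix)]
        return [p for c, p in
                sorted(matches, key=lambda m: (-m[0], m[1]))][:n]

p = PhrasePredictor()
for phrase in ["could you move my pillow", "could you move my pillow",
               "could you turn the light on", "call the nurse please"]:
    p.learn(phrase)
print(p.predict("could"))
# ['could you move my pillow', 'could you turn the light on']
```

Because the counts update with every utterance, frequently used phrases surface after fewer and fewer keystrokes, which is how the user "almost never has to finish typing a word".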

JayBee has been successfully used with Alea Technologies' IntelliGaze eye-tracking system, and Schofield is working with an American company, Grinbath LLC, which has developed a low-cost eye-tracking system for around $500.
