Christine Evans-Pughe looks at a group of computer programmers and researchers who are trying to turn gigabytes into giggles.
"My dog has no nose." "How does it smell?" We all know the traditional punchline to this old groaner: "Terrible!" Computers, however, really struggle to get the joke. The punchline is illogical under one reading ("how does it detect odour?" "Terrible!") while retaining an echo of meaning under another ("how would you describe its odour?" "Terrible!").
If computers could understand such wordplay, some scientists think they might be used more widely as conversational agents, to aid language teaching, and even to be congenial companions to the elderly and isolated.
The terminally gloomy Schopenhauer and Aristotle (2,000 years earlier) thought that what makes us laugh is the contrast between expectation and outcome: the greater the unexpected incongruity, the funnier we find it.
Question: Why did the chicken cross the road? Answer (according to Schopenhauer): It was driven by the incessant striving of the will, only to become dissatisfied upon reaching the other side.
Despite such philosophical attention, a tightly defined formula for what constitutes humour is yet to emerge. However, scientists have been making progress in creating successful joke-generating programs, adding simple irony-recognition modules to avatars, and developing ways to automatically recognise when a text is intrinsically amusing.
LIBJOG was an early joke-generation program, created by American academics Victor Raskin and Salvatore Attardo in 1993. It used a template to link a specified target group with a stereotypical trait to generate a lightbulb joke: How many [lexical entry head] does it take to change a lightbulb? [number1]. [number1 – number2] to [activity1] and [number2] to [activity2].
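The template-filling idea behind LIBJOG can be illustrated in a few lines of code. Everything here — the lexicon entries and the activity phrases — is invented for illustration; the original system drew on a much richer lexicon of target groups and stereotyped traits.

```python
import random

# Invented mini-lexicon: each target group is mapped to two stereotyped
# "activities" (the real LIBJOG lexicon was far larger).
LEXICON = {
    "programmers": ("argue about whether it's a hardware problem",
                    "file a bug report"),
    "philosophers": ("debate what 'change' really means",
                     "hold the ladder"),
}

def lightbulb_joke(group):
    """Fill the fixed lightbulb-joke template for a given group."""
    activity1, activity2 = LEXICON[group]
    number1 = random.randint(3, 10)
    number2 = random.randint(1, 2)
    return (f"How many {group} does it take to change a lightbulb? "
            f"{number1}. {number1 - number2} to {activity1} "
            f"and {number2} to {activity2}.")

print(lightbulb_joke("philosophers"))
```

The template does all the work; the "humour" lives entirely in the hand-curated pairing of group and trait.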
Joke-generation has moved on since then, one notable example being the Joking Computer software running at Satrosphere, a science centre in Aberdeen. Graeme Ritchie and Kim Binsted of the Computing Science department of the University of Aberdeen developed this software as part of the EPSRC-funded STANDUP project (System To Augment Non-speakers' Dialogue Using Puns) in collaboration with the Universities of Dundee and Edinburgh.
The system creates simple puns and riddles from sources including a dictionary that contains information about synonyms, simple links between words, and information about phonetics. "Such riddles do not involve much world knowledge – no one needs to know about politics, religion or sex to find them funny – their effect relies on the meanings and sounds of words," explains Ritchie.
The jokes rely on fixed linguistic patterns, such as 'what is the difference between x and y' or 'what sort of x does y' or 'what do you get when you cross an x with a y'. Ritchie and his colleagues devised a number of rules to compute suitable 'x' and 'y' for these patterns, and to relate these words to phrases containing a play-on-words for their working system.
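A toy rendering of this pattern-based approach might look as follows. The compound words and synonym list are invented placeholders, not the STANDUP lexicon, but the mechanism — split a word, substitute near-synonyms, pour the result into a fixed riddle template — is the same kind of rule the project used.

```python
# Invented mini-dictionary; STANDUP used large lexical and phonetic
# resources instead.
SYNONYMS = {"hot": "spicy", "dog": "puppy", "shot": "attempt"}

def make_riddle(compound):
    """Split a compound noun where both halves have a near-synonym,
    then fill a fixed riddle template with the substitutes."""
    for i in range(1, len(compound)):
        left, right = compound[:i], compound[i:]
        if left in SYNONYMS and right in SYNONYMS:
            return (f"What do you call a {SYNONYMS[left]} "
                    f"{SYNONYMS[right]}? A {compound}.")
    return None  # no valid split found

print(make_riddle("hotdog"))  # -> What do you call a spicy puppy? A hotdog.
```

No world knowledge is involved, only word meanings and word forms — exactly the property Ritchie highlights.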
The aim of the STANDUP project (which finished in 2007) was to explore how humour might help non-speaking children use language more effectively. Children with cerebral palsy had a chance to use the software, and got into sharing jokes.
"We interviewed the parents, teachers and children who all felt it was successful but it wasn't a quantitative scientific result in terms of measuring effectiveness. Although one child confided to a research assistant that the jokes were not very good," said Ritchie. Judge for yourself on p87.
Breaking down humour
The inherent funniness of a joke is hard to measure, because amusement is not a readily measurable variable and, even if it were, what amuses each of us varies. Instead, Victor Raskin and colleagues Christian Hempelmann and Julia Taylor at Purdue University in Indiana have been pruning jokes to find out at what point they stop being funny. This is part of their work on the Ontological Semantic Theory of Humour (see 'Nailing down humour', opposite). "We know that jokes can vary because so many versions exist," explains Taylor. "So far we have done a pilot study and not reached any major conclusions but the narrative strategy in a joke certainly seems important to its funniness."
In the pilot, the researchers used 10 jokes with five variants of each, shortened by various amounts and with dialogue either left in or removed. Some 200 people recruited through Amazon's Mechanical Turk rated the jokes online. Results will be discussed at the Annual Meeting of the International Society of Humor Studies. Evidence so far suggests that condensing jokes to the absolute minimum (so there is no extra material before the punchline) can stop them working. Removing the dialogue also appears to reduce the funniness.
The influence of dialogue – a very human element – chimes with research Rada Mihalcea of the computer science department of the University of North Texas carried out with Stephen Pulman of Oxford University's computational linguistics group in 2006. They wanted to see if it was possible to separate funny from unfunny texts using machine-learning algorithms to look at content and stylistic features.
They compared a set of 16,000 humorous one-liners and 1,125 funny news stories (from the satirical American magazine The Onion) with non-humorous equivalent texts of a similar length.
One notable characteristic they discovered was that the humorous texts were human-centred with lots of personal pronouns. The researchers concluded that the results support suggestions made by Freud and Minsky that laughter is often provoked by feelings of frustration caused by our own, sometimes awkward, behaviour.
One perhaps surprising finding was that of a predominance of negative words such as doesn't, isn't, can't, don't, won't; for example, "Money can't buy you friends, but you do get a better class of enemy", or "If at first you don't succeed, skydiving is not for you."
"We generally think of humour as positive. But negatives always come with some kind of prior context. So there is an expectation and the negation seemed to be defusing that," says Pulman.
The element of surprise
Since then, Mihalcea and Pulman have gone on to analyse humour-processing using eye-tracking experiments, which have also shown the influence of negative words (see 'Negativity and surprise', below). And with Carlo Strapparava of the Fondazione Bruno Kessler (FBK) research institute in Italy, they have developed models to detect incongruity, suggesting that Aristotle and Schopenhauer might have been right.
In the latter work, they focused on linguistic measures of surprise to see if, in a dataset of one-liners each made up of a set-up and four possible punchlines (only one of which was funny), they could pick out the funny one. One simple measure was to look at membership of a particular domain (medicine, sport, architecture, and so on) to see if the funny punchline belonged to the domain least similar to the set-up's. Dictionaries such as WordNet can label words like this in a hierarchy of the most often used meaning. Another approach was to measure 'semantic distance' (how similar in meaning two concepts are) between the punchlines and the set-up, using vector space models.
Vector space models use vectors of numbers to represent texts. Each number is a measure of how many times a target word appears. If the proportions of target words are similar, the documents are likely to be about similar subjects. Set-ups and punchlines can be viewed in this way like lines on a graph: the larger the angle between the lines (in practice the cosine of the angle is computed), the less alike the texts are and the greater the incongruity.
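A bare-bones version of this cosine measure can be sketched in a few lines — raw word counts rather than the trained models the researchers used, and with an illustrative set-up and punchlines rather than items from their dataset:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between bag-of-words count vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

setup = "money can't buy you friends"
punchlines = [
    "but it can buy you a nice wallet",        # stays in the same domain
    "but you do get a better class of enemy",  # the incongruous (funny) one
]
# Lower similarity to the set-up = larger angle = greater incongruity.
scores = [cosine_similarity(setup, p) for p in punchlines]
```

On this toy pair, the funny punchline scores lower — it shares less vocabulary with the set-up, which is the incongruity the measure is designed to catch.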
The researchers also ran two simple experiments on the punchlines to test two theories: that humour exploits keeping multiple meanings alive at once (polysemy), and that alliteration can build expectation and so increase surprise.
It turns out that knowledge-based measures using WordNet hierarchies are not very successful alone in detecting incongruous contexts. Better results were found by looking for 'joke specific' features such as surprise word associations, which are detected by a vector space model trained on a corpus of jokes. Not surprisingly, the very best results were obtained by combining all of these measures, suggesting that each one captures an aspect of incongruity.
Multi-lingual analysis of jokes is next on Mihalcea's research list. "There are certainly cultural differences. For instance I'm from Romania and the most popular jokes all used to be about dictators," she says. "But regardless of language, I would speculate that the dominant features will still be about humans and will be negative."
And a hint of irony
While computers are getting better at detecting humour in text, making them respond in kind so they can interact with humans will need further work, as Stephen Pulman discovered in the EU-funded Companions project.
The aim was to create a sociable computer avatar that could chat about subjects such as a user's day at work by aligning with the user's emotional state to sympathise or cheer him or her up. Pulman's group worked on the analysis of the speech (transcribed to text), detecting linguistic clues associated with emotions like surprise or sadness using software called AffectR that Pulman developed with former doctoral student Dr Karo Moilanen.
AffectR assigns positive, negative or neutral emotional labels to words from a database. It then analyses the grammatical structure of the text in which they occur. In this way, the system can look at mentions of people, organisations and things in context to find out what the speaker/writer feels about them.
In an effort to add a sense of humour, Pulman and colleagues at the University of Teesside hand-coded some software rules that would trigger a humorous ironic response from the avatar if the speaker said something positive in a negative way, or vice versa. The ironic response in effect flipped the emotional polarity as follows.
User: At my performance review I got offered a promotion but it just means more hard work and responsibility.
Avatar: Who likes getting a promotion anyway? You will further your career, isn't that awful?
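A toy rendering of such a polarity-flipping rule might look like this. The word lists are invented placeholders, not the AffectR database, and a real system would analyse grammatical structure rather than bags of words:

```python
# Invented sentiment word lists (stand-ins for a real affect lexicon).
POSITIVE = {"promotion", "holiday", "won"}
NEGATIVE = {"hard", "responsibility", "awful", "dreading"}

def ironic_response(utterance):
    """If an utterance mixes positive and negative polarity,
    respond ironically by flipping the sentiment."""
    words = set(utterance.lower().replace(".", "").split())
    positives = words & POSITIVE
    negatives = words & NEGATIVE
    if positives and negatives:  # mixed polarity detected
        return f"Who likes getting a {sorted(positives)[0]} anyway?"
    return None  # no irony trigger

print(ironic_response("At my performance review I got offered a promotion "
                      "but it just means more hard work and responsibility."))
```

The brittleness of such hand-coded rules is part of why, as the next paragraph shows, combining them with speech-based emotion detection proved so hit-and-miss.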
Additionally, the researchers hoped the text-based clues could be used in conjunction with the phonetic properties of the speech (picked up by a speech recognition package called EMO-voice) to sense when speech and text had different emotions, indicating sarcasm. This was less successful. "The EMO-voice system, while impressive in many ways, was often inaccurate unless you behaved like a drama-queen," says Pulman. "Sometimes it was unintentionally hilarious. The user might say: 'I had a really terrible day, my mother died'. And the system would say 'Great'."
Humour is thought by some to be AI-complete, implying that it is equivalent to making computers as intelligent as people. Or as Graeme Ritchie says: "If you want to do the full scope of humour humans are capable of, you would first need to solve the problem of building a human, which means understanding intelligence, emotion, reasoning, and managing world knowledge." Now, that really is a serious undertaking.