Don’t feed the trolls: computer model predicts toxicity in online debates
US researchers have developed a computer model that can predict when an online discussion is likely to descend into personal attacks and division.
Godwin’s law states: “As an online discussion grows longer, the probability of a comparison involving Hitler approaches 1 [certainty]”. The proposal may have been tongue-in-cheek, but it describes with surprising accuracy the rapid descent into rude, abusive and unconstructive discussion online, particularly in anonymous spaces.
According to the UN, the majority of women suffer from online abuse, and UK politicians have blamed online abuse for discouraging people – particularly women – from engaging with politics. In December 2017, a parliamentary committee published a report recommending the introduction of laws to make internet companies criminally responsible for abusive content hosted on their platforms, while Mayor of London Sadiq Khan has established a police unit dedicated to tackling online hate crime.
Now, researchers based at Cornell University have proposed a model to predict when online discussions may descend into vitriol.
“There are millions of such discussions taking place every day, and you can’t possibly monitor all of them live. A system based on this finding might help human moderators better direct their attention,” said Professor Cristian Danescu-Niculescu-Mizil, co-author of the study.
“We, as humans, have an intuition of whether a conversation is about to go awry, but it’s often just a suspicion. We can’t do it 100 per cent of the time. We wonder if we can build systems to replicate or even go beyond this intuition.”
The researchers based their work on 1,270 conversations between Wikipedia editors, and also used Google’s Perspective, a machine-learning tool that evaluates the ‘toxicity’ of statements. They examined the exchanges in matched polite–toxic pairs, so that the results were not skewed by inherently divisive subjects.
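Perspective is offered as a web API: a client submits a comment and requests a toxicity score between 0 and 1. As a rough sketch of how such a request is assembled (the endpoint and field names follow Perspective’s public documentation; the API key is a placeholder, and no network call is made here):

```python
import json

# Perspective's analyze endpoint; "YOUR_API_KEY" is a placeholder.
PERSPECTIVE_URL = (
    "https://commentanalyzer.googleapis.com/v1alpha1/"
    "comments:analyze?key=YOUR_API_KEY"
)

def build_toxicity_request(text: str) -> str:
    """Build the JSON body Perspective expects: the comment text plus
    a request for the TOXICITY attribute (scored between 0 and 1)."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    return json.dumps(payload)

body = build_toxicity_request("You clearly have no idea what you're talking about.")
```

The JSON response places the score under `attributeScores.TOXICITY.summaryScore.value`, which a moderation pipeline can then compare against a chosen threshold.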
Their findings led them to create a program that scans the opening exchanges of a conversation for linguistic warning signs that it may descend into personal attacks. Conversations whose early exchanges contained the words ‘I’ and ‘we’, as well as greetings, expressions of gratitude and hedging phrases such as ‘it seems’, were likely to remain polite.
However, conversations that began with many uses of ‘you’ and repeated, direct questioning were least likely to end pleasantly.
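The study trained a classifier over features like these; purely as an illustration (not the researchers’ actual model, and with marker lists invented for the example), a crude heuristic could tally risk markers against politeness markers in a conversation’s opening turn:

```python
import re

# Illustrative marker sets only; the study used a richer, learned feature set.
RISK_WORDS = {"you", "you're", "your"}          # second-person focus
CALM_WORDS = {"i", "we", "hello", "hi", "thanks"}  # first person, greetings, gratitude

def opening_risk_score(first_turn: str) -> int:
    """Naive risk score for a conversation's opening turn:
    +1 per second-person word or question mark (direct questioning),
    -1 per first-person word, greeting, gratitude word or 'it seems' hedge.
    Positive scores suggest the exchange is more likely to turn hostile."""
    text = first_turn.lower()
    words = re.findall(r"[a-z']+", text)
    risk = sum(w in RISK_WORDS for w in words) + text.count("?")
    calm = sum(w in CALM_WORDS for w in words) + text.count("it seems")
    return risk - calm

opening_risk_score("Hi, thanks for the note. It seems we agree.")          # negative: polite
opening_risk_score("Why did you revert my edit? Do you even read sources?")  # positive: risky
```

A real system would feed such signals into a trained model rather than a hand-set threshold, but the sketch shows why opening-turn language alone carries predictive information.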
The researchers hope that the model could be used to detect and potentially rescue conversations with risky beginnings, allowing online discussions to continue without the need for banning certain users or subjects. Such interventions could include sending automatic notes to users suggesting that their comments could be perceived as aggressive.
“If I have tools that find personal attacks, it’s already too late because the attack has already happened and people have already seen it,” said Jonathan Chang, a PhD candidate. “But if you understand this conversation is going in a bad direction and take action then, that might make the place a little more welcoming.”