Automated cyberbullying detection remains tough
Image credit: Dreamstime, E&T
E&T tested whether detection of toxic content on social media can be automated. The results suggest the technology still has a long way to go.
The need to spot toxic content online is growing. Experts say that cyberbullying can have a long-term impact on health, which makes it especially problematic for young people, who are at greater risk. In recent months, school closures due to the global pandemic have moved more schooling online. As a result, some experts fear an increased risk of youngsters being exposed to toxic content.
Billions of messages are uploaded to online social networks every day. For content curators with limited staff and resources, it remains a challenge to stop such content at its inception.
E&T wanted to know to what extent a machine can tell whether a Tweet reflects a threat and where systems fall short.
Take the following example of a fictional tweet: “Dear cyberbullies, do the world a favour and kill yourselves”.
E&T collaborated with researchers to put an automated system to the test. Using state-of-the-art technology, an algorithm was consulted on whether texts like the sentence above constitute toxic content.
The experiment used a new semantic text system currently developed by researchers at the University of Exeter Business School. E&T collected and then submitted more than 13,000 tweets to the team of academics.
All of these tweets carried the hashtag #cyberbullying and were posted over the past months. Since January, E&T found a slight upward trend in their volume, which could be a sign of growing awareness of the problem.
What does 'toxic content' mean? David Lopez, lecturer in digital economy at the Exeter business school, created a tool called LOLA to detect misinformation, cyberbullying and other harmful online content.
Together with his team, he categorised each of the tweet texts into various categories of emotion. Words imply meaning, and each tweet is given a score for how much anger, joy, sadness, love, trust or fear it expresses.
The final scores show to what extent a text meets the algorithm's criteria for being 'toxic', 'severely toxic', 'obscene', 'an insult', 'a threat' or 'identity hate'.
Each received a score ranging from 0 to 1, with zero meaning no association with the measure at all and 1 the strongest association.
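LOLA's model is not public, so the sketch below only mimics the shape of its output: one 0-to-1 score per category. The keyword lexicon and the weighting are hypothetical stand-ins for a trained classifier, purely to illustrate how such per-category scoring might look.

```python
# Toy illustration (NOT LOLA): assign each tweet a 0-to-1 score per
# toxicity category, using an assumed keyword lexicon instead of a model.

TOXIC_LEXICON = {"suck": 0.6, "kill": 0.7, "hate": 0.5, "stupid": 0.4}

CATEGORIES = ["toxic", "severely_toxic", "obscene",
              "insult", "threat", "identity_hate"]

def score_tweet(text: str) -> dict:
    """Return a score between 0 and 1 for each category (toy heuristic)."""
    words = [w.strip("!.,?") for w in text.lower().split()]
    # Base score: the strongest flagged word found in the tweet.
    base = max((TOXIC_LEXICON.get(w, 0.0) for w in words), default=0.0)
    # Repeating a flagged word nudges the score upward, capped at 1.0.
    hits = sum(1 for w in words if w in TOXIC_LEXICON)
    toxic = min(1.0, base + 0.1 * max(0, hits - 1))
    # The other categories are modelled here as a fraction of 'toxic'.
    return {cat: round(toxic if cat == "toxic" else toxic * 0.5, 2)
            for cat in CATEGORIES}

print(score_tweet("you suck, you suck, you SUCK!!!!!"))
print(score_tweet("have a nice day"))
```

A real system such as LOLA would derive these scores from a trained language model rather than a word list, but the output format, a score per category on the 0-to-1 scale, is the same idea.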
To make the experiment a bit more interesting, E&T then inserted 17 fake tweets into the mix of thousands of real messages. All of them were toxic or insulting to a varying degree: some were openly hostile, while others were framed more subtly. These were all composed by an E&T journalist and are entirely fictional (see results).
The findings revealed that Lopez's system correctly categorised nearly half of the fake toxic tweets: those landed among the top 5 per cent of the most toxic messages measured.
The algorithm had next to no problems classifying tweets like: "RT It's the last time, [Name] I will tell you, you suck, you suck, you SUCK!!!!!".
The text received a 100 per cent 'toxic' score and scored 98 per cent for both 'insult' and 'obscene'. It scored 40 per cent for 'severely toxic', yet only 5 per cent on the anger emotion measure, which raises some questions about how the algorithm arrived at the final result.
Where did the system fall short of E&T's expectations? Especially with tweets that did not use inflammatory words. For one example, “[Name] is too shy to attempt suicide”, the system measured only 0.03 (3 per cent) toxicity. The text did receive a higher anger score of 0.43, but the word 'suicide' did not alarm the system enough to push up the toxic-content score.
Lopez admits that spotting irony and cynicism is a hard nut to crack for an algorithm. He says it requires a mixed approach, one element of which is not leaving the decision-making entirely to the machine.
Similarly, E&T found that the algorithm struggled to correctly classify insinuations, such as this nudge towards suicide: “Maybe you, [Name], should just give up and end it”. The words 'give up and end it' were not a strong enough signal for the system to ring the alarm bell.
The algorithm also struggled with mocking texts and those expressing sarcasm. The text “[Name] Nice perfume. How long did you marinate in it?” was measured to express as much 'fear' as 'joy', according to Lopez's system.
Fully error-free semantic classification remains difficult. Lopez says his system, which is currently being tested in collaboration with Spanish police agencies such as the Spanish Agency for cybersecurity, is not supposed to run on its own: "We always leave the decision-making to a human. In the end, what we do is we filter, we narrow down the search space from millions to hundreds to thousands of conversations. But in the end, it's a human who decides whether this is something abusive or not".
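The triage step Lopez describes, where the machine only narrows the search space and a human reviews the shortlist, can be sketched as a simple ranking-and-cut-off. The scores and the 5 per cent cut-off below are illustrative assumptions, not LOLA's actual pipeline.

```python
# Sketch of machine-assisted triage: rank messages by toxicity score and
# hand only the top fraction to a human moderator for the final decision.
import math

def shortlist_for_review(scored_tweets, top_fraction=0.05):
    """Return the most toxic fraction of (text, score) pairs, highest first."""
    ranked = sorted(scored_tweets, key=lambda t: t[1], reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:keep]

# Fake (text, toxic_score) pairs standing in for classifier output.
tweets = [("tweet %d" % i, i / 99) for i in range(100)]
for text, score in shortlist_for_review(tweets):
    print(text, score)  # these few go to a human moderator
```

The design point is the one Lopez makes: the algorithm never decides what is abusive; it only shrinks millions of conversations to a volume a person can actually read.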
Does Lopez worry about whether a system like his could be misused for repressing or monitoring perfectly legal activities, such as someone saying "I don't like my government" or "my government is corrupt"?
His counter-argument is that organisations that monitor people will likely already possess these kinds of technology anyway: “the more the society is aware that this is already happening, the more we will be prepared to prevent abuses from security and defence agents. Those have the ability to monitor our conversations in near real-time whenever we use the mobile phone”.
Automating the detection of misinformation is becoming increasingly important. Twitter said on its blog in 2017 that it “cannot distinguish whether every single Tweet from every person is truthful or not”. In 2020, the company revealed it was experimenting with new features in its beta app that it hopes will address toxic content, according to an interview with Twitter's head of product, Kayvon Beykpour.
Lopez stresses that social media platforms do have the ability and the technology to address misinformation. "They are not doing it because they don't want to, not because they are not able to", he says. He thinks if he, "someone from a remote part of southern Europe, is able to partially address these concerns, certainly people at Google DeepMind can do much more with their technologies".
Lopez's attempt to curb toxic online content with AI is not new. He merely "fine-tuned existing technologies with existing data sets", he adds. Last year, his team tested the system on toxic tweets sent to Greta Thunberg, the young Swedish environmental activist, who has received a great volume of hateful messages online. The system correctly flagged insulting texts as toxic content and provided Lopez with a proof of concept.