The Department of Homeland Security

Text analysis targets terrorists

US mines online rhetoric for attack warnings

Automated text analysis could track consistent linguistic ‘fingerprints’ and, to a lesser degree, pre-attack increases in terrorist rhetoric within online postings, according to research funded by the US Department of Homeland Security.

Two IT-based techniques, Linguistic Inquiry and Word Count (LIWC) and frame analysis, were used in an initial study that was disclosed at last week’s annual meeting of the American Association for the Advancement of Science.

The project compared documents from Al Qaeda and its sister organisation Al Qaeda in the Arabian Peninsula with documents from two groups that have similar philosophical aims but do not use violence.

LIWC detected a gradual increase in ‘function words’ – personal pronouns, prepositions and other words that make up more than 50% of everyday usage. Frame analysis found that terrorists used pre-defined concepts that indicated their ideologies.

In the LIWC analysis, the terrorist groups used fewer long words (those with more than six letters) than the peaceful ones. They also used significantly more social and emotional terms, implying less cognitive complexity. And these trends accelerated in the run-up to an attack.
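The long-word measure described above is simple enough to sketch in a few lines of Python. This is only an illustration of the counting idea, not the LIWC software itself: the function name `long_word_share` and the tokenising rule are assumptions made for this example.

```python
import re


def long_word_share(text: str) -> float:
    """Return the fraction of words in `text` that have more than six letters.

    Hypothetical helper for illustration; LIWC's own tokenisation may differ.
    """
    # Treat runs of letters (and apostrophes) as words.
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    # Count letters only, so an apostrophe does not inflate the length.
    long_words = [w for w in words if len(w.replace("'", "")) > 6]
    return len(long_words) / len(words)


if __name__ == "__main__":
    sample = "We will fight them everywhere"
    print(long_word_share(sample))
```

Comparing this share across documents from different groups, or across time for one group, would give a rough version of the trend the researchers describe.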

“And, in other research, we have found similar linguistic shifts in [former US President George W.] Bush around the time that the US went to war in Iraq,” said LIWC trial leader, Professor James Pennebaker of the University of Texas.

In the frame analysis approach, conducted by a team led by Dr Antonio Sanfilippo of the Pacific Northwest National Laboratory, the terrorist documents were more marked by terms that fell into four categories: moral disengagement (‘hate’, ‘fear’, ‘judge’, ‘criticise’); the violation of sacred values (particularly a sense of attacks on religious belief); social isolation (‘confine’, ‘abandon’, ‘withdraw’); and violence and contention (‘attack’, ‘fight’, ‘kill’).
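A rough sense of how category terms like these can be tallied is given by the Python sketch below. It is an assumption-laden simplification rather than the team's method: the dictionary `FRAME_TERMS` holds only the example words quoted above, the sacred-values category is left out because the article gives no example terms for it, and words are matched exactly with no handling of inflected forms.

```python
import re

# Example terms quoted in the article; a real lexicon would be far larger.
# The "violation of sacred values" category is omitted: no terms were given.
FRAME_TERMS = {
    "moral disengagement": {"hate", "fear", "judge", "criticise"},
    "social isolation": {"confine", "abandon", "withdraw"},
    "violence and contention": {"attack", "fight", "kill"},
}


def frame_counts(text: str) -> dict:
    """Count how many tokens in `text` fall into each frame category.

    Hypothetical helper: exact, lower-cased matching only.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {name: 0 for name in FRAME_TERMS}
    for token in tokens:
        for name, terms in FRAME_TERMS.items():
            if token in terms:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    print(frame_counts("They hate and fear us, so we fight and kill."))
```

Dividing each count by the document's total word count would make documents of different lengths comparable.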

A third technique, integrative complexity, which is applied manually, found similar results.

All three research groups stressed that this early research had limits. They were able to study 320 documents in total, but only in English translations. They also said more work is needed to build up a lexicon to analyse the content.

“You have to keep in mind, it’s going to be pretty crude, but it’s a pretty crude business we’re in,” added Prof Pennebaker.
