Twitter and mobile phone data to gauge how big crowds are

Geographical data from Twitter and mobile phones could be used to estimate the size of a large crowd, a new study has shown.

Researchers from the University of Warwick analysed geo-tagged tweets and mobile phone use over a two-month period in Milan and were able to estimate attendance numbers for football matches.

They used the San Siro Stadium and Linate Airport – locations with known numbers of visitors – to conduct the study and found that mobile activity rose and fell in close step with the flow of people.

The scientists said the analysis could help measure unpredicted events like evacuations, crowd disasters or protests.

“We found that this automatically generated data provides an excellent basis for estimating the size of a crowd,” said Federico Botta, a co-author of the study. “Quick and accurate measurements of crowd size could be of vital use for police and other authorities charged with avoiding crowd disasters.”

But other researchers said there were limitations and biases with this type of data, because only part of the population uses smartphones and Twitter, and telecommunications interference would make the data unreliable.

Tobias Preis, a co-author, said: “Our research provides evidence that accurate estimates of the number of people in a given location at a given time can be extrapolated from mobile phone or Twitter data.

“This shows that data generated through everyday interactions with our mobile phones could be of clear value for a range of business and policy stakeholders, potentially offering an almost instant measurement of the size of crowds.”

The study, published in the journal Royal Society Open Science, is part of a growing body of research looking into how online activity could translate into specific behaviours and other real-world phenomena.

Preis said that the relative sizes of the spikes in online activity strongly resembled the official attendance for each match.

“By drawing on historic Internet activity in the San Siro, we were able to generate estimates of the number of attendees which fell within 13 per cent of the true value,” he said.
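As a rough illustration of how historic activity might be turned into an attendance estimate, the sketch below fits a simple proportional relationship between past activity and past attendance, then checks how far an estimate falls from the true figure. The proportional model, the function names and all the numbers are assumptions made for this example; the article does not describe the researchers' actual method or data.

```python
def fit_scale(activity, attendance):
    """Least-squares slope for attendance ~ scale * activity (line through the origin)."""
    numerator = sum(a * y for a, y in zip(activity, attendance))
    denominator = sum(a * a for a in activity)
    return numerator / denominator


def percent_error(estimate, actual):
    """Absolute error of an estimate, as a percentage of the true value."""
    return abs(estimate - actual) / actual * 100


# Made-up figures for illustration only: activity counts and official
# attendance for three earlier matches.
past_activity = [100.0, 200.0, 300.0]
past_attendance = [20000.0, 40000.0, 60000.0]

scale = fit_scale(past_activity, past_attendance)

# Estimate attendance for a new match from its (hypothetical) activity count,
# then compare against a (hypothetical) official figure.
estimate = scale * 250.0
error = percent_error(estimate, 46000.0)
print(f"estimated attendance: {estimate:.0f}, error: {error:.1f}%")
```

An estimate that "fell within 13 per cent of the true value" would correspond to `percent_error` returning a figure below 13.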

The airport estimates were weaker, they said, because exact passenger counts were not available. Instead, the researchers approximated the number of people in the airport by assuming that departing passengers arrived two hours before their flight and that arriving passengers left an hour after touchdown.
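The two-hour and one-hour assumptions can be sketched in a few lines of Python. This is only an illustration of the idea: the flight schedule, the passenger numbers per flight and the function name are hypothetical, and the article does not say how the researchers implemented their calculation.

```python
from datetime import datetime, timedelta

# Assumptions stated in the article: departing passengers arrive two hours
# before their flight; arriving passengers leave one hour after touchdown.
DEPARTURE_LEAD = timedelta(hours=2)
ARRIVAL_LAG = timedelta(hours=1)


def people_in_airport(flights, when):
    """Estimate how many passengers are in the airport at time `when`.

    `flights` is a list of (scheduled_time, kind, passengers) tuples, where
    kind is "departure" or "arrival". The passenger figure per flight is a
    hypothetical input for this sketch.
    """
    total = 0
    for scheduled, kind, passengers in flights:
        if kind == "departure":
            present = scheduled - DEPARTURE_LEAD <= when < scheduled
        else:
            present = scheduled <= when < scheduled + ARRIVAL_LAG
        if present:
            total += passengers
    return total


# Made-up schedule for illustration only.
flights = [
    (datetime(2015, 1, 1, 10, 0), "departure", 150),
    (datetime(2015, 1, 1, 11, 30), "departure", 100),
    (datetime(2015, 1, 1, 9, 30), "arrival", 120),
]

print(people_in_airport(flights, datetime(2015, 1, 1, 9, 45)))
```

Evaluating the function at regular intervals gives a time series of estimated occupancy that could then be compared with mobile phone or Twitter activity over the same period.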

“The relationships are weaker than those found in the case study at San Siro, but remarkable given the coarse nature of our estimate of the number of passengers,” said Botta.
