A brown cat taking a selfie

Neural network puts an end to awkward auto-cropping of Twitter pics

Image credit: Dreamstime

Twitter is introducing a machine-learning algorithm capable of identifying the most interesting part of a photo and cropping the preview to focus on that area.

Social media is largely about competing for attention among users with snappy text and appealing images. Millions of photos are shared daily on Twitter alone.

However, when images are posted in a range of shapes and sizes, they must be cropped in previews for the sake of a consistent user experience. Often, the most important part of a photograph will be left out of view; for instance, an image preview may display a plain patch of carpet rather than the cat sitting on it.

Because most users scroll past a post without clicking through to the full image, the preview is often all they see, making awkward crops a persistent frustration for many Twitter users.

According to a blog post by Lucas Theis and Zehan Wang, members of Twitter’s machine-learning team, these awkward crops occur because Twitter has relied on simple facial-recognition software to identify the most important part of a picture. While this works for many photos of humans, it fails for pictures of landscapes, objects or pets. When no face can be identified, the preview is simply cropped to the centre of the image.
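The fallback behaviour described above can be sketched as follows. This is an illustrative reconstruction, not Twitter’s actual code; the face boxes are assumed to come from some face detector.

```python
# Sketch of the crop-selection fallback described above (hypothetical
# helper; not Twitter's actual code). If a face is found, centre the
# preview crop on it; otherwise fall back to the image centre.

def choose_crop_centre(image_width, image_height, faces):
    """faces: list of (x, y, w, h) boxes returned by a face detector."""
    if faces:
        # Centre the crop on the largest detected face.
        x, y, w, h = max(faces, key=lambda box: box[2] * box[3])
        return (x + w // 2, y + h // 2)
    # No face found: crop to the centre of the image.
    return (image_width // 2, image_height // 2)
```

When the face list is empty, the crop centre is simply the image centre, which is exactly how a photo of a cat on a carpet can end up previewed as a patch of carpet.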

To fix the problem of awkwardly cropped preview images, Theis, Wang and their colleagues tried a new approach to identifying the focal point of an image, based on saliency: the regions a viewer’s eye is naturally drawn to first.

The Twitter engineers collected data from eye-tracking studies in which participants were shown images and the points their eyes freely settled on first were recorded. This data was used to train a neural network to predict which parts of new images users are most likely to look at.
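One common way to turn recorded gaze points into a training target for such a model is to smear each fixation into a small Gaussian blob of attention. The sketch below is illustrative only, not Twitter’s pipeline:

```python
# Illustrative sketch: convert eye-tracking fixation points into a
# saliency "heatmap" that a neural network could be trained to predict.
import math

def fixations_to_saliency_map(fixations, width, height, sigma=2.0):
    """fixations: list of (x, y) gaze points on a width x height image."""
    saliency = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        # Each fixation adds a Gaussian blob of attention around itself.
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                saliency[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    # Normalise so the map sums to 1, like a probability distribution.
    total = sum(sum(row) for row in saliency)
    return [[v / total for v in row] for row in saliency]
```

A model trained against such maps learns to predict where viewers will look, even for images it has never seen.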

Artificial neural networks – computer systems loosely inspired by biological neural networks – learn by example; for instance, analysing a data set of photographs of humans in which the face has been identified. They are used extensively in image recognition.

Although this approach produced good results, the engineers found that running the full neural network to identify the salient parts of every image would take too long and would noticeably degrade the user experience.

They decided to trade some prediction precision for efficiency. They trained a smaller neural network to imitate the network trained on the study data, an approach known as knowledge distillation. They combined this with a ‘pruning’ technique which removed parts of the neural network that were computationally costly to run but contributed little to the final result.
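Both ideas can be shown in miniature. In distillation, the small “student” model is fitted to the large “teacher” model’s outputs rather than to the raw labels; in pruning, weights that barely affect the result are simply zeroed out. The toy example below uses a one-parameter student and is purely illustrative:

```python
# Toy sketch of knowledge distillation and pruning (illustrative only;
# not Twitter's code).

def train_student(inputs, teacher_outputs, steps=200, lr=0.05):
    """Fit a one-parameter student y = w * x to mimic the teacher's
    outputs by gradient descent on a mean-squared-error loss."""
    w = 0.0
    for _ in range(steps):
        # Gradient of the distillation (MSE) loss with respect to w.
        grad = sum(2 * (w * x - t) * x
                   for x, t in zip(inputs, teacher_outputs)) / len(inputs)
        w -= lr * grad
    return w

def prune(weights, threshold=0.01):
    """Zero out weights whose magnitude contributes little, so the
    corresponding computations can be skipped entirely."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]
```

In a real network the student has millions of parameters and pruning removes whole filters or layers, but the principle is the same: a smaller, cheaper model that closely reproduces the big one’s predictions.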

This allowed the neural network to crop pictures ten times faster than the original approach: fast enough to perform saliency detection on every picture as soon as it is uploaded.
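Once a saliency map has been predicted, choosing the preview is straightforward: centre the crop on the most salient point, clamped so the crop stays inside the image. Again, this is a minimal sketch of the idea rather than Twitter’s implementation:

```python
# Illustrative sketch: place a fixed-size preview crop so that it is
# centred on the peak of a predicted saliency map.

def crop_box_from_saliency(saliency, crop_w, crop_h):
    """saliency: 2D list of scores; returns (left, top) of the crop."""
    height, width = len(saliency), len(saliency[0])
    # Find the most salient pixel.
    best_x = best_y = 0
    best = saliency[0][0]
    for y in range(height):
        for x in range(width):
            if saliency[y][x] > best:
                best, best_x, best_y = saliency[y][x], x, y
    # Centre the crop on it, clamped to stay inside the image bounds.
    left = min(max(best_x - crop_w // 2, 0), width - crop_w)
    top = min(max(best_y - crop_h // 2, 0), height - crop_h)
    return (left, top)
```

With the cat from the earlier example, the saliency peak sits on the cat rather than the carpet, so the preview crop follows it.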

This new machine-learning approach is being rolled out to Twitter users.
