View from Washington: Timnit Gebru isn't going away any time soon

A year after her controversial departure from Google, the computer scientist has secured backing for a research institute tackling some of the main challenges in AI.

Former Google AI researcher Timnit Gebru has unveiled an interdisciplinary research institute that aims to highlight some of the key data-reliability challenges facing artificial intelligence and how they are handled during the design process.

The Distributed AI Research Institute (DAIR) launched yesterday (2 December) with a goal to “encourage a research process that analyses [AI research’s] end goal and potential risks and harms from the start”.

Gebru has become one of the leading voices warning about bias embedded in datasets, particularly datasets claimed to be so large that their sheer scale dilutes the risk. 'On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?', a paper she co-authored and which prompted her controversial departure from Google, provided evidence suggesting that the reverse was true, a position her then employer disputed.

Nevertheless, the paper's observations have since been echoed by teams building leading models within Big Tech. Notably, the Microsoft-Nvidia team behind this year's Megatron-Turing NLG language model acknowledged that although the model has been scaled up to a staggering 530 billion parameters, bias within it remains a concern.

For one of DAIR's first two projects, Gebru wants to promote datasheets for datasets: documents, made available to technology users, that record a dataset's "motivation, composition, collection process, recommended uses, and so on".

In a paper in the latest issue of Communications of the ACM, she and six co-authors directly compare the proposal to the datasheets that accompany electronic components. These ensure that device makers have thoroughly reviewed their hardware before its public release, and they enable engineers to judge whether parts are appropriate for the projects in which they may be used.

The idea reflects a mounting concern within AI that systems are being developed before many of their elements have been sufficiently stress-tested by their creators. It sits alongside worries that security is also being overlooked in the race to grow the AI sector. Both issues speak to the age-old observation that the later a problem emerges and is addressed in a design flow, the harder it becomes to fix.

DAIR's second project builds on existing research into 'The Legacy of Spatial Apartheid'. It will consider how the enforced geographic boundaries between 'white' and 'non-European' communities have evolved since racial segregation in South Africa was finally abandoned.

It addresses another group of hot topics within AI. Race, gender and diversity have mostly been discussed publicly in terms of bias, not just in language models but equally in areas such as facial recognition. With this work, Gebru and her team are seeking to expand the agenda and look more closely at how AI and machine learning can be applied to resolving these problems. The institute is also committed to diversity in its staffing as well as its research.

DAIR joins a growing number of initiatives seeking to establish how environmental, social and governance principles should be applied in AI alongside reliability and trust. These all aim to highlight best practices and potentially inform the various national and multinational regulations being drafted or researched by governments. DAIR, though, is well placed to become a powerful voice, given the wide respect Gebru commands in the AI research community and the way her experiences are seen to highlight the problems AI faces, both technologically and in how some of its major players treat their workforces.

DAIR has secured funding from three well-known US foundations, and a gift from the Kapor Center, one of the main vehicles for diversity research in Silicon Valley.
