
Google claims to have fastest supercomputer for AI and ML tasks
Google has claimed that the Tensor Processing Units (TPUs) used to train its artificial intelligence (AI) models are faster and more power-efficient than Nvidia's A100 chips.
Alphabet's Google has released new information about the supercomputers it uses to train its AI models.
In contrast to most major software companies, which rely on Nvidia's A100 processors for AI and machine-learning workloads, Google has developed a custom chip, which it uses for over 90 per cent of its AI training work.
The search giant has described in a blog post how this bespoke system reportedly outperforms Nvidia's processors in both speed and power efficiency. The company revealed that it has connected more than 4,000 TPUs into a single supercomputer, developing custom optical switches to link the individual machines.
Google claims that its supercomputer makes it simple to reconfigure links between processors on the fly. It also stressed that its chips are up to 1.7 times faster and 1.9 times more power-efficient than a system based on Nvidia's current-generation A100 chip.
“Circuit switching makes it easy to route around failed components,” wrote Google's Norm Jouppi and David Patterson. “We can even change the topology of the supercomputer interconnect to accelerate the performance of an ML (machine learning) model because of this flexibility.”
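To give a feel for what “routing around failed components” means in practice, the toy sketch below models an interconnect as a small wrap-around grid and finds a new path when a node drops out. It is an illustration written for this article, not Google's switching software: the networkx library, the 4x4 torus size and the node coordinates are all arbitrary choices.

```python
# Toy illustration of routing around a failed component in a reconfigurable
# interconnect. Assumptions: the 4x4 torus, source, destination and failed
# node are arbitrary demonstration values.
import networkx as nx

# Model the interconnect as a small 2-D torus (a grid with wrap-around links).
torus = nx.grid_2d_graph(4, 4, periodic=True)

src, dst, failed = (0, 0), (2, 2), (1, 1)
print(nx.shortest_path(torus, src, dst))  # a route before any failure

# A chip (node) fails: remove it and re-route traffic around it.
torus.remove_node(failed)
print(nx.shortest_path(torus, src, dst))  # traffic takes an alternative path
```

The same principle, applied at the scale of thousands of chips and with optical rather than logical switching, is what lets Google keep a training run going when hardware fails or reshape the network to suit a particular model.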
Improving these connections has become fundamental for companies looking to develop AI-powered chatbots such as ChatGPT, which have taken the world by storm in the past few months.
Google joined the chatbot race in February with the launch of Bard. However, the chatbot's wrong responses to questions in its promotional material caused Google's stock to plummet temporarily.
The rise in popularity of these models requires improved computing capabilities, as the large language models that power technologies like OpenAI's ChatGPT and Google's Bard are too large to be stored on a single chip.
The models must instead be split across thousands of chips, which must then work together for weeks or more to train the model.
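As a rough sketch of what splitting a model across chips looks like in code, the example below uses JAX, the framework widely used for TPU workloads, to shard a single weight matrix across all visible devices. The array shape, the mesh axis name and the dummy ones-matrix are illustrative assumptions, not Google's actual training setup.

```python
# Minimal sketch of sharding a weight matrix across several accelerator
# devices with JAX. Assumptions: the shape, the "model" mesh axis name and
# the dummy ones-matrix are illustrative; real training shards many arrays.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Arrange every device visible to this host into a 1-D mesh.
mesh = Mesh(np.array(jax.devices()), ("model",))

# A weight matrix that, at real scale, would not fit on one chip.
weights = jnp.ones((8192, 8192))

# Split the rows across the mesh: each device stores only its own slice,
# and jit-compiled computations on it run in parallel across all devices.
sharded = jax.device_put(weights, NamedSharding(mesh, PartitionSpec("model", None)))
print(sharded.sharding)
```

In a real system the mesh spans thousands of chips and is split along several axes at once, which is why the speed and flexibility of the interconnect between chips matters so much.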
While ChatGPT's model was trained using up to 1,000 Nvidia processors, Google's largest publicly disclosed language model to date, called PaLM, was trained by splitting it across two of the 4,000-chip supercomputers over 50 days, the company has claimed.
The company also stated that its system was used to train Midjourney, the AI image-generation tool that created the cover of one of E&T's magazine issues.
Google has been using the supercomputer since 2020 in a data centre in Mayes County, Oklahoma, although details about the system are only being released now.
Google did not compare its fourth-generation TPU with Nvidia's current flagship H100 chip, but it did hint that it might be working on a new TPU to compete with it, with Jouppi telling Reuters that Google has “a healthy pipeline of future chips”.