Biology continues to inspire AI
Three Turing Award winners got together at Nvidia's GTC to talk about the future direction of artificial intelligence and deep learning and how it still relies on emulating brains.
There is no guarantee that artificial intelligence needs to match what happens in biology, or even be biologically inspired. The early days of AI research focused far more heavily on building machines that could reason formally about the world around them than the approaches in vogue today, which consist of feeding in enormous quantities of data in the hope that a training algorithm will help a similarly large network of simple arithmetic blocks pick up some complex, common pattern for itself.
“My big question is how do we get machines to learn more like animals and humans? We observe astonishing learning abilities from humans, who can figure out how the world works partly by observation, partly by interaction. And it’s much more efficient than what we can reproduce on machines. What is the underlying principle?” Yann LeCun, Meta’s chief AI scientist, asked rhetorically in a panel session organised by Nvidia at its Fall GTC conference that included fellow Turing Award winners Yoshua Bengio, scientific director of the Montreal Institute for Learning Algorithms, and Geoffrey Hinton, professor of computer science at the University of Toronto.
Hinton said he has spent the past couple of years trying to find biologically plausible learning algorithms that he can fit into the visual-recognition neural network architecture he calls Glom, so named because the model is an agglomeration of blocks, or capsules, of artificial neurons. This is different to the original deep-learning networks, where the neurons were not separated out, which made it impossible to allocate groups of neurons dynamically to different parts of a task based on what the system sees.
For biologically plausible learning, backpropagation – the cornerstone of deep learning, used to create the gradients that let an artificial system learn from its inputs – probably has to go away, Hinton argued. “I think it’s a fairly safe bet that the brain is getting gradients somehow, but I no longer believe it’s doing backprop.”
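To make the point concrete, here is a toy illustration (not Hinton's Glom, just a generic two-layer network) of what backpropagation actually does: it applies the chain rule backwards through the network to compute the gradient of the loss with respect to every weight, which a finite-difference check can confirm.

```python
import numpy as np

# Toy sketch of backpropagation: the chain rule applied backwards
# through a tiny two-layer network to obtain gradients for learning.
rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input
y = 1.0                         # target
W1 = rng.normal(size=(4, 3))    # first-layer weights
w2 = rng.normal(size=4)         # second-layer weights

# Forward pass: hidden = tanh(W1 x), prediction = w2 . hidden
h = np.tanh(W1 @ x)
pred = w2 @ h
loss = 0.5 * (pred - y) ** 2

# Backward pass: the step Hinton suspects brains do NOT do literally.
d_pred = pred - y                                    # dL/dpred
d_w2 = d_pred * h                                    # dL/dw2
d_h = d_pred * w2                                    # dL/dh
d_W1 = ((1 - h ** 2) * d_h)[:, None] * x[None, :]    # dL/dW1, chain rule

# Sanity check: nudge one weight and compare with the analytic gradient.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
loss_p = 0.5 * (w2 @ np.tanh(W1p @ x) - y) ** 2
numeric = (loss_p - loss) / eps
```

The finite-difference estimate `numeric` should agree with `d_W1[0, 0]` to several decimal places; the whole debate above is about whether biology could plausibly carry such gradient signals backwards.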
Finding alternatives to backpropagation that work, however, has proven difficult. With Glom, one answer might be to stick to what Hinton regards as a fairly dumb algorithm, such as the one behind reinforcement learning, but apply it to small modules, each of which performs only a limited set of functions. Scaling happens by adding lots of these modules together.
Another key attribute of biological learning is that it happens fairly naturally for animals: they observe and do things and learn from the experience. By contrast, with the exception of algorithms such as clustering, where the machine tries to group like elements together based on their properties, much of what deep learning does relies on painstakingly labelled data, and lots of it, with the emphasis on lots. The other exception is large language models, where the AI is more self-supervised: it uses the libraries of text it ingests to try to infer patterns and connections for itself.
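Clustering is the simplest example of learning without labels. A minimal k-means sketch (not any panellist's model) shows the idea: points are grouped purely by their properties, with the "labels" emerging from the data itself.

```python
import numpy as np

# Minimal k-means sketch: grouping points by similarity with no labels,
# the kind of unsupervised exception mentioned above.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.5, (20, 2)),   # blob near (0, 0)
                  rng.normal(5.0, 0.5, (20, 2))])  # blob near (5, 5)

# Start with one centre seeded in each blob for a deterministic sketch.
centres = np.array([data[0], data[20]])
for _ in range(10):
    # Assign each point to its nearest centre, then move the centres
    # to the mean of the points assigned to them.
    dists = np.linalg.norm(data[:, None] - centres[None], axis=2)
    labels = dists.argmin(axis=1)
    centres = np.array([data[labels == k].mean(axis=0) for k in range(2)])
```

With two well-separated blobs, the algorithm recovers the grouping in a handful of iterations; no human ever told it which point belongs where.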
“Self-supervised learning has completely taken over natural language processing,” LeCun said. “It has not yet taken over computer vision, but there is a huge amount of work on this and it’s making fast progress.”
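The core trick of self-supervision is that the text supplies its own labels: hide a word and predict it from its context. A toy sketch using nothing but co-occurrence counts (a stand-in for the huge Transformer models LeCun is describing) makes the idea tangible.

```python
from collections import Counter, defaultdict

# Toy sketch of self-supervision: mask a word and predict it from the
# word before it, using counts gathered from the corpus itself.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat ate the fish .").split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1   # count which words follow which

def predict_masked(prev_word):
    """Guess the hidden word from its left-hand context."""
    return following[prev_word].most_common(1)[0][0]
```

Here `predict_masked("sat")` returns `"on"`, because "on" most often follows "sat" in the corpus; large language models apply the same predict-the-hidden-token principle at vastly greater scale.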
Bengio said his recent work has been on the “lots of data” problem and how to avoid it. “I’ve been focusing on the question of generalisation as out-of-distribution generalisation, or generalisation to really rare cases, and how humans manage to do that.
“Scale is not enough. Our best models that are in vision or playing the game of Go or working with natural language are taking in many orders of magnitude more data than what humans need. Current language models are trained with a thousand lifetimes of text. At the other end of the scale, children can learn completely new things with a few examples,” Bengio said.
Though it’s different to Glom, Bengio’s work has been to look at how neural networks might be designed to incorporate more structure and modularity and in doing so get better at picking apart what they see so they can make inferences about the different things in each image or the concepts in a paragraph. “We’ve been working on generative models based on neural nets that can represent rich compositional structures, like graphs: the kinds of data structures that up to now it was not obvious how to handle with neural nets.”
LeCun added: “I certainly think scaling is necessary but I also think it is not sufficient. I don’t think accelerating reinforcement learning in the way we do it currently is going to take us to the type of learning that we observe in animals and humans. So I think we’re missing something essential.”
Hinton, however, is not convinced that the components are necessarily missing; they just might not be used in the right combinations. “I was kind of shocked by one of the Google models that could explain why a joke was funny,” he noted. “I would have thought that explaining why a joke was funny required the kinds of things we thought these models didn’t have.”
It’s possible, Hinton argued, that better reasoning could emerge without radical changes, though it may entail inventing some new modules that can work with the existing set to make them work more efficiently. “I’m not convinced we won’t get a long way further without any radical changes,” he said, adding that this could simply involve more of the Transformer structures already prevalent in the large language models.
“Those things work surprisingly well to the point that we’re all surprised by how well they work,” LeCun agreed. “I still think though they are missing essential components.”
One key issue is that the models in existence today do not readily handle situations they have not seen before. “We need ways for machines to reason in ways that are unbounded,” LeCun added.
Bengio cautioned that the joke-explanation AI may have received more hints than might be expected. “These models are trained on so much data that it is hard to know if there was not a very similar joke elsewhere and its explanation was also somewhere in the data.”
Another issue Bengio raised is how the models deal with uncertainty. Very often models are quite certain about their predictions even when they should be reporting that they don’t know. “Some people in machine learning have been thinking about this for decades. They invented things like Gaussian processes in the 1990s. They didn’t really compete when neural nets became large but they do have a point.
“Recently, I was really struck by a discussion with a physicist who's trying to use neural nets for discovering phenomena that are going on in physics that they don't have good explanations for,” Bengio added. “He said, 'well, if you give me one model, one neural net that fits all the data well, it's not acceptable for me. Because, if there are multiple theories and they contradict each other, I could just be fooling myself.' It’s another way of saying there needs to be a way to account for uncertainty that's richer than the way we're training these things currently.”
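Gaussian processes, which Bengio mentions above, are one classical way of getting the richer uncertainty the physicist was asking for. A minimal numpy-only sketch (an illustration, not Bengio's proposal) shows the key behaviour: the model's reported uncertainty grows as you move away from the training data, instead of staying confidently low.

```python
import numpy as np

# Minimal Gaussian-process regression sketch: unlike a plain neural
# net, a GP reports a predictive variance, and that variance grows
# away from the training data.
def rbf(a, b, scale=1.0):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / scale ** 2)

x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
noise = 1e-6   # tiny jitter for numerical stability

K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
K_inv = np.linalg.inv(K)

def gp_predict(x_test):
    """Posterior mean and standard deviation at the test points."""
    Ks = rbf(x_test, x_train)
    mean = Ks @ K_inv @ y_train
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, K_inv, Ks)
    return mean, np.sqrt(np.maximum(var, 0.0))

mean_near, std_near = gp_predict(np.array([0.5]))    # inside the data
mean_far, std_far = gp_predict(np.array([10.0]))     # far outside it
```

Near the training points the predictive standard deviation is small; at x = 10, far outside them, it approaches the prior value of 1, which is exactly the “I don’t know” signal Bengio says current large models fail to give.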
One answer might be to have the model opt for the explanation that fits the data best. “But if you consider a task where there isn’t that much data, it becomes much more serious,” said Bengio.
This may be where biology and AI need to diverge as human brains are not always good at recognising where they should be uncertain. The Necker cube, cited by Hinton in the discussion, is one example where the brain flip-flops between two interpretations of the same image. And when you think about it, neither is actually correct. Both are illusions.