AI learning time for machines slashed by 95 per cent using new shortcut
Computer scientists at Rice University, Texas, have adapted a well-established method in order to cut 95 per cent of computations required for deep learning, or even more for massive neural networks.
As tech giants such as Facebook, Google and Apple venture into building colossal deep learning networks as the brains for self-driving cars and other consumer products, they run up against a fundamental limitation. Training a neural network takes an enormous amount of time and energy as the network sifts through millions or billions of data points.
Deep learning is a computationally intense form of machine learning. It is based on artificial neurons – mathematical functions – which all start out the same and then adapt as they are trained on data sets and pick up patterns in the data. Multi-layered networks can learn to perform more complex tasks, such as speech recognition.
“Adding more neurons to a network layer increases its expressive power and there’s no upper limit to how big we want our networks to be,” said Professor Anshumali Shrivastava, who led the study. “Google is reportedly trying to train with 137 billion neurons.”
Computer scientists are limited by the time it takes to train neural networks, as well as by the energy, memory and computational cycles that training consumes. The Rice University researchers set out to ease these limits by reducing the computational effort required for deep learning.
They decided to adapt a tried-and-tested technique for rapid data lookup in order to drastically cut the amount of computation required. Hashing – an indexing method which uses smart hash functions to convert data into small numbers (hashes) that are stored in tables – lets a network look up the neurons relevant to a given input rather than compute with all of them, reducing the number of operations required for deep learning.
“This applies to any deep-learning architecture and the technique scales sublinearly, which means that the larger the deep neural network to which this is applied, the more the savings in computations there will be,” said Professor Shrivastava.
“In small-scale tests we found we could reduce computation by as much as 95 per cent and still be within one per cent of the accuracy obtained with standard approaches.”
According to Ryan Spring, a graduate student at Rice University, the efficiency gains from hashing will be even larger on massive deep networks, because the technique exploits “sparsity” in data, essentially a measure of how empty a data set is of meaningful points.
“So while we’ve shown a 95 per cent saving with 1,000 neurons, the mathematics suggests we can save more than 99 per cent with a billion neurons,” he said.
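For readers who want to see how a hash lookup can stand in for most of a layer's arithmetic, the following is a minimal sketch in Python. It is not the Rice team's code. The article does not say which hash functions the researchers used, so the sketch assumes one common locality-sensitive scheme, signed random projections (often called SimHash), and every function name and parameter value in it is hypothetical. Each neuron's weights are hashed into a handful of tables once; for each input, only the neurons that land in the same buckets as the input are evaluated.

```python
import random

def make_planes(num_bits, dim, rng):
    """Draw num_bits random hyperplanes; each contributes one bit of the hash."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def simhash(vector, planes):
    """Signed random projection: bit is 1 if the vector lies on the positive side."""
    code = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def build_tables(weights, all_planes):
    """Index every neuron's weight vector into one hash table per plane set."""
    tables = []
    for planes in all_planes:
        table = {}
        for neuron_id, w in enumerate(weights):
            table.setdefault(simhash(w, planes), []).append(neuron_id)
        tables.append(table)
    return tables

def candidate_neurons(x, tables, all_planes):
    """Union of the buckets the input falls into, across all tables."""
    active = set()
    for planes, table in zip(all_planes, tables):
        active.update(table.get(simhash(x, planes), []))
    return active

def sparse_layer(x, weights, tables, all_planes):
    """Compute ReLU activations for the candidate neurons only; skip the rest."""
    out = {}
    for j in candidate_neurons(x, tables, all_planes):
        out[j] = max(0.0, sum(w * v for w, v in zip(weights[j], x)))
    return out

# Illustrative sizes only (assumptions, not figures from the study).
rng = random.Random(0)
DIM, NEURONS, BITS, TABLES = 16, 1000, 6, 4
weights = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NEURONS)]
all_planes = [make_planes(BITS, DIM, rng) for _ in range(TABLES)]
tables = build_tables(weights, all_planes)

x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
activations = sparse_layer(x, weights, tables, all_planes)
print(f"evaluated {len(activations)} of {NEURONS} neurons")
```

Vectors that point in similar directions tend to share hash bits, so the neurons retrieved are biased towards those with large activations for the input, and the skipped neurons are treated as inactive. In this sketch, using more bits per hash shrinks the buckets and saves more computation, while using more tables recovers more of the relevant neurons. A real training system would also need to update the tables as the weights change, which this sketch leaves out.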