AI finds new drug molecules in deep chemical space

An artificial intelligence system has learned to pick new drug molecules from the vast reaches of chemical space, a place where the pharmaceutical industry conventionally gets lost in its search for chemical compounds that might cure disease.

Chemical exploration takes drug designers into a space defined by the 10^60 compounds made possible by just a handful of base molecules that are biologically active and have the potential to form new medicines. The possible combinations are so vast that, researchers at the University of North Carolina said last week, they could not be tested exhaustively for their viability as drugs.

The team of researchers used machine learning techniques that had been around since the 1990s but were undergoing a "renaissance", they said. The technique, called reinforcement learning (RL), was famously demonstrated by the recent victory of Google's DeepMind computer over reigning champions of Go, the Chinese board game likened in its complexity and difficulty to chess.

A study of the medical breakthrough, published in the journal Science Advances last week, described how the North Carolina researchers used RL methods to explore the 10^60 possible combinations in chemical space intelligently, singling out the compounds most likely to have the desired chemical properties instead of doing brute-force checks of every single one.
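
To give a feel for why exhaustive checking is ruled out, the arithmetic can be sketched in a few lines of Python. The screening rate of one billion compounds per second is a purely hypothetical assumption chosen for illustration; the article gives no such figure.

```python
# Rough arithmetic: how long would a brute-force sweep of chemical space take?
COMPOUNDS = 10**60                    # size of chemical space quoted in the article
CHECKS_PER_SECOND = 10**9             # hypothetical screening rate (an assumption)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years_needed = COMPOUNDS / (CHECKS_PER_SECOND * SECONDS_PER_YEAR)
print(f"{years_needed:.2e} years")
```

Even at that generous made-up rate the sweep would take on the order of 10^43 years, which is why a guided search is needed.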

Molecular alphabet

They did it by using a molecular language that can describe the points of chemical space. Its alphabet was composed of chemicals, its words were made of chemical combinations, and its syntax and vocabulary described which words might be biologically valid, and not just long strings of chemical gobbledygook.

The North Carolina team created a system that would impose certain conditions on its search through chemical space, conditions that defined the physical, chemical or biological properties of the compounds they sought. Their proof-of-concept study sought - and found - a molecule that inhibited an enzyme associated with leukemia. Another parameter singled out by the team was that the system find only compounds that were feasible to produce.

The results would be used to create what is known as a computational library -- a library, defined computationally, from which drugs companies could pick chemicals as candidates for new drugs, which they would then put through the lengthy processes that determine their actual use.

Conventional computational library design methods were criticised, said the journal paper, for proposing exotic compounds so difficult to synthesise that they simply could not be made with today's technology. They were also often biased toward known chemicals. This was, the researchers said, because conventional methods had little control over the characteristics of the compounds they proposed as potential leads for further medical research.

The AI system had "potential to dramatically accelerate the design of new drug candidates", said the team, from the University of North Carolina at Chapel Hill's Eshelman School of Pharmacy, in their paper. Most of the chemicals the system found to fit the search criteria had never been identified before.

They named their system using an acronym they said described its significance as a breakthrough in medicine. They called it ReLeaSE, short for Reinforcement Learning for Structural Evolution. To release, they said, meant to "allow or enable to escape from confinement" -- "to set free".

Reinforcement learning

The system's artificial intelligence was housed in two neural networks that partnered in pilot and co-pilot roles for their exploration of chemical space. The North Carolina team likened it to a teacher-student relationship. One of the neural networks -- the student -- generated novel molecules that were chemically feasible and might fit the search criteria. The teacher then produced a statistical analysis of the likely behaviour of the compounds the student proposed. It rewarded the student with a chemical score, or gave it a penalty if the predictions were bad. The neural generator was programmed to seek the greatest reward.
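
The student-teacher loop can be illustrated with a toy sketch in Python. It is not the ReLeaSE implementation: the student here is a single table of symbol preferences rather than a neural network, the teacher is a made-up scoring rule (the fraction of "O" symbols in a string), and the update is a REINFORCE-style policy-gradient step, which is one common way to train on rewards and is assumed here rather than taken from the article. All names are hypothetical.

```python
import math
import random

ALPHABET = ["C", "N", "O"]  # toy alphabet, far smaller than a real molecular one

def softmax(logits):
    """Turn raw preferences into sampling probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(logits, length, rng):
    """Student: sample a string of symbol indices from the current policy."""
    return rng.choices(range(len(ALPHABET)), weights=softmax(logits), k=length)

def predict_reward(sample):
    """Teacher stand-in: a made-up score, the fraction of 'O' symbols."""
    target = ALPHABET.index("O")
    return sum(1 for s in sample if s == target) / len(sample)

def train(steps=2000, length=10, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(ALPHABET)
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        sample = generate(logits, length, rng)
        advantage = predict_reward(sample) - baseline
        baseline += 0.1 * advantage
        probs = softmax(logits)
        for k in range(len(ALPHABET)):
            # Gradient of the sample's log-probability with respect to logit k.
            grad = sum((1.0 if s == k else 0.0) - probs[k] for s in sample)
            logits[k] += lr * advantage * grad
    return logits

final_probs = softmax(train())
print(dict(zip(ALPHABET, final_probs)))
```

Because the made-up teacher rewards "O", the student's probability of emitting "O" rises over training. In the system described above, the reward would instead come from a trained predictive network scoring a desired chemical property.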

Alphabet's DeepMind claimed the first ever victory of a computer over a human Go player. But, it said, its victories in multiple-game matches over Lee Sedol, considered the greatest living Go player, were so profound in their originality that they overturned hundreds of years of received wisdom about the game.

Yet the most distinctive innovation the Eshelman School researchers claimed for their work was satisfyingly simple. They based their molecular alphabet on a lettering system called SMILES (the simplified molecular-input line-entry system), designed in the 1980s to represent the diagrams of interlocking hexagonal shapes usually used to depict chemical structures. It boiled complex chemical structures down to a single ASCII string. The aspirin molecule, for example, is represented as CC(=O)OC1=CC=CC=C1C(=O)O. The neural networks thus did their processing on strings of computer characters. The results they produced were likewise simple ASCII strings.
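
As a rough illustration of what processing strings of computer characters can look like, the Python sketch below turns the aspirin SMILES string (written with the ASCII "=" bond character) into a list of integers that a neural network could consume, and back again. Splitting the string one character at a time is a simplifying assumption: real SMILES has two-letter atoms such as Cl and Br, and the article does not say how the researchers tokenised their strings. The function names are hypothetical.

```python
# Aspirin written as a SMILES string (plain ASCII).
ASPIRIN = "CC(=O)OC1=CC=CC=C1C(=O)O"

def build_alphabet(smiles_list):
    """Collect every distinct character seen in the SMILES strings."""
    return sorted(set("".join(smiles_list)))

def encode(smiles, alphabet):
    """Map each character to its position in the alphabet."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [index[ch] for ch in smiles]

def decode(indices, alphabet):
    """Map a list of alphabet positions back to a SMILES string."""
    return "".join(alphabet[i] for i in indices)

alphabet = build_alphabet([ASPIRIN])
codes = encode(ASPIRIN, alphabet)
print(alphabet)
print(codes)
print(decode(codes, alphabet) == ASPIRIN)
```

Aspirin alone needs only six distinct characters. An alphabet built from a whole library of molecules would be larger but still small, which is part of what makes the string representation convenient for language-style models.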

Processing such strings in such vast numbers as exist in chemical space was nevertheless made possible only by recent developments in natural language processing and machine translation, said the researchers in their journal paper.

Another critical difference they claimed over other methods for virtual screening of chemical libraries was their unconventional application of a class of statistical model common to the biological sciences and engineering: the QSAR (quantitative structure-activity relationship) model. They fed their statistical analyses back into the system, using them to put pressure on the generative (student) neural network to be more accurate in its search for molecules.

"The ability of the algorithm to design new, and therefore immediately patentable, chemical entities with specific biological activities should be highly attractive to an industry searching for new approaches to shorten the time to bring a drug to clinical trials," said Alexander Tropsha, one of the authors, and a professor and associate dean of biomedical engineering, pharmacoinformatics and computer science, in a University press statement about the journal article.
