Microsoft takes step towards practical DNA data storage

Microsoft, working with a team of researchers at the University of Washington, has reported a breakthrough in using DNA for data storage, describing the first nanoscale DNA storage writer, which the researchers expect to scale to a data density 1,000 times higher than before.

Data are being generated at a rate that exceeds current storage capacity.

Synthetic DNA is an attractive prospect for long-term data storage due to its density, longevity, sustainability, and ease of replication. DNA is estimated to have a density capable of storing 1EB (one million TB) per square inch: orders of magnitude higher than Linear Tape-Open (LTO) storage. Storing data this way could also, in theory, keep it safe for thousands of years.

We are accustomed to storing data using bits (0 and 1). In DNA storage, data are instead encoded in sequences of the four chemical bases of DNA (A, G, C, and T), which can then be 'written' in molecular form through DNA oligonucleotide synthesis, then preserved and stored. When data must be accessed, the relevant DNA is amplified via PCR and sequenced, returning the chemical sequences to digital form for decoding.
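As a rough illustration of the encoding step, the sketch below maps each pair of bits to one base and back again. The specific mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names `encode` and `decode` are assumptions made for this example; the article does not describe the codec the researchers used, and practical schemes typically add addressing and error correction on top.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration:
# 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Invert encode(): read four bases at a time and rebuild each byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError for anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

With this mapping, `encode(b"Hi")` gives `"CAGACGGC"`, and `decode` applied to that strand returns the original two bytes.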

DNA storage is not practically useful at present due to high expense and extremely slow read and write throughput. However, Microsoft, which is already a major player in cloud storage, is hoping to harness DNA storage to gain an advantage over competitors. In its latest advance, working with the University of Washington’s Molecular Information Systems Laboratory, it has demonstrated the first nanoscale DNA storage writer.

The researchers produced an electrode array and demonstrated DNA synthesis with electrode sizes and pitches that enable a density of 25 million oligonucleotides per square cm: the estimated density necessary to achieve the minimum write bandwidth (kilobytes per second) for data storage in synthetic DNA.
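To put that density in terms of spacing, the short calculation below converts features per square cm into a centre-to-centre pitch. It assumes the electrodes sit on a regular square grid with one oligonucleotide sequence per electrode, which the article does not state; `pitch_um` is a name invented for this sketch.

```python
import math


def pitch_um(features_per_cm2: float) -> float:
    """Centre-to-centre spacing in micrometres, assuming a square grid."""
    # 1 cm = 10,000 micrometres; a square grid has sqrt(density) features per cm.
    return 1e4 / math.sqrt(features_per_cm2)


print(pitch_um(25e6))  # 25 million features per square cm
print(pitch_um(1e9))   # one billion features per square cm
```

Under that assumption, 25 million features per square cm corresponds to a pitch of 2 micrometres, and one billion features per square cm to roughly 0.32 micrometres.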

In the synthesis process at the heart of DNA storage, the state of an electrode (activated or deactivated) during a certain step in the process controls whether a new base will be added in the following step. When scaling down the electrode pitch, there is a risk of acid diffusing to neighbouring electrodes and compromising accuracy, so the researchers designed electrode arrays in which each working electrode is sunk in a well to confine the acid. This allowed them to miniaturise the electrode array to the nanoscale (650nm electrodes).
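The idea that electrode state decides which strands grow can be sketched with a toy simulation. This is a simplified model, not the researchers' control software: it assumes one base is offered to the whole array per cycle in a fixed repeating order, that activation always succeeds, and that there is no acid leakage between neighbours. The function name `synthesize` and the default `base_order` are hypothetical.

```python
def synthesize(targets, base_order="ACGT"):
    """Grow one strand per electrode by activating only the electrodes
    whose next required base matches the base offered in this cycle."""
    for target in targets:
        if any(base not in base_order for base in target):
            raise ValueError(f"target {target!r} uses a base outside {base_order!r}")

    strands = [""] * len(targets)
    schedule = []  # (offered base, per-electrode activation flags) for each cycle
    step = 0
    while strands != list(targets):
        base = base_order[step % len(base_order)]
        # An electrode is activated only if its strand is unfinished
        # and the next base it needs is the one being offered.
        active = tuple(
            len(s) < len(t) and t[len(s)] == base
            for s, t in zip(strands, targets)
        )
        strands = [s + base if on else s for s, on in zip(strands, active)]
        schedule.append((base, active))
        step += 1
    return strands, schedule


strands, schedule = synthesize(["AC", "GT", "CA"])
for base, active in schedule:
    print(base, active)
```

Each printed row shows which electrodes were switched on while a given base was offered; in this model, a leak of acid to a neighbouring electrode would correspond to a flag being set where it should not be, which is the error the wells are meant to prevent.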

Having demonstrated well-controlled oligonucleotide synthesis on the electrode array, next they evaluated the maximum length of DNA that could be synthesised reliably using the nanoarray. They settled on a 100-nucleotide-long DNA sequence as a good length for their demonstration.

Finally, they demonstrated that the quality of DNA synthesised using the nanoarray was sufficient for DNA data storage by encoding a 40-byte message ('Empowering every person to store more!') in DNA synthesised on a single nanoarray. They were able to sequence and decode the whole message with no bit errors. The study has been published in the journal Science Advances.

While in this case electrode density was limited by the process node used to produce the array, the researchers expect that the technology could scale to billions of features per square cm, enabling synthesis write throughput to reach megabyte-per-second levels – approximately 1,000 times higher than previously demonstrated – and making DNA storage competitive with other forms of storage.

An additional benefit is that, since synthesis happens in parallel, there is the potential to reduce the cost per DNA sequence significantly.

Writing in a blog post, two of the Microsoft researchers said: “More broadly, this work demonstrates control over the electronic-to-molecular interface, which we posit opens a door to new applications. For example, electrochemical control methods enable spatial control of enzymes at the nanoscale. Beyond DNA, this could also be a tool for drug discovery, by enabling rapid combinatorial organic synthesis as a platform for screening drug-protein binding kinetics. Other examples are a tool for assays that detect disease biomarkers or even a platform for sensing environmental pollutants.”
