DNA sequencing companies gear up for the sub-$1,000 genome.
Lee Hood of the Institute for Systems Biology in Seattle, Washington, is in little doubt about how much information he wants to capture in order to diagnose diseases and provide more effective treatments. "I like to go deep," he says.
He wants as much as possible because clues to the sources and treatments of disease in an individual will come from many different parts of the genome, as well as from the proteins and other substances that each organ of the body produces.
The problem that faces biologists is that no-one is quite sure yet which parts of the genome will affect treatment. The workhorses of the body are proteins. They are long chains of amino acids that fold up in different ways to expose tiny chemically reactive areas. The active ingredient of the red blood cell, haemoglobin, is a protein that lets four molecules of oxygen bind to it on the way out of the lungs. Once it has carried the oxygen to a muscle to be used, it takes molecules of carbon dioxide back to be exhaled.
Body functions are almost all handled in some way by at least one protein, and genes provide the basic blueprints for those proteins. Cells are made out of a combination of complex molecules from basic chemicals and it falls to proteins to build most of those molecules. Before the human genome was sequenced, what biologists knew was that genes tell some of these builder proteins what to make. By unlocking the secrets of the genome, biologists could surely work out how the body works.
Life is nowhere near that simple, it seems. Scientists found there was a problem with the genome that had been sequenced: there were fewer genes in the human DNA sequence than expected.
Estimates had put the figure at 100,000 at the start of the project in 1990. By 2004, researchers concluded there are fewer than 25,000. That is roughly a tenth of the number of types of protein in each person's body. Projects to sequence other organisms turned up similar results. What is even worse is that the difference between the human genome and those of other species, such as chimpanzees, is pretty small.
Biology at work
As they tried to work out what the genome project revealed about the body, biologists began to understand that more complex processes needed to be invoked to determine how each of the active genes produces the proteins that appear in each different type of cell that makes up the human body.
Those focusing on the genetic effects on disease looked closely at tiny variations in the genome, where a single part of the code - a nucleic-acid base - was swapped with another. These single nucleotide polymorphisms (SNPs), generally nicknamed 'snips', looked promising for a while but, even there, scientists are finding it hard to explain the variation in humans using SNPs alone. The difference in SNPs between individuals' genomes is less than 1 per cent.
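As a rough illustration of what comparing SNPs involves, the sketch below counts the positions at which two sequences differ by a single base. It assumes the two sequences are already aligned and of equal length; the function names and the 20-base sequences are invented for the example and are not real genomic data.

```python
from __future__ import annotations


def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return 0-based positions where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Share of positions that differ, as a percentage."""
    return 100.0 * len(snp_positions(seq_a, seq_b)) / len(seq_a)


# Toy, made-up sequences: one base (position 10) is swapped from G to T.
person_1 = "ACGTACGTACGTACGTACGT"
person_2 = "ACGTACGTACTTACGTACGT"

print(snp_positions(person_1, person_2))       # [10]
print(percent_difference(person_1, person_2))  # 5.0
```

A comparison like this only sees base swaps. It says nothing about genes that have moved or been duplicated.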
Michael Snyder of Yale University told delegates at the recent International Conference on Systems Biology in Gothenburg, Sweden that there may be another, larger source of genetic variation between people. These are structural changes: the genes are all there but they turn up in different places in different people. And there may be different numbers of copies of the same gene, some of which are silent and others that have an effect on how the body develops.
"Between two individuals, the difference may only be a few per cent, but that is still more than what has been detected by SNPs," says Snyder.
If the work by Snyder and others turns out to be correct, it will have ramifications for the way physicians perform genetic tests on patients. If variation were just a matter of looking for important SNPs, it would be a relatively simple matter to make chemical probes to look for them. If structural variation is important, then nothing short of full sequencing will do. Doctors will not be able to decide which drugs will have an effect unless the computers they use to help them know exactly where all the genes are. On the basis that medicine will need intensive knowledge of patients' genomes, companies are racing to develop much more efficient, and cheaper, ways of building those complete sequences.
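To see why structural variation needs the whole sequence, consider a toy sketch in which two people carry an identical gene, but at different positions and in different numbers of copies. The gene string, the two "genomes" and the helper names are all hypothetical. Real structural-variant detection is far more involved than an exact string search.

```python
def count_copies(genome: str, gene: str) -> int:
    """Count non-overlapping exact copies of a gene in a genome string."""
    return genome.count(gene)


def first_position(genome: str, gene: str) -> int:
    """Return the 0-based position of the first copy, or -1 if absent."""
    return genome.find(gene)


# Made-up example: the same gene, placed differently in two people.
gene = "GATTACA"
genome_a = "CCCC" + gene + "TTTT" + gene + "GG"  # two copies
genome_b = "CC" + gene + "TTTTCCCCGG"            # one copy, shifted

for name, genome in (("A", genome_a), ("B", genome_b)):
    print(name, count_copies(genome, gene), first_position(genome, gene))
```

A probe that only checks whether the gene is present would report the same answer for both people. Only the full sequence reveals where the copies sit and how many there are.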
The problem with gene sequencing today is that it is incredibly expensive and labour intensive. Although Celera Genomics, founded by J Craig Venter, found a way to use supercomputers to speed up the process, it requires a lot of human intervention to generate just one genome sequence. When 454 Life Sciences sequenced the genome of James Watson, who helped uncover the basic structure of DNA in the 1950s, it used a faster procedure than that used for the Human Genome Project. But it still cost millions.
Companies are using automation to speed up the process of sequencing, using robots to prepare samples for processing and deliver fragments of DNA to massive arrays of chemical sequencing machines.
Automation is gradually bringing the cost of sequencing down. However, current technology is unlikely to get anywhere near the target of sub-$1,000 sequencing.
The technology needs to shift to methods that do not involve lengthy preparation steps which demand that huge numbers of copies of DNA fragments be generated in order to get enough data to process. The trend is toward techniques that deal with single molecules of DNA.
In an ideal world, you would take an individual chromosome and feed it to a machine that would snip away at the polymer one base at a time and generate the sequence for you.
Unfortunately, this is the opposite of what happens in today's sequencing technology (see box 'Shreds of evidence'). But companies such as Helicos Biosciences, Oxford Nanopore, Pacific Biosciences (PacBio) and VisiGen are moving towards this ideal. They take advantage of different ways of getting organic chemistry to interface with electronics.
With the exception of Oxford Nanopore, the entrants are taking advantage of the fact that there already exists one system that can detect bases on a DNA strand. It evolved several billion years ago and has barely changed since. Unfortunately, the DNA polymerase enzyme, which generates double-stranded DNA from a single-stranded template, does not give us a direct readout of the bases it encounters as it generates new strands of the macromolecule. However, it is possible to get a picture of what the enzyme does by watching which molecules it attaches to the growing strand.
For years, biologists have been using tagged nucleic acids to analyse how life works. They started off using radioactive tags, often replacing the phosphorus in the backbone of DNA with the radioactive 32P isotope. They later shifted to fluorescent tags: nucleic acids that carry side chains that glow when lit. A camera attached to an optical microscope picks up flashes of light caused by the activation of fluorescence.
If you sit a DNA polymerase in a bath of these fluorescent chemicals together with single-stranded DNA, you can track which nucleic acids get added by recording progress with a camera. Helicos, PacBio and VisiGen are taking this route to building smaller, cheaper gene sequencers.
The resolution of optical microscopes is way too coarse to distinguish the fluorescent flashes from two DNA molecules that are less than a few nanometres apart. DNA polymerase itself is a big molecule but still less than 20nm across. Each company therefore needs a way to keep the molecules it is watching apart.
Helicos separates the strands by populating a slide with short strands of DNA that contain just a sequence of thymine, or T, nucleotides, and a fluorescent side chain.
The DNA segments fed to the machine are each attached to a chain of adenine (A) bases, which attach themselves to the T strands. A computer works out where they are by looking for the array of glowing strands captured by a CCD camera, providing a template for the actual reading process. This involves flowing a suspension of polymerase and fluorescently tagged molecules of a single base over the slide. If those bases match the next one in line on a strand of DNA, the polymerase will add it.
Everything is then flushed away and the fluorescent tags snipped off by another enzyme. This is repeated three more times with the other bases until all of the chains of DNA have had a base matched. The machine keeps flushing, adding chemicals and taking pictures until the reactions have stopped, indicating the end of the chain.
The computer should then have sequences for each of the strands that can be seen by the camera.
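The flush-and-flow cycle described above can be sketched as a simple simulation. This is a toy model under stated assumptions, not Helicos's actual software: it assumes each flow adds at most one base per strand, ignores strand direction and read errors, and uses an arbitrary flow order. The names `FLOW_ORDER`, `sequence_by_synthesis` and `recover_template` are invented for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
FLOW_ORDER = "CTAG"  # hypothetical order in which tagged bases are flowed


def sequence_by_synthesis(templates, flow_order=FLOW_ORDER):
    """Simulate cyclic flows over strands fixed to a slide.

    Returns, for each strand, the tagged bases the camera saw being added.
    """
    positions = [0] * len(templates)  # next unread base on each strand
    reads = [""] * len(templates)
    while True:
        any_flash = False
        for base in flow_order:
            # Flow one type of tagged base over the whole slide.
            for i, template in enumerate(templates):
                pos = positions[i]
                if pos < len(template) and COMPLEMENT[template[pos]] == base:
                    reads[i] += base   # camera records a flash at strand i
                    positions[i] += 1
                    any_flash = True
            # Here the real machine flushes and snips off the tags.
        if not any_flash:
            break  # a full cycle with no flashes: every chain has ended
    return reads


def recover_template(read):
    """The added bases are complementary to the original strand."""
    return "".join(COMPLEMENT[b] for b in read)


strands = ["ACGT", "TTGA"]  # made-up fragments on the slide
reads = sequence_by_synthesis(strands)
print(reads)
print([recover_template(r) for r in reads])
```

Because every strand on the slide is interrogated in the same flow, the number of strands read in parallel is limited by what the camera can resolve rather than by the chemistry.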
PacBio captures the molecules in so-called zero-mode waveguides etched into the surface of a slide. These waveguides are shaped to confine light to tiny areas, giving the optical microscope a helping hand. Instead of putting DNA into the holes that form the centre of the waveguide, PacBio attaches a mutated form of polymerase that is able to add fluorescently tagged bases so that the 'old' tag is snipped off as each new one is added.
Although these techniques show promise, they still demand a source of expensive chemically tagged nucleic acids. This is where the nanopore technology comes in. "Since the dawn of sequencing there have been fluorescent tags. But what we are doing is label free," says Gordon Sanghera, CEO of Oxford Nanopore.
The idea behind the nanopore is that you can sense the nucleic acids directly if you can get them to pass through a small enough hole. The process works by passing a current of sodium and chloride ions through the hole. When a nucleic acid base is threaded through, it blocks the channel temporarily, but each base has a slightly different effect that is large enough to be picked up by a current sensor. There is no need to add tagged bases or deal with enzymes: you just thread DNA strands through the holes. That is the principle, at least.
Right now, making holes of the right size is not easy and there is a further problem. Once a strand of DNA finds the hole and starts moving through, it does so at high speed. "They run through at a million bases per second," says Sanghera, "too fast to be detectable."
So, the company has worked on a way of slowing the process down until it can develop a detection technique that is able to cope with high-speed translocation. To get the hole to the right size, the company floods the surface of the nanopore slide with fatty acids. This forms a soft membrane layer into which a cluster of proteins can sink. They cluster around the holes, providing a snug fit for the DNA strands to flow through.
Because large strands of DNA would slip through too quickly, the company slows the process down by not allowing entire strands through. The protein has been altered so that it snips one base off the strand at a time, which then sinks through to the detector. Until it runs out of DNA, the enzyme keeps eating.
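If bases arrive at the detector one at a time, reading the sequence comes down to matching each dip in current to the base that most likely caused it. The sketch below shows that idea in its simplest form. The current levels are invented purely for illustration and are not measured values, and the function names are hypothetical. It also assumes one clean reading per base, which a real detector would not provide.

```python
# Hypothetical blockade currents in picoamps, one per base.
# These numbers are made up for the example.
LEVELS_PA = {"A": 50.0, "C": 40.0, "G": 30.0, "T": 20.0}


def call_base(current_pa: float) -> str:
    """Pick the base whose reference level is closest to the reading."""
    return min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - current_pa))


def call_sequence(readings) -> str:
    """Turn a series of per-base current readings into a sequence."""
    return "".join(call_base(r) for r in readings)


# Noisy, made-up readings for four bases passing the sensor in turn.
readings = [49.2, 21.1, 38.7, 31.5]
print(call_sequence(readings))  # ATCG
```

The closer together the four levels sit, the less noise the sensor can tolerate before bases are miscalled.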
Oxford Nanopore is working on a demonstration of a prototype system and has signed up with several universities in the US as well as Oxford to try to become the centre of DNA sequencing using nanopores.
The company is likely to follow those backing fluorescent sequencing into the market. Helicos is already selling its system and PacBio is aiming for a 2010 launch.
Nanopore technology is at a less well-developed stage but, if it works, should be cheaper than the other techniques and fit into smaller hardware, although it will still need some pre-treatment of the DNA. Single-molecule sequencing remains some time into the future. But everyone has their eyes on the same prize: a mass market for DNA sequencing that may not need to work at the single-molecule level.