AGBT: Speculating on Third Gen Tech

So, AGBT is over. I’ve reported on the existing tech in my previous post; one thing that I haven’t covered so far is 3rd Generation sequencing. Time to rectify this.

We have three major players, two of which had a strong presence at AGBT. Pacific Biosciences had a major launch (covered extensively elsewhere), and Life Technologies gave a surprisingly awesome presentation on their new Quantum Dot sequencing technology, QDot. Left out was Oxford Nanopore, the other major player in the 3rd Gen sphere; they did not present anything at AGBT, and I hope they all know that I am very angry about this.

We now have some information about these technologies; we know, in broad terms, how they work, and we can make some guesses about how they’ll compare. Based on the extremely limited amount of data we have at the moment, and a few speculative computer simulations (the R code for which can be read here), I’m going to draw some overall conclusions about how each tech will perform in terms of read length, yield, and accuracy.

To get a lot of this data, I’ve made “educated guesses” at the parameters (mostly based on what we know from the PacBio machine). This is an ‘all things being equal’ analysis; I assume that PacBio, QDot and Nanopore all have the same read density and the same enzyme efficiency; i.e. that the probability of the QDot polymerase dying is the same as for the PacBio polymerase, and that both are the same as the probability of the DNA strand falling off the nanopore. If any of the companies feel like providing me with their molecule densities and decay parameters for their enzymes (ha!), I’ll happily fix the plots.

I really must repeat; all of these graphs or figures are guesses. I have no actual data beyond what has publicly been announced by the companies. I fully expect much of this to be proved wrong over the next year: this is just my guess at what the machines may look like. I should especially emphasise that I have not used any information sourced via my employer about any of these technologies.

Sequencing Method

Very quickly; all the sequencing methods are based on reading single molecules in some way. PacBio uses a DNA polymerase enzyme anchored to the bottom of a special well that focuses light onto it (the Zero-Mode Wave Guide), allowing the detection of flashes of light as the polymerase makes new DNA.

The QDot system uses a special polymerase with a quantum dot attached; the quantum dot emits light, and passes it on to base-specific dyes; both the dye colour and the quantum dot signal are detected by a laser. Because the quantum dot polymerase floats freely, it can be added directly to any DNA, meaning that new enzyme can be added into the mix to restart the reaction.

The Nanopore system uses pores embedded in membranes, surrounded by exonuclease enzymes, which eat up DNA and then spit it into the pore. DNA is sequenced by measuring the change in current across the membrane when the base pairs go through the pore; different base pairs have different conductance, and thus different signals.

In all cases, as in second generation sequencing, the secret to getting much use out of these methods is to make them highly parallel; each machine will contain thousands or millions of wells, quantum dots or nanopores, each of which can produce sequence. Molecule density will do the most to determine which system has the highest throughput (incidentally, exactly the thing that I assumed to be equal. Sigh).

Read Lengths

Both PacBio and Nanopore have the same limiting factor in their read length distribution; an exponential loss in molecules (through enzymes dying or DNA falling off). While both machines can generate long read lengths, it seems likely that the majority of them will be relatively short. Note, however, that it is possible for an exponential distribution to be nearly flat if the enzyme is super-magical (we know that PacBio isn’t achieving this, and I’d be surprised if Nanopore is either, but we can dream):
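To make the exponential-loss argument concrete, here is a minimal Python sketch (not the R code behind the original plots). It assumes a constant per-base chance of the read ending, so read lengths are exponentially distributed; the mean of 1000 bp is a made-up illustrative parameter, not a figure from either company.

```python
import math
import random

def simulate_read_lengths(mean_length_bp, n_molecules, seed=0):
    """Draw read lengths assuming a constant per-base chance of the read ending."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean), not the mean itself.
    return [rng.expovariate(1.0 / mean_length_bp) for _ in range(n_molecules)]

def fraction_longer_than(lengths, threshold_bp):
    """Fraction of simulated reads that exceed threshold_bp."""
    return sum(1 for x in lengths if x > threshold_bp) / len(lengths)

MEAN_BP = 1000  # hypothetical mean read length, for illustration only
reads = simulate_read_lengths(MEAN_BP, n_molecules=100_000)
median = sorted(reads)[len(reads) // 2]

print(f"median read length: {median:.0f} bp "
      f"(theory: mean * ln 2 = {MEAN_BP * math.log(2):.0f} bp)")
print(f"fraction of reads over 5 kb: {fraction_longer_than(reads, 5000):.4f}")
```

The median sits well below the mean, and only a small tail of reads gets past a few multiples of the mean; a "super-magical" enzyme corresponds to making the mean very large, which flattens the curve over the range you care about.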

The QDot system can extend the expected read length indefinitely; if you can get a 100kb read to sit in your system, you will be able to read it eventually, if you run the machine enough times. This isn’t actually that crazy an idea; apparently, you can get DNA molecules to lie flat, allowing very long molecules to get sequenced (you can track the polymerase as it moves along using the quantum dot). The read length distribution below assumes that you restart the system every 1kb. It looks pretty awesome (we are only interested in the single pass distribution right now):
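The restart idea can be sketched as a small simulation. This is only one possible reading of "restart every 1kb": each run, a fresh polymerase picks up where the last one stopped and advances until it either dies (exponentially distributed survival) or the run is halted after 1kb. The mean survival of 1000 bp and the 100kb template are illustrative assumptions, not QDot specifications.

```python
import random

def qdot_read_length(n_runs, mean_survival_bp, restart_every_bp, template_bp, rng):
    """Total bases read from one template after n_runs enzyme restarts."""
    position = 0.0
    for _ in range(n_runs):
        # Distance this polymerase would travel before dying (exponential decay).
        survived = rng.expovariate(1.0 / mean_survival_bp)
        # The run is stopped and fresh enzyme added after restart_every_bp bases.
        position += min(survived, restart_every_bp)
        if position >= template_bp:
            return float(template_bp)  # reached the end of the molecule
    return position

def mean_read_length(n_runs, n_molecules=20_000, seed=1,
                     mean_survival_bp=1000, restart_every_bp=1000,
                     template_bp=100_000):
    """Average read length over many templates (all defaults are hypothetical)."""
    rng = random.Random(seed)
    total = sum(
        qdot_read_length(n_runs, mean_survival_bp, restart_every_bp, template_bp, rng)
        for _ in range(n_molecules)
    )
    return total / n_molecules

for runs in (1, 5, 10, 25):
    print(f"{runs:>2} runs: mean read length {mean_read_length(runs):,.0f} bp")
```

Under these assumptions the mean read length grows roughly linearly with the number of restarts, which is the sense in which the read length is limited by how often you are willing to run the machine and not by the enzyme.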

Yields

In terms of yield, Nanopore has the most efficient system; exopores don’t really die (as anyone who has used an RNase-free room knows), and once the DNA falls off another strand can attach, so the yield doesn’t fall off with time. You can probably keep the machine running for as long as you want (there will probably be something that wears out eventually, but we don’t know what yet). Nanopore gets the most bang-per-pore, versus the other two.

The yield of PacBio has to be traded off against read length; you can run short (15 minute) runs to produce reads of 1kb or less, but without many of the enzymes dying, or you run for longer, losing a lot of efficiency as the laser starts to fry the polymerase, but produce the very long reads (5kb or more) that everyone is itching for. And remember that efficiency decays exponentially with maximum read length; this is a painful trade off, and is made worse by the cost of constantly chewing through chips.
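A rough way to see this trade-off, assuming the same exponential enzyme decay as before: the fraction of polymerases still alive at a target read length falls off exponentially, while the expected number of bases each well delivers flattens out. The mean survival of 2000 bp below is a hypothetical value chosen for illustration, not a PacBio specification.

```python
import math

def surviving_fraction(target_bp, mean_survival_bp):
    """Fraction of polymerases still alive after synthesising target_bp bases."""
    return math.exp(-target_bp / mean_survival_bp)

def expected_bases_per_well(target_bp, mean_survival_bp):
    """Expected bases read per well when the run is stopped at target_bp."""
    return mean_survival_bp * (1.0 - math.exp(-target_bp / mean_survival_bp))

MEAN_SURVIVAL_BP = 2000  # hypothetical decay parameter

for target in (1000, 2000, 5000, 10000):
    alive = surviving_fraction(target, MEAN_SURVIVAL_BP)
    bases = expected_bases_per_well(target, MEAN_SURVIVAL_BP)
    print(f"target {target:>5} bp: {alive:6.1%} of wells reach it, "
          f"{bases:6.0f} bases per well on average")
```

Pushing the target from 1kb to 5kb costs most of the full-length reads, and the extra run time buys progressively fewer additional bases per well.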

QDot has a similar problem to PacBio; the enzyme is being lasered, and so is liable to die. However, the restart system means that you can just replace the enzyme, making the trade-off purely one of cost; more restarts equals greater yield AND greater read lengths, but a higher enzyme cost.

Multipass Accuracy

Both PacBio and QDot have ‘multipass’ methods, which sequence the same molecule multiple times, in order to increase accuracy. This is good, because both have the potential to have low accuracy; last time PacBio published any data they had a problem with insertions and deletions, and from the presentation I got an inkling that QDot may have a relatively high substitution rate (this is just a gut feeling from looking at their traces).

PacBio connects the two strands of DNA molecules together to make ‘SMRTbells’, which the enzyme runs around until it dies, sequencing both the forward and backward strand over and over. This limits the size of the reads to a prespecified length, and the yield drops with increased read length; the plot below shows the percentage of reads that reach a certain number of passes for various SMRTbell read lengths:
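The relationship between insert length and number of passes can be approximated with the same exponential model. This sketch assumes that completing n passes requires the polymerase to survive n times the insert length, and it ignores the adapter sequence at each end of the SMRTbell; the 2000 bp mean survival is again a hypothetical parameter.

```python
import math

def fraction_reaching_passes(insert_bp, n_passes, mean_survival_bp):
    """Fraction of molecules whose polymerase completes at least n_passes passes."""
    # n_passes traversals of an insert_bp strand need n_passes * insert_bp bases.
    return math.exp(-n_passes * insert_bp / mean_survival_bp)

MEAN_SURVIVAL_BP = 2000  # hypothetical decay parameter

for insert_bp in (250, 500, 1000):
    row = ", ".join(
        f"{n} passes: {fraction_reaching_passes(insert_bp, n, MEAN_SURVIVAL_BP):.1%}"
        for n in (1, 2, 5, 10)
    )
    print(f"{insert_bp:>4} bp insert -> {row}")
```

Doubling the insert length squares the survival fraction at any given pass count, which is why multipass yield drops so quickly as the SMRTbell gets longer.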

QDot, once again, looks to be the master of this particular game. You can remove the synthesised strand, and resequence the original template again as many times as you want. Not every base of every molecule is read multiple times (the enzyme may die before it hits the top), but many of them are; the red curve on the QDot plot above shows the distribution of read lengths that are sequenced 5 times after 5 runs; about 80% of bases are sequenced 5 times. However, you’d have to run the machine 25 times to get this distribution, which may be pretty costly (depending on, you know, the cost).

I don’t know what the accuracy, or error modes, of Nanopore will be. They can’t have any sort of multi-pass system, because they physically eat their DNA. However, given Nanopore’s previous data, I expect the single-pass accuracy will be higher than for the other two systems. Whether it is high enough to compensate for the lack of multi-pass sequencing remains to be seen.

Conclusions

In terms of replacing Second Generation machines for massive throughput, the constantly-running exopores and lack of an optics system make Nanopore the machine I reckon is most likely to sit by the dozen in sequencing centers (but this relies on Nanopore getting a good pore density in their machines, which remains to be seen). The wonderful flexibility of the QDot system will be good when you really require very high accuracy and long read lengths, and don’t care about the cost or runtime, and for clever applications like in-situ sequencing (stick quantum-dot fitted polymerase into a cell, and see what it does!). PacBio will have advantages for rapid turnaround of high-quality short-read data via the SMRTbell, and the pretty huge advantage of actually being ready to use ASAP.

Of course, all these conclusions are based on the molecule density and the enzyme efficiency being equal across all methods, which they quite clearly are not. We’ll probably have to wait months for accuracy and read-length distribution stats for the PacBio machine, and probably a year or more before we hear similar things about QDot (out on pre-release in Q4 of this year) and Nanopore (date unknown). However, as someone recently pointed out to me, Illumina is marketing the Nanopore machine, and given how secret they managed to keep the HiSeq, we may well not know anything about Nanopore machines until they appear ready for sale out of the blue.

This entry was posted on Tuesday, March 2nd, 2010 at 7:12 pm and is filed under Uncategorized.

9 Responses to “AGBT: Speculating on Third Gen Tech”

Clive G. Brown says:

March 3, 2010 at 6:29 pm

Hi,

The half life of an engineered exonuclease can be very long. As there are no lasers or dyes in the system, there is no photodamage of enzymes or DNA - a major limitation of the fluorescence based systems. Nanopores can be flushed away and replaced with fresh ones during a run.

c.
Luke says:

March 3, 2010 at 6:45 pm

@Clive

Thanks for the info. I sort of assumed that given how long-lived exonucleases are in the wild, they would live pretty much forever (or at least several days) inside your machine. That you can replace the nanopores mid-run is awesome, though.

The decay for your machine that I’m talking about in this post is not decay of the pore or the exonuclease, but the read length decay caused by the dissociation of the DNA from the exonuclease, which I would guess is the major limiting factor in generating long reads (and the decay parameter for this being the driver of the median read length). Or am I barking up the wrong tree here?
Clive G. Brown says:

March 3, 2010 at 7:09 pm

Hi Luke -

I couldn’t resist comment - but we are not currently discussing technology publicly beyond what is published or available on our website. I know that is frustrating but there are good commercial reasons for it. We are working very hard to ensure it is worth the wait and the results are very competitive.

Nature provides.

c.
Luke says:

March 3, 2010 at 8:21 pm

@clive

No problem; I wasn’t trying to push you into saying things that you weren’t ready to say, and I understand why you have to keep quiet. Thanks for stopping by, and hopefully hear something data-wise from you soon!
Stanley Han says:

March 5, 2010 at 4:06 am

A dsDNA or ssDNA binding domain may be attached to the exonuclease to reduce the dissociation rate and thereby increase read length.
james@cancer says:

March 5, 2010 at 3:14 pm

Are Illumina ‘marketing’ Oxford Nanopore now? I knew they had some investment but not that this was so far under their wing?
Luke says:

March 5, 2010 at 3:19 pm

Yeah, Illumina and Nanopore entered into an exclusive contract to market, sell and distribute Nanopore machines early last year:

http://investor.illumina.com/phoenix.zhtml?c=121127&p=irol-newsArticle&ID=1242758
Veerle says:

March 9, 2010 at 10:34 am

Awesome blog, thanks a million.
Great Content from the Advances in Genome Biology and Technology (AGBT) Conference | Persistent Change says:

March 23, 2010 at 3:34 am

[...] After submitting this post, Luke Jostins informative analysis of the presented 3rd generation sequencers. This includes products from Life Technologies, Pacific Biosciences, and Oxford [...]

Genetic Inference