Basics: Sequencing DNA, Part 1

For an embarrassingly long time, I had very little idea how we read people’s DNA. We deal with DNA sequence so often, and use it for such a plethora of things, that if I thought about it at all, my thoughts would have been something along the lines of “er, well, you just, you know, sequence it, right? Run it out on a gel, or, er, something”. I remember years ago admitting this ignorance to a friend, who said “Oh, they have machines that do it”; this response is both reassuring and terrifying. Anyway, I finally rectified my ignorance (about the time I read the book Genomes 3, which filled in a lot of the blanks on the molecular side of my subject); it is actually a pretty fascinating topic, and also a pretty important one, since progress in sequencing technology drives progress in much of genetics generally. So, I thought I’d dedicate a series of posts to sequencing.

While it seems so simple, sequencing DNA is a pretty major challenge. If you ever hold DNA in your hands, it basically takes the form of a long-chained acid dissolved in water. If you want to know something about it you can dye it and run it out on a gel, to see how large the molecules are (large molecules run more slowly through a gel, so you can tell how big the molecule is by how far it moves), but doing much beyond that requires quite a bit of thinking.

Basic Basics: DNA Structure

As I’m sure you’re aware, a DNA strand is made up of units called nucleotides: each one is a base attached to a piece of backbone made of phosphate and sugar (the free building blocks that DNA is assembled from are called deoxyribonucleoside triphosphates, or dNTPs). Generally, two of these DNA strands will be coiled together. Crudely, double-stranded DNA looks something like this:

[Figure: dna_labeled]

The white shapes represent the nucleotide bases, which come in four flavors: adenine (A), cytosine (C), guanine (G) and thymine (T), all of which differ slightly in their molecular structure. The bases come in pairs that tend to stick to each other, A with T and C with G; it is this stickiness that holds the two DNA strands to each other. Notice that this means that the two strands have to have matching sequences (each is called the Reverse Complement of the other), so that each nucleotide on one strand lines up with its pairing partner on the other strand. It is this relationship that is the key to replication of DNA; given one strand, you can reconstruct the other. It is also the key to Sanger Sequencing, the first form of sequencing, as we will see later.
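To make the "given one strand, you can reconstruct the other" idea concrete, here is a minimal Python sketch. The function name and the example sequence are made up for illustration; the pairing table comes straight from the A-with-T, C-with-G rule above. The reversal is there because the two strands of a double helix run in opposite directions, so the partner strand reads backwards relative to the first.

```python
# Pairing rules from the text: A sticks to T, C sticks to G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the partner strand, read in its own direction."""
    # Walk the strand backwards and swap each base for its pairing partner.
    return "".join(PAIRING[base] for base in reversed(strand))


print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice gives you back the strand you started with, which is just another way of saying that each strand carries all the information needed to rebuild the other.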

The PCR Reaction

To understand Sanger Sequencing (we’ll get there soon, I promise), you first have to understand the PCR reaction. The Polymerase Chain Reaction is an artificial version of natural DNA replication, which uses an enzyme called DNA Polymerase to rapidly amplify DNA. You start with some standard double-stranded DNA:

[Figure: dna_whole]

You heat it up, so that it breaks into two strands:

[Figure: dna_split]

To get the DNA polymerase started, you use what are called primers: short pieces of synthetic DNA that stick to the start of your DNA (or, more generally, the bit of your DNA that you want to start amplifying up):

[Figure: dna_primer]

The polymerase enzyme recognizes these primers once they have stuck onto the DNA, and starts filling in the rest of the sequence after the primer:

The enzyme (represented by a blue blob in the animation above) grabs free dNTPs from the solution around it, and incorporates them into the DNA. It knows which dNTP to use, as only one nucleotide, the pairing partner for the nucleotide on the other strand, will stick at that location. For a flashier animation of PCR see this video, and for an explanation of what happens in the natural version see this video (YouTube is awash with pretty impressive science videos if you ever get bored).
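The primer-then-fill-in step can be sketched as a toy Python function. This is only an illustration under simplifying assumptions, not a model of real PCR chemistry: both strands are written left to right so that position i on the new strand pairs with position i on the template (real strands run in opposite directions), the primer has to match its site perfectly, and the names `complement`, `extend_primer` and the example sequences are all hypothetical.

```python
# Pairing rules from the text: A sticks to T, C sticks to G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq):
    """Base-by-base pairing partner of a sequence (no reversal, see lead-in)."""
    return "".join(PAIRING[base] for base in seq)


def extend_primer(template, primer):
    """Toy polymerase: find where the primer sticks, then fill in the rest."""
    # The primer sticks wherever it pairs perfectly with the template.
    site = complement(template).find(primer)
    if site == -1:
        raise ValueError("primer does not stick anywhere on the template")
    new_strand = primer
    # After the primer, add the pairing partner of each template base in turn.
    for base in template[site + len(primer):]:
        new_strand += PAIRING[base]
    return new_strand


print(extend_primer("TACGGATTCAGC", "ATGC"))  # ATGCCTAAGTCG
```

Note that the primer decides where copying starts: a primer that sticks further along the template gives a shorter new strand, which is the "bit of your DNA that you want to start amplifying" idea from above.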

We use the PCR reaction an awful lot in molecular biology (I say ‘we’: I spent around 3 months last year trying to get a fiddly double-PCR reaction to work, eventually gave up, and now intend to never go near the thing again). However, there is one little trick that you can do with PCR that leads directly to a way of reading the actual sequence of nucleotides off the DNA.

Sanger Sequencing

There exist broken forms of dNTPs, called dideoxynucleotides (ddNTPs). These are perfectly functional dNTPs, except for the fact that they are missing a little bit of the backbone: in particular, the bit (a hydroxyl group on the sugar) that is required for the next dNTP to attach. Thus, if we have some ddNTPs in solution alongside the normal dNTPs when we run the PCR reaction, this happens:

When the DNA polymerase adds a ddNTP (shown in red), it cannot progress any further, and so stops making DNA. Now, this may sound pretty useless, but think about it for a moment: say the only ddNTPs we include are G ddNTPs, mixed in with the normal dNTPs. When the reaction has run for a long time, we will have a load of fragments of the original DNA, all of which end in G. Now, we can run these out on a gel, as I described above, and figure out what lengths of molecule we have. Since the polymerase starts at the same place each time, we can use the lengths to figure out which positions are G (if we have fragments 10 nucleotides long, the 10th nucleotide must be a G, and so on). If we repeat this for the other 3 bases, we can figure out the exact nucleotide sequence!
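The fragment-length trick is easy to check with a small Python sketch. It rests on a simplifying assumption: with enough copies in the tube, chain termination happens at every occurrence of the chosen base, so every possible fragment length shows up on the gel. The function names and the example strand are made up for illustration.

```python
def fragment_lengths(new_strand, dd_base):
    """Fragment lengths seen on the gel when only dd_base ddNTPs are added."""
    # Each occurrence of dd_base is a place where some copy got terminated.
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


def read_sequence(lengths_by_base):
    """Rebuild the sequence from the four gels (one per ddNTP)."""
    position_to_base = {}
    for base, lengths in lengths_by_base.items():
        for length in lengths:
            # A fragment of length n ending in this base means position n is that base.
            position_to_base[length] = base
    return "".join(position_to_base[pos] for pos in sorted(position_to_base))


strand = "ATGCCTAAGTCG"
gels = {base: fragment_lengths(strand, base) for base in "ACGT"}
print(gels["G"])            # [3, 9, 12]
print(read_sequence(gels))  # ATGCCTAAGTCG
```

The G reaction on its own only tells you that positions 3, 9 and 12 are G; it takes all four reactions together to fill in every position.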

This was how the method was first done when Fred Sanger developed it back in the ’70s. However, improvements have been made to the method since. The state of the art in Sanger sequencing is 4-channel capillary sequencing, developed in the ’90s: each ddNTP is labeled with a different colour dye, so after the PCR reaction, instead of just having DNA fragments ending in G, you have red DNA fragments ending in G, blue DNA fragments ending in T and so on. Now, you run the DNA down a capillary tube; the smaller fragments move faster, the larger fragments move slower, and you can read off the colour of the DNA fragments as they come out one after another to figure out the sequence.
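Here is a rough Python sketch of the 4-channel idea: one tube, four dyes, and the read-out is just the order of colours as fragments leave the capillary. Red for G and blue for T follow the example above; the green and yellow assignments for A and C are assumptions picked for illustration, as are the function names. As before, the sketch assumes a terminated fragment exists for every position.

```python
import random

# red/G and blue/T are from the text; green/A and yellow/C are assumed here.
DYE_TO_BASE = {"red": "G", "blue": "T", "green": "A", "yellow": "C"}
BASE_TO_DYE = {base: dye for dye, base in DYE_TO_BASE.items()}


def labelled_fragments(new_strand):
    """One (length, dye colour) pair per position where a ddNTP could stop the chain."""
    return [(i + 1, BASE_TO_DYE[base]) for i, base in enumerate(new_strand)]


def read_capillary(fragments):
    """Shorter fragments come out first; read the dye colours in that order."""
    return "".join(DYE_TO_BASE[dye] for _, dye in sorted(fragments))


fragments = labelled_fragments("ATGCCTAAGTCG")
random.shuffle(fragments)  # everything is mixed together in a single tube
print(read_capillary(fragments))  # ATGCCTAAGTCG
```

Because the colour carries the base and the arrival order carries the position, a single reaction replaces the four separate ones from the original method.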

Developed in the ’90s?

So, Sanger Sequencing is Old. The modern capillary machines are somewhat newer, but even those are pushing 10 years old. However, in certain respects Sanger Sequencing is still top-of-the-range. The 4-channel capillary approach lets you sequence DNA pretty fast, and each machine can often do hundreds of reactions at once. It was these machines that first sequenced the human genome. In addition, you can sequence relatively long stretches of DNA: up to 1000 nucleotides (after that, it gets difficult to tell the difference between molecules of different sizes). No machine can sequence DNA fragments of this length faster than Sanger Sequencing (yet…). The modern Sanger Sequencing machines were a revolution: they started so-called High-Throughput sequencing, in which a single lab could sequence millions of base pairs, rather than the thousands that could be done prior to their introduction. It is for this reason that these machines are called the First Generation of Sequencing Technology.

However, these machines are still pretty slow. Having to run an entire PCR reaction for each fragment of DNA means that you can’t get more than a few hundred reactions going at once on a single machine. It would take a single First Gen machine dozens of years to sequence an entire genome, and cost an unfeasibly large amount of money. For the large-scale genome projects that are going on at the moment, like the 1000 Genomes Project, or the Cancer Genome Project, a faster approach is required. In the second part of this series, I will discuss Second Generation sequencing, and how it has changed the way we do genetics.

7 Responses to Basics: Sequencing DNA, Part 1

  1. I love your polymerase animations! Very impressive :) I must admit, I’d never actually looked into how they did sequencing at the Sanger, despite sending a few things off to them to get sequenced.

    Btw, i did send you a facebook message, but I don’t know if you got it: I am now the happy if slightly bemused owner of a T4 bacteriophage. Was it you? It is sitting on the piano threatening my E. coli :D

  2. Great article, and i can’t wait to see part 2, but i can’t find it, is it still understructure or somewhere else? 3xs

  3. Thanks :-)

    I haven’t got around to writing Part II yet, but I have some free time over the next few days so it will be done by the end of this week.

  4. Pingback: Pathogens: Genes and Genomes » Undergraduate teaching materials are neither created nor destroyed

  5. Pingback: Blog

  6. Thanks for this. DNA sequencing can preserve the DNA of significant people throughout history. Blaine Bettinger at The Genetic Genealogist suggested that people like Albert Einstein, and Abraham Lincoln should also have their DNA sequenced as a memento of the past. What do you think of this? http://www.americanbiotechnologist.com/blog/sequencing-sitting-bull/

  7. I must admit that I never understood DNA sequencing so well until I read this very illustrative note of yours. I sincerely appreciate you teaching me, and wish you could email me the promised second generation sequencing post when you finish dealing with it.
