The first day of the Advances in Genome Biology and Technology conference is not until tomorrow, but today there were a couple of pre-sessions. These were on pretty much opposite ends of the spectrum; one was a series of general, high level talks by the users of high-throughput sequencing, and another was a series of technical talks by a manufacturer of sequencing machines.
As usual, this blog post is just a summary of a few aspects that I found interesting. More in depth coverage can be found on my Twitter feed, @lukejostins.
Running a Sequencing Facility
The subject of how to build, run and scale up a sequencing facility may seem, at first glance, a little dry, but I found the two talks by heads of sequencing labs fascinating.
Debbie Nickerson gave a talk about her experience with scaling from a few machines to a major operation. Debbie seems to keep it running smoothly mostly by collecting crazy amounts of data on samples, libraries and runs, and by producing a range of tools to quickly examine that data, locate problems, and keep information flowing between the different parts of the lab. A nice example was a tool to flag up common library failures from sequence data, and automatically e-mail the library prep team to reprep the sample.
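To give a flavour of what that kind of automation looks like, here is a minimal sketch of a flag-and-notify tool. Everything in it is invented for illustration: the QC metrics, the thresholds, and the e-mail address are all hypothetical, not details from the talk.

```python
# Hypothetical sketch: flag common library failures from run metrics
# and e-mail the library-prep team to request a re-prep.
# Metric names, thresholds and addresses are all made up for illustration.
import smtplib
from email.message import EmailMessage

FAILURE_CHECKS = {
    "low yield":        lambda m: m["reads_m"] < 10,        # under 10M reads
    "high duplication": lambda m: m["dup_rate"] > 0.5,      # >50% duplicate reads
    "adapter dimer":    lambda m: m["adapter_frac"] > 0.2,  # >20% adapter sequence
}

def flag_failures(metrics):
    """Return the names of any QC checks this library fails."""
    return [name for name, check in FAILURE_CHECKS.items() if check(metrics)]

def notify_reprep(sample_id, failures, to="libprep@example.org"):
    """E-mail the library-prep team asking for a re-prep of a failed sample."""
    msg = EmailMessage()
    msg["Subject"] = f"Re-prep requested: {sample_id} ({', '.join(failures)})"
    msg["To"] = to
    msg.set_content(f"Sample {sample_id} failed QC checks: {failures}")
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)

# A failing library is flagged on two counts:
# flag_failures({"reads_m": 4, "dup_rate": 0.6, "adapter_frac": 0.05})
# returns ["low yield", "high duplication"]
```

The interesting design decision is keeping the checks in a plain table of named predicates, so the lab can add a new failure mode without touching the notification code.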
Susan Lucas gave an overview of the kind of questions you need to think about when planning a genome center. There are obvious things you need to consider; for example, making sure that you can transfer, store and process data. However, she also talked about some more interesting questions: Can your pipeline incorporate new technologies, or new platforms? Can it handle plant DNA, or E. coli? Have you considered the ergonomics of the space; will repetitive tasks cause repetitive strain injury? Do you have an emergency strategy? What will you do if the sprinklers go off?
Sequencing center logistics is right at that interesting intersection between data management, tool development, statistical inference and decision theory; I love the contrast between the statistically high-flying (“detect signs of quality degradation in sequencing imaging”) and the mundane (“tell Bob to clean the lenses”).
The late afternoon was taken up by a series of talks by Illumina on new developments in their sequencing tech. It was interesting to see, given that Illumina have already broken their big story with the HiSeq 2000; as opposed to any single big announcement, the talks were all about how extra sequencing capacity is squeezed out of the existing technologies (though ‘squeezing’ is probably not the right word for Illumina’s 12X increase in GAIIx sequencing yield over the last year).
Particularly interesting was Sheila Fisher‘s talk on the performance of the Broad Institute’s Illumina pipeline; 60% of their machines now run the 2X100bp protocol, producing over 5Gb per day per machine, while the rest run 2X150bp at a higher cluster density, giving 7Gb/day. At the latter rate, a single machine could produce a 30X genome in a single 2-week run, or all the sequence produced in Pilot 1 of the 1000 Genomes Project in 9 months. Once the HiSeq is brought up to the same cluster density, it will produce 43Gb per day; enough to generate all the Pilot 1 data in 6 weeks.
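Those figures hang together on the back of an envelope. The arithmetic below is mine, not from the talk; the only inputs taken from it are the per-day yields, and I assume a human genome of roughly 3.1Gb.

```python
# Back-of-envelope check of the quoted throughput figures.
# Only the Gb/day yields come from the talk; the rest is my arithmetic.
GENOME_SIZE_GB = 3.1  # approximate human genome size, in Gb (my assumption)

def run_yield_gb(gb_per_day, days):
    """Total sequence (Gb) from one machine over a run."""
    return gb_per_day * days

def fold_coverage(total_gb, genome_gb=GENOME_SIZE_GB):
    """Average fold-coverage of a human genome from that much sequence."""
    return total_gb / genome_gb

# A 2x150bp machine at 7 Gb/day over a 2-week run:
two_week = run_yield_gb(7, 14)          # 98 Gb
print(fold_coverage(two_week))          # ~31.6 -- consistent with a 30X genome

# Pilot 1 volume implied by "9 months at 7 Gb/day" (taking a month as 30 days):
pilot1_gb = run_yield_gb(7, 9 * 30)     # ~1890 Gb
print(pilot1_gb / 43 / 7)               # weeks on a 43 Gb/day HiSeq: ~6.3
```

So a 2-week run at 7Gb/day gives just over 30X, and a 43Gb/day HiSeq would chew through the implied Pilot 1 volume in a little over six weeks, matching the claims above.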
The scale of this sequencing production is staggering; the HiSeq could get to 43Gb a day without any new innovations, and I expect that there is another 2-4X increase in capacity that Illumina could bring in from incremental changes to cluster density and image processing over the next year. As I’ve said before, second generation sequencing still has a lot of room to grow.