The first day of Biology of Genomes has now come and gone. The ambiguities about Twitter and blog reporting have been resolved, and tweeting has started in earnest; you can follow live coverage of most of the talks on the #BG2011 hashtag.
The subject of the evening session was high-throughput genomics. There were a lot of cool talks, but two in particular got me excited, for reasons that I will explain at the end of the post. Both describe very elegant new sequencing-based techniques for studying gene regulation through transcription factor binding.
Digging into transcription factor binding dynamics
Standard ChIP-Seq measures which transcription factor binding sites are occupied over a given period of time; while this is useful for mapping where a factor binds, it fails to capture any of the complex dynamics of transcription factor binding.
Jason Lieb and his crew at the University of North Carolina have designed a high-throughput technique to measure transcription factor turnover rate, an important aspect of transcription factor binding dynamics. This assay, called Competition ChIP, allows them to track two separately labeled copies of the same transcription factor, one of which is inducibly expressed. By measuring the latency between the induction and the change in relative occupancy, they can measure the turnover of the transcription factor at any particular binding site.
Applying this to a test transcription factor highlighted a number of sites that, while looking identical under standard ChIP-Seq, have very different dynamics. Some sites had very high turnover, and cycled rapidly between histone and transcription factor occupancy, whereas other sites had long transcription factor residency times.
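To make the latency idea concrete, here is a toy sketch of how a turnover rate could be read off a time course. It is not the analysis from the talk: it assumes induction is instantaneous and that the induced copy replaces the resident copy by simple first-order exchange, so the induced copy's share of occupancy at a site rises as 1 - exp(-k t). The function names, time points and rates are all made up for illustration.

```python
import math

def induced_fraction(t, turnover_rate):
    """Share of a site's occupancy held by the induced copy at time t
    after induction, under the assumed first-order exchange model."""
    return 1.0 - math.exp(-turnover_rate * t)

def estimate_turnover_rate(times, fractions):
    """Estimate k by fitting -ln(1 - f) = k * t with a least-squares
    line through the origin. Points with f outside [0, 1) are skipped."""
    num = 0.0
    den = 0.0
    for t, f in zip(times, fractions):
        if t <= 0 or not 0.0 <= f < 1.0:
            continue
        y = -math.log(1.0 - f)
        num += t * y
        den += t * t
    if den == 0:
        raise ValueError("need at least one usable time point")
    return num / den

# Hypothetical time points (minutes after induction) and two made-up sites.
times = [10, 20, 40, 60, 90]
fast_site = [induced_fraction(t, 0.10) for t in times]  # high turnover
slow_site = [induced_fraction(t, 0.01) for t in times]  # long residency

fast_rate = estimate_turnover_rate(times, fast_site)
slow_rate = estimate_turnover_rate(times, slow_site)
print(f"fast site: k = {fast_rate:.3f}/min, slow site: k = {slow_rate:.3f}/min")
```

Two sites like these could show similar occupancy in a single standard ChIP-Seq snapshot, yet the induced copy takes over the fast site long before it makes a dent in the slow one. Real data would also need noise handling and a model of how quickly the induced copy accumulates.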
Modeling complex promoter interactions
Barak Cohen presented research from his group at Washington University in St Louis into untangling combinatorial cis-regulation (non-additive interactions between non-coding regulatory elements). Cohen’s group uses synthetic promoters in yeast to model transcription factor binding interactions, and has developed thermodynamic-style models to explain gene expression from these promoters.
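For readers who have not met thermodynamic-style models before, the sketch below shows the general flavor. It is a generic textbook-style formulation and not the Cohen group's actual model: every possible pattern of occupied binding sites gets a statistical weight, and promoter activity is taken to be the probability that polymerase is bound. All parameter names and values here are invented for illustration.

```python
from itertools import combinations, product

def promoter_activity(site_weights, polymerase_weight,
                      tf_pol_interactions, tf_tf_interactions=None):
    """Probability that polymerase is bound, summing over every
    occupancy state of the binding sites.

    site_weights[i]           weight of site i being occupied
                              (roughly concentration times affinity)
    polymerase_weight         weight of polymerase binding on its own
    tf_pol_interactions[i]    factor by which an occupied site i helps
                              (>1) or hinders (<1) polymerase binding
    tf_tf_interactions[(i,j)] cooperativity between occupied sites i < j
    """
    tf_tf_interactions = tf_tf_interactions or {}
    n = len(site_weights)
    z_off = 0.0  # total weight of states without polymerase
    z_on = 0.0   # total weight of states with polymerase
    for occupancy in product((0, 1), repeat=n):
        bound = [i for i in range(n) if occupancy[i]]
        weight = 1.0
        for i in bound:
            weight *= site_weights[i]
        for i, j in combinations(bound, 2):
            weight *= tf_tf_interactions.get((i, j), 1.0)
        z_off += weight
        weight_on = weight * polymerase_weight
        for i in bound:
            weight_on *= tf_pol_interactions[i]
        z_on += weight_on
    return z_on / (z_on + z_off)

# Made-up promoter with two activator sites, with and without cooperativity.
independent = promoter_activity([1.0, 1.0], 0.1, [10.0, 10.0])
cooperative = promoter_activity([1.0, 1.0], 0.1, [10.0, 10.0], {(0, 1): 20.0})
print(f"independent: {independent:.3f}, cooperative: {cooperative:.3f}")
```

The loop visits 2^n occupancy states for n sites, and each extra site adds its own weight plus potential interaction terms with every other site, which gives a feel for how quickly the parameter count grows in models of this kind.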
So far, this approach has worked perfectly for synthetic promoters made up of a few transcription factors; however, the more binding sites you have, the more parameters the model has, and the more data is required to fit it. More data means more synthetic promoters, and for more than four or so transcription factors this means far more promoters than could be constructed manually.
Instead, the researchers have developed a system for randomly generating synthetic promoters, each of which drives the expression of a gene that contains a random barcode. Paired-end sequencing of genomic DNA can be used to discover which barcode is attached to which synthetic promoter, and then RNA-Seq can be used to measure the expression of each barcoded gene.
Using this method, they were able to fit a complex model with 214 different binding sites, which could explain 82% of variation in gene expression. They can use this model to annotate the different binding sites according to the role they play in cis-regulation: as switches, modulators, amplifiers and so on.
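The barcode bookkeeping behind this kind of assay can be sketched in a few lines. This is only an illustration under simplifying assumptions: reads are taken to be already parsed into promoter and barcode strings, sequencing errors are handled by a simple majority vote, and expression is just a raw RNA read count per promoter. The data structures, function names and toy reads are hypothetical, not the group's pipeline.

```python
from collections import Counter, defaultdict

def link_barcodes(dna_read_pairs):
    """Map each barcode to the promoter it is most often paired with
    in paired-end reads from genomic DNA."""
    votes = defaultdict(Counter)
    for promoter, barcode in dna_read_pairs:
        votes[barcode][promoter] += 1
    return {bc: counts.most_common(1)[0][0] for bc, counts in votes.items()}

def expression_per_promoter(barcode_to_promoter, rna_barcodes):
    """Count RNA-Seq reads per promoter by looking up each read's barcode.
    Reads with an unknown barcode are ignored."""
    counts = Counter()
    for barcode in rna_barcodes:
        promoter = barcode_to_promoter.get(barcode)
        if promoter is not None:
            counts[promoter] += 1
    return dict(counts)

# Toy reads: promoters written as strings of made-up binding site names.
dna_read_pairs = [
    ("A-B-C", "ACGT"), ("A-B-C", "ACGT"), ("B-B-A", "ACGT"),  # one stray pairing
    ("B-B-A", "TTAG"), ("B-B-A", "TTAG"),
    ("C-A", "GGCA"),
]
rna_barcodes = ["ACGT", "ACGT", "ACGT", "TTAG", "GGCA", "GGCA", "NNNN"]

barcode_map = link_barcodes(dna_read_pairs)
expression = expression_per_promoter(barcode_map, rna_barcodes)
print(barcode_map)
print(expression)
```

In practice the RNA counts would presumably be normalized, for instance against the DNA abundance of each barcode, before being fed to a model.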
High-throughput goes mainstream
One aspect that makes me very happy about these methods is that they both use largely automated, high-throughput techniques to study very specific biological mechanisms. This helps somewhat to fight the fear that many high-throughput biologists have: that high-throughput biology is a short-term project, a flash in the pan that will generate a lot of data resources, but will fizzle out once these datasets are complete.
I have heard statistical geneticists worry about finding all the heritability of complex disease, and having to pass off their lists of associated loci to lab biologists for laborious manual follow-up with their pipettes and reagents and things, leaving us suddenly out of a job.
Instead, we are seeing how the high-throughput paradigm is going mainstream, with new technologies and sophisticated high-throughput assays being used to solve more and more fundamentally biological (rather than statistical or bioinformatics) problems. This is good news for biology, but most importantly, good news for those of us who have trained in the new paradigm, and would probably lose an eye if forced to pipette something.
Thank you to Jason Lieb and Barak Cohen for giving permission for me to write about their work, and giving helpful comments on the post. The image at the top is taken from Wikimedia Commons, and shows the zinc fingers of the transcription factor ZIF268 (blue) binding to DNA (orange).