Day three of the Biology of Genomes conference is complete. The ELSI session about “Public Genomic literacy for the public” was pretty interesting. It taught us that Americans actually know far more about science than people in most countries, despite being hobbled by bad high school science education and a preponderance of hostile religious beliefs, and that the UK has pretty bad science literacy despite the fact that we once ruled the world. That’s all a bit beyond my pay grade so I won’t comment on it further. I have sort of lost track of the days, so this blog post is a mix of Wednesday and Thursday sessions, tied together by a pretty strong non-coding genome regulation theme. The presentations I am going to talk about came from “Computational Genomics” and “Functional And Cancer Genomics”.
Ewan Birney reported on results from the ENCODE project, a truly massive consortium project to look at the role of non-coding functional DNA. My take-home message was that a chunk of the genome significantly larger than the entire exome can be confidently said to be bound by a protein in at least one cell line. This includes bound transcription factor motifs and DNase1 footprints: these aren’t woolly definitions, and they certainly miss out lots of important non-coding annotations. Ewan also presented evidence implying that we have only captured half of what we eventually could if we sequenced all human cell types. Overall, we can guess that for every base pair that codes for part of an exon, there will be another four base pairs responsible for binding proteins. A few other ENCODE talks dug deeper into the data, including one from Mark Gerstein about using transcription factor binding to construct gene networks. Interestingly, he showed that networks connected via distal regulation (i.e. regulation via proteins bound outside the promoter) are very different to those formed by proximal regulation.
As well as cataloging this variation, researchers are also getting better and better at figuring out the mechanisms of genome regulation, and there were quite a few talks that really dug down into the specific dynamics of the genome and its assorted bound products.
Daniel Gaffney presented a sequencing study that mapped the positioning of nucleosomes throughout the genome. A surprisingly large proportion (over 50%) of nucleosomes seem to be non-randomly placed. In some cases, this seems to be because they are stacking up against bound proteins, and this was reinforced by the observation that genetic variants that correlated with reduced DNase1 hypersensitivity also correlated with less strictly positioned nucleosomes. However, in many cases the nucleosomes are held in place by the sequence of DNA itself, by biased binding to different dinucleotides. There are even some repetitive regions that form “nucleosome arrays”, with hundreds of nucleosomes neatly lined up over tens of kilobases.
Presenting the final talk of Wednesday (at 10 o’clock at night), Jason Gertz gave a very punchy presentation about the binding of the estrogen receptor transcription factor. He looked at two cancer cell lines, used ChIP-Seq to find binding sites, and sorted them into those that were shared by both cell lines and those that were unique to one. It seems that these two groups have very different properties. The shared binding sites were characterized by strong canonical binding motifs and occurred in relatively closed chromatin (at least before estrogen treatment was applied), suggesting deterministic sequence-specific occupancy. By contrast, cell-specific binding tended to occur in the absence of binding motifs, and was often in open chromatin, suggesting active regulation. Specific mechanisms of this regulation could be seen, including enriched co-occurrence with other transcription factors, and increased methylation in the cell line that didn’t show binding.
Moving outside of human lines, Robi Mitra looked into predicting what drives the stochastic variance in gene expression in yeast, which occurs even in cells with identical DNA and environment. Robi described a dynamic chemical model that predicted expression variance given RNA synthesis and degradation rates, and looked for genes that showed more variance than would be expected under this model. Intriguingly, most of this additional variation (around 60%) seemed to be driven by the positioning of nucleosomes over the gene’s promoter. Robi hypothesized that the cell may be deliberately placing these nucleosomes in order to drive increased expression noise: some evidence for this was provided by experiments involving stressing the cells with ethanol. This is all especially interesting given Daniel Gaffney’s insights into the diverse mechanisms that can place these nucleosomes.
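To make the "more variance than expected" idea a bit more concrete, here is a rough sketch in Python. I don't know the details of the model Robi actually used, so this swaps in the simplest textbook stand-in: a constitutive birth-death process, where RNA is made at a constant rate and degraded at a rate proportional to its abundance. Under that assumption the steady-state copy number is Poisson distributed, so the expected variance equals the mean. The function names, the gene names, the numbers and the cutoff of 2 are all made up for illustration.

```python
def expected_moments(synthesis_rate, degradation_rate):
    """Steady-state mean and variance of RNA copy number under a simple
    birth-death model (an assumption, not the model from the talk).
    The steady state is Poisson, so the variance equals the mean."""
    mean = synthesis_rate / degradation_rate
    return mean, mean


def excess_variance_ratio(observed_variance, synthesis_rate, degradation_rate):
    """Ratio of observed variance to the variance the model predicts.
    A value near 1 means the gene is about as noisy as expected."""
    _, expected_variance = expected_moments(synthesis_rate, degradation_rate)
    return observed_variance / expected_variance


# Hypothetical genes: (synthesis rate, degradation rate, observed variance)
genes = {
    "geneA": (10.0, 0.5, 21.0),
    "geneB": (10.0, 0.5, 80.0),
}

# Hypothetical cutoff: flag genes with more than twice the expected variance
CUTOFF = 2.0
noisy_genes = [
    name
    for name, (k, g, var) in genes.items()
    if excess_variance_ratio(var, k, g) > CUTOFF
]
print(noisy_genes)
```

Both made-up genes have the same expected mean and variance, but only the second shows far more variance than this baseline predicts. Genes like that are the ones where you would go looking for an extra source of noise, such as nucleosome positioning over the promoter.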
Thanks to Ewan Birney, Mark Gerstein, Daniel Gaffney, Jason Gertz and Robi Mitra for giving me permission to write about their work. The image above is a structure of a nucleosome from Wikipedia.