AGBT: Taking the Statistics out of Statistical Genetics

The second day (or the first day, depending on whether you count yesterday’s pre-sessions) of AGBT is nearly done. A lot has been going on today, but I’m only going to cover one thing; once again, you can get more detail on all the talks I’ve seen on my Twitter feed (@lukejostins).

Other things that I’ve done: I had a very interesting talk with Geoff Nilsen at Complete Genomics, in which I got to ask various questions, including “Why don’t they use color-space?” (answer: “It confuses customers, and the error model is good enough already”) and “In what sense is Complete ‘3rd Gen’?” (answer: “Because it’s cheaper”). I also saw a set of presentations from 454 on de novo assembly and on the new Titanium 1k kit, which actually produces virtually no 1kb reads: mean read length is about 800bp, but beyond 600bp the error rates get very high.

There has been some other blog coverage of AGBT from our army of bloggers: MassGenomics has some first impressions, and Anthony Fejes is uploading his detailed notes about all the talks. You can also follow a virtual rain of tweets on the #AGBT hashtag.

Fun with Exome Sequencing

Debbie Nickerson (again!) gave a talk about sequencing genomes to hunt down the genes underlying Mendelian disorders. The process is very simple: you sequence 4-10 exomes of sufferers, look for non-synonymous mutations shared between them, and then apply filters (such as excluding variants already present in HapMap exomes) to find SNPs that are likely to be causal. Debbie is in the process of sequencing 200 exomes for 20 diseases, and has a real success story under her belt in tracking down the genes for 2 disorders. She raised the interesting question of how to validate the discovered genes, given that Mendelian disorders tend to have a large number of independent mutations.
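To make the filtering step concrete, here is a minimal sketch in Python. It is not Debbie's actual pipeline: the data layout, the gene names and the function `candidate_genes` are all invented for illustration, and I have assumed that sharing is judged at the gene level (so that independent mutations in the same gene still count), which is one possible reading of the method.

```python
from collections import defaultdict

def candidate_genes(exomes, known_variants, min_carriers=None):
    """Return genes with a novel non-synonymous variant in enough patients.

    exomes: one list per patient of (gene, position, effect) tuples.
    known_variants: set of (gene, position) pairs seen in control exomes.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    if min_carriers is None:
        min_carriers = len(exomes)  # by default, every patient must carry one
    carriers = defaultdict(set)
    for patient_index, variants in enumerate(exomes):
        for gene, position, effect in variants:
            if effect != "nonsynonymous":
                continue  # keep only protein-changing variants
            if (gene, position) in known_variants:
                continue  # drop variants already seen in controls
            carriers[gene].add(patient_index)
    return sorted(g for g, who in carriers.items() if len(who) >= min_carriers)

# Toy data: three patients, made-up genes and positions.
exomes = [
    [("GENE_A", 100, "nonsynonymous"), ("GENE_B", 200, "synonymous"),
     ("GENE_C", 300, "nonsynonymous")],
    [("GENE_A", 150, "nonsynonymous"), ("GENE_C", 300, "nonsynonymous")],
    [("GENE_A", 100, "nonsynonymous"), ("GENE_B", 210, "nonsynonymous"),
     ("GENE_C", 300, "nonsynonymous")],
]
known = {("GENE_C", 300)}  # stands in for variants present in control exomes

print(candidate_genes(exomes, known))
```

In this toy example GENE_C is removed by the control filter and GENE_B is carried by only one patient, so only GENE_A survives. Lowering `min_carriers` relaxes the requirement that every patient carries a hit.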

Stacey Gabriel gave a related talk on exome sequencing, focusing on using the method Debbie described to track rare variants for complex traits. To do that, you ‘Mendelianise’ the trait by only picking extreme individuals; she did this for high and low LDL-cholesterol, giving some candidate genes, but no smoking gun.

Let’s look slightly closer at this: you sequence a number of individuals with extreme traits, look for genes with shared non-synonymous mutations, and look for functional effects. This is a linkage study! A very small and underpowered linkage study, with a variant-to-gene collapse method (like a poor man’s lasso), and some sort of manual pathway/functional analysis (a poor man’s GRAIL), but linkage all the same. This is really re-inventing the wheel, without learning any of the lessons that the first round of linkage analysis taught us (or even stopping to ask whether, if such variants existed, they would have been picked up by linkage in the first place).
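As a rough illustration of what treating this statistically might look like, here is a sketch of a gene-level collapsing test: count carriers of any qualifying variant per gene in the high and low extremes, then compute a one-sided Fisher's exact p-value. This is my own toy example, not the analysis presented in the talk; the sample sizes, gene names, function names and the figure of roughly 20,000 genes used for the Bonferroni threshold are all assumptions.

```python
from math import comb

def collapse_by_gene(variant_calls):
    """Map each gene to the set of individuals carrying any qualifying variant."""
    carriers = {}
    for individual, gene in variant_calls:
        carriers.setdefault(gene, set()).add(individual)
    return carriers

def fisher_one_sided(carriers_high, n_high, carriers_low, n_low):
    """P(at least carriers_high carriers land in the high group), margins fixed."""
    total_carriers = carriers_high + carriers_low
    n = n_high + n_low
    upper = min(total_carriers, n_high)
    tail = sum(
        comb(total_carriers, k) * comb(n - total_carriers, n_high - k)
        for k in range(carriers_high, upper + 1)
    )
    return tail / comb(n, n_high)

def gene_p_values(variant_calls, high, low):
    """One-sided p-value per gene for carrier enrichment in the high group."""
    high, low = set(high), set(low)
    return {
        gene: fisher_one_sided(len(who & high), len(high), len(who & low), len(low))
        for gene, who in collapse_by_gene(variant_calls).items()
    }

# Hypothetical study: 10 individuals from each tail of the trait distribution.
high = [f"H{i}" for i in range(10)]
low = [f"L{i}" for i in range(10)]
calls = [(f"H{i}", "GENE_X") for i in range(5)] + [("H0", "GENE_Y"), ("L0", "GENE_Y")]

p_values = gene_p_values(calls, high, low)
threshold = 0.05 / 20000  # Bonferroni, assuming roughly 20,000 genes tested
for gene, p in sorted(p_values.items()):
    print(gene, round(p, 4), "significant" if p < threshold else "not significant")
```

With 10 individuals per tail, a gene carried by 5 of the high group and none of the low group gets a p-value of about 0.016, nowhere near the assumed threshold of 2.5e-6. Even perfect separation, `fisher_one_sided(10, 10, 0, 10)`, only gives 1/184756, or about 5.4e-6, which is the sense in which a study of this size is underpowered.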

It is not that Stacey Gabriel is doing anything wrong; it is just that she is failing to consider that she is attempting to solve non-statistically a problem that statisticians have worked on for decades. In short, she is risking taking the statistics out of statistical genetics.

4 Responses to AGBT: Taking the Statistics out of Statistical Genetics

  1. Elisabeth Rosenthal

    I think that your critique of ‘re-inventing’ linkage analysis is unjustified. The problem with complex traits is that they may have many underlying causes. If we can detect linkage, in the usual way, then looking more closely at the sequence information is quite useful. I think you’ll be seeing some interesting research in the near future pointing to this.

  2. I didn’t say that the approach was a bad one per se. I just think that it shows a recklessness to do what are essentially linkage studies without first considering previous linkage studies used on the same disease.

    Stacey Gabriel is looking for variants with an assumed large odds ratio, with certain functional characteristics, within certain genes. People have previously done association and linkage studies that do the same thing, and lots of times nothing has been found; she should look to see whether her study falls into the realm of one of those.

  3. When I first saw the title of this post, I thought it might be about my talk. If you sequence the whole genome, you don’t need to do [mumble mumble] statistical genetics, because every variant is typed, and so no inference needs to be made about linkage – most of what statistical genetics is about. So by sequencing the whole genomes of families, we are “taking the statistics out of statistical genetics”. Admittedly, there are a lot of caveats in the “mumble mumble” qualification of the claim.

    I, too, was a little surprised by the glibness of Debbie’s description of her group’s approach to Booleanize data that would seem to be much richer if not so transformed. However, I have a tremendous amount of faith in Debbie and her collaborators and her track record. She was giving a very short talk (on the order of about 20 minutes) and most likely needed to quickly describe one easy-to-understand approach. I feel very confident that they will analyze their data in a robust manner, taking advantage of multiple statistical genetic approaches to analysis. There are some issues of multiple test correction and maximizing the power of their study that might benefit from an up-front declaration of a Boolean approach as one primary outcome variable.

    With respect to whether whole genome studies should consider the results of previous linkage studies, I would recommend that most not include data from previous studies in a primary analysis, but that a secondary analysis be performed with those results. Long conversation, but in short: “looking under the lamp post.”

  4. Pingback: how is personality developed? What roles do genetics and environment play in personality development? | Overcoming Procrastination
