The American Society of Human Genetics’ annual meeting has now kicked off in San Francisco. The usual terrifyingly large conference centre and dizzying collection of stalls and posters stretching as far as the eye can see are all very much in evidence, as are the migrating herds of human geneticists wandering the landscape in search of food in the perpetually busy local restaurants.
The first session I attended was given the perplexing title of “Yes Virginia, Family Studies Really Are Useful for Complex Traits in the Next-Generation Sequencing Era”. The session was held in honour of the 80th birthday of statistical geneticist Robert Elston, whose development of the Elston-Stewart algorithm in the 1970s kickstarted the field of parametric linkage analysis. It covered the use of next-generation sequencing in family studies to discover the hypothesised rare risk variants that everyone hopes to find. This isn’t really a new topic, and I last wrote about it at ASHG 2010, but it had a very different feel from two years ago.
Bigger datasets, fancier methods
The session started with John Blangero, who talked about various quantitative trait studies in families. What came out most clearly from the talk was how many new analysis tools have come online for large pedigree datasets in recent years. Perhaps the most interesting development has been efficient variance component (or mixed) models that can be applied to large and complicated datasets. Mixed models have been pretty influential in controlling population stratification in genome-wide association studies, but the same framework can also be used to perform generalised association testing in related individuals (including large, complicated pedigrees), and to run efficient power calculations for complicated family structures that would previously have been impossible.
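To make the idea concrete, here is a minimal sketch of the variance-component approach described above: a quantitative trait is modelled as multivariate normal with covariance made up of a genetic component (proportional to a kinship matrix K) plus environmental noise, and a candidate variant is tested by a likelihood-ratio test on the mean model. The toy data, kinship structure and effect size are all invented for illustration; real tools (e.g. SOLAR or modern mixed-model software) use far more efficient algorithms.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)

# Toy example: 8 individuals in two sibships of 4.
# K holds expected kinship-based relatedness (0.5 between full sibs).
n = 8
K = np.kron(np.eye(2), 0.5 * np.ones((4, 4)))
np.fill_diagonal(K, 1.0)

# Hypothetical SNP genotypes (0/1/2 copies) and a simulated trait
# with a genetic variance component proportional to K.
g = np.array([0, 1, 2, 0, 1, 0, 2, 1], dtype=float)
y = 0.5 * g + rng.multivariate_normal(
    np.zeros(n), 0.4 * K + 0.6 * np.eye(n))

def neg_loglik(log_vars, X, y):
    """Negative log-likelihood of y ~ N(X*beta, sg2*K + se2*I),
    with the fixed effects beta profiled out by GLS."""
    sg2, se2 = np.exp(log_vars)       # exp() keeps variances positive
    V = sg2 * K + se2 * np.eye(n)
    Vi = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    return 0.5 * (logdet + r @ Vi @ r)

def fit(X, y):
    res = optimize.minimize(neg_loglik, x0=[0.0, 0.0], args=(X, y))
    return -res.fun  # maximised log-likelihood (up to a constant)

# Likelihood-ratio test: does adding the SNP to the mean model help,
# after accounting for relatedness via K?
X0 = np.ones((n, 1))                    # intercept only
X1 = np.column_stack([np.ones(n), g])   # intercept + SNP
lrt = 2 * (fit(X1, y) - fit(X0, y))
p = stats.chi2.sf(lrt, df=1)
print(f"LRT = {lrt:.2f}, p = {p:.3g}")
```

The key point is that the same covariance structure (sg2*K + se2*I) serves double duty: it absorbs confounding due to relatedness while still allowing an ordinary association test on the fixed effect of the variant.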
Francoise Clerget-Darpoux gave an interesting talk about estimating the true effect size and genetic model for association results using family information. She started with a historical example from the 1980s, when affected sib pairs were used to demonstrate that the INS VNTR was in fact causal in diabetes, and then mentioned a few examples from more recent studies. The general theme of her talk was that identity-by-descent patterns within a family give an independent piece of information that can be used to refine and improve association results. My one problem with this talk was that Clerget-Darpoux seemed to be under the impression that researchers who carry out GWAS are unaware of the issues with taking effect size estimates at face value. In fact, this issue is well known, and many fine-mapping studies have been carried out that estimated true effect sizes without the need for family studies (e.g. 1 2).
Beyond the theoretical, both Blangero and Michael Province described some of the large family datasets that have become available in recent years. Blangero described the analysis of the San Antonio Family Studies dataset and the subset of 600 whole-genome sequences that make up the T2D-GENES project, and Province described linkage, association testing and sequencing of ~500 candidate genes in over 500 families with extreme longevity.
Who does family studies?
The session ended with a reflection on the history of human genetics from Robert Elston himself, comparing the status quo of genetics in 1953, when he attended lectures given by R.A. Fisher, to the status quo today. However, one statement he made drew a round of applause from the audience, and consternation from me: he described researchers carrying out case-control studies as epidemiologists who falsely believe they are doing genetics.
The reason I found this confounding was that, from where I am sitting, the people who do family studies are largely the same people who do GWAS. The author of MERLIN, one of the most popular programs used for the analysis of family data, also developed METAL, one of the programs commonly used for GWAS meta-analysis. The same group who performed the largest linkage analysis of family data in type 1 diabetes also carried out the largest GWAS meta-analysis of the same trait. On a smaller scale, I have spent about as much of my PhD poring over large pedigrees, chasing suggestive linkage peaks and analysing sequence data from families as I have carrying out GWAS. People have been running linkage studies, collecting large pedigrees and prioritising samples with a family history of disease for sequencing and genotyping throughout the GWAS age, for all the reasons that the speakers discussed and more. However, if you listened to some of these speakers you may have come away with the false impression that people who carry out GWAS are a different species, ignorant of and uninterested in the classical techniques of statistical genetics and family studies.
The reason that the last five years have been the age of GWAS is not because a dominant set of researchers have only been doing case-control studies: it is because GWAS, out of all the studies that we have done, has generated the most results. If the 2009 type 1 diabetes linkage meta-analysis of 5,000 cases had firmly established 18 new loci, and the GWAS meta-analysis of 7,500 cases had discovered only a few sub-significant suggestive peaks, then I expect we would be talking about the age of the linkage meta-analysis. And if in the next five years we start seeing family studies with titles like “20 new penetrant risk variants for major depression” I don’t doubt we will start calling this the age of pedigree sequencing. But we shouldn’t confuse “it was successful” with “it was what was in fashion”, and especially not with “nothing else was tried”.
The image at top of the page is a family tree, taken from .