Tag Archives: genetics

David Goldstein Proves Himself Wrong

A recent paper in PLoS Biology by David Goldstein’s group is being seen as another ‘death of GWAS’ moment (again?). I have a lot of issues with this paper, but I will be brief and stick to my main objection: the authors attempt to demonstrate that common associations can be caused by sets of rare variants, and in doing so inadvertently show that most of them are not.

The Paper and the Press

This is another example of a scientific paper being careful to make no solid, controversial claims, but being surrounded by a media story that is not justified by the paper itself. The only real solid claim in the paper is that, if you do not include rare SNPs in your genome-wide association study, and rare SNPs of large effect are contributing to disease, then you will sometimes pick up more common SNPs as associated, because they are in Linkage Disequilibrium with the rare SNPs. Pretty uncontroversial, in so far as it goes. The paper makes no attempt to say whether this IS happening, just says that it CAN happen, and that we should be AWARE of it.

However, in the various articles around the internet, this paper is being received as if it makes some fundamental claim about complex disease genetics; that this somehow undermines Genome-Wide Association Studies, or shows their results to be spurious. David Goldstein is quoted on Nature News:

…many of the associations made so far don’t seem to have an explanation. Synthetic associations could be one factor at play. Goldstein speculates that, “a lot, and possibly the majority [of these unexplained associations], are due to, or at least contributed to, by this effect”.

Another author is quoted here as saying

We believe our analysis will encourage genetics researchers to reinterpret findings from genome-wide association studies

Much of the coverage conflates this paper with the claim that rare variants may explain ‘missing heritability’, which is an entirely different question; Nature News opens with the headline “Hiding place for missing heritability uncovered”. Other coverage can be found on Science Daily, Gene Expression and GenomeWeb.

Does this actually happen?

Is all this fuss justified? How common is this ‘synthetic hit’ effect; are a lot of GWAS hits caused by it, or hardly any? There are many ways that you could test this; for instance, you could make some predictions about what distribution of risk you’d expect to see in the many fine mapping experiments that have been done as follow ups to Genome-Wide Association Studies (this would be trivially easy to do using the paper’s simulations).

However, there is an even easier way to test the prevalence of the effect. If most GWAS hits are tagging relatively common variants, then you would expect to see most disease associated SNPs with a frequency in the 10% to 90% range (the range for which GWAS are best powered). However, a SNP with a frequency of 50% is less likely than one with a frequency of 10% to tag a SNP with frequency 0.5%, so if most GWAS hits are tagging rare variants, then you would expect to see most associated SNPs with a frequency skewed towards the very rare or the very common.
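One way to make the frequency argument concrete is to look at the largest r² that two biallelic sites can possibly have, given only their allele frequencies. The sketch below is my own illustration, not a calculation from the paper: the function name `max_r2` and the example frequencies are assumptions, and it treats the causal variant as a single rare site rather than a set of them.

```python
def max_r2(rare_freq: float, tag_freq: float) -> float:
    """Upper bound on r^2 between two biallelic sites, given allele frequencies.

    The bound is reached when the two minor alleles are perfectly coupled,
    i.e. every copy of the rarer minor allele sits on a haplotype that also
    carries the other site's minor allele.
    """
    # Fold both frequencies to minor allele frequencies, so 0.9 behaves like 0.1.
    a = min(rare_freq, 1.0 - rare_freq)
    b = min(tag_freq, 1.0 - tag_freq)
    if a <= 0.0 or b <= 0.0:
        raise ValueError("both sites must be polymorphic")
    lo, hi = sorted((a, b))
    # With perfect coupling D = lo * (1 - hi), so
    # r^2 = D^2 / (lo * (1 - lo) * hi * (1 - hi)) simplifies to:
    return lo * (1.0 - hi) / (hi * (1.0 - lo))


if __name__ == "__main__":
    rare = 0.005  # hypothetical rare causal variant at 0.5% frequency
    for tag in (0.01, 0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"tag SNP frequency {tag:.2f}: max r^2 = {max_r2(rare, tag):.4f}")
```

The bound falls steadily as the tag SNP's frequency moves towards 50%, which is the intuition behind expecting synthetic hits to pile up at the extremes of the frequency range.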

In fact, the paper makes an explicit calculation of the expected frequency distribution of GWAS hits, under their synthetic model. In-double-fact, the paper plots this distribution against the distribution of known GWAS hits. And here is that plot, taken directly from the paper (Figure 5):

The green line is the expected frequency distribution of ‘synthetic’ associations; the red line is the actual distribution. We can see that the GWAS hits we do see fail to follow the distribution for synthetic associations; in fact, they follow pretty much exactly the distribution we’d expect if most common associations are tagging common causal SNPs.

The paper manages to show pretty conclusively both that synthetic associations can occur and that they rarely do.

Dickson, S., Wang, K., Krantz, I., Hakonarson, H., & Goldstein, D. (2010). Rare Variants Create Synthetic Genome-Wide Associations. PLoS Biology, 8 (1). DOI: 10.1371/journal.pbio.1000294

The Economist Mangles Disease Genetics

The Economist has a rather distressingly bad article by the evolutionary psychologist Geoffrey Miller, about the supposed general failure in human disease genetics over the last 5 years. The thesis is that Genome-Wide Association Studies (GWAS) for common diseases have been a failure that geneticists are trying to keep hidden, and that the new techniques required to solve the problem of disease genetics will raise ‘politically awkward and morally perplexing facts’ about the different traits and evolutionary histories of races. The former claim is pretty much the same as the one in Steve Jones’s Telegraph article earlier this year, and is just as specious. I will look at both claims separately.

A quick point of terminology: Miller uses ‘GWAS’ to refer to studies that look for disease association in common variants using a genotyping chip, and acts as if sequencing studies are not, in fact, GWAS. In fact, a sequencing association study is just another type of GWAS, just looking at a larger set of variants.

How Many Ancestors Share Our DNA?

This post was written four years ago, using a quick-and-dirty model of recombination to answer the question in the title. Since then a more detailed and rigorously tested model has been developed by Graham Coop and colleagues to answer this same question. You can read more about the results of this model on the Coop Lab blog here and here. Graham’s model is based on more accurate data, more careful tracking of multiple ancestors and a more realistic model of per-chromosome recombination, and thus his results should be considered to have superseded mine.

Over at the Genetic Genealogist, Blaine Bettinger has a Q&A post up about the difference between a genetic tree and a genealogical tree. The distinction is that your genealogical tree is the family tree of all your ancestors, but your genetic tree only contains those ancestors that actually left DNA to you. Just by chance, an individual may not leave any DNA to a distant descendant (like a great-great-great-grandchild), and as a result they would not appear on their descendant’s genetic tree, even though they are definitely their genealogical ancestor.

At the end of his post, Blaine asks a couple of questions that he would like to be able to answer in the future:

  • At 10 generations, I have approximately 1024 ancestors (although I know there is some overlap). How many of these ancestors are part of my Genetic Tree? Is it a very small number? A surprisingly large number?
  • What percentage, on average, of an individual’s genealogical tree at X generations is part of their genetic tree?

I think that I can answer those questions, or at least predict what the answers will be, using what we already know about sexual reproduction.
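To give a feel for the kind of calculation involved, here is a minimal back-of-the-envelope sketch. It is not the model used in this post or any published one: it assumes 22 autosomes, roughly 33 crossovers per meiosis, no overlap between ancestors, and that inherited blocks land on ancestors independently (a Poisson approximation). All names and constants are illustrative assumptions.

```python
import math

AUTOSOMES = 22               # assumption: sex chromosomes are ignored
CROSSOVERS_PER_MEIOSIS = 33  # assumption: rough genome-wide average


def fraction_genetic(generations: int) -> float:
    """Approximate chance that one ancestor `generations` back left you any DNA."""
    if generations < 1:
        raise ValueError("generations must be at least 1")
    # Ancestors on one parent's side of the tree at this depth.
    lineages = 2 ** (generations - 1)
    # Blocks that parent's genome is cut into after the earlier meioses.
    blocks = AUTOSOMES + CROSSOVERS_PER_MEIOSIS * (generations - 1)
    # Poisson approximation: chance that an ancestor receives at least one block.
    return 1.0 - math.exp(-blocks / lineages)


def expected_genetic_ancestors(generations: int) -> float:
    """Expected number of the 2**generations ancestors who are in the genetic tree."""
    return 2 ** generations * fraction_genetic(generations)


if __name__ == "__main__":
    for g in range(1, 15):
        print(f"{g:2d} generations: {2 ** g:6d} genealogical ancestors, "
              f"~{expected_genetic_ancestors(g):.0f} genetic, "
              f"fraction {fraction_genetic(g):.3f}")
```

Under these assumptions the fraction stays close to one for the first handful of generations and then drops, because the number of ancestors doubles each generation while the number of inherited blocks only grows linearly.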

ASHG: Quantifying Relatedness and Active Subjects in Genome Research

Well, the American Society of Human Genetics Annual Meeting is coming to a close for another year. My talk is done and dusted, so I no longer have to lie awake at night worrying that I will forget everything other than the words to “Stand By Your Man” when confronted by the crowd. My white suit is now more of an off-white suit, with regions of very-off-white and pretty-much-entirely-out-of-sight-of-white. I’m looking forward to getting back home to catch up on my sleep.

For the last time, I’m going to give a little summary of talks today that I thought were interesting, or gave some indication of where genetics may be heading in the future. I will write up some more general thoughts about the meeting in the next few days, as soon as the traveling is out of the way and my mind has recharged.

If you would like some second opinions on the conference, GenomeWeb has a number of articles, including a couple of short summaries, as well as a nice mid-length article about the 1000 Genomes session; there are also a number of articles over at In The Field, the Nature network conference blog.

ASHG: Finding Mendelian Mutations and Inclusive Population Genetics

Third day down, one to go. I am starting to suffer from conference fatigue somewhat. I’m not going to any other talks this evening, so I am going to try and get some relaxation time in from this point on. But first, the summary of Day 3.

Today I saw a lot of talks over three sessions, and many of them were very interesting. However, I won’t talk about everything, or even my favourite talks. I’ll go for the talks that seem to tie together into nice stories about a few directions genetics seems to be heading in.

ASHG: Statistical Genomics and Beyond GWAS in Complex Disease

The second day of the American Society of Human Genetics Annual Meeting is drawing to a close; here’s a lowdown of what talks I’ve enjoyed today.

Remember, follow @lukejostins on Twitter if you want more up-to-the-minute details on the ASHG talks.

ASHG: Rare Variants, and the 1000 Genomes Project

Hello all (it is taking every bone in my body not to say ‘Aloha’ here).

So, today was the first real day of the ASHG Annual Meeting; after accidentally falling asleep for basically all of yesterday, it was good to finally see some familiar faces and dig my teeth into some real science.

I’m going to write a little about the first couple of sessions I’ve seen, and say what sort of themes are being shouted loud enough to get into my jetlagged mind. I have also been tweeting the conference at quite a high frequency (about 30 tweets so far), and in more detail than I have given here; follow me on @lukejostins if you are interested. To see all the ASHG twittering, check out #ASHG2009.

The blog posts over the next few days will be aimed mostly at those who are, at least vaguely, In The Know about genomics. However, if there are people who would like a less jargonistic lowdown of the conference, please leave a comment and I’ll see what I can do.

Recombination in the X and Y Chromosomes


Rosser, Z., Balaresque, P., & Jobling, M. (2009). Gene Conversion between the X Chromosome and the Male-Specific Region of the Y Chromosome at a Translocation Hotspot. The American Journal of Human Genetics, 85 (1), 130-134. DOI: 10.1016/j.ajhg.2009.06.009

There is a new paper out in the American Journal of Human Genetics about how the X and Y chromosomes might not be as separate as we think, and in fact might undergo regular recombination in certain regions (you can read a press release for the paper here).

Specifically, the paper is a resequencing study of the X and Y chromosome homologues PRKX and PRKY in around 60 individuals, looking for signatures of recombination. In summary: it is an interesting and well-supported paper as far as it goes, but it raises more questions about Y chromosome evolution than it answers.

How Much Health Information Is In A Person’s Genome?

How much information can we get from a genome scan? Many companies, such as 23andMe and deCODE Genetics, sell genetic tests that allow you to determine parts of your DNA sequence; one selling point is that these tests can tell you how susceptible you are to various diseases. But how much can a genome really tell you?

In general, people say ‘not much’, and cite the importance of environmental, social and cultural factors, and our lack of knowledge of disease genetics: these are all valid and important points. But can we put some figures on exactly how much a genome scan can tell us? Can we calculate exactly how much the average person’s predicted probability of getting a disease will change after they get their DNA scanned?

In this post, we will take three important diseases of decreasing rarity, and take all the genetic variants that are known to influence them. We will see exactly how much we expect this information to change someone’s likelihood of getting the disease.
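As a rough illustration of what "changing someone's likelihood" means in practice, the sketch below updates a prior disease risk with a likelihood ratio for a genotype, using the odds form of Bayes' rule. The function name and the numbers are hypothetical and chosen only to show the arithmetic; they are not estimates for any real disease or variant.

```python
def updated_risk(prior_risk: float, likelihood_ratio: float) -> float:
    """Posterior disease risk after observing a genotype.

    prior_risk: risk before the test (e.g. population lifetime risk), in (0, 1).
    likelihood_ratio: P(genotype | disease) / P(genotype | no disease).
    """
    if not 0.0 < prior_risk < 1.0:
        raise ValueError("prior_risk must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior_risk / (1.0 - prior_risk)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical values: the same genotype effect applied to a rare
    # and a common disease.
    for prior in (0.001, 0.01, 0.3):
        print(f"prior risk {prior:.3f} -> risk given genotype "
              f"{updated_risk(prior, 1.5):.4f}")
```

The same likelihood ratio moves the absolute risk much more for a common disease than for a rare one, which is one reason it helps to compare diseases of different prevalence.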

Books for Bioinformatics Beginners

Olaf left a comment asking what books a mathematically competent and generally informed non-geneticist can read to learn about modern genetics. As he notes, there tends to be a lack of books that assume you know the basics but do not assume you have an undergrad degree. You tend to find things that are either of the form “this is Mr Gene, he makes proteins!”, or of the form “a non-Bayesian could infer with certainty an inversion-deletion event had caused this ribosomal disruption, so attached are they to their bootstrapped pseudo-statistics!”.

This sort of request also tends to come from the very large number of undergrads trained in genetics in some classical sense (a mixture of population and functional genetics) who want to get a general understanding of this whole Modern Genomics phenomenon that basically all of genetics is at least partly involved in these days.