How Much Health Information Is In A Person’s Genome?

How much information can we get from a genome scan? Many companies, such as 23andMe and deCODE Genetics, sell genetic tests that allow you to determine parts of your DNA sequence; one selling point is that the results can tell you how susceptible you are to various diseases. But how much can a genome really tell you?

In general, people say ‘not much’, and cite the importance of environmental, social and cultural factors, and our lack of knowledge of disease genetics: these are all valid and important points. But can we put some figures on exactly how much a genome scan can tell us? Can we calculate how much the average person’s predicted probability of getting a disease will change after they get their DNA scanned?

In this post, we will take three important diseases of decreasing rarity, and take all the genetic variants that are known to influence them. We will see exactly how much we expect this information to change someone’s likelihood of getting the disease.

To do this, we will look at three kinds of measure. Firstly, two averages: the mean absolute probability difference, which measures how far the average prediction using the genetic information will be from the prediction based only on how common the disease is in the population, and the mean relative risk, which measures how many times bigger or smaller the genetic prediction is on average. Secondly, we will look at ‘low-risk’ and ‘high-risk’ profiles. These are genetic profiles that are lucky or unlucky, but not hugely unlikely; specifically, 10% of people will have genetic profiles at least as ‘extreme’ (in either direction) as these. Finally, we will look at the distribution of genetic risk in the population as a whole.
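As a rough illustration of these measures, here is a minimal Python sketch that computes them from a list of predicted risks. The function name `summarise_risks` is made up for this example, and two details are assumptions rather than anything stated in the post: the "mean relative risk" is taken as the average fold change in either direction (the larger of the ratio and its inverse), and the low-risk and high-risk profiles are taken as the 5th and 95th percentiles, so that 10% of people are at least as extreme when both tails are counted.

```python
import statistics

def summarise_risks(risks, prevalence):
    """Summarise predicted disease probabilities against the population prevalence.

    risks      -- one predicted probability per person (each must be > 0)
    prevalence -- the prediction you would make with no genetic data
    """
    # Mean absolute probability difference: average distance from the prevalence.
    mean_abs_diff = statistics.fmean(abs(p - prevalence) for p in risks)

    # ASSUMPTION: "how many times bigger or smaller" is read as the fold change
    # in whichever direction is larger, averaged over people.
    mean_fold = statistics.fmean(
        max(p / prevalence, prevalence / p) for p in risks
    )

    # ASSUMPTION: the 10% of "extreme" profiles is split evenly between tails,
    # so the low and high profiles are the 5th and 95th percentiles.
    cuts = statistics.quantiles(risks, n=20, method="inclusive")
    low_risk, high_risk = cuts[0], cuts[-1]

    return mean_abs_diff, mean_fold, low_risk, high_risk
```

Given a list of simulated risks for a disease, `summarise_risks(risks, 0.0004)` would return the four summary numbers in the same order as they are quoted for each disease below; the exact definitions used for the published figures may differ from the assumptions above.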

Crohn’s Disease

Crohn’s disease is a type of Inflammatory Bowel Disease; it is a rare autoimmune disorder in which the immune system overreacts to microbes in the intestine wall, attacking tissue and causing inflamed, sore patches to appear throughout the gut. Around 1 in 2500 people suffer from Crohn’s disease in the UK, or around 0.04% of the population.

I have taken data for Crohn’s risk variants from Barrett et al, a recent meta-analysis that replicated or discovered a total of 30 variants associated with Crohn’s. In the absolute worst genetic case, having 2 copies of each of the 30 risk variants, you would have a 13.5% chance of developing the disease, and in the best case, having no copies of the risk variants, you’d have essentially zero chance (though you’d have to be exceptionally unlucky or lucky, respectively, for either of these to be the case).

The average person’s predicted probability of getting the disease will change by around 0.02% in absolute terms, or an average proportional change of around 2.16-fold. The sort of low-risk profile you might expect would have a 0.01% chance of developing the disease, and a high-risk one might have a 0.1% chance. The distribution of risk profiles looks like this:


You lie somewhere within this range, most likely towards the peak in the middle, the mass of people who are neither particularly susceptible nor particularly resistant to Crohn’s disease. However, you may lie within the long tail of people who have a higher-than-normal chance of developing the disease. But even these numbers never really get large enough to be that useful; you may have a 1 in 500, rather than 1 in 2500, chance of getting the disease, but what does that really tell you?

Type I Diabetes

Type I Diabetes, or insulin-dependent diabetes, is a relatively common metabolic autoimmune disorder in which the immune system attacks the pancreas and destroys its ability to produce insulin, meaning that the body is unable to properly regulate blood sugar. In the UK, around 1 in 230 people have the disease (0.44%).

A lot of work has been done on the genetics of Type I Diabetes, and we know of a large number of variants that affect your disease risk. I have taken data from three T1D studies, Todd et al, Cooper et al and Barrett et al. In total, this gives 44 risk alleles: in the worst genetic case, having 2 copies of each of the 44 risk variants, you would have a 31% chance of developing the disease, and in the best case, you’d have a 1 in 35 000 chance (though, once again, these would be extremely unlikely).

The mean absolute probability difference is 0.3% (average relative risk of 2.4-fold). If you have a risk profile towards the disease-resistant end, you could expect to have a 1 in 1250 chance of developing the disease (0.08%), and if you are towards the high end, this can go up to 1 in 77 (1.3%). The distribution of risk profiles is shown below:


In this case, the long tail starts to hit relatively significant figures, going up towards 1 in 50.

Type II Diabetes

Type II Diabetes, or non-insulin-dependent diabetes, is another metabolic disease, in which cells of the body lose the ability to respond fully to insulin, causing misregulation of blood sugar. The incidence in the UK is around 1 in 25 (4%).

The most up-to-date study that I know of is Zeggini et al, who reported 11 Type II Diabetes-associated variants (there may well be newer, fancier studies out there). In the unrealistic worst case, these variants could cause a 28% chance of developing the disease, and in the best case, a 0.64% chance.

The mean absolute probability difference is 1%, and the average relative risk is 1.3-fold. A reasonably lucky profile would have a 1 in 40 (2.5%) chance of developing the disease, and an unlucky one would have a 1 in 12 (8.5%) chance. The risk distribution looks like this:


While the information still isn’t hugely predictive, there is potentially useful information here. An individual with a high-risk profile has a relatively high chance of developing the disease, and if I find out I have such a profile, and I eat a diet rich in sugar, I know that doing so is riskier than I thought.

Does This Mean Anything?

The probability that I will have one of the high- or low-risk profiles that I have mentioned above for a particular disease is 10%. The probability that I will have an extreme profile in at least one of the three diseases is 27%. So there is a real chance that my risk is substantially different from what I would otherwise expect. Obviously, the numbers are still small (a change from 4% to 8% is hardly life-shattering), but they are non-trivial.
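The 27% figure follows from the 10% per-disease figure if we assume that the profiles for the three diseases are independent of one another (an assumption made here for the arithmetic; shared risk variants between diseases would change it slightly). A small Python check, with a made-up function name:

```python
def prob_at_least_one_extreme(p_extreme, n_diseases):
    """Chance of an extreme profile in at least one of n_diseases diseases.

    ASSUMPTION: the profiles for different diseases are independent, so the
    chance of being unremarkable in every disease is (1 - p_extreme) ** n.
    """
    return 1 - (1 - p_extreme) ** n_diseases

# 10% chance per disease, three diseases (Crohn's, Type I and Type II Diabetes).
print(f"{prob_at_least_one_extreme(0.10, 3):.1%}")  # prints 27.1%
```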

One big thing that I have left out here is that people already have a good idea of what their genetic risk for disease is, by knowing their family history. Do any members of your family have Crohn’s disease? If not, then you are almost certainly not in the long tail of susceptible individuals. The genetic information can still tell you additional things, especially if your family is small, but it reduces the amount that the genetic data changes your predictions.

Finally, the probability changes that I have given above are larger than could have been achieved this time last year, and will continue to improve as our understanding of genetics increases. It will be interesting to re-do this once we have started to do sequencing studies on human disease.

6 Responses to How Much Health Information Is In A Person’s Genome?

  3. Nice article on the what-we-know-now. For thoughts on why even the best predictions may not be that exciting, even after we’ve discovered all the relatively common genes implicated in disease, have you seen this paper?

    Clayton, D.G. (2009) Prediction and interaction in complex disease genetics: the experience in type 1 diabetes. PLoS Genetics, 5 (7), e1000540.

  4. Dear Luke,

    I’m working on genetic models for explaining GWAS results, and have found this entry very interesting.
    May I ask how the disease risk distribution you’ve plotted was computed?
    My best guess is that for each of the 3 papers, you extracted for each SNP both its population frequency and its odds ratio for getting the disease.
    So, for example, for Crohn’s disease, you’ve got 30 loci and you know their frequencies. Now you can sample genotype vectors with these 30 loci, and for each such genotype you need to compute the disease risk, so what is your assumption when computing this? Do you just multiply the odds ratios, i.e. assume a multiplicative model? Or are there any other assumptions? I guess without knowing the true genetic architecture and without further assumptions you cannot really know the distribution of disease risks. Any clarifications would be greatly appreciated.


  5. Yes, that’s pretty close to what I did. The only step you missed out was that I took the odds ratios from the papers, which are given as the ratio of odds for the minor allele and the major allele, and converted them to the ratio of odds for the minor allele and the mean population odds (which requires an estimate of the frequency of the disease). I multiplied THESE odds ratios together to give the overall odds ratio.

    Multiplying odds ratios together is valid only if the risks for each variant are independent; however, it seems that most common variants have relatively small interaction terms. From what I’ve seen, I’d expect the odds ratios calculated above to be slight overestimates, but probably pretty close guides.

  6. Thanx a lot, it’s clearer now.

    Would be interesting to compare the prediction power you can get with current loci to the maximum achievable one for each disease (based on heritability, from e.g. twin studies etc.)
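For readers who want to try the calculation discussed in the exchange above, here is a minimal Python sketch of one way it could be done. It is not the code behind the plots in the post. It assumes Hardy-Weinberg genotype frequencies, independent loci and a multiplicative odds model, and it rescales each locus so that its average odds multiplier in the population is 1, which is a simplified stand-in for the conversion to "mean population odds" described in reply 5. All function names and the example loci are made up for illustration; the frequencies and odds ratios are NOT taken from any of the cited studies.

```python
import random

def genotype_risk(genotype, loci, prevalence):
    """Predicted disease probability for one genotype vector.

    genotype   -- risk-allele counts (0, 1 or 2), one per locus
    loci       -- (risk_allele_frequency, per_allele_odds_ratio) pairs
    prevalence -- population frequency of the disease
    """
    odds = prevalence / (1 - prevalence)  # population odds of disease
    for count, (freq, odds_ratio) in zip(genotype, loci):
        # ASSUMPTION: under Hardy-Weinberg, the population-average odds
        # multiplier at this locus is (1 - freq + freq * odds_ratio) ** 2;
        # dividing by it expresses the genotype relative to the mean.
        mean_multiplier = (1 - freq + freq * odds_ratio) ** 2
        odds *= odds_ratio ** count / mean_multiplier
    return odds / (1 + odds)  # convert odds back to a probability

def sample_genotype(loci, rng):
    """Draw two alleles per locus, independently, at the given frequency."""
    return [sum(rng.random() < freq for _ in range(2)) for freq, _ in loci]

def simulate_risks(loci, prevalence, n_people=10_000, seed=0):
    """Sample n_people genotype vectors and return their predicted risks."""
    rng = random.Random(seed)
    return [
        genotype_risk(sample_genotype(loci, rng), loci, prevalence)
        for _ in range(n_people)
    ]

# HYPOTHETICAL loci for illustration only: (risk allele frequency, odds ratio).
example_loci = [(0.30, 1.2), (0.10, 1.5), (0.45, 1.1)]
risks = simulate_risks(example_loci, prevalence=0.04)
```

A histogram of `risks` gives a risk distribution of the kind plotted in the post. Because the rescaling keeps the mean odds (rather than the mean probability) equal to the population value, the average simulated risk comes out slightly below the prevalence; for small effects and rare diseases the gap is minor.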

