Tag Archives: statistics

ASHG: Rare Variants, and the 1000 Genomes Project

Hello all (it is taking every bone in my body not to say ‘Aloha’ here).

So, today was the first real day of the ASHG Annual Meeting; after accidentally falling asleep for basically all of yesterday, it was good to finally see some familiar faces and sink my teeth into some real science.

I’m going to write a little about the first couple of sessions I’ve seen, and say what sort of themes are being shouted loud enough to get into my jetlagged mind. I have also been tweeting the conference at quite a high frequency (about 30 tweets so far), and in more detail than I have given here; follow me on @lukejostins if you are interested. To see all the ASHG twittering, check out #ASHG2009.

The blog posts over the next few days will be aimed mostly at those who are, at least vaguely, In The Know about genomics. However, if there are people who would like a less jargonistic lowdown of the conference, please leave a comment and I’ll see what I can do.

Unbelief, Class and Bad Statistics

I like to read the Comment is free: Belief section of the Guardian website; it has comment pieces on a pretty large range of religious matters, which satisfies both my “reading about religious affairs” and “reading things that annoy me” urges (in particular, I find a bit of an odd pleasure in reading the regular posts by Andrew Brown, a man who has a wonderful habit of saying things I would probably agree with in such a smug, sneering style that I can’t help but disagree).

Today there is a post up called Religion And Learning: what do we know, by Nick Spencer of the religious thinktank Theos. It follows on from a somewhat substance-light Andrew Brown post claiming that atheism (or ‘new atheism’, or some ill-defined form of non-belief) is becoming a way for the upper middle class to set themselves apart from the Daily Mail-reading working class.

Nick Spencer’s article attempts to shed some light on the relationship between class and religious belief by looking at the relationship between NRS social classes and belief in God, based on some Theos data from their recent Darwin report. The report, unfortunately, doesn’t tell us very much; it says that atheists tend to come from higher social grades (AB), and theists tend to come from lower social grades (DE); this was already well known.

Spencer goes on to look at the social classes of converts; he finds that converts to theism tended to come from roughly the same classes as atheists (ABC), and that converts to atheism tended to come from the same classes as theists (DE). This is unsurprising: converts to theism start out as atheists and converts to atheism start out as theists, so if conversion has nothing to do with class, each group of converts should simply mirror the class make-up of the group it left. In other words, there is no real social indicator of conversion; it is more or less a random process, with atheists becoming theists and vice versa more or less independently of social class. It would be interesting to follow up these converts over longer periods of time, or to break the data down into recent and older conversions, to see whether converting to a religion causes a change in class, but so far we have no evidence for this. So, in conclusion, the data doesn’t tell us anything interesting about how religion relates to class or education beyond what we already knew; if anything, it tells us that class and education aren’t really much of a factor in conversion.
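To make that null model concrete, here is a small simulation sketch. The class shares are invented for illustration (they are not the Theos figures), the NRS grades are collapsed into three bands, and `simulate_converts` is a hypothetical helper: everyone leaves their group with the same probability regardless of class, and the converts still end up looking like the group they came from.

```python
import random

# Hypothetical class shares, NOT the Theos figures: atheists skew towards the
# AB grades and theists towards DE. NRS grades are collapsed into three bands.
ATHEIST_SHARES = {"AB": 0.45, "C1C2": 0.35, "DE": 0.20}
THEIST_SHARES = {"AB": 0.20, "C1C2": 0.40, "DE": 0.40}


def simulate_converts(source_shares, n_people, p_convert, rng):
    """Null model: draw people from a group with the given class shares and
    let each one leave the group with the same probability, whatever their
    class. Returns the class shares among those who converted."""
    classes = list(source_shares)
    weights = [source_shares[c] for c in classes]
    counts = dict.fromkeys(classes, 0)
    for _ in range(n_people):
        cls = rng.choices(classes, weights=weights)[0]
        if rng.random() < p_convert:  # conversion ignores class entirely
            counts[cls] += 1
    total = sum(counts.values())
    return {c: counts[c] / total for c in classes}


rng = random.Random(2009)
to_theism = simulate_converts(ATHEIST_SHARES, 100_000, 0.1, rng)   # ex-atheists
to_atheism = simulate_converts(THEIST_SHARES, 100_000, 0.1, rng)   # ex-theists
print({c: round(s, 2) for c, s in to_theism.items()})    # close to ATHEIST_SHARES
print({c: round(s, 2) for c, s in to_atheism.items()})   # close to THEIST_SHARES
```

Under this model the ex-atheists who become theists carry the atheists’ AB skew with them, which is the pattern Spencer reports; a genuine class effect on conversion would instead show up as converts whose shares differ from those of the group they left.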

Or, that is the conclusion that any non-overreaching person would draw. What is odd is that Nick Spencer uses this essentially null result to argue for some sort of revolutionary change in the nature of atheism:

On a less grand scale, the data suggest that the effect of vocal atheism over the last decade has been to reach successfully into previously uncharted demographic territory (witness The God Delusion’s sales figures) but at the cost of losing some of its intellectual credibility (the critical review of The God Delusion in the London Review of Books, for example).

If this is happening, we might expect to see atheism become increasingly “religious” in its composition if not in its size.

So, according to Nick Spencer, the class distribution of converts that we would expect if there were no relationship between class and conversion is an indicator of a grand, sweeping change in the nature of atheism, and we should all be prepared for atheism to become a ‘religion-like’ mass movement of unthinking godlessness. This is the complete opposite of the conclusion I drew from the same data: that non-belief is carrying on as it always has (as far as I can see, the current bunch of non-believers are no more outspoken or populist than Carl Sagan, Richard Feynman, Bertrand Russell and Thomas Huxley), and there is no particular evidence that anything new is going on in unbelief.

Punishment, Praise and Regression to the Mean

I am currently preparing a paper for publication, and the last author sent it out to a bunch of people for comments. A common complaint was a discrepancy between the run times of the same algorithm in two different parts of the paper. I ran a number of algorithms 12 times each, and then later on in the paper I picked the fastest and re-ran it another 36 times; the average time taken for the fastest algorithm in the second set of runs was significantly slower than in the first. Two different people asked me to fix this, but it isn’t a mistake; it is, of course, regression to the mean. The algorithm I picked as fastest was picked partly because it happened to get lucky runs the first time round, and that luck doesn’t carry over to the re-runs.
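A minimal simulation of the effect, under the simplifying (and hypothetical) assumption that all the algorithms are equally fast and their run times differ only by Gaussian noise. `select_then_rerun` is an illustrative helper, not code from the paper: it picks the algorithm with the lowest mean over 12 runs, re-runs that one 36 times, and the whole exercise is repeated many times to average out the noise.

```python
import random
import statistics


def select_then_rerun(n_algorithms=5, first_runs=12, reruns=36,
                      true_time=10.0, noise_sd=1.0, rng=None):
    """Hypothetical set-up: every algorithm has the SAME true mean run time,
    and each run adds Gaussian noise. Pick the algorithm with the lowest mean
    over `first_runs`, re-run it `reruns` times, and return both means."""
    if rng is None:
        rng = random.Random()

    def run():
        return rng.gauss(true_time, noise_sd)

    first_means = [statistics.fmean(run() for _ in range(first_runs))
                   for _ in range(n_algorithms)]
    fastest_first = min(first_means)  # the "winner" of the first comparison
    rerun_mean = statistics.fmean(run() for _ in range(reruns))
    return fastest_first, rerun_mean


rng = random.Random(42)
trials = [select_then_rerun(rng=rng) for _ in range(2000)]
avg_first = statistics.fmean(t[0] for t in trials)
avg_rerun = statistics.fmean(t[1] for t in trials)
print(f"fastest algorithm, first 12 runs: {avg_first:.2f}")  # noticeably below 10
print(f"same algorithm, next 36 runs:     {avg_rerun:.2f}")  # about 10
```

The winner’s first-round average sits below the true mean of 10 because it was chosen for being low; the re-runs have no such selection acting on them. Real algorithms differ in true speed, which shrinks the effect but does not remove it.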

Anyway, this inspired me to post a very interesting anecdote from the economist Daniel Kahneman, writing about punishment, praise and regression to the mean:

I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of the most seasoned instructors in the audience raised his hand and made his own short speech, which began by conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight cadets. He said, “On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don’t tell us that reinforcement works and punishment does not, because the opposite is the case.” This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback. We measured the distances from the target and could see that those who had done best the first time had mostly deteriorated on their second try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse contingency.
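Kahneman’s coin-toss demonstration can be mimicked in a few lines. This is a sketch under assumed numbers: `coin_toss_demo` is a hypothetical helper in which each throw’s distance from the target is a personal skill term plus luck, with skill and luck equally variable, and no feedback of any kind.

```python
import random
import statistics


def coin_toss_demo(n_participants=1000, skill_sd=1.0, luck_sd=1.0, seed=1):
    """Hypothetical model of Kahneman's demonstration: each throw's distance
    from the target is a fixed personal skill term plus independent luck.
    No feedback is given, so nothing about the first throw changes the second.
    Returns the mean change in distance (second minus first) for the best and
    worst quarters on the first throw; positive means the throw got worse."""
    rng = random.Random(seed)
    skill = [rng.gauss(10.0, skill_sd) for _ in range(n_participants)]
    first = [s + rng.gauss(0.0, luck_sd) for s in skill]
    second = [s + rng.gauss(0.0, luck_sd) for s in skill]

    order = sorted(range(n_participants), key=lambda i: first[i])
    quarter = n_participants // 4
    best, worst = order[:quarter], order[-quarter:]  # small distance = good
    change_best = statistics.fmean(second[i] - first[i] for i in best)
    change_worst = statistics.fmean(second[i] - first[i] for i in worst)
    return change_best, change_worst


praised, scolded = coin_toss_demo()
print(f"best quarter (would be praised):  {praised:+.2f}")  # positive: got worse
print(f"worst quarter (would be scolded): {scolded:+.2f}")  # negative: got better
```

Nobody is praised or scolded in this model, yet the best throwers get worse and the worst get better, which is exactly the pattern the instructor attributed to his feedback.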

Books for Bioinformatics Beginners

Olaf left a comment asking what books a mathematically competent and generally informed non-geneticist can read to learn about modern genetics. As he notes, there tends to be a bit of a lack of books that assume you know the basics but don’t assume you have an undergrad degree. You tend to find things that are either of the form “this is Mr Gene, he makes proteins!”, or of the form “a non-Bayesian could infer with certainty an inversion-deletion event had caused this ribosomal disruption, so attached are they to their bootstrapped pseudo-statistics!”.

This sort of request also tends to come from the very large number of undergrads trained in genetics in some classical sense (a mixture of population and functional genetics) who want to get a general understanding of this whole Modern Genomics phenomenon, in which basically all of genetics is at least partly involved these days.

A Quick Note On Copyright

Just a quick note. Nick Loman notes that he intends to use material from my Basics: Sequencing series in his undergrad lectures. That is pretty awesome, and I feel an urge to reciprocate by using one of the things he’s blogged about, but given that I teach mathematics on a blackboard, I’m not entirely sure how to do so*.

To clarify, the images and material in those posts, and indeed everything written in this blog, can be used freely for any purpose. I would like it if you would provide a link back here, or note who created them verbally, but that is by no means required.

* Ohh ooh I’ve got one, a question for my first year Elementary Mathematics for Biologists students:

Question 1

The Sanger Centre owns 42 sequencing machines, of which 2 are 454 and 40 are Illumina. Throughout the rest of the UK, there are 12 Illumina machines, 9 454s, and 3 SOLiDs (1). Perform a chi-squared test of independence to see whether the Sanger Centre has significantly different purchasing priorities from the rest of the UK. Is this test valid in this instance?

(1) According to data found at http://pathogenomics.bham.ac.uk/blog/2009/08/sequencing-in-the-u-k/

Answer 1:

The contingency table is:

              Illumina   454   SOLiD   Total
Sanger Centre       40     2       0      42
Rest of UK          12     9       3      24
Total               52    11       3      66

The expected values (row total × column total / 66) are thus:

              Illumina   454   SOLiD
Sanger Centre     33.1   7.0     1.9
Rest of UK        18.9   4.0     1.1

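The chi-squared statistic is thus about 19.04. This is well above the 95% critical value of 5.99 for df = (2 - 1)(3 - 1) = 2, so taken at face value the test says the Sanger Centre’s purchases differ significantly from those of the rest of the UK.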
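The arithmetic can be checked with a short, dependency-free Python sketch. `chi_squared_independence` is an illustrative helper; for the p-value it uses the fact that a chi-squared distribution with 2 degrees of freedom is an exponential distribution with mean 2, so no statistics library is needed.

```python
import math

# Rows: Sanger Centre, rest of UK. Columns: Illumina, 454, SOLiD.
OBSERVED = [[40, 2, 0],
            [12, 9, 3]]


def chi_squared_independence(table):
    """Pearson's chi-squared test of independence for an r x c table of counts.
    Returns the statistic, the degrees of freedom and the expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


stat, dof, expected = chi_squared_independence(OBSERVED)
if dof != 2:
    raise ValueError("the closed-form tail below only holds for df = 2")
# With df = 2 the chi-squared distribution is exponential with mean 2, so the
# upper-tail probability is exp(-x / 2) and the 95% point is -2 * ln(0.05).
p_value = math.exp(-stat / 2)
critical_95 = -2 * math.log(0.05)
print(f"chi-squared = {stat:.2f} on {dof} df, p = {p_value:.1e}")
print(f"95% critical value = {critical_95:.2f}")
```

It reproduces the statistic of about 19.04 and the critical value of about 5.99. In practice a library routine such as SciPy’s `scipy.stats.chi2_contingency` would do the same job.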

This test is not valid in this case, for two reasons. Firstly, the expected values are very low (three of the six are below 5, the usual rule-of-thumb minimum), so the chi-squared approximation, which relies on the counts being approximately normal, is unlikely to hold; we should instead use Fisher’s exact test. Secondly, each purchase of a sequencing machine is not independent of the last purchase; you are more likely to buy the same machine again, since you have already invested in equipment, software and training for that type of machine.
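For the first problem, here is a sketch of the exact alternative. Because the table is 2 x 3 rather than 2 x 2, this is the Freeman-Halton extension of Fisher’s exact test; `fisher_exact_2xc` is a hypothetical helper that enumerates every table with the observed margins and uses exact integer arithmetic. It does nothing about the second problem, the non-independence of purchases.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2xc(table):
    """Exact test of independence for a 2 x c table of counts (the
    Freeman-Halton extension of Fisher's exact test). With both sets of
    margins fixed, the first row follows a multivariate hypergeometric
    distribution; the p-value is the total probability of every table that
    is no more likely than the observed one."""
    top, bottom = table
    col_totals = [a + b for a, b in zip(top, bottom)]
    n_top, n = sum(top), sum(col_totals)

    def weight(row):
        # Ways of choosing this first row given the column totals; dividing
        # by comb(n, n_top) turns it into a probability.
        w = 1
        for k, total in zip(row, col_totals):
            w *= comb(total, k)
        return w

    def first_rows(cols, remaining):
        # Every first row with total `remaining` that fits under `cols`.
        if len(cols) == 1:
            if remaining <= cols[0]:
                yield (remaining,)
            return
        for k in range(min(cols[0], remaining) + 1):
            for rest in first_rows(cols[1:], remaining - k):
                yield (k,) + rest

    observed = weight(top)
    # Integer weights make the "no more likely than observed" check exact.
    tail = sum(w for w in map(weight, first_rows(col_totals, n_top))
               if w <= observed)
    return Fraction(tail, comb(n, n_top))


p = fisher_exact_2xc([[40, 2, 0], [12, 9, 3]])
print(f"exact p = {float(p):.1e}")
```

As a sanity check that is easy to do by hand, the 2 x 2 table [[3, 1], [1, 3]] gives a two-sided p of 17/35 with the same function.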

Research Interests Translated

I recently updated the information on my website, and in doing so I decided to produce two versions of my research interests. The first is for other scientists, and the second is a translation for lay people. I would be interested to know how people think this is pitched; is the lay-information too confusing, or is it too simple and patronising?

I think every scientist should try and do this at some point. It is an interesting exercise to see how well you can communicate and summarise the entirety of your research in a way that doesn’t use the shared lingo and knowledge base that you have access to when talking to other scientists. Plus, of course, communicating your work to the world outside of academia is generally A Good Thing.

On the UK’s DNA Database, Part 2

This is the second part of a double post on the UK National DNA Database.

In the first part of this double post I talked about what information the DNA database holds, and who it holds it on. In this second part, I will discuss what this information is used for, what it could be used for in the wrong hands, and how bad this could be.

On Bayes and Me

This post carries on from my previous excursion into Bayesian statistics.

Bayesian Science

A mathematician friend once told me that Bayesian inference is the type of inference that fits in most readily with the scientific method (that being the method I am most prepared to use in the majority of situations). It is true that a Bayesian inference, if done properly, represents a mathematical version of an idealised scientific inference - we have some explicitly stated prior beliefs, based on previous evidence, and we look for data, in the form of experiments or observations, which we combine with those prior beliefs to form an inference. Lovely.
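A minimal worked example of that recipe, assuming the simplest textbook case: an unknown proportion with a Beta prior, updated by binomial data. The helper names and the numbers are hypothetical; the point is only that the prior and the data are both stated explicitly and combined by a fixed rule.

```python
from fractions import Fraction


def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Conjugate update: a Beta(a, b) prior on a success probability, plus
    `successes` out of `trials`, gives a Beta(a + s, b + t - s) posterior."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return prior_a + successes, prior_b + trials - successes


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact as a fraction."""
    return Fraction(a, a + b)


# Hypothetical numbers: previous evidence summarised as a Beta(2, 8) prior
# (mean 0.2); a new experiment then sees 9 successes in 20 trials.
post_a, post_b = beta_binomial_update(2, 8, 9, 20)
print(post_a, post_b)                               # 11 19
print(beta_mean(2, 8), beta_mean(post_a, post_b))   # 1/5 11/30
```

The posterior mean, 11/30 (about 0.37), lies between the prior mean of 0.2 and the observed rate of 9/20 = 0.45, weighted by how much evidence each represents.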

On Eugenics

I hope that you will tolerate this post, as it is mostly me thinking out loud. Thinking out loud, what’s more, about a potentially distressing subject, namely that of the relationship between the history of biology, genetics and statistics (which are, after all, tied tightly together) and eugenics - the project of increasing the fitness of the human gene pool, by controlling the breeding or death rates of various parts of the population.

The problem I have is that many people that I would call my heroes, or at least people along whose intellectual footpaths I wander (Ronald Fisher, Francis Galton, Karl Pearson, John Maynard Keynes) supported the eugenics movement. Am I to assume that all these people, while intellectual giants, were monsters or fools? Can we (you and I, for by embarking on this journey with me you too, kind reader, must shoulder my burden) find where these people went wrong, and what can we learn by looking at those people who shunned eugenics?


On Bayes

This is a backdated post, written more recently than the date claims, in order to give the impression that this blog has History. This is a dirty trick, but a necessary one, and I know that you, gentle reader, will keep my secret safe.

I have a feeling that, in some sense, someone in my position (a position which you, my noble blog adventurer, are likely to learn more of in time) should have on record a position on the Bayesian Issue. I will start by dedicating a post to explaining The Bayesian Issue, and then later write a post on where I stand on it.
