Posts Tagged ‘statistics’

Punishment, Praise and Regression to the Mean

Wednesday, September 9th, 2009

I am currently preparing a paper for publication, and the last author sent it out to a bunch of people for comments. A common complaint was a discrepancy between the run times of the same algorithm in two different parts of the paper. I ran a number of algorithms 12 times each, and then later on in the paper I picked the fastest and re-ran it another 36 times; the average time taken by the fastest algorithm in the second set of runs was significantly longer than in the first. Two different people asked me to fix this, but it isn’t a mistake; it is, of course, regression to the mean. The algorithm that came out fastest over 12 runs was probably both fast and a little lucky, and the luck does not carry over to the re-runs.
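The effect is easy to reproduce in a toy simulation. The sketch below is not the benchmark from the paper: the number of algorithms (10), the true run time (10.0) and the noise level are all made-up assumptions, and every algorithm is given exactly the same true speed so that any gap between the two rounds is pure selection on noise.

```python
import random
import statistics


def winner_then_rerun(n_algorithms=10, first_runs=12, second_runs=36,
                      true_time=10.0, noise_sd=1.0, trials=2000, seed=0):
    """Average first-round and re-run times of the apparent fastest algorithm.

    All parameter values are illustrative assumptions, not measured data.
    """
    rng = random.Random(seed)
    first_round, rerun = [], []
    for _ in range(trials):
        # Mean time of each algorithm over the first round of runs.
        means = [
            statistics.fmean(rng.gauss(true_time, noise_sd) for _ in range(first_runs))
            for _ in range(n_algorithms)
        ]
        # Pick the winner purely on its observed first-round mean.
        first_round.append(min(means))
        # Re-run the winner: its true speed is unchanged, only the noise is fresh.
        rerun.append(
            statistics.fmean(rng.gauss(true_time, noise_sd) for _ in range(second_runs))
        )
    return statistics.fmean(first_round), statistics.fmean(rerun)


first, second = winner_then_rerun()
print(f"winner, first round: {first:.2f}")
print(f"winner, re-run:      {second:.2f}")
```

Under these assumptions the winner looks faster in the first round than in the re-run, even though nothing about it has changed. With real algorithms that genuinely differ in speed the gap would be smaller, but it would still point in the same direction.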

Anyway, this inspired me to post a very interesting anecdote from the psychologist Daniel Kahneman, writing about punishment, praise and regression to the mean:

I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of the most seasoned instructors in the audience raised his hand and made his own short speech, which began by conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight cadets. He said, “On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don’t tell us that reinforcement works and punishment does not, because the opposite is the case.” This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback. We measured the distances from the target and could see that those who had done best the first time had mostly deteriorated on their second try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse contingency.
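The coin-tossing demonstration can be imitated in a few lines. This is only a sketch under stated assumptions: the number of participants, the normally distributed miss distances and the choice of "best" and "worst" as the top and bottom quarter are all hypothetical, and skill plays no part at all.

```python
import random


def coin_toss_demo(n_participants=1000, seed=1):
    """Two blind throws per participant; returns how often extremes regress."""
    rng = random.Random(seed)
    # Distance from the target on each try: pure chance, no skill, no feedback.
    tries = [(abs(rng.gauss(0, 1)), abs(rng.gauss(0, 1)))
             for _ in range(n_participants)]
    tries.sort(key=lambda pair: pair[0])  # closest first try comes first
    quarter = n_participants // 4
    best, worst = tries[:quarter], tries[-quarter:]
    # Fraction of the best first-timers who got worse on the second try,
    # and of the worst first-timers who got better.
    best_got_worse = sum(second > first for first, second in best) / quarter
    worst_got_better = sum(second < first for first, second in worst) / quarter
    return best_got_worse, worst_got_better


best_got_worse, worst_got_better = coin_toss_demo()
print(f"best quarter that got worse:   {best_got_worse:.0%}")
print(f"worst quarter that got better: {worst_got_better:.0%}")
```

Nobody is praised or shouted at between the two throws, yet most of the best performers deteriorate and most of the worst improve.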

Books for Bioinformatics Beginners

Wednesday, August 19th, 2009

Olaf left a comment asking what books a mathematically competent and generally informed non-geneticist can read to learn about modern genetics. As he notes, there tends to be a lack of books that assume you know the basics but do not assume you have an undergrad degree. You tend to find things that are either of the form “this is Mr Gene, he makes proteins!”, or of the form “a non-Bayesian could infer with certainty an inversion-deletion event had caused this ribosomal disruption, so attached are they to their bootstrapped pseudo-statistics!”.

This sort of request also tends to come from the very large number of undergrads trained in genetics in some classical sense (a mixture of population and functional genetics) who want to get a general understanding of this whole Modern Genomics phenomenon that basically all of genetics is at least partly involved in these days.
(more…)

A Quick Note On Copyright

Monday, August 17th, 2009

Just a quick note. Nick Loman notes that he intends to use material from my Basics: Sequencing series in his undergrad lectures. That is pretty awesome, and I feel an urge to reciprocate by using one of the things he’s blogged about, but given that I teach mathematics on a blackboard, I’m not entirely sure how to do so*.

To clarify, the images and material in those posts, and indeed everything written in this blog, can be used freely for any purpose. I would like it if you would provide a link back here, or mention verbally who created them, but that is by no means required.

* Ohh ooh I’ve got one, a question for my first year Elementary Mathematics for Biologists students:

Question 1

The Sanger Centre owns 42 sequencing machines, of which 2 are 454 and 40 are Illumina. Throughout the rest of the UK, there are 12 Illumina machines, 9 454s, and 3 SOLiDs (1). Perform a chi-squared test of independence to see whether the Sanger Centre has significantly different purchasing priorities from the rest of the UK. Is this test valid in this instance?

(1) According to data found at http://pathogenomics.bham.ac.uk/blog/2009/08/sequencing-in-the-u-k/

Answer 1:

The contingency table is:

      ILMN   454   SLD   TOT
SC      40     2     0    42
UK      12     9     3    24
TOT     52    11     3    66

The expected values are thus:

      ILMN   454   SLD
SC    33.1   7.0   1.9
UK    18.9   4.0   1.1

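The chi-squared statistic is thus ~19.04. This is larger than the 95% critical value of 6.0 for df = (2 - 1) × (3 - 1) = 2, so taken at face value we would reject the hypothesis that purchasing is independent of institution.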

This test is not valid in this case, for two reasons. Firstly, the expected values are very low (three of the six are below 5), and thus the chi-squared approximation is unlikely to hold; we should instead use Fisher’s exact test. Secondly, each purchase of a sequencing machine is not independent of the result of the last purchase; you are more likely to buy the same machine again, since you have invested in equipment, software and training for that type of machine.
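The arithmetic in Answer 1 can be checked with a short script. This is a minimal sketch in plain Python using the counts from the contingency table; the function name `chi_squared` and the "expected count below 5" threshold (a common rule of thumb, not a hard law) are choices made for the example.

```python
# Observed counts from the contingency table in Answer 1.
observed = {
    "SC": {"ILMN": 40, "454": 2, "SLD": 0},
    "UK": {"ILMN": 12, "454": 9, "SLD": 3},
}


def chi_squared(table):
    """Pearson chi-squared statistic, degrees of freedom and expected counts."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    total = sum(row_tot.values())
    # Expected count under independence: row total * column total / grand total.
    expected = {r: {c: row_tot[r] * col_tot[c] / total for c in cols} for r in rows}
    stat = sum(
        (table[r][c] - expected[r][c]) ** 2 / expected[r][c]
        for r in rows for c in cols
    )
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, expected


stat, df, expected = chi_squared(observed)
# Cells whose expected count falls below the rule-of-thumb threshold of 5.
small_cells = [(r, c) for r in expected for c in expected[r] if expected[r][c] < 5]

print(f"chi-squared = {stat:.2f}, df = {df}")
print("expected counts below 5:", small_cells)
```

The list of small cells is the first objection in the answer made concrete; the second objection, that purchases are not independent of each other, is not something any calculation on this table can detect.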

Research Interests Translated

Friday, August 14th, 2009

I recently updated the information on my website, and in doing so I decided to produce two versions of my research interests. The first is for other scientists, and the second is a translation for lay people. I would be interested to know how people think this is pitched; is the lay-information too confusing, or is it too simple and patronising?

I think every scientist should try to do this at some point. It is an interesting exercise to see how well you can communicate and summarise the entirety of your research in a way that doesn’t use the shared lingo and knowledge base that you have access to when talking to other scientists. Plus, of course, communicating your work to the world outside of academia is generally A Good Thing.
(more…)

On the UK’s DNA Database, Part 2

Friday, May 15th, 2009

This is the second part of a double post on the UK National DNA Database.

In the first part of this double post I talked about what information the DNA database holds, and who it holds it on. In this second part, I will discuss what this information is used for, what it could be used for in the wrong hands, and how bad this could be.
(more…)

On Bayes and Me

Monday, January 5th, 2009

This post carries on from my previous excursion into Bayesian statistics.

Bayesian Science

A mathematician friend once told me that Bayesian inference is the type of inference that fits in most readily with the scientific method (that being the method I am most prepared to use in the majority of situations). It is true that a Bayesian inference, if done properly, represents a mathematical version of an idealised scientific inference - we have some explicitly stated prior beliefs, based on previous evidence, and we look for data, in the form of experiments or observations, and the two are combined to form an inference. Lovely.
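As a toy illustration of "prior beliefs plus data gives an inference", here is a minimal sketch of one of the simplest Bayesian updates: a Beta prior on an unknown success probability, combined with binomial data. The flat Beta(1, 1) prior and the 7-successes-in-10 data are invented for the example, and the helper names are hypothetical.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Explicitly stated prior belief: a flat Beta(1, 1), i.e. no preference.
prior = (1.0, 1.0)
# Hypothetical data: 7 successes and 3 failures.
posterior = update_beta(*prior, successes=7, failures=3)

print("prior mean:    ", beta_mean(*prior))
print("posterior:     ", posterior)
print("posterior mean:", round(beta_mean(*posterior), 3))
```

The prior is written down before the data arrive, and the posterior is just the prior's parameters with the observed counts added on.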
(more…)

On Eugenics

Tuesday, December 30th, 2008

I hope that you will tolerate this post, as it is mostly me thinking out loud. Thinking out loud, what’s more, about a potentially distressing subject, namely the relationship between the history of biology, genetics and statistics (which are, after all, tied tightly together) and eugenics - the project of increasing the fitness of the human gene pool by controlling the breeding or death rates of various parts of the population.

The problem I have is that many people that I would call my heroes, or at least people along whose intellectual footpaths I wander (Ronald Fisher, Francis Galton, Karl Pearson, John Maynard Keynes) supported the eugenics movement. Am I to assume that all these people, while intellectual giants, were monsters or fools? Can we (you and I, for by embarking on this journey with me you too, kind reader, must shoulder my burden) find where these people went wrong, and what can we learn by looking at those people who shunned eugenics?

(more…)

On Bayes

Saturday, December 27th, 2008

This is a backdated post, written more recently than the date claims, in order to give the impression that this blog has History. This is a dirty trick, but a necessary one, and I know that you, gentle reader, will keep my secret safe.

I have a feeling that, in some sense, someone in my position (a position which you, my noble blog adventurer, are likely to learn more of in time) should have on record a position on the Bayesian Issue. I will start by dedicating a post to explaining The Bayesian Issue, and then, later on, write a post on where I stand on it.

(more…)