Archive for August, 2009

Books for Bioinformatics Beginners

Wednesday, August 19th, 2009

Olaf left a comment asking what books a mathematically competent and generally informed non-geneticist can read to learn about modern genetics. As he notes, there tends to be a lack of books that assume you know the basics but do not assume you have an undergrad degree. You tend to find things that are either of the form “this is Mr Gene, he makes proteins!”, or of the form “a non-Bayesian could infer with certainty an inversion-deletion event had caused this ribosomal disruption, so attached are they to their bootstrapped pseudo-statistics!”.

This sort of request also tends to come from the very large number of undergrads trained in genetics in some classical sense (a mixture of population and functional genetics) who want to get a general understanding of this whole Modern Genomics phenomenon that basically all of genetics is at least partly involved in these days.
(more…)

A Quick Note On Copyright

Monday, August 17th, 2009

Just a quick note. Nick Loman notes that he intends to use material from my Basics: Sequencing series in his undergrad lectures. That is pretty awesome, and I feel an urge to reciprocate by using one of the things he’s blogged about, but given that I teach mathematics on a blackboard, I’m not entirely sure how to do so*.

To clarify, the images and material in those posts, and indeed everything written in this blog, can be used freely for any purpose. I would like it if you would provide a link back here, or mention who created them, but that is by no means required.

* Ohh ooh I’ve got one, a question for my first year Elementary Mathematics for Biologists students:

Question 1

The Sanger Centre owns 42 sequencing machines, of which 2 are 454 and 40 are Illumina. Throughout the rest of the UK, there are 12 Illumina machines, 9 454s, and 3 SOLiDs (1). Perform a chi-squared test of independence to see whether the Sanger Centre has significantly different purchasing priorities from the rest of the UK. Is this test valid in this instance?

(1) According to data found at http://pathogenomics.bham.ac.uk/blog/2009/08/sequencing-in-the-u-k/

Answer 1:

The contingency table is:

       ILMN   454   SLD   TOT
SC       40     2     0    42
UK       12     9     3    24
TOT      52    11     3    66

The expected values are thus:

       ILMN   454   SLD
SC     33.1   7.0   1.9
UK     18.9   4.0   1.1

The chi-squared score is thus ~19.04. This is larger than the 95% critical value of 5.99 for df = 2, so the test would reject independence at the 5% level.

This test is not valid in this case, for two reasons. Firstly, some of the expected values are very low (three of the six are below 5), and thus the normal approximation is unlikely to hold; we should instead use Fisher’s exact test. Secondly, each purchase of a sequencing machine is not independent of the result of the last purchase; you are more likely to buy the same machine again, since you have invested in equipment, software and training for that type of machine.
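
For anyone who wants to check the arithmetic in the answer above, here is a minimal sketch in plain Python. It recomputes the expected values, the chi-squared score and the degrees of freedom from the contingency table. The function name and the "expected count below 5" cut-off are my own assumptions for illustration (the cut-off is a common rule of thumb, not something stated in the question).

```python
# Observed machine counts from the question.
# Rows: Sanger Centre, rest of UK. Columns: Illumina, 454, SOLiD.
observed = [
    [40, 2, 0],
    [12, 9, 3],
]


def chi_squared_independence(table):
    """Return (expected counts, chi-squared statistic, degrees of freedom).

    Hypothetical helper written for this example only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, dof


expected, statistic, dof = chi_squared_independence(observed)

# Assumed rule of thumb: cells with an expected count below 5 make the
# chi-squared approximation questionable.
small_cells = sum(e < 5 for row in expected for e in row)

print([[round(e, 1) for e in row] for row in expected])
print(round(statistic, 2), dof, small_cells)
```

The count of small expected cells is there only to make the first validity objection concrete; it says nothing about the second objection, since non-independence between purchases cannot be seen from the table itself.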

On Lamarck and Trees

Monday, August 17th, 2009

Over at Genetic Future, Daniel MacArthur quotes Joel Parker berating as ‘embarrassing’ biologists who claim that it was Darwin, and not Lamarck, who came up with the idea of an evolutionary tree:

I have noticed many evolutionary biologists making an embarrassing mistake of falsely attributing the first use of the tree analogy to Darwin. This has occurred in numerous documentaries and on websites which I will pass on naming here. Ironically, the earliest use of the tree analogy diagram to depict evolution was published in the year of Darwin’s birth (1809) by Lamarck in his book Philosophie Zoologique (see pg 463, http://tinyurl.com/knt7vr). Lamarck even uses botanical terms (branches and rameaux) to describe the origin of animals with respect to this figure. The figure that is usually cited from Darwin’s notebook is from 1837 (http://tinyurl.com/6hs5uv), a full 8 years after Lamarck’s death. Even with our high admiration for Darwin, we should at least give credit where credit is due, and not forget that much of evolution was becoming understood before Darwin. Explaining the mechanism of natural selection was Darwin’s great contribution.

This is actually largely correct; Lamarck did have a view of evolution that involved what we would now call evolutionary branching, though it was very different from what we now know to be the case. Lamarck deserves to be read and understood as one of the first people to put together a coherent view of evolution.

However, the statement is very wrong in a number of ways. It is far from a mistake to refer to Darwin as the originator of the evolutionary tree, and those of us who do so do so not out of ignorance.
(more…)

Research Interests Translated

Friday, August 14th, 2009

I recently updated the information on my website, and in doing so I decided to produce two versions of my research interests. The first is for other scientists, and the second is a translation for lay people. I would be interested to know how people think this is pitched; is the lay-information too confusing, or is it too simple and patronising?

I think every scientist should try and do this at some point. It is an interesting exercise to see how well you can communicate and summarise the entirety of your research in a way that doesn’t use the shared lingo and knowledge base that you have access to when talking to other scientists. Plus, of course, communicating your work to the world outside of academia is generally A Good Thing.
(more…)

Basics: Sequencing DNA, Part 2

Thursday, August 13th, 2009

This post follows on from my previous post on Sanger sequencing, and is part of an ongoing series that looks at how we take DNA, hidden away in our cell nuclei, and read the sequence of base pairs that make up our genetic code. In this post, we look at the Second Generation Sequencing machines that are currently sequencing thousands of genomes-worth of DNA per year throughout the world.
(more…)

Guest Post at Genetic Future

Tuesday, August 4th, 2009

I haven’t got a new blog post for you at the moment, but I do have a guest post up on Daniel MacArthur’s blog, Genetic Future, over at ScienceBlogs. Check it out:

Guest post: Luke Jostins on the twice-sequenced genome