Olaf left a comment asking what books a mathematically competent and generally informed non-geneticist can read to learn about modern genetics. As he notes, there is a bit of a lack of books that assume you know the basics but do not assume you have an undergrad degree. You tend to find things that are either of the form “this is Mr Gene, he makes proteins!”, or of the form “a non-Bayesian could infer with certainty an inversion-deletion event had caused this ribosomal disruption, so attached are they to their bootstrapped pseudo-statistics!”.
This sort of request also tends to come from the very large number of undergrads trained in genetics in some classical sense (a mixture of population and functional genetics) who want to get a general understanding of this whole Modern Genomics phenomenon that basically all of genetics is at least partly involved in these days.
There will always be a bit of a problem here, and to a certain extent bioinformatics is something you have to learn on the go. I expect virtually no-one had come across Burrows-Wheeler alignment, now pretty ubiquitously used for aligning short-read DNA to a reference genome, in any biological textbook (basically no-one had heard of it 2 years ago). The field grows so fast that books just have no way of keeping up.
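To give a flavour of what sits underneath Burrows-Wheeler alignment, here is a minimal Python sketch of the Burrows-Wheeler transform plus an FM-index style backward search that counts exact matches of a short read in a reference. It is a toy under stated assumptions: the function names are my own, the `$` sentinel is assumed not to occur in the text, and the naive sorting and counting would be far too slow for a real genome. Real aligners also handle mismatches, gaps and match positions, none of which is shown here.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (toy, quadratic memory)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending the transform and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


def count_matches(transformed: str, pattern: str) -> int:
    """Count exact occurrences of pattern in the text using backward search."""
    first_column = sorted(transformed)
    # first_index[c]: row of the first rotation that starts with character c.
    first_index = {}
    for row, c in enumerate(first_column):
        first_index.setdefault(c, row)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in transformed[:i]; a real index would precompute this.
        return transformed[:i].count(c)

    lo, hi = 0, len(transformed)  # half-open range of matching rotations
    for c in reversed(pattern):
        if c not in first_index:
            return 0
        lo = first_index[c] + occ(c, lo)
        hi = first_index[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "GATTACAGATTACA"
    transformed = bwt(reference)
    print(transformed, inverse_bwt(transformed) == reference)
    print(count_matches(transformed, "TTAC"))
```

The point of the backward search is that the read is matched one character at a time against the transformed reference, so the cost depends on the read length rather than on scanning the whole genome.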
However, what I can tell you about are the books that helped me a lot during my difficult transition from “person trained in biology in some vague sense” to “person who works with large-scale sequence data”. (Though for actual biologists thinking of going into genomics, I can also recommend taking an MPhil in Computational Biology.)
Something very useful these days is Google Books, as you can view the contents pages and normally somewhere in the region of 25-50% of the book online, which makes checking whether a book is what you are looking for a whole lot easier. For all the books I talk about below I include a link to the Google Books page.
Genomes 3
For a reference textbook to everything genetic, with a large scope and a pretty large depth, the best book is Genomes 3 by Terry Brown (google, amazon). Definitely worth reading the first few chapters, to get an idea of just how clever genetics can be.
This isn’t a mathematical book (or at least, the bits I’ve read haven’t been), and it is now out of date; the most up-to-date sequencing they mention is 454 pyrosequencing, and they somewhat hilariously state that assembling an entire genome by shotgun sequencing without a reference genome is ‘impossible’. Genomes tends to be updated every 3-4 years, so hopefully we’ll see another one next year.
Biological Sequence Analysis
Biological Sequence Analysis by my co-supervisor Richard Durbin (google, amazon) is a pretty good look at where sequence analysis was 10 years ago (circa 1998), including hidden Markov models of DNA, optimal alignment, constructing phylogenies, and higher-order grammars. It misses out basically all short-read alignment, all assembly and all error handling, basically everything that was built up in response to Second Generation Sequencing. But it is still a very good introduction to sequence analysis for people who know some maths, and some molecular biology, but are not immersed in the bioinformatics world. I got a lot of good practice in Python implementing sequence aligners from this book.
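For anyone wondering what that sort of practice looks like, here is a rough sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch algorithm) with a linear gap penalty. This is my own illustrative version rather than code from the book, and the scoring values (match +1, mismatch -1, gap -1) are assumptions picked for the example, not recommended parameters.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


if __name__ == "__main__":
    best, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(best)
    print(top)
    print(bottom)
```

When several alignments tie for the best score, the traceback above just returns one of them; which one depends on the order in which the three moves are checked.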
Computational Molecular Evolution
Computational Molecular Evolution by Ziheng Yang (google, amazon) is an almost entirely up-to-date text on modeling the evolution of DNA specifically, including a nice treatment of Bayesian analysis and MCMC/MC3. But it is very specific, and doesn’t really contain anything beyond evolutionary modeling.
Statistical Methods for Bioinformatics
Another book that I got some good use out of is Statistical Methods for Bioinformatics (google, amazon); this basically builds up statistics from complete scratch up to the point that you can start doing some pretty powerful informatics. I haven’t read most of it myself; I mostly use it because it has the eigenvectors and eigenvalues of a load of evolutionary transition matrices.
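As a small illustration of why having those eigenvalues to hand is useful, take the Jukes-Cantor model, which I am picking here purely as the simplest example (I am not claiming it is laid out this way in the book). Its rate matrix has every off-diagonal entry equal to alpha and every diagonal entry equal to -3*alpha, so its eigenvalues are 0 and -4*alpha (three times), and the transition probabilities over time t drop out in closed form. The sketch below, with function names of my own choosing, computes that closed form and checks it against a brute-force series for the matrix exponential.

```python
import math


def jc69_rate_matrix(alpha: float = 1.0):
    """Jukes-Cantor rate matrix: alpha off the diagonal, -3*alpha on it."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def jc69_transition_probs(t: float, alpha: float = 1.0):
    """Closed-form P(t) = exp(Qt), using the eigenvalues 0 and -4*alpha."""
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay  # probability the base is unchanged after time t
    diff = 0.25 - 0.25 * decay  # probability of ending at one specific other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def matrix_exp_series(q, t: float, terms: int = 40):
    """Approximate exp(q * t) by summing the first `terms` Taylor series terms."""
    n = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_{k-1} * (q * t) / k
        term = [
            [sum(term[i][x] * q[x][j] for x in range(n)) * t / k for j in range(n)]
            for i in range(n)
        ]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


if __name__ == "__main__":
    closed = jc69_transition_probs(0.5)
    brute = matrix_exp_series(jc69_rate_matrix(), 0.5)
    worst = max(abs(closed[i][j] - brute[i][j]) for i in range(4) for j in range(4))
    print(closed[0])
    print(worst)
```

For more complicated substitution models there is usually no formula this tidy, which is where a table of eigenvalues and eigenvectors (or a numerical decomposition) earns its keep.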
Any others?
I’ve almost certainly missed out a lot of good books here. For those who have also climbed the bioinformatics ladder, what helped you up?
I’ve always been partial to “Bioinformatics: Sequence and Genome Analysis” by Mount. The second edition is already 5 years old, but perhaps a new edition will be forthcoming.
Bioinformatics is the future !!
I think you just have to knuckle down and read the core texts in each major application area. Here, that’s genetics, algorithms, and stats.
I love Dan Gusfield’s book, “Algorithms on Strings, Trees, and Sequences”. While it’s also a bit out of date, it covers roughly complementary topics to Durbin et al., including all the basic sequence alignment algorithms and suffix trees, which form the logical basis of Burrows-Wheeler alignment. But you’ll need to understand something like Cormen et al.’s algorithms book as a prerequisite.
I also loved Alberts et al. “Molecular Biology of the Cell” book. I’m a stats algorithms person who mainly works in computational linguistics, but I found it easy to read. I read this with only high school chemistry as a background. You really have to read the first section of that, or something similar, to understand the problems. And it’s a joy to read!
Nothing out there really helps with the kind of factor models that are interesting for statistical modeling for, say, expression, such as was used in d-Chip for microarrays or some of the RNA-Seq stuff coming out now. My fave book in that area is Gelman and Hill’s “Data Analysis Using Regression and Multilevel/Hierarchical Models”, or at a more advanced level, Gelman et al.’s “Bayesian Data Analysis”, but they have no sequence data, and only the occasional biology example. The Bishop or Hastie et al. machine learning books also cover some of this same material, with the advantage of covering all the non-probabilistic approaches like SVMs. These all presuppose you have basic math stats down, such as found in DeGroot and Schervish or in Larsen and Marx.
The information regarding the bioinformatics books is good. Thank you for this information.