Catching up on my RSS feeds, I came across a post at PolITiGenomics about the European Bioinformatics Institute’s Paul Flicek taking part in one of those ‘I am a person of significance, I use a Mac’ videos:
First the most important bits. At 0:06, THAT’S MY COLLEGE! And at 0:25, THAT’S THE BUILDING I WORK IN! And at 2:24, I EAT THERE! How exciting.
The video is part of Apple’s Science Profiles project, which takes scientists or scientific projects and puts together videos and accounts of what they do and how Apple products help them. The profiles are actually very interesting, are put together in Apple’s trademark style, and are well worth watching.
I am not particularly a Mac fan myself. I know that there are things Macs do very well; I go to the Mac room whenever I want to make 3D models of molecules, and for producing pretty slide-shows or documents in general, the Mac does it better than anyone else. The main reason I don’t own one is that I’ve never been able to justify the expense (either to myself or to the Sanger); my cheapo Linux box does perfectly well, at a tiny fraction of the cost. However, I can understand why people like the out-of-the-box user-friendliness of Macs, and many of these profiles show cases where I can believe Macs are exactly what is needed (for example, the pretty awesome video on reconstructing dinosaur musculature).
[ Warning: this post gets somewhat Computer-geeky from this point on ]
However, the one thing that bugs me in the Flicek video is the bit about the 1000 Genomes Project. I work pretty much solidly on the 1000 Genomes data these days, and have interacted with almost every stage, from sequencing to quality control to analysis. And I have never once encountered a Mac anywhere in the pipeline.
The video has a pretty nice description of what the 1000 Genomes Project does, but because that description appears in a Mac advert, it implies that the 1KG project is in some sense Mac-driven. In fact, the pipeline runs something like this:
1. The DNA is prepared in a wet lab, and is then delivered to the sequencing machines. [No computers]
2. The sequencing machines scan the DNA as images into a computer [The sequencers use custom Dell computers running Windows-based software]
3. The images are copied to a set of servers in the Sanger’s Information Center [Mirrored to a Unix-based Lustre filesystem]
4. The images are then analysed using a set of Linux-based analyzers to extract the sequence, and quality control is done [On an IBM Blade Unix cluster]
5. The sequence is then held in MPSA (Massively Parallel Sequence Archive) [A Unix-based Oracle database]
6. Individual scientists then pull out the data they want and do science on it [The majority of this is done on the Sequencing Farm, a Linux-based computing cluster, or on various other Unix clusters]
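To make the point concrete, here is a minimal sketch that encodes the six steps above as data and asks which of them need a Mac. The `Stage` class, the `stages_using` helper and the simplified platform labels are my own hypothetical choices for illustration, not part of any real Sanger or 1000 Genomes code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stage:
    """One step of the pipeline, with simplified (hypothetical) platform labels."""
    step: int
    description: str
    platforms: frozenset = field(default_factory=frozenset)


# The six steps listed above; step 1 involves no computers at all.
PIPELINE = [
    Stage(1, "Wet-lab DNA preparation"),
    Stage(2, "Sequencer image capture (custom Dell machines)", frozenset({"Windows"})),
    Stage(3, "Mirroring images to the Lustre filesystem", frozenset({"Unix"})),
    Stage(4, "Sequence extraction and QC on the IBM Blade cluster", frozenset({"Linux", "Unix"})),
    Stage(5, "Storage in MPSA (Oracle database)", frozenset({"Unix"})),
    Stage(6, "Analysis on the Sequencing Farm and other clusters", frozenset({"Linux", "Unix"})),
]


def stages_using(platform: str, pipeline: list = PIPELINE) -> list:
    """Return the step numbers whose platform set includes the given platform."""
    return [stage.step for stage in pipeline if platform in stage.platforms]


if __name__ == "__main__":
    for name in ("macOS", "Windows", "Linux", "Unix"):
        print(name, stages_using(name))
```

Running it lists an empty set of steps for `macOS`, which is the whole complaint in one line: the Mac is at most a terminal onto the machines that do the work.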
If a Mac is used at any point in this process, it is just to connect to the various Linux- and Unix-based computers, with the possible exception of the late-stage data analysis, which I suppose is varied enough that it might be done on any platform. In fact, as far as I know, the vast majority of 1000 Genomes work would be entirely implausible to do on a Mac, because the software to manage massive-scale projects like this just does not exist for Macs.
These scientists use the Mac because they find its other features useful or cool (e-mail, document writing, user-friendly desktops, sexy design). But this video seems to imply that the Mac has the tools needed to manage and analyze petabytes of sequence data, which is clearly false; at most, it can act as a nice wrapper around the Unix systems that do the real grunt work.
Is that the Sanger Centre featured in the background, then? I’ve never been there. At first I thought your comment ‘I eat there!’ meant that King’s coffee shop would be randomly featured.
I’m quite proud I understood most of the geeky computery stuff in that.