This is the second part of a double post on the UK National DNA Database.
In the first part of this double post I talked about what information the DNA database holds, and who it holds it on. In this second part, I will discuss what this information is used for, what it could be used for in the wrong hands, and how bad this could be.
What do they do with this information?
So, what can the government do with this information? To start with, what do they actually do with it? Currently, any new crime investigation samples are profiled and then checked against all entries of the database. Likewise, whenever anyone is arrested, their sample is checked against the existing database. This flags up crime-crime connections, and crime-suspect connections.
If the probability of identity given by the manufacturers is to be believed, it is very unlikely that you will get false positives between two properly profiled samples (if there is a 1 in 3 trillion chance of getting a false match for one individual, there is about a 1 in 600,000 chance of getting a false match across a database of 5 million people). However, if the 1 in a billion figure given by the police is correct, the chance of getting a false positive goes up to about 1 in 200; for every 200 checks, you’ll produce an innocent person who matches. That said, it is claimed that there has never been a false match between two fully profiled samples, suggesting that the 1 in a billion figure is an underestimate. It is worth noting that the false positive rate rises as the number of entries increases - if the NDNAD included everyone in the UK, the false positive rate (with the 1 in a billion estimate) would be roughly 1 in 20.
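The arithmetic behind these figures can be sketched in a few lines of Python. This is only a rough check: it assumes that every comparison against the database is independent and has the same random match probability, and the function names and the round figures of 5 million database entries and 60 million UK residents are illustrative assumptions, not official numbers.

```python
import math


def prob_false_match(random_match_probability, database_size):
    """Chance that one profile falsely matches at least one database entry.

    Assumes every comparison is independent with the same match probability.
    """
    # 1 - (1 - p)^n, computed via log1p/expm1 so that a tiny p is not lost to rounding
    return -math.expm1(database_size * math.log1p(-random_match_probability))


def one_in(probability):
    """Express a probability as the N in '1 in N'."""
    return 1.0 / probability


MANUFACTURER_RMP = 1 / 3e12   # "1 in 3 trillion"
POLICE_RMP = 1 / 1e9          # "1 in a billion"
DATABASE_SIZE = 5_000_000     # assumed size of the NDNAD
UK_POPULATION = 60_000_000    # assumed round figure for the whole UK

for label, rmp, size in [
    ("manufacturer figure, current database", MANUFACTURER_RMP, DATABASE_SIZE),
    ("police figure, current database", POLICE_RMP, DATABASE_SIZE),
    ("police figure, whole UK", POLICE_RMP, UK_POPULATION),
]:
    print(f"{label}: about 1 in {one_in(prob_false_match(rmp, size)):,.0f}")
```

Under these assumptions the three cases come out at roughly 1 in 600,000, 1 in 200 and 1 in 17, so the risk of a coincidental match grows almost in proportion to the size of the database.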
Either way, these figures aren’t hugely relevant; a bigger problem is partial samples. Often, crime investigation samples will be in a degraded condition, and only a subset of the STR sites can be profiled. As a result, the false positive rate rises massively, potentially throwing up hundreds of innocent partial matches. Each of these individuals has to be investigated; this may mean innocent people entirely unrelated to the crime being made to give an alibi to the police.
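To see why partial profiles are so much worse, here is a minimal sketch. It assumes that a full profile has 10 STR sites, that each site is equally informative and independent of the others, and that the full-profile match probability is 1 in a billion. All of these are simplifying assumptions made for illustration, not properties of the real profiling system, and the function names are hypothetical.

```python
def per_site_match_probability(full_profile_rmp, total_sites):
    """Match probability of a single site, assuming all sites are equal and independent."""
    return full_profile_rmp ** (1.0 / total_sites)


def expected_chance_matches(full_profile_rmp, total_sites, sites_recovered, database_size):
    """Expected number of database entries matching a partial profile purely by chance."""
    p_site = per_site_match_probability(full_profile_rmp, total_sites)
    return database_size * p_site ** sites_recovered


FULL_PROFILE_RMP = 1e-9      # assumed "1 in a billion" for a complete profile
TOTAL_SITES = 10             # assumed number of STR sites in a full profile
DATABASE_SIZE = 5_000_000    # assumed size of the database

for sites in range(TOTAL_SITES, 3, -1):
    matches = expected_chance_matches(FULL_PROFILE_RMP, TOTAL_SITES, sites, DATABASE_SIZE)
    print(f"{sites:2d} sites recovered: about {matches:,.2f} chance matches expected")
```

With these made-up inputs, a sample with only half of its sites readable would be expected to match over a hundred people in a 5 million entry database by chance alone.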
However, none of these problems are by any means DNA-specific. Checking out everyone who knew a victim, or everyone who worked in a corrupt company, or drove the same colour and make of car as a criminal, all have exactly the same high false-positive problem. There is a sort of instinctive feeling that there is something inherently more invasive, more arbitrary, about being pulled out of a database based on your genetic information, rather than the car you drive or your physical description, but I don’t think it really is. The government already holds a pretty scary amount of information on us (cf. the recent report by the Joseph Rowntree Reform Trust), and investigating people based on genetic data seems much less problematic than investigating people because of their race or social class.
What can they do with this information?
Putting on the Paranoid Hat for a moment: what is the worst the government could do with this data, if they wanted to?
One of people’s biggest general fears about the database is that it holds some vital aspect of ourselves. DNA codes for the processes of life, right? And that means that the government can read our very life code? Is Shami Chakrabarti’s description of the profiles as “most intimate details” correct? More concretely, couldn’t they use the DNA to select people who have genes for disease for a eugenic program, or sell this information to insurance companies?
Well, no, or at least not to any large degree. The STRs themselves are just junk; they do not code for proteins; the sites that I gave in my last post that start with “D” and then a number are just random bits of DNA from somewhere along the chromosome (D21S11 is the 11th such site on chromosome 21). As such, they cannot tell you anything about a person’s biology; perhaps if you had many thousands of them, and they lay on the same chromosome, you could make some predictions using a process called Imputation, but with only one per chromosome, they contain basically no information.
However, FGA, vWA and TH01 all lie within genes, and defects in these genes are associated with various diseases, specifically hemophilia for the first two and schizophrenia and alcohol-withdrawal delirium for the third one (see this paper). However, even if you did create a map between the SGM+ STRs and those diseases, it would not contain anywhere near as much information on a person’s disease risk as you can get by combining their age, sex and ethnicity (the latter of which is often correlated with social class, one of the best predictors of health).
The one idea that actually makes me worried is that the system has essentially two things that it does really well; it contains a record of people who are arrested, and it can link family members together. Thus, it would be a perfect tool for identifying families that tend to be (in some sense) involved in crime. You could find siblings, parents, grandparents, cousins, who tend to have more than their fair share of criminals in their family. Of course these families are likely to be those that have fallen prey to cycles of poverty, or perhaps even just happen to be a Pakistani family in an area with a few racist policemen (remember, these people just have to have been brought in by the police, not convicted or even tried). However, there are plenty of people who believe that these families will be genetically stupid, or violent, and who would act on this belief; perhaps by profiling people who apply for certain jobs or services, and denying them if they match one of these family clusters. Think of the legal opinion of the US Judge Oliver Wendell Holmes, Jr on enforced sterilization: “three generations of imbeciles are enough”.
Conclusions
I originally wrote a somewhat wordy conclusion to this post about the implications of the DNA database, and how it stands in relation to other police databases (such as the PNC, which holds detailed information on all convicted or cautioned felons, or the vehicle registration number database). However, I get the distinct feeling that I’m not really qualified to say whether the DNA database is or is not qualitatively different from other databases (I do not know how many people have access to the various databases, which is a very important consideration). If you want to find out for yourself, I suggest you read Database State, though keep in mind the proposed changes to the database.
I can say that the importance of DNA data in general can be strongly overestimated, and this especially applies to the neutral markers used by the NDNAD. It is not a reading of your soul, and it cannot be used to find your intelligence or health (you couldn’t even do that from your entire genome sequence). I would say that, in terms of learning about you in a general sense, the DNA information is somewhat less important than your Date of Birth, and only slightly more important than the randomly generated database accession code. You get far more information from knowing that you have been arrested in the past, and even more from knowing your demographic data. I would be annoyed at the police taking my DNA information without my consent, but I would be nowhere near as worried as I would be if they recorded my parents’ income.
While I am entirely happy to have the NDNAD used as a rallying point for those who oppose the continued erosion of our civil liberties, we must remember that there are far older, and far more dangerous forms of information collection available to those in power, if they choose to abuse them.
Very good last sentence. I’ve never given a DNA sample (well, not to my knowledge), but there are several large databases floating around that know enough about me to impersonate me well enough.
And as you point out, there is nothing especially *dangerous* that can be done with someone’s DNA, it’s just another piece of information that is held about you, worrying only because it makes you realise just how much information certain organisations hold about you.
I am in complete agreement with you that DNA data is one among many information files that are held by governments and private organisations. I do not agree with you that it is the ‘same’ as information about what car we drive or our physical description. This type of information can be misused and not properly understood by the judiciary and juries. But, as is evidenced by thousands of cases throughout the world, genetic evidence is very poorly understood by the courts and the public.
The first wave of the use of probability theory in jurisprudence was in the 17th and 18th centuries. The probability of a jury reaching the correct verdict given evidence XYZ was greatly discussed. What counted as good evidence in a court changed during this period and our present system is pretty much still based upon it. Evidence that a defendant owned the same car that was identified at the scene of a crime falls within this tradition (even if some numbers are added in the form of ‘there are only X number of cars in the country, thus the probability of it being the defendant’s is Y’). Evidence that DNA matches a defendant does not always fall within this tradition because it is presented by scientists and expert witnesses. In most cases this should not be a problem, indeed it should improve the system, but in some cases judges and juries can be misled by bad science and biased expert witnesses. For sure we have had the standards of DNA evidence kept reasonably high by decisions such as that made by Mr Justice Weir when he rejected low-copy number (LCN) DNA tests in the Omagh judgement of 2007. It is nonetheless possible that a scientist might be able to convince a judge and jury that it was possible to obtain the “ideal conditions” that could produce probabilities of one in 3 trillion. In practice these probabilities are impossible, the taking of DNA evidence being as much an art as an exact science.
I am inclined to agree with the Massachusetts Bar Association that:
DNA evidence is one of the most powerful evidentiary tools to become available in the history of criminal litigation. The evidence demands the highest degree of scrutiny that can be brought to bear and that includes a rigorous review of the highly complex procedures used to create it in each particular case. Anything less creates the risk that our criminal justice system will be converted to one driven by scientific test results, rather than legal principles and common sense.
Another problem is that, as with the Bolam test, bad legal principles can be made when the courts listen to expert witnesses and forget about common sense. This is why we must be extremely circumspect when adopting a powerful new scientific method of collecting and storing evidence. The technical and ethical problems that are raised by the storage of DNA data and all the other data on us are yet another problem. I am pleased to hand that over to Ross Anderson and his colleagues.
I enjoy your blogs - well done.