Barking Up the Wrong Tree: Accuracy, Drug Dogs, and the Supreme Court’s Error in Florida v. Harris


February 12, 2016

By: Tyler Jones

As I’ve covered before, drug dogs aren’t as reliable as most law enforcement officers (or courts) believe them to be. In short, recent experiments in controlled settings and analyses of field data have raised serious questions about the reliability and accuracy of drug dogs. In response to these studies, several state courts, including the Florida Supreme Court, imposed more stringent record-keeping requirements before a drug-dog alert could be relied on in a criminal case. Specifically, the courts ruled that police had to maintain a “comprehensive list of the dog’s prior hits and misses”; absent that list, a drug-dog alert could not establish probable cause. While this rule wasn’t perfect, requiring law enforcement to keep detailed records of drug-dog accuracy was an important first step toward filtering out unreliable dogs.

This progress, however, was short-lived. Speaking for a unanimous United States Supreme Court, Justice Kagan struck down the record-keeping requirement. While the decision rested primarily on how courts should weigh the evidence when deciding whether an alert establishes probable cause (the opinion described Florida’s test as “the antithesis of [the appropriate] approach”), it also delved into the science and statistics behind drug dogs. The Supreme Court lambasted the Florida opinion for its “treatment of field-performance records as the evidentiary gold standard when, in fact, such data may not capture a dog’s false negatives or may markedly overstate a dog’s false positives” and went so far as to say that field records are of “relatively limited import” when evaluating accuracy. Instead of field results, the Supreme Court viewed laboratory tests as more accurate, reasoning: “…inaccuracies do not taint records of a dog’s performance in standard training and certification settings”. This preference for laboratory tests led the Supreme Court to hold that, absent conflicting evidence, a court can presume that an alert from a certified drug dog provides probable cause to search.
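The Court’s complaint that field data “may not capture a dog’s false negatives” comes down to which outcomes of a sniff can actually be observed. The minimal sketch below uses hypothetical counts and hypothetical names (`SniffCounts`, `sensitivity`, `ppv` are illustrative, not from any source). In a certification trial the trainer knows where drugs were hidden, so every outcome is recorded; in the field, officers usually search only after an alert, so misses go unrecorded and only the share of alerts that turned up drugs can be computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SniffCounts:
    """Outcome counts for a set of sniffs; None marks an outcome never observed."""
    true_pos: int             # dog alerted, drugs present
    false_pos: int            # dog alerted, no drugs found
    false_neg: Optional[int]  # no alert, drugs present
    true_neg: Optional[int]   # no alert, no drugs present


def sensitivity(c: SniffCounts) -> Optional[float]:
    """P(alert | drugs present). Requires the false negatives."""
    if c.false_neg is None:
        return None  # cannot be computed without knowing the missed finds
    return c.true_pos / (c.true_pos + c.false_neg)


def ppv(c: SniffCounts) -> float:
    """P(drugs present | alert). Requires only the alerts."""
    return c.true_pos / (c.true_pos + c.false_pos)


# Hypothetical certification trial: trainers know where drugs were hidden,
# so every outcome is filled in (20 of the 40 sniffs contain drugs by design).
lab = SniffCounts(true_pos=19, false_pos=2, false_neg=1, true_neg=18)

# Hypothetical field log: sniffs without an alert are never followed up,
# so their outcomes are unknown.
field = SniffCounts(true_pos=30, false_pos=20, false_neg=None, true_neg=None)

print(sensitivity(lab), round(ppv(lab), 3))  # 0.95 0.905
print(sensitivity(field), ppv(field))        # None 0.6
```

Note that the lab PPV in this sketch reflects the even mix of drug and blank samples the hypothetical trial was designed with, not how often drugs turn up among the people and cars a dog sniffs on patrol.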

The Supreme Court erred in two critical areas in this opinion. First, they chose the wrong type of accuracy to examine; second, in doing so, they hamstrung efforts to collect the information necessary to calculate the correct accuracy rate. Depending on one’s purposes, there are myriad ways of measuring accuracy, but for the purposes of this discussion we will focus on only two: sensitivity and positive predictive value (for further discussion, see this summary). When the Supreme Court discussed drug-dog accuracy rates, they really meant drug-dog sensitivity (i.e., the probability that the dog will alert when drugs are present). Sensitivity is easy to measure in a laboratory setting: divide the number of correct alerts by the number of possible correct alerts, that is, the number of samples that actually contained drugs. This is the number drug-dog advocates point to when defending their use, as many dogs have high (e.g., 95% or more) sensitivity rates.

Sensitivity, at first glance, appears to be a valid measure of the evidentiary value of an alert. After all, if a dog has a sensitivity of 95%, doesn’t an alert from that dog give us a 95% chance of finding drugs? This logic, while tempting, is not correct. Sensitivity’s fatal flaw is that it assumes the drugs are always there: it is measured only on samples that contain drugs, so it says nothing about how often the dog alerts when drugs are absent.

A more accurate measure of whether a drug dog’s alert is meaningful is its positive predictive value (i.e., the probability that an alert is a true positive). This number is a bit more difficult to calculate, as it shifts depending on the prevalence of the quarry in the target population, as well as on how often the dog alerts when no drugs are present (see here for the full equation). Because of prevalence, unless we have up-to-the-minute information on the portion of Americans who carry drugs on a given day, we can’t calculate the positive predictive value of an alert in the lab. Thus, we need field data to tell us whether a dog’s alert is actually meaningful.
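The usual way to compute positive predictive value is Bayes’ theorem, which needs the dog’s sensitivity, the prevalence of drugs among the people or cars being sniffed, and the dog’s specificity (how often it stays silent when no drugs are present). The sketch below is a minimal illustration under assumed values: the function name and the 95% sensitivity and 90% specificity are hypothetical, chosen only to show how the same dog’s alerts mean different things as prevalence changes.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """P(drugs present | alert), by Bayes' theorem:

    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_alerts = sensitivity * prevalence               # drugs present and the dog alerts
    false_alerts = (1 - specificity) * (1 - prevalence)  # no drugs, but the dog alerts anyway
    return true_alerts / (true_alerts + false_alerts)


# One hypothetical dog (95% sensitive, 90% specific) deployed against
# populations in which drugs are more or less common.
for prevalence in (0.50, 0.20, 0.05, 0.01):
    ppv = positive_predictive_value(0.95, 0.90, prevalence)
    print(f"prevalence {prevalence:>4.0%}: PPV {ppv:.0%}")
```

With these assumed rates, the PPV falls from about 90% when half of everyone sniffed carries drugs, to about 70% at 20% prevalence, 33% at 5%, and 9% at 1%, even though the dog’s sensitivity never changes.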

This is where the Harris decision is particularly detrimental. By explicitly forbidding courts from requiring field records, the Supreme Court has left lower courts unable to evaluate the positive predictive value of a given dog unless law enforcement collects that data of its own volition. And even when that data is available, the Supreme Court’s ruling has made it clear that field data alone is not sufficient evidence to find a drug dog unreliable. This policy has resulted in appellate rulings such as U.S. v. Green, where a drug dog with a field accuracy rate (i.e., positive predictive value) of only 22% was deemed reliable, and U.S. v. Bentley, where the court openly derided the accuracy of the drug dog in question before begrudgingly accepting its alert as reliable evidence because of the Supreme Court’s ruling in Florida v. Harris.
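Once hit-and-miss records exist, a field accuracy rate like the 22% figure is simple arithmetic. The sketch below assumes a hypothetical log format (one `True`/`False` entry per alert, recording whether drugs were found) and a hypothetical helper `field_accuracy`; the counts are invented to reproduce a 22% rate and are not taken from U.S. v. Green.

```python
from typing import List


def field_accuracy(drugs_found_after_alert: List[bool]) -> float:
    """Field positive predictive value: share of recorded alerts that led to drugs."""
    if not drugs_found_after_alert:
        raise ValueError("no recorded alerts to evaluate")
    return sum(drugs_found_after_alert) / len(drugs_found_after_alert)


# Hypothetical log of 50 field alerts, with drugs found after 11 of them.
# (Illustrative numbers only, not the records from any actual case.)
alert_log = [True] * 11 + [False] * 39

rate = field_accuracy(alert_log)
print(f"field accuracy: {rate:.0%}")                  # field accuracy: 22%
print(f"alerts with no drugs found: {1 - rate:.0%}")  # alerts with no drugs found: 78%
```

Put another way, at a 22% field accuracy rate roughly four out of every five alerts led to a search that found nothing.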

To give the Supreme Court their due, they were correct that field records have their own issues. Accurate alerts may be wrongly categorized as incorrect if drugs are particularly well hidden or were recently consumed, and requiring evidence of a reliable record before admission in court does create a catch-22 for newly trained dogs. However, these are logistical hurdles that can be overcome. The magnitude of these inaccuracies pales in comparison to the potential harm of basing a decision on the wrong form of accuracy, or of trusting that certification by third parties (which currently have no regulatory oversight) is sufficient to ensure reliability.
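One way the misclassification problem could be handled, sketched here under hypothetical assumptions, is a simple sensitivity analysis: recompute the field rate as if some share of the “unconfirmed” alerts had actually been correct (drugs too well hidden to find, or recently consumed). The helper name `adjusted_field_ppv`, the 11/39 split, and the shares tried are all illustrative.

```python
def adjusted_field_ppv(confirmed: int, unconfirmed: int,
                       share_actually_correct: float) -> float:
    """Field PPV if some share of 'unconfirmed' alerts were really correct
    (e.g., drugs too well hidden to find, or recently consumed)."""
    if not 0.0 <= share_actually_correct <= 1.0:
        raise ValueError("share_actually_correct must be between 0 and 1")
    total_alerts = confirmed + unconfirmed
    return (confirmed + share_actually_correct * unconfirmed) / total_alerts


# Hypothetical record: 11 confirmed and 39 unconfirmed alerts (22% as recorded).
for share in (0.0, 0.1, 0.2, 0.5):
    ppv = adjusted_field_ppv(11, 39, share)
    print(f"{share:>4.0%} of misses actually correct -> PPV {ppv:.0%}")
```

Reporting the result as a range (here 22%, 30%, 38%, and 61%) rather than a single number would let a court see how much a dog’s record depends on assumptions about misclassified alerts.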
