2014-05-18

Leonard Klie, "Momentum Builds for Voice Stress Analysis in Law Enforcement", Speech Technology Magazine, Summer 2014:

Nearly 1,800 U.S. law enforcement agencies have dropped the polygraph in favor of newer computer voice stress analyzer (CVSA) technology to detect when suspects being questioned are not being honest, according to a report from the National Association of Computer Voice Stress Analysts.

Among those that have already made the switch are police departments in Atlanta, Baltimore, San Francisco, New Orleans, Nashville, and Miami, FL, as well as the California Highway Patrol and many other state and local law enforcement agencies.

The technology is also gaining momentum overseas. "The CVSA has gained international acceptance, and our foreign sales are steadily growing," reports Jim Kane, executive director of the National Institute for Truth Verification Federal Services, a West Palm Beach, FL, company that has been producing CVSA systems since 1988.

How does this stuff supposedly work?

CVSA works by measuring involuntary voice frequency changes that would indicate a high level of stress, as occurs when someone is being deceptive. Muscles in the voice box tighten or loosen, which changes the sound of the voice, and that is what the CVSA technology registers.

"The technology uses proprietary methods to process the vocal input, typically yes or no responses to direct questions," Kane explains. "CVSA analyzes vocal input and identifies responses where stress is either present or absent and provides graphical output for each yes or no response."

Here "proprietary", as far as I can tell, means something like "The original 'voice stress' ideas have been thoroughly debunked both theoretically and practically, so now we won't tell anyone how our products work, so that no one  can test the ideas without buying our stuff and taking our training — and if they do that and fail to find positive results, we can say that it's because they did it wrong…"

In "Analyzing voice stress", 7/2/2004, I complained that I and others had been trying for 30 years to validate the claims behind "voice stress analysis", without even being able to find evidence for the stable measurement (and even the existence) of the features (like variable "micro-tremors" or other "involuntary voice frequency changes") that this technology is supposed to be based on.

How can I make you see how amazing this is? Suppose that in 1957 some physiologist had hypothesized that cancer cells have different membrane potentials from normal cells — well, not different potentials, exactly, but a sort of a different mix of modulation frequencies in the variation of electrical potentials between the inside of the cell and the outside. And further suppose that some engineer cooked up a proprietary circuit to measure and display these alleged variations in "cellular stress" (to the eyes of a trained cellular stress expert, of course), and thereby to diagnose cancer, and started selling such devices to hospitals, and selling training courses in how to use them. And suppose that now, almost half a century later, there is still no documented, well-defined procedure for ordinary biomedical researchers to use to measure and quantify these alleged cell-membrane "tremors" — but companies are still making and selling devices using proprietary methods for diagnosing cancer by detecting "cellular stress" — computer systems now, of course — while well-intentioned hospital administrators and doctors are occasionally organizing little tests of the effectiveness of these devices. These tests sometimes work and sometimes don't, partly because the cellular stress displays need to be interpreted by trained experts, who are typically participating in a diagnostic team or at least given access to lots of other information about the patients being diagnosed.

This couldn't happen. If someone tried to sell cancer-detection devices on this basis, they'd get put in jail.

But as far as I can tell, this is essentially where we are with "voice stress analysis."

In "Speech-based lie detection? I don't think so", 11/10/2011, I cited several studies that tested the then-available versions of such technology, and found that they didn't work: Harry Hollien and James Harnsberger, "Evaluation of two voice stress analyzers", J. Acoust. Soc. Am. 124(4):2458, October 2008; James Harnsberger, Harry Hollien, Camilo Martin, and Kevin Hollien, "Stress and Deception in Speech: Evaluating Layered Voice Analysis", Journal of Forensic Sciences 54(3) 2009; Robert Pool, Field Evaluation in the Intelligence and Counterintelligence Context, National Research Council, 2009. The last reference includes an especially systematic review of the literature.

The only substantive argument in favor of CVSA in the Speech Technology article is that polygraph "lie detection" often fails:

Part of the reason for the growing acceptance of CVSA technology, according to Kane, is the attention now being given to several recent high-profile failures of the polygraph. Former NSA employee and whistle-blower Edward Snowden, for example, reportedly passed two polygraph exams during his tenure with the federal agency.

The article does quote a number of unsubstantiated claims from the CVSA salesman and from an industry lobbyist:

Kane says that compared to polygraphs, CVSA is easier to use; takes less time per exam; is less expensive; yields more positive results; is harder to defeat; has a very low error rate; is noninvasive; and works with voice recordings as well as live interactions.

"As an investigative and decision support tool, CVSA has proven itself to be invaluable to law enforcement," adds Lt. Kenneth Merchant of the Erie, PA, Police Department and legislative director of the National Association of Computer Voice Stress Analysts.

Independent research has tied CVSA technology to an accuracy rate that exceeds 95 percent. Polygraph, Merchant says, "is not nearly as close. Results can be inconclusive, which is not something that you have with CVSA."

No indication is given of what this "independent research" is. A search on Google Scholar for "computer voice stress analysis" turns up only nine hits since 2010:

Five of these are patent applications, which contain no test results.

One is a legal document, "Recommendation to Retain under DOD Control for Guantánamo Detainee, Gha'im Yadel", which mentions that "detainee was given a Computer Voice Stress Analysis, which showed deception on a number of questions", and another is a journalism-school master's project about an unsolved murder, which mentions that "Mallory changed his story after police told him a Computer Voice Stress Analysis test appeared to be a 'deceptive indicator'". These show that authorities sometimes use CVSA, and that the results sometimes succeed in pressuring suspects (which seems to be the main value of such tests).

One is a review article that cites one of the studies showing that CVSA doesn't work ("A well controlled study from the University of Florida, on 70 adult volunteers using computer voice stress analysis (CVSA), indicated that the sensitivity of CVSA was about as accurate as chance (Hollien et al., 2008)").

And the last hit is a book on The Unsolved Mystery of Noah's Ark, which doesn't appear actually to contain any references to CVSA.

So I looked at the web site for the National Institute for Truth Verification: Federal Services ("The World Leader in Voice Stress Analysis"). I couldn't find any references to the "Independent research that has tied CVSA technology to an accuracy rate that exceeds 95 percent". It's possible that this quote is the result of some kind of telephone-game exaggeration chain; but it also may reflect an application of the methodology that I discussed in "Determining whether a lottery ticket will win, 99.999992849% of the time", 8/29/2004.
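For readers who don't want to click through: the lottery-ticket methodology is just base-rate inflation. A sketch with hypothetical numbers (mine, for illustration only):

```python
# With a skewed base rate, a "detector" that never detects anything
# still scores impressively. Hypothetical counts, for illustration.
n_truthful, n_deceptive = 96, 4        # suppose 96% of subjects answer truthfully
always_truthful_correct = n_truthful   # "always say truthful" gets these right
accuracy = always_truthful_correct / (n_truthful + n_deceptive)
print(f"{accuracy:.0%}")               # 96% "accuracy", zero lies detected
```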

I found only one somewhat-relevant thing on the NITV web site – a page entitled "U.S. Air Force Research Lab Study", which says, in its entirety:

A New Study Of Voice Stress Analysis Funded by the National Institute Of Justice and Conducted By The U.S. Air Force Research Lab Establishes VSA’s Accuracy As “Performance approaching that of current polygraph technology.”  

In a highly regarded report presented to the 38th Hawaii International Conference on System Sciences, researchers reported findings that directly contradict both previous and current polygraph-funded studies. These polygraph funded studies, which did not utilize the protocols established by the manufacturers, found the accuracy of voice stress analysis as a truth verification device to be less than chance. The Air Force Lab researchers, using protocols established by the manufacturers of the VSA, were able to determine that VSA technology is, in fact, a viable alternative to the polygraph.

This seems to be a reference to C.S. Hopkins et al., "Evaluation of Voice Stress Analysis Technology", HICSS 2005, which does indeed conclude that "This study has found that VSA technology can identify stress better than chance with performance approaching that of current polygraph systems", though the report adds that "However, [VSA] is not a technology that is mature enough to be used in a court of law".

The study used real-world materials:

The audio data collected consisted of recorded truth verification examinations, where bipolar (Yes/No) responses were given. Ground truth was required to identify deceptive/non-deceptive results. This ground truth typically consisted of a confession and some form of corroborating evidence. In the case of the non-deceptive individuals, a confession or arrest of another person or clearing by other means of investigation was sufficient.

The specific test results cited come nowhere near "an accuracy rate that exceeds 95%":

[Table of test results from Hopkins et al. (2005): counts of analysts' "deceptive" and "truthful" calls against ground truth]
I believe that "Positive" and "Negative" mean "True Positive" and "True Negative". Thus overall, there were 118+198=316 cases where an analyst decided that the subject was lying, and the analyst was wrong 118/316 = 37.3% of the time. There were 127+73=200 cases where a trained analyst decided that the subject was telling the truth, and here the analyst was wrong 73/200 = 36.5% of the time. So these findings might support a claim of 63% accuracy, but certainly not 95%.
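For concreteness, here's the arithmetic as a short script (the variable names, and the true/false-positive reading of the table, are mine, following the interpretation above):

```python
# Error rates implied by the counts in the Hopkins et al. (2005) table,
# reading "Positive"/"Negative" as True Positive/True Negative.
tp, fp = 198, 118    # "deceptive" calls: correct vs. wrong
tn, fn = 127, 73     # "truthful" calls: correct vs. wrong

deceptive_calls = tp + fp    # 316 decisions that the subject was lying
truthful_calls = tn + fn     # 200 decisions that the subject was truthful

print(fp / deceptive_calls)  # ~0.373: wrong 37.3% of the time
print(fn / truthful_calls)   # ~0.365: wrong 36.5% of the time
print((tp + tn) / (deceptive_calls + truthful_calls))  # ~0.630 overall
```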

And it's important to note that these were not automated lie/truth outputs from a machine; they were the interpretations of analysts with a lot of training, and often with decades of law-enforcement experience:

[Table of analyst backgrounds from Hopkins et al. (2005), listing their VSA training and years of law-enforcement experience]
It's not clear from the report how much of the audio recordings the analysts had access to, in addition to whatever parts they put through the VSA systems, but the report's description of VSA methodology goes into considerable detail about "pre-test", "test", and "post-test" procedures, used to establish subject-specific baselines comparable to the methods used in polygraph examinations. So a plausible control might have been to ask experienced law-enforcement interrogators to evaluate truthfulness on a purely (rather than partly) subjective basis.

The NITV web page does not provide a link to a later NIJ-funded study on voice stress analysis. This might be because its headline is "Voice Stress Analysis: Only 15 Percent of Lies About Drug Use Detected in Field Test" (NIJ Journal No. 259, March 2008), and it starts this way:

Law enforcement agencies across the country have invested millions of dollars in voice stress analysis (VSA) software programs.[1] One crucial question, however, remains unanswered:

Does VSA actually work?

According to a recent study funded by the National Institute of Justice (NIJ), two of the most popular VSA programs in use by police departments across the country are no better than flipping a coin when it comes to detecting deception regarding recent drug use. The study's findings also noted, however, that the mere presence of a VSA program during an interrogation may deter a respondent from giving a false answer.

VSA manufacturers tout the technology as a way for law enforcers to accurately, cheaply, and efficiently determine whether a person is lying by analyzing changes in their voice patterns. Indeed, according to one manufacturer, more than 1,400 law enforcement agencies in the United States use its product.[2] But few studies have been conducted on the effectiveness of VSA software in general, and until now, none of these tested VSA in the field—that is, in a real-world environment such as a jail. Therefore, to help determine whether VSA is a reliable technology, NIJ funded a field evaluation of two programs: Computer Voice Stress Analyzer® (CVSA®)[3] and Layered Voice Analysis™ (LVA).

Researchers with the Oklahoma Department of Mental Health and Substance Abuse Services (including this author) used these VSA programs while questioning more than 300 arrestees about their recent drug use. The results of the VSA output—which ostensibly indicated whether the arrestees were lying or telling the truth—were then compared to their urine drug test results. The findings of our study revealed:

Deceptive respondents. Fifteen percent who said they had not used drugs—but who, according to their urine tests, had—were correctly identified by the VSA programs as being deceptive.

Nondeceptive respondents. Eight and a half percent who were telling the truth—that is, their urine tests were consistent with their statements that they had or had not used drugs—were incorrectly classified by the VSA programs as being deceptive.

Using these percentages to determine the overall accuracy rates of the two VSA programs, we found that their ability to accurately detect deception about recent drug use was about 50 percent.

CVSA performed somewhat worse than LVA in these tests, identifying only 8% of the deceptive responses as deceptive, and attributing deception to about 10% of the truthful responses. Here the rate of correctly detected deception was 8%, nowhere near 95%.
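To see how a 15% hit rate and an 8.5% false-alarm rate can nevertheless average out to "about 50 percent" overall accuracy, note that overall accuracy depends on the base rate of deception, which the excerpt doesn't report; we can back it out. A back-of-the-envelope sketch (my algebra, not the study's computation):

```python
# Overall accuracy as a function of the (unreported) base rate of
# deception, given the NIJ study's hit and false-alarm rates.
sensitivity = 0.15           # deceptive answers correctly flagged
false_alarm = 0.085          # truthful answers incorrectly flagged
specificity = 1 - false_alarm

def overall_accuracy(p_deceptive):
    return p_deceptive * sensitivity + (1 - p_deceptive) * specificity

# Back out the base rate that yields the reported ~50% accuracy:
p = (specificity - 0.5) / (specificity - sensitivity)
print(p)                     # ~0.54: roughly half the answers were lies
print(overall_accuracy(p))   # 0.50, as reported
```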

As I wrote back in 2004,

I'm not prejudiced against "lie detector" technology — if there's a way to get some useful information by such techniques, I'm for it. I'm not even opposed to using the pretense that such technology exists to scare people into not lying, which seems to me to be its main application these days. But when a theory about quantitative measurements of frequency-domain effects in speech has been around for half a century, and no one has ever published an equation, an algorithm or a piece of code for making these measurements, and willing and competent speech researchers (like me) can't create reliable methods for making such measurements from the descriptions we find in the literature… something is wrong.

Previous LLOG posts on speech-based lie detection:

"Analyzing voice stress", 7/2/2004

"Determining whether a lottery ticket will win, 99.999992849% of the time", 8/29/2004.

"KishKish BangBang", 1/17/2007

"Industrial bullshitters censor linguists", 4/30/2009 (see especially the comments threads, e.g. here, here, here, here.)

"Speech-based lie detection in Russia", 6/8/2011
"Speech-based lie detection? I don't think so", 11/10/2011
