Bruno Verschuere (University of Amsterdam), Ewout Meijer (Maastricht University), Bennett Kleinberg (Tilburg University)
Lie or truth? Exactly that question recently led to a heated debate in the Dutch Parliament when outgoing Prime Minister Mark Rutte claimed amnesia about sidelining a political rival. Several experts deemed the Prime Minister’s amnesia plausible; two others thought that he had been lying, and yet another stated that Rutte did not know he was lying. This lack of consensus among experts illustrates the difficulty of lie detection. Indeed, one of the main difficulties with deception judgements is the overwhelming number of cues that can be considered. When 138 police officers were surveyed about what could be good cues for lie detection, they mentioned 655 cues (1). Understandably, people struggle with selecting, weighting, and combining these often conflicting cues, and perform barely better than chance (2).
One counterintuitive way of dealing with large amounts of information is to ignore most of it (3). Heuristics simplify difficult decisions by relying only on the most diagnostic cues (‘Use the best, ignore the rest’), and have proven useful in diverse settings, from business to emergency rooms (4). In a recent pre-registered and replicated study, we found that the use-the-best heuristic can also be helpful for deception detection. That study shows that relying on one good cue can lead to a better judgement than considering a multitude of cues. Some participants were allowed to consider any possible cue when asked about the veracity of an alibi statement they had just read. Their judgements did not significantly differ for honest and dishonest statements (Cohen’s d = -0.10; see Figure 2). But when participants considered only a single diagnostic cue (statement detailedness), there were significant and large differences (Cohen’s d = 1.15). The obtained lie-truth effect size is remarkably high, particularly because judgements were brief and did not require any technology, practice, or training. It even exceeded the effect size obtained when trained coders count details, the current state of the art in human credibility assessment (d = 0.77). While these findings are very promising, it is vital to examine whether they generalize to other contexts (Study1) and to judging other diagnostic cues (Study2).
Study1 will examine whether the promising findings obtained on alibi statements generalize to statements about autobiographical events. A total of 64 statements were obtained by Bogaard et al. (2014). Senders wrote one statement about a negative event (e.g., having experienced a car crash) they had actually experienced, and one statement about a negative event (e.g., a beloved family member being treated for a severe illness) that they never experienced and fabricated. Each judge will evaluate 16 statements (8 honest, 8 dishonest; 4 different sets of statements). Prolific judges will either be allowed to incorporate any possible cue into their veracity judgements (‘Is this statement truthful?’; ratings from “deceitful” [-100] to “truthful” [+100]), or be led to follow a simple heuristic relying only on a single, diagnostic cue (‘Is this statement detailed?’; ratings from “not detailed” [-100] to “detailed” [+100]). Hence, we use a 2 (Statement Veracity: Honest versus Dishonest Statement; within-subjects) by 2 (Judgement Method: Judge detailedness versus Judge veracity; between-subjects) design.
When people talk about a specific past event, detailedness is one of the most valid (and most investigated) cues to deception (5, 6). But we do not consider statement detailedness to be Pinocchio’s nose. Study2 will examine whether the ‘Use the best, ignore the rest’ heuristic holds for another diagnostic cue (statement clarity). A total of 72 statements were obtained by Verigin et al. (2019). Senders declared to have performed a specific activity (e.g., look for a woman named Michelle at a bus stop), whereas only half of them had actually enacted that activity, and the other half lied about it. Each Prolific judge will evaluate 12 statements (6 honest, 6 dishonest; 12 different sets of statements). Prolific judges will either be allowed to incorporate any possible cue into their veracity judgements, or be led to follow a simple heuristic relying only on statement detailedness or only on statement clarity (‘Is this statement clear, sharp, and vivid?’). Hence, we use a 2 (Statement Veracity: Honest versus Dishonest Statement; within-subjects) by 3 (Judgement Method: Judge detailedness versus Judge veracity versus Judge clarity; between-subjects) design.
Sample size was determined considering two design aspects. First, we wish to obtain reliable estimates for each judgement method. Previous studies were mostly run in the lab, which restricted resources and resulted in underpowered research (5). Simulation studies show that at least 1000 judgements are needed for stable estimates (Tim Levine and Jaume Masip, ‘Want findings that replicate in deception detection research?’, YouTube). Hence, we will collect a minimum of 1000 judgements in each design cell of each study. Second, we wish to obtain robust evidence with regard to the difference between judgement methods (the Statement Veracity by Judgement Method interaction). After testing an initial n=200 (Study1) and n=300 (Study2), we will assess the evidential value of the interaction using the Bayes factor (BF) for the model with the interaction versus the model with main effects only. We stop when the evidential value is high, that is, when the BF is higher than 10 or smaller than 0.10 (each constituting strong evidence in either direction). Otherwise, we collect another round of n=200 (Study1) or n=300 (Study2), and repeat this until high evidential value is obtained or a maximum of n=1000 (Study1) or n=1500 (Study2) is reached. We will thus collect data from 200-1000 participants in Study1 and 300-1500 participants in Study2.
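The sequential sampling plan can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code: the function name `run_sequential_design` and the callback `bayes_factor_at` are hypothetical, and the callback stands in for whatever procedure collects data up to n participants and returns the Bayes factor for the interaction model versus the main-effects-only model. Only the thresholds (10 and 0.10), the batch sizes, and the maximum sample sizes come from the plan described above.

```python
from typing import Callable, Tuple


def run_sequential_design(
    bayes_factor_at: Callable[[int], float],  # hypothetical: BF after n participants
    batch: int,                               # 200 (Study1) or 300 (Study2)
    n_max: int,                               # 1000 (Study1) or 1500 (Study2)
    bf_upper: float = 10.0,                   # strong evidence for the interaction
    bf_lower: float = 0.10,                   # strong evidence against the interaction
) -> Tuple[int, float]:
    """Return (final sample size, final Bayes factor) under the stopping rule."""
    if batch <= 0 or n_max < batch:
        raise ValueError("batch must be positive and no larger than n_max")

    n = 0
    while True:
        n += batch                    # collect one more round of participants
        bf = bayes_factor_at(n)       # assess evidential value of the interaction
        strong_evidence = bf > bf_upper or bf < bf_lower
        budget_exhausted = n + batch > n_max
        if strong_evidence or budget_exhausted:
            return n, bf


# Illustration with made-up Bayes factors (not real results):
# evidence becomes strong after the second round of Study1.
fake_bfs = {200: 2.0, 400: 12.0}
final_n, final_bf = run_sequential_design(lambda n: fake_bfs[n], batch=200, n_max=1000)
```

With the made-up values shown, sampling stops after the second round because the Bayes factor exceeds 10. A Bayes factor that stays between 0.10 and 10 would instead run until the maximum sample size is reached.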
We follow Prolific’s recommended fair pay policy of 7.50£ per hour (hence 2.50£ for the maximum 20-minute study), and additionally reward 5£ to the 10% best performing participants. For Study1 this leads to a maximal cost of 4,000£. For Study2 the maximal cost is 6,000£. Hence, the maximal total cost is 10,000£.
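The budget arithmetic can be checked with a short Python sketch. The hourly rate, study duration, bonus, bonus share, and maximum sample sizes are taken from the text. The text does not state how the study totals are derived from these figures; one reading that reproduces them exactly is a platform service fee of one third on top of all participant payments. That fee rate is an assumption made here for illustration, as is the function name `max_study_cost`.

```python
from fractions import Fraction


def max_study_cost(
    n_max: int,
    hourly_rate_pence: int = 750,             # 7.50 GBP per hour (from the text)
    minutes: int = 20,                        # maximum study duration (from the text)
    bonus_pence: int = 500,                   # 5 GBP bonus (from the text)
    bonus_share: Fraction = Fraction(1, 10),  # 10% best performers (from the text)
    fee_rate: Fraction = Fraction(1, 3),      # ASSUMPTION: not stated in the text
) -> float:
    """Maximal cost in GBP for a study with n_max participants."""
    base_pence = Fraction(hourly_rate_pence * minutes, 60)   # 250 pence = 2.50 GBP
    payments = n_max * base_pence + n_max * bonus_share * bonus_pence
    total_pence = payments * (1 + fee_rate)                  # add the assumed fee
    return float(total_pence / 100)


study1 = max_study_cost(1000)   # 4000.0
study2 = max_study_cost(1500)   # 6000.0
total = study1 + study2         # 10000.0
```

Exact fractions are used instead of floats so that the one-third fee does not introduce rounding error.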
We will preregister the studies on the Open Science Framework before June 30 at https://osf.io/z26ar/.
All materials, code, and data will be made publicly available at https://osf.io/z26ar/.
Ethics approval has been obtained, and archived as 2021-CP-13171.
1. G. Bogaard, E. H. Meijer, A. Vrij, H. Merckelbach, Strong, but wrong: Lay people’s and police officers’ beliefs about verbal and nonverbal cues to deception. PLoS One (2016) https://doi.org/10.1371/journal.pone.0156615.
2. C. F. Bond, B. M. DePaulo, Accuracy of deception judgments. Personal. Soc. Psychol. Rev. (2006) https://doi.org/10.1207/s15327957pspr1003_2.
3. G. Gigerenzer, D. G. Goldstein, in G. Gigerenzer, P. M. Todd, the ABC Research Group, Simple Heuristics That Make Us Smart (1999).
4. G. Gigerenzer, W. Gaissmaier, Heuristic decision making. Annu. Rev. Psychol. 62, 451–482 (2011).
5. T. J. Luke, Lessons from Pinocchio: Cues to deception may be highly exaggerated. 1–30 (2018).
6. B. M. DePaulo, et al., Cues to deception. Psychol. Bull. (2003) https://doi.org/10.1037/0033-2909.129.1.74.