I am working on a budget for a new study, and I am trying to get a sense of what percent (if any) of the data may need to be discarded once I conduct post-hoc data quality checks. For participants who pass attention checks, what percentage are likely to produce data that needs to be discarded for other reasons (e.g., outlier analysis, invariance/longstring analysis)?
I have never done a statistical analysis of this, and I am just speaking from memory, but my perception is that:
A large proportion complete the tests at a speed at which it is difficult to believe they are answering anything but randomly, yet a study on this issue nevertheless found acceptable test–retest reliability. So I think that quite a lot of Prolific respondents are just very fast.
Prolific respondents are good at getting past IMCs (instructional manipulation checks) that tell them to “click response 5” or the like, even when they are going at breakneck speed, or perhaps not paying as much attention as researchers would like.
Nonsense items that look more like the rest of the questions, such as “I am proud of being a rodent” mixed into a self-esteem scale, can produce about a 5–10% failure rate. I don’t think this necessarily means respondents are not paying attention: they may think it is a spelling mistake, misunderstand, misread, or take it as metaphorical. To maintain quality, I have in the past discarded about 5% of data for this reason.
I have sometimes worried about sock-puppet-style multiple responses from the same person, due to the similarity of free-response items from participants with similar Prolific IDs (when the data are reordered by ID). E.g., for “Name someone you admire”, the only two respondents to mention non-humans were adjacent in such an ordering, with “My dog” and “My cat”. But I have no proof. Prolific uses checks to prevent this, and I know a participant who was ejected from the participant pool just for having a VPN running on their computer.
I would be interested to hear the results of your outlier analysis and/or invariance/longstring analysis (even if I am not sure what they mean).
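For what it’s worth, the “longstring” index usually means the length of a participant’s longest run of identical consecutive answers: a very long invariant run (e.g. straight-lining “3” down a Likert scale) is taken as a sign of careless responding. A minimal sketch, assuming each row is one participant’s ordered Likert responses (the data and threshold here are hypothetical, not from any real study):

```python
def longstring(responses):
    """Return the length of the longest run of identical consecutive answers."""
    if not responses:
        return 0
    longest = run = 1
    for prev, curr in zip(responses, responses[1:]):
        run = run + 1 if curr == prev else 1
        longest = max(longest, run)
    return longest

# Hypothetical participants on a 10-item, 5-point scale:
careful  = [1, 4, 2, 5, 3, 2, 4, 1, 5, 2]
careless = [2, 3, 3, 3, 3, 3, 3, 1, 4, 2]

print(longstring(careful))   # → 1
print(longstring(careless))  # → 6
```

Researchers then flag participants whose longstring exceeds some cutoff (e.g. half the number of items); the cutoff is a judgment call and varies by scale.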
Thank you so much for the thoughtful response, Tim!
This is not really an answer to your question, but here’s a paper that compared the data quality of different online participant pools, including Prolific and MTurk, which might be of interest: Peer, E., Rothschild, D., Gordon, A., et al. Data quality of platforms and panels for online behavioral research. Behavior Research Methods, 54, 1643–1662 (2022).
If I remember correctly, Prolific was one of the platforms that provided the best data quality.