[Proposal] Complex tasks and motivating crowdworkers – an experiment



Crowdsourcing has become increasingly popular in recent years. Per-piece payments are commonly used to motivate crowdworkers to work harder on simple, countable tasks. However, performance-based payments are less suitable for tasks whose quality is difficult to assess objectively. Such tasks include the classification of videos, pictures, or texts, which are often used as training sets for machine learning algorithms. For such tasks, per-piece payments may increase speed but reduce quality.

Because the relationship between the employer and the crowdworker is often short-term, other, better-suited motivational techniques are needed. Our study aims to close the gap in the literature on crowdworker motivation for tasks whose quality is difficult to assess. In our online experiment, the task will be to read and rate newspaper articles. Regarding the motivation of crowdworkers, we will study the use of (i) recognition and appreciation, which, due to the nature of the task, will not be conditioned on the relevant performance, and (ii) positive versus negative framing of the payment. We will use different performance measures as well as measures of crowdworkers’ motivation and subjective job satisfaction to assess the effectiveness of our treatments. To assess performance on the more subjective dimensions, we will make use of the fact that each article will be read by six different crowdworkers, so that we can compare their responses with one another.


Participants perform a task in which they have to read a newspaper article about a natural disaster and answer an array of questions about it. We ask readers about their subjective perceptions of the articles (emotions). We also ask whether specific information is included, for example, individual stories or statistics about deaths and injuries.

We include three attention checks which are easy to answer and have an objectively true answer. For each correctly answered attention check the participant will receive an additional bonus. Each time the participant fails an attention check, they are informed about the wrong answer and about the impact on their bonus payment. The task concludes with an array of questions about subjective perceptions regarding the task that pertain to job satisfaction and worker motivation. The total payment depends on the length of the article and the number of correctly answered attention checks.


We follow a 3x2 between-subjects design. Each article will be read and rated by six participants such that all six treatments are implemented within the same article.
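The assignment rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_cells`, the article identifiers, and the idea of shuffling cells over six "reader slots" per article are hypothetical and not taken from the study's actual implementation.

```python
import itertools
import random

RECOGNITION = ["C", "R", "A"]  # control, recognition, appreciation
FRAMING = ["L", "G"]           # loss domain, gain domain

# The six cells of the 3x2 design, e.g. ("C", "L"), ("C", "G"), ...
CELLS = list(itertools.product(RECOGNITION, FRAMING))


def assign_cells(article_ids, seed=0):
    """Map each article to a shuffled list of the six cells.

    Position i in the list is the cell given to the i-th reader of that
    article, so every article is rated once under every treatment.
    (Hypothetical helper; the real assignment procedure may differ.)
    """
    rng = random.Random(seed)
    plan = {}
    for article_id in article_ids:
        cells = CELLS.copy()
        rng.shuffle(cells)  # randomize which reader slot gets which cell
        plan[article_id] = cells
    return plan


plan = assign_cells(["article_1", "article_2"])
```

Blocking on the article in this way means that treatment comparisons are not confounded with differences between articles.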

Treatment differences in the first dimension concern the inclusion of recognition: Control treatment (C) is the treatment without any specific recognition; treatment R includes four phrases that express recognition of the work done; and treatment A includes four appreciation phrases.

After finishing each of the four pages of questions about the article, participants see a screen with a short phrase. The control group instead sees a blank page for the same amount of time. If participants did not answer the respective attention check correctly, they see a notification that they failed the attention check instead of the screen with a phrase. Note that although the recognition/appreciation is conditional on answering the attention checks correctly (except for the last instance), it is not related to the relevant performance on the actual task, that is, article rating. Therefore, we speak of performance-unrelated recognition.

C Control

  • [Blank pages] x 4

R Recognition

  1. Great work!
  2. You did a good job!
  3. Nice job!
  4. Well done!

A Appreciation

  1. Thank you!
  2. Your help makes a difference!
  3. We appreciate your support!
  4. Your work matters!
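The display rule for the screens shown after each page could be expressed roughly as follows. This is a sketch, not the survey software's logic: the function `screen_after_page` is hypothetical, the placement of the three attention checks on pages 1 to 3 is an assumption (the text only says the last instance is unconditional), and the failure message is shortened to its first sentence.

```python
PHRASES = {
    "C": [""] * 4,  # control: blank page shown for the same duration
    "R": ["Great work!", "You did a good job!", "Nice job!", "Well done!"],
    "A": ["Thank you!", "Your help makes a difference!",
          "We appreciate your support!", "Your work matters!"],
}

FAILED_CHECK = "You did not answer the quality check question correctly."


def screen_after_page(treatment, page, check_passed=None):
    """Return the text shown after question page `page` (1 to 4).

    check_passed is True/False for a page with an attention check and
    None for a page without one (assumed here to be page 4).
    """
    if check_passed is False:
        return FAILED_CHECK  # the notification replaces the phrase
    return PHRASES[treatment][page - 1]
```

For example, `screen_after_page("R", 1, True)` yields the first recognition phrase, while a failed check on the same page yields the notification in every treatment.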

The second dimension consists of positive versus negative framing of the performance-dependent part of the payment.

Treatment L (loss domain)
If you finish the task, you will be paid up to £X/Y/Z.¹ Your final payment depends on you meeting the quality requirements. The quality requirements are based on three simple questions that have a clear and objectively correct answer. These questions are positioned between other questions and tasks. Each time you fail to answer a quality check question correctly, your payment will be reduced by £W. The maximum reduction is thus £3W, in which case you will be paid only £X-3W/Y-3W/Z-3W.

After the first [second] {third} failed attention check:
You did not answer the quality check question correctly. Your final payment of £ [the corresponding amount] will be reduced by £W.

Treatment G (gain domain)
The payment information is reversed: “you will be paid at least,” “your payment will be raised,” “maximum additional payment,” etc.
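A small sketch may help to see how the two frames relate. It assumes that treatments L and G describe the same payoffs and differ only in wording, which the text suggests but does not state explicitly. The function names and the example amounts (in pence) are hypothetical placeholders; the actual values of X, Y, Z, and W are not given here.

```python
def payment_loss_frame(max_pay, w, checks_passed, n_checks=3):
    """Loss frame: start from the maximum and subtract w per failed check."""
    failed = n_checks - checks_passed
    return max_pay - w * failed


def payment_gain_frame(max_pay, w, checks_passed, n_checks=3):
    """Gain frame: start from the minimum and add w per passed check."""
    min_pay = max_pay - n_checks * w
    return min_pay + w * checks_passed


# Hypothetical amounts in pence, for illustration only.
EXAMPLE_MAX = 200
EXAMPLE_W = 10
payoffs = [
    (payment_loss_frame(EXAMPLE_MAX, EXAMPLE_W, k),
     payment_gain_frame(EXAMPLE_MAX, EXAMPLE_W, k))
    for k in range(4)
]
```

Under this assumption, each pair in `payoffs` contains two identical amounts, so any behavioral difference between L and G would be attributable to framing rather than to incentives.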

Main hypotheses

H1 a-f: recognition/appreciation increases worker motivation / job satisfaction / performance

H2: [Overconfidence and inattention] Better performance in A than in R.

Explanation: Too much praise for work done may produce overconfidence, which leads to inattention and thus counteracts the intended motivating effect.

H3: [Work to rule] The probability of mistakes is lower in treatment L than in G for objectively verifiable tasks and higher for less objective ones.

Explanation: In the loss domain, participants are more likely to “work to rule”: tasks that can be objectively validated will be completed correctly, but less effort will be put into creative tasks or tasks that cannot be objectively assessed.


For the additional hypotheses and specification of the outcome variables, see the preregistration documents at the AEA RCT Registry.


We will collect responses from around 4,500 crowdworkers on Prolific.


For any comparison, including interactions, we will have sufficient power (0.8) to detect a small standardized effect size of 0.15 at a significance level of 0.05.
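As a rough plausibility check, the required group size for a simple pairwise comparison can be approximated with the standard normal-approximation formula n = 2(z_{1-α/2} + z_{power})² / d². The sketch below assumes a two-sided comparison of two equally sized groups and an equal split of 4,500 participants over the six cells; neither assumption is stated in the text, and the calculation does not cover interaction tests.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


required = n_per_group(0.15)  # roughly 700 per group
per_cell = 4500 // 6          # assumes equal allocation over six cells
```

Under these assumptions, `per_cell` (750) exceeds `required`, which is in line with the planned sample of around 4,500 participants.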


We expect an average payment of around £1.70, totaling about £10,175 (including a 33% fee).
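The budget figure can be reproduced with simple arithmetic. The sketch assumes exactly 4,500 participants and that the 33% fee is charged on top of the total participant payments; amounts are kept in pence to avoid floating-point rounding.

```python
N_PARTICIPANTS = 4500   # "around 4,500" in the text
AVG_PAY_PENCE = 170     # average payment of about £1.70
FEE_PERCENT = 33        # platform fee, assumed to apply to all payments

payments_pence = N_PARTICIPANTS * AVG_PAY_PENCE
total_pence = payments_pence * (100 + FEE_PERCENT) // 100

print(f"£{total_pence / 100:,.2f}")  # prints £10,174.50
```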

Open Science

We will publish a working paper in several repositories, for example SSRN. We will also share the code and data needed to reproduce our results.


We have obtained IRB approval for this study.

1 We divided the articles into three groups by length. The baseline amounts differ by article length: short/medium/long.