Electronic Supplementary Material

The Economics of Altruistic Punishment and the Maintenance of Cooperation

Martijn Egas*†§ and Arno Riedl‡†

* Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, P.O.Box 94084, 1090 GB Amsterdam, the Netherlands

‡ Department of Economics, Maastricht University, P.O. Box 616, 6200 MD Maastricht, the Netherlands

Advantages and potential pitfalls of using the internet for experiments

The current standard for game-theoretical experiments with humans in experimental economics and biology is to invite participants to a computerized laboratory. Participants know that other humans are present, but each participant generally interacts with the others through a computer interface from within a cubicle (to maintain anonymity and to prevent disruption of the experimental conditions by conferring, teaming up, etc.). At the same time, the experimenter can answer individual participants’ questions about the experimental procedure and make sure that participants stick to the rules of the experiment and do not quit before it is finished.

Experimental designs using the internet allow much larger-scale experiments, with many more participants than can fit in a laboratory. They also enlarge the subject pool by facilitating the participation of non-students, thereby contributing to the external validity of experimental results. These advantages come at a potential cost: the experimenter necessarily has to give up some control over the participants. However, measures can be taken to ensure anonymity, prevent teaming up, and allow re-entry to the experiment if the internet connection is lost. The design of our software and the design and procedures of our experiment allowed us to reach a level of control very close to that in a laboratory.

First, the client software allowed participants who accidentally lost their connection to log in again and continue where they had left off. Had a participant quit during the six rounds of the experiment, the software was programmed to make the theoretically predicted decisions of no contribution and no punishment on that participant’s behalf, so that the other 17 participants in the group could still continue. Fortunately this never happened. A small number of participants quit the experiment at the end of the instructions phase, but before entering the waiting queue (on average one participant per session).

Second, the experimental procedures made it virtually impossible for participants to plan to team up or confer with each other. Participants were unaware of the actual content of the experiment until they participated, precluding any planning ahead. Furthermore, potential participants had to indicate two preferred sessions when they subscribed for the experiment and, provided they were randomly selected to participate (with probability 0.25), were assigned to one of these two. They were informed about the assignment to a session no more than 24 hours in advance. Finally, in each session participants were allocated to groups of 18 via waiting queues. Participants were not told that there were several groups of 18 per session. Hence, even participants who knew each other and were lucky enough to be invited to the same session (which could occur with probability 0.125 only) had no possibility of knowing whether they were in the same group, and most likely they were not.

Third, people should not participate more than once, even in different treatments, because the experience gained may influence their behaviour. Since potential participants subscribed with only a name and an e-mail address, it is possible that individuals subscribed several times using different names and e-mail addresses, despite our warning that this was not allowed for scientific reasons. (A few individuals subscribed twice under the same name but with different e-mail addresses; in such cases one of the two subscriptions was deleted from the records.) We were not able to ensure beforehand that nobody cheated on this rule by subscribing several times. However, our procedure of random selection and random allocation to sessions made it very unlikely that a participant actually participated more than once (the probability of this event is roughly 1/16 for participating twice, decreasing fourfold with each additional participation). In hindsight we are quite sure that nobody participated more than once: to be paid their earnings, participants had to fill in their bank account details at the end of the experiment, and these records contain no duplicate bank accounts or duplicate names.
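The probabilities quoted in the parenthesis above can be illustrated with a short calculation. This is only a sketch under an assumption of ours: each subscription is treated as being selected independently with the stated probability of 0.25. The function name is hypothetical and is not part of the experimental software.

```python
from fractions import Fraction

# Selection probability per subscription, as stated in the text.
SELECTION_PROB = Fraction(1, 4)


def prob_all_selected(n_subscriptions: int) -> Fraction:
    """Probability that all n subscriptions of one person are selected.

    Assumption (ours): selections are independent across subscriptions.
    """
    if n_subscriptions < 1:
        raise ValueError("n_subscriptions must be at least 1")
    return SELECTION_PROB ** n_subscriptions


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n} participations: {prob_all_selected(n)}")
```

Under this assumption, each additional participation multiplies the probability by 1/4, which is the fourfold decrease mentioned in the text.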

Fourth, it is good experimental practice to ensure that the participants are anonymous to the experimenter. In the laboratory, participants get a fake ID and are paid on the spot, so that the experimenter has no way of linking the decisions of a participant to a name on the participant list. In this internet experiment, participants logging into the experiment were assigned an ID number that was in no way coupled to the subscription details. We paid our participants through their bank accounts. Participants had to fill in the details of their bank accounts at the very end of the experiment, after the six rounds and after the questionnaire. The software made a list of bank details without any information on participants’ ID numbers or subscription details, so that the experimenters have no way of linking a participant’s decisions to an identity. All participants were informed about this practice beforehand.

Fifth, in the laboratory, individual participants can ask for clarification of the experimental procedure if they do not understand the instructions or fail to answer a control question correctly. To allow such questions in our internet experiment, a chat box was built into the software, through which participants could put questions to the experimenters during the instructions and control questions phase.

Finally, as in any lab experiment, there is the possibility that participants get the impression they are interacting with a computer instead of other human beings. It is well known that humans make different decisions when interacting with a computer than with other humans, so it is important to avoid such an impression. We can think of nothing to alleviate this potential pitfall other than making clear in the instructions that the interactions are between humans (as we did) and appealing to other scientists not to use deception in their experimental procedures. Fortunately, the fact that the results of our standard treatment do not differ much from the results of similar treatments in laboratories strongly indicates that this was not a problem in our experiment.

Socio-economic characteristics of the subject pool

The socio-economic characteristics of the participants differ clearly from those of a student population, to which participants of controlled laboratory experiments usually belong. The subject pool is a reasonable reflection of Dutch society, although it also reflects the fact that the experiment was conducted via the internet.

There are relatively few female participants (28%). Participants’ ages range from 12 to 80, with an average of 34.6 and a median of 33 (Figure S1a). A clear majority (58%) of our participants is either employed or self-employed, whereas only 29% is still in training (pupils, college and university students). The remaining subjects (13%) are either not employed, retired, or not covered by any of these categories (Figure S1b). The (gross) income distribution reflects that a sizeable fraction (those still in training) has a low income. The modal income is in reasonable agreement with that of the Netherlands (roughly 2000 euro per month; Figure S1c).

The majority of participants (65%) does not have any children (likely due to the overrepresentation of younger, highly educated adults; Figure S1d). Nevertheless, 83% share a household with other people (Figure S1e) and virtually everybody has at least one sibling (Figure S1f). Also, only 10% of the participants did not vote in the recent national elections (88% answered “yes” to this question). The distribution of participants over the political parties shows a left-liberal bias compared to the outcome of the respective election (Figure S1g).

Figure S1

Regression analysis

Legend to the following regression tables S1-S6:

“Txydummy” represents the intercept of treatment “xy”; “neg.dev” (“pos.dev”) is the negative (positive) deviation in contribution of the punished participant; “neg.dev*Txydummy” (“pos.dev*Txydummy”) is an interaction term of “neg.dev” (“pos.dev”) with treatment “Txy” and reflects the marginal change of the dependent variable with “neg.dev” (“pos.dev”) in this treatment; “own contribution” is the contribution of the punishing participant; “tot.contrib.others” is the total contribution of both other group members; and “roundzdummy” is a dummy variable for round “z”. Robust standard errors are corrected for possible dependencies of observations within each independent group of 18 participants.
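To make the legend concrete, the sketch below shows one way the treatment dummies and interaction terms could be built for a single observation. It is illustrative only. We assume that the deviation is supplied as a signed number (negative when the punished participant contributed less than the reference level) and that “neg.dev” and “pos.dev” enter as non-negative magnitudes; the reference level itself is not specified in this legend, and the function names are hypothetical.

```python
from typing import Dict, Tuple

TREATMENTS = ("T13", "T11", "T33", "T31")


def split_deviation(deviation: float) -> Tuple[float, float]:
    """Split a signed deviation into (neg_dev, pos_dev), both non-negative.

    Assumption (ours): a negative deviation means the punished participant
    contributed less than the reference level.
    """
    neg_dev = max(0.0, -deviation)
    pos_dev = max(0.0, deviation)
    return neg_dev, pos_dev


def interaction_terms(treatment: str, deviation: float) -> Dict[str, float]:
    """Build treatment dummies and deviation interaction terms for one row."""
    if treatment not in TREATMENTS:
        raise ValueError(f"unknown treatment: {treatment}")
    neg_dev, pos_dev = split_deviation(deviation)
    row: Dict[str, float] = {}
    for t in TREATMENTS:
        active = 1.0 if t == treatment else 0.0  # treatment dummy
        row[f"{t}dummy"] = active
        row[f"neg.dev*{t}dummy"] = neg_dev * active
        row[f"pos.dev*{t}dummy"] = pos_dev * active
    return row


if __name__ == "__main__":
    for name, value in interaction_terms("T11", -4.0).items():
        print(f"{name}: {value}")
```

Because exactly one treatment dummy is active per observation, each interaction coefficient can be read as the slope within its own treatment, as described in the legend.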

Table S1

Tobit regression of allocated punishment points on the deviation in contribution.

Number of obs. = 8424
Wald Χ2(19) = 1708.25
Prob > Χ2 = 0.0000
Log pseudo-likelihood = -6966.3362
(standard errors adjusted for clustering on replicated groups)
Dependent variable: allocated punishment points by punisher
Variable / Coef. / Robust Std. Err. / z / P>|z|
T31dummy / -6.7610 / 0.6720 / -10.0600 / 0.0000
T33dummy / -4.2537 / 0.5462 / -7.7900 / 0.0000
T11dummy / -2.9684 / 0.5233 / -5.6700 / 0.0000
T13dummy / -1.3385 / 0.5216 / -2.5700 / 0.0100
neg.dev*T31dummy / 0.5989 / 0.0631 / 9.4900 / 0.0000
neg.dev*T33dummy / 0.5107 / 0.0405 / 12.6000 / 0.0000
neg.dev*T11dummy / 0.5556 / 0.0325 / 17.1000 / 0.0000
neg.dev*T13dummy / 0.5561 / 0.0303 / 18.3600 / 0.0000
pos.dev*T31dummy / -0.1967 / 0.0622 / -3.1600 / 0.0020
pos.dev*T33dummy / -0.2131 / 0.0572 / -3.7300 / 0.0000
pos.dev*T11dummy / -0.2405 / 0.0268 / -8.9900 / 0.0000
pos.dev*T13dummy / -0.1877 / 0.0360 / -5.2100 / 0.0000
own contribution / -0.2843 / 0.0226 / -12.5900 / 0.0000
tot.contrib.others / 0.0728 / 0.0111 / 6.5400 / 0.0000
round2dummy / -0.1728 / 0.1840 / -0.9400 / 0.3480
round3dummy / -0.4697 / 0.2277 / -2.0600 / 0.0390
round4dummy / -0.6206 / 0.2259 / -2.7500 / 0.0060
round5dummy / -0.5837 / 0.2697 / -2.1600 / 0.0300
round6dummy / -1.1008 / 0.2254 / -4.8800 / 0.0000

Punishment points (PPs) dealt out increase significantly with the deviation in contribution (p<0.01 in all treatments). The slopes of the regression lines in Figure 2a are not significantly different from each other (neg.dev*T13: 0.5561, neg.dev*T11: 0.5556, neg.dev*T33: 0.5107, neg.dev*T31: 0.5989; Χ2=1.84, p=0.6064, joint test). The intercepts are significantly different and have the order T13dummy>T11dummy=T33dummy>T31dummy (T13 vs. T11, Χ2=7.81, p=0.0311; T11 vs. T33, Χ2=4.44, p=0.2111; T33 vs. T31, Χ2=19.12, p=0.0001; p-values Bonferroni adjusted for multiple comparisons). The estimated deviation thresholds (TH) up to which a deviation in contribution goes unpunished have the order TH(T13=2.41)<TH(T11=5.34)<TH(T33=8.33)<TH(T31=11.29) (T13 vs. T11, Χ2=13.29, p=0.0016; T11 vs. T33, Χ2=9.49, p=0.0124; T33 vs. T31, Χ2=8.08, p=0.0268; p-values Bonferroni adjusted for multiple comparisons). PPs dealt out to participants with a higher contribution (so-called “counterintuitive punishment”) increased significantly but only slightly with negative deviation in contribution (the pos.dev*Txydummy values in the table give the slopes of the regression lines). For T13, these results compare very well with those of Fehr & Gächter (2002): 28% of total punishment acts were “counterintuitive”, and the coefficient for negative deviation is -0.19 (coefficient for positive deviation in their notation). Counterintuitive punishment acts as percentages of total punishment acts for the other three treatments were 22.3 for T11, 18.5 for T33, and 13.1 for T31.
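The deviation thresholds reported above can be recovered from the coefficients in Table S1. As a minimal sketch, we assume that the threshold is the negative deviation at which the fitted linear index crosses zero, with all other regressors set to zero (round 1 as baseline), so that TH = -intercept / slope. This reading is our assumption, and the function name is hypothetical.

```python
# (intercept, slope) pairs copied from Table S1:
# intercept = Txydummy, slope = neg.dev*Txydummy.
TABLE_S1 = {
    "T13": (-1.3385, 0.5561),
    "T11": (-2.9684, 0.5556),
    "T33": (-4.2537, 0.5107),
    "T31": (-6.7610, 0.5989),
}


def deviation_threshold(intercept: float, slope: float) -> float:
    """Deviation at which intercept + slope * deviation equals zero."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return -intercept / slope


thresholds = {
    treatment: deviation_threshold(intercept, slope)
    for treatment, (intercept, slope) in TABLE_S1.items()
}

if __name__ == "__main__":
    for treatment, th in thresholds.items():
        print(f"TH({treatment}) = {th:.2f}")
```

Rounded to two decimals, these ratios agree with the thresholds quoted in the text (2.41, 5.34, 8.33 and 11.29), which supports this reading of how they were obtained.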

Table S2

Logit regression of the likelihood to punish on the deviation in contribution.

Number of obs. = 8424
Wald Χ2(19) = 2895.03
Prob > Χ2 = 0.0000
Log pseudo-likelihood = -3529.5275
(standard errors adjusted for clustering on replicated groups)
Dependent variable: punishment (yes = 1)
Variable / Coef. / Robust Std. Err. / z / P>|z|
T31dummy / -2.5569 / 0.2459 / -10.4000 / 0.0000
T33dummy / -1.4091 / 0.2007 / -7.0200 / 0.0000
T11dummy / -0.9861 / 0.2041 / -4.8300 / 0.0000
T13dummy / -0.1774 / 0.2166 / -0.8200 / 0.4130
neg.dev*T31dummy / 0.2444 / 0.0226 / 10.8000 / 0.0000
neg.dev*T33dummy / 0.2088 / 0.0171 / 12.2500 / 0.0000
neg.dev*T11dummy / 0.1982 / 0.0153 / 12.9600 / 0.0000
neg.dev*T13dummy / 0.2114 / 0.0200 / 10.5400 / 0.0000
pos.dev*T31dummy / -0.1193 / 0.0413 / -2.8900 / 0.0040
pos.dev*T33dummy / -0.1125 / 0.0333 / -3.3700 / 0.0010
pos.dev*T11dummy / -0.1207 / 0.0188 / -6.4100 / 0.0000
pos.dev*T13dummy / -0.0974 / 0.0159 / -6.1100 / 0.0000
own contribution / -0.1241 / 0.0122 / -10.1700 / 0.0000
tot.contrib.others / 0.0296 / 0.0052 / 5.7000 / 0.0000
round2dummy / -0.0977 / 0.0878 / -1.1100 / 0.2660
round3dummy / -0.2621 / 0.0948 / -2.7600 / 0.0060
round4dummy / -0.3137 / 0.1010 / -3.1100 / 0.0020
round5dummy / -0.3625 / 0.1168 / -3.1000 / 0.0020
round6dummy / -0.5949 / 0.0997 / -5.9700 / 0.0000

The likelihood that a deviating participant is punished increases significantly with the deviation in contribution in all treatments (p<0.001), but this marginal effect is not significantly different across treatments (neg.dev*T13: 0.2114, neg.dev*T11: 0.1982, neg.dev*T33: 0.2088, neg.dev*T31: 0.2444; Χ2=4.29, p=0.2319, joint test). The estimated intercepts follow the order T13dummy>T11dummy=T33dummy>T31dummy, where inequalities indicate statistically significant differences (T13 vs. T11, Χ2=10.84, p=0.0059; T11 vs. T33, Χ2=2.89, p=0.5335; T33 vs. T31, Χ2=17.96, p=0.0001; p-values Bonferroni adjusted for multiple comparisons).
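To illustrate how the logit coefficients in Table S2 translate into punishment probabilities, the sketch below evaluates the logistic function for a given negative deviation. As a simplifying assumption of ours, all other regressors (own contribution, tot.contrib.others, round dummies) are set to zero, so the output is illustrative rather than a prediction for any actual observation. The function name is hypothetical.

```python
import math

# (intercept, slope) pairs copied from Table S2:
# intercept = Txydummy, slope = neg.dev*Txydummy.
TABLE_S2 = {
    "T13": (-0.1774, 0.2114),
    "T11": (-0.9861, 0.1982),
    "T33": (-1.4091, 0.2088),
    "T31": (-2.5569, 0.2444),
}


def punish_probability(treatment: str, neg_dev: float) -> float:
    """Logistic probability of punishment for a given negative deviation.

    Assumption (ours): all other regressors are held at zero.
    """
    intercept, slope = TABLE_S2[treatment]
    linear_index = intercept + slope * neg_dev
    return 1.0 / (1.0 + math.exp(-linear_index))


if __name__ == "__main__":
    for treatment in TABLE_S2:
        probs = [punish_probability(treatment, d) for d in (0.0, 5.0, 10.0)]
        print(treatment, " ".join(f"{p:.3f}" for p in probs))
```

With similar slopes across treatments, the differences between the curves are driven mainly by the intercepts, which is the pattern described in the text.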

Table S3

Tobit regression of the effect of punishment on the deviation in contribution.

Number of obs. = 8424
Wald Χ2(19) = 1135.85
Prob > Χ2 = 0.0000
Log pseudo-likelihood = -8260.3915
(standard errors adjusted for clustering on replicated groups)
Dependent variable: received punishment points by punished
Variable / Coef. / Robust Std. Err. / z / P>|z|
T31dummy / -14.7099 / 1.4231 / -10.3400 / 0.0000
T33dummy / -8.6045 / 1.1331 / -7.5900 / 0.0000
T11dummy / -7.1948 / 1.1163 / -6.4500 / 0.0000
T13dummy / -1.9414 / 1.1255 / -1.7200 / 0.0850
neg.dev*T31dummy / 1.2315 / 0.1387 / 8.8800 / 0.0000
neg.dev*T33dummy / 1.1706 / 0.1259 / 9.3000 / 0.0000
neg.dev*T11dummy / 1.0532 / 0.1022 / 10.3100 / 0.0000
neg.dev*T13dummy / 1.3543 / 0.0938 / 14.4400 / 0.0000
pos.dev*T31dummy / -0.4347 / 0.1385 / -3.1400 / 0.0020
pos.dev*T33dummy / -0.4656 / 0.1319 / -3.5300 / 0.0000
pos.dev*T11dummy / -0.5264 / 0.0762 / -6.9100 / 0.0000
pos.dev*T13dummy / -0.3944 / 0.0943 / -4.1800 / 0.0000
own contribution / -0.6372 / 0.0844 / -7.5500 / 0.0000
tot.contrib.others / 0.1570 / 0.0270 / 5.8100 / 0.0000
round2dummy / -0.3646 / 0.4059 / -0.9000 / 0.3690
round3dummy / -1.0002 / 0.4922 / -2.0300 / 0.0420
round4dummy / -1.4835 / 0.5151 / -2.8800 / 0.0040
round5dummy / -1.4087 / 0.6121 / -2.3000 / 0.0210
round6dummy / -2.4772 / 0.4705 / -5.2600 / 0.0000

The slopes of the regression lines in Figure 3 are not significantly different from each other, except for the comparison of T13 with T11 (neg.dev*T13 vs. neg.dev*T11, Χ2=10.13, p=0.0088; for all other comparisons p>0.6295; p-values Bonferroni adjusted for multiple comparisons). The intercepts are significantly different and have the order T13dummy>T11dummy=T33dummy>T31dummy (T13 vs. T11, Χ2=16.39, p=0.0003; T11 vs. T33, Χ2=1.41, p=1.0000; T33 vs. T31, Χ2=23.14, p=0.0000; p-values Bonferroni adjusted for multiple comparisons). The estimated deviation thresholds (TH) at which punishment becomes materially noticeable are significantly different and have the order TH(T13=1.43)<TH(T11=6.83)=TH(T33=7.35)<TH(T31=11.95) (T13 vs. T11, Χ2=59.27, p=0.0000; T11 vs. T33, Χ2=0.37, p=1.0000; T33 vs. T31, Χ2=23.97, p=0.0000; p-values Bonferroni adjusted for multiple comparisons).
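The Bonferroni adjustment used for the pairwise comparisons can be sketched as follows. Two assumptions of ours are built in: each pairwise Χ2 test has one degree of freedom, and the adjustment covers all six pairwise comparisons among the four treatments. Neither is stated explicitly here, but together they approximately reproduce the adjusted p-values reported above; small differences in the last digit are to be expected because the Χ2 statistics are rounded. The function names are hypothetical.

```python
import math

# Assumption (ours): all pairwise comparisons among four treatments.
N_COMPARISONS = 6


def chi2_sf_1df(statistic: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


def bonferroni(p_value: float, n_comparisons: int = N_COMPARISONS) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


if __name__ == "__main__":
    # Chi-squared statistics taken from the Table S3 comparisons above.
    reported = {
        "slopes, T13 vs. T11": 10.13,
        "intercepts, T13 vs. T11": 16.39,
        "intercepts, T11 vs. T33": 1.41,
    }
    for label, statistic in reported.items():
        adjusted = bonferroni(chi2_sf_1df(statistic))
        print(f"{label}: adjusted p = {adjusted:.4f}")
```

The cap at 1 explains why several non-significant comparisons are reported with p=1.0000.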