Supplementary methods and results

Participant sample description

Fifty-two healthy volunteers (aged 18-30 years) took part in the study. Participants were included after undergoing a psychiatric and medical screening. They were eligible if they had a body mass index between 18 and 30 kg/m², passed basic medical screening for current or past physical illness relevant to the study, were non-smokers or light smokers (<5 cigarettes a day), had taken no CNS-active medication during the last 6 weeks, were not currently taking d-cycloserine, ethionamide or isoniazid, were screened to be free of a past or present Axis I disorder on the Structured Clinical Interview for DSM-IV (First et al., 2002), did not have a first-degree family member with a psychiatric history, and were fluent in English. Female participants were neither pregnant nor breast-feeding.

Five participants (all in the placebo group) were excluded from the study at the point of data analysis because they did not use the learnt probability information at all when making decisions, relying only on the explicitly cued magnitude information. This made it impossible to estimate their learning rate (corroborated by the logistic regression of past outcomes, see ‘Logistic Regression Analysis’ in the main article for details). The exclusion assessment was made independently by two researchers (JS and NK) who were blind to the participants' group assignments.

The data for this experiment were collected as part of a larger study in which a third drug (hydrocortisone) was also used (data not reported).

Participant task instructions

Before the experiment, participants were instructed about the task features. They were informed that the magnitudes were drawn randomly and independently on each trial. They were also informed that the four probabilities were independent from each other. However, they were only told that the probabilities would change over the course of the experiment at varying speeds, not that it was always only one probability that was changing at any given time or that the minimum/maximum probability values were 20% and 80%.

Learning about the unchosen option

To examine the effects of the chosen and alternative (unchosen) option we performed three analyses. In the first two analyses, we looked at the influence of the past chosen or unchosen outcomes on upcoming choices. In the third analysis, we looked at learning differences for the chosen and the unchosen option separately.

First, we ran a regression analysis, predicting switching or staying based on the probabilistic outcomes (1/0) for the chosen reward, the unchosen reward, the chosen loss and the unchosen loss. We also included regressors for the current reward and loss magnitudes. The results are shown in supplementary figure 1 and supplementary table 1a. In both groups, participants were influenced in their future choices by the past chosen and unchosen probabilistic outcomes. However, there was no difference between the groups in the impact of past chosen or unchosen reward or loss outcomes (reward chosen, p=0.94; reward unchosen, p=0.71; loss chosen, p=0.67; loss unchosen, p=0.78).
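As an illustration of this kind of analysis, the following minimal sketch fits a logistic regression of stay (1) versus switch (0) on simulated data for a single participant. It is not the analysis code used in the study: the data are simulated, the coding of the magnitude regressors, the generating weights and the function name fit_logistic are assumptions made for the example, and the fitting routine is a generic Newton-Raphson procedure.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Fit a logistic regression by Newton-Raphson.

    Returns the decision weights, with the intercept first.
    A tiny ridge term keeps the Hessian invertible.
    """
    X = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))            # predicted P(stay)
        grad = X.T @ (y - p) - ridge * w
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        w = w + np.linalg.solve(hess, grad)
    return w

# Simulated trials for one hypothetical participant (illustration only).
rng = np.random.default_rng(0)
n_trials = 5000
# Columns: chosen reward, unchosen reward, chosen loss, unchosen loss (1/0 outcomes)
outcomes = rng.integers(0, 2, size=(n_trials, 4)).astype(float)
# Columns: reward magnitude, loss magnitude (assumed z-scored coding)
magnitudes = rng.normal(size=(n_trials, 2))
X = np.column_stack([outcomes, magnitudes])

# Assumed generating weights: intercept, then one weight per regressor.
true_w = np.array([0.2, 1.0, -1.0, -1.0, 1.0, 0.5, -0.5])
p_stay = 1.0 / (1.0 + np.exp(-(true_w[0] + X @ true_w[1:])))
stay = (rng.random(n_trials) < p_stay).astype(float)

weights = fit_logistic(X, stay)
print(np.round(weights, 2))
```

In an analysis of this form, one set of decision weights would be estimated per participant, and each weight would then be compared between groups with a between-subject t-test.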

Second, we ran a regression analysis, again predicting switching or staying, based on the Bayesian probability estimates. The regressors were: Bayesian reward and loss probabilities for the past trial chosen and unchosen options and explicitly shown reward and loss magnitudes. Again, we found no group differences in the impact of the past chosen or unchosen options (supplementary figure 2, supplementary table 1b).

Third, we also looked at whether learning from the chosen or the unchosen option differed. For this, we fitted a learning model with two separate learning rates, one for learning from the chosen option and one for learning from the unchosen option. To avoid over-parameterization, we did not split the learning rates further into separate learning rates for reward and loss. Apart from this, the model had the same parameters as the Heuristic Model described in the main paper. We found that the placebo and the d-cycloserine group did not differ in the learning rates for the chosen option (p=0.52) or for the unchosen option (p=0.3; supplementary table 1c).
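The idea of separate learning rates can be illustrated with a simple delta-rule update. This is a sketch under assumptions, not the fitted model itself: the function name update_estimates, the delta-rule form of the update and the example values are chosen for illustration, and the remaining parameters of the Heuristic Model are not reproduced here.

```python
import numpy as np

def update_estimates(p_est, outcomes, chosen, alpha_chosen, alpha_unchosen):
    """Update the probability estimates of all options after one trial.

    p_est    : current probability estimates, one per option
    outcomes : observed probabilistic outcomes (1/0), one per option
    chosen   : index of the option chosen on this trial
    The chosen option is updated with alpha_chosen, all others with
    alpha_unchosen (assumed delta rule: estimate += alpha * prediction error).
    """
    p_est = np.asarray(p_est, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    alpha = np.where(np.arange(len(p_est)) == chosen, alpha_chosen, alpha_unchosen)
    return p_est + alpha * (outcomes - p_est)

# Example: the left option (index 0) was chosen and yielded the outcome,
# while the unchosen right option did not.
new_est = update_estimates([0.5, 0.5], [1, 0], chosen=0,
                           alpha_chosen=0.3, alpha_unchosen=0.1)
print(new_est)  # [0.65 0.45]
```

Because the learning rates were not split by outcome type, the same pair of learning rates would be applied to both the reward and the loss estimates in such a scheme.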

The results of these three analyses show that participants were influenced by the outcomes of the options they had chosen, as well as by the outcomes of the alternative option. The d-cycloserine and the placebo group did not differ in how much their choices were influenced by the chosen or the unchosen options. Similarly, we did not find a difference between the groups in how much they learnt about the chosen or the unchosen options.

Supplementary Table 1. a) Results of the regression predicting stay or switch on the current trial, based on the last trial reward/loss probabilistic outcomes and the current magnitude values of the option that was chosen/unchosen on the last trial. Mean and standard error of the decision weights are shown for the two groups (Pla – placebo, DCS – d-cycloserine), as well as the results of between-subject t-tests comparing the placebo and the d-cycloserine groups. b) Results of the regression analysis predicting stay or switch on the current trial based on the reward/loss probability estimates and explicitly shown magnitudes of the option that was chosen/unchosen on the last trial. Decision weights are mean values with standard error; p-values are the results of between-subject t-tests. c) Parameter estimates of the learning model with separate learning rates for the chosen and the unchosen option (mean values with standard errors shown); p-values are the results of between-subject t-tests.

Supplementary figure 1. Results of the regression predicting stay or switch on the current trial, based on the last trial reward/loss probabilistic outcomes and the current magnitude values of the option that was chosen/unchosen on the last trial. Error bars show s.e.m.

Supplementary figure 2. Results of the regression analysis predicting stay or switch on the current trial, based on the reward/loss probability estimates and explicit magnitudes of the option that was chosen/unchosen on the last trial. Error bars show s.e.m.

Bayesian Learner

A Bayesian Learner (as described in Behrens et al., 2007) was used to estimate the reward and loss probability estimates for each option.

In short, the model estimates the current reward/loss probability of each attribute (e.g. reward for the left option) separately, based on the previous trials. It does this by taking into account the following properties of the experimental task: 1) The reward/loss outcomes (either 1 or 0) are determined by underlying reward/loss probabilities. 2) The probability of an attribute can sometimes change. 3) How quickly each probability changes can vary over the course of the experiment. Sometimes, the probability changes more quickly (the attribute’s volatility is said to be high) whereas at other times, it changes more slowly (the volatility is said to be low). 4) The volatility of an attribute is not static but can also change over the course of the experiment.

In other words, the parameters that the model estimates on every trial for each attribute are the reward/loss probability, the volatility (i.e. how quickly the probability changes) and the rate at which the volatility itself changes.

The model estimates these parameters for the current trial based on the previous trial’s parameter likelihoods and the current attribute outcomes. It does this using Bayes’ rule, which is the most efficient way of updating beliefs given new evidence.
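The trial-by-trial logic of such an update can be sketched on a discrete grid. The sketch below is a deliberate simplification and not the model of Behrens et al. (2007): it tracks only the outcome probability and a volatility that is assumed fixed within each hypothesis (the third level, the change in volatility, is omitted), and it swaps in a truncated Gaussian drift kernel for the transition distribution. The grids, the kernel and the function name bayes_step are assumptions made for illustration.

```python
import numpy as np

r_grid = np.linspace(0.01, 0.99, 50)      # candidate outcome probabilities
v_grid = np.array([0.01, 0.05, 0.2])      # candidate volatilities (assumed: sd of drift per trial)

# transition[k, i, j] = p(r_new = r_grid[i] | r_old = r_grid[j], v = v_grid[k])
diff = r_grid[:, None] - r_grid[None, :]
transition = np.exp(-0.5 * (diff[None, :, :] / v_grid[:, None, None]) ** 2)
transition /= transition.sum(axis=1, keepdims=True)

def bayes_step(belief, outcome):
    """One Bayesian update of the joint belief over (volatility, probability).

    belief[k, j] = p(v_grid[k], r_grid[j] | outcomes so far)
    outcome      = 1 or 0 for the attribute on the current trial
    """
    # Prediction step: the probability may have drifted since the last trial.
    predicted = np.einsum('kij,kj->ki', transition, belief)
    # Bayes' rule: weight each hypothesis by the likelihood of the outcome.
    likelihood = r_grid if outcome == 1 else 1.0 - r_grid
    posterior = predicted * likelihood[None, :]
    return posterior / posterior.sum()

# Start from a flat prior and observe a made-up outcome sequence (80% ones).
belief = np.full((len(v_grid), len(r_grid)), 1.0 / (len(v_grid) * len(r_grid)))
for outcome in [0, 1, 1, 1, 1] * 8:
    belief = bayes_step(belief, outcome)

estimate = float((belief.sum(axis=0) * r_grid).sum())    # posterior mean probability
volatility = float((belief.sum(axis=1) * v_grid).sum())  # posterior mean volatility
print(round(estimate, 2), round(volatility, 3))
```

In this simplified scheme, one such belief grid would be maintained for each attribute separately, and the posterior mean probability would serve as that attribute's estimate on the next trial.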

For details on the specific mathematical implementation of the Bayesian learner, please refer to Behrens et al. (2007).

Supplementary references

First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. W. (2002). Structured Clinical Interview for DSM-IV-TR Axis I Disorders, Research Version, Non-patient Edition. New York: Biometrics Research, New York State Psychiatric Institute.