Backward/Forward Induction: an Experiment

Drift Effect under Timing without Observability: Experimental Evidence*

Mauro Caminati1,2, Alessandro Innocenti1,2°, Roberto Ricciuti1,2,3

1Department of Economics, University of Siena

2LabSi, Experimental Economics Laboratory, University of Siena

3ICER, Torino

Abstract. We provide experimental evidence of Binmore and Samuelson’s (1999) insights into modelling the learning process through which equilibrium is selected. They proposed the concept of drift to describe the effect of perturbations on the dynamic process leading to equilibrium in evolutionary games with boundedly rational agents. We test two different versions of the modified Dalek game within a random-matched population. We also impose that the first mover makes his or her decision first (‘timing’) but the second mover is not informed of the first mover’s choice (‘lack of observability’) to emphasize the learning process taking place within the population. Our results support Binmore and Samuelson’s model.

JEL Codes: C72, C92

Keywords: evolutionary games, experiments, drift, order of play.

* We thank three referees for valuable comments and Francesco Lomagistro for writing the experimental software and for research assistance. Research support was provided by MIUR and the University of Siena. Previous versions were presented at the 7th CEEL Workshop in Experimental Economics (Trento, 2003), the ESA European Conference (Erfurt, 2003), Università di Siena, and Royal Holloway University of London.

° Corresponding author: Alessandro Innocenti, Dipartimento di Economia Politica, Piazza S. Francesco 7, 53100 Siena (Italy). Telephone: +39.0577.232785 Fax: +39.0577.232661 Email:


1. Introduction

In extensive form games, a theory of equilibrium selection follows the logic of backward induction that results from coupling Harsanyi and Selten’s (1988) notions of subgame consistency (“Play in a subgame is independent of the subgame position in the larger game”) and of truncation consistency (“Replacing a game with its equilibrium payoffs does not affect play elsewhere in the game”).[1] In games of imperfect information, the possibility of multiple equilibria in a subgame may require a choice between these equilibria before the logic of backward induction can be applied. Moreover, the logic of backward induction fails altogether if imperfect information leaves the game with no proper subgames.

Another class of equilibrium refinements follows from the general logic of forward induction, which states that in games of imperfect information, players at an information set can deduce plausible beliefs from the information conveyed by previous choices in the game. Forward induction applies in situations in which previous players’ actions can reveal private information to other players observing those actions and moving at subsequent decision nodes in the game. However, what happens if observability is eliminated but timing does not change (timing without observability)?

In situations where the force of standard selection arguments is not self-evident, or fails altogether, convergence in actual play may depend upon features of the game that, although irrelevant to arguments based on the careful scrutiny of a game in a single interaction, are pertinent to the dynamics of the strategy distribution that results from the learning of boundedly rational agents. Our experiment focuses on this issue, which is relevant for the analysis of equilibrium convergence in evolutionary games. Specifically, this paper provides experimental evidence for Binmore and Samuelson’s (1999) model of the learning process through which equilibrium is selected. They proposed the concept of drift to describe the conditions under which perturbations in evolutionary games can stabilize a strategy distribution in the population of players that is not locally asymptotically stable in the learning dynamics. Our experimental design aims at testing the relevance of the drift effect within a random-matched population interacting in an evolutionary setting under conditions of timing without observability.

The paper is organized as follows. Section 2 explains the theoretical background and how the paper relates to the existing theoretical and experimental literature. Section 3 describes the experimental design. Results are presented in Section 4. Section 5 draws some conclusions and suggests some future experiments.

2. Theoretical and experimental background

In games of imperfect information with multiple equilibria, play may depend on seemingly irrelevant features of the game that have no direct bearing on the applicability of standard selection arguments. In fact, in iterated versions of the game, these apparently irrelevant features may direct players’ adaptive learning towards different equilibria.

Binmore and Samuelson’s (1999) theory of drift studies the stability properties of Nash equilibria through repeated interaction among players in actual game situations. A strong motivation for referring to this theory of the relationship between learning and the selection of equilibrium is that it claims to be more compatible than other theories (Young, 1993; Kandori, Mailath and Rob, 1993), at least in the absence of ‘local interactions’ (Ellison, 1993), with the time horizons that are more likely to be observed in the laboratory.[2]

Consistent with Binmore and Samuelson’s (B&S henceforth) notion of drift in games, the out-of-equilibrium states that are observed in repeated plays of a game can be modelled through the dynamic coupling of the following features:

i. Initial conditions corresponding to the selection by agents in the subpopulation of players of different strategies sustained by prior beliefs about the behaviour of opponents.

ii. Population ‘learning’ reflecting the frequency increase in each subpopulation of players of the strategies that are more profitable, given the available information on the actual play of the game.

iii. Drift; that is, the expected subpopulation-average ‘mistake’ in a given game situation.

iv. Residual stochastic perturbations.

The drift component may arise from a misreading of the game induced by careless scrutiny that leads to confusion between the conditions relating to the game and those relating to similar games previously experienced in real life. The prediction is that the size of the drift component is inversely related to the potential payoff consequences of a mistake.
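The coupling of features (i) to (iv) can be illustrated with a minimal numerical sketch. It is not B&S's specification: the discrete replicator step, the functional form of `drift_weight`, and the parameters `step`, `scale`, and `noise_sd` are all assumptions chosen only to show how learning, drift, and noise combine in one period for one subpopulation. The starting vector `shares` plays the role of the initial conditions (i).

```python
import random

def drift_weight(max_payoff_loss, scale=0.1):
    """ASSUMED drift size: largest under indifference, shrinking as the
    payoff cost of a mistake grows (the inverse relation described above)."""
    return scale / (1.0 + max_payoff_loss)

def update(shares, payoffs, bias, step=0.05, noise_sd=0.0, rng=None):
    """One period of play for one subpopulation.

    shares  : current strategy frequencies (sum to 1)
    payoffs : expected payoff of each strategy against current play
    bias    : fixed distribution that drift points towards
    """
    avg = sum(s * u for s, u in zip(shares, payoffs))
    # (ii) learning: strategies earning more than the average gain frequency
    learned = [s * (1.0 + step * (u - avg)) for s, u in zip(shares, payoffs)]
    # (iii) drift: pull towards the bias, weaker when mistakes are costly
    d = drift_weight(max(payoffs) - min(payoffs))
    mixed = [(1.0 - d) * s + d * b for s, b in zip(learned, bias)]
    # (iv) residual stochastic perturbation, clipped at zero
    if noise_sd > 0.0:
        rng = rng or random.Random()
        mixed = [max(0.0, m + rng.gauss(0.0, noise_sd)) for m in mixed]
    total = sum(mixed)
    return [m / total for m in mixed]

# Under indifference (equal payoffs) only drift moves the population.
print(update([0.5, 0.5], [1.0, 1.0], bias=[0.2, 0.8]))
```

When all strategies earn the same payoff, the learning term vanishes and drift alone determines the direction of change, which is the situation at an information set where a player is indifferent.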

The theory of drift proposed by B&S predicts that under suitable conditions, the dynamics induced by the coupling of (ii) and (iii) can ‘stabilize’ an equilibrium w* belonging to a set of Nash-equilibrium components E that are not asymptotically stable in the learning dynamics induced by (ii). The reason for this lack of dynamic stability is that at an unattained information set h, a player is indifferent with regard to the actions that are available at h. The suitable conditions refer to the ‘compatibility’ between drift and E. In particular, the action distribution prescribed by drift at h must belong to the relative interior of the set Eh of action distributions at h supporting some state in E as Nash equilibrium. The larger Eh, the higher the probability that drift is ‘compatible’ with E.

To test this prediction, B&S proposed a modified version of the Dalek game (Figure 1) as the object of a possible experiment.[3] If x is equal to 0 and standard assumptions on the unperturbed selection function are made, the game has two components of Nash equilibrium:

i. A strict Nash equilibrium given by (Play-Red, Left), obtainable by iterated dominance: (a) for Player 1, Exit dominates Play-Black; (b) after deleting Play-Black, for Player 2, Left dominates Right; (c) after deleting Right, for Player 1, Play-Red dominates Exit.

ii. A component E of Nash equilibria in which Player 1 chooses Exit and Player 2 plays Right with probability at least p = 2/9.
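The two components can be checked mechanically once payoffs are fixed. Figure 1 is not reproduced in the text, so the payoff table in the sketch below is hypothetical: it was chosen only so that the three dominance steps hold and the critical probability equals 2/9 when x = 0. It should not be read as the payoffs of the actual game.

```python
from fractions import Fraction

X = 0  # Player 1's payoff at (Play-Red, Right)

# HYPOTHETICAL payoffs (Player 1, Player 2); not taken from Figure 1.
PAYOFFS = {
    ("Exit", "Left"): (7, 4),       ("Exit", "Right"): (7, 4),
    ("Play-Red", "Left"): (9, 3),   ("Play-Red", "Right"): (X, 0),
    ("Play-Black", "Left"): (0, 0), ("Play-Black", "Right"): (3, 9),
}

def dominates(player, a, b, opponent_strategies):
    """True if `a` weakly dominates `b` for `player` (0 or 1)
    against the listed strategies of the opponent."""
    def u(own, other):
        profile = (own, other) if player == 0 else (other, own)
        return PAYOFFS[profile][player]
    diffs = [u(a, t) - u(b, t) for t in opponent_strategies]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

# (a) for Player 1, Exit dominates Play-Black
step_a = dominates(0, "Exit", "Play-Black", ["Left", "Right"])
# (b) after deleting Play-Black, Left dominates Right for Player 2
step_b = dominates(1, "Left", "Right", ["Exit", "Play-Red"])
# (c) after deleting Right, Play-Red dominates Exit for Player 1
step_c = dominates(0, "Play-Red", "Exit", ["Left"])

def exit_is_best_reply(q):
    """Is Exit a best reply when Player 2 plays Right with probability q?"""
    red = (1 - q) * PAYOFFS[("Play-Red", "Left")][0] + q * PAYOFFS[("Play-Red", "Right")][0]
    black = (1 - q) * PAYOFFS[("Play-Black", "Left")][0] + q * PAYOFFS[("Play-Black", "Right")][0]
    return PAYOFFS[("Exit", "Left")][0] >= max(red, black)

print(step_a, step_b, step_c)
print(exit_is_best_reply(Fraction(2, 9)), exit_is_best_reply(Fraction(1, 5)))
```

With these illustrative numbers, step (b) holds only weakly, because Player 2 is indifferent whenever Player 1 exits. That indifference is what produces a whole component of equilibria rather than an isolated point.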

B&S suggest that the smaller the payoff x obtained by Player 1 with (Play-Red, Right), the lower the frequency of the outcome (Play-Red, Left). This is because the size of the Nash equilibrium component E is inversely related to the value of x, so that x affects the probability that drift is ‘compatible’ with E. The complication arising in this context, which must be considered in the evaluation of the experimental evidence, is that the slight difference in payoffs between the two games, which affects the critical value of the probability p, may also affect factors other than the size of E. It could induce a change in the initial conditions of the game (as admitted by B&S), and it could change the specification of the drift component by altering the set of real-life games that players recognize as ‘similar’ (as we argue).
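The inverse relation between x and the size of E can be made concrete with a short calculation. The sketch assumes hypothetical payoffs for Player 1 of 7 from Exit and 9 from (Play-Red, Left), chosen only because they give the critical probability 2/9 at x = 0; the function names are likewise illustrative. Play-Black is ignored because it is dominated by Exit.

```python
from fractions import Fraction

EXIT_PAYOFF = Fraction(7)      # hypothetical, not taken from the figures
RED_LEFT_PAYOFF = Fraction(9)  # hypothetical, not taken from the figures

def min_prob_right(x):
    """Smallest probability q of Right at which Exit is a best reply.

    Exit >= (1 - q) * RED_LEFT + q * x
      <=>  q >= (RED_LEFT - EXIT) / (RED_LEFT - x),  for x < EXIT.
    """
    return (RED_LEFT_PAYOFF - EXIT_PAYOFF) / (RED_LEFT_PAYOFF - Fraction(x))

def size_of_E(x):
    """Length of the interval [min_prob_right(x), 1] of supporting beliefs."""
    return 1 - min_prob_right(x)

def drift_compatible(drift_prob_right, x):
    """Drift is 'compatible' with E if the probability of Right it
    prescribes lies strictly inside the supporting interval."""
    return min_prob_right(x) < drift_prob_right < 1

for x in (0, 1):
    print(x, min_prob_right(x), size_of_E(x))
```

Under these assumptions, raising x from 0 to 1 moves the critical probability from 2/9 to 1/4, so a drift that prescribes Right with probability 6/25 is compatible with E in the first case but not in the second.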

In terms of the notation used above, in the proposed game, the solution (Exit, Right) is a state in E. For ease of exposition, the play of the game is said to correspond to this solution if the population distribution of play converges to a state in E. One would expect the frequency with which the play of the game corresponds to the two solutions to be influenced by payoff modifications that leave the subgame perfect equilibria unchanged but affect: (a) the size of the set Eh; (b) the agents’ prior beliefs, and hence the initial conditions of repeated play; and (c) the size and specification of the drift component.

In the form proposed by B&S, the game has two pure-strategy equilibria that are both subgame perfect. Our experiment does not completely adhere to B&S’s proposal. We test actual behaviour in the two modified versions of the Dalek game represented in Figure 2, which preserve the same pure-strategy equilibria as the game shown in Figure 1. To analyze the effect of repeated interaction in a random-matched population, we impose a sequence of choices different from that of B&S’s game: first Player 1 chooses between Exit and Play; then, simultaneously, Player 2 chooses between Left and Right and Player 1 chooses between Red and Black (if he or she plays). We also impose that Player 2 makes his or her only choice without knowing the choices of Player 1, even though both players are perfectly informed of the sequence of choices. At the end of each repetition, both players are informed of the choice made by their counterpart and of the distribution of choices made in the subpopulation of other players. Our variant has three advantages. First, Players 2 make their decision only on the basis of the choices made by the population of Players 1 in previous periods. Second, interactions between members of the same subpopulation are ruled out, because players have no information on the frequency distribution of play within their own subpopulation. Third, if Player 1 chooses the outside option, the choice made by Player 2 is recorded in the experimental evidence.[4]
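The sequence of one repetition can be sketched as a small simulation. This is an illustrative reading of the protocol described above, not the experimental software: the function `play_period`, its callback arguments, and the shape of the feedback are assumptions, and "the subpopulation of other players" is read here as the opposite subpopulation.

```python
import random
from collections import Counter

def play_period(first_move, colour, side, n_pairs, rng):
    """One repetition: random matching, timing without observability, feedback.

    first_move(i) -> "Exit" or "Play"   (Player 1 number i, stage 1)
    colour(i)     -> "Red" or "Black"   (asked only if Player 1 plays)
    side(j)       -> "Left" or "Right"  (Player 2 number j; receives no
                                         information on current Player 1 moves)
    """
    partner = list(range(n_pairs))
    rng.shuffle(partner)  # Player 1 number i meets Player 2 number partner[i]

    p1 = []
    for i in range(n_pairs):
        move = first_move(i)
        p1.append("Exit" if move == "Exit" else "Play-" + colour(i))
    # Player 2's choice is collected in every match, even after Exit.
    p2 = [side(j) for j in range(n_pairs)]

    dist1, dist2 = Counter(p1), Counter(p2)
    # Feedback: counterpart's choice and the opposite subpopulation's distribution.
    feedback1 = [(p2[partner[i]], dist2) for i in range(n_pairs)]
    feedback2 = [None] * n_pairs
    for i in range(n_pairs):
        feedback2[partner[i]] = (p1[i], dist1)
    return p1, p2, partner, feedback1, feedback2

rng = random.Random(0)
p1, p2, partner, fb1, fb2 = play_period(
    first_move=lambda i: rng.choice(["Exit", "Play"]),
    colour=lambda i: rng.choice(["Red", "Black"]),
    side=lambda j: rng.choice(["Left", "Right"]),
    n_pairs=6,
    rng=rng,
)
print(Counter(p1), Counter(p2))
```

Because `side` never receives the current choices of Players 1, any dependence of Player 2's behaviour on Player 1's play can only run through the feedback from earlier periods.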

The main implication of our design is that, as a result of the imposed lack of observability by Player 2, there are no proper subgames in the two games. Although the first mover makes his or her decision first, the second mover is not informed of the first mover’s choice, and this lack of observability would seem to prevent any form of inductive reasoning. When game theorists deal with cases of timing without observability, they assume that an unknown event does not affect subsequent choices. This long-standing view[5] follows from the idea that priority of information includes priority in time but not vice versa, and this makes the former the basic approach to characterizing strategies. Experimental work provides evidence for the opposing view that the timing of unobserved moves might matter. In a seminal work, Rapoport (1997) showed how players, when they choose in a predetermined sequence without knowing previous players’ choices, behave differently from when all players move simultaneously. The main finding of Rapoport’s experiment is that subjects are inclined to select the strategy that gives the first mover his or her preferred outcome. Güth et al. (1998) provided further experimental evidence that this attitude could increase the probability of disequilibrium play, although this effect can be weakened if there is an outcome of the game that is clearly perceived as fair. More recently, order of play effects in laboratory experiments were studied by Muller and Sadanand (2003)[6]. Their experiments can discriminate behaviour conforming to the ‘pure timing effect’ found by Rapoport from behaviour predicted by theoretical explanations of the way the order of play influences players’ reasoning process.[7] They find convincing evidence of both types.

Following the main ideas of this literature, the mere order of play might act as a coordination device inducing Players 2 to focus on the equilibrium that is preferred by Player 1. This feature is worth considering when interpreting our evidence, because it may affect the initial play conditions of our game. We emphasize, however, that our design is not intended to address the issue of pure-timing effects, still less to detect the more profound effects of timing proposed by the theory of virtual observability[8] (Weber et al., 2004).

As a result of the described deviations from B&S’s proposal, there are no unreached information sets in the two games shown in Figure 2. Still, if Player 1 takes the outside option, Player 2 is indifferent between the choices that are available at his or her information set. This gives rise to the Nash equilibrium component E in our games, and none of the equilibria in E is hyperbolic under the learning dynamics (ii).

Let a state in the play of our games be $z = (z_{ij})$, where $z_{ij} \in Z_j$ is the distribution of population Player $i$ over the choices available at her information set $j$ and the elements of $z_{ij}$ sum to 1. Moreover, to clarify how our design may be relevant to detect effects of the type proposed by the theory of drift, we assume, following B&S, that population $i$'s average mistake at information set $j$ points towards a fixed distribution of play $\bar{z}_{ij}$ (the average bias induced by real life experience with ‘similar’ games) in the relative interior of $Z_j$ and its measure is decreasing in the maximum pay-off relevance of a mistake at information set $j$, if the state of play is $z$. Then, since Player 2 is indifferent between Right and Left at $z \in E$, for a fixed specification of $\bar{z}_{ij}$, Player-2 drift will be highest[9] at $z \in E$, and the larger $E$, the higher the probability that drift is ‘compatible with $E$’, in the following sense, which closely mimics the definition of Binmore and Samuelson (1999): there is $w^*$ in the relative interior of $E$, such that the action prescribed by $w^*$ at information set 2 is the corresponding fixed distribution $\bar{z}_{ij}$.