DECISION DEADLINES IN CATEGORY LEARNING
Part 1. Pilot Experiment
The present research addressed the concern that stimulus materials in this area have been relatively narrow, as researchers protected against the possibility that new stimuli would raise new interpretative issues. Believing it essential that the field broaden the range of stimulus dimensions to which its theoretical proposals apply, we evaluated two new stimulus dimensions that may also have desirable properties, and we describe that evaluation here.
Table S1. Distributional characteristics of the category tasks.
RBh   A   50.00   35.86   355.55    16.33   0
      B   50.00   64.14   355.55    16.33   0
RBv   A   35.86   50.00    16.33   355.55   0
      B   64.14   50.00    16.33   355.55   0
RB: rule-based; II: information-integration; h: horizontal; v: vertical; m: minor diagonal; M: major diagonal.
Figure S1. Proportion of correct responses across the first 15 blocks of 20 trials for participants performing the RB-horizontal task (black circles), RB-vertical task (black squares), II-minor task (black diamonds), and II-Major task (black triangles).
Participants. Participants were 48 undergraduates from the population already described. There were 12 participants in each of the four tasks: RB-horizontal (RBh), RB-vertical (RBv), II-minor diagonal (IIm), and II-major diagonal (IIM). Table S1 gives the specifics of the statistical distributions that defined these category tasks.
Procedure. The procedure was described in the methods for Experiment 1 in the main article.
Results: Performance-based analyses. Figure S1 shows by 20-trial block the average proportion correct achieved in each of the four tasks. Participants generally showed robust learning. In the RBh, RBv, IIm, and IIM tasks, respectively, participants improved their proportion correct from .74 to .93, .61 to .88, .62 to .87, and .49 to .76. Overall, they improved from .62 to .86 correct. Participants also showed an RB performance advantage. At Block 15, participants were .90 and .82 correct, respectively, on the RB and II tasks. Over all 15 blocks, they were .86 and .73 correct, respectively, on the RB and II tasks. The learning effect and the RB advantage are seen in Figure S1.
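The block-wise scoring behind Figure S1 can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `block_accuracy` and the example trial sequence are hypothetical, and the only detail taken from the text is the 20-trial block size.

```python
def block_accuracy(correct, block_size=20):
    """Return the proportion correct for each complete block of trials.

    `correct` is a sequence of 0/1 accuracy codes in trial order.
    Trailing trials that do not fill a whole block are ignored.
    """
    n_blocks = len(correct) // block_size
    return [
        sum(correct[b * block_size:(b + 1) * block_size]) / block_size
        for b in range(n_blocks)
    ]


# Hypothetical participant: 12 of 20 correct in Block 1, 18 of 20 in Block 2.
trials = [1] * 12 + [0] * 8 + [1] * 18 + [0] * 2
blocks = block_accuracy(trials)
print(blocks)
```

Averaging these per-participant block scores within each task would give curves of the kind plotted in Figure S1.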
These data were treated descriptively so we could retain for the main article the pair of RB and II tasks that best represented the general data patterns shown by RB and II tasks in the literature. The IIm task (black diamonds) showed the slow, gradual, and negatively accelerating acquisition curve up to fairly high accuracy levels that is typical of II category learning. Likewise, the RBh task (black circles) showed the canonical shape of RB category learning generally, with relatively fast and early learning up to high levels. Accordingly, we carried forward the RBh (black circles) and IIm (black diamonds) tasks in Figure S1 into the experiments described in the main article.
Figure S2. The decision bounds that provided the best fits to the last 100 responses of participants in the RB-horizontal, RB-vertical, II-minor, and II-Major category tasks.
Results: Model-based analyses. Figure S2 shows the best-fitting decision boundaries by task and participant. For the RBh and RBv tasks, the decision boundaries were mainly horizontal and vertical, respectively, as participants generally found the adaptive, RB solutions to these tasks. The RBh task apparently fostered the strongest consensus toward RB category learning. It also produced no guessing participants (those for whom a decision boundary could not be estimated), which further supported our decision from the learning data to carry forward the RBh task into the main experiments.
For the IIm and IIM tasks, the decision boundaries tended to be appropriately diagonal. Some II participants did focus on dimensional rules even though these were not appropriate to the category structure and even though this reduced their proportion of correct responses. The dimensional focus by some humans is a common finding in studies of II category learning. The IIm task apparently fostered the strongest consensus toward appropriate decision boundaries. With only two exceptions, and with no guessing participants, the decision bounds for this task were appropriately diagonal along the minor axis of the stimulus space. This consensual learning strategy supported our decision from the learning data to also carry forward this task.
Part 2. Complementary Analyses for Experiment 1
In Experiment 1 of the main article, we conducted a complementary set of analyses, including performance-based analyses that focused not only on terminal performance levels but also on measured improvements in learning, comparing initial and terminal levels of performance. For this purpose, we examined the first and the 13th 20-trial blocks for all participants (117 of 124) who completed at least 260 learning trials within the response deadline (if one was imposed). These proportion-correct data were also analyzed using the GLM procedure in SAS 9.3. The analysis was a four-way analysis of variance (ANOVA) with task type (RB, II), mask type (present, absent), and condition (unspeeded, deadline) as between-participant factors and block (1, 13) as a within-participant factor. This ANOVA confirmed the results reported in the main article. There was a significant main effect of condition, F(1, 109) = 21.86, p < .0001, ηp² = .167, indicating that performance was generally higher in the unspeeded conditions than in the deadline conditions. Participants were .75 and .64 correct (averaging over Blocks 1 and 13) in the unspeeded and deadline conditions, respectively. Crucially, this analysis also found a significant three-way interaction among task, condition, and block, F(1, 108) = 4.72, p = .032, ηp² = .042. This confirmed that the deadline condition undercut early-late improvements in performance selectively for the II task. Participants improved from .57 to .69 in the II-deadline condition (.12), from .64 to .86 in the II-unspeeded condition (.22), from .50 to .81 in the RB-deadline condition (.31), and from .62 to .88 in the RB-unspeeded condition (.26). The learning in the II-deadline condition was about half that in the other three conditions, or less. Bonferroni-corrected comparisons showed that every condition (unspeeded, deadline) of both tasks (RB, II) produced learning from the start of the experiment to the 13th block.
Table S2. Descriptive statistics of learning scores (13th block minus 1st block)
Condition     RB, Mean (SD)    II, Mean (SD)
Unspeeded     0.26 (.18)       0.22 (.19)
Deadline      0.31 (.22)       0.11 (.18)
RB: rule-based; II: information-integration; SD: standard deviation.
To fully understand the three-way interaction (task, condition, and block), we conducted planned comparisons using learning scores. We calculated learning scores by subtracting performance at Block 1 from performance at Block 13 (see Table S2 for descriptive statistics). Independent-samples t tests compared the deadline and unspeeded conditions within each task (RB and II). Learning decreased significantly under the deadline in the II task, t(57) = 2.26, p = .028, Cohen’s d = .59, but not in the RB task, t(56) = .92, p = .363, Cohen’s d = -.24. Thus the deadline significantly hurt learning in the II task but not in the RB task.
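As a rough check on these effect sizes, Cohen's d can be approximated from the summary statistics in Table S2. The sketch below is illustrative only. It assumes that the first numeric column of Table S2 is the RB task and the second is the II task (as the improvements reported above imply), and it pools the two standard deviations as if the groups were equal in size, which may not hold exactly. The function name `cohens_d` is hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two independent groups.

    Uses the root-mean-square of the two SDs as the pooled SD,
    which is exact only when the groups are equal in size.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Learning scores (Block 13 minus Block 1), unspeeded versus deadline.
d_ii = cohens_d(0.22, 0.19, 0.11, 0.18)  # II task
d_rb = cohens_d(0.26, 0.18, 0.31, 0.22)  # RB task
print(round(d_ii, 2), round(d_rb, 2))
```

Under these assumptions the II value comes out near the reported .59 and the RB value near the reported -.24. Small discrepancies are expected because the table entries are rounded and the group sizes may differ.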
As a small matter, this complementary analysis found a significant main effect of block, F(1, 108) = 174.49, p < .0001, ηp² = .618, confirming that significant learning occurred in the experiment. It also found a significant two-way interaction between task and block, F(1, 108) = 11.85, p < .001, ηp² = .099, confirming that performance improvements were generally stronger for the RB task than for the II task. Participants improved from .56 in Block 1 to .85 in Block 13 for the RB task. They improved from .61 to .78 in the II task. This is the typical finding in RB-II research, and it helps confirm that our RB and II tasks were well behaved relative to the existing experimental literature.
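The effect sizes reported with each F ratio in this part appear to be partial eta squared. Under that assumption, each one can be recovered from the F value and its degrees of freedom with the standard identity: effect size = (F x df_effect) / (F x df_effect + df_error). The short sketch below applies the identity to the four tests reported above. The function name is hypothetical, and this is a consistency check rather than the authors' SAS procedure.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, effect size as reported in the text)
reported = [
    ("condition", 21.86, 1, 109, 0.167),
    ("task x condition x block", 4.72, 1, 108, 0.042),
    ("block", 174.49, 1, 108, 0.618),
    ("task x block", 11.85, 1, 108, 0.099),
]

for label, f_value, df_effect, df_error, eta in reported:
    recovered = round(partial_eta_squared(f_value, df_effect, df_error), 3)
    print(f"{label}: recovered {recovered}, reported {eta}")
```

All four recovered values agree with the reported ones to three decimals, which supports reading the reported statistic as partial eta squared.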
Part 3. The Difficulty-based Single-System Hypothesis
Given the present article’s goals, one aspect of the multiple-systems debate must be treated especially carefully. Some have supposed that RB-II dissociations arise because RB tasks are easier in some undefined way. This was a conscientious attempt to apply a strict parsimony standard to defend a single-system account of categorization. However, ongoing developments in this field show that this supposition is untenable for many reasons.
First, RB and II tasks are equated for difficulty a priori. They are a minimal-pair contrast, matched for category size, discriminability (e.g., d’), within-category exemplar similarity, between-category exemplar separation, and the proportion correct achievable by an ideal observer. The tasks are simply 45-degree rotations of one another (Fig. 1, main article).
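The rotation relationship can be illustrated numerically. The sketch below makes several assumptions that the supplement does not state explicitly: that the five numeric columns of Table S1 are the mean on each dimension, the variance on each dimension, and the covariance; that the stimulus space is centered at (50, 50); and that the rotation is taken about that center. The function names are hypothetical.

```python
import math


def rotate_point(point, center, degrees):
    """Rotate a 2-D point counterclockwise about a center."""
    t = math.radians(degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(t) - dy * math.sin(t),
            center[1] + dx * math.sin(t) + dy * math.cos(t))


def rotate_covariance(var_x, var_y, cov_xy, degrees):
    """Return (var_x, var_y, cov_xy) of R S R^T for a rotation matrix R."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    new_var_x = c * c * var_x - 2 * c * s * cov_xy + s * s * var_y
    new_var_y = s * s * var_x + 2 * c * s * cov_xy + c * c * var_y
    new_cov = c * s * (var_x - var_y) + (c * c - s * s) * cov_xy
    return new_var_x, new_var_y, new_cov


# RBh Category A as read from Table S1 (column meanings are assumed).
mean_a = rotate_point((50.00, 35.86), (50.0, 50.0), 45)
cov_a = rotate_covariance(355.55, 16.33, 0.0, 45)
print(mean_a)  # close to (60, 40)
print(cov_a)   # equal variances of about 185.94, covariance of about 169.61
```

Because a rotation preserves distances, the separation between category means and the spread within each category are unchanged. Only the orientation of the optimal boundary differs, which is the sense in which the tasks are matched a priori.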
Second, Smith et al. (2011) confirmed this match by showing that pigeons (C. livia) learn RB and II tasks equivalently. Pigeons may learn equivalently because they lack the explicit categorization processes that favor RB learning. Pigeons may thus confirm multiple systems in humans by revealing their own unitary categorization system (Smith, Berg et al., 2012).
Third, Smith et al. (2014) did not merely show that II learning happened with greater difficulty under deferred-rearranged reinforcement. They showed that II learning did not happen at all—qualitatively. A difficulty notion cannot explain the complete absence of learning, but the elimination of the associative-learning process—on which II learning depends—can explain it.
Fourth, RB and II learning show profoundly different trajectories. RB learning features a leap in performance, consistent with rule discovery (Smith et al., 2014, Fig. 3; main article, Figure 3). Neither the difficulty notion nor single-system models explain this trajectory. The models fit learning curves through gradual parameter changes. They cannot explain the sudden leap, the complete absence of learning until some point, or the explosion of learning at that point. But all aspects of RB learning follow if one assumes a process of explicit rule discovery.
Fifth, some argue that RB tasks must be easier because humans (though in three separate experiments not pigeons) learn them faster. This claim only restates the empirical fact—a psychological explanation is needed instead. Difficulty is a non-psychological label with no process implication. It is nonproductive of theory or research. If one defines difficulty by any objective standard, the tasks are matched for difficulty as we have seen. Examples easily show the emptiness of “difficulty.” Multiplication is faster than repeated addition, obviously because it is a different process that unfolds differently—not because it is easier. Procedural learning is spared in amnesia compared to declarative memory. But one would never say that procedural learning is easier than declarative memory. There are theoretically sharper things to say. There are sharper things to say about RB categorization, too.
Sixth, if one fills the empty difficulty notion by adding that humans learn RB tasks faster because they bring different cognitive processes to bear on them, one endorses the multiple-systems view.
Seventh, the difficulty notion risks self-contradiction. When II learning fails selectively given delayed reinforcement, one may hypothesize that the II task is more difficult. But when RB learning fails selectively given a concurrent load, one must hypothesize that the RB task is difficult and resource intensive. One simply cannot have it both ways, and so the difficulty notion falters.
For these reasons, there is a strong consensus in neuroscience and cognitive psychology that multiple-systems theory is the parsimonious interpretation of the wide-ranging RB-II data. Accordingly, we adopted in the main article the working hypothesis that these dissociable category-learning utilities exist, and we addressed an empirical problem regarding these utilities that remains unexplored.
Part 4. Mean Reaction Times in Experiment 2
Table S3. Mean Reaction Time by Task and Condition
Condition      Mean (SD)      Mean (SD)
Unspeededᵇ     1.16 (1.30)    1.19 (.61)
RB: rule-based; II: information-integration; SD: standard deviation across participants.
ᵃ No exclusion criteria were used.
ᵇ For each person, trials with a reaction time more than 2 SDs above that person's mean were excluded.
* Late responses in the deadline condition were excluded.
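The per-person trimming rule in note b can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses the sample standard deviation (the note does not say which estimator was used), treats reaction times as plain numbers for one participant, and the function name and example values are hypothetical.

```python
import statistics


def trim_slow_trials(rts, n_sd=2.0):
    """Drop reaction times more than n_sd SDs above this participant's mean.

    Uses the sample SD (an assumption). With fewer than two trials
    no SD can be computed, so the data are returned unchanged.
    """
    if len(rts) < 2:
        return list(rts)
    cutoff = statistics.mean(rts) + n_sd * statistics.stdev(rts)
    return [rt for rt in rts if rt <= cutoff]


# Hypothetical participant with one very slow trial.
rts = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 6.0]
kept = trim_slow_trials(rts)
print(len(rts) - len(kept), "trial(s) excluded")
```

The rule is one-sided, so fast responses are never removed. The cutoff is computed separately for each participant, which keeps a generally slow responder from losing a disproportionate share of trials.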
Part 5. Experiment 2 Testing Performance by Condition Order
Part 6. Experiment 2 Appropriate Strategy Participants
Twenty participants in the RB task and 24 participants in the II task used the appropriate decision strategy in the last 60 trials of training. We analyzed these participants separately, using the same GLM procedure described in the text. There was a main effect of deadline condition, F(1, 40) = 57.38, p < .001, ηp² = .589. The main effect of task, F(1, 40) = 3.81, p = .058, ηp² = .087, and the task by condition interaction, F(1, 40) = 3.45, p = .071, ηp² = .079, did not reach significance. No significant main effect or interaction with order was found.
However, the same performance pattern was found, with a numerically larger deadline cost in the II task (.10) than in the RB task (.07). II participants were .89 (SD = .06) and .79 (SD = .09) correct in the unspeeded and deadline conditions, respectively. RB participants were .91 (SD = .06) and .84 (SD = .09) correct in these conditions.