Supplemental Online Materials for “A Challenge to Estimates of an Upper-Bound on Relations between Accumulated Deliberate Practice and the Associated Performance in Domains of Expertise: Comments on Macnamara, Hambrick, and Oswald’s (2014) Published Meta-Analysis”

This document reports detailed considerations of studies included in Macnamara, Hambrick, and Oswald’s (2014) meta-analysis and their relation to our original definition of deliberate practice (Ericsson, Krampe & Tesch-Römer, 1993, p. 368): “Throughout development toward expert performance, the teachers and coaches instruct the individuals to engage in practice activities that maximize improvement. Given the cost of individualized instruction, the teacher designs practice activities that the individual can engage in between meetings with the teacher. We call these practice activities deliberate practice and distinguish them from other activities, such as playful interaction, paid work, and observation of others, that individuals can pursue in the domain”. The definition is summarized by Ericsson and Lehmann (1996, pp. 278-279) as “the individualized training activities specially designed by a coach or teacher to improve specific aspects of an individual's performance through repetition and successive refinement. To receive maximal benefit from feedback, individuals have to monitor their training with full concentration, which is effortful and limits the duration of daily training”.

In the first part I will show that most of the included studies do not provide valid measures of the duration of accumulated deliberate practice based on the first three criteria. I have included additional quotes to support our defining characteristics and how they differ from typical characteristics. The fourth criterion requires that the study measures the performance towards which the deliberate practice was directed. Measuring a performance that was not the target of the accumulated deliberate practice is likely to reduce the observed correlation between the measured accumulated deliberate practice and the measured performance.

In the second part I analyze the remaining studies that have met the four criteria of the first part. The remaining studies do not support Macnamara et al.’s (2014, p. 1615) rejection of Ericsson and Moxley’s (2012, p. 145, italics added) statement that “the concept of deliberate practice can account for the large individual differences between experts and novices”. With the exception of one study, the remaining studies analyze the relation between accumulated practice and attained performance only within groups of experts or within groups of novices, and thus do not address the question of whether deliberate practice can account for the large differences in performance between novices and experts. I will conclude by discussing the only study included by Macnamara et al. (2014) that meets the criteria for accurately estimating the relation between accumulated deliberate practice and attained performance.

Part I: Identifying Studies that Include Measures of Practice that Do Not Meet the Criteria of Deliberate Practice or Do Not Measure the Performance Targeted by Deliberate Practice

Many of the studies included in Macnamara et al.’s (2014) meta-analysis do not measure the accumulated amount of deliberate practice designed to improve a targeted performance in the domain of expertise. I will start by applying the simplest and most obvious criterion of deliberate practice and then list the studies not meeting that criterion. I will then apply more complex criteria to studies that have not yet been rejected to identify studies that meet the criteria.

Criterion 1: Included Studies that Do Not Even Mention “Deliberate Practice”

The included studies should refer to deliberate practice in their text and they should refer to a study that actually estimates accumulated deliberate practice. There were a number of studies in the “education” category that never mention deliberate practice in their text and cite only Plant, Ericsson, Hill, and Asberg’s (2005) article “Why study time does not predict grade point average: Implications of deliberate practice for academic performance” (GPA) (p. 96). This article did not study deliberate practice, but instead examined study behavior in college “in light of characteristics of deliberate practice” and found “important similarities as well as differences” (p. 114). These studies are listed in Table 1.

The actual measures analyzed by these articles do not meet the criteria for deliberate practice, which require practice assigned by a teacher to address a particular student’s weaknesses with respect to a defined target performance. For example, Brunborg, Pallesen, Diseth, & Larsen (2010, p. 128) measured “the average number of hours they spent on study activities per week, including lectures, tutorials, private seminars, and self-study”. Loyens, Rikers, & Schmidt (2007, p. 585) asked students to estimate “mean number of hours spent on self-study per week”. In addition, these studies do not collect estimates of all accumulated practice relevant to their measured performance, typically grades in college. It is reasonable to argue that learning attained during schooling in K-12 education will have allowed students to acquire relevant knowledge and skills, which will be reflected in their performance prior to taking a given college class. When only the study activities during the particular semester are measured, the influence of earlier study activities is not considered.

I will now specify criteria linked to our definition of deliberate practice that can easily be applied to the practice and study activity described in each study and thus help us identify studies that meet or violate each criterion of deliberate practice.

Criterion 2: The Need for a Teacher or Coach

In our research we have tried to be quite explicit about the differences between individualized training supervised by a teacher and other types of practice activities. Ericsson and Lehmann (1996, p. 279) clearly stated the differences: “In many domains, knowledge of effective training procedures has accumulated over a long time, and qualified—often professional—teachers draw on this knowledge to design deliberate practice regimens for individual students. In domains such as chess, for which there is no organized system of formal training, Ericsson et al (1993) found practice activities that have the characteristics of deliberate practice and were thereby able to extend their research framework to these domains. From informal interviews with elite chess players, they learned that these players created optimal learning situations by studying published chess games for several hours every day and attempting to predict—one by one—the moves chosen by chess masters.”

One of the main challenges in designing practice is to find activities that are known to improve a targeted aspect of performance for an individual at a given level of skill. In order to assure that the practice tasks are relevant to a particular individual’s improvement of performance toward the objective goal, it is necessary that the individual’s performance is evaluated and an effective task with immediate feedback is found. Finding tasks for particular individuals is the responsibility of the teacher. The absence of a teacher for all or most of the accumulated practice time violates the definition: “Ericsson et al (1993) used the term deliberate practice for the individualized training activities specially designed by a coach or teacher to improve specific aspects of an individual's performance through repetition and successive refinement.” (Ericsson & Lehmann, 1996, pp. 278-279, italics added). The studies that violate the requirement of a supervising teacher are listed in Table 2.

A number of the cited studies in this section examine many training activities and then ask the participants to judge the relevance of the different activities for improving performance. For example, Catteeuw, Helsen, Gilis, & Wagemans (2009) found that only a small number of activities are seen as highly relevant to improvement, and none of them involved a teacher or coach who guided the referee’s training to improve the targeted skill.

Criterion 3: Need for a Coach or Teacher to Assign Individualized Practice Tasks with Immediate Feedback and Goals for Practice

In many of the studies that have not yet been found to violate the definition of deliberate practice, most of the training is supervised by a coach or teacher. This training is, however, often conducted in a group, rather than with the teacher or coach designing and monitoring each athlete’s or performer’s individualized training with training tasks that include immediate feedback. In sports it is rare to have extensive one-on-one instruction, especially in team sports. This led to the realization that in sports there was no obvious activity, comparable to the individualized instruction and solitary practice of musicians, that could possibly explain the large individual differences in performance. In an effort to find an equivalent to deliberate practice in music, Janet Starkes and her colleagues searched for a different definition. Unfortunately, this led to an incorrect definition. They (Starkes, Deakin, Allard, Hodges, & Hayes, 1996, p. 99) claimed that “Ericsson et al. (1993) defined deliberate practice as an activity ‘rated very high on relevance for performance high on effort, and comparatively low on inherent enjoyment’ (p. 373)”. These attributes, according to Ericsson et al. (1993), were not defining attributes but characteristic attributes of “practice alone in music”. Ericsson et al. (1993) attempted to make clear the difference between the defining characteristics of deliberate practice, namely individualized training by teachers, and predictions about its typical characteristics: “Our framework made predictions about the qualities of various domain-related activities, such as deliberate practice. We predicted that deliberate practice would be rated very high on relevance for performance, high on effort, and comparatively low on inherent enjoyment. We could evaluate ratings by expert individuals to determine the extent to which deliberate practice is perceived to have these attributes” (Ericsson et al., 1993, p. 373, italics added).

Even with their incorrect definition, Starkes et al. (1996) concluded that they found no such activities in wrestling and skating: “[W]e have no activities that fit the deliberate practice definition [their incorrect definition]” (p. 99). When the coach or teacher is leading the training of a group of individuals, the training is not individualized, thus preventing the individuals from engaging in training tasks with immediate feedback and opportunities for repetition that are designed especially for them. The studies that lack coaches and teachers who individually guide and assign trainees’ individual training are listed in Table 3.

Most studies simply collect data on practice activities and then form a sum of all accumulated practice, even in individual sports (cf. Hodges & Starkes, 1996, for wrestlers). There are exceptions. For example, in his master’s thesis Young (1998) relied on Starkes et al.’s (1996) incorrect definition and, based on the middle-distance runners’ ratings of 27 weekly activities, concluded that “no activities in the current study would have qualified as DP [deliberate practice] for middle distance running” (p. 47). Young (1998) then proposed five practice activities that he felt should be related to middle-distance performance, namely hard-up tempo runs, long interval work, speed work, races and time trials, and mental preparation. In Study 2 he estimated the accumulated duration of “deliberate practice” by adding up the estimated accumulated duration of those five activities. It is important to note that none of these practice activities were demonstrated to be assigned by the coach with a particular goal for practice, which would be required to meet the criteria for individualized training supervised by a coach (i.e., deliberate practice). It is noteworthy that Macnamara et al. (2014) collected their data from the master’s thesis rather than the published report on Study 2 (Young & Salmela, 2010), which reported results based on a superior method of statistical analysis.

Most of the other studies listed in Table 3 are more straightforward. Baker, Bagats, Büsch, Strauss, & Schorer (2012) described the measured practice time as “Training with team” (pp. 26-27), and Hendry (2012) calls the estimated training and practice activity “estimates of hours per week in organized soccer practice” (p. 39). The effect sizes reported from Ward et al. (2007) explicitly included “team practice”, which is likely beneficial for performance improvement but does not meet the criteria for individualized training. In the description of the independent variable in their open data, Macnamara et al. (2014) rather consistently refer to the practice as sport-specific training, without any reference to whether the training was individualized or whether the practice tasks could be repeatedly performed with immediate feedback and with explicit goals for improvement of aspects of performance.

Criterion 4: The Accumulated Deliberate Practice has to Be Directed Toward the Same Performance Goal as the Administered Test of Performance

In many domains it is difficult to administer tasks that objectively measure the target performance in a domain. In classical music, for example, the target is to perform pieces that have been extensively practiced. This is the traditional method of evaluating performance among expert musicians at music competitions. It is also consistent with objective testing of music performance to assess the successful development of performance among music students in the United Kingdom.

In research on music performance, investigators either avoid the problem of measuring this performance or are genuinely interested in studying other types of music performance or some particular aspect of music production. These studies present a challenge for anyone interested in assessing the relation between accumulated deliberate practice and attained performance, because the attained performance is not the one that has been targeted by the efforts associated with the accumulated deliberate practice.

In Table 4 I have listed studies that measure music abilities other than the ability to perform rehearsed music, namely sight reading or playing a scale with a high degree of temporal evenness. In the sight-reading task, the participants play music by reading it directly from the score, without having practiced the piece before they perform it. Among expert musicians there is often a large discrepancy between rehearsed musical performance and sight-reading performance (Lehmann & Ericsson, 1996). There is nothing wrong with administering such tests and analyzing the structure and acquisition of performance on those tests. The problem arises when investigators quantify the training towards one goal while testing performance for another. This approach is not unlike finding limited effects on tests of English from accumulated time of studying Latin (Douglass & Kittelson, 1935).

After applying the fourth criterion, only 20 out of the original 157 effect sizes included in Macnamara et al.’s (2014) meta-analysis remain (see Table 6). I will now turn to examine whether these studies support Macnamara et al.’s (2014, p. 1615) rejection of Ericsson and Moxley’s (2012, p. 145, italics added) statement that “the concept of deliberate practice can account for the large individual differences between experts and novices”.

Part II: Covering the Large Differences in Performance between Novices and Experts - Restriction of Range of Attained Performance

It is a well-recognized problem that inferences from a sample about how much variance can be explained depend on the range of values sampled for the independent and dependent variables.

Several of the studies included in Macnamara et al.’s (2014) meta-analysis do not include individuals representing the entire range of performance in the domain of expertise, so the studies do not include both novices and experts. For example, Johnson, Tenenbaum, and Edmonds (2006, p. 121) describe their two samples of national and international swimmers: “Elite swimmers (n=8) in this study had achieved at least one gold medal at an Olympic Games or World Championships, were ranked in the top 5 in the world at the conclusion of a calendar year, or, in one case, was the top ranked 13 year old in the world in her primary event. Sub-elite swimmers (n=11) in this study had not achieved these criteria, yet had qualified for at least one U.S. National Championship or, in the case of the youths involved in this study, had achieved a top 5 national ranking in the U.S.A. for their age-group.” Their analysis did not include a group of novices, such as recreational swimmers or recreational swimmers who used to swim competitively in high school. If samples of both novices and experts had been included, the relation between accumulated amount of coach-led practice and attained performance would most likely be considerably stronger.
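The restriction-of-range argument can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the practice ranges, the linear relation between hours and performance, and the noise level are all hypothetical and are not taken from any study discussed here. It compares the correlation between practice and performance within a simulated novice group, within a simulated expert group, and in the pooled sample that covers the full range.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(seed=0, n_per_group=200):
    """Return (pooled r, novice-only r, expert-only r) for simulated data.

    All numbers are hypothetical: novices accumulate 0-2,000 hours,
    experts 6,000-10,000 hours, and performance rises linearly with
    hours plus random noise that is the same in both groups.
    """
    rng = random.Random(seed)

    def draw(low, high):
        hours = [rng.uniform(low, high) for _ in range(n_per_group)]
        performance = [0.001 * h + rng.gauss(0.0, 1.0) for h in hours]
        return hours, performance

    novice_hours, novice_perf = draw(0, 2000)
    expert_hours, expert_perf = draw(6000, 10000)

    r_pooled = pearson_r(novice_hours + expert_hours, novice_perf + expert_perf)
    r_novices = pearson_r(novice_hours, novice_perf)
    r_experts = pearson_r(expert_hours, expert_perf)
    return r_pooled, r_novices, r_experts


if __name__ == "__main__":
    r_pooled, r_novices, r_experts = simulate()
    print(f"pooled sample: r = {r_pooled:.2f}")
    print(f"novices only:  r = {r_novices:.2f}")
    print(f"experts only:  r = {r_experts:.2f}")
```

Although the underlying relation between practice and performance is identical for every simulated individual, the correlation estimated within either group alone is weaker than the one estimated from the pooled sample, because each group covers only part of the range of practice and performance.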

In Table 6 I list the studies associated with 18 effect sizes that only included novices or experts, but not both. This leaves two effect sizes that will be discussed in the next section.

Studies that Neither Violate the Criteria for Measuring Accumulated Deliberate Practice nor Restrict the Range of the Attained Performance

One of the remaining two studies is easy to discuss. There is no publicly available information about Maynard, Hambrick, & Meinz (unpublished), so it is not possible to evaluate it based on the criteria discussed above.

The only remaining study is one that I am very familiar with, namely Study 2 by Ericsson et al. (1993). This study consisted of two groups of pianists that differed in performance, and Ericsson et al. (1993) reported a number of analyses of objective performance to verify the large differences in performance. The study measured the piano performance of amateur and expert pianists with an average age of around 25 years. All musicians played the same piece of music while being recorded. When expert raters rated the piano performances without knowing the group membership of the pianists (blind judgments), the expert pianists received much higher ratings (N=12, M=6.4) than the amateurs (N=12, M=4.7). The expert pianists’ accumulated practice until age 18 (N=12, M=7,606) was significantly greater than that of the amateurs (N=12, M=1,606), and not a single expert’s estimated practice overlapped with that of the amateur pianists. The same two groups of participants were included in Study 1 by Krampe and Ericsson (1996) and analyzed with the addition of one group of old experts and one group of old amateurs. Macnamara et al.’s (2014) open dataset refers to their performance as “music interpretation (consistency of phrasing)”, when the reported correlations actually measured ratings by experts of recordings of the same piece of music, which were conducted blindly without knowledge of the group membership of the rated performer. If the measure of consistency of phrasing between the two recordings of the same piece had been used as the performance variable, then this measure would not have met the criterion of measuring the performance targeted by the accumulated deliberate practice and would thus have violated criterion 4. When Macnamara et al. (2014) analyzed the data from Ericsson et al. (1993, Study 2), they used an ANOVA to estimate the association of practice with category membership and estimated a correlation of 0.9, which would account for over 80% of the variance in music performance.
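The step from a correlation of 0.9 to “over 80% of the variance” is the squared correlation: 0.9 × 0.9 = 0.81. For a comparison of two groups, the proportion of variance explained in a one-way ANOVA (eta squared) equals the squared point-biserial correlation between group membership and the measured variable. The sketch below illustrates this relation with made-up practice hours. The values are hypothetical and are not the data from Ericsson et al. (1993), and the function names are chosen here for illustration only.

```python
import math


def eta_squared(group_a, group_b):
    """Share of total variance explained by group membership (two groups)."""
    values = group_a + group_b
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in (group_a, group_b)
    )
    return ss_between / ss_total


def point_biserial_r(group_a, group_b):
    """Pearson correlation between a 0/1 group code and the measured values."""
    codes = [0] * len(group_a) + [1] * len(group_b)
    values = group_a + group_b
    n = len(values)
    mean_c = sum(codes) / n
    mean_v = sum(values) / n
    scv = sum((c - mean_c) * (v - mean_v) for c, v in zip(codes, values))
    scc = sum((c - mean_c) ** 2 for c in codes)
    svv = sum((v - mean_v) ** 2 for v in values)
    return scv / math.sqrt(scc * svv)


# Hypothetical accumulated practice hours, NOT the published data.
amateur_hours = [1200, 1500, 1800, 2100, 1400, 1600]
expert_hours = [6900, 7400, 8100, 7700, 7200, 8300]

if __name__ == "__main__":
    r = point_biserial_r(amateur_hours, expert_hours)
    eta2 = eta_squared(amateur_hours, expert_hours)
    print(f"point-biserial r = {r:.3f}, r squared = {r ** 2:.3f}")
    print(f"eta squared from the two-group ANOVA = {eta2:.3f}")
    print(f"variance share implied by r = 0.9: {0.9 ** 2:.2f}")
```

With non-overlapping groups such as these, almost all of the variance in practice hours lies between the groups, which is why the correlation with group membership is so high.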