Teacher’s Handbook: Chapter 11 summary: Assessing Standards-Based Language Performance in Context (4th ed., Shrum & Glisan)
In this chapter, you will learn about:
- the paradigm shift in assessment practices
- the washback effect of tests
- purposes of tests
- summative vs. formative assessments
- the continuum of test item types
- assessment formats: prochievement, performance-based, and PALS
- an interactive model for assessing interpretive communication
- authentic assessments
- developing and using scoring rubrics
- standards-based Integrated Performance Assessments (IPAs)
- empowering students through assessment
- portfolios and self-assessments
- interactive homework
- classroom assessment techniques (CATs)
- implications of the OPI for oral assessment
- dynamic assessment
Teach and Reflect: Analyzing and Adapting a Traditional Test; Adding an Authentic Dimension to a Performance-Based Assessment Task; Designing an Integrated Performance Assessment (K-16)
Discuss and Reflect: Developing Authentic Assessment Tasks and Rubrics
Conceptual Orientation
In the last decade, the term assessment has been used to refer to “the act of determining the extent to which the desired results are on the way to being achieved and to what extent they have been achieved” (Wiggins & McTighe, 2005, p. 6). The information gathered during assessment provides a window into student learning, thinking, and performance. Equipped with this knowledge, teachers can improve instruction and student performance.
Wiggins (1993) has traced the word assessment to its Latin root assidere, which means “to sit with,” and he suggests that we consider assessment as something we do with students rather than to them (as cited in Phillips, 2006, p. 83). Throughout this chapter, you will see the recurring theme of the value of assessment in assisting and improving learner performance and, therefore, in connecting seamlessly to instruction.
Planning for Assessment in a New Paradigm
Figure 11.1 depicts the paradigm shift in assessment that has occurred in recent years as a result of current SLA research, Standards for Foreign Language Learning in the 21st Century (SFLL) (National Standards in Foreign Language Education Project [NSFLEP], 2006), and experiences in classrooms. Planning begins with a consideration of what learners should be able to do by the end of a period of instruction and what assessments would best serve to assess achievement and track progress; you have already explored this type of backward-design planning process, in which assessment plays a pivotal role. Within backward design, you anticipate and even plan your assessments as part of designing a thematic unit, before instruction begins.
An important concept in the new assessment paradigm is the emphasis on the use of multiple measures in assessing student progress in order to provide ongoing opportunities for students to show what they know and can do with the language. Figure 11.2 depicts curricular priorities and sample assessment methods; note that while paper-and-pencil tests and quizzes may be adequate for assessing basic facts and skills, performance tasks are necessary for assessing deep understanding and big ideas. Furthermore, in order for broader program evaluation to occur, assessment should be done from the standpoint of multiple perspectives, as reported by Donato, Antonek, and Tucker (1994, 1996) in their assessment of a Japanese FLES program through analysis that included oral interviews with learners, observations of classroom lessons, and questionnaires completed by learners, parents, foreign language teachers, and other teachers in the school. These types of assessment data provide the basis for a comprehensive assessment not only of learner progress but also of program effectiveness.
The new vision for assessment highlights the need for both formative and summative measures, assessment within meaningful and authentic (i.e., real-world) contexts, and opportunities for students to exhibit creativity and divergent responses. Phillips notes that a great deal of classroom assessment still consists of the “decades-old testing in the form of quizzes and chapter tests with single written right answers” (2006, p. 79). In the new assessment paradigm, there is no place for decontextualized testing of discrete language elements such as translation of vocabulary words and fill-in-the-blank verb conjugations within disconnected sentences. In a standards-based language program, assessments feature a series of interrelated tasks that reflect the three modes of communication, more than one goal area, and technology. It is important to note that in the new paradigm, a task is a performance-based, communicative activity that reflects how we use language in the world outside of the classroom.
The new assessment paradigm also features expanded roles for both teacher and learners. Teachers inform students of how they will be assessed prior to an assessment, and they show students samples of performance that would meet and exceed expectations. Additionally, they provide rich feedback that describes how students could improve their performances. Learners have multiple opportunities to demonstrate growth in language development and progress in attaining the standards; they learn as a result of assessment; and they participate in the assessment planning process, through means such as portfolio development, in which they are empowered to make decisions about how they illustrate their own progress. Of course, the entire assessment process also serves to inform and improve classroom instruction and curricular development.
Key point: In the new assessment paradigm, there is no place for decontextualized testing of discrete language elements such as translation of vocabulary words and fill-in-the-blank conjugations within disconnected sentences.
Current research in assessment argues for “alternative approaches to assessment” that attempt to bring about a more direct connection between teaching and assessment (McNamara, 2001, p. 343). “Teaching to the test” is no longer viewed with disdain, but rather as a logical procedure that connects goal setting with goal accomplishment (Oller, 1991; Wiggins, 1989). Further, as teachers and learners work toward standards-driven goals using authentic materials from real-world contexts, assessment takes a more realistic form.
Four basic principles that can guide foreign language teachers in the development of classroom tests are: (1) test what was taught; (2) test it in a manner that reflects the way in which it was taught; (3) focus the test on what students can do rather than what they cannot do; and (4) capture creative use of language by learners (Donato, Antonek, & Tucker, 1996). For example, if learners spend their class time developing oral interpersonal communication, then testing formats should include assessment of oral language output. Similarly, students who learn in class how to narrate in the past by writing paragraphs about events that occurred during their childhood should be tested by being asked to write paragraphs about past events in their lives. Test items should be designed so that students must understand the meaning being conveyed in order to complete the tasks (Walz, 1989). Furthermore, since a large portion of classroom time is spent in learning language for communication in real-life contexts, testing should also reflect language used for communication within realistic contexts (Adair-Hauck, 1996; Harper, Lively, & Williams, 1998; Shrum, 1991).
Working Towards Standards-Based Authentic Assessment
The term authentic has been used to describe the type of assessment that mirrors the tasks and challenges faced by individuals in the real world (Wiggins, 1998). If student progress in attaining the standards is to be effectively assessed, teachers must adopt an approach to assessment that includes authentic assessment as one type of measure. Since implementation of authentic assessment is still a new endeavor for many teachers, a worthwhile goal is for teachers to work towards implementing more of these assessment tasks for both formative and summative purposes.
The reality of the classroom setting and instructional goals is that teachers make use of a wide variety of assessments, which may vary according to the degree to which they are authentic, given the definition provided above. Although there are differences in the various test formats presented here in terms of their purpose, implementation, and the degree of authenticity that they reflect, they all share the following characteristics:
- They are contextualized, i.e., they are placed in interesting, meaningful contexts.
- They engage students in meaning-making and in meaningful communication with others.
- They elicit a performance of some type.
- They encourage divergent responses and creativity.
- They can be adapted to serve as either formative or summative assessments.
- They address at least one mode of communication.
- They can be used or adapted to address goal areas and standards.
Purposes of Tests: A Definition of Terms
Figure 11.3 categorizes key types of tests according to the purposes they serve.
- Standardized tests, also called norm-referenced tests, measure learners’ progress against that of other learners in a large population, e.g., the SAT, the TOEFL, Advanced Placement tests, and PRAXIS exams.
- Proficiency tests are also called criterion-referenced tests because they measure learner performance against a criterion, e.g., the ACTFL Oral Proficiency Interview uses the educated native speaker as the criterion against which to judge oral performance.
- Instructional tests include commercially prepared achievement tests, such as textbook publishers’ tests, and teacher-made classroom assessments.
- Research tests are given, for example, to learn more about language acquisition (Brooks, Donato, & McGlone, 1997; Phillips, 2006). Many of the empirical studies cited in Teacher’s Handbook used research-based tests.
Summative vs. Formative Assessments
Learners have the reasonable right to expect that their scores should be the same regardless of who is doing the scoring; that is, learners can expect that scorers will view the responses objectively. Furthermore, learners can expect that the test consistently measures whatever it measures. This is called reliability (Gay, 1987). Learners should also be able to expect that the test measures what it is supposed to measure and that this measurement is appropriate for this group of learners. This is referred to as validity. A test is considered to have face validity if it looks as if it measures what it is intended to measure, especially to the test taker (Hughes, 2003). Authentic and standards-based assessments are considered to have face validity because they mirror performance in the world.
Summative assessment often occurs at the end of a course and is designed to determine what the learner can do with the language at that point, e.g., a final exam. Formative assessments are designed to help form or shape learners’ ongoing understanding or skills while the teacher and learners still have opportunities to interact for the purposes of repair and improvement within the instructional setting.
Language teachers should make extensive use of formative testing (Shohamy, 1990), which may be ungraded or graded: quizzes of five to fifteen minutes’ duration, class interaction activities such as paired interviews, and chapter or unit tests. A sufficient amount of formative testing must be done in the classroom in order to enable learners to revisit and review the material in a variety of ways, and formative feedback must enable the learner to improve without penalty.
Summative and formative assessments are systematic, planned, and connected to the curriculum, and many of the assessment tasks are similar. For example, a role-play situation may serve as both a formative assessment task designed to check learner progress within a unit and as a summative assessment at the end of the year or course to assess oral proficiency and learners’ ability to perform global linguistic tasks. Consequently, planning a year-end summative assessment does not need to be overwhelming, since it should reflect the types of formative tasks that students have experienced throughout the instructional experience (Donato & Toddhunter, 2001). Results of summative assessments may be compared across grade levels, classes, and even schools; proficiency results are often used in this way. Additionally, the results of summative assessments may be used to justify the existence of programs and support advocacy, as in the case of early language programs (Donato & Toddhunter).
Key point: Formative assessments are designed to help form or shape learners’ ongoing understanding or skills while the teacher and learners still have opportunities to interact for the purposes of repair and improvement within the instructional setting. Summative assessment often occurs at the end of a course and is designed to determine what the learner can do with the language at that point.
Continuum of Test Item Types
Natural-situational ↔ Unnatural-contrived
Direct ↔ Indirect
Integrative/global ↔ Discrete-point
“Most language tests can be viewed as lying somewhere on a continuum from natural-situational to unnatural-contrived” (Henning, 1987). With this statement, Henning posited a continuum with the point on either end representing a specific type of test item.
- Natural-situational assessments present tasks that learners might encounter in the world outside of the classroom, e.g., writing a response to a letter from a pen pal or key pal from the target culture.
- Unnatural-contrived assessments feature traditional test items that often focus on isolated grammatical structures and vocabulary within contexts that do not reflect the world beyond the classroom, such as fill-in-the-blank exercises for verb manipulation.
- Direct assessments are those that incorporate real-life contexts, problems, and solutions, e.g., students deliver a talk to peers.
- Indirect assessments “‘represent competence’ by extracting knowledge and skills out of their real-life contexts” (Liskin-Gasparro, 1996, p. 171), such as a multiple-choice grammar test, which often lacks face validity.
- Discrete-point assessments test one point at a time, such as a grammatical structure or one skill area, and include formats such as multiple-choice, true-false, matching, and completion; an example of this is a quiz on verb endings. Although discrete-point items are most often associated with assessment of one isolated grammar or vocabulary point, they can also be used to assess interpretive listening/reading/viewing or sociocultural knowledge, e.g., a multiple-choice item in which students read a brief description of a dinner invitation and must choose the appropriate form of refusal from among four options (Cohen, 1994).
- Integrative or global assessments assess the learner’s ability to use various components of the language at the same time, often requiring multiple modes or skills as well. For example, an integrative test might ask learners to listen to a taped segment, identify main ideas, and then use the information as the topic for discussion, as the theme of a composition, or to compare the segment to a reading on the same topic; learners could be graded on the basis of several criteria including their ability to interpret the text, interact interpersonally with a classmate, and produce a written product. Cohen (1994) describes the continuum of test items as featuring the most discrete-point items on one end, the most integrative items on the other end, and the majority of test items falling somewhere in between (pp. 161–162). Discrete-point and integrative test formats may be either direct or indirect assessments, depending on the degree to which the tasks address problems and strategies that learners would be likely to encounter in the world outside of the classroom.
What implications does the discussion of the continuum of test item types have for foreign language teachers? First, the selection of assessments and test types should always depend on the teacher’s objectives and what is intended to be assessed. For example, if literal comprehension of a reading is being assessed, a discrete-point, multiple-choice test might be appropriate; if interpersonal speaking is being assessed, an integrative assessment that engages students in real-life communication would be in order.
Second, test types that directly address the knowledge, modes, or skills that they are intended to assess may be more valid measures than their indirect counterparts. Although at first students may seem to prefer “one-right-answer” tests because that is what they are most accustomed to, recent findings indicate that students may acquire a more positive attitude toward direct tests because these tests have face validity and allow them to show what they are able to do with the language in real-life contexts. Furthermore, students tend to be more enthusiastic about direct tests if the tests reflect the type of classroom instruction and practice that they have experienced. For example, in the Integrated Performance Assessment (IPA), about which you will learn later in this chapter, students overwhelmingly commented on how they were able to apply what they had learned to “real” tasks, how they had freedom to express themselves by using what they already knew, and how they felt a sense of accomplishment in being able to use what they had learned in real communicative tasks (Glisan, Adair-Hauck, Koda, Sandrock, & Swender, 2003).
Third, teachers should understand the limitations of discrete-point testing in terms of its role in assessing learner performance. As mentioned above, discrete-point items may be used appropriately to assess the interpretive mode of communication and sociocultural knowledge.
However, when these items are used to assess grammar and vocabulary, teachers must understand that what is being assessed is recognition—not production or performance. To illustrate, if a learner accurately completes a fill-in-the-blank exercise that requires verb conjugation, the teacher cannot assume that the learner will be able to use these verbs appropriately and accurately in a real-life oral interpersonal task.