Alignment of Standards, Large-scale Assessments, and Curriculum:

A Review of the Methodological and Empirical Literature

Meagan Karvonen

Western Carolina University

Shawnee Wakeman and Claudia Flowers

University of North Carolina at Charlotte

Support for this research is provided by the National Alternate Assessment Center (www.naacpartners.org), a five-year project funded by the U.S. Department of Education, Office of Special Education Programs (No. H324U040001). The NAAC represents a collaborative effort among the University of Kentucky, University of North Carolina at Charlotte (UNCC), National Center on Educational Outcomes (NCEO), the Center for Applied Special Technology (CAST), and the University of Illinois at Urbana-Champaign. The opinions expressed do not necessarily reflect the position or policy of the Department of Education, and no official endorsement should be inferred.


Abstract

The purpose of this study was to provide a comprehensive review of the literature on the alignment of academic content standards, large-scale assessments, and curriculum. After reviewing the characteristics of 195 identified resources on alignment published between 1984 and 2005, this review primarily focused on (1) a comparison of features of alignment models and their methodologies, and (2) a narrative and quantitative analysis of characteristics of 67 empirical alignment studies. Based on this review, several recommendations for further research and improvements in alignment technology were made.


Alignment of Standards, Large-scale Assessments, and Curriculum:

A Review of the Methodological and Empirical Literature

The educational community sometimes assumes that instructional systems are driven by content standards, which are translated into assessment, curriculum materials, instruction, and professional development. Research has shown that teachers may understand what content is wanted and believe they are teaching that content, when in fact they are not (Cohen, 1990; Porter, 2002). Improvements in student learning depend on how well assessment, curriculum, and instruction are aligned and reinforce a common set of learning goals, and on whether instruction shifts in response to the information gained from assessments (National Research Council, 2001). Alignment is often difficult to achieve because educational decisions are frequently made at different levels of the educational agency. For example, states may have one set of experts who develop written standards, a second set of experts who develop the assessment, and a third set of experts who train teachers in standards-based instruction. Finally, it is teachers who translate academic standards into instruction.

In 1994 the Improving America’s Schools Act and Title I of the Elementary and Secondary Education Act required states to set high expectations for student learning, to develop assessments that measure those expectations, and to create systems that hold educators accountable for student achievement. The No Child Left Behind Act (2002) reiterated this emphasis on quality assessment of student achievement; final NCLB regulations require that states’ assessment systems “address the depth and breadth of the State’s academic content standards; are valid, reliable, and of high technical quality; and express results in terms of the State’s academic achievement standards” (55 Fed. Reg. 45038, emphasis added). NCLB peer review guidance (U.S. Department of Education, 2004) indicates that judgments about the compliance of states’ assessment systems with Title I requirements will be made based on evidence submitted by states (e.g., alignment studies) rather than on the assessments themselves. The Guidance further recommends that states consider whether their assessments:

o  Cover the full range of content specified in the State’s academic content standards, meaning that all of the standards are represented legitimately in the assessments; and

o  Measure both the content (what students know) and the process (what students can do) aspects of the academic content standards; and

o  Reflect the same degree and pattern of emphasis apparent in the academic content standards (e.g., if the academic content standards place a lot of emphasis on operations then so should the assessments); and

o  Reflect the full range of cognitive complexity and level of difficulty of the concepts and processes described, and depth represented, in the State’s academic content standards, meaning that the assessments are as demanding as the standards; and

o  Yield results that represent all achievement levels specified in the State’s academic achievement standards. (U.S. Department of Education, 2004, p. 41)

These issues should be considered in the alignment of the state’s entire assessment system, including assessments for students with disabilities and English language learners. Low-complexity methods, such as simply mapping assessment items back to state content standards, are insufficient for peer review purposes (U.S. Department of Education, 2004, p. 41).

Alignment can be formally defined as the degree of agreement, overlap, or intersection between standards, instruction, and assessments. In other words, alignment is the match between the written, taught, and tested curriculum (Flowers, Browder, Ahlgrim-Delzell, & Spooner, in press). Accurate inferences about student achievement and growth over time can only be made when there is alignment between the standards (expectations) and assessments. From this perspective, alignment has both content and consequential validity implications (Bhola, Impara, & Buckendahl, 2003; La Marca, Redfield, Winter, Bailey, & Despriet, 2000).

The consequences of poorly aligned standards, assessments, and curriculum are potentially significant for students and educational systems. Aligning curriculum with assessments can result in improved test scores for students regardless of background variables such as socioeconomic status, race, and gender. In contrast, misalignment may reinforce differences among students based on their sociocultural backgrounds, as those with more exposure to educational opportunities in their everyday lives may still perform well when tests measure content that is not taught in the classroom (English & Steffy, 2001). Strong evidence of alignment between assessments and state standards supports the validity of interpretations made about test scores.

For many years, states and test developers have relied on content experts and other item reviewers to make judgments about whether test items reflect the content of particular strands within state content standards. The AERA position statement on high-stakes testing calls for alignment of assessments and curriculum on the basis of both content and cognitive processes (AERA, 2000). Bhola et al. (2003) emphasized the need to use more complex methods for examining alignment that go beyond content and cognitive process at the item level. La Marca et al. (2000) reviewed and synthesized conceptualizations of alignment and methods for analyzing the alignment between standards and assessment. They identified five dimensions that should be considered, based largely on Webb’s (1999) work:

1.  Content match, or the correspondence of topics and ideas in the standards and the assessment,

2.  Depth match, or level of cognitive complexity required to demonstrate knowledge and transfer it to different contexts,

3.  Relative emphasis on certain types of knowledge tasks in the standards and the assessment system,

4.  Match between the assessment and standards in terms of performance expectations, and

5.  Accessibility of the assessment and standards, so both are challenging for all students yet also fair to students at all achievement levels.

The emphasis in this study is on the methodologies used to empirically investigate alignment, and on the existing empirical evidence that might indicate what degree of alignment has been achieved in large-scale assessment systems. In addition to the focus on alignment of standards and assessments emphasized by La Marca et al. (2000) and Webb (1999), this study examines the alignment of standards and assessments with the curriculum taught in schools. This review and synthesis of literature is intended to yield information about gaps in methodological approaches to examining alignment, as well as areas in which additional empirical investigations are needed to establish sound criteria for judging the quality of alignment.

Methods

This section describes the literature search and identification procedures, primary and secondary coding procedures, and data analysis strategies.

Literature Search and Identification Procedures

Cooper (1989) warned against overly narrow problem formulations in the early stages of a literature review, as limited conceptual breadth poses a threat to the validity of the study. Thus, the scope of the literature search was initially very broad. The target of the search was literature written between 1984 and 2005 with a primary focus on alignment. The scope of alignment included the match between (1) assessment and curriculum/instruction, (2) assessment and content standards, (3) content standards and curriculum/instruction, (4) instruction and instructional materials, (5) two types of standards, and (6) a combination of assessment, content standards, and curriculum/instruction. Assessments included both general and special education instruments that were either objective or alternative (e.g., performance-based, portfolio). Classroom and district-level assessments were excluded from this study, but alignment in higher education settings was included. Studies on alignment based on standards at any level (e.g., district, state) were included.

A total of 28 terms or combinations of terms were used to define the research base of alignment resources (e.g., sequential development; alignment and curriculum; accountability, alignment, and assessment). Electronic and print resources were used to identify materials for possible inclusion. Electronic databases searched included InfoTrac, Google, ERIC, PsycINFO, Academic Search Elite, Books in Print, and Dissertation Abstracts. The websites of assessment organizations (e.g., Harcourt, Measured Progress, Buros Institute for Assessment Consultation and Outreach), technical assistance centers (e.g., National Center on Educational Outcomes), educational organizations (e.g., Council of Chief State School Officers, National Center for Research on Evaluation, Standards, and Student Testing [CRESST]), and state education agencies were also searched for nonpublished alignment material. As some websites identified a very large number of potential hits (e.g., Google identified 5,690,000 hits for alignment and assessment), the first 150 of those documents were reviewed for potential inclusion. The reference lists of identified books and several seminal and recent works (e.g., Bhola, Impara, & Buckendahl, 2003; Case, Jorgensen, & Zucker, 2004; La Marca et al., 2000; Webb, 1997) were also searched. Contacts with authors were made when identified materials could not be located. Finally, a follow-up list of prominent authors (e.g., Andrew Porter, Robert Rothman, John Smithson, Norman Webb) and model names (e.g., Surveys of Enacted Curriculum, Achieve, Council for Basic Education) were also searched in Google to ensure complete coverage of the reference material.

Conceptual relevance of each source identified in the literature search was determined by the study coordinator, who applied the inclusion criteria liberally during the first round of literature identification. Resources that were of questionable relevance were reviewed by a second author.

Coding Procedures

Initial coding was done on the entire set of identified documents to broadly characterize the nature of the alignment literature. A secondary coding scheme was then applied to the empirical resources.

Initial coding procedures. Identified material was entered into a database by reference and was coded according to three categories: (a) elements being aligned (as described above), (b) type of document, and (c) purpose or focus of document. The type of document was defined by five categories. Literature was coded as a report if it was written as a non-published paper, technical report, dissertation, or brief. Presentations included all papers or multimedia work presented to an audience. Journal articles were published works found in a journal or newsletter format. Books included any chapters in edited works or manuals disseminated by states. Finally, other included all training materials, web pages dedicated to alignment, and other relevant alignment work (e.g., state documents that discussed alignment but did not include any empirical data or methodological descriptions).

The purpose or focus of the document was coded into six groups. Conceptual included literature that defined alignment, discussed the relationships among standards, assessment, and curriculum, discussed reasons for alignment, or argued the benefits of well-aligned systems or the drawbacks of poorly aligned systems. Resources that described a model or method for conducting alignment studies were coded as methodological. Literature that focused on recommendations for policy about alignment was coded as policy. Documents that included data collection procedures and results from an original alignment study were coded as empirical. Review/synthesis was coded for materials that described more than one primary source on alignment. Finally, other was coded for miscellaneous foci that did not fit other categories (e.g., state descriptions of alignment without methodological or empirical components; instances where rubrics or test blueprints were used to examine alignment).

Interrater reliability was obtained for each coded category. Two researchers coded a sample of 80 documents (41%) to obtain interrater reliability. A point-by-point method (the number of agreements on occurrences and non-occurrences, divided by the total number of coding decisions, multiplied by 100) was used to calculate reliability. The average reliability for type of alignment was 88% (range of 50%-100% agreement), with a median agreement of 91%. As only two documents were identified as addressing alignment between standards and curriculum/instruction, the 50% figure reflects a single disagreement. The average reliability for type of document was 100%. The average reliability for purpose or focus of document was 90% (range of 50%-100%), with a median of 96%. Policy was identified as the focus of six documents by one researcher but of only three by the other, resulting in a 50% agreement rate. All disagreements across categories were resolved through consensus.
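The point-by-point agreement calculation described above can be sketched in a few lines of code. This is a hypothetical illustration only; the function name and the sample category labels are invented for the example and do not come from the original study.

```python
# Point-by-point interrater agreement: the number of coding decisions
# on which two coders agree, divided by the total number of decisions,
# multiplied by 100 to yield a percentage.

def point_by_point_agreement(coder_a, coder_b):
    """Percent agreement between two coders over parallel coding decisions."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same set of documents")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a) * 100

# Example: two coders classify six documents by purpose/focus
# (category labels are illustrative).
coder_1 = ["conceptual", "empirical", "policy", "empirical", "other", "policy"]
coder_2 = ["conceptual", "empirical", "policy", "methodological", "other", "policy"]

print(point_by_point_agreement(coder_1, coder_2))  # 5 of 6 decisions agree
```

Note that simple percent agreement does not correct for chance agreement; statistics such as Cohen's kappa are sometimes preferred for that reason, though percent agreement is the method reported here.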

Secondary coding procedures. Using a coding form developed by the first author, two researchers summarized information about the resources identified in the first phase as empirical studies. Categorical data were recorded for type of literature; content area(s) and grade levels; elements of the educational system aligned; descriptions of the types of standards, assessments, and instructional indicators; alignment methodology used; and entity that conducted the alignment study. For training purposes, the second author coded three resources with a second coder; both then coded three additional resources and compared codes before the second coder coded the remaining empirical studies independently. Reliability on the secondary coding was 93%, based on a sample of 11 resources (16% of the empirical literature). One researcher entered data into SPSS and cleaned the database prior to analysis.

Data Analysis Strategies

Descriptive statistics were calculated on all primary codes for the entire set of literature, and on the secondary codes for the subset of empirical literature. Frequencies were also calculated for key characteristics of alignment studies, by alignment methodology. Narrative descriptions of some articles were provided to illustrate certain points about the literature.