Massachusetts EEC Common Metric Project Memorandum #1
MEMORANDUM #1
Date: REVISED February 22, 2013
FROM: Gary Resnick, Ph.D., and Pamela Kelley, Ph.D.
TO: Sherri Killins, Commissioner and Jennifer Louis, Project Manager, Massachusetts Early Education and Care
RE: Massachusetts Common Metric Project: Memorandum #1 (REVISED), Identify Common Items within Domains
------
Executive Summary
1. An investigation into the extent of conceptual alignment across the three instruments (WSS, GOLD, and COR) suggests a moderate-to-high degree of alignment. In other words, the test developers’ domains were found to be relatively comparable across the three instruments, and there appeared to be good coverage of similar items across all three instruments. For example, of the total 407 items, 75% matched on all three instruments, while only 25% were partial matches (i.e., matched on two or fewer of the three instruments). This finding supports the plan to move forward with more in-depth analyses.
2. Each instrument was tested separately for internal consistency reliability by domain. The results suggest a high degree of reliability on all three instruments, with WSS and GOLD having very high reliability (domain alpha coefficients of 0.93 or greater), while COR was somewhat lower with a wider range (domain alpha coefficients of 0.79 to 0.95). These results suggest that the items used to measure each developer domain are consistent with one another and tap the same underlying construct.
3. Each instrument was tested separately to assess the extent to which scores were normally distributed. Normally distributed scores can be one indicator of how well an instrument measures or distinguishes between different levels of children’s ability. The results were mixed. For example, for WSS, large groups of children had high scores, creating a “ceiling effect” which may indicate the test is not effective at identifying children at different levels of ability for a given developmental domain. For GOLD, normal distributions appeared more consistently; however, a pattern was observed in which scores were clustered in the center of the distribution, suggesting that the test may not distinguish children with particularly low or high ability levels. The COR showed a mixture of both of these distribution patterns. These results suggest that the full range of children’s abilities may not be fully represented by these test domains.
4. The assessments were also tested for their ability to distinguish between children’s ability levels by age group. Age group differences in which younger children are rated consistently lower on a test than older children indicate that the test detects maturational differences. Also tested were the interaction effects between age and quartile groups, to determine whether younger or older children hold their positions across skill levels. Our testing found mixed results. On the one hand, there were significant age differences in the expected developmental progression for many subdomains on all three tests. On the other hand, the age group differences do not consistently hold up across different ability level groups. For example, on the WSS Social Studies subdomain, three-year-olds and five-year-olds in the lowest ability quartile for their age group had similar scores. These types of mixed findings were evident for at least two subdomains across all three assessments.
5. We make two recommendations based on the level of missing data found in the dataset, which far exceeds the conventional amount of missing data “allowed” for the sophisticated analyses planned in the next phases of this project. First, the factor analyses should be considered preliminary due to the amount of missing data, and there is concern that there may not be enough cases to properly explore some subgroups of the population. Second, it would be useful to discuss these missing data issues and potential strategies for improving the quality of future assessment data.
The following memorandum describes the first set of tasks completed under the Common Metric Project. The project comprises a descriptive study of three criterion-referenced child assessment tools in use by early education and care providers within the Commonwealth of Massachusetts. The Massachusetts Department of Early Education and Care made these tools available, along with the training, to allow providers to assess their children’s strengths and challenges, to assist with educational programming. However, these tools may also provide aggregate information programmatically on children’s progress in the five developmental domains, in order to benchmark developmental growth. To do so, it would be important to devise a common metric so that children on the various tests can be compared. Further, the common metric would be norm-referenced, allowing for comparing groups of children of different ages with their peers.
This project will assess the feasibility of developing a statistical methodology, using descriptive analytic methods, to answer the following question: what are the baseline skills, knowledge and abilities of children entering preschool and kindergarten? To do this, the project is focused on determining the commonalities among the Teaching Strategies GOLD, High Scope Child Observation Record, and Work Sampling System assessment tools in measuring five key developmental domains.
This memorandum is the first of three and represents early analytic work examining the three tests conceptually and empirically. In this memorandum, we cover the following key areas:
- Determine Alignment of Items and Developmental Domains across the Assessments
- Explore Distributions of Test Developer Domain Scores
- Describe Internal Consistency Reliability of Test Developer Domain Scores
- Assess Developmental Differences and Progression by Ages and Quartiles, by test
The purpose of these tasks is to explore how each test defines key domains of development[1] and, based on data collected by MA EEC, to determine the distributions of the domain or sub-domain scores. The first task focuses on determining the extent to which the items and domains or sub-domains of development from each of the three assessments are aligned with each other, and how they correspond to the five larger domains of development typically considered as key areas of young children’s early skills. The second task is designed to understand the distribution of sub-domain scores for each test and whether the scores meet the assumptions of normality required for more extensive factor analyses. The third task is to describe how highly the items from each test are correlated with one another, as they should be if they measure the test developer’s domains or sub-domains of development as intended. That is, we will report on the internal consistency form of reliability for each test’s key sub-scales. The final task is to determine whether scores from the three tests show the expected progression across ages and across groups of children at the lower and higher ends of the distribution. In this task, we would expect that a test with good measurement properties would be able to distinguish younger from older children as well as children within age groups who are operating below or above their peer group.
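To make the internal consistency task concrete, the following is a minimal sketch of how an alpha coefficient could be computed for one domain. It assumes the alpha coefficients reported here are Cronbach's alpha and that only complete cases are used; the function name and the ratings are hypothetical and are not drawn from the project's data or software.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: one list of item ratings per child, all of the same length.
    """
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two children")
    # Variance of each item across children.
    item_variances = [variance([row[i] for row in rows]) for i in range(n_items)]
    # Variance of the summed (raw) domain score across children.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("summed scores do not vary")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings for four children on three items of one domain.
example = [
    [1, 1, 2],
    [2, 2, 2],
    [2, 3, 3],
    [3, 3, 3],
]
alpha = cronbach_alpha(example)
print(round(alpha, 2))
```

Alpha approaches 1 when the items rise and fall together across children, which is the pattern expected if the items in a domain tap the same underlying construct.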
All analyses were conducted using those developmental domains or sub-domains designed by the test developer, to determine whether scores on these domains reflect meaningful distinctions in children’s abilities and development. The results will help to determine the feasibility of conducting the exploratory and confirmatory factor analyses in the next phases of this project.
Analytic Methods and Procedures
The assessment instruments referred to in this report include the following: Teaching Strategies GOLD (GOLD), the Child Observation Record (COR), and Work Sampling System (WSS). The WSS includes five separate but related assessments: Preschool 3 (P3), Preschool 4 (P4), Head Start 3 (HS3), Head Start 4 (HS4), and Kindergarten (K).
The Work Sampling System (WSS), Teaching Strategies GOLD (GOLD), and Child Observation Record (COR) use ordinal rating scales to measure children’s progress. The WSS is based on a three-point scale, the COR uses a six-point scale, and GOLD uses a nine-point scale. For the Common Metric Project, the individual item ratings were summed to create raw scores for each subdomain and domain and a total score was then calculated by summing the domain scores.
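The scoring described above can be sketched as follows. This is an illustration only: the nested layout of the ratings, the function name, and the example values are assumptions made for the sketch, and the item counts shown do not reflect the actual instruments.

```python
def raw_scores(ratings):
    """Sum item ratings into subdomain, domain, and total raw scores.

    ratings: {domain: {subdomain: [item ratings]}} for one child.
    """
    subdomain_scores = {
        domain: {sub: sum(items) for sub, items in subs.items()}
        for domain, subs in ratings.items()
    }
    domain_scores = {
        domain: sum(subs.values()) for domain, subs in subdomain_scores.items()
    }
    # The total score is the sum of the domain scores.
    total_score = sum(domain_scores.values())
    return subdomain_scores, domain_scores, total_score


# Hypothetical three-point ratings (1 to 3) for one child.
child = {
    "Personal and Social Development": {
        "Self-Control": [2, 3],
        "Interaction with Others": [3, 3, 2],
    },
    "Mathematical Thinking": {
        "Number and Operations": [1, 2],
        "Measurement": [2],
    },
}
subdomains, domains, total = raw_scores(child)
print(domains, total)
```

Because the three instruments use scales of different lengths and different numbers of items, raw scores computed this way are comparable within an instrument but not across instruments.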
To address missing data, scores were imputed based on valid cases. That is, a set of decision rules was established whereby a certain number of missing items would be tolerated and a score could be imputed by prorating based on the number of valid items. In most situations, if a given case was missing one, two or sometimes three items, a score for that sub-domain could still be generated. If more than the threshold number of items was missing, then the score for that case was treated as missing.
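One way to read this prorating rule is sketched below. The sketch assumes that prorating means scaling the mean of the valid items up to the full number of items, and it treats the tolerated number of missing items as a parameter because the threshold varied by sub-domain; the function name and both assumptions are ours for illustration and may differ from the decision rules actually applied.

```python
def prorated_score(items, max_missing):
    """Return a prorated sub-domain score, or None if too many items are missing.

    items: item ratings for one child, with None marking a missing item.
    max_missing: number of missing items tolerated (hypothetical threshold).
    """
    valid = [rating for rating in items if rating is not None]
    n_missing = len(items) - len(valid)
    if not valid or n_missing > max_missing:
        return None  # the sub-domain score is treated as missing
    # Mean of the valid items, scaled up to the full item count.
    return sum(valid) / len(valid) * len(items)


# One missing item out of four, with one missing item tolerated.
print(prorated_score([2, 2, None, 2], max_missing=1))
# Two missing items exceed the threshold, so no score is generated.
print(prorated_score([2, None, None, 3], max_missing=1))
```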
I. Alignment of Items and Developmental Domains across the Assessments
Question: To What Extent Do the Developmental Domains and Items Match Across the Three Assessments?
A. Alignment of Developmental Domains
The first phase of the project focuses on determining the extent to which the three assessments are conceptually aligned. Qualitative and quantitative methods were used to examine the domains, including reviewing the instruments themselves, studying the publisher’s technical support documentation, and counting the number of domains and items within domains. A summary of each assessment is provided below:
COR consists of 34 items organized into 6 major domains: 1) initiative, 2) social relations, 3) creative representation, 4) movement and music, 5) language and literacy, and 6) mathematics and science.
GOLD consists of 66 items organized into 9 major domains: 1) social-emotional, 2) physical, 3) language, 4) cognitive, 5) literacy, 6) mathematics, 7) science and technology, 8) social studies, and 9) the arts. A tenth domain, English language acquisition, is included to assess language skills for English Language Learners.
WSS: the P3 (49 items), P4 (55 items), and K (66 items) assessments are organized into 7 major domains: 1) personal and social development, 2) language and literacy, 3) mathematical thinking, 4) scientific thinking, 5) social studies, 6) the arts, and 7) physical development and health. The HS3 (54 items) and HS4 (59 items) assessments are organized into 10 major domains: 1) physical development and health, 2) social and emotional development, 3) approaches to learning, 4) logic and reasoning, 5) language development, 6) literacy and knowledge skills, 7) mathematics knowledge and skills, 8) science knowledge and skills, 9) creative arts expression, and 10) social studies knowledge and skills.
Table 1 summarizes the content of each of the three tests according to the domains listed by the test developer (shaded blue), and within the larger categories of the five key developmental domains drawn from the child development literature and used by the National Education Goals Panel (shaded pink).[2] These five domains are as follows: 1) social-emotional, 2) language and literacy, 3) cognitive and general knowledge, 4) approaches to learning, and 5) physical development and health.
The boundaries between these domains and constructs within them are somewhat artificial, as noted by other experts in the field.[3] For example, using vocabulary was categorized under language and literacy; however, because it is also relevant to understanding science and general knowledge, it could also have been categorized under cognitive and general knowledge.[4] Thus, the five domains are presented here for heuristic purposes only as opposed to child development theory-building.[5]
Further, some domains, such as language and literacy, have longstanding research supporting their conceptual and operational definitions, while other domains, such as approaches to learning, are less well-defined and in some cases may overlap with cognitive, language and social domains.[6] The key distinction in defining approaches to learning is to identify those behaviors that convey a child’s effort and engagement in classroom learning, particularly learning related to attention and persistence (focused, enduring, goal-directed learning), as well as competence motivation (initiative for effectiveness in learning).[7] In this project, we use this definition of approaches to learning to classify test items and determine the degree to which the items match, with the understanding that some of the distinctions will be arbitrary, for all domains and perhaps especially for approaches to learning.
For the purposes of this project the above five key developmental domains are referred to as the Common Metric Domains to distinguish them from the Test Developer’s Domains.
Table 1. Common Metric Domains: Domain Recoding Scheme[8]
WSS (All Versions) / GOLD / COR
Social-Emotional Common Metric Domain
Personal and Social Development / Social Emotional / Social Relations
WSS:
- Self-Control
- Self-Concept
- Interaction with Others
- Social Problem Solving
GOLD:
- Regulates own Emotions and Behaviors
- Establishes and Sustains Positive Relationships
- Participates Cooperatively and Constructively in Group Situations
COR:
- Relating to Adults
- Relating to Other Children
- Resolving Interpersonal Conflict
- Understanding and Expressing Feelings
Approaches to Learning Common Metric Domain
Approaches to Learning / No Similar Domain[9] / Initiative
WSS:
- Initiative and Curiosity (HS)
- Persistence and Attentiveness (HS)
- Cooperation (HS)
COR:
- Taking Care of Personal Needs
- Making Choices and Plans
- Solving Problems with Materials
- Initiating Play
Language and Literacy Common Metric Domain
Language and Literacy / Language and Literacy[10] / Language and Literacy
WSS:
- Listening/Receptive Language
- Speaking/Expressive Language
- Reading
- Writing
- Alphabet Knowledge (HS)
- Print Concepts and Conventions (HS)
- Engagement in English Literacy Activities (ELL/HS only)
GOLD:
- Listens to and Understands Increasingly Complex Language
- Uses Language to Express Thoughts and Needs
- Comprehends and Responds to Books and Other Texts
- Demonstrates Emergent Writing Skills
- Demonstrates Knowledge of the Alphabet
- Demonstrates Knowledge of Print and its uses
- Uses Appropriate Conversation and Other Communication Skills
- Demonstrates Phonological Awareness
- English Language Acquisition
- Demonstrates Progress in Listening to and Understanding English
- Demonstrates Progress in Speaking English
COR:
- Listening to and Understanding Speech
- Using Complex Patterns of Speech
- Using Vocabulary
- Reading
- Writing
- Using Letter Names and Sounds
- Demonstrating Knowledge about Books
- Showing Awareness of Sounds in Words
Cognitive and General Knowledge Common Metric Domain
Mathematical Thinking / Mathematics / Mathematics and Science
WSS:
- Mathematical Processes
- Number and Operations
- Patterns, Relationships, Functions
- Geometry and Spatial Relations
- Measurement
- Scientific Thinking
- Inquiry/Scientific Skills and Method
- Conceptual Knowledge of the Natural/Physical World (HS only)
- Life Science (K only)
- Physical Science (K only)
- Earth Science (K only)
GOLD:
- Uses Number Concepts and Operations
- Demonstrates Knowledge of Patterns
- Explores and Describes Spatial Relationships and Shapes
- Compares and Measures
- Science and Technology
- Uses Scientific Inquiry Skills
- Demonstrates Knowledge of the Characteristics of Living Things
- Demonstrates Knowledge of the Physical Properties of Objects and Materials
- Demonstrates Knowledge of Earth’s Environment
- Uses Tools and other Technology to Perform Tasks
COR:
- Sorting Objects
- Counting
- Identifying Patterns
- Identifying Position and Direction
- Comparing Properties
- Identifying Sequence, Change, and Causality
- Identifying Natural and Living Things
- Identifying Materials and Properties
Logic and Reasoning (HS Only) / Cognitive / No Similar Domain
WSS:
- Reasoning and Problem Solving (HS Only)
- Symbolic Representation (HS Only)
GOLD:
- Demonstrates Positive Approaches to Learning
- Remembers and Connects Experiences
- Uses Classification Skills
- Uses Symbols and Images to Represent Something Not Present
Social Studies / Social Studies / No Similar Domain
WSS:
- People, Past and Present
- People and Where they Live/Environment
- Self, Family and Community (HS Only)
- Human Interdependence
- Citizenship and Government
GOLD:
- Demonstrates Knowledge About Self
- Shows Basic Understanding about People and How they Live
- Explores Change Related to Familiar People or Places
- Demonstrates Simple Geographic Knowledge
The Arts / The Arts / Creative Representation
WSS:
- Expression and Representation (includes music, dance, art, drama)
- Understanding and Appreciation
GOLD:
- Explores the Visual Arts
- Explores Musical Concepts and Expression
- Explores Dance and Movement Concepts
- Explores Drama Through Actions and Language
COR:
- Making and Building Models
- Drawing and Painting Pictures
- Pretending
Physical Development and Health Common Metric Domain
WSS:
- Gross Motor Development
- Fine Motor Development
- Personal Health and Safety
GOLD:
- Demonstrates Gross Motor Manipulative Skills
- Demonstrates Fine Motor Strength
- Demonstrates Traveling Skills
- Demonstrates Balancing Skills
COR:
- Moving in Various Ways
- Moving with Objects
- Feeling and Expressing a Steady Beat
- Moving to Music
- Singing
This table shows that, in general, the test developer’s domains across all three assessment tools are comparable, and as we can see from the items within each of these domains, there appears to be good coverage of similar items across all three tests. Further, when fitting the Test Developer Domains to the five key developmental domains there also appears to be reasonably good fit, with a few exceptions, such as the GOLD not having a subdomain corresponding to Approaches to Learning, as noted in Table 1. Minor inconsistencies were due primarily to assessments that combined domains, for example, language and literacy were treated as separate domains in GOLD, but were combined into one domain in COR. These analyses suggest that the three instruments appear to be aligned with regard to the developmental domains being assessed.
B. Alignment of Items[11]
Using the same methods described above, individual items from each test were examined. The goal of this analysis was to determine the degree to which the items matched in their content. The criteria for determining a match were based on the item’s developmental objective, purpose and domain. Some items were determined to have the same developmental objective although they were worded differently; for example, “moves with some balance and control” from the WSS and “moving in various ways” from the COR were both coded as having the same objective (assessing gross motor skills) and domain (physical development and health). A match was defined as an item or objective that was included on all three instruments (COR, GOLD, and WSS).[12] If an item or objective did not appear on all three instruments, it was not counted as a match. This is a very stringent requirement and thus serves as a conservative measure of the degree to which items likely correspond.[13]
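The matching rule can be illustrated with a short sketch. It assumes each item has already been hand-coded with a single objective label; the function names and the objective labels and pairings in the example are hypothetical and serve only to show how the all-three-instruments criterion separates matches from partial matches.

```python
from collections import defaultdict

INSTRUMENTS = frozenset({"WSS", "GOLD", "COR"})


def classify_items(coded_items):
    """Label each item as a "match" or a "partial" match.

    coded_items: list of (instrument, item_text, objective) tuples.
    An item is a match only if its objective appears on all three instruments.
    """
    instruments_by_objective = defaultdict(set)
    for instrument, _, objective in coded_items:
        instruments_by_objective[objective].add(instrument)
    return [
        (
            instrument,
            text,
            "match" if instruments_by_objective[objective] >= INSTRUMENTS else "partial",
        )
        for instrument, text, objective in coded_items
    ]


def percent_matched(classified):
    """Percentage of items whose objective is covered by all three instruments."""
    matches = sum(1 for _, _, status in classified if status == "match")
    return 100 * matches / len(classified)


# Hypothetical coding of five items; the objective labels are illustrative.
coded = [
    ("WSS", "Moves with some balance and control", "gross motor"),
    ("COR", "Moving in various ways", "gross motor"),
    ("GOLD", "Demonstrates balancing skills", "gross motor"),
    ("WSS", "Citizenship and government", "social studies"),
    ("GOLD", "Demonstrates simple geographic knowledge", "social studies"),
]
classified = classify_items(coded)
print(percent_matched(classified))
```

Under this rule an objective covered by only two instruments contributes no matches at all, which is why the criterion yields a conservative estimate.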