Template for Quality Verification of Testing Instruments

For each assessment instrument referenced in your application, type complete responses in the response tables in this template. This template corresponds with RFA Section III, Part C, Element 3 and is described in RFA Section IV, Appendix C. Submission instructions are provided in RFA Section I, Part E.

Name of Applicant Entity:

Name of Test to be Verified / Group Reading Assessment & Diagnostic Evaluation (GRADE) and Group Math Assessment & Diagnostic Evaluation (GMADE)
Publisher of Test / Pearson Publishing
Indicate Where Test is Referenced in Application (Element 3 or Element 4)

To establish the adequacy of the testing instruments used in response to the RFA, applicants will need to appropriately address the following three topic areas: Test Rationale, Technical Qualities of the Test, and Assessing Students Requiring Special Accommodations.

A. Test Rationale

1. State the purpose of the test

2. Indicate the content and skills to be tested

3. Indicate the intended test takers (i.e., grade level and subject)

Appendix C. Item A.1. Response: The GRADE and GMADE are used to determine each student's current level in reading or mathematics. The results are used to determine student placement.
Appendix C. Item A.2. Response:
Students are assessed on the National Council of Teachers of Mathematics (NCTM) Standards as represented in the GMADE subtests – Concepts & Communication, Operations & Computation, and Process & Applications – which cover Number & Operations, Algebra, Geometry, Measurement, Data Analysis & Probability, Problem Solving, Reasoning & Proof, Communication, Connections, and Representation. (Content varies per grade level based on NCTM Standards.)
The developmental sequence of the GRADE is based upon the findings of the National Reading Panel (2000), the Committee on the Prevention of Reading Difficulties in Young Children (Snow, Burns, & Griffin, 1998), and recognized reading experts such as Chall (1983), Spache and Spache (1973), and Gibson (1965) – Reading Readiness (Sound Matching, Rhyming, Print Awareness, Letter Recognition, Same & Different Words, Phoneme-Grapheme Correspondence), Vocabulary (Word Reading, Word Meaning, Vocabulary), Comprehension (Sentence Comprehension and Passage Comprehension), and Oral Language (Listening Comprehension). (Content varies per grade level based on the developmental sequence.)
Appendix C. Item A.3. Response: GRADE (Reading): K-12th grades and GMADE (Math): 1st-12th grades
Note: Type responses in this table. As the response is entered, this box will grow. Maximum response is one-half page.

B. Technical Qualities of the Test

1. Provide evidence on the technical qualities of the test, which must include reliability and validity analyses or studies that determine whether the test meets its intended purpose.

a. At a minimum, content validity must be established. The maximum response length is one page, single-spaced with a 12-point font.

(1) If the test was reviewed by a panel of curriculum experts in the subject the test is intended to measure, the applicant needs to describe the procedures for the review and the qualifications of the panel.
(2) If a study or analysis was conducted to determine validity, the applicant needs to describe the type of study or analysis conducted and its results.
Appendix C. Item B.1a(1). Response: The national tryout administration manuals for the GMADE and GRADE included a questionnaire to be completed by the teacher or administrator who had actually given that particular level of the test to a classroom of students. The questionnaire addressed issues of content as well as administration procedures. A total of 310 GMADE surveys and 777 GRADE surveys were completed. The results were tallied and provided valuable input in shaping the standardization and final versions of the GMADE and GRADE. The panel of experts included certified teachers and Ph.D., M.Ed., M.A., and M.S. educational experts in the fields of mathematics (GMADE), reading (GRADE), psychometrics, psychology, and curriculum.
Appendix C. Item B.1a(2). Response: Both the GMADE and the GRADE were compared with scores from other group-administered achievement tests and an individually administered mathematics achievement test. The studies were carried out in conjunction with fall and spring standardizations.
GMADE: Data for the group-administered achievement test comparisons were collected by numerous teachers in various regions of the US. All correlations were corrected for restriction of range of GMADE scores using Guilford's formula. Schools provided data on three nationally standardized, group-administered tests – the Iowa Tests of Basic Skills (ITBS, 2001), the TerraNova, and the Iowa Tests of Educational Development (ITED, 2001) – and on one standardized state test, the Texas Assessment of Knowledge and Skills (TAKS, 2003). The individually administered achievement test study included 30 students at two different sites, who were given the KeyMath – Revised/Normative Update: A Diagnostic Inventory of Essential Mathematics.
GRADE: Schools provided student data on group-administered achievement assessments including the Iowa Tests of Basic Skills (ITBS), the California Achievement Test (CAT), and the Gates-MacGinitie Reading Tests (Gates). The individually administered assessment was the Peabody Individual Achievement Test – Revised (PIAT-R), administered to 30 fifth-grade students.
All comparisons indicated a complementary relationship, suggesting that each assessment measures performance on a similar ability or skill set. Content validity, criterion-related validity, and construct validity comparisons provide substantial evidence that the GMADE and GRADE measure what they purport to measure and that appropriate inferences can be made from test results.
Note: Type responses in this table. As the response is entered, this box will grow. Limit response to this item to one page.
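The restriction-of-range correction cited in the response above (Guilford's formula) is the standard direct correction applied when the predictor's variance is smaller in the observed sample than in the full population. The sketch below assumes the usual form of that correction; the manual's exact computation is not reproduced here, and the sample values shown are hypothetical.

```python
import math

def correct_restriction_of_range(r, sd_restricted, sd_unrestricted):
    """Correct an observed correlation for restriction of range.

    r: correlation observed in the range-restricted sample.
    sd_restricted / sd_unrestricted: predictor standard deviations in
    the restricted sample and the full population, respectively.
    Standard direct range-restriction correction, cited as Guilford's
    formula in the GMADE manual (an assumption of this sketch).
    """
    k = sd_unrestricted / sd_restricted
    return (r * k) / math.sqrt(1.0 - r**2 + (r**2) * k**2)

# Hypothetical example: observed r of .50 in a sample whose SD (8.0)
# is smaller than the population SD (10.0) is corrected upward.
corrected = correct_restriction_of_range(0.5, 8.0, 10.0)
```

When the sample SD equals the population SD the correction leaves the coefficient unchanged, which is a quick sanity check on the formula.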

b. At a minimum, test reliability must be established. The maximum response length is one page, 12-point font, single-spaced.

(1) If a study or analysis was conducted to determine reliability, describe the type of study or analysis conducted (test-retest, internal consistency, etc.).
(2) Describe the results of the study or analysis.
Appendix C. Item B.1b(1). Response: The reliability of the GMADE and GRADE was determined by three measures: internal-consistency reliability, alternate-form reliability, and test-retest reliability. Internal-consistency reliabilities were computed for each GMADE/GRADE subtest and Total Test score for each level and form, using the raw-score split-half method. On each level and form, the items within each subtest were divided into equivalent halves by content. A total raw score was computed for each half based on the dichotomously scored items. These two total raw scores were correlated, and the correlation coefficient between the two halves was adjusted for test length using the Spearman-Brown formula.
Alternate-form reliabilities are derived from the administration of two different but parallel test forms to a group of students. A sample of 651 students took part in an alternate-form reliability study of the GMADE, and each student took both Form A and Form B. (54% took Form A first, and 46% took Form B first.) The average number of days between testings ranged from 10.6 to 40.0 days. A sample of 696 students took part in the alternate-form reliability study of the GRADE. Each student took Form A and Form B (38.1% took Form A first and 61.9% took Form B first). The mean interval between testings ranged from 8.0 to 32.2 days.
Short-term test-retest reliability tells how much a student's test score is likely to change if a brief time period elapses. A total of 761 students took part in the test-retest reliability study of the GMADE. The students were tested twice (37% took Form A both times, 63% took Form B both times). The average interval between testings ranged from 13.7 to 48.0 days. A total of 816 students took part in the test-retest reliability study of the GRADE. The students were tested twice (73.7% took Form A twice, and 26.3% took Form B twice). The mean interval between testings ranged from 3.5 to 42.0 days.
Appendix C. Item B.1b(2). Response: The GMADE Total Test reliabilities average .92 (Form A) and .93 (Form B) in the fall and .93 (A) and .94 (B) in the spring. The GRADE Total Test reliabilities are in the range of .95 to .99 (with one exception: the first-grade reliability is .89). The reliabilities are very high, indicating a high degree of homogeneity among items in the GMADE and GRADE for each form.
GMADE Alternate-form reliabilities range from .84 to .96 with a median of .88. GRADE Alternate-form reliabilities range from .81 to .94 with half of the coefficients being .89 or higher. These high correlations indicate that Forms A and B in both the GMADE and GRADE are quite parallel.
The GMADE Test-Retest reliabilities for Form A range from .78 to .94 with a median of .90. Form B Test-Retest reliabilities range from .80 to .94 with a median of .92. The GRADE Test-Retest reliabilities for Form A range from .77 to .98 with a median of .90. Reliabilities for Form B range from .83 to .96. Thus, Forms A and B in the GMADE and GRADE appear to have similar short-term stability.
Note: Type responses in this table. As the response is entered, this box will grow. Limit response to this item to one page.
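The split-half procedure described in Item B.1b(1) – correlate the two content-matched half-test totals, then adjust for test length – can be sketched as below. This is a generic illustration of the method, not the publisher's scoring code, and the score lists used are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(half_a, half_b):
    """Full-test reliability from two half-test raw-score totals.

    The Spearman-Brown step, 2r / (1 + r), projects the correlation
    between the two halves up to the reliability of the full-length
    test, as described in the response above.
    """
    r_half = pearson(half_a, half_b)
    return 2.0 * r_half / (1.0 + r_half)

# Hypothetical per-student totals on the two halves of a subtest.
reliability = split_half_reliability([10, 12, 8, 14, 9], [11, 13, 7, 15, 8])
```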

c. At a minimum, the test must have an accurate scoring system. Provide evidence that the test was appropriately calibrated. The maximum response length is one page, 12-point font, single-spaced.

(1) Describe measures taken to ensure the accuracy of the scoring system for the test.
(2) Describe the metric used to measure student performance.
(3) If a standard setting was conducted, describe the type of standard setting (Angoff, Modified Angoff, Bookmark, or Contrasting Groups) and the results.
Appendix C. Item B.1c(1). Response: To ensure the success of the national tryout standardization sampling, several quality control procedures were implemented before, during, and after testing. These included establishing criteria for the demographic variables to define the samples to be tested, determining the sources of data to use in constructing the sampling plans, using current data to tabulate target percentages, and careful handling of the data once testing was completed. All procedures were provided in a detailed manual to site coordinators.
Communication was maintained on a regular basis with all site coordinators. Following both tryout and standardization, teachers completed a questionnaire about procedures and materials. The feedback was used in designing the final GMADE and GRADE materials.
When the completed Student Booklets and Answer Sheets were returned, a series of check-in procedures was followed. If a case included missing or inaccurate information, the site coordinator was contacted. Cases that could not be corrected or completed were not used. If sufficient numbers of cases were eliminated at a certain test level and form or at a specific GMADE level, other sites were contacted to replace the eliminated cases in order to reach the target numbers of the sampling plans.
Appendix C. Item B.1c(2). Response: Student performance on the GMADE and GRADE is measured in Standard Scores, Percentiles, Normal Curve Equivalents, Stanines, Grade or Age Equivalents, and Growth Scale Values.
Appendix C. Item B.1c(3). Response: A standard setting such as Angoff, Modified Angoff, Bookmark, or Contrasting Groups was not used for the GMADE or GRADE.
Note: Type responses in this table. As the response is entered, this box will grow. Limit response to this item to one page.
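The derived-score metrics listed in Item B.1c(2) are all transformations of a student's position in the norm distribution. The rough sketch below converts a normal z-score under the conventional scales (standard score mean 100, SD 15; NCE SD 21.06) – these scale parameters are assumptions of the sketch, and actual GRADE/GMADE scores, including Growth Scale Values, come from the publisher's empirical norm tables, not these formulas.

```python
import math

def derived_scores(z):
    """Illustrative conversions from a normal z-score to derived scores.

    Assumes a standard-score scale of mean 100, SD 15 and the usual
    NCE scale (mean 50, SD 21.06); real norm-referenced scores are
    read from empirical norm tables, not computed this way.
    """
    # Percentile rank: cumulative normal probability, via erf.
    percentile = 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    nce = 50.0 + 21.06 * z            # normal curve equivalent
    standard = 100.0 + 15.0 * z       # standard score
    # Stanines run 1-9 with mean 5 and SD 2, clamped at the extremes.
    stanine = max(1, min(9, int(round(5.0 + 2.0 * z))))
    return {"standard": standard, "percentile": percentile,
            "nce": nce, "stanine": stanine}
```

For example, a student exactly at the norm-group mean (z = 0) would receive a standard score of 100, a percentile of 50, an NCE of 50, and a stanine of 5.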

C. Assessing Students Requiring Special Accommodations

1. Describe special accommodations made. The maximum response length is one page, 12-point font, single-spaced.

a. Indicate the test guidelines on reasonable procedures that would be taken with students with disabilities.

b. Indicate the test guidelines on reasonable procedures that would be taken with students with diverse linguistic backgrounds.

Appendix C. Item C.1a. Response: With special permission from the publisher, each test page can be copied and enlarged for use with students who are partially sighted or otherwise visually impaired. Copies can also be made for use with special communication devices for students with physical disabilities. If a student is physically unable to mark response choices but can indicate answers by some method, a proctor or assistant can mark the choices.
Students with hearing impairments can be tested individually or in a small group. The GMADE subtests can be read aloud and repeated. Instructions in both the GMADE and GRADE can be conveyed using American Sign Language; however, ASL cannot be used to convey the content of word items.
Students who are highly distractible, or who may need more time than other members of the group or class, may be tested individually.
Appendix C. Item C.1b. Response: The GMADE is not intended to be a reading test. All items in both subtests can be read to English Language Learners. This was the same administration procedure followed during the development of the items and derived scores. Such an accommodation does not invalidate the normative scores. However, words or terms that are read to the students cannot be defined or explained as such a deviation would invalidate a normative interpretation of performance.
The GRADE does not allow for specific accommodations for students with diverse linguistic backgrounds. However, students whose reading ability is suspected to be significantly above or below grade level can be given an out-of-level test. Normative scores are available based on the student’s current grade enrollment for appropriate comparison and reporting.
Note: Type responses in this table. As the response is entered, this box will grow. Limit response to this item to one page.
