Lecture 12: Interpreting Standardized Test Results

In addition to preparing and using classroom tests, you will be expected to administer and interpret standardized tests.

A. What Makes a Test Standardized?

Standardized tests are administered, scored, and interpreted in a standardized (consistent) manner.

Standardization permits comparisons across classrooms, schools, and school districts.

These comparisons are made using a representative norm group.

1. ADMINISTERED: Remember that there are standard directions, a specific way to respond to each question (e.g., filling in the bubble correctly), a given number of proctors, and an appropriate setting.

2. SCORED: Remember that the use of selected-response items ensures objective scoring. Remember also that constructed-response items (e.g., the ILLINOIS WRITING ASSESSMENT) are scored using a defined rubric, and the consistency of scoring is continually checked.

3. INTERPRETED: We will discuss four score interpretations:

  1. PERCENTILE (OR PERCENTILE RANK)
  2. GRADE-EQUIVALENT SCORES
  3. STANINES
  4. Z SCORES

B. ADVANTAGES and DISADVANTAGES of each of these four types of score reporting

1. PERCENTILE (OR PERCENTILE RANK)

advantage –

disadvantage - dependent on the quality of the norm group

2. GRADE-EQUIVALENT SCORES

advantage – easy to communicate

disadvantage – history of misinterpretation

3. STANINE

advantage - given the existence of measurement error, it is a report of general performance

disadvantage -

4. Z SCORES

advantage – expressed in standard deviation units (easy to understand)

disadvantage –

C. Important Terms:

______– student’s performance is compared to a standard of performance called a criterion. Test items are carefully selected from a specified set of skills that make up the goal. (Absolute interpretations)

______– student’s performance is compared to the performance of others. When referring to standardized test results, the others are those individuals from the norm group. (Relative interpretations)

______– typically, a stratified random sample of individuals chosen to represent the population of individuals about whom inferences will be made. It is a sample of examinees that is geographically, ethnically, racially, parochially, and by gender similar to the population who will take the test. These individuals take the test first, their results are analyzed, and then your students’ scores can be interpreted relative to this group’s performance.

I. Comparing Individual Performance to the Group

As you have seen, we can divide the score scale into performance categories using the mean and standard deviation:

Category / Well Below Average / Below Average / Average / Above Average / Well Above Average
Standard deviations from the mean (X) / X < -2 / -2 ≤ X < -1 / -1 ≤ X ≤ +1 / +1 < X ≤ +2 / X > +2
Raw score* / Below 60 / 60 to 65 / 65 to 75 / 75 to 80 / Above 80

*given a hypothetical mean = 70 and standard deviation = 5.
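
The raw-score cut points follow directly from the mean and standard deviation. A minimal Python sketch, assuming the hypothetical mean of 70 and standard deviation of 5 given above (the function name `category_cutoffs` is illustrative only):

```python
def category_cutoffs(mean, sd):
    """Raw scores lying 2 and 1 SDs below, and 1 and 2 SDs above, the mean."""
    return [mean + k * sd for k in (-2, -1, 1, 2)]

print(category_cutoffs(70, 5))  # [60, 65, 75, 80]
```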

II. Calculating and Using Z Scores

Z-Scores
  1. Z = (Score – Mean) / Standard deviation
  2. The distribution of z-scores has a mean = 0.
  3. Z-scores express the distance of a raw score from the mean in standard deviation units.

Well Below Average / Below Average / Average / Above Average / Well Above Average
Z < -2 / -2 ≤ Z < -1 / -1 ≤ Z ≤ +1 / +1 < Z ≤ +2 / Z > +2
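
The formula and the category boundaries can be sketched in a few lines of Python. The function names `z_score` and `category` are illustrative, and the example score of 78 is made up; the mean of 70 and standard deviation of 5 are the hypothetical values used earlier.

```python
def z_score(score, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (score - mean) / sd

def category(z):
    """Map a z-score onto the five performance categories."""
    if z < -2:
        return "Well Below Average"
    if z < -1:
        return "Below Average"
    if z <= 1:
        return "Average"
    if z <= 2:
        return "Above Average"
    return "Well Above Average"

z = z_score(78, 70, 5)   # hypothetical score of 78
print(z, category(z))    # 1.6 Above Average
```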

Practice Calculating Z-Scores

Individual Test Score / Test Mean / Standard Deviation / Z-Score
Calculus: 90 / 98 / 4 / -2
Biology: 85 / 70 / 5 / 3
U.S. History: 40 / 45 / 3 / -1.7
Spanish: 22 / 21 / 3 / 0.33
English: 88 / 85 / 5 /
Music: 50 / 80 / 15 /
Geography: 90 / 85 / 3 /

III. More Common Standardized Test Scores

A. Stanines (Standard Nine Point Scale)

  1. Range from 1 (lowest) to 9 (highest).
  2. Stanines break the distribution into 9 intervals. Stanines 2 through 8 are each one-half standard deviation wide; stanines 1 and 9 take in the remaining tails.
  3. Stanines of 1, 2, and 3 reflect below-average achievement in the subject, compared to the norm group.
  4. Stanines of 4, 5, and 6 reflect average achievement in the subject compared to the norm group.
  5. Stanines of 7, 8, and 9 reflect above-average achievement in the subject compared to the norm group.
  6. They provide a rough approximation of performance that takes into account the error of measurement.

B. Percentile Ranks

Percentile Ranks - indicate a student’s relative position in a group by giving the percentage of scores the student surpassed. A percentile rank of 80 indicates that the student surpassed 80% of the other students in the norm group who took this same test.

  1. Range from 1 to 99.
  2. Depend on the quality of the norm group.

Below Average / Average / Above Average
below 25th / 25th - 75th / above 75th
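
As a rough illustration of the definition, a percentile rank can be computed by counting the norm-group scores a student surpassed. This is a simplified sketch: the norm-group scores are invented, the function name is illustrative, and test publishers may handle tied scores and rounding differently.

```python
def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores that fall below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

# Hypothetical norm group of ten examinees (made-up data).
norm_group = [52, 58, 61, 64, 66, 68, 70, 72, 75, 81]
print(percentile_rank(72, norm_group))  # 70.0
```

Note that the result says nothing about how many items the student answered correctly, only how many norm-group scores were lower.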

Cautions:

  1. Percentile scores do not indicate the percent correct.
  2. Percentile scores cannot be manipulated arithmetically (ordinal scale).
  3. Percentile scores are based on frequency, so small score increases at the middle of the distribution result in large percentile rank changes, while even large score increases at the tails of the distribution do not result in large percentile rank changes.
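
The last caution can be checked numerically. The sketch below assumes scores are normally distributed with the hypothetical mean of 70 and standard deviation of 5 used earlier (real norm-group distributions are only approximately normal), and compares the same 3-point raw score gain at the middle and at the upper tail.

```python
from statistics import NormalDist

# Assumption: normally distributed scores, mean 70, SD 5.
norm = NormalDist(mu=70, sigma=5)

def percentile(score):
    """Percentage of the assumed distribution falling below the score."""
    return 100 * norm.cdf(score)

middle_gain = percentile(73) - percentile(70)  # 3 points gained at the mean
tail_gain = percentile(83) - percentile(80)    # 3 points gained at +2 SD
print(round(middle_gain, 1), round(tail_gain, 1))
```

Under these assumptions the gain at the middle is worth more than 20 percentile points, while the same gain in the tail is worth fewer than 2.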

Approximate comparisons of Scores

Stanine / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
P. Rank / < 4 / 4-10 / 11-22 / 23-39 / 40-59 / 60-76 / 77-88 / 89-95 / 96+
Z / < -2 / -1.5 / -1 / -0.5 / 0 / 0.5 / 1 / 1.5 / 2+
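
The percentile-rank row of the comparison table can be turned into a simple lookup. A minimal sketch, using only the cut points listed in the table (the names `STANINE_FLOORS` and `stanine_from_percentile` are illustrative):

```python
from bisect import bisect_right

# Lowest percentile rank belonging to stanines 2 through 9, read from the table.
STANINE_FLOORS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine_from_percentile(pr):
    """Approximate stanine for a percentile rank between 1 and 99."""
    return bisect_right(STANINE_FLOORS, pr) + 1

print([stanine_from_percentile(pr) for pr in (3, 10, 50, 80, 99)])  # [1, 2, 5, 7, 9]
```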

C. Grade Equivalent Scores

Grade Equivalent Scores - identify the grade level (year.month) at which a typical student might obtain the same observed raw score as the examinee.

  1. Grade equivalents range from (year.0) to (year.9) for every grade level (1 through 12).
  2. They are based on the assumption that students learn an equal amount of information during each of the 9 months of the school year and nothing during the summer months.
  3. They are based on the performance of three contiguous norm groups.
  4. The scale is created using mathematical interpolation of observed performance of the 3 contiguous norm groups and extrapolation of anticipated performance of students who have never taken the test.
  5. Grade equivalent scores are only appropriate to interpret performance in basic skills areas.
  6. Every grade equivalent score of (year.0) corresponds to the median performance for that grade level, which has serious implications for interpretation.
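
The interpolation step can be sketched as follows. This is an illustration only: the median raw scores for the three contiguous norm groups are invented, the function name is hypothetical, and publishers' actual scaling procedures are more involved than straight-line interpolation.

```python
def grade_equivalent(raw, medians):
    """Linearly interpolate a grade equivalent from (grade, median raw score) pairs."""
    points = sorted(medians.items())
    for (g_lo, s_lo), (g_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= raw <= s_hi:
            fraction = (raw - s_lo) / (s_hi - s_lo)
            return round(g_lo + fraction * (g_hi - g_lo), 1)
    return None  # outside the tested grades: would require extrapolation

# Hypothetical median raw scores for three contiguous norm groups (made-up data).
medians = {3.0: 20, 4.0: 30, 5.0: 38}
print(grade_equivalent(25, medians))  # 3.5
```

A raw score outside the range of the three norm groups returns `None` here; on a real score report such values are extrapolated, which is one reason extreme grade equivalents deserve extra caution.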

Interpretations:

Rodney is a fourth grade student who has taken a standardized test. For math computation he received a Grade Equivalent of 7.3. This means that he performed as well on this fourth grade test as a typical seventh grade student in the third month of seventh grade would! It does not mean that he can do seventh grade work, only that he is performing very well for a fourth grader!

Mary is a ninth grade student who has taken a standardized test. Her language total was a G.E. of 8.6 and a stanine of 5. Her performance is average, not below grade level!

Rachel is a sixth grade student who has taken a standardized test. Her language score was a G.E. of 3.5 and she placed in the 10th percentile. Her performance is significantly below grade level.

Cautions:

  1. A G.E. score is NOT an estimate of which grade a student should be placed in.
  2. Don’t expect all students to gain 1.0 grade each year.
  3. G.E. scores on different tests are not comparable.
  4. G.E. scores that are below grade level may not be low. Remember, 50% of the students in the fourth grade norm group received G.E. scores < 4.0 because they fell below the median, and your students who place below the median will as well.

Types of Standardized Tests

  1. Criterion-referenced Achievement Tests
    1. Developed by school districts, state departments of education and commercial testing companies. Their purpose is to measure student progress toward stated curriculum goals.
    2. Example:
  2. Norm-referenced Achievement Tests
    1. Used primarily to compare students’ achievement to that of a large, representative group of students at the same grade level, called the norm group. Their purpose is to ascertain whether a student’s or a group’s achievement level is above average, average, or below average when compared to the norm group.
    2. Examples:
  3. Scholastic Aptitude Tests
    1. Developed by commercial test publishers, these tests measure students’ thinking and reasoning skills rather than curriculum-based skills. Student performance is compared to a norm group. Their purpose is to predict how well the student will achieve in given subjects, special training programs, higher education, and certain careers.
    2. Examples:

Which standardized test should you use?

They are all expensive, so many schools just use a norm-referenced achievement test.

While many companies now report criterion-referenced scores along with norm-referenced interpretations, be careful during interpretation due to the high item difficulty and discrimination indices and selected content sampling typical of norm-referenced tests.

Just like on your tests, the items must match the instructional objectives. If the curriculum of your classroom does not match the instructional criteria of the test, students will not perform well.

How Should Teachers Help Students Prepare for Standardized Tests?

  1. General teacher instruction on objectives not determined by looking at the objectives measured on a standardized test
  2. Teaching general test taking skills
  3. Specific instruction based on objectives that specifically match those on the standardized tests
  4. Instruction on specific objectives where the practice follows the same format as the test questions
