Chapter 2 Issues in Test Design

Maximal performance tests measure the upper limits of one's abilities and for that reason are also called "Ability" tests.

Achievement tests - are maximal performance tests. They measure how much you have learned or how much skill you have developed in a given area. A classroom test is an achievement test.

Aptitude tests - measure your "potential" for learning new information. A test of mechanical ability is an aptitude test.

Whether a test is one or the other of these is not always clear and there is much debate and controversy surrounding tests like IQ tests, the SAT, GRE, etc.

Speed vs. Power tests - A maximal performance test can be a speed or a power test.

Speed tests - contain items that are all roughly equal in difficulty. The outcome measure is "how many" you can answer correctly in a given amount of time (e.g., a typing test or the WISC "digit symbol" subtest).

Power tests - contain items of increasing difficulty so that fewer and fewer people will make it to the end of the test. The SAT, GRE, and other standardized tests are power tests.

Typical Performance Tests - measure "characteristics" of the person. These include (1) personality, (2) attitude, and (3) vocational/interest tests.

Personality tests - may be "projective" or "objective."

Objective Personality Tests - (also called "self report" tests) utilize objective and standard questions and scoring, typically multiple choice, true-false, or Likert scale in format. Favored by "trait" and "statistically" oriented psychologists. Some examples are the MMPI, Cattell's 16PF, and the NEO-PI-R.

Advantages - fast, inexpensive, easy to administer and score, can easily be given to large numbers of people, not subject to examiner biases.

Disadvantages - subjects may not understand instructions, questions may be "face valid," leading to biased (e.g., fake good) responding.

Projective Personality Tests - The subject responds to a series of "ambiguous stimuli." Presumably, the "unconscious" is being tapped. Favored by psychoanalytically (Freudian) oriented psychologists. Some examples are the Rorschach and the Thematic Apperception Test (TAT).

Advantages - may provide "interesting data," can sometimes be used as a tool to "jump start" the therapy process, don't suffer from the "face validity" problem.

Disadvantages - time consuming to administer, score, and interpret. They don't fit in well with the current "Zeitgeist" (spirit of the times) of managed care psychotherapy.

There is not much dispute that objective tests are far superior to projective tests when it comes to RELIABILITY and VALIDITY.

In your instructor's opinion, use of projective tests is on the decline.

Attitude Tests - measure opinions or beliefs and usually use objective items. A bias problem common to attitude tests is "socially correct or appropriate responding."

Interest Tests - measure likes and dislikes and are therefore useful in decision making regarding future career and job training.

Norm Referenced vs. Criterion Referenced Scoring (sometimes the distinction between these two is not entirely clear)

Norm referenced scoring - most important is where a test taker falls in relation to others who have also taken the test (vs. the actual raw score). Percentiles are one type of norm referenced scoring. If a test gets "curved," it is clearly norm referenced.

Norm Group - (or normative group) is the group the subject is being compared to.

Standardization Sample - name for large norm groups used when working with major standardized tests such as the Stanford Binet, SAT, or GRE.
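As a rough illustration of norm referenced scoring, the sketch below converts a raw score into a percentile rank within a norm group. The scores are made up, the function name percentile_rank is hypothetical, and counting ties as half is only one common convention; published tests may define percentiles differently.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`,
    counting scores equal to it as half (an assumed convention)."""
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)

# Made-up raw scores for a small norm group
norm_group = [52, 60, 61, 67, 70, 70, 74, 78, 85, 93]

print(percentile_rank(70, norm_group))  # 50.0
print(percentile_rank(93, norm_group))  # 95.0
```

The same raw score of 70 would earn a different percentile against a stronger or weaker norm group, which is the sense in which the raw score matters less than relative standing.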

Criterion Referenced Scoring - (also called pass-fail or mastery tests) A particular score (the "criterion") such as 75% correct must be reached in order to pass. The performance of others is irrelevant. The EPPP (Examination for Professional Practice in Psychology), and state boards for various professions are examples.
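A minimal sketch of criterion referenced scoring, assuming the 75% criterion used as the example above. The function name passes is hypothetical; note that no other test taker's score appears anywhere in the calculation.

```python
def passes(num_correct, num_items, criterion=0.75):
    """Return True if the proportion correct meets or exceeds the criterion."""
    return num_correct / num_items >= criterion

print(passes(30, 40))  # True  (exactly 75% correct)
print(passes(29, 40))  # False (72.5% correct)
```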

Ipsative Scoring (also called Forced Choice) - Questions typically take the form: "Would you rather: A. read a book OR B. go Bungee Jumping?" ONLY used with Typical Performance Multi scale tests. This is so you cannot score high on all of the scales. Most commonly seen on vocational and interest tests. The Myers Briggs test (based on Carl Jung's theory) uses ipsative items.
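To see why forced choice items prevent a high score on every scale, consider the sketch below. The three items and the scale names ("intellectual," "adventurous," "social") are invented for illustration and do not come from any real inventory. Each choice adds one point to exactly one scale, so the scale scores always sum to the number of items.

```python
# Hypothetical forced-choice items: each option is keyed to one scale.
ITEMS = [
    {"A": "intellectual", "B": "adventurous"},  # read a book vs. bungee jump
    {"A": "social", "B": "intellectual"},
    {"A": "adventurous", "B": "social"},
]

def score_ipsative(responses, items=ITEMS):
    """Give one point to the scale keyed to each chosen option."""
    scores = {scale: 0 for item in items for scale in item.values()}
    for item, choice in zip(items, responses):
        scores[item[choice]] += 1
    return scores

scores = score_ipsative(["A", "B", "A"])
print(scores)                # {'intellectual': 2, 'adventurous': 1, 'social': 0}
print(sum(scores.values()))  # 3, always equal to the number of items
```

Because the total is fixed, a point gained on one scale is a point not available to the others.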

Construct Explication - (actually, the domain may or may not be a construct). A logical dissection and analysis of the domain of your test, identifying content areas to be covered by the test (see Table 2.12). This should generally precede question creation.

Individual vs. Group Administration

Individual Administration - there is one test taker and one examiner, items are presented one at a time (e.g., Stanford Binet). Most items are verbal free response or physical response (e.g., puzzle assembly).

Advantages - (1) Examiner can use the "basal-ceiling" approach so that time is not wasted on too easy or too hard items, (2) Test taker attitudes and reactions can be observed and addressed, (3) encouragement and guidance can be given

Disadvantages - (1) Costly and time consuming, (2) examiner behavior can influence subject performance, (3) there is an element of subjectivity in recording and grading responses.

Basal level - level at which the subject gets virtually all items correct.

Ceiling level - level at which the subject gets virtually all items wrong.
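The two definitions above can be sketched as a simple scan over recorded results. This is only an illustration: the data are invented, the function name find_basal_ceiling is hypothetical, and "virtually all" is simplified to "all" (real tests publish their own basal and ceiling rules).

```python
def find_basal_ceiling(level_results):
    """level_results: one list of True/False answers per level, easiest first.
    Returns (basal, ceiling) as level indexes, or None where not found.
    Simplifying assumption: basal = all correct, ceiling = all wrong."""
    basal = None
    ceiling = None
    for level, answers in enumerate(level_results):
        if all(answers):
            basal = level          # highest all-correct level seen so far
        elif basal is not None and not any(answers):
            ceiling = level        # first all-wrong level above the basal
            break
    return basal, ceiling

# Invented results for six difficulty levels, three items each
results = [
    [True, True, True],
    [True, True, True],
    [True, False, True],
    [False, True, False],
    [False, False, False],
    [False, False, False],
]

print(find_basal_ceiling(results))  # (1, 4)
```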

Group Administration - one examiner can test many people, usually paper and pencil alternate choice. The California Achievement Test (CAT) and Iowa Test of Basic Skills (ITBS) are used to assess achievement in students in grades K-12.

Advantages - (1) Large numbers of people can be assessed quickly and efficiently, (2) Very cost effective, (3) no risk of grading or scoring biases.

Disadvantages - (1) critics argue that only "rote" learning is assessed, more complex cognitive skills cannot be assessed this way, (2) no way of knowing if there are motivational or other subject problems.

Tailored Testing (2 meanings)

1. To save time, computerized testing can be programmed to simulate the "basal-ceiling" method of testing used in the Stanford Binet.

2. Adapting a test for individuals with special needs. For example, the KABC (Kaufman Assessment Battery for Children) has a set of "non-verbal" scales well suited for testing children with hearing or speaking difficulties.