QUALITIES OF A GOOD TEST: THREE VERSIONS

A good test should possess the following qualities.
• Objectivity
• Objective Basedness
• Comprehensiveness
• Validity
• Reliability
• Discriminating Power
• Practicability
• Comparability
• Utility
Objectivity
• A test is said to be objective if it is free from personal biases in interpreting its scope as well as in scoring the responses.
• Objectivity of a test can be increased by using more objective-type test items and by scoring the answers according to a provided model answer key.
Objective Basedness
• The test should be based on pre-determined objectives.
• The test setter should have a definite idea of the objective behind each item.
Comprehensiveness
• The test should cover the whole syllabus.
• Due importance should be given to all the relevant learning materials.
• The test should cover all the anticipated objectives.
Validity
• A test is said to be valid if it measures what it intends to measure.
There are different types of validity:
– Operational validity
– Predictive validity
– Content validity
– Construct validity
• Operational Validity
– A test has operational validity if the tasks required by the test are sufficient to evaluate the specific activities or qualities in question.
• Predictive Validity
– A test has predictive validity if scores on it predict future performance.
• Content Validity
– If the items in the test constitute a representative sample of the total course content to be tested, the test can be said to have content validity.
• Construct Validity
– Construct validity involves explaining the test scores psychologically. A test is interpreted in terms of numerous research findings.
Reliability
• Reliability of a test refers to the degree of consistency with which it measures what it is intended to measure.
• A test may be reliable without being valid: it may yield consistent scores, yet those scores may not represent what we actually want to measure.
• A test with high validity must also be reliable (its scores will be consistent in either case).
• A valid test is also a reliable test, but a reliable test may not be a valid one.
Different methods for determining Reliability
• Test-retest method
– The test is administered to the same group after a short interval. The two sets of scores are tabulated and the correlation between them is calculated. The higher the correlation, the greater the reliability. (A computational sketch of these correlation-based estimates follows this list.)
• Split-half method
– The scores on the odd-numbered and even-numbered items are taken, and the correlation between the two sets of scores is determined.
• Parallel form method
– Reliability is determined using two equivalent forms of the same test content.
– The prepared forms are administered to the same group one after the other.
– The test forms should be identical with respect to the number of items, content, difficulty level, etc.
– The correlation between the two sets of scores obtained by the group on the two tests is determined.
– The higher the correlation, the greater the reliability.
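As a rough illustration of these correlation-based estimates, the sketch below uses a plain Pearson correlation in Python; the scores, group sizes and variable names are hypothetical assumptions, not data from this text.

from math import sqrt
from statistics import mean

def pearson_r(x, y):
    # Pearson correlation between two equal-length lists of scores.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Test-retest (or parallel forms): correlate the scores from the two administrations.
first_scores  = [34, 41, 28, 45, 39, 30]   # hypothetical raw scores, first sitting
second_scores = [36, 40, 27, 47, 38, 31]   # same students, second sitting
print("test-retest / parallel-form estimate:", round(pearson_r(first_scores, second_scores), 2))

# Split-half: correlate odd-item totals with even-item totals from one administration
# (rows = students, columns = items scored 0 or 1).
item_scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
]
odd_totals  = [sum(row[0::2]) for row in item_scores]
even_totals = [sum(row[1::2]) for row in item_scores]
print("split-half correlation:", round(pearson_r(odd_totals, even_totals), 2))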
Discriminating Power
• The discriminating power of a test is its power to discriminate between the upper and lower scoring groups among those who took it.
• The test should contain questions at different difficulty levels. (A small computational sketch of a discrimination index follows.)
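One common way to quantify discriminating power for a single item is a discrimination index D computed from an upper and a lower scoring group. A minimal sketch follows; the 27% cut-off and the sample data are conventional assumptions, not taken from this text.

def discrimination_index(item_correct, total_scores, fraction=0.27):
    # D = (correct answers in upper group - correct answers in lower group) / group size
    # item_correct : 0/1 flags for one item, one entry per student
    # total_scores : total test scores, in the same student order
    n = len(total_scores)
    k = max(1, int(n * fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    return (sum(item_correct[i] for i in upper) -
            sum(item_correct[i] for i in lower)) / k

totals = [48, 45, 44, 40, 38, 35, 30, 28, 25, 20]   # hypothetical total scores
item   = [1,  1,  1,  1,  0,  1,  0,  0,  0,  0]    # who answered this item correctly
print("D =", discrimination_index(item, totals))    # a high positive D separates the groups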
Practicability
• Practicability of the test depends upon:
• Administrative ease
• Scoring ease
• Interpretative ease
• Economy
Comparability
• A test possesses comparability when the scores resulting from its use can be interpreted in terms of a common base that has a natural or accepted meaning.
• There are two methods for establishing comparability:
– Availability of equivalent (parallel) form of test
– Availability of adequate norms
Utility
• A test has utility if it provides test conditions that facilitate realization of the purpose for which it is meant.

Characteristics of A Good Test
1- Validity:
A test is considered as valid when it measures what it is supposed to measure.
2- Reliability :
A test is considered reliable if, when it is taken again by the same students under the same circumstances, the average score remains almost constant, provided that the interval between the test and the retest is of reasonable length.
3- Objectivity:
Objectivity means that if the test is marked by different people, the score will be the same. In other words, the marking process should not be affected by the marker's personality.
4- Comprehensiveness:
A good test should include items from the different areas of material assigned for the test, e.g. dialogue, composition, comprehension, grammar, vocabulary, orthography, dictation and handwriting.
5- Simplicity:
Simplicity means that the test should be written in clear, correct and simple language. It is important to keep the method of testing as simple as possible while still testing the skill you intend to test (avoid ambiguous questions and ambiguous instructions).
6- Scorability :
Scorability means that each item in the test has its own mark, related to the distribution of marks given by the Ministry of Education.

Tests are better if they are relatively objective. A test is objective if, using the same scoring key, whoever scores the test will arrive at the same score, assuming no clerical errors. Objective test items are usually multiple-choice, matching or true-false. In contrast, essay questions are typically subjective. This means that different people, or the same person in a different mood, will tend to score the same essay answers differently. However, with more exact standards of scoring, essay questions can be relatively objective. Scorer bias will be reduced and, essentially, the test will be objective: there will be consistency among scorers.

A good test should also be relatively reliable. As long as the quality being measured has not changed, this means that any person should get about the same score each time they take the test. However, to be reliable, the test must be relatively objective. How can you obtain consistency among the scores you earn from one time to the next if the scorers are inconsistent?

A third quality a good test should have is validity. To be valid, a test should measure what it claims to measure. Although it needs to be relatively reliable to be valid, merely because a test is reliable does not mean that it will be valid.

Suppose I were to give a man an intelligence test by measuring his height. I use a tape measure three different times, and each time, I get a measure of 5'5". His scores are completely consistent. Is my test valid? Probably not. I cannot really measure intelligence with a tape measure. Even though my test is perfectly reliable, it is not necessarily valid.

On the other hand, how can we measure what we claim to measure (validity), if the measurements are not consistent (reliability)? Thus relative reliability is needed for a test to be valid.

In contrast to absolute measures, tests only give a relative ranking in terms of group norms.

Finally, any good test must have standardization. This means that the same procedures and conditions are used each time the test is given. Such things as instructions, time limits, lighting and so on are the same for each administration. If this is the case, all those who take the test can be used as part of the standardization norms. With any measurement, you can only rate a person as high, low or average in relation to a set of norms. The problem is "Which norm?" If you want to judge yourself in terms of height, you wouldn't want to use basketball players as your norm group.
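To make the "Which norm?" point concrete, here is a small hypothetical sketch: the same raw value earns very different standard scores depending on the norm group it is compared against. The height data are invented for illustration.

from statistics import mean, stdev

def z_score(raw, norm_group):
    # Standard score of a raw value relative to a norm group's mean and standard deviation.
    return (raw - mean(norm_group)) / stdev(norm_group)

general_heights    = [160, 165, 170, 172, 175, 178, 180, 183]   # cm, hypothetical
basketball_heights = [193, 196, 198, 200, 203, 205, 208, 211]   # cm, hypothetical

my_height = 180
print("vs general norms   :", round(z_score(my_height, general_heights), 2))     # above average
print("vs basketball norms:", round(z_score(my_height, basketball_heights), 2))  # well below average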

The question, "Which norm?" causes a big problem with intelligence testing. The most frequently used intelligence tests take "middle-class WASPs" (White, Anglo-Saxon, Protestants) as their norm, assuming that everyone has similar background and learning experiences in our society. However, this does not accurately apply to many minority members, such as African-Americans, Hispanics or American Indians. This is especially true, if they are from different backgrounds, like the ghetto, barrio or reservation, respectively. The assumption of similar backgrounds does not apply in these cases. Because of this, when members of these groups are compared to general norms, they may be falsely labeled as slow learners or even mentally retarded. However, when compared with norms of others from a similar background, many of these people may earn scores that indicate high potential.

Thus, if you come from a background different from "middle-class WASP" and take a "standardized" test, before judging yourself from the results, find out what group is being used as a norm.

Norm Referenced Test & Criterion Referenced Test

Norm Referenced Test

  • A Norm Referenced Test is a test which compares an individual’s performance with that of other persons taking the same test.

Criterion Referenced Test

  • A Criterion Referenced Test evaluates an individual’s performance in a given situation with respect to specific characteristics expected in the performance.
A comparison of Criterion Referenced Tests & Norm Referenced Tests (a brief interpretive sketch follows the two lists)

Criterion Referenced Test

  • The main objective is to measure the effectiveness of a programme of instruction.
  • Provides specific information on an individual’s level of performance with respect to the objectives.
  • The score of an individual can be interpreted individually.
  • The purpose is not to classify and rank learners, but to ensure development.
  • The results are used to evaluate student performance relative to the specific performance levels anticipated.
  • The test constructor is not concerned with developing a test that maximizes the variability of test scores.
Norm Referenced Test
  • The main objective is to measure individual differences.
  • Aims to classify and grade learners into various categories.
  • The meaning of any particular score can be determined only by comparing it with the other scores achieved by students taking the test.
  • It is often used for selection purposes.
  • The test results are used for making comparative decisions regarding individuals.
  • It is specially constructed to maximize the variability of test scores, as the purpose is to discriminate between individuals by comparison.
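As a brief illustrative sketch (the mastery cut-off, class scores and function names below are hypothetical), the same raw score can be interpreted in the two ways compared above: against a fixed criterion, or as a rank within a norm group.

def criterion_referenced(score, mastery_cutoff=80):
    # Interpret the score against a fixed performance criterion.
    return "mastered" if score >= mastery_cutoff else "not yet mastered"

def norm_referenced_percentile(score, group_scores):
    # Interpret the score as a percentile rank within the norm group.
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)

group_scores = [55, 60, 62, 68, 70, 74, 78, 85, 90, 95]   # hypothetical class scores
my_score = 78
print("criterion-referenced :", criterion_referenced(my_score))
print("norm-referenced rank :", norm_referenced_percentile(my_score, group_scores), "percentile")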
