# Bond and Fox, Chapter 3

The pathway

A graphic that presents information on the following . . . .

Person ability

Item difficulty

Precision

Item and person inconsistency

As we’ll see, having all the information in the pathway gives us much more control over measurement than, for example, having only the scalogram (p. 38).

Characteristics of items and scales

Unidimensionality – very important in Rasch modeling

Reliability –

Person reliability index – the correlation of person orderings we would expect if this sample of persons were given another parallel set of items measuring the same construct.

Item reliability index – the expected correlation of item placements along the pathway if these same items were given to another sample of persons of the same size that behaved in the same way.
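As a rough illustration of how such an index can be computed, here is a minimal Python sketch of one common formulation of separation reliability: the share of observed variance in the measures that is not measurement error. The function name and the measures and standard errors below are hypothetical, and the program's own computation may differ in detail.

```python
from statistics import pvariance

def separation_reliability(measures, standard_errors):
    """Estimate reliability as (observed variance - error variance) / observed variance."""
    observed_var = pvariance(measures)
    # Mean square error: the average of the squared standard errors.
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    # "True" variance cannot be negative, so floor it at zero.
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var

# Hypothetical person measures (logits) and their standard errors.
person_measures = [-1.2, -0.4, 0.3, 0.9, 1.6, 2.4]
person_ses = [0.45, 0.40, 0.38, 0.40, 0.46, 0.58]

reliability = separation_reliability(person_measures, person_ses)
print(round(reliability, 3))
```

The same function applies to items: pass item measures and item standard errors instead.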

Basic quantities

Person ability – vertical positions of person symbols

Item difficulty – vertical positions of item symbols

Precision – sizes of person and item symbols – smaller means more precise

Fit – horizontal position of symbols
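Person ability and item difficulty share the same vertical (logit) scale because the dichotomous Rasch model makes the probability of a correct response depend only on their difference. A minimal Python sketch of that relationship (the function name is just for illustration):

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person located at the same height as an item has a 50% chance of success.
print(rasch_probability(0.0, 0.0))  # prints 0.5

# Persons above an item on the pathway are more likely than not to get it right.
print(rasch_probability(1.0, 0.0) > 0.5)  # prints True
```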

Ch 4 – Developing a test

The Bond Logical Operations Test (BLOT).

A test of childhood cognitive development suitable for administration to a whole class of students.

Items were taken one by one from Chapter 17 of Inhelder and Piaget (1958), The Growth of Logical Thinking.

The 35 items on the BLOT are identified as follows (item labels courtesy of Trevor Bond).

01 Negation (to negate identity)

02 Reciprocal (to negate identity)

03 Implication

04 Incompatibility

05 Multiplicative compensation

06 Correlations

07 Correlations

08 Correlations

09 Conjunction

10 Disjunction

11 Conjunctive negation

12 Affirmation of p

13 Reciprocal exclusion

14 Probability

15 Reciprocal implication

16 Reciprocal (to negate identity)

17 Identity (to negate reciprocal)

18 Negation (to negate correlative)

19 Reciprocal (to cause disequilibrium)

20 Negation (to cause disequilibrium)

21 Correlative + negation > equilibrium

22 Reciprocal + negation > disequilibrium

23 Correlative + identity > disequilibrium

24 Coordination of two systems of reference

25 Complete negation

26 Complete affirmation

27 Negation of p

28 Non-implication

29 Affirmation of q

30 Equivalence

31 Negation of q

32 Negation of reciprocal implication

33 Probability

34 Coordination of two systems of reference

35 Coordination of two systems of reference

The test was administered to a group of 158 children. The results of the administration are below.

Think for a moment about the complex process that begins with this 158x35 collection of numbers (5530 in all) and ends with the output, tables, graphs, and interpretations that will be created from them.

A regular analysis of the data in SPSS.

data list fixed /id 1-3 q1 to q35 5-39.

begin data.

* totalscore is the usual person estimate.

compute totalscore = sum(q1 to q35).

fre var=totalscore /format=notable /histogram.

This is a classic “easy” test, with scores “piled up” near the top of the range of possible scores.

reliability variables = q1 to q35 /summary=total.

Reliability Statistics

| Cronbach's Alpha | N of Items |
|---|---|
| .876 | 35 |

Item-Total Statistics

| Item | Scale Mean if Item Deleted | Scale Variance if Item Deleted | Corrected Item-Total Correlation | Cronbach's Alpha if Item Deleted |
|---|---|---|---|---|
| q1 | 25.47 | 38.277 | .380 | .873 |
| q2 | 25.47 | 38.345 | .355 | .873 |
| q3 | 25.68 | 37.226 | .436 | .871 |
| q4 | 25.56 | 37.765 | .398 | .872 |
| q5 | 25.45 | 38.517 | .349 | .873 |
| q6 | 25.37 | 39.496 | .207 | .875 |
| q7 | 25.48 | 38.117 | .400 | .872 |
| q8 | 25.70 | 36.775 | .509 | .870 |
| q9 | 25.59 | 37.935 | .348 | .873 |
| q10 | 25.53 | 37.539 | .467 | .871 |
| q11 | 25.59 | 37.734 | .386 | .873 |
| q12 | 25.39 | 38.294 | .558 | .871 |
| q13 | 25.73 | 37.958 | .298 | .875 |
| q14 | 25.47 | 38.949 | .213 | .876 |
| q15 | 25.73 | 37.032 | .457 | .871 |
| q16 | 25.52 | 38.520 | .273 | .875 |
| q17 | 25.62 | 36.868 | .530 | .869 |
| q18 | 25.55 | 37.349 | .487 | .870 |
| q19 | 25.63 | 37.496 | .406 | .872 |
| q20 | 25.46 | 38.116 | .429 | .872 |
| q21 | 25.97 | 38.711 | .176 | .878 |
| q22 | 25.44 | 38.423 | .385 | .873 |
| q23 | 25.61 | 37.782 | .363 | .873 |
| q24 | 25.59 | 37.129 | .498 | .870 |
| q25 | 25.64 | 37.843 | .341 | .874 |
| q26 | 25.69 | 36.847 | .501 | .870 |
| q27 | 25.45 | 38.008 | .468 | .871 |
| q28 | 25.85 | 37.661 | .338 | .874 |
| q29 | 25.50 | 37.876 | .430 | .872 |
| q30 | 25.74 | 38.113 | .269 | .876 |
| q31 | 25.59 | 37.976 | .340 | .874 |
| q32 | 25.75 | 36.898 | .474 | .871 |
| q33 | 25.49 | 38.534 | .291 | .874 |
| q34 | 25.51 | 38.037 | .387 | .873 |
| q35 | 25.52 | 37.674 | .452 | .871 |

This is about all we typically get from SPSS when we analyze a test.
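For anyone curious about what lies behind the two main quantities in that output, here is a minimal Python sketch of Cronbach's alpha and the corrected item-total correlation (the item correlated with the total of the other items). The function names and the tiny 6-person by 4-item dataset are made up for illustration; they are not the BLOT data.

```python
from math import sqrt
from statistics import mean, variance

def cronbach_alpha(responses):
    """responses: one list of item scores per person."""
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_total(responses, j):
    """Correlate item j with the total score computed without item j."""
    item = [row[j] for row in responses]
    rest = [sum(row) - row[j] for row in responses]
    return pearson(item, rest)

# Hypothetical right/wrong data: rows are persons, columns are items.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(toy)
r_first_item = corrected_item_total(toy, 0)
print(round(alpha, 3), round(r_first_item, 3))
```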

The Rasch analysis

The Rasch control file

Mike – demo the analysis here.

The Rasch Item Map

Item characteristics . . . Item STATISTICS ordered by Measure

The key quantities in the table are

1. Measure – the difficulty of the item.

2. S.E. – the standard error of the estimate of the item’s difficulty. Note that the SEs of the more difficult items are smaller than those of the easy items. This is because there were too few respondents of low ability, so nearly everyone got the easy items correct and the proportions of the sample getting them correct were very close to 1. Difficulty estimates based on proportions close to 1 are less stable than those based on proportions closer to .5, and larger numbers of persons are required to stabilize them, so the standard errors of the easy items are larger.

3. Infit – the mean of squared residuals with extra weighting given to persons whose abilities were close to the item difficulties. According to the text, Rasch analysts give infit more weight than outfit.

4. Outfit – the mean of squared residuals with all residuals weighted equally.

5. Note that Item 4 was anchored at difficulty = 0.
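The quantities in this list can be sketched for a single item in a few lines of Python. This is an illustration under simplifying assumptions, not the program's estimation routine: person abilities are treated as known, the model standard error is taken as 1 over the square root of the summed response variances, outfit is the unweighted mean of squared standardized residuals, and infit is the information-weighted version. The abilities, responses, and difficulties are hypothetical.

```python
import math

def item_statistics(responses, abilities, difficulty):
    """Return (standard error, infit mean square, outfit mean square) for one item."""
    probs = [1.0 / (1.0 + math.exp(-(b - difficulty))) for b in abilities]
    variances = [p * (1.0 - p) for p in probs]  # largest when p is near .5
    sq_residuals = [(x - p) ** 2 for x, p in zip(responses, probs)]
    se = 1.0 / math.sqrt(sum(variances))
    # Outfit: every person's squared standardized residual counts equally.
    outfit = sum(r / v for r, v in zip(sq_residuals, variances)) / len(responses)
    # Infit: residuals weighted by variance, so well-targeted persons count more.
    infit = sum(sq_residuals) / sum(variances)
    return se, infit, outfit

# Hypothetical sample of fairly able persons (logits).
abilities = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

# An easy item that everyone answers correctly versus a harder, well-targeted item.
se_easy, infit_easy, outfit_easy = item_statistics([1, 1, 1, 1, 1, 1, 1], abilities, -2.0)
se_hard, infit_hard, outfit_hard = item_statistics([0, 0, 0, 1, 1, 1, 1], abilities, 1.5)

print(se_easy > se_hard)  # prints True: the easy item is measured less precisely
```

With this able sample, the success probabilities on the easy item are all close to 1, so the variances are small and the standard error is large, which matches the pattern described in point 2.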

Item Characteristics . . . Items FIT GRAPH ordered by Measure

Output Tables -> 13. Item: measure -> Scroll down to Table 13.2.

Item Characteristics – the bubble plot . . .

This plot shows the same information as above, but it’s prettier.

Plots -> Bubble Chart -> Items (columns in data) -> Entry number

For this plot, I chose Items only and used Entry Numbers to identify items.

A comparison of 3 ways of scoring the BLOT – summated score, log odds, and Rasch. (Start here on 3/27/13.)

To make the comparison, I had to put the Rasch Person measures into an SPSS file.

Output Tables -> 17. Person: entry.

I then copied the 150 lines of the table with the person entries and pasted them into Word. I then Alt+Selected the measure column and pasted it into a column in SPSS.

In SPSS I created a “rough” Rasch score for each person, logoddsscore, using the following syntax. (Recall it’s ln(score/(total possible - score)).)

compute logoddsscore = ln(totalscore/(35-totalscore)).

Correlations of totalscore, logoddsscore, and raschscore

The correlations of the three measures are quite high. In fact, the correlation of the raschscore and logoddsscore is 1.000 to three decimal places. Its actual value to 6 decimal places is .999664.

Of course, this tells us that for some datasets we don’t need the program to compute person measures. For those datasets we can simply compute the ln(score/(total possible-score)) and use it.
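The same shortcut outside SPSS might look like the following Python sketch. The function name is hypothetical, and returning None for zero and perfect scores is just one way to flag the cases where the log odds are undefined.

```python
import math

def log_odds_score(total, max_score):
    """Return ln(total / (max_score - total)), or None for zero or perfect scores."""
    if total <= 0 or total >= max_score:
        return None  # ln(0) and division by zero are undefined
    return math.log(total / (max_score - total))

# A few possible totals on a 35-item test.
for score in (0, 20, 34, 35):
    print(score, log_odds_score(score, 35))
```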

The disadvantage of this is that we don’t get the other output that the BF program gives us, such as item measures, standard errors, and fit statistics. Another advantage of using the program is that it will give us an estimate of the Rasch value for persons whose total score is perfect or 0, something the log method cannot do because ln(score/(total possible - score)) is undefined for those scores.

Dot plots of the three measures . . .

Note that for the summated score, totalscore, the best performers were just a little better than the crowd but were considerably better than the crowd for both logoddsscore and raschscore. This is in keeping with the notion that Rasch measurement lengthens the tails of distributions, spreading out the best and worst performers. This spreading was most noticeable among the best performers for these data.
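A quick way to see the stretching is to compare what one extra raw-score point is worth in log odds near the middle of a 35-item test and near the top. A small Python sketch using the log odds approximation rather than the program's Rasch estimates:

```python
import math

def logit(score, max_score=35):
    """Log odds of a raw score: ln(score / (max_score - score))."""
    return math.log(score / (max_score - score))

# One raw-score point near the middle of the scale versus near the top.
middle_gap = logit(18) - logit(17)
top_gap = logit(34) - logit(33)

print(round(middle_gap, 2), round(top_gap, 2))  # prints 0.11 0.72
```

One point at the top of the range is worth several times as many logits as one point in the middle, which is why the best performers separate from the crowd on the logoddsscore and raschscore plots.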

Scatterplots of the three measures. The plots involving logoddsscore do not include the 3 persons whose totalscore was 35.

The Wonderlic Personnel Test Form II given to UTC students.

The Wonderlic Personnel Test (WPT) Form II considered here is a 50-item paper and pencil test designed to measure overall cognitive ability (g). It is a timed test. For the paper and pencil version respondents are given 12 minutes to complete as many items as possible.

From the Wonderlic manual,

“The WPT and SLE (Scholastic Level Exam) are short form tests of general cognitive ability. Often referred to as general intelligence, or “g”, cognitive ability is a term that is used to describe the level at which an individual learns, understands instructions and solves problems.” – p. 5

“The score is the total number of correct answers.” – p. 9

If you’re interested, median scores for populations given in the manual printed in 2002 are

For our mostly Frosh research sample . . ., Mean = 22.02 and SD = 5.401.

The dataset is balancedscale_120428.sav.

1. Are all of the WPT items appropriate?

a. Does each item correlate with the total score?

b. Is the pattern of correctness / incorrectness appropriate across persons?

2. Are the persons and items matched for our data? (A question we’d never have asked before Rasch.)

a. Are there sufficient numbers of difficult items, so differences between high scorers can be measured?

b. Are there sufficient numbers of easy items, so differences between low scorers can be measured?

c. Are there sufficient numbers of items at all difficulty levels?

3. Are the person ability estimates (WPT total scores) that we’re using appropriate, or should we use Rasch scores?

The SPSS Analysis

The Rasch Analyses – Item Map

Item Information . . . Item STATISTICS ordered by Measure

The very difficult items were not even responded to by most of the participants, so their SEs are huge. Luckily, since so few participants even got to them, that probably won’t be a problem for our sample.

Item Information . . . Bubble chart

The span of difficulty values is much greater for these items (-5 to +7) than for the BLOT items in the previous example, for which it was -3 to +3.

I would say that if I were to “look” at this test for ways to improve it, I would begin with items 3, 31, 36, 10, and 11, for all of which there was inconsistency in the proportions of low and high ability persons who got them correct.

This analysis skirts the issue of whether an item should be included in the analysis if it was not responded to. Because the test is timed, most people do not respond to all items. In fact, not many people got to items 31 and 36, so most of the “incorrect” responses to those items were counted as such because persons didn’t get to them. This is something that the BF program does not know about; all it knows is that these items were counted as being “incorrect”.
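One possible way to address this, if the raw answer sheets are available, is to recode items a person never reached as missing before the analysis. The Python sketch below is only an illustration: the function name is hypothetical, blanks are assumed to be stored as None, and scoring skipped items in the middle of the test as incorrect is an assumption, not something the notes or the Wonderlic manual specify.

```python
def mark_not_reached(raw_responses):
    """Keep blanks after the last answered item as missing; score earlier blanks as 0.

    raw_responses: list of 1 (correct), 0 (incorrect), or None (blank), in item order.
    """
    last_answered = -1
    for position, response in enumerate(raw_responses):
        if response is not None:
            last_answered = position
    recoded = []
    for position, response in enumerate(raw_responses):
        if position > last_answered:
            recoded.append(None)  # not reached: treat as missing
        elif response is None:
            recoded.append(0)     # skipped mid-test: assumed incorrect
        else:
            recoded.append(response)
    return recoded

# A hypothetical person who skipped item 2 and ran out of time after item 4.
print(mark_not_reached([1, None, 0, 1, None, None]))
# prints [1, 0, 0, 1, None, None]
```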

Comparison of the 3 ways of scoring – Summated vs. log odds vs Rasch.

(These graphs were created after I copied and pasted the 206 Rasch measures into the SPSS file.)

Log odds vs Total Score (compute wptlogoddsscore = ln(wpttotscore/(50-wpttotscore)).)

Rasch vs. Total Score

Rasch vs. Log odds

