L2 – SPSS Stuff – Data Preparation and Manipulation for Multilevel Analyses

1/18/2012

Data from the text are at: .

Data Levels –

Level 1 – Variables that are characteristics of the Level 1 entities – in this case, individuals.

In a Level 1 data matrix, each line represents the Level 1 entity – an individual.

The values outlined in red are Level 1 values – they potentially vary from person to person.

The values outlined in blue are actually Level 2 values. I’m not sure how they got in the Level 1 file. They’re not in Figure 2.1 in the text, p. 22.

The values outlined in green are Level 2 values, but they’re also the Level 2 ID values.

Level 2 data only

In a Level 2 only data file, each line represents a group, and the values in the line are characteristics of the group. In this example, there are three variables – schcode, the group ID; ses_mean, the mean SES of all individuals in the group; and per4yrc, the proportion of individuals in the school who went on to a 4-year college.

All the variables are Level 2.

Combined file – required for SPSS analyses.

Some programs (e.g., HLM) can perform multilevel analyses with the Level 2 data separate from the Level 1 data. SPSS requires that all the data be in the Level 1 file. For such a file, each Level 2 characteristic is replicated across lines of the data editor for all individuals within the same group.

How to combine 2 files – one Level 1 and the other Level 2 – into a single multilevel file.

See the Merge files examples later on.

The SPSS RECODE command

RECODE changes the values of a variable, allowing you to specify each value to be changed explicitly, as opposed to specifying the changes through a mathematical formula, for example.

RECODING in place – the changed values replace the original values. The original values are lost.

RECODING into a new variable – the changed values are put into the cells of a different variable. The original values are retained in the old column and the changed values in the new one.

I recommend recoding into new variables. It’s less dangerous and less confusing than recoding in place.

RECODING using pull down menus: Transform -> Recode into different variables . . .

(File is mdbT\P595C(Multilevel)\Multilevel and Longitudinal Modeling with IBM SPSS\ch2multivarML1.sav)

Recoding ses into categories separated by the 25th, 50th, 75th percentiles

If a value is <= 25th percentile point, it’ll RECODE to 1.

If a value is > 25th percentile point but <= 50th percentile point, it’ll RECODE to 2.

If a value is > 50th percentile point but <= 75th percentile point, it’ll RECODE to 3.

If a value is > 75th percentile point, it’ll RECODE to 4.

First, FREQUENCIES to get the percentile points . . .
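In syntax, the percentile points can be obtained with the FREQUENCIES command; the /FORMAT=NOTABLE subcommand suppresses the full frequency table so only the statistics are printed:

```spss
frequencies variables=ses
  /format=notable
  /percentiles=25 50 75.
```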

The text’s values are -.5180, .0250, and .6130. I’m not sure why theirs are different from those above.

The RECODE dialog box: Transform -> Recode into Different Variables . . . START HERE ON 1/23/13.

Click on the “Old and New Values” button.

The text’s recode here was -.5181. It should have been -.5179, more positive by .0001 than -.5180.

I forgot to deal with the 4 category.

The result . . .

Doing the same RECODE in syntax

recode ses (lowest thru -.52=1) (-.5199 thru .0300=2) (.0301 thru .6100=3) (.6101 thru highest=4) into ses_cat.

Transform -> Compute (p. 29)

The Compute command allows you to modify an existing variable or create a new variable using a mathematical function.

1. Computing the mean of 2 or more variables.

(file is ch2multivarML1.sav)

Transform -> Compute Variable…

When the OK button is clicked, a new column, called testmean, containing the appropriate values is created.

This could be done in syntax using the following command

compute testmean = mean(test1,test2,test3).

An enormous collection of mathematical and statistical functions is available.

Note the “Function group:” box on the right.

This leads to a wealth of possible functions.


Take some time to look over the possibilities.

When you highlight a specific function, a short description of how it works appears in the empty field below the keypad.

Suppose you had conducted a test involving chi-square by hand and needed the p-value for a chi-square of 4.32 with 1 df.

If you didn’t have a table of critical chi-square values handy, you could use SPSS to get the p-value.
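For example, COMPUTE with the SIG.CHISQ function, which returns the upper-tail probability of a chi-square value for a given df, would do it. The variable name chisq_p below is just an illustration; every case in the active file gets the same value, approximately .038:

```spss
compute chisq_p = sig.chisq(4.32, 1).
execute.
```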

Merging Files – p. 30

Generally, there are two ways that files can be merged . . .

1. Both files have the same variables, and you simply want to add the cases of one of the files to those of the other. For example, you gathered more data in the Spring. You want to add them to the Fall data set.

This is quite easy and won’t be covered here.

2. Both files have cases, and you want to add the variables of the 2nd file to the cases of the first file.

A. Both files have exactly the same cases.

B. Both files have about the same cases, although each might have cases the other does not.

C. One file has Level 1 data while the other file has Level 2 data.

In this instance, each case in the Level 2 file will have to be propagated to several Level 1 cases.

We’ll cover 2A, B, and C.

2A – Adding variables when both files have exactly the same cases.

Here are the files

To combine them,

Important: One of the files has to be open when this is done.


Dialog box allowing you to specify the file from which variables are to be gotten.

Since both files have EXACTLY the same number of cases and no shared variables, nothing needs to be specified in this dialog box. You may simply click on Continue.

The result . . .

Note that the name of the merged file is the same as the original file.

I usually save the merged file under a new name.
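In syntax, with one of the files open as the active dataset, this merge is a single MATCH FILES command. The file name file2.sav here is a stand-in for whichever saved file holds the variables to be added; FILE=* refers to the active dataset:

```spss
match files /file=* /file='file2.sav'.
execute.
```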

2B. The files have about the same number of cases, although each may have cases the other does not.

The wrong way to merge . . .

The result of the wrong way . . .

The problem is that if the two files are not arranged exactly alike, with exactly the same number of cases, the order of cases for the original variables (ID, V1, and V2) will be different from the order of cases for the added variables (V3, and V4).

So, when the two files have different numbers of cases, a key variable must be used to match up the cases.

Step 0: Put a unique ID in each file such that equal values of this variable in the two files mean the same person.

Step 1: Sort both files in ascending order on a key variable common to both, with unique values for each case.

If there is no key variable in both files, you’ll have to put one there as in Step 0 above.

Both files have been sorted in ascending order on ID, the unique key variable.

Step 2. Data -> Merge Files -> Add Variables.

Specify that cases in the two files are to be matched on the key variable.

The resulting merge specifications . . .

Step 3. The resulting file . . . Save it under a new name.
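The whole 2B sequence can also be run in syntax. This sketch assumes the second file, here called file2.sav as a stand-in name, has already been sorted on ID and saved:

```spss
sort cases by id (a).
match files /file=* /file='file2.sav' /by id.
execute.
```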

The original files, so you can better appreciate the result

2C. Adding Level 2 information to a Level 1 file.

The two files. Note that each value of Level2VarA must be propagated to two cases in the Level 1 file. This will require a special specification in the Merge Variables dialog box.

Goal: the merged file, in which each value of Level2VarA appears on the rows of both Level 1 cases in its group.

Step 1. Sort both files in ascending order on the key variable, in this case, Group.

(They’re already sorted as they appear above.)

Step 2. From the Level 1 file: Merge variables. Note the special specification that the Level 2 file is designated a “keyed table”. This instructs SPSS to propagate the values of the Level 2 variables to all cases with the same value of the key variable, Group.

The Non-active dataset is the “other” dataset, not the one you started from.

Step 3. The result. Save as a separate file.
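In syntax, the keyed-table merge looks like this, where level2.sav stands for the saved, sorted Level 2 file. The /TABLE subcommand is what propagates each group’s values to all of its Level 1 cases:

```spss
sort cases by group (a).
match files /file=* /table='level2.sav' /by group.
execute.
```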

The example in the text, p. 31 forward

The two files to be merged. Level 1 is on the left. Note that each value of apexams must be propagated to multiple cases in the Level 1 file.

Step 1. Sort both files on the group variable, in this case, nschcode. That’s already been done – in ascending order.

Step 2. From the Level 1 file: Data -> Merge Files -> Add Variables . . .

Step 3. The resulting file . . . It should be saved under a new name.


The AGGREGATE procedure

Recall the two types of Level 2 variable . . .

a. A characteristic of the organization that is divorced from the characteristics of Level 1 elements.

b. A characteristic of the organization that is built up from Level 1 characteristics.

The AGGREGATE command is useful for b. The AGGREGATE procedure computes summary statistics of groups of cases and puts them into cells of a variable, where they can be treated as Level 2 variables.

Simple example

Here’s a Level 1 file

Suppose we want to use the mean of the Level1Var values within each group as a Level 2 variable.

Data -> Aggregate . . .

The result
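The same aggregation can be done in syntax. MODE=ADDVARIABLES writes the group mean back into the active Level 1 file as a new column rather than creating a separate aggregated file; variable names follow the example above, and meanL1 is a made-up name for the new variable:

```spss
aggregate
  /outfile=* mode=addvariables
  /break=group
  /meanL1 = mean(Level1Var).
```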

The text’s example . . . p. 36 - AGGREGATEing percentage of females and median SES

The AGGREGATE Dialog box

The result . . .
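A sketch of the equivalent syntax, assuming the file’s female variable is coded 1 = female; the aggregate function PIN gives the percentage of cases whose value falls in the stated range, and the new variable names are illustrative:

```spss
aggregate
  /outfile=* mode=addvariables
  /break=nschcode
  /femalepct = pin(female, 1, 1)
  /ses_median = median(ses).
```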

Finally, looking ahead to longitudinal analyses

Note that longitudinal analyses are repeated measures designs, also known as within-subjects analyses.

In all previous work, repeated measures on a person have been represented on the same row in different columns.

For example, for the Ch 2 data, three tests were given each student. These tests are represented in the original file in the traditional repeated measures fashion . . .

The MIXED procedure that we’ll be using for our analyses is not set up to recognize this form of repeated measures data.

It only recognizes data in a “between subjects” arrangement, with values to be compared in different rows of the data editor – never with different values in different columns within the same row.

So, if we want to compare test1 with test2 with test3 in the above, we have to put test1 in one row of the data editor, test2 in a different row, and test3 in yet a different row from either of the other two.

Moreover, if there are variables whose values go with each of the three tests, such as id in the above, we have to keep the values of those variables with test1, with test2, and with test3.

This can be done using the Restructure command in SPSS.
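The restructuring can also be done directly with the VARSTOCASES command, which is what the wizard generates behind the scenes. A sketch for the three tests, with testnum as an arbitrary name for the index variable and /KEEP listing the fixed variables carried to every row:

```spss
varstocases
  /make test from test1 test2 test3
  /index = testnum(3)
  /keep = id.
```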

Simple Example

What we want: one row per test per person, with id and the other per-person variables repeated on each of a person’s three rows.
Data -> Restructure . . .

The result

The original

Centering, p 51

Some of the analyses will yield more interpretable results if performed on centered data.

There are two centering practices that are used . . .

1. Grand Mean Centering: Centering variables about the grand mean of all values in the combined file.

2. Group Mean Centering: Centering variables about the mean of the group within which the value resides.

Grand mean Centering

Example: Grand mean centering of test1 in the file, ch2multivarML1.sav

Step 1. Get the grand mean of the variable to be centered.

Analyze -> Compare Means -> Means

Report: test1

Mean: 47.6439    N: 8335    Std. Deviation: 6.32466
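With the grand mean in hand, the centering itself is a single COMPUTE; test1_gmc is a made-up name for the centered variable:

```spss
compute test1_gmc = test1 - 47.6439.
execute.
```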

Comparing the Grand mean centered values with original values . . .

They are perfectly linearly related.

Although they’re perfectly related, some of the results of analyses will have different interpretations, depending on whether original scores or grand mean centered scores are analyzed.

These differences invariably involve intercepts.

Group Mean Centering.

Step 1. Invoke the AGGREGATE procedure, Data -> Aggregate.

1st Dialog box

Giving the summary variable a new name and label

The resulting Fancied-up Dialog Box

Now transform test1.
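The whole group-mean-centering sequence in syntax, assuming schcode is the group ID, might look like the following; grpmtest1 holds each school’s mean and test1_grp is a made-up name for the centered variable:

```spss
aggregate
  /outfile=* mode=addvariables
  /break=schcode
  /grpmtest1 = mean(test1).
compute test1_grp = test1 - grpmtest1.
execute.
```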

The result

Plotting the relationship of grpmtest1 to test1

Note that the group mean centered values are NOT a simple linear transformation of the original values.

This means that the results of analyses with group mean centered scores may differ nontrivially from those of original scores or those of grand mean centered values.