PRR 475: SPSS for WINDOWS - BASICS, LAB Nov 17-24

PRR475 Stynes Exercises Page

PRR 475: SPSS FOR WINDOWS version 10.0 - LAB Nov 1-15.

Contents: SPSS procedures; Practice Exercise; Assigned Exercise; Sample analysis; HCMA study description; Codebook.

SPSS stands for Statistical Package for the Social Sciences. Other popular statistical software includes SAS, SYSTAT and MINITAB. SPSS is well suited to the analysis of social science and survey data. Like all statistical packages, SPSS works with a table of data with cases as rows and variables as columns (just like an Excel table; in fact you can import Excel tables directly into SPSS and vice versa). For survey data, each case is a respondent or questionnaire, and each variable is usually a numeric coding of the response to a single question on the survey instrument. Statistical packages prefer to analyze data in numeric form, so one codes a variable like GENDER as something like 1=male, 2=female (1=male, 0=female is better). We will be analyzing data from the 1996 Huron Clinton Metropark visitor survey. The original HCMA survey dataset includes 4,031 cases and 136 variables. A few of the messier variables have been dropped for this exercise, and some computed variables have been added.
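One common reason for preferring 0/1 coding of a two-category variable is that its mean is directly interpretable as a proportion. The short Python sketch below illustrates this; it is not SPSS syntax, and the five responses are invented for illustration rather than taken from the HCMA file.

```python
from statistics import mean

# Hypothetical responses for five cases (invented, not HCMA data).
gender_12 = [1, 2, 2, 1, 1]                          # coded 1=male, 2=female
gender_01 = [1 if g == 1 else 0 for g in gender_12]  # coded 1=male, 0=female

share_male = mean(gender_01)   # mean of a 0/1 variable = proportion coded 1
mean_12 = mean(gender_12)      # mean of 1/2 codes has no direct interpretation

print(share_male, mean_12)
```

With 0/1 coding, a procedure that reports means also reports the share of cases in the category coded 1.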

You will need a copy of the HCMA96.SAV file to complete this exercise. In the micro-labs you can open it directly from SPSS; it is in the labs99 subdirectory of the course AFS space.

1. Loading SPSS-PC. Run SPSS by selecting the SPSS program from the START menu (under math/stat applications).

When SPSS opens you will see options to run the tutorial, enter data, or open an existing file (the default). Close this dialog box and open the file from the SPSS menus instead: choose File, Open, Data, then browse to the HCMA96.SAV file in the course labs99 subdirectory on the U drive.

When the file is loaded, you will see the data in the data window in spreadsheet format. Variable names are at the top of the columns. Cases run down the rows. Each case/row represents one respondent/completed questionnaire. See the HCMA codebook and questionnaire to match variables with items on the questionnaire. To see value labels rather than numeric codes, choose View on the menus and check Value Labels (uncheck to toggle back to numbers). To see information about any variable, choose the "Variable view" tab at the bottom (SPSS 10.0). On the menus, Utilities, Variables shows you information for all variables. You are now ready to run statistical analyses.

2. To run Statistical Procedures, choose the ANALYZE option on the menu and then the statistical procedure you wish to run. We will work mostly with the Descriptive Statistics and Compare Means procedures.

Descriptive Statistics

FREQUENCIES: frequencies for nominal and ordinal variables

DESCRIPTIVES: means etc. for interval/ratio scale variables

EXPLORE: exploratory data analysis procedures to see distributions

CROSSTABS: tables for nominal or ordinal (few categories) variables, Chi square test

COMPARE MEANS (interval dependent variable, nominal or limited category independent variable)

Means: compare subgroup means; under Options, choose ANOVA for a statistical test

One Sample T-Test: test H0: mean of variable = some constant

Indep. Samples T-Test: two groups, test H0: mean for group 1 = mean for group 2

Paired Samples T-Test: paired variables, applies in pre-test, post-test situations

One Way ANOVA: compare means for more than two groups

3. General Steps for Running Procedures.

a. First choose a procedure from the Analyze menu. Note that the appropriate procedure depends on the measurement levels of your variables and the nature of the intended analysis. See section 5 below for details.

b. Choose variables: Select from the list of variables at left and click the arrow to move them into the Variable Box at right. You can move variables one at a time, by selecting each and clicking the arrow or by double clicking on the variable name, or several at a time, by holding the CTRL key down while clicking to select several variables and then moving them to the Variable Box as a group. To unselect a variable, click on it in the Variable Box at right; the arrow switches direction, and clicking it moves the variable back.

c. Use the buttons at the bottom for special Options, Statistics, etc. Complete each dialog box and click CONTINUE.

d. Click OK to run the procedure

e. Results appear in the OUTPUT Window. SPSS automatically switches to output window when you run a procedure. Scroll around in this window to view results. To return to Data window click HCMA96 button on application bar at bottom or choose HCMA96 from Window menu item.

4. SPSS Windows and files. SPSS throws up lots of WINDOWS, often not maximized. Use the MAXIMIZE buttons at the top right of windows to expand the display to full screen. Use the WINDOW command on the menu bar to choose between the Output and Data windows, or choose them from the Application bar at the bottom. The three primary windows are:

The Data Window - a spreadsheet showing raw data, variables across columns, cases down rows. Run most procedures from here. SPSS data files have an *.SAV extension. SPSS 10.0 has added a "variable view" page to the data window accessed via Excel-type tabs at bottom. The Variable view page has definitions of variables and coding information.

Output window - when you run a procedure, results are shown in the Output window. This is like a word processor with an outline at left to select particular results. You may print results from here or copy and paste them to WORD or EXCEL. SPSS Output files have an *.SPO extension.

Syntax window - optional. If you use the Paste option, you can paste procedures to the syntax window, where you can easily rerun them or edit them. SPSS syntax files have an *.SPS extension.

SPSS data and output files are specially coded files that can only be read in SPSS. There are utilities to save data files as Excel or Access files, or to import data from those formats to SPSS. The syntax files are simple text files that can be read by a word processor.

5. Guidance on individual procedures - basic statistics

a. FREQUENCIES - run this on variables at nominal or ordinal scale with a small number of categories. Gives frequency distribution for the variable and optional statistics.

b. DESCRIPTIVES - run for interval scale variables to get the mean, standard deviation, etc. Choose S.E. Mean in the Statistics dialog box if you want to compute confidence intervals.

c. CROSSTABS - for nominal/ordinal variables, choose a row and column variable (variable with fewer categories for columns). In Statistics, select Chi square for a hypothesis test, in Cells choose Row Pct and Column Pct.

d. COMPARE MEANS - dependent variable must be interval scale (or dichotomous), independent variable forms subgroups (should take on limited set of values - usually nominal or ordinal).

e. CORRELATE - for two or more interval scale variables use Pearson; use Spearman/Kendall for ordinal measures.
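The confidence interval mentioned under DESCRIPTIVES (item b) can be worked out by hand from the mean and S.E. Mean that SPSS reports. The Python sketch below shows the arithmetic; it is an illustration, not SPSS syntax, the data values are invented, and the 1.96 multiplier assumes a 95% interval with a sample large enough for the normal approximation (for small samples a t value would be used instead).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical interval-scale values (invented, not HCMA data).
values = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0]

n = len(values)
m = mean(values)
se = stdev(values) / sqrt(n)   # standard error of the mean (S.E. Mean)

z = 1.96                       # assumed multiplier: 95% CI, normal approximation
ci_low = m - z * se
ci_high = m + z * se

print(f"mean={m:.2f}  SE={se:.3f}  95% CI=({ci_low:.2f}, {ci_high:.2f})")
```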

6. Variable Transformations - Sometimes you want to change coding of a variable or compute a new variable. RECODING AND COMPUTING procedures are in the TRANSFORM menu. Use RECODE to change coding of a variable (maybe to collapse into fewer groups or reassign missing codes) and COMPUTE to compute new variables (e.g. simple sum of other variables).

a. RECODE changes the coding of a variable. First choose whether you want to put the new codes in the same variable or in a different (new) one. To preserve the original coding on the file, choose recode "into new variable", which keeps the old codes and sets up a new variable with the new codes; you must then add a name for the new variable and press the CHANGE button. In either case, specify the coding changes as follows. Select the variable you want to change codes for and choose the "old and new values" option. Then complete the dialog box to indicate how the codes should be changed. Press the ADD button to add each coding change to the recode box. Repeat for as many codes as you wish to change. Then press OK to execute the changes.

For example, to group "within the past 5 years" (3) and "more than 5 years ago" (4) together on the FIRST variable, recode 4 to 3: select recode into same variable, choose the FIRST variable, click the OLD AND NEW VALUES button, and enter a 4 in the box for old value and a 3 for new value at right. Then click the ADD button and a line 4 --> 3 will appear in the box. Click CONTINUE, then click OK to perform the recoding. If you look in the DATA window under the FIRST column, all the 4’s should now be 3’s. When you run FREQUENCIES on FIRST, the 3’s and 4’s will be grouped and show up as 3’s. Be careful: value labels are not automatically corrected, so the label for code 3 will still read as before.

b. COMPUTE: To compute new variables from old ones, choose Transform, Compute. Enter a name for the new variable in the Target Variable box (8 characters or less). Then enter a mathematical expression in the larger box after the = sign indicating how the new variable is computed. Press OK to execute the procedure. Your new variable is added as a column at the end of the file in the DATA window. You may now use this variable in any procedure (refer to it by the name you assigned).

For example, to compute a variable equal to the length of time each party stayed in the park, enter HOURS as the name in the Target Variable box. In the numeric expression box enter LEAVE - ARRIVE. Press OK. Be careful to spell variable names correctly. You can paste variables into the box by double clicking on them in the list of variables at left and then typing (or pasting from the calculator pad) the math operators in between. You can edit inside the box to correct mistakes. SPSS will add the new variable to the file; check it at the far right in the data window. You can now use the new HOURS variable like any other in a statistical procedure. It won't be kept when you exit SPSS unless you save the file (there is probably no need to save the file, but if you do, you'll have to put it in your own AFS space). Beware of missing values when computing new variables: the result will be missing if any variable in the formula is missing.
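The missing-value behavior described above can be sketched in a few lines of Python. This is an illustration of the rule, not SPSS syntax: the three cases are invented, `None` stands in for a missing value, the helper `compute_hours` is hypothetical, and ARRIVE and LEAVE are assumed to be times expressed in hours.

```python
from typing import Optional

def compute_hours(arrive: Optional[float], leave: Optional[float]) -> Optional[float]:
    """Return leave - arrive, or None (missing) if either input is missing."""
    if arrive is None or leave is None:
        return None
    return leave - arrive

# Hypothetical cases (invented); None marks a missing response.
cases = [
    {"ARRIVE": 10.0, "LEAVE": 14.5},
    {"ARRIVE": None, "LEAVE": 16.0},
    {"ARRIVE": 9.0, "LEAVE": None},
]
for case in cases:
    case["HOURS"] = compute_hours(case["ARRIVE"], case["LEAVE"])

print([case["HOURS"] for case in cases])   # [4.5, None, None]
```

Only the first case has both times, so only that case gets a value for HOURS.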

Good practice when recoding or transforming is to always check the result before proceeding with further analysis. Check via frequencies on new and old variables or by manually checking a few cases in data window.
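As an illustration of that check, the Python sketch below applies the 4 --> 3 recode from the FIRST example to a handful of invented codes and compares frequency counts before and after. It is not SPSS syntax, and the data values are hypothetical.

```python
from collections import Counter

# Hypothetical codes for FIRST (invented, not HCMA data).
first_old = [1, 2, 3, 4, 4, 2, 3, 4]

recode_map = {4: 3}                                     # old value -> new value
first_new = [recode_map.get(v, v) for v in first_old]   # unlisted codes unchanged

freq_old = Counter(first_old)
freq_new = Counter(first_new)

print(sorted(freq_old.items()))   # [(1, 1), (2, 2), (3, 2), (4, 3)]
print(sorted(freq_new.items()))   # [(1, 1), (2, 2), (3, 5)]
```

The new count for code 3 should equal the old counts for codes 3 and 4 combined, and the total number of cases should be unchanged.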

7. Other Procedures and Tips

a. OPTIONS. SPSS may be set up to show variables in either alphabetic or file order in pick lists. To get “File” order of variables, choose EDIT, OPTIONS in main SPSS menu and change Variable order from Alpha to File order (push radio buttons on General Tab at right). You must do this BEFORE retrieving the file. Choose File, New Data and then re-retrieve the file if you already loaded it for this change to take effect. This doesn’t change the order of variables on data window, only in variable pick lists.

b. CUSTOM TABLES: The Custom Tables procedures let you run descriptive statistics on groups of variables and assemble the results in tables, giving you some control over formatting and labeling. They produce what are sometimes called "banner tables", which summarize a number of variables in a single table. Use "Basic Tables" for descriptive statistics, "General Tables" for crosstabulations, and "Tables of Frequencies" for frequency distributions. You may check out this procedure after you have mastered those in the Descriptive Statistics section, if you wish.

c. PRINTING and SAVING. You may print results from the Output window as you generate them, or copy the ones you want into a word processor. To save output when you exit SPSS (via the File, Exit command), answer YES to the question about saving your output. Enter a path and filename, e.g. A:SPSS.SPO to put it on your floppy, or enter the path to your AFS space. You don’t need to save the data (respond NO to this question when exiting). The SPSS.SPO file can only be read by SPSS. You can also copy and paste SPSS output to WORD or EXCEL by opening both SPSS and these applications. The Output window is a simple text editor: you can add your own notations and delete items you don't want. The outline at left is handy for finding a procedure you ran or deleting it.

d. Selecting and Sorting Cases: The Data menu has procedures to SORT the data file on a particular variable or to SELECT subsets of cases to use in an analysis. For example, to select only cases from Kensington Metropark, choose Data, Select Cases, then push the IF button and enter the filter PARK=1 (Kensington is park 1 in the coding scheme). Any subsequent analysis will use only the Kensington cases; you will see a "filter on" message in the status bar, and cases not from Kensington are "slashed out" in the data window. REMEMBER to turn the filter off when you want to return to all cases: come back to Data, Select Cases and choose the "all cases" radio button.

e. WEIGHTS: Weights can be used to adjust the sample to better represent the population or to expand cases from the sample to the population. The HCMA file has two sets of weights: VSTWT adjusts and expands the sample to the population of 2.788 million visits (park entries) to HCMA, while VSITORWT adjusts the sample to the population of about 300,000 household visitors (anyone visiting an HCMA park at least once in 1996). Use VSITORWT when describing people; use VSTWT when describing park vehicle entries. The weights adjust the sample to the actual distribution of use in 1996 by park, season, and weekday/weekend, correcting for disproportionate sampling and different response rates across parks and periods. DO NOT use these expansion weights when conducting statistical tests, because nearly every test will come out significant (the tests behave as if they were based on a sample of 2.8 million). Instead use VISWT2 or VSTORWT2, which adjust for disproportionate sampling but then normalize the weights back to the actual sample size, so statistical tests are based on the true sample size. You can also run tests unweighted. When a weight is on, a message appears on the status bar. To set the weighting variable or turn weighting off, go to Data, Weight Cases on the menu and choose the desired weighting variable.
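The idea of normalizing weights back to the sample size can be sketched as follows. This Python example is an illustration only: the five weights are invented, and the rescaling rule shown (multiply each expansion weight by n divided by the sum of the weights) is an assumption about one common way such weights are built, not a documented description of how VISWT2 or VSTORWT2 were constructed.

```python
# Hypothetical expansion weights for five sampled cases (invented numbers).
expansion_wt = [500.0, 1200.0, 800.0, 1500.0, 1000.0]

n = len(expansion_wt)       # actual sample size
total = sum(expansion_wt)   # population size the expansion weights add up to

# Assumed rule: rescale so the weights sum to n instead of the population total.
normalized_wt = [w * n / total for w in expansion_wt]

print(total, round(sum(normalized_wt), 6))
```

The normalized weights keep the same relative sizes as the expansion weights, so they still correct for disproportionate sampling, but a test run with them sees a sample of n cases rather than millions.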