Creating, Importing, Reading and Validating SPSS Data Files

SPSS Basics - Part 1

March 31, 2009

April 2, 2009

Draft

March 30, 2009

This document and related materials will be available at

  1. About SPSS
  2. Comprehensive collection of tools for data analysis, reporting, and data manipulation
  3. SPSS workshop series and format
  4. Available versions
  5. Licensing at Lehman
  6. Locations where SPSS is installed
  1. An example of data analysis using SPSS and NORC’s 2008 General Social Survey (GSS)
  2. For further information on the GSS including a detailed codebook, visit
  3. Starting SPSS for Windows
  4. SPSS interface (menus, dialog boxes, and other standard Windows features)
  5. HELP!
  6. Opening SPSS window for data entry and display – Data Editor
  7. Opening an existing SPSS-format data file (“system file”)
  8. .sav file extension for SPSS data files
  9. Data View/Variable View
  10. Basic structure of an SPSS-format data file (units of analysis, cases, or observations as rows; columns as variables or measures; cells contain values)
  1. Subset of variables from the 2008 General Social Survey

1) age
2) attend (religious services)
3) class
4) confinan (confidence in banks/financial institutions)
5) consci
6) degree
7) educ
8) fatalism
9) finrela
10) geomobil
11) getahead
12) happy
13) health
14) income
15) intecon (interest in economics)
16) life (exciting/dull)
17) marital
18) newsfrom
19) partyid
20) polviews
21) pres04
22) racecen1
23) region
24) relig
25) satfin
26) sex
27) trust
28) vote04
29) wrkstat

  1. Running statistical procedures (Frequencies, Descriptives, Crosstabs, Means) to answer questions about the data
  2. SPSS window for output – Output/viewer window
  3. Reviewing results
  4. Saving output
  5. .spv file extension for output
  1. Creating your own SPSS data files in the Data Editor
  1. The following hypothetical height/weight data will be used:

ID   GENDER   HEIGHT     WEIGHT
              (inches)   (pounds)
1    M        70         155
2    F        61
3    F        64         125
4    M                   175
5    M        72         180
6    M        69         170
7    F        65         115
8    M        77         200
9    F        68         140
10   M        70

Note that some measurements are missing.

  1. Define variables in Data Editor-Variable View

Name              ID                      GENDER      HEIGHT             WEIGHT
Type              Numeric                 String      Numeric            Numeric
Width                                     1
Decimals                                              0                  0
Variable labels   Identification Number               Height in inches   Weight in lbs
Value labels                              M  Male
                                          F  Female
Missing values                                        99                 999

  1. Enter the following data in the Data Editor - Data View

ID   GENDER   HEIGHT   WEIGHT
1    M        70       155
2    F        61       999
3    f        64       125
4    M        99       175
5             72       180
6    m        69       170
7    F        65       115
8    M        775      200
9    F        68       140
10   M        70

  1. Structure of an SPSS data file – rectangular array or matrix with

Rows as cases (Units of analysis, observations)
Columns as variables
Values for particular cases on particular variables in cells at row-column intersection
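
As a rough illustration outside SPSS, the same rectangular layout can be sketched in Python as a list of rows, one per case, with one key per variable. The first three cases of the hypothetical height/weight data are used; the names `cases`, `cell`, and `column` are illustrative only and are not part of SPSS.

```python
# Rows are cases; keys are variables; each value is one cell.
cases = [
    {"ID": 1, "GENDER": "M", "HEIGHT": 70, "WEIGHT": 155},
    {"ID": 2, "GENDER": "F", "HEIGHT": 61, "WEIGHT": None},  # weight not measured
    {"ID": 3, "GENDER": "F", "HEIGHT": 64, "WEIGHT": 125},
]

def cell(rows, case_number, variable):
    """Return the value at the intersection of a row (1-based) and a column."""
    return rows[case_number - 1][variable]

def column(rows, variable):
    """Return every case's value on one variable."""
    return [row[variable] for row in rows]

print(cell(cases, 3, "HEIGHT"))   # 64
print(column(cases, "GENDER"))    # ['M', 'F', 'F']
```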

  1. System missing versus user-defined missing values
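
    The distinction can be sketched in Python under some assumptions: a blank cell stands in for a system-missing value (shown here as `None`), while 99 for HEIGHT and 999 for WEIGHT are the user-defined missing codes from the variable definitions above. The function name `valid_values` is hypothetical.

```python
# User-defined missing codes, as declared for the height/weight example.
USER_MISSING = {"HEIGHT": {99}, "WEIGHT": {999}}

def valid_values(variable, values):
    """Drop system-missing cells (None) and user-defined missing codes."""
    codes = USER_MISSING.get(variable, set())
    return [v for v in values if v is not None and v not in codes]

# WEIGHT as entered: case 2 is coded 999 (user-missing), case 10 is blank.
weights = [155, 999, 125, 175, 180, 170, 115, 200, 140, None]
kept = valid_values("WEIGHT", weights)

print(len(kept))              # 8
print(sum(kept) / len(kept))  # 157.5
```

    If 999 were not declared as missing, it would be treated as a real weight and would inflate the mean.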
  1. Saving your data file (.sav file extension)
  1. Data validation
  2. Running procedures to validate coding and data entry
  3. Frequencies
  4. Crosstabulation for contingent questions (not applicable here)
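
    What a frequency table reveals during validation can be sketched in Python, assuming the ten cases are keyed exactly as in the Data View table above, including its entry errors. The plausibility range for HEIGHT (48 to 84 inches) is an assumption chosen for illustration.

```python
from collections import Counter

# (ID, GENDER, HEIGHT, WEIGHT) as entered; None marks a blank numeric cell.
entered = [
    (1, "M", 70, 155), (2, "F", 61, 999), (3, "f", 64, 125),
    (4, "M", 99, 175), (5, "", 72, 180), (6, "m", 69, 170),
    (7, "F", 65, 115), (8, "M", 775, 200), (9, "F", 68, 140),
    (10, "M", 70, None),
]

# A frequency count of GENDER exposes codes other than M and F.
gender_counts = Counter(row[1] for row in entered)
bad_gender = sorted(code for code in gender_counts if code not in ("M", "F"))

# Out-of-range heights, ignoring the declared missing code 99.
bad_height = [row[0] for row in entered
              if row[2] != 99 and not 48 <= row[2] <= 84]

print(gender_counts)
print(bad_gender)   # ['', 'f', 'm']
print(bad_height)   # [8]
```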
  1. Creating SPSS data files from other data sources
  2. Importing data from Excel
  3. Variable names in first row
  4. Variable type and potential for error
  5. Other sources: Access, SAS etc.
  1. Creating SPSS data files from “ASCII” text files
  2. Common non-SPSS formats: delimited or fixed format plain text (“ASCII”) files
  3. File extensions for text files .dat, .txt
  4. Notepad to view text files
    In ASCII files, a line is generally referred to as a RECORD. The columns assigned to a variable are collectively referred to as a FIELD. HEALTH.DAT is an example of a fixed format ASCII file since the same information is coded in the same location for every case. AGE, for example, is always found in columns 11-12 of the first and only record of a case. (Column positions may appear distorted when using proportional fonts in Word or other word processors.)
    The use of the term column when describing the layout of an ASCII text file is different from the use of the term column when describing the contents of the Data Editor. A variable occupies a column in the Data Editor; a character occupies a column in an ASCII data file. The ASCII text file is not an SPSS data file.
  1. Record layout ("Codebook") for HEALTH.DAT

Description               Variable   Record   Columns
Identification number     ID         1        1-2
Systolic blood pressure   SBP        1        3-5
Quetelet index*           QUET       1        6-10
Age in years              AGE        1        11-12
    98 = 98 or more
    99 = missing
Respondent smokes         SMK        1        13
    0 = no
    1 = yes

*Quetelet Index (a measure of size) = 100 * (weight/height**2)
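
A worked example of the formula in Python, assuming weight is in pounds and height in inches as in the hypothetical height/weight data earlier (the units used for HEALTH.DAT are not stated here). The function name `quetelet` is illustrative.

```python
def quetelet(weight, height):
    """Quetelet index as defined above: 100 * (weight / height**2)."""
    return 100 * (weight / height ** 2)

# Hypothetical case: 155 pounds, 70 inches.
print(round(quetelet(155, 70), 3))  # 3.163
```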

Fixed format file

011352.876450
021223.251410
031303.100490
041483.768520
051462.979541
061292.790471
071623.668601
081603.612481
091442.368441
101804.637641
111663.877591
121384.032511
131524.116990
141383.673560
151403.562541
161342.998501
171453.360491
18   3.024461
191353.171570
201423.401560
211503.628561
221443.751580
231373.296530
241323.210500
251493.301541
261323.017481
271202.789430
281262.956431
291613.800630
301704.132631
311523.962620
321644.010650

Source

Kleinbaum, David G. and Kupper, Lawrence L. (1978). Applied Regression Analysis and Other Multivariable Methods. Boston, Massachusetts: Duxbury Press. (p. 60)
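
Reading a fixed-format record amounts to slicing each line at the columns given in the record layout. A minimal Python sketch, assuming the layout above (ID 1-2, SBP 3-5, QUET 6-10, AGE 11-12, SMK 13) and treating a blank field as missing; the names `LAYOUT` and `parse_record` are illustrative.

```python
# Variable -> (first column, last column), 1-based as in the codebook.
LAYOUT = {"ID": (1, 2), "SBP": (3, 5), "QUET": (6, 10),
          "AGE": (11, 12), "SMK": (13, 13)}

def parse_record(line):
    """Slice one 13-column record into a dict of numbers (None if blank)."""
    case = {}
    for name, (first, last) in LAYOUT.items():
        field = line[first - 1:last].strip()
        if not field:
            case[name] = None
        elif name == "QUET":
            case[name] = float(field)
        else:
            case[name] = int(field)
    # AGE uses 99 as a user-defined missing code.
    if case["AGE"] == 99:
        case["AGE"] = None
    return case

print(parse_record("011352.876450"))
print(parse_record("18   3.024461"))  # SBP is blank for case 18
print(parse_record("131524.116990"))  # AGE coded 99 (missing)
```

Because position alone identifies each field, a record with a blank SBP must still reserve columns 3-5 so that the later fields stay in place.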

Comma delimited format

1,135,2.876,45,0
2,122,3.251,41,0
3,130,3.1,49,0
4,148,3.768,52,0
5,146,2.979,54,1
6,129,2.79,47,1
7,162,3.668,60,1
8,160,3.612,48,1
9,144,2.368,44,1
10,180,4.637,64,1
11,166,3.877,59,1
12,138,4.032,51,1
13,152,4.116,99,0
14,138,3.673,56,0
15,140,3.562,54,1
16,134,2.998,50,1
17,145,3.36,49,1
18,,3.024,46,1
19,135,3.171,57,0
20,142,3.401,56,0
21,150,3.628,56,1
22,144,3.751,58,0
23,137,3.296,53,0
24,132,3.21,50,0
25,149,3.301,54,1
26,132,3.017,48,1
27,120,2.789,43,0
28,126,2.956,43,1
29,161,3.8,63,0
30,170,4.132,63,1
31,152,3.962,62,0
32,164,4.01,65,0
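
For comparison, a comma-delimited record is split on the delimiter rather than sliced by column. The Python sketch below assumes the field order ID, SBP, QUET, AGE, SMK and uses three records copied from the listing above; the names `NAMES` and `to_number` are illustrative.

```python
import csv
import io

NAMES = ["ID", "SBP", "QUET", "AGE", "SMK"]

# Three records copied from the comma-delimited listing above.
text = "1,135,2.876,45,0\n13,152,4.116,99,0\n18,,3.024,46,1\n"

def to_number(name, field):
    """Convert one field; an empty field becomes None (missing)."""
    if field == "":
        return None
    return float(field) if name == "QUET" else int(field)

cases = []
for fields in csv.reader(io.StringIO(text)):
    case = {name: to_number(name, f) for name, f in zip(NAMES, fields)}
    cases.append(case)

print(cases[2]["SBP"])   # None
print(cases[1]["AGE"])   # 99 (still needs to be declared missing)
```

Here the empty field between two commas marks the missing SBP for case 18, while the 99 for AGE is read as an ordinary number until it is declared as a missing code.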

  1. Read an ASCII file using the Text Import Wizard
  1. Using command syntax to read file
  1. More complex input file structures
  2. Multiple lines per case
  3. Hierarchical files (e.g. household record, followed by one record per member of household)
  4. Different record types (e.g. personal data record, course records, financial data record)
  5. Varying numbers of measures per unit
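
    A hierarchical file can be read by carrying the most recent household record forward to the member records that follow it. The Python sketch below rests entirely on a made-up layout: a record-type code in the first field (`H` for household, `P` for person) followed by comma-separated values. The variable names and data are hypothetical and do not come from an actual data file.

```python
# Hypothetical hierarchical file: H = household record, P = person record.
lines = [
    "H,101,Bronx",
    "P,1,34",
    "P,2,36",
    "H,102,Queens",
    "P,1,71",
]

def flatten(records):
    """Attach each person to the household record that precedes it."""
    people, household = [], None
    for record in records:
        kind, *fields = record.split(",")
        if kind == "H":
            household = {"HHID": int(fields[0]), "BOROUGH": fields[1]}
        elif kind == "P":
            if household is None:
                raise ValueError("person record before any household record")
            people.append({**household,
                           "PERSON": int(fields[0]), "AGE": int(fields[1])})
    return people

for person in flatten(lines):
    print(person)
```

    The result is again a rectangular file with one row per person, which is the structure the Data Editor expects.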
  1. Obtaining and using existing SPSS data files
  2. Example of GSS 2008
  3. Other data archive sites

    Contact William Bosworth for further information
  1. Learning more about SPSS

  2. Manuals in PDF format provided with license
  3. Help > Tutorial etc.
  4. Academic web sites, e.g.
