SPSS Basics - Part 1
Creating, Importing, Reading and Validating SPSS Data Files
March 31, 2009
April 2, 2009
Draft
March 30, 2009
This document and related materials will be available at
- About SPSS
- Comprehensive collection of tools for data analysis, reporting, and data manipulation
- SPSS workshop series and format
- Available versions
- Licensing at Lehman
- Locations where SPSS is installed
- An example of data analysis using SPSS and NORC’s 2008 General Social Survey (GSS)
- For further information on the GSS including a detailed codebook, visit
- Starting SPSS for Windows
- SPSS interface (menus, dialog boxes, and other standard Windows features)
- HELP!
- Opening SPSS window for data entry and display – Data Editor
- Opening an existing SPSS-format data file (“system file”)
- .sav file extension for SPSS data files
- Data View/Variable View
- Basic structure of an SPSS-format data file (units of analysis, cases, or observations as rows; columns as variables or measures; cells contain values)
- Subset of variables from the 2008 General Social Survey
1) age
2) attend (religious services)
3) class
4) confinan (confidence in banks/financial institutions)
5) consci
6) degree
7) educ
8) fatalism
9) finrela
10) geomobil
11) getahead
12) happy
13) health
14) income
15) intecon (interest in economics)
16) life (exciting/dull)
17) marital
18) newsfrom
19) partyid
20) polviews
21) pres04
22) racecen1
23) region
24) relig
25) satfin
26) sex
27) trust
28) vote04
29) wrkstat
- Running statistical procedures (Frequencies, Descriptives, Crosstabs, Means) to answer questions about the data
- SPSS window for output – Output/viewer window
- Reviewing results
- Saving output
- .spv file extension for output
- Creating your own SPSS data files in the Data Editor
- The following hypothetical height/weight data will be used:
ID   GENDER   HEIGHT     WEIGHT
              (inches)   (pounds)
1    M        70         155
2    F        61
3    F        64         125
4    M                   175
5    M        72         180
6    M        69         170
7    F        65         115
8    M        77         200
9    F        68         140
10   M        70
Note that some measurements are missing.
- Define variables in Data Editor-Variable View
Name:             ID, GENDER, HEIGHT, WEIGHT
Type:             Numeric, String, Numeric, Numeric
Width:            1 (GENDER)
Decimals:         0, 0
Variable labels:  Identification Number (ID); Height in inches (HEIGHT); Weight in lbs (WEIGHT)
Value labels:     M = Male, F = Female (GENDER)
Missing values:   99 (HEIGHT), 999 (WEIGHT)
- Enter the following data in the Data Editor - Data View
ID   GENDER   HEIGHT   WEIGHT
1    M        70       155
2    F        61       999
3    f        64       125
4    M        99       175
5             72       180
6    m        69       170
7    F        65       115
8    M        775      200
9    F        68       140
10   M        70
- Structure of an SPSS data file – rectangular array or matrix with
Rows as cases (Units of analysis, observations)
Columns as variables
Values for particular cases on particular variables in cells at row-column intersection
- System missing versus user-defined missing values
- Saving your data file (.sav file extension)
- Data validation
- Running procedures to validate coding and data entry
- Frequencies
- Crosstabulation for contingent questions (not applicable here)
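The validation step can be sketched outside SPSS as well. The following Python fragment is only an illustration, not the workshop's procedure: it tabulates GENDER the way Frequencies would, separates the user-defined missing code for WEIGHT (999) from a blank (system-missing) cell, and applies a range check to HEIGHT. The 48-84 inch plausibility window and all variable names are assumptions made for this sketch.

```python
from collections import Counter

# Rows as entered in the Data View exercise (ID, GENDER, HEIGHT, WEIGHT).
# None stands for a blank cell; 99 and 999 are the user-defined missing codes.
rows = [
    (1, "M", 70, 155),
    (2, "F", 61, 999),
    (3, "f", 64, 125),
    (4, "M", 99, 175),
    (5, "", 72, 180),
    (6, "m", 69, 170),
    (7, "F", 65, 115),
    (8, "M", 775, 200),
    (9, "F", 68, 140),
    (10, "M", 70, None),
]

# Frequency table for GENDER: any value other than "M" or "F" is a coding error.
gender_freq = Counter(gender for _, gender, _, _ in rows)
bad_gender_ids = [cid for cid, gender, _, _ in rows if gender not in ("M", "F")]

# WEIGHT: 999 is user-defined missing, a blank cell is system missing.
weight_user_missing = [cid for cid, _, _, weight in rows if weight == 999]
weight_system_missing = [cid for cid, _, _, weight in rows if weight is None]

# Range check for HEIGHT, skipping the user-defined missing code 99.
# The 48-84 inch window is an assumed plausibility range, not from the handout.
bad_height_ids = [
    cid for cid, _, height, _ in rows
    if height is not None and height != 99 and not 48 <= height <= 84
]

for value, count in sorted(gender_freq.items()):
    print(repr(value), count)
print("GENDER problems:", bad_gender_ids)
print("HEIGHT problems:", bad_height_ids)
print("WEIGHT user-missing:", weight_user_missing)
print("WEIGHT system-missing:", weight_system_missing)
```

With the data as entered above, the frequency table exposes the lowercase and blank GENDER codes (cases 3, 5 and 6), and the range check flags the HEIGHT of 775 for case 8.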
- Creating SPSS data files from other data sources
- Importing data from Excel
- Variable names in first row
- Variable type and potential for error
- Other sources: Access, SAS, etc.
- Creating SPSS data files from “ASCII” text files
- Common non-SPSS formats: delimited or fixed format plain text (“ASCII”) files
- File extensions for text files .dat, .txt
- Notepad to view text files
In ASCII files, a line is generally referred to as a RECORD. The columns assigned to a variable are collectively referred to as a FIELD. HEALTH.DAT is an example of a fixed format ASCII file since the same information is coded in the same location for every case. AGE, for example, is always found in columns 11-12 of the first and only record of a case. (Column positions may appear distorted when using proportional fonts in Word or other word processors.)
The use of the term column when describing the layout of an ASCII text file is different from the use of the term column when describing the contents of the Data Editor. A variable occupies a column in the Data Editor; a character occupies a column in an ASCII data file. The ASCII text file is not an SPSS data file.
- Record layout ("Codebook") for HEALTH.DAT
Description               Variable   Record   Columns
Identification number     ID         1        1-2
Systolic blood pressure   SBP        1        3-5
Quetelet index*           QUET       1        6-10
Age in years              AGE        1        11-12
    98 = 98 or more
    99 = missing
Respondent smokes         SMK        1        13
    0 = no
    1 = yes
*Quetelet Index (a measure of size) = 100 * (weight/height**2)
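As a worked example of the footnote formula, the short Python sketch below computes the index for case 1 of the hypothetical height/weight data used earlier (155 pounds, 70 inches). The handout does not state the units behind QUET; pounds and inches are an assumption here, chosen because the result falls in the same range as the QUET values in HEALTH.DAT. The function name is made up for this illustration.

```python
def quetelet_index(weight, height):
    """Return 100 * (weight / height**2), the formula from the codebook footnote."""
    return 100 * (weight / height ** 2)

# Case 1 of the hypothetical height/weight data: 155 pounds, 70 inches
# (units are an assumption; the codebook does not state them).
quet = quetelet_index(155, 70)
print(round(quet, 3))  # prints 3.163
```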
Fixed format file
011352.876450
021223.251410
031303.100490
041483.768520
051462.979541
061292.790471
071623.668601
081603.612481
091442.368441
101804.637641
111663.877591
121384.032511
131524.116990
141383.673560
151403.562541
161342.998501
171453.360491
18   3.024461
191353.171570
201423.401560
211503.628561
221443.751580
231373.296530
241323.210500
251493.301541
261323.017481
271202.789430
281262.956431
291613.800630
301704.132631
311523.962620
321644.010650
Source
Kleinbaum, David G. and Kupper, Lawrence L. (1978). Applied Regression Analysis and Other Multivariable Methods. Boston, Massachusetts: Duxbury Press. (p. 60)
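To make the idea of fields and column positions concrete, here is a minimal Python sketch that slices fixed-format records according to the record layout above. It is an illustration only, not the SPSS procedure used in the workshop. The names LAYOUT and parse_record are invented for this example, and the sketch assumes that the blank SBP field for case 18 occupies its full three columns, so that every record is 13 characters long.

```python
# Column positions from the record layout (1-based, inclusive).
LAYOUT = {
    "ID": (1, 2),
    "SBP": (3, 5),
    "QUET": (6, 10),
    "AGE": (11, 12),
    "SMK": (13, 13),
}

def parse_record(record):
    """Slice one fixed-format record into fields; blank fields become None."""
    case = {}
    for name, (start, end) in LAYOUT.items():
        text = record[start - 1:end].strip()
        if text == "":
            case[name] = None           # blank field: no value recorded
        elif name == "QUET":
            case[name] = float(text)    # QUET carries a decimal point
        else:
            case[name] = int(text)
    # AGE 99 is the codebook's missing code.
    if case["AGE"] == 99:
        case["AGE"] = None
    return case

records = [
    "011352.876450",
    "131524.116990",   # AGE coded 99 (missing)
    "18   3.024461",   # SBP left blank
]
cases = [parse_record(r) for r in records]
for case in cases:
    print(case)
```

Because every field is located by position alone, a record that is shifted by even one column would be read without any error message but with wrong values, which is one reason to validate the data after reading it.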
Comma delimited format
1,135,2.876,45,0
2,122,3.251,41,0
3,130,3.1,49,0
4,148,3.768,52,0
5,146,2.979,54,1
6,129,2.79,47,1
7,162,3.668,60,1
8,160,3.612,48,1
9,144,2.368,44,1
10,180,4.637,64,1
11,166,3.877,59,1
12,138,4.032,51,1
13,152,4.116,99,0
14,138,3.673,56,0
15,140,3.562,54,1
16,134,2.998,50,1
17,145,3.36,49,1
18,,3.024,46,1
19,135,3.171,57,0
20,142,3.401,56,0
21,150,3.628,56,1
22,144,3.751,58,0
23,137,3.296,53,0
24,132,3.21,50,0
25,149,3.301,54,1
26,132,3.017,48,1
27,120,2.789,43,0
28,126,2.956,43,1
29,161,3.8,63,0
30,170,4.132,63,1
31,152,3.962,62,0
32,164,4.01,65,0
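In the delimited version, fields are identified by their order rather than by column position, and an empty field (two commas in a row, as for case 18) marks a value that was not recorded. The Python sketch below reads a few of the rows above with the standard csv module. It is an illustration, not the Text Import Wizard itself; NAMES and to_number are names invented for this example, and the AGE code 99 is left untouched here.

```python
import csv
import io

# A few rows from the comma-delimited listing, including cases 13 and 18.
text = """1,135,2.876,45,0
2,122,3.251,41,0
13,152,4.116,99,0
18,,3.024,46,1
"""

NAMES = ["ID", "SBP", "QUET", "AGE", "SMK"]

def to_number(name, value):
    """Convert a field to a number; an empty field becomes None."""
    if value == "":
        return None
    return float(value) if name == "QUET" else int(value)

cases = []
for row in csv.reader(io.StringIO(text)):
    case = {name: to_number(name, value) for name, value in zip(NAMES, row)}
    cases.append(case)

sbp_values = [c["SBP"] for c in cases if c["SBP"] is not None]
print(len(cases), "cases;", len(sbp_values), "with SBP")
```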
- Read an ASCII file using the Text Import Wizard
- Using command syntax to read file
- More complex input file structures
- multiple lines per case
- hierarchical files (e.g. household record, followed by one record per member of household)
- different record types (e.g. personal data record, course records, financial data record)
- varying numbers of measures per unit
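A hierarchical file can be pictured with a small Python sketch. Everything about the file below is hypothetical: the record-type codes "H" and "P", the comma-delimited layout, and the values are invented for illustration and do not come from the workshop materials. The point is only that each person record has to be attached to the household record that precedes it.

```python
# Hypothetical hierarchical file: each "H" (household) record is followed by
# one "P" (person) record per member. The layout is invented for illustration.
lines = [
    "H,101,Bronx",
    "P,1,34",
    "P,2,36",
    "P,3,5",
    "H,102,Queens",
    "P,1,61",
]

households = []
for line in lines:
    fields = line.split(",")
    if fields[0] == "H":
        households.append({"id": int(fields[1]), "borough": fields[2], "members": []})
    elif fields[0] == "P":
        # A person record belongs to the most recent household record.
        households[-1]["members"].append({"person": int(fields[1]), "age": int(fields[2])})

# Flatten to one row per person, carrying the household fields along,
# which gives the rectangular cases-by-variables structure.
flat = [
    (h["id"], h["borough"], m["person"], m["age"])
    for h in households
    for m in h["members"]
]
for row in flat:
    print(row)
```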
- Obtaining and using existing SPSS data files
- Example of GSS 2008
- Other data archive sites
Contact William Bosworth for further information
- Learning more about SPSS
- Manuals in pdf format provided with license
- Help > Tutorial etc.
- Academic web sites, e.g.