Dataset Description for the Polit2 Dataset Files

Introduction

Each chapter in Polit’s (2010) textbook Statistics and Data Analysis for Nursing Research (2nd ed.) includes exercises that involve analyzing real data collected in a large study of low-income women. The data are stored in three data files on this website: Polit2SetA.sav, Polit2SetB.sav, and Polit2SetC.sav. We created three files, rather than one large file, because the student version of SPSS accommodates no more than 50 variables. This website also contains 14 Word files, one for each book chapter, with our answers to the data analysis exercises, together with portions of the relevant SPSS output. The analyses were run using SPSS Version 16.0, but the Polit2 datasets are compatible with earlier versions of SPSS. You can therefore download the files, use the data to practice statistical analyses in SPSS for Windows, and check your results against our answers.

This Word file (Dataset_Description.doc) provides an overview of the variables in the datasets and a brief description of the study that generated these data. The SPSS codebooks (data dictionaries) for the datasets are available in a separate Word file on the book’s website (Codebooks_for_Datasets.doc). For each dataset, the codebook file shows detailed information about every variable; for example, for the variable Race/ethnicity (racethn), the codebook shows that a code of 1 signifies Black, not Hispanic, and so on. It also lists the missing-value codes that were assigned when legitimate values were not available. Variables in the codebook file are listed in the order in which they appear in the dataset, whereas variables in this Dataset Description file are grouped by substantive theme.

Background of the Study

The data in the three datasets come from a longitudinal study of low-income women in four urban communities[1]. In the original study, extensive information was collected in 1999 (Wave 1) and 2001 (Wave 2) from about 4,000 women. A major purpose of the study was to understand the life trajectories of these women and their children during a period of major changes to social policies affecting poor people in the United States. The sample was randomly selected from women who, in 1995, were single mothers receiving cash welfare assistance in the four cities. All data were collected by means of 90-minute in-person interviews in either English or Spanish in the study participants’ homes. Professional interviewers from a survey research firm, specially trained for this study, collected the data.

For this textbook, we have reduced the dataset appreciably because the purpose is to provide you with some data for developing analytic skills. We have randomly selected a subsample of 250 cases per site, for a total of 1000 cases. We included primarily variables that are relevant to health and mental health outcomes, and factors affecting those outcomes, such as material hardships. With one exception, only variables from the Wave 2 interviews are included.
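The subsampling step amounts to a stratified random draw: a fixed number of cases sampled at random, without replacement, within each site. The Python sketch below illustrates that idea under stated assumptions; the function subsample_by_site, the seed, and the toy population of 1,000 cases per site are hypothetical and are not the procedure or data actually used to build the Polit2 files.

```python
import random
from collections import defaultdict


def subsample_by_site(cases, per_site, seed=None):
    """Draw a simple random sample of `per_site` cases within each site.

    `cases` is a list of (case_id, site) tuples (a hypothetical layout).
    """
    rng = random.Random(seed)
    by_site = defaultdict(list)
    for case_id, site in cases:
        by_site[site].append(case_id)

    sample = []
    for site in sorted(by_site):
        ids = by_site[site]
        if len(ids) < per_site:
            raise ValueError(f"site {site} has only {len(ids)} cases")
        # random.sample draws without replacement within this site
        sample.extend((case_id, site) for case_id in rng.sample(ids, per_site))
    return sample


# HYPOTHETICAL population: 1,000 cases in each of four sites (2, 4, 6, 8)
population = [(10000 + site * 1000 + n, site)
              for site in (2, 4, 6, 8) for n in range(1000)]
subsample = subsample_by_site(population, per_site=250, seed=1)
print(len(subsample))  # 1000
```

Fixing the seed makes the draw reproducible, which is useful if a subsample ever needs to be regenerated.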

There are approximately 125 variables in the datasets. A few demographic variables (e.g., age, race/ethnicity) are included in all three. Although the original dataset for this study contains literally thousands of variables, we have selected only a few. Because we were selective in including variables, we do not provide the actual survey instrument.

Data for some variables in the Polit2 datasets are the actual answers to questions the interviewers asked. For example, all of women’s answers to individual items on the SF-12 Health Survey are included on the Polit2SetB file. Many variables, however, were constructed by analysts. For example, data for the variable age (the second variable in all three datasets) were calculated by subtracting the participants’ date of birth (which was asked in the survey) from the date of the interview. We have omitted the raw data on actual date of birth, because that information is of little inherent interest, and also to further ensure participants’ confidentiality.
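The age calculation can be sketched in Python as follows. The document does not say how elapsed days were converted to years, so dividing by 365.25 (the average year length, allowing for leap years) is an assumption, as is rounding to three decimals, chosen to match the precision of the age values in Polit2SetA. The function name age_in_years and the example dates are hypothetical.

```python
from datetime import date


def age_in_years(birth: date, interview: date) -> float:
    """Age at interview in fractional years.

    ASSUMPTION: years = elapsed days / 365.25, rounded to 3 decimals.
    """
    if interview < birth:
        raise ValueError("interview date precedes date of birth")
    days = (interview - birth).days
    return round(days / 365.25, 3)


# Hypothetical dates, not taken from the dataset
print(age_in_years(date(1971, 1, 1), date(2001, 1, 1)))  # 30.001
```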

The SPSS Data Editor

If you open any of the Polit2SetA to Polit2SetC data files on a computer that has SPSS software, you will access the SPSS Data Editor. The Data Editor has two “views”—one that shows the actual data (Data View) and the other that shows variable information (Variable View). There are tabs at the bottom left of every screen that allow you to toggle back and forth between the two views.

In Data View, you will see a spreadsheet-type arrangement of the data. Each column in the spreadsheet corresponds to a variable in the data set, and is headed by a variable name. Each row corresponds to an individual study participant. The numerical entries in the body of the spreadsheet represent the data for each participant on each variable.

A small portion of the spreadsheet for the Polit2SetA dataset is presented below, showing data for the first 8 variables for the first 10 participants. The first participant, whose ID number is 12001, was 30.148 years old at the time of the interview (age), and she first gave birth when she was 22.81 years old (age1bir). Some cells in the spreadsheet are blank, which occurs when information is missing. For example, the dataset has no information on how old the second participant (ID # 12002) was when she first gave birth, either because she was unsure or because she refused to answer the question.

Row / id / age / age1bir / racethn / educatn / higrade / worknow / workweek
1 / 12001 / 30.148 / 22.81 / 1.0 / 2.0 / 14.0 / 1.0 / 40.0
2 / 12002 / 30.723 / (blank) / 1.0 / 1.0 / 11.0 / 0.0 / (blank)
3 / 12003 / 30.189 / 15.78 / 1.0 / 2.0 / 8.0 / 0.0 / (blank)
4 / 12004 / 26.679 / 17.69 / 1.0 / 2.0 / 14.0 / 1.0 / 40.0
5 / 12006 / 39.334 / 25.48 / 1.0 / 1.0 / 11.0 / 1.0 / 40.0
6 / 12007 / 36.397 / 20.21 / 1.0 / 2.0 / 13.0 / 1.0 / 45.0
7 / 12009 / 27.510 / 15.83 / 1.0 / 2.0 / 12.0 / 0.0 / (blank)
8 / 12011 / 31.814 / 14.68 / 1.0 / 1.0 / 10.0 / 0.0 / (blank)
9 / 12012 / 25.016 / 20.83 / 1.0 / 2.0 / 12.0 / 1.0 / 40.0
10 / 12015 / 38.038 / 15.53 / 1.0 / 2.0 / 11.0 / 1.0 / 40.0
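Blank cells matter because statistical procedures generally leave missing values out of a computation rather than treating them as zeros. The Python sketch below (not SPSS syntax) stores the age1bir values for the ten participants shown above, with None standing in for the blank cell, and computes a mean over the nine valid values only. The dictionary layout is an illustrative assumption.

```python
from statistics import mean

# age1bir for the first 10 participants in Polit2SetA, keyed by id;
# None marks the blank (missing) cell in Data View.
age1bir = {
    12001: 22.81, 12002: None, 12003: 15.78, 12004: 17.69, 12006: 25.48,
    12007: 20.21, 12009: 15.83, 12011: 14.68, 12012: 20.83, 12015: 15.53,
}

valid = [v for v in age1bir.values() if v is not None]
missing = [pid for pid, v in age1bir.items() if v is None]

print(len(valid), missing)    # 9 [12002]
print(round(mean(valid), 2))  # 18.76
```

Treating the blank as 0 instead would pull the mean down to about 16.88, which is why missing values need to be recognized as missing rather than analyzed as real data.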

TIP: In SPSS, all variable names must be unique; there can be no duplication within a file. Variable names must begin with a letter, not a number. In SPSS versions before 12.0, variable names could be no longer than 8 characters; later versions, including Version 16.0, allow names of up to 64 bytes. Sometimes researchers use variable names corresponding to questions in a data collection instrument (e.g., Q1, Q2), but often they use names that communicate the concept being measured (e.g., age, sex), as we have done.
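A list of proposed names can be screened against the rules in this tip before they are entered in Variable View. The Python sketch below is a hypothetical helper (check_variable_names is not an SPSS function) that checks only three things: uniqueness, a leading letter, and a maximum length that defaults to the 8-character limit mentioned in the tip. It compares names case-insensitively because SPSS treats, for example, AGE and age as the same variable. SPSS has further naming rules (such as reserved keywords) that this sketch does not check.

```python
import re


def check_variable_names(names, max_len=8):
    """Return a list of problems found in `names` (empty list if none).

    Checks only uniqueness, a leading letter, and maximum length.
    Uniqueness is tested case-insensitively, as SPSS does.
    """
    problems = []
    seen = set()
    for name in names:
        key = name.lower()
        if key in seen:
            problems.append(f"{name}: duplicate name")
        seen.add(key)
        if not re.match(r"[A-Za-z]", name):
            problems.append(f"{name}: must begin with a letter")
        if len(name) > max_len:
            problems.append(f"{name}: longer than {max_len} characters")
    return problems


print(check_variable_names(["id", "age", "age1bir", "workweek"]))  # []
print(check_variable_names(["Q1", "q1", "1stbirth", "workweekhrs"]))
```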

In Variable View, you will see information about each variable in the dataset. Below we show an SPSS Variable View screen from the SPSS manual. This is not the Variable View for the Polit2 dataset, but it does show that Variable View presents a wealth of information about each variable in the dataset. Comparable information about the variables is also available, in a different and perhaps more convenient format, in the Codebooks_for_Datasets.doc file.

Listing of Variables in the Datasets

The charts in this section provide information about the variables in the three Polit2 datasets. For each variable, the charts give the variable name, the variable label attached to it, and its position in the A, B, or C file (for example, whether it is variable number 1, 10, or 35). The variables have been roughly clustered into thematically related groups. When relevant, each chart is prefaced by a brief description of the variable cluster.

(Note: The abbreviation “R” in the variable labels stands for Respondent, that is, the study participant; the abbreviation “hh” or “HH” stands for Household.)

Chart 1: Administrative Variable

Each participant was assigned a unique 5-digit identification number. The second digit (2, 4, 6, or 8) was a code for the participant’s city. The ID number is in all three datasets.

Variable Name / Variable Label / Position in Set A / Position in Set B / Position in Set C
id / Identification number / 1 / 1 / 1
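Because the city code sits in a fixed position in the ID, it can be extracted programmatically, for example to create a separate site variable. The sketch below is a hypothetical Python helper; it only recovers the numeric code (2, 4, 6, or 8), since this document does not say which city each code represents.

```python
def city_code(participant_id: int) -> int:
    """Return the city code: the second digit of a 5-digit participant ID."""
    digits = str(participant_id)
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError(f"expected a 5-digit ID, got {participant_id!r}")
    code = int(digits[1])
    if code not in (2, 4, 6, 8):
        raise ValueError(f"unexpected city code {code} in ID {participant_id}")
    return code


print(city_code(12001))  # 2
```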

Chart 2: Demographic and Background Variables

Basic demographic information was collected, and most of these variables are in all three datasets because demographic characteristics often play an important role in statistical analyses.

Variable Name / Variable Label / Position in Set A / Position in Set B / Position in Set C
age / Respondent's age at time of interview / 2 / 2 / 2
age1bir / Age at first birth / 3 / 3 / 3
racethn / Race/ethnicity / 4 / 4 / 4
educatn / Educational attainment / 5 / 5 / 5
higrade / Highest school grade completed / 6 / - / -
worknow / Currently employed? / 7 / 6 / 6
workweek / Hours worked per week, most recent job / 8 / - / -
marital / Current marital status / 9 / 7 / 7
hhsize / Total # people living in hh past mo / 10 / - / -
kidshh / # of R's children living in hh past mo / 11 / 8 / 8
ageyoung / Age of youngest child (years, rounded) / 12 / - / -

Chart 3: Income-Related Variables

Participants were asked a long series of questions about sources of income in the prior month, for themselves, their spouses or partners, and any other household members. A substantial minority of women were living in complex arrangements (e.g., with their own mothers, cousins, and so on), and in these situations the women were seldom able to provide income information for everyone. Thus, the variable income, which represents the total household income from all sources in the previous month, has a considerable amount of missing data. The variable poverty, which indicates whether the family was above or below the official federal poverty line in the prior month, was derived from the income variable and therefore also has many missing values.

Variable Name / Variable Label / Position in Set A / Position in Set B / Position in Set C
cashwelf / Received cash welfare-past mo / 19 / - / -
foodstam / Received Food Stamps-past mo / 20 / - / -
emergfd / Received emergency food assistance-past mo / 21 / - / -
income / Family income prior month, all sources / 22 / - / 9
poverty / Poverty status / 23 / 9 / 10
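The dependence of poverty on income can be sketched as follows: when income is missing, no poverty status can be assigned, so the missing value carries over. This Python sketch is illustrative only. The threshold schedule toy_threshold is invented and is NOT the official federal poverty line, the coding 1 = below and 0 = at or above is an assumption (the actual codes are in the codebook), and using household size to pick the threshold reflects only the general fact that poverty lines vary with family size, not the exact procedure the analysts used.

```python
from typing import Callable, Optional


def poverty_status(income: Optional[float], hh_size: Optional[int],
                   monthly_threshold: Callable[[int], float]) -> Optional[int]:
    """Return 1 if income is below the threshold, 0 if at/above, None if unknown.

    ASSUMPTION: 1 = below poverty line, 0 = at or above.
    Missing income (or household size) propagates: no status is assigned.
    """
    if income is None or hh_size is None:
        return None
    return 1 if income < monthly_threshold(hh_size) else 0


# HYPOTHETICAL threshold schedule for illustration only; these are NOT
# the official federal poverty thresholds for any year.
def toy_threshold(hh_size: int) -> float:
    return 700.0 + 250.0 * (hh_size - 1)


print(poverty_status(900.0, 3, toy_threshold))   # 1 (900 < 1200)
print(poverty_status(1500.0, 2, toy_threshold))  # 0 (1500 >= 950)
print(poverty_status(None, 4, toy_threshold))    # None
```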

Chart 4: Material Hardships: Housing and Housing Problems

The women in this sample were all economically disadvantaged, and many survey questions were designed to better understand the material hardships that these women and their children endured. The variables in this set concern housing problems that could have consequences for the physical and mental health of these families. Eight of the variables in the chart concern problems in the women’s current residence, ranging from having utilities shut off (utilcut) to having a stove or refrigerator that did not work (badstove). The variable housprob is a count of how many of these eight housing problems each participant was facing at the time of the interview, so its possible values range from 0 to 8.

Variable Name / Variable Label / Position in Set A / Position in Set B / Position in Set C
rooms / No. of rooms in HH-not incl bath / 13 / - / -
evicted / Ever evicted in past 12 mos? / 14 / - / -
shelter / Ever lived in emergency housing shelter-pst 12m / 15 / - / -
homeless / Ever homeless past 12 mos / 16 / - / -
moves / Total # of times R moved past yr / 17 / - / -
nabrqual / Neighborhood quality as place to live/raise kids / 18 / - / -
utilcut / Utilities shut off b/c could not pay bill? / 25 / - / -
leakroof / Home has leaky roof or ceiling? / 26 / - / -
badplumb / Home has plumbing that doesn't work? / 27 / - / -
brknwind / Home has broken windows? / 28 / - / -
elcprobs / Home has electrical problems? / 29 / - / -
vermin / Home has vermin or insects? / 30 / - / -
noheat / Home has heating sys that doesn't work? / 31 / - / -
badstove / Home has stove or refrig doesn't work? / 32 / - / -
housprob / Number of housing problems / 33 / - / -
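The construction of housprob can be sketched as a simple sum of the eight yes/no items listed above. The Python sketch assumes each item is coded 1 = yes (problem present) and 0 = no, and returns no score when any item is missing; both the coding and the missing-data rule are assumptions to check against the codebook, not a description of how the analysts actually handled them. The function name count_housing_problems is hypothetical.

```python
from typing import Optional

# The eight housing-problem items that make up housprob
HOUSING_ITEMS = ("utilcut", "leakroof", "badplumb", "brknwind",
                 "elcprobs", "vermin", "noheat", "badstove")


def count_housing_problems(record: dict) -> Optional[int]:
    """Return the number of housing problems (0-8), or None if any item is missing.

    ASSUMPTION: items are coded 1 = yes, 0 = no; None marks a missing answer.
    """
    values = [record.get(item) for item in HOUSING_ITEMS]
    if any(v is None for v in values):
        return None  # assumed rule: no partial counts
    if any(v not in (0, 1) for v in values):
        raise ValueError("housing items must be coded 0 or 1")
    return sum(values)


# Hypothetical respondent with three problems
respondent = {"utilcut": 1, "leakroof": 0, "badplumb": 1, "brknwind": 0,
              "elcprobs": 0, "vermin": 1, "noheat": 0, "badstove": 0}
print(count_housing_problems(respondent))  # 3
```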

Chart 5: Material Hardships: Food Insecurity and Health Care

This group of variables also concerns the women’s material hardships. Three variables reflect unmet need for medical or dental care as a result of not having enough money or not having insurance. The variable foodscor represents the women’s score on a scale called the Household Food Security Scale, which is a national benchmark measure of food security developed for the U.S. Bureau of the Census. The scale is an 18-item measure that classifies people into one of four categories: Food secure, food insecure without hunger, food insecure with moderate hunger, and food insecure with severe hunger. Examples of items on this scale include “I worried that our food would run out before we got money to buy more” and “My children sometimes skip meals because there isn’t enough money for food.” Responses to individual items are not included in these datasets.

Variable Name / Variable Label / Position in Set A / Position in Set B / Position in Set C
foodscor / Food Insecurity Scale score / 24 / - / -
medcare / Didn't get medical care 1+ time past yr b/c no $ or ins / 34 / - / -
dentcare / Didn't get dental care 1+ time past yr b/c no $/ins / 35 / - / -
unmet / Unmet medical or dental need past yr / 36 / - / -

Chart 6: Difficult Life Circumstances

During each interview, study participants completed a self-administered booklet that included several composite scales. One such scale was the Difficult Life Circumstances scale. The full scale included 16 items that reflected sources of daily stress. Respondents indicated whether or not the circumstance applied to them. In the full study, the items were added to create an overall index of how many difficult life circumstances the women were experiencing. Six of the items from the scale are included in Polit2SetA.