10

USING ICPSR

(Inter-University Consortium for Political and Social Research)

ICPSR was created in 1962 to serve as a data repository and clearinghouse for political science, giving scholars a way to share data. Beginning in the 1970s it broadened its coverage to include data of interest to multiple disciplines in the social sciences, education, business, health, and more.

Organizations and individual scholars are constantly collecting data. Many collect data for a specific purpose and have no further use for them after completing their original analysis. They then deposit their data files at a clearinghouse such as ICPSR so that the files are readily available to other scholars who may wish to use them in their own research. Most government agencies, such as the Census Bureau and the FBI, deposit their data with ICPSR, and the National Science Foundation has long required any person or entity collecting data under an NSF grant to send that data to ICPSR as well. Polls conducted by private news organizations, such as the ABC News / Washington Post poll series, are also deposited with ICPSR. ICPSR’s current data set holdings number in the thousands.

Scholarly research sometimes requires collecting new data, but just as often it can be accomplished through a novel re-analysis of existing data. There is no need for a scholar (or graduate student) to spend hundreds of thousands of dollars developing a new data file if an existing one is sufficient.

ICPSR, headquartered at the University of Michigan, is not the only source of existing data. It is a primary source for political scientists, but there are others. The National Opinion Research Center (NORC), headquartered at the University of Chicago, is a primary clearinghouse for sociologists, and the European Consortium for Political Research (ECPR), headquartered at the University of Essex in Great Britain, provides a service similar to ICPSR for European social scientists. With the growth of the World Wide Web it has also become increasingly common for professional journals, university research centers, and even individual scholars to store data files on web sites available to all. Political scientists use all these sources.

ICPSR: Where and What?

ICPSR maintains a web site at www.icpsr.umich.edu. For any data set, ICPSR provides not only the data itself, but also a codebook for the data, a 1-2 page description of the data file, and sometimes a list of published works that have used the data. The data files are often stored in multiple formats, including raw data and SPSS, SAS, and Stata files. The web site is open to all visitors. So too are many, but not all, of ICPSR’s resources. Codebooks, data descriptions, and other documentation are freely available to all. The data files themselves are available only to the 500+ ICPSR member institutions. Those member institutions include all major research universities in the U.S. as well as many international universities. TTU is an ICPSR member institution, which means TTU faculty and students may download copies of any ICPSR data free of charge.

Searching for Data

Go to the ICPSR home page at www.icpsr.umich.edu. Once there you will see a number of announcements and links to various sources of information. These links contain useful information and can be explored at your leisure. However, the home page feature you will use the most is the search function. On the ICPSR home page is an empty search box, usually accompanied by a label or file folder-type tab entitled Web Site & Data Holdings. To the immediate right of the empty box is a category box. On first accessing the ICPSR home page, the term “all fields” will likely appear in the category box. Clicking the arrow to the right will show other categories, such as title, study number, investigator, or subject terms. This search feature is what you will use to locate a data set on a topic of particular interest to you. Leaving the box set to “all fields” is best at the beginning. As you become familiar with ICPSR you can narrow your search.

Let's assume we want to search for the public opinion polls done by ABC News. Enter “abc news” in the search box and click on the search button. You will be taken to a web page listing the results that match your search criteria. Depending upon how broad or narrow your search is, you may end up with pages of data sets or only a few matching your search. (NOTE: ICPSR’s search feature is case sensitive. Searching for abc news (all lower case) may well produce a different set of results from searching for ABC NEWS (all upper case).) The data sets matching your search criteria will be displayed as a list of items looking like the entry below.

2963 ABC News Clinton Legacy Poll, January 2000 2000-10-18

ABC News

description | download | related literature

At the far left is a four-digit number, in this example 2963. This is the ICPSR study number. Each data set stored at ICPSR has a unique study (or inventory) number. If you know the study number of the data you are looking for, it is best to search by that study number, for it will take you to the exact data you want instead of numerous data sets with similar names. Next is the title of the study. Each study also has a unique title. If you search by title, the closer your search words are to the exact title of the study, the more likely you are to find the precise set of data you are looking for. The last item on the top line, at the far right, is the date this study was deposited with ICPSR. The date starts with the year, followed by the month and day. Though this poll, according to its title, was conducted in January of 2000, it was not deposited at ICPSR until October 18, 2000, shown as 2000-10-18.
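For readers who keep their own list of candidate studies, the top line of an entry can be split into its three parts with a few lines of Python. This is only a sketch: the helper name parse_entry_line is hypothetical, it assumes the line has exactly the layout of the example entry (study number, title, then a year-month-day date, separated by spaces), and it does not use any ICPSR interface.

```python
import re
from datetime import date

# Assumed layout: "<study number> <title> <YYYY-MM-DD>", as in the example entry.
ENTRY_PATTERN = re.compile(r"^(\d+)\s+(.+?)\s+(\d{4}-\d{2}-\d{2})$")

def parse_entry_line(line):
    """Split the top line of a search-result entry into its three parts."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry line: {line!r}")
    study_number, title, deposited = match.groups()
    return {
        "study_number": int(study_number),           # unique ICPSR study number
        "title": title,                              # unique study title
        "deposited": date.fromisoformat(deposited),  # year, then month, then day
    }

entry = parse_entry_line("2963 ABC News Clinton Legacy Poll, January 2000 2000-10-18")
print(entry["study_number"], entry["title"], entry["deposited"])
```

Because the deposit date is written year first, it sorts correctly as plain text, which is convenient when ordering a list of studies by deposit date.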

The second line of the entry contains the name of the investigator who collected the data. This may be the name of an individual person if a single scholar put together the data set, or more than one person if it was a group effort. It may also be, as in this case (ABC News), the name of the organization responsible for gathering the data. Finally, the third line contains three links for the data set: an online description of the data, a download page for the data and its associated files, and a list of any published items utilizing this data set. (In some cases there may be a fourth link entitled online analysis, which you will seldom use.) Below is what you will find following each link.

Study Description

The description link leads to a page that discusses characteristics of this specific data set. Click on it and you will find a number of headings on the left side of the page. Some of the labels are self-explanatory, and others are of only occasional importance. Below are the meanings of some of the labels that you will encounter and use frequently. Not all will appear for every study and some studies may have additional headings, but these are the most common items of importance for data management.

SERIES. If this collection is part of a series where similar data are collected repeatedly across time, the name of the series will be provided here.

BIBLIOGRAPHIC CITATION. A description of how this study should be cited in any paper you write using this data set.

SUMMARY. A brief description of the data, how they were collected (e.g., a survey or a compilation of historical records), and what issues they were collected to investigate.

SAMPLING and UNIVERSE. The universe describes the persons or elements that are the object of the study. For many major surveys, the universe (in statistical language, the population which the sample should represent) consists of all adults living in a certain geographic area, such as the United States, France, or the State of Maine. Sometimes only targeted groups within that region are studied, such as only Hispanics or only the elderly. Always read the universe description to make sure the data are suitable to your needs. SAMPLING refers to the system used to draw a representative sample from the universe. More information about both the universe and the sampling system is often provided in the study’s codebook.

SUBJECT TERMS. Descriptions of ICPSR data sets are stored under a variety of keywords. These are words which, if entered into the ICPSR search engine, will produce this data set as one of those matching your search. Keywords are provided in data descriptions should you wish to find other ICPSR data sets that may contain similar variables or information. Examine these subject terms, as using them may help you find data sets of interest.

DATA COLLECTION NOTES and EXTENT OF COLLECTION. These two headings each describe some of the files available with the data and their formats. For example, codebooks might be described as available in machine-readable or Portable Document (PDF) format. Data and related files might be described as SPSS Portable, Stata System, SAS Data Definition, and more.

DATA FORMAT. Contains information similar to the two items mentioned above. May also refer to data files as LRECL, Card Image, Delimited, or Portable /Transport file formats. Each refers to a different way of storing data.

EXTENT OF PROCESSING. ICPSR treats data deposited in its archives in a number of different ways depending upon the amount of processing ICPSR itself performs on the data set. Some data sets may be quite small or of limited utility. For such studies ICPSR merely stores what was deposited in exactly the format received without making changes. It is not unknown for users to receive codebook copies for these limited studies that are exactly as produced by the depositing scholar, including typographical errors and handwritten notes in the margin. Occasionally you may see REFORM.DOC or REFORM.DATA notes here, showing that ICPSR has reformatted the documentation or data into files that are easier for users to access.

For other data sets, ICPSR performs a high level of processing, which may include checking data for errors, producing frequency distributions of the variables, producing its own codebook, and more. Terms such as CONCHK, DDEF, and others refer to these additional processing steps.

NOTES. A notes section may exist with a link to an item entitled the file manifest. The file manifest is a technical document describing the structure of the data file. On occasion it may contain information about the file necessary for a statistics program to read it properly.
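The practical difference between the storage formats named under DATA FORMAT shows up when a file is read. The Python sketch below contrasts a delimited record with a fixed-column record of the general kind used by LRECL and card-image files. The variable names and column positions are invented for illustration; for a real study they would come from the codebook or the file manifest.

```python
import csv
import io

# Hypothetical layout: in a real study these positions come from the codebook.
# Each entry is (variable name, start column, end column), 1-based and inclusive.
FIXED_LAYOUT = [("respondent_id", 1, 4), ("age", 5, 6), ("approve", 7, 7)]

def read_fixed_width(text, layout):
    """Slice each fixed-column record into named fields by column position."""
    rows = []
    for record in text.splitlines():
        rows.append({name: record[start - 1:end].strip()
                     for name, start, end in layout})
    return rows

def read_delimited(text, delimiter=","):
    """Read delimited records whose first line holds the variable names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

# The same two invented cases stored in each format.
fixed_text = "0001342\n0002571\n"
delimited_text = "respondent_id,age,approve\n0001,34,2\n0002,57,1\n"

print(read_fixed_width(fixed_text, FIXED_LAYOUT))
print(read_delimited(delimited_text))
```

A fixed-column file carries no variable names or separators of its own, which is why the column positions recorded in the documentation are needed before a statistics program can read it.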

The description link provides basic information about each data set. If you are searching for data to meet a particular need, this link will give you an overview of the data. If you are sure this data set is the one you want, the description link provides some basic information you will later need to read the data into your statistical analysis software.

Related Literature

Compared to the Description link, the Related Literature link is only occasionally used. The Related Literature link will show any publications ICPSR is aware of that utilize or cite this particular data set. ICPSR requests scholars to send in citations to any published material using their data, but many scholars forget to do so. If any citations exist under this link they may be of interest to you if you want to see how these data have been used before, but more often than not the link will be empty.

Download

Your eventual goal is to download a particular data file to the computer you are currently using so you can analyze it with Stata or similar software. The download link leads to a page containing the data and documentation files for a study. However, before you can download any file you will need to register with ICPSR by creating what is called a MyData account.

MyData Accounts. Clicking on the download link will take you to an Authorized Download page. This page begins with a set of instructions and warnings that you should read carefully. The use of ICPSR data involves both ethical requirements, such as acknowledging ICPSR as a source in any papers you write utilizing the data, and legal requirements about maintaining confidentiality when accessing government data. The primary confidentiality requirement is that data summaries, such as the average income of clients using public health services, are acceptable but any use or presentation of the data that might lead to the identification of specific individuals is forbidden.
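The difference between an acceptable summary and a listing that could identify individuals can be illustrated with a short Python sketch. The records and the function name summarize_income are invented for illustration and do not come from any ICPSR study; the point is only that the reported result contains aggregate figures and nothing tied to a specific person.

```python
from statistics import mean

# Invented records for illustration only; no real data are shown.
clients = [
    {"client_id": "A17", "service": "clinic", "income": 21000},
    {"client_id": "B42", "service": "clinic", "income": 27000},
    {"client_id": "C08", "service": "clinic", "income": 24000},
]

def summarize_income(records):
    """Return only aggregate figures, dropping anything that identifies a person."""
    incomes = [r["income"] for r in records]
    return {"n": len(incomes), "mean_income": mean(incomes)}

summary = summarize_income(clients)
print(summary)
```

Printing the clients list itself, with its identifiers and individual incomes, is the kind of presentation the confidentiality requirement rules out.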

After reviewing the legal and ethical requirements, you will see login information in the middle of the page. If you have used the ICPSR web site previously, you will likely have created a MyData account at that time. As a returning user, you need only enter your email address and password.

First Time Users. If this is your first time using ICPSR resources, you will need to create a MyData account. Every site visitor is allowed to create a MyData account and use a number of ICPSR’s resources, though some files can be accessed only by people affiliated with ICPSR member institutions. To create a MyData account, go to the New User section, and click on the Create Account button. This will take you to a new page. Fill in the required information and click the Submit button at the bottom of the page. Since most beginning graduate students are still learning about statistics and statistical analysis software, in the section asking for a download preference select No Answer or Multiple Packages.

The Data Cart. Whenever you select a file to download, and have logged in as a new or returning user, you will be taken to a Data Cart page. Here files you have selected will be displayed in a format similar to that below. (If you are not associated with an ICPSR member institution, the documentation files will be automatically selected as the only ones you can download, and a message will appear on the page that some files in this study are available only to users at ICPSR member institutions.)