Using O*NET Occupational Characteristics with Longitudinal Panel Data

Sarah Porter

University of Iowa

and

Jennifer Glass

University of Iowa

Support for this research came from the Alfred P. Sloan Foundation Workplace Flexibility Project (grant # 04-???) and the University of Iowa Undergraduate Student Assistantship Program. Address all inquiries to: Jennifer L. Glass, Department of Sociology, W140 Seashore Hall, University of Iowa, Iowa City, IA 52242.


O*NET, the Occupational Information Network, is a comprehensive database of worker attributes and job characteristics compiled as a joint project of the Bureau of Labor Statistics and the U.S. Census Bureau. As the replacement for the Dictionary of Occupational Titles (DOT), O*NET will be the nation's primary source of occupational information (O*NET, 2005 online). To use the information in the O*NET data base for research purposes, however, there has to be a mechanism for attaching the attributes of jobs to the occupation codes given to individual respondents in survey data sets.

Our research project at the University of Iowa traces the effects of flexible work practices on individuals’ wage growth over time, which we believe will be moderated by organizational and occupational characteristics of the respondent’s primary job. For example, we believe that jobs involving a high level of customer/client service or team coordination of work tasks may penalize employees more strongly for utilizing a flexible schedule or working from home. The O*NET data base contains measures such as these for detailed job classifications using the Standard Occupational Classification (SOC) system. To investigate this hypothesis, we needed to attach occupational characteristics from the O*NET data base to each person/job in the National Longitudinal Survey of Youth 1979 (NLSY) sample, beginning with the 1989 wave and continuing through the 2002 wave, when respondents were in their peak years of career building and family formation. Table 1 provides names and descriptions of some of the O*NET job characteristics we used in this research. The O*NET data base is a work in progress, with new job characteristics being added on an ongoing basis.

The NLSY is similar to many other national panel data sets in that occupations are coded using the Census Bureau’s 3-digit classification system. This system is not static, however, and experienced a minor revision in 1990 and a major revision in 2000. The NLSY consistently coded all occupations in every wave from 1979 to 2002 using the 1980 occupation codes, even after the Census system was revised in 1990 and 2000. Other panel data sets, including the 1997 NLSY, the Panel Study of Income Dynamics (PSID), the Survey of Income and Program Participation (SIPP), and the National Survey of Families and Households (NSFH), also use a constant occupational classification system across survey waves to maintain comparability over time.

The O*NET database is rich with detail about specific occupations; however, it records its data on job characteristics using the very detailed level of the Standard Occupational Classification (SOC) system codes, not the Census 3-digit occupation classification codes (OCC). While the SOC system provides more precise information, large longitudinal panels tended to save time and expense by using the simpler Census OCC codes, which is why the NLSY and other large ongoing longitudinal studies still code occupations using the 3-digit Census occupational classification codes. The especially dramatic change in the Census occupation classification system from 1990 to 2000, as well as the differences in level of detail between Census 2000 OCC codes and detailed SOC codes, make research combining the two data bases difficult. In our case, we had data coded with 1980 Census OCC codes for each respondent across survey years. Fortunately, the changes in the Census OCC classification system between 1980 and 1990 were quite modest and could be handled with a few lines of syntax in any statistical package. The Census reports these changes in [put URL here]

While the Census conveniently provides a crosswalk from the Census 1990 OCC codes to Census 2000 OCC codes, and from the Census 2000 OCC codes to standard (not detailed) SOC codes, these crosswalks are what we term “backward looking”; i.e., it is very easy to move detailed SOC codes back into one and only one 2000 Census OCC code, or even further back into one of the 1990 Census OCC codes. However, it is much more difficult to move the 1990 Census OCC codes forward into one and only one 2000 Census code, and even more difficult to move 2000 Census OCC codes into one and only one detailed SOC code. This problem occurs because the SOC codes used in the O*NET data base are much more detailed than the standard SOC codes and the 2000 Census OCC codes, and many of the 2000 Census OCC codes are more detailed than the 1990 Census OCC codes. To address this problem, we modified a series of crosswalks that ultimately allow us to attach an O*NET job code (a detailed SOC code) to each 1990 Census occupational code.

STEP 1. Moving from the 1990 Census codes to the 2000 Census codes

The Census occupational coding system was updated and significantly altered between the 1990 and 2000 censuses to account for technological advances and structural changes in the economy. The system was also expanded to make the 2000 Census occupational classification system (OCC) more like the Standard Occupational Classification (SOC) coding system. We began with the U.S. Census Bureau’s “Table 2. Census Occupation Classification System and Its Redistribution into the 2000 Census Occupation Classification System” which can be found at:

This table lists each 1990 Census occupational code, the number of people who worked in that field, and the number of those people who were redistributed to each of the corresponding 2000 Census codes. Because the coding system was changed so dramatically, it is difficult to assign a single 2000 Census code to each 1990 Census OCC. Many of the 1990 Census OCC categories were redistributed into two or more 2000 Census OCC categories. This table shows every 2000 Census OCC that fit any worker in a single 1990 Census OCC category. There were ??? 1990 Census codes but ??? 2000 Census codes, with ?? 1990 codes having more than one match in the 2000 codes. In these cases where a 1990 Census OCC code corresponded to more than one 2000 Census code, we chose the 2000 Census code with the most workers in it for our crosswalk from 1990 into 2000 codes. The following example delineates this process using one 1990 occupation, “Drafting Occupations” (code 217), and the Census 2000 count of the number of workers in each 2000 Census OCC code.

Example 1. 1990 to 2000 Census Crosswalk:

Workers in the 1990 census code 217 – “Drafting Occupations” (324,761 workers) – were redistributed in 2000 into the following 2000 Census OCC codes:

130 – Architects, except naval (10,277 workers)

131 – Surveyors, cartographers, and photogrammetrists (14,388 workers)

154 – Drafters (287,766 workers)

156 – Surveying and mapping technicians (2,055 workers)

260 – Artists and related workers (10,277 workers)

Since the vast majority of people coded as 217 – “Drafting Occupations” were redirected to the 2000 code 154 – “Drafters,” we mapped 1990 code 217 → 2000 code 154 in our crosswalk between the 1990 Census codes and the 2000 Census codes.
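The selection rule in Example 1 can be sketched in a few lines of Python. This is only an illustration, not the SAS syntax file we distribute: the function names are our own for this sketch, and the worker counts are those listed in Example 1. If two 2000 codes ever tied for the most workers, this sketch would simply keep the first one listed, which is an assumption rather than a rule stated above.

```python
# Redistribution of 1990 code 217 "Drafting Occupations" into 2000 Census codes,
# using the worker counts reported in Example 1.
REDISTRIBUTION_217 = {
    "130": 10_277,   # Architects, except naval
    "131": 14_388,   # Surveyors, cartographers, and photogrammetrists
    "154": 287_766,  # Drafters
    "156": 2_055,    # Surveying and mapping technicians
    "260": 10_277,   # Artists and related workers
}


def pick_largest_target(redistribution):
    """Return the 2000 code that received the most workers from one 1990 code."""
    return max(redistribution, key=redistribution.get)


def build_crosswalk(table):
    """Map each 1990 code to the 2000 code holding most of its workers.

    `table` maps a 1990 code to a {2000 code: worker count} dictionary.
    """
    return {occ1990: pick_largest_target(targets)
            for occ1990, targets in table.items()}


crosswalk_1990_to_2000 = build_crosswalk({"217": REDISTRIBUTION_217})
print(crosswalk_1990_to_2000)  # {'217': '154'}
```

Keeping the full redistribution table as input, rather than only the winning code, makes it easy to report what share of each 1990 category the chosen 2000 code actually covers.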

The resulting crosswalk was transformed into a SAS syntax file that can be run on any SAS formatted data set to move 1990 Census codes into 2000 Census codes. This crosswalk accomplishes the first step necessary to match 1990 Census occupations with O*NET job characteristics. This crosswalk can be located at: [put url here for crosswalk on UI website]

STEP 2. Moving from 2000 Census OCC codes to the standard SOC codes

The next step was to take each 2000 Census OCC code and map it to a single 2000 SOC code. Since the 2000 Census codes were modeled after the standard (not detailed) SOC coding system, this crosswalk was fairly straightforward and required few modifications. We started with another crosswalk from the Census Bureau, “Census 2000 Occupational Categories, with Standard Occupation Classification (SOC) Equivalents.”

In a few cases, the 2000 Census code corresponded to more than one SOC code. However, this occurred in only 13 cases out of over 500 Census OCC codes. When this happened, we chose the corresponding SOC category that seemed to have the most workers in it. We did not have population data for this crosswalk, so these choices were judgment calls, but they were necessary only a few times. Continuing the example above, the 2000 Census code 154 “Drafters” corresponded directly to the SOC code 17-3010 “Drafters.” No modification was necessary for this example.

Again, the resulting crosswalk was transformed into a SAS syntax file that can be run on any SAS formatted data set to move 2000 Census OCC codes into standard SOC codes. This crosswalk accomplishes the second step necessary to match 1990 Census occupations with O*NET job characteristics, and can be located at: [put url here for crosswalk on UI website]
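Steps 1 and 2 amount to chaining two lookups. The Python sketch below illustrates the idea with the single drafting example from the text; the dictionary and function names are hypothetical, and the actual crosswalks contain one entry per occupation code.

```python
# One-row illustrations of the two crosswalks; the real tables cover every code.
OCC1990_TO_OCC2000 = {"217": "154"}   # Step 1: Drafting Occupations -> Drafters
OCC2000_TO_SOC = {"154": "17-3010"}   # Step 2: Drafters -> standard SOC "Drafters"


def occ1990_to_soc(occ1990):
    """Return the standard SOC code for a 1990 Census code, or None if unmapped."""
    occ2000 = OCC1990_TO_OCC2000.get(occ1990)
    if occ2000 is None:
        return None
    return OCC2000_TO_SOC.get(occ2000)


print(occ1990_to_soc("217"))  # 17-3010
```

Returning None for an unmapped code, instead of raising an error, lets a researcher count how many respondents fall outside the crosswalk before deciding how to handle them.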

STEP 3. Moving from standard SOC codes to the detailed SOC codes used in the O*NET data base

The last step of the crosswalk process involved moving from standard SOC codes to the detailed (sometimes decimal-level) SOC codes used in the O*NET data base. The O*NET job characteristics are coded in a modified, more detailed form of the SOC coding system. Our crosswalk from 2000 Census codes to SOC codes gave us output like: 154 “Drafters” → 17-3010 “Drafters”. However, the O*NET job characteristics were coded at a far more detailed level, such as:

17-3011 “Architectural and Civil Drafters” (97,800 workers with this job title)

17-3012 “Electrical and Electronics Drafters” (33,720 workers)

17-3013 “Mechanical Drafters” (74,010 workers)

??? out of the ??? standard SOC codes had a more detailed level of occupational classification in the O*NET data base for which a method of simplification was needed. Because the Census-provided crosswalks did not cover this level of detail, we had to come up with a methodology for either choosing one detailed category to match to each standard SOC code, or for combining the information on the detailed categories associated with one standard SOC code to get a single indicator for that SOC code on each of the job characteristics in the O*NET data base we desired. We elaborate on the two methods below, and then provide a detailed empirical example comparing the two methods in models of wage growth using O*NET job characteristics merged to the employed NLSY respondents in each year.

Method 1. First, we simply matched each standard SOC code to whichever of its detailed SOC codes had the most workers in it. In the example, we chose 17-3011 “Architectural and Civil Drafters” because more drafters in the U.S. fall into this category than into any of the other O*NET categories.

Method 2. In our second method, we created weighted averages of the O*NET job characteristic scores based on the population size of each of the detailed occupations that fell into one standard SOC code. O*NET job characteristics are all measured on scales ranging from 0 to 100. For each O*NET characteristic we used, we weighted the scale score for each detailed SOC code within a standard SOC code by the number of workers in that detailed SOC code, and took the weighted average across the set to merge with the one standard SOC code.
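To make the contrast between the two methods concrete, the following Python sketch applies both to the three detailed drafter codes listed above. The worker counts are those given in the text; the O*NET scores (60, 40, and 50) are hypothetical placeholders chosen only for illustration, and the function names are our own for this sketch.

```python
# Worker counts come from the text; the scores are HYPOTHETICAL placeholders
# standing in for one O*NET characteristic measured on the 0-100 scale.
DETAILED_DRAFTERS = {
    "17-3011": {"workers": 97_800, "score": 60.0},  # Architectural and Civil
    "17-3012": {"workers": 33_720, "score": 40.0},  # Electrical and Electronics
    "17-3013": {"workers": 74_010, "score": 50.0},  # Mechanical
}


def largest_category_score(detailed):
    """Method 1: use the score of the detailed code with the most workers."""
    largest = max(detailed.values(), key=lambda d: d["workers"])
    return largest["score"]


def weighted_average_score(detailed):
    """Method 2: average the scores, weighting each by its worker count."""
    total_workers = sum(d["workers"] for d in detailed.values())
    weighted_sum = sum(d["workers"] * d["score"] for d in detailed.values())
    return weighted_sum / total_workers


method1 = largest_category_score(DETAILED_DRAFTERS)
method2 = weighted_average_score(DETAILED_DRAFTERS)
print(method1, round(method2, 2))
```

With these placeholder scores, Method 1 assigns every drafter the value 60, while Method 2 assigns about 53.1, pulled toward the scores of the two smaller categories in proportion to their size.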

In most of the cases for which there were more detailed SOC codes than standard SOC codes, we had accurate population information in 2000 for the detailed SOC code categories. However, some detailed SOC codes used by the O*NET data base not only have detail in the ones place, as discussed above, but also go to decimal-level detail. Example 2 below shows how the detailed SOC code 17-3011 “Architectural and Civil Drafters” was split into 17-3011.01 “Architectural Drafters” and 17-3011.02 “Civil Drafters.” Likewise, 17-3012 “Electrical and Electronics Drafters” was split into 17-3012.01 “Electrical Drafters” and 17-3012.02 “Electronics Drafters” by the O*NET system. Since we had no information about the decimal-level codes other than job title, and in particular no population counts, we were forced to return to our practice of choosing the title that seemed to have the largest number of workers in it to represent the aggregate group. We picked what seemed to be the largest occupation, erased the decimal points, and used it to represent the cluster at the decimal level. In this example, we chose 17-3011.01 and 17-3012.01 because they seemed larger than the other choices. Information about the number of workers in each of these very detailed fields would have made this decision simpler. Of about 700 detailed SOC codes, we had to make this decision on about 100 that had decimal-level differentiation in the O*NET data base.
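The decimal-level collapse described above can be sketched as follows. The choice of representative codes (17-3011.01 and 17-3012.01) is the judgment call reported in the text; the scores attached to the decimal-level codes are hypothetical, as are the function names.

```python
# HYPOTHETICAL scores for one O*NET characteristic at the decimal level.
ONET_SCORES = {
    "17-3011.01": 62.0,  # Architectural Drafters
    "17-3011.02": 55.0,  # Civil Drafters
    "17-3012.01": 41.0,  # Electrical Drafters
    "17-3012.02": 38.0,  # Electronics Drafters
}

# Decimal-level codes judged (from job titles alone) to be the largest.
CHOSEN_REPRESENTATIVES = ["17-3011.01", "17-3012.01"]


def strip_decimal(onet_code):
    """Drop the decimal suffix: '17-3011.01' becomes '17-3011'."""
    return onet_code.split(".", 1)[0]


def collapse_to_detailed(onet_scores, representatives):
    """Give each detailed SOC code the score of its chosen decimal-level code."""
    return {strip_decimal(code): onet_scores[code] for code in representatives}


print(collapse_to_detailed(ONET_SCORES, CHOSEN_REPRESENTATIVES))
# {'17-3011': 62.0, '17-3012': 41.0}
```

Keeping the list of chosen representatives separate from the scores makes the judgment calls explicit and easy to revise if worker counts for the decimal-level codes become available.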

Example 2. Decision Tree for Sample Census 2000 → O*NET Code

Census 2000 code 154 “Drafters” → SOC code 17-3010 “Drafters”


We are aware that other methods of imputation exist. For example, Schaumann (2005) merges the O*NET characteristics to standard SOC codes by simply averaging the O*NET scores for each of the detailed SOC codes nested within a standard SOC code, across all levels of detail (including the decimal level). We believe the weighted average method is superior given the fairly wide range in population across detailed codes within a single standard code.

To ensure population representativeness in the O*NET scores computed, either a weighted average or Method 1 above in which the largest category by size is selected to represent the entire set of detailed codes seems preferable.

The actual calculations to be performed for each O*NET characteristic vary, of course, because the scores for occupations at the detailed level vary across characteristics. However, it is quite easy to download the O*NET data from its website into Excel or another spreadsheet program and add population data on the detailed categories to calculate the weighted average. From that point, the calculated weighted averages matched to each standard SOC code can be entered into a SAS (or other statistical package) syntax file to create variables for each respondent in the NLSY or similar data set. We have created such syntax files for each of the O*NET job description variables listed in Table 1.
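For readers working outside SAS, the final merge might look like the Python sketch below. It is only an illustration under stated assumptions: the person-year records, the variable names, and the score value are hypothetical, and the crosswalk contains only the drafting example from the text.

```python
# HYPOTHETICAL score for one O*NET characteristic, keyed by standard SOC code.
SOC_SCORES = {"17-3010": 53.1}

# Combined 1990 Census -> standard SOC crosswalk (one illustrative row).
OCC1990_TO_SOC = {"217": "17-3010"}


def attach_scores(records, occ_to_soc, soc_scores, score_name):
    """Return copies of person-year records with SOC code and score added.

    Records whose occupation code is missing from the crosswalk get None
    for both new fields, so they can be counted and inspected later.
    """
    merged = []
    for record in records:
        soc = occ_to_soc.get(record["occ1990"])
        new_record = dict(record)
        new_record["soc"] = soc
        new_record[score_name] = soc_scores.get(soc)
        merged.append(new_record)
    return merged


# HYPOTHETICAL person-year records, one row per respondent per wave.
person_years = [
    {"id": 1, "year": 1989, "occ1990": "217"},
    {"id": 1, "year": 1990, "occ1990": "217"},
    {"id": 2, "year": 1989, "occ1990": "999"},  # code not in the crosswalk
]

merged = attach_scores(person_years, OCC1990_TO_SOC, SOC_SCORES, "cust_service")
for row in merged:
    print(row)
```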

Empirical Simulation.

To compare how the two methods of imputation work in empirical analysis, we calculated two sets of O*NET job characteristic scores for each employed respondent in the NLSY from 1989 to 2002. We then estimated fixed-effects regression analyses of the natural log of wages to track wage growth over time among the employed men and women of the NLSY during these years. After estimating our baseline model containing human capital and family status covariates, we added one O*NET characteristic at a time, using each of the two methods of imputation, to see how that O*NET characteristic behaved in analyses of wage growth. The results are presented in Table 2. The estimated coefficients represent main effects of occupational characteristics on individual wage growth after controlling for human capital and family status characteristics, as well as major occupation sector and geographic region. Changes in the size and significance of each coefficient across method 1 and method 2 indicate change resulting from the method used to impute occupational characteristics for those in the standard SOC codes for which more detailed SOC codes exist.
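For readers unfamiliar with fixed-effects estimation, the Python sketch below shows the within-person (demeaning) estimator for a single regressor. It is a deliberately stripped-down illustration and not the specification reported in Table 2, which includes human capital, family status, occupation sector, and region controls. The data are hypothetical, constructed so that log wages rise by exactly 0.01 per point of the O*NET score within each person.

```python
from collections import defaultdict


def within_slope(rows):
    """Fixed-effects (within) slope of y on x for rows of (person_id, x, y).

    Each person's own means are subtracted from x and y, which removes any
    time-constant person effect before the slope is computed.
    """
    by_person = defaultdict(list)
    for person_id, x, y in rows:
        by_person[person_id].append((x, y))

    sum_xy = 0.0
    sum_xx = 0.0
    for observations in by_person.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sum_xy += (x - mean_x) * (y - mean_y)
            sum_xx += (x - mean_x) ** 2

    if sum_xx == 0:
        raise ValueError("x does not vary within any person")
    return sum_xy / sum_xx


# HYPOTHETICAL panel: (person id, O*NET score, log wage).
# Person 1 has intercept 2.0, person 2 has intercept 3.0; both have slope 0.01.
panel = [
    (1, 40.0, 2.4), (1, 50.0, 2.5), (1, 60.0, 2.6),
    (2, 20.0, 3.2), (2, 30.0, 3.3),
]
print(round(within_slope(panel), 4))
```

Because only within-person variation identifies the slope, respondents who never change occupation contribute nothing to the estimated coefficient on an occupational characteristic.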

Overall, the results display few notable differences across methods. Where differences occur, they do not seem on the surface to uniformly favor one method of imputation over the other. In three cases, the weighted average produced larger coefficient estimates (level and importance of customer service, level of managerial skill) while in the other three cases the largest category match produced larger coefficients (importance of instructing others, level and importance of negotiation skills). In only one case (level of managerial skill) did the difference in method create an important change in the level of statistical significance (from not significant to significant at p < .05 one-tailed when using the weighted average method).

These results give us confidence that both the low-cost method of imputation, which uses the largest detailed SOC category to impute values for all respondents in the standard SOC category, and the more complicated weighted average method produce roughly comparable results for these O*NET variables in analyses of wage growth. Although both methods produce some systematic measurement error, such error is greater among respondents in the smaller detailed occupations under method 1 (largest category) than under method 2 (weighted average), where measurement error is shared across small and large occupations within a standard SOC code. Both methods also produce more potential measurement error among respondents in SOC codes that have detailed occupations subsumed within them than among those in the homogeneous standard SOC codes for which a single O*NET score can be easily matched. This suggests that heteroscedasticity based on occupation may be an issue and a future topic of exploration for those using the O*NET data base (we have not yet conducted any tests for heteroscedasticity based on level of occupational detail on the NLSY).