Chapter 1-5. Labeling Variables and Values

Using the do-file, chapter5.do, read in the data and list it.

use ch5data, clear

list

     +-----------------------+
     | id   age   sex   dose |
     |-----------------------|
  1. |  1    23     1      1 |
  2. |  2    23     2      1 |
  3. |  3    59     1      2 |
  4. |  4    15     2      2 |
  5. |  5    30     1      3 |
     +-----------------------+
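If ch5data.dta is not at hand, the short do-file sketch below builds a stand-in from the five observations listed above, using Stata's input command. This is an assumption, not the original file: the values are typed from the listing, and the date and memory figures that describe reports will differ. It saves the result as ch5data.dta in the current working directory so that the use command above will work.

```stata
* Hypothetical stand-in for ch5data.dta, typed from the listing above
clear
input byte id byte age byte sex byte dose
1 23 1 1
2 23 2 1
3 59 1 2
4 15 2 2
5 30 1 3
end

* Check that the values match the listing
list

* Save to the working directory (overwrites any existing ch5data.dta)
save ch5data, replace
```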

At the moment, no labels are assigned. If we “describe” the variables, we get:

describe

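Contains data from ch5data.dta
  obs:             5
 vars:             4                          8 Aug 2005 23:53
 size:            40 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              byte   %8.0g
age             byte   %8.0g
sex             byte   %8.0g
dose            byte   %8.0g
-------------------------------------------------------------------------------
Sorted by: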

Notice that no entries exist in the “value label” and “variable label” columns.

If we open the data in the browser, we see the original data values. Try it, using the Data Browser icon on the toolbar.

If we ask for a frequency table, we will see it labeled using the variable name.

tab dose

       dose |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2       40.00       40.00
          2 |          2       40.00       80.00
          3 |          1       20.00      100.00
------------+-----------------------------------
      Total |          5      100.00

______

Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah School of Medicine, 2010.


Variable Labels

To add a label to a variable we use the syntax

label variable varname ["label"]

Add a label to dose,

label variable dose "morphine dose category"

Now, describe the data

describe

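Contains data from ch5data.dta
  obs:             5
 vars:             4                          8 Aug 2005 23:53
 size:            40 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              byte   %8.0g
age             byte   %8.0g
sex             byte   %8.0g
dose            byte   %8.0g                  morphine dose category
-------------------------------------------------------------------------------
Sorted by: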

We see an entry in the variable label column.

If we open the data in the browser, the column heading shows the variable name rather than the label. This is a good thing: variable labels can be long, so few variables would fit in the browser window if the labels were used as headings. Double-click the variable name, dose, inside the browser to see the variable label that was assigned.
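A do-file cannot double-click, but it can read the same variable label back with the extended macro function “: variable label”. The sketch below runs on a hypothetical stand-in dataset typed in from the listing at the start of the chapter; the local macro name vlab is arbitrary.

```stata
* Stand-in data (hypothetical; same values as ch5data.dta)
clear
input byte id byte age byte sex byte dose
1 23 1 1
2 23 2 1
3 59 1 2
4 15 2 2
5 30 1 3
end

label variable dose "morphine dose category"

* Copy the variable label of dose into the local macro vlab and show it
local vlab : variable label dose
display "`vlab'"
```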

Tabulating the variable

tab dose

   morphine |
       dose |
   category |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2       40.00       40.00
          2 |          2       40.00       80.00
          3 |          1       20.00      100.00
------------+-----------------------------------
      Total |          5      100.00

We now get the variable label on the output, rather than the variable name.


Value Labels

To add labels to values of a variable, we use the syntax

Define value label

label define lblname # "label" [# "label" ...] [, add modify nofix]

Assign value label to variable

label values varname [lblname] [, nofix]

First, we define a value label: a named set of value-to-text mappings that is stored with the dataset (it is not a variable and holds no data). Then we attach that value label to a data variable.

label define sexlab 1 "female" 2 "male"

label values sex sexlab

Now, describe the data

describe

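Contains data from ch5data.dta
  obs:             5
 vars:             4                          8 Aug 2005 23:53
 size:            40 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              byte   %8.0g
age             byte   %8.0g
sex             byte   %8.0g       sexlab
dose            byte   %8.0g                  morphine dose category
-------------------------------------------------------------------------------
Sorted by: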

Notice that the name of the value label, sexlab, is now shown in the “value label” column.

If you want to see what the value labels are, you can use

label list <- list all defined value labels

label list sexlab <- list only the value labels stored in sexlab

You will get

sexlab:
           1 female
           2 male
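label list prints every mapping at once. To fetch the text for a single value, for example inside a loop, the extended macro function “: label” can be used, either with the value label name or with a variable name in parentheses. A sketch on hypothetical stand-in data; the local names f and m are arbitrary.

```stata
* Stand-in data (hypothetical; same values as ch5data.dta)
clear
input byte id byte age byte sex byte dose
1 23 1 1
2 23 2 1
3 59 1 2
4 15 2 2
5 30 1 3
end

label define sexlab 1 "female" 2 "male"
label values sex sexlab

* Text attached to the value 1 in the value label sexlab
local f : label sexlab 1
* Same kind of lookup, through whatever value label is attached to sex
local m : label (sex) 2
display "1 = `f', 2 = `m'"
```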


Because the “label list” command is easy to forget, it is often simpler to ask for a frequency table, which also displays the labels.

tab sex

        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
     female |          3       60.00       60.00
       male |          2       40.00      100.00
------------+-----------------------------------
      Total |          5      100.00

If we open the data in the browser, we see the value labels for sex rather than the actual values. Try it. Notice that the labels are shown in blue text, which is Stata’s way of saying these are labels, rather than values (black = numeric variable, red = string variable, blue = value label of numeric variable).

The same thing happens if we list the data: for sex, we see only the value labels rather than the actual values.

list

     +--------------------------+
     | id   age      sex   dose |
     |--------------------------|
  1. |  1    23   female      1 |
  2. |  2    23     male      1 |
  3. |  3    59   female      2 |
  4. |  4    15     male      2 |
  5. |  5    30   female      3 |
     +--------------------------+

To see the actual values in the browser, enter either of the following in the command window:

browse, nolabel <- browse all variables, without labels

browse sex, nolabel <- browse only sex, without labels

To list the data without value labels, use

list, nolabel

     +-----------------------+
     | id   age   sex   dose |
     |-----------------------|
  1. |  1    23     1      1 |
  2. |  2    23     2      1 |
  3. |  3    59     1      2 |
  4. |  4    15     2      2 |
  5. |  5    30     1      3 |
     +-----------------------+


Tabulating the variable, we see the value labels displayed

tab sex

        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
     female |          3       60.00       60.00
       male |          2       40.00      100.00
------------+-----------------------------------
      Total |          5      100.00

To suppress the value labels, use

tab sex, nolabel

        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          3       60.00       60.00
          2 |          2       40.00      100.00
------------+-----------------------------------
      Total |          5      100.00


Removing Variable Labels

To remove the variable label, use the label variable command without specifying a label.

label variable dose

The label no longer exists, so it appears in neither the describe output nor the frequency table.

describe

tab dose

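. describe

Contains data from ch5data.dta
  obs:             5
 vars:             4                          8 Aug 2005 23:53
 size:            40 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              byte   %8.0g
age             byte   %8.0g
sex             byte   %8.0g       sexlab
dose            byte   %8.0g
-------------------------------------------------------------------------------
Sorted by:

. tab dose

       dose |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2       40.00       40.00
          2 |          2       40.00       80.00
          3 |          1       20.00      100.00
------------+-----------------------------------
      Total |          5      100.00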
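In a do-file you can check that the variable label is really gone without reading the describe output: after “label variable dose” with no label text, the extended macro function “: variable label” returns an empty string. A sketch on hypothetical stand-in data; the local name vlab is arbitrary.

```stata
* Stand-in data (hypothetical; same values as ch5data.dta)
clear
input byte id byte age byte sex byte dose
1 23 1 1
2 23 2 1
3 59 1 2
4 15 2 2
5 30 1 3
end

label variable dose "morphine dose category"

* Remove the label by giving no label text
label variable dose

* assert stops the do-file with an error if a label were still present
local vlab : variable label dose
assert "`vlab'" == ""
display "dose has no variable label"
```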


Removing Value Labels

To remove value labels, use

label drop sexlab

To verify the value labels are gone, use

describe

tab sex

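. describe

Contains data from ch5data.dta
  obs:             5
 vars:             4                          8 Aug 2005 23:53
 size:            40 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              byte   %8.0g
age             byte   %8.0g
sex             byte   %8.0g       sexlab
dose            byte   %8.0g
-------------------------------------------------------------------------------
Sorted by:

. tab sex

        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          3       60.00       60.00
          2 |          2       40.00      100.00
------------+-----------------------------------
      Total |          5      100.00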

The describe output still shows sexlab in the “value label” column, but the value labels themselves no longer exist, which is why tab now shows the actual values. The variable sex is still associated with the label name sexlab, so if you define sexlab again you don’t have to repeat the label values step. To remove sexlab from the “describe” output as well, use:

label values sex

which attaches no value label to the variable sex.
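The difference between dropping a value label and detaching it can be seen with the extended macro function “: value label”, which returns the label name attached to a variable. In the sketch below (hypothetical stand-in data; the local names before and after are arbitrary), sex still points to the name sexlab after label drop, consistent with the describe output above, and the name is removed only by label values with no label name.

```stata
* Stand-in data (hypothetical; same values as ch5data.dta)
clear
input byte id byte age byte sex byte dose
1 23 1 1
2 23 2 1
3 59 1 2
4 15 2 2
5 30 1 3
end

label define sexlab 1 "female" 2 "male"
label values sex sexlab

* Delete the value label itself; sex still refers to the name sexlab
label drop sexlab
local before : value label sex

* Detach the label name from sex
label values sex
local after : value label sex

display "after label drop:   `before'"
display "after label values: `after'"
```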


Some Useful Practices

It frequently takes two tries to get the value labels right. Because label define stops with an error if sexlab already exists, it is helpful to put a “capture label drop” in front of it in your do-file.

capture label drop sexlab

label define sexlab 1 "female" 2 "male"

label values sex sexlab

This lets you re-run that block of commands without first dropping the labels in the Command window, and the do-file still runs without crashing (capture suppresses the error that label drop reports the first time through, when sexlab does not yet exist).
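Recent versions of Stata also accept a replace option on label define, which redefines sexlab whether or not it already exists; check help label if your version does not list it. The sketch below, on hypothetical stand-in data, deliberately runs the definition twice to show that the do-file does not stop.

```stata
* Stand-in data (hypothetical; same values as ch5data.dta)
clear
input byte id byte age byte sex byte dose
1 23 1 1
2 23 2 1
3 59 1 2
4 15 2 2
5 30 1 3
end

* Defining the same label twice does not stop the do-file with replace
label define sexlab 1 "female" 2 "male", replace
label define sexlab 1 "female" 2 "male", replace
label values sex sexlab
tab sex
```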

Often you will want to see both the value itself and its label in the tabulate output. To do this, simply include the value in the label text.

capture label drop sexlab

label define sexlab 1 "1.female" 2 "2.male"

label values sex sexlab

tab sex

        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
   1.female |          3       60.00       60.00
     2.male |          2       40.00      100.00
------------+-----------------------------------
      Total |          5      100.00

With this approach, when you browse your data you see both the label and original value, which is usually what you would want.
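Stata also has a built-in command, numlabel, that prefixes every label in a value label with its numeric value, so the prefixes need not be typed by hand. With its default mask the labels become “1. female” and “2. male”, and numlabel sexlab, remove takes the prefixes off again. A sketch on hypothetical stand-in data:

```stata
* Stand-in data (hypothetical; same values as ch5data.dta)
clear
input byte id byte age byte sex byte dose
1 23 1 1
2 23 2 1
3 59 1 2
4 15 2 2
5 30 1 3
end

label define sexlab 1 "female" 2 "male"
label values sex sexlab

* Prefix each label with its value, using the default mask "#. "
numlabel sexlab, add
tab sex

* numlabel sexlab, remove   // would strip the prefixes again
```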

Chapter 1-5 (revision 16 May 2010) p. 1