Chapter 1-5. Labeling Variables and Values
Using the do-file, chapter5.do, read in the data and list it.
use ch5data, clear
list
     +-----------------------+
     | id   age   sex   dose |
     |-----------------------|
  1. |  1    23     1      1 |
  2. |  2    23     2      1 |
  3. |  3    59     1      2 |
  4. |  4    15     2      2 |
  5. |  5    30     1      3 |
     +-----------------------+
At the moment, no labels are assigned. If we “describe” the variables, we get:
describe
Contains data from ch5data.dta
obs: 5
vars: 4 8 Aug 2005 23:53
size: 40 (99.9% of memory free)
------
storage display value
variable name type format label variable label
------
id byte %8.0g
age byte %8.0g
sex byte %8.0g
dose byte %8.0g
------
Sorted by:
Notice that no entries exist in the “value label” and “variable label” columns.
If we open the Data Browser, we see the original data values. Try it, using the Data Browser icon on the toolbar.
If we ask for a frequency table, we will see it labeled using the variable name.
tab dose
dose | Freq. Percent Cum.
------+------
1 | 2 40.00 40.00
2 | 2 40.00 80.00
3 | 1 20.00 100.00
------+------
Total | 5 100.00
______
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah School of Medicine, 2010.
Variable Labels
To add a label to a variable we use the syntax
label variable varname ["label"]
Add a label to dose,
label variable dose "morphine dose category"
Now, describe the data
describe
Contains data from ch5data.dta
obs: 5
vars: 4 8 Aug 2005 23:53
size: 40 (99.9% of memory free)
------
storage display value
variable name type format label variable label
------
id byte %8.0g
age byte %8.0g
sex byte %8.0g
dose byte %8.0g morphine dose category
------
Sorted by:
We see an entry in the variable label column.
If we open the Data Browser, we see the variable name, rather than the label, at the top of the column. This is helpful, because variable labels can be long, and few variables would fit in the browser window if labels were used as column headings. Double click on the variable name, dose, inside the browser and you will see the variable label that was assigned.
Tabulating the variable
tab dose
morphine |
dose |
category | Freq. Percent Cum.
------+------
1 | 2 40.00 40.00
2 | 2 40.00 80.00
3 | 1 20.00 100.00
------+------
Total | 5 100.00
We now get the variable label on the output, rather than the variable name.
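The remaining variables can be labeled the same way. The sketch below assumes ch5data.dta is in the working directory, as in the rest of this chapter; the label texts for id, age, and sex are hypothetical wording, not part of the course data. It also reads a variable label back into a local macro with the `: variable label` extended macro function, which is one way to reuse a label in later output.

```stata
* assumes ch5data.dta is in the current working directory
use ch5data, clear

* label every variable; the id, age, and sex wording is hypothetical
label variable id   "subject identifier"
label variable age  "age in years"
label variable sex  "sex"
label variable dose "morphine dose category"

* copy the label of dose into a local macro and display it
local doselbl : variable label dose
display "`doselbl'"

describe
```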
Value Labels
To add labels to values of a variable, we use the syntax
Define value label
label define lblname # "label" [# "label" ...] [, add modify nofix]
Assign value label to variable
label values varname [lblname] [, nofix]
First, we define a value label, which is a named set of mappings from numeric values to text (it is saved with the dataset, but it is not a variable and contains no data). Then we attach that value label to a data variable.
label define sexlab 1 "female" 2 "male"
label values sex sexlab
Now, describe the data
describe
Contains data from ch5data.dta
obs: 5
vars: 4 8 Aug 2005 23:53
size: 40 (99.9% of memory free)
------
storage display value
variable name type format label variable label
------
id byte %8.0g
age byte %8.0g
sex byte %8.0g sexlab
dose byte %8.0g morphine dose category
------
Sorted by:
Notice that the value label name, sexlab, is shown in the “value label” column.
If you want to see what the value labels are, you can use
label list <- list all defined value labels
label list sexlab <- list only the value labels stored in sexlab
You will get
sexlab:
1 female
2 male
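Two related commands can help here: label dir lists only the names of the value labels in memory, and the `: label` extended macro function looks up the text for a single value. A minimal sketch, assuming ch5data.dta is in the working directory:

```stata
use ch5data, clear
label define sexlab 1 "female" 2 "male"
label values sex sexlab

label dir            // names of all value labels in memory
label list sexlab    // every value and its text in sexlab

* look up the text attached to value 1 in sexlab
local lbl1 : label sexlab 1
display "`lbl1'"     // displays: female
```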
Since the “label list” command is easy to forget, it is often simpler to ask for a frequency table to see the labels.
tab sex
sex | Freq. Percent Cum.
------+------
female | 3 60.00 60.00
male | 2 40.00 100.00
------+------
Total | 5 100.00
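If you want the counts, the numeric codes, and the labels side by side, codebook is another option: for a labeled variable with few distinct values, its tabulation lists each numeric value next to its label. A short sketch, assuming ch5data.dta is in the working directory:

```stata
use ch5data, clear
label define sexlab 1 "female" 2 "male"
label values sex sexlab

* tabulation with Freq., Numeric, and Label columns
codebook sex
```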
If we open the Data Browser, we see the value labels for sex, rather than the actual values. Try it. Notice that the labels are shown in blue text, which is Stata’s way of saying these are labels, rather than values (black = numeric variable, red = string variable, blue = value label of a numeric variable).
The same thing happens if we list the data. We see the value labels, rather than the actual values, for sex.
list
     +--------------------------+
     | id   age      sex   dose |
     |--------------------------|
  1. |  1    23   female      1 |
  2. |  2    23     male      1 |
  3. |  3    59   female      2 |
  4. |  4    15     male      2 |
  5. |  5    30   female      3 |
     +--------------------------+
To see the actual values in the browser, enter either of the following in the command window:
browse, nolabel <- browse all variables, without labels
browse sex, nolabel <- browse only sex, without labels
To list the data without value labels, use
list, nolabel
     +-----------------------+
     | id   age   sex   dose |
     |-----------------------|
  1. |  1    23     1      1 |
  2. |  2    23     2      1 |
  3. |  3    59     1      2 |
  4. |  4    15     2      2 |
  5. |  5    30     1      3 |
     +-----------------------+
Tabulating the variable, we see the value labels displayed
tab sex
sex | Freq. Percent Cum.
------+------
female | 3 60.00 60.00
male | 2 40.00 100.00
------+------
Total | 5 100.00
To suppress the value labels, use
tab sex, nolabel
sex | Freq. Percent Cum.
------+------
1 | 3 60.00 60.00
2 | 2 40.00 100.00
------+------
Total | 5 100.00
Removing Variable Labels
To remove the variable label, use the label variable command without specifying a label.
label variable dose
The label no longer exists, so it does not appear in the describe output or in the frequency table.
describe
tab dose
. describe
Contains data from ch5data.dta
obs: 5
vars: 4 8 Aug 2005 23:53
size: 40 (99.9% of memory free)
------
storage display value
variable name type format label variable label
------
id byte %8.0g
age byte %8.0g
sex byte %8.0g sexlab
dose byte %8.0g
------
Sorted by:
. tab dose
dose | Freq. Percent Cum.
------+------
1 | 2 40.00 40.00
2 | 2 40.00 80.00
3 | 1 20.00 100.00
------+------
Total | 5 100.00
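When several variables carry labels, the same command can be applied in a loop. A sketch, assuming ch5data.dta is in the working directory; the label for age is hypothetical wording:

```stata
use ch5data, clear
label variable age  "age in years"            // hypothetical label
label variable dose "morphine dose category"

* label variable with no text removes each variable's label
foreach v of varlist age dose {
    label variable `v'
}
describe
```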
Removing Value Labels
To remove value labels, use
label drop sexlab
To verify the value labels are gone, use
describe
tab sex
. describe
Contains data from ch5data.dta
obs: 5
vars: 4 8 Aug 2005 23:53
size: 40 (99.9% of memory free)
------
storage display value
variable name type format label variable label
------
id byte %8.0g
age byte %8.0g
sex byte %8.0g sexlab
dose byte %8.0g
------
Sorted by:
. tab sex
sex | Freq. Percent Cum.
------+------
1 | 3 60.00 60.00
2 | 2 40.00 100.00
------+------
Total | 5 100.00
The describe output still shows sexlab in the value label column, but the value label sexlab no longer exists, so the values are displayed without labels. The variable sex remains associated with the name sexlab, so if you define sexlab again, it is attached to sex automatically, without repeating the label values step. To remove the name from the describe output, you can use:
label values sex
which attaches no value label to the variable sex.
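Putting the two steps together, a value label can be removed completely by detaching it first and then deleting it. The sketch below, assuming ch5data.dta is in the working directory, uses label drop _all, which deletes every value label in memory (here only sexlab); label drop sexlab would delete just that one.

```stata
use ch5data, clear
label define sexlab 1 "female" 2 "male"
label values sex sexlab

* step 1: detach the value label, so describe no longer lists sexlab
label values sex
* step 2: delete every value label in memory (here, only sexlab)
label drop _all

describe
tab sex            // shows the raw values 1 and 2
```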
Some Useful Practices
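Frequently, it takes two tries to get the value labels right. Because label define stops with an error when the value label already exists, re-running the do-file would fail, so it is helpful to put a “capture label drop” before the definition. The capture prefix suppresses the error that label drop gives when sexlab does not exist yet, such as on the first run.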
capture label drop sexlab
label define sexlab 1 "female" 2 "male"
label values sex sexlab
This allows you to re-run that block of commands, without having to drop the labels in the command window, and still have a do-file that runs without crashing.
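As an alternative to dropping and redefining, the add and modify options of label define change an existing value label in place. A sketch, assuming ch5data.dta is in the working directory; the misspelled first attempt and the code 9 for "unknown" are hypothetical (no observation in ch5data has sex equal to 9):

```stata
use ch5data, clear
label define sexlab 1 "female" 2 "mael"    // first try, with a typo
label values sex sexlab

* modify replaces the text for a value that is already defined
label define sexlab 2 "male", modify
* add appends a new value; 9 = "unknown" is a hypothetical code
label define sexlab 9 "unknown", add

label list sexlab
```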
Often you will want to see both the value itself and its label in the tabulate output. To do this, include the value in the label text.
capture label drop sexlab
label define sexlab 1 "1.female" 2 "2.male"
label values sex sexlab
tab sex
sex | Freq. Percent Cum.
------+------
1.female | 3 60.00 60.00
2.male | 2 40.00 100.00
------+------
Total | 5 100.00
With this approach, when you browse your data you see both the label and original value, which is usually what you would want.
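Stata's numlabel command can add the numeric prefix to existing labels, and strip it again, without retyping label define. A sketch, assuming ch5data.dta is in the working directory; mask("#.") is used here to match the 1.female style shown above, and the same mask is given when removing the prefix.

```stata
use ch5data, clear
label define sexlab 1 "female" 2 "male"
label values sex sexlab

* prefix each label in sexlab with its value, giving 1.female and 2.male
numlabel sexlab, add mask("#.")
tab sex

* strip the numeric prefixes again
numlabel sexlab, remove mask("#.")
tab sex
```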
Chapter 1-5 (revision 16 May 2010) p. 1