Stata introduction, Exercises

Hein Stigum

Data files for the course are found on the net. This Stata command will set the web address:

webuse set "http://www.med.uio.no/forskning/doktorgrad-karriere/forskerutdanning/kurs/biostatistikk/mf9510-logistisk-regresjon-overlevelsesanalyse-cox/"

This Stata command will open the data file:

webuse "birth1.dta", clear

  1. Start Stata and open the data file “birth1.dta”. Run the command “describe” from the command window. Use the menu to do a summary of birth weight (weight) (choose the first item in each submenu under Statistics).
  2. Open a new do-file. Copy the commands you have used so far to the do-file. Type the command “tab sex”, then mark and run it from the do-file.
  3. Find the mean of mother’s age (Hint: mean mage). Advanced question: Suppose you do not trust a normality assumption for your data. How can you estimate the standard error (“se”) of the mean? (Hint: run “help mean” and look for vce and bootstrap; vce = variance-covariance estimation.)
  4. Do a summary of mother’s age (mage). Do a summary of mother’s age with extra details. Do a summary of mother’s age for each sex.
  5. Summarize mother’s age if gestational age is greater than 260 (if gest>260). How many subjects are included? Rerun the command excluding missing values of gestational age (summarize mage if gest>260 & gest<.). How many subjects now? Why is there a difference in the number of subjects? (Hint: missing).
  6. Make a plot of the birth weight distribution (Hint: kdensity).
  7. The variable “magegr2” contains mother’s age in two groups. Do “tab magegr2” and “tab magegr2, nolab” to find the groups and the coding. Another way to find the coding is to list all labels with “label list”; try it. Make a plot of the birth weight distribution for each of the two groups of mother’s age.
  8. Test if the mean birth weights are different in the two groups of mother’s age. (Hint: ttest).
  9. Make a scatter plot of birth weight versus mother’s age (birth weight on the y-axis). Does birth weight increase with mother’s age? Redo the plot with a linear fit line (Hint: (lfit y x)). Is the relation really linear? Redo the plot with a fractional polynomial fit line with confidence intervals (Hint: add (fpfitci y x) as the second of the 3 plot parts). Redo the plot with a title, an xtitle and a ytitle.
  10. Make a table of selected statistics (Number of cases, some percentiles and min and max) of birth weight for each of the two groups of mother’s age. (Hint: tabstat)
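
One possible do-file for the exercises above is sketched here. It is only a sketch, not the official solution. It assumes the variable names given in the exercise text (weight, sex, mage, gest, magegr2) and that the course web address is still reachable. The number of bootstrap replications and the plot titles are arbitrary illustrative choices.

```stata
* Sketch of a do-file for exercises 1-10 (assumes the variables named in the text)
webuse set "http://www.med.uio.no/forskning/doktorgrad-karriere/forskerutdanning/kurs/biostatistikk/mf9510-logistisk-regresjon-overlevelsesanalyse-cox/"
webuse "birth1.dta", clear

* 1-2: describe the data, summarize birth weight, tabulate sex
describe
summarize weight
tab sex

* 3: mean of mother's age; bootstrap standard error (reps(200) is an arbitrary choice)
mean mage
mean mage, vce(bootstrap, reps(200))

* 4: summaries of mother's age: plain, detailed, and by sex
summarize mage
summarize mage, detail
bysort sex: summarize mage

* 5: missing values count as larger than any number, so gest>260 includes them
summarize mage if gest>260
summarize mage if gest>260 & gest<.

* 6-7: birth weight distribution, overall and by age group
kdensity weight
tab magegr2
tab magegr2, nolab
label list
twoway kdensity weight, by(magegr2)

* 8: compare mean birth weight between the two age groups
ttest weight, by(magegr2)

* 9: scatter plot with fractional polynomial fit (with CI) and linear fit
twoway (scatter weight mage) (fpfitci weight mage) (lfit weight mage), ///
    title("Birth weight by mother's age") ///
    xtitle("Mother's age") ytitle("Birth weight")

* 10: selected statistics of birth weight by age group
tabstat weight, by(magegr2) statistics(n p25 p50 p75 min max)
```

Using by(magegr2) in the plot, test and table avoids having to know how the two groups are coded.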

Last 30 minutes: Plenary discussion of the exercises