Plotting Assignment due January 31

Find a data set (the internet is full of them) that contains measurements of two related variables. (Note: The probability that two students use the same data set should be negligibly small.) Use R to display the data in a scatter plot, histograms, and box plots (at least one of each). The axes of your plots should be appropriately labeled and each plot should have an appropriate title.

You may need to edit the data before R will handle it properly. The elements in each row should be separated by spaces, and the first row should contain the names of the variables. Numbers should not include commas (use search and replace to remove them if you need to). Make certain that each row contains the same number of elements. For example, in the data set below (which I found on the Census Bureau web site, http://www.census.gov/population/cen2000/phc-t2/tab01.txt), I had to delete explanatory comments and certain formatting, the commas in the numbers, and the spaces in New York, New Jersey, etc. (otherwise R thinks New and York are separate elements). I also added the row of variable names.
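The formatting checks described above can be sketched in R roughly as follows. This is only an illustration: the temporary file stands in for your own edited text file, and the names path, fields, and pop are made up for the example.

```r
# Write a few lines of cleaned data to a temporary file.
# (In practice this would be the text file you edited by hand.)
path <- tempfile(fileext = ".txt")
writeLines(c(
  "rank state Y2000 Y1990 change percent",
  "1 California 33871648 29760021 4111627 13.8",
  "3 NewYork 18976457 17990455 986002 5.5"
), path)

# Count the whitespace-separated elements on each line.
# Every line, including the row of names, should give the same count.
fields <- count.fields(path)
print(fields)

# header = TRUE tells R that the first row holds the variable names.
pop <- read.table(path, header = TRUE)
str(pop)
```

If count.fields reports different numbers for different lines, a leftover space (as in New York) or a missing value is the likely cause.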

The plots should be moved from R into a Word document (see http://www.math.wisc.edu/~kurtz/312/graph_word.doc for a brief description of how to do this), and you should write a short paragraph giving the source and significance of the data. The Word document should be e-mailed as an attachment to both Lancine and me.

rank state Y2000 Y1990 change percent
1 California 33871648 29760021 4111627 13.8
2 Texas 20851820 16986510 3865310 22.8
3 NewYork 18976457 17990455 986002 5.5
4 Florida 15982378 12937926 3044452 23.5
5 Illinois 12419293 11430602 988691 8.6
6 Pennsylvania 12281054 11881643 399411 3.4
7 Ohio 11353140 10847115 506025 4.7
8 Michigan 9938444 9295297 643147 6.9
9 NewJersey 8414350 7730188 684162 8.9
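
As a rough sketch of the kind of plots the assignment asks for, the data above could be read and displayed like this. The choice of variables, the conversion to millions, and the titles and labels are just one possibility, and the names pop, p1990, and p2000 are made up for the example. The data are pasted in through the text argument of read.table so that the example runs on its own; with your own data you would give a file name instead.

```r
# Read the data set shown above; header = TRUE uses the first row as names.
pop <- read.table(header = TRUE, text = "rank state Y2000 Y1990 change percent
1 California 33871648 29760021 4111627 13.8
2 Texas 20851820 16986510 3865310 22.8
3 NewYork 18976457 17990455 986002 5.5
4 Florida 15982378 12937926 3044452 23.5
5 Illinois 12419293 11430602 988691 8.6
6 Pennsylvania 12281054 11881643 399411 3.4
7 Ohio 11353140 10847115 506025 4.7
8 Michigan 9938444 9295297 643147 6.9
9 NewJersey 8414350 7730188 684162 8.9")

# Populations in millions make the axis labels easier to read.
p1990 <- pop$Y1990 / 1e6
p2000 <- pop$Y2000 / 1e6

# Scatter plot of the two related variables.
plot(p1990, p2000,
     xlab = "1990 population (millions)",
     ylab = "2000 population (millions)",
     main = "State population, 2000 vs. 1990")

# Histogram of the percent change.
hist(pop$percent,
     xlab = "Percent change, 1990 to 2000",
     main = "Population growth of the nine largest states")

# Side-by-side box plots of the two census years.
boxplot(p1990, p2000,
        names = c("1990", "2000"),
        ylab = "Population (millions)",
        main = "State populations in 1990 and 2000")
```

The xlab, ylab, and main arguments supply the axis labels and title that each plot is required to have.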