
Bio 286: Statistics midterm review. You will probably not be able to finish all of this in class; I encourage you to finish the rest on your own.

I)  Data Visualization

a)  Box plots


Describe the box plot above: label the regions and indicate any problems associated with the distribution (for parametric statistics). What sort of transformation might help?

b)  Histograms – what is one problem associated with using histograms to assess the shape of a distribution?

c)  Compare summary plots to scatterplots – when should you use each type of chart?

II)  Statistical analyses – for the following problems, indicate the sort of analysis you would do and why you chose that approach.

a)  Many students believe that all-night study sessions improve their performance on exams. To test this, an experiment was done that assessed performance on standardized tests. Each student was tested two times: once following a full night's sleep and once following an all-night study session. The order of the tests was randomly chosen. In all, 63 students participated. What sort of test is appropriate? What are the degrees of freedom for the test?
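Because each student is measured twice, one candidate analysis is a paired t-test on the within-student differences. The sketch below is only an illustration under that assumption: the function name `paired_t` and the five pairs of scores are made up, not data from the study. It shows where the degrees of freedom come from (number of pairs minus one).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equal-length lists."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical scores for five students (NOT data from the study).
rested = [82, 75, 90, 68, 77]
all_night = [78, 74, 85, 70, 72]

t_stat, df = paired_t(rested, all_night)
print(f"t = {t_stat:.3f} on {df} df")

# With 63 students there are 63 differences, so df = 63 - 1.
df_study = 63 - 1
print("df for the 63-student design:", df_study)
```

The t value would then be compared with the critical value for the chosen alpha and the stated degrees of freedom.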

b)  We are interested in examining the relationship between percent cover of Salicornia (a salt-tolerant halophyte) and salt concentration in Elkhorn Slough. We sample 27 random plots and measure salt concentration and percent cover of Salicornia in each. What sort of analysis would we use? Why? What are the assumptions of the analysis?

III) Experimental design and hypothesis testing

a)  What sort of information does a confidence interval provide?

b)  What is the relationship between type 1 and type 2 error? As part of your answer, note what the two types of error are.

c)  What is a replicate?

d)  What is the null hypothesis in a two-sample t-test? Draw the null distribution.

e)  How is r² calculated in a regression analysis?

1)  What does this statistic measure?
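As a concrete reference for this question, here is a minimal sketch of one standard way to compute the coefficient of determination: fit the least-squares line, then take one minus the ratio of residual to total sum of squares. The function name `r_squared` and the five data points are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Total variation in y, and the part left over after fitting the line.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(f"r^2 = {r2:.2f}")
```

For these points the total sum of squares is 6 and the residual sum of squares is 2.4, so the line accounts for 60% of the variation in y.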

f)  Why can an alternative (general) hypothesis never be proven in the hypothetico-deductive method of doing science?

IV) Testing of assumptions

a)  Examine the following figures, which show the residuals produced by the regression of y vs x. For each, draw a scatterplot that could have produced the residual plot, indicate whether there is a violation of regression assumptions (state the violation), and propose a solution to the violation.

b)  Consider the following probability plot. What sort of transformation might be useful? Explain your reasoning and indicate why such a transformation helps.

c)  Use the following probability plot to explain how a probability plot helps you determine the nature of a departure from an expected distribution (here the expected distribution is the normal distribution).

d)  Assume that you are interested in comparing the height of boys to the height of girls in the 12th grade. You go to Santa Cruz, Harbor, and Aptos high schools to conduct the measurements. As it turns out, you have to take measurements at basketball games, and you are only able to measure cheerleaders (all girls) and the players on the men’s basketball team. What sort of errors are you making? (In other words, what assumption or assumptions are you violating?)

V)  Completing and interpreting statistical analyses. Use 0.05 as your critical p-value. In all cases report the results of your analyses in a way that would convince a reviewer that you have:

1)  assessed the assumptions of the analysis (if any)

2)  tested the posed hypothesis or hypotheses

3)  presented the results in a way that conveys the patterns and statistics in an efficient manner

4)  interpreted the results correctly

a)  Seedlings – We have proposed a hypothesis that grazing negatively affects the establishment of oak seedlings. We test this by setting up a very ambitious experiment in a field subject to heavy grazing. 200 plots are allocated randomly and half of them are fenced to keep out grazers. After one year we count the number of seedlings in each plot. The alternative hypothesis is that the number of seedlings will be lower in unfenced plots. (use seedlings.jmp)

b)  Limpets – There is great concern that access to intertidal areas has a negative effect on ecological communities. There is added concern when access is made easy. Ironically, the most accessible areas are often State Parks, which are supposed to provide protection for resources. Prior to the establishment of a State Park, a survey was done of the density of limpets per square meter. The details have been lost, but we know that the average density was 27.9. We do a new study (limpets.jmp) and want to compare our results to the earlier ones. Make sure that you state the null or alternative hypothesis.
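Since only the earlier mean (27.9) survives, one plausible approach is a one-sample t-test of the new densities against that fixed value. The sketch below assumes that approach; the function name `one_sample_t` and the six densities are invented for illustration and are not taken from limpets.jmp.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu0):
    """Return (t, df) for a one-sample t-test against the fixed value mu0."""
    n = len(values)
    se = stdev(values) / sqrt(n)  # standard error of the sample mean
    return (mean(values) - mu0) / se, n - 1

# Hypothetical limpet densities per square meter (NOT from limpets.jmp).
new_density = [25, 30, 22, 28, 24, 27]

t_limpet, df_limpet = one_sample_t(new_density, mu0=27.9)
print(f"t = {t_limpet:.3f} on {df_limpet} df")
```

Whether the comparison is one-tailed or two-tailed depends on the hypothesis you state, so decide that before looking up the critical value.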

c)  Limpets and bare rock (use limpets) – As part of the earlier study, the author reported that there seemed to be a relationship between limpet density and the percent cover of bare rock in the plots. Again the raw data are missing, but the author did report that the relationship was explained by the linear equation
cover of bare rock = 0 + 2 × (limpet density)
I want you to test to see if the overall (alternative) hypothesis is correct (with increasing limpets there is increasing cover of bare rock). I also want you to determine if the linear equation is different from that reported.
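The two parts of this question can both be framed as t-tests on the fitted slope: once against 0 (is there a positive relationship?) and once against the reported value of 2. The sketch below assumes a simple least-squares regression; the function name `slope_t` and the data are hypothetical, not values from the limpets file. The same logic applies to testing the intercept against the reported 0.

```python
from math import sqrt
from statistics import mean

def slope_t(x, y, slope0):
    """Return (slope, se_slope, t, df) testing the fitted slope against slope0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = ss_resid / (n - 2)      # residual mean square
    se_slope = sqrt(mse / sxx)    # standard error of the slope
    return slope, se_slope, (slope - slope0) / se_slope, n - 2

# Made-up densities and percent cover (NOT from the limpets data).
density = [1, 2, 3, 4, 5]
cover = [2, 4, 5, 4, 5]

b, se_b, t_vs_zero, df_reg = slope_t(density, cover, slope0=0)
_, _, t_vs_two, _ = slope_t(density, cover, slope0=2)
print(f"slope = {b:.2f}, SE = {se_b:.3f}, df = {df_reg}")
print(f"t against 0: {t_vs_zero:.3f}; t against 2: {t_vs_two:.3f}")
```

An equivalent check for the second part is to see whether the 95% confidence interval for the slope contains 2.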

VI) Power

a)  Fishers are unwilling to accept marine reserves unless they are shown that reserves yield more or larger fish than fished areas. As a test, we sample Hopkins Marine Reserve and McAbee Beach (an area that is fished and that is comparable in other ways to Hopkins). In each location we sample 10 transects and count the number of adult kelp rockfish (KRF). The mean density of KRF per transect at Hopkins is 45 and that at McAbee is 36. The pooled standard deviation is 15. Fish and Game wants to have a power of 80% in the evaluation and a critical alpha = 0.05. Is the design adequate? If so, explain. If not, how might it be changed to make it adequate?
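A rough check of this design can be done by hand or with a few lines of code. The sketch below uses a normal approximation to the two-sample t-test and assumes a one-tailed alternative (reserve greater than fished area); both are assumptions, and an exact t-based power calculator will give slightly lower power and a slightly larger required sample size. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate one-tailed power for a two-sample comparison of means."""
    se = sd * sqrt(2.0 / n_per_group)  # SE of the difference in means
    return Z.cdf(diff / se - Z.inv_cdf(1.0 - alpha))

def approx_n_per_group(diff, sd, power=0.80, alpha=0.05):
    """Approximate transects per site needed to reach the target power."""
    d = diff / sd  # standardized effect size
    return ceil(2.0 * (Z.inv_cdf(1.0 - alpha) + Z.inv_cdf(power)) ** 2 / d ** 2)

power_now = approx_power(diff=45 - 36, sd=15, n_per_group=10)
n_needed = approx_n_per_group(diff=45 - 36, sd=15)
print(f"approximate power with 10 transects per site: {power_now:.2f}")
print(f"approximate transects per site for 80% power: {n_needed}")
```

Under these assumptions the power with 10 transects per site is roughly 0.38, well short of 80%, and about 35 transects per site would be needed. A two-tailed test would require more.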