1.  Regression-type Analysis

a and b. Regress lnterrestrial on lncavenum. The fitted model is

lnterrestrial = -0.35945 + 0.40308 × lncavenum

Intercept: SE = 0.1886, p = 0.0579
Slope: SE = 0.04285, p < 0.001

c.  The MSE = 0.53463 and R² = 0.2897. There is a statistically significant relationship (the test of a non-zero slope has a p-value < 0.0001), but it is not strong: lncavenum explains only about 29% of the variation in lnterrestrial.

d.  The assumptions of the model are that the error terms are independent, homoscedastic (constant variance), and normally distributed. We can’t check independence because no information was given about how the data were collected. The plot of the residuals against the predicted values (see below) does not show any strong evidence of unequal variance. The lines of points in the residual plot arise because lnterrestrial can take only a limited set of values when lncavenum is near zero (see the lines in the above graph).

To check for normality we must use the residuals. It is irrelevant whether the explanatory variable is normally distributed, since it is considered fixed and known (not random). The Y variable is assumed to be normally distributed for each value of X, with a mean that changes with X, so it makes no sense to pool the Y-values over all values of X. Instead we look at the residuals again. A test of normality (Shapiro-Wilk W = 0.989107, p = 0.0959) of the residuals shows no evidence that the error terms are not normally distributed. In addition, the normal Q-Q plot is linear and the stem-and-leaf plot is unimodal and reasonably symmetric.

Stem Leaf # Boxplot

14 4026 4 |

12 129004 6 |

10 024558489 9 |

8 012357124488 12 |

6 13457891367788889 17 |

4 33366667890113334589 20 +-----+

2 0011123445667011223345556779 28 | |

0 0233567780025777799 19 *--+--*

-0 9963330998876542 16 | |

-2 8776631999988632000 19 | |

-4 8663333088776422 16 +-----+

-6 975431109774432110 18 |

-8 855442209777775511 18 |

-10 832163 6 |

-12 9491 4 |

-14 7307 4 |

-16 910 3 |

----+----+----+----+----+---

Multiply Stem.Leaf by 10**-1

Normal Probability Plot

[Figure: normal probability plot of the residuals; vertical axis from -1.7 to 1.5, horizontal axis (normal quantiles) from -2 to +2]
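The figures reported in parts (a) to (c) can be cross-checked against one another. The sketch below (Python, standard library only) recomputes the t statistics as estimate/SE and recovers R² from the slope's t statistic using R² = t² / (t² + error df). The sample size is not stated in the text; it is taken here as the sum of the counts in the stem-and-leaf display above, so treat n as an assumption.

```python
# Cross-check of the simple regression output (values copied from the text).
intercept, se_intercept = -0.35945, 0.1886
slope, se_slope = 0.40308, 0.04285

# Assumption: n is the total of the stem-and-leaf counts of the residuals.
counts = [4, 6, 9, 12, 17, 20, 28, 19, 16, 19, 16, 18, 18, 6, 4, 4, 3]
n = sum(counts)
df_error = n - 2  # two estimated coefficients

t_intercept = intercept / se_intercept
t_slope = slope / se_slope

# In simple linear regression, F = t_slope**2 and R^2 = F / (F + df_error).
r_squared = t_slope**2 / (t_slope**2 + df_error)

print(f"n = {n}, error df = {df_error}")
print(f"t (intercept) = {t_intercept:.2f}, t (slope) = {t_slope:.2f}")
print(f"R^2 recovered from t = {r_squared:.4f}")
```

The recovered R² agrees with the reported 0.2897, which suggests the SEs, the R² and the implied sample size are mutually consistent.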

2.  Regression-type Analysis with Region

a.  The full ANCOVA model, with the categorical variable region and the continuous variable lncavenum, fits a separate intercept and a separate slope for each region:

lnterrestrial = intercept(region) + slope(region) × lncavenum + error

b.  The test of the null hypothesis that the slopes are parallel is the test of whether there is any interaction between lncavenum and region. The test statistic is F = 3.97 on 3 and 211 degrees of freedom, with a p-value of 0.0089. Hence we reject the null hypothesis and conclude that at least one slope is not parallel to the others. A graph of the predicted lines is below.
c.  The final model is the one given in part (a), which has an MSE of 0.3967. The SEs of the coefficients are listed in the following table:

Solution for Fixed Effects

Effect       Region   Estimate   Std Error    DF   t Value   Pr > |t|
intercept    2        -1.0114    0.3351      211     -3.02     0.0029
intercept    3        -0.1911    0.2225      211     -0.86     0.3913
intercept    4        -0.02083   0.3913      211     -0.05     0.9576
intercept    5        -0.1323    0.7257      211     -0.18     0.8556
lncavenum    2         0.5752    0.07805     211      7.37     <.0001
lncavenum    3         0.3932    0.04963     211      7.92     <.0001
lncavenum    4         0.1762    0.08901     211      1.98     0.0491
lncavenum    5         0.5195    0.1652      211      3.14     0.0019
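As a rough check on the table, and to see which slopes differ most, the following sketch recomputes each t value as estimate/SE and then compares two slopes. It assumes the slope estimates for different regions are uncorrelated, which holds when each region has its own intercept and slope and the error variance is common; the names `fits` and `slope_contrast` are made up for illustration.

```python
import math

# (intercept, SE, slope, SE) per region, copied from the table above.
fits = {
    2: (-1.0114, 0.3351, 0.5752, 0.07805),
    3: (-0.1911, 0.2225, 0.3932, 0.04963),
    4: (-0.02083, 0.3913, 0.1762, 0.08901),
    5: (-0.1323, 0.7257, 0.5195, 0.1652),
}

for region, (a, se_a, b, se_b) in fits.items():
    print(f"region {region}: t(intercept) = {a / se_a:6.2f}, "
          f"t(slope) = {b / se_b:5.2f}")


def slope_contrast(r1, r2):
    """Difference between two region slopes, its SE, and their ratio.

    Assumes the two slope estimates are independent (see the lead-in).
    """
    _, _, b1, se1 = fits[r1]
    _, _, b2, se2 = fits[r2]
    diff = b1 - b2
    se_diff = math.sqrt(se1**2 + se2**2)
    return diff, se_diff, diff / se_diff


diff, se_diff, t_ratio = slope_contrast(2, 4)
print(f"slope(2) - slope(4) = {diff:.3f} (SE {se_diff:.3f}), ratio = {t_ratio:.2f}")
```

The steepest slope (region 2) and the flattest (region 4) differ by about 0.40, more than three standard errors, which is in line with the rejection of parallel slopes in part (b).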

d.  The assumptions are the same as for a regression model: the error terms are independent, homoscedastic (constant variance), and normally distributed. We can’t check independence because no information was given about how the data were collected. The plot of the residuals against the predicted values (see below) does not show any strong evidence of unequal variance.

To check for normality we will use the residuals. A test of normality (Shapiro-Wilk W = 0.98798, p = 0.0627) of the residuals shows no evidence that the error terms are not normally distributed. In addition, the normal Q-Q plot is linear and the stem-and-leaf plot is unimodal and reasonably symmetric.

Stem Leaf # Boxplot

14 1 1 |

12 48 2 |

10 27707 5 |

8 255778569 9 |

6 02334455666790255899 20 |

4 0000234446711223356778889 25 +-----+

2 0011123445557811111344556788889 31 | |

0 01114556777999012335557889 26 *--+--*

-0 9966433332210987755422 22 | |

-2 77755431109866543210 20 | |

-4 88644111097432110 17 +-----+

-6 8853219875321 13 |

-8 99321974442220 14 |

-10 074110 6 |

-12 4470 4 |

-14 02 2 |

-16 55 2 |

----+----+----+----+----+----+-

Multiply Stem.Leaf by 10**-1

Normal Probability Plot

[Figure: normal probability plot of the residuals; vertical axis from -1.7 to 1.5, horizontal axis (normal quantiles) from -2 to +2]

3.  ANOVA-type Analysis I

a.  The results of fitting an ANOVA model with the categorical variable region are:


Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
region   3        215      14.65     <.0001

Least Squares Means

Effect   Region   Estimate   Std Error    DF   t Value   Pr > |t|
region   2        1.3931     0.09631     215     14.47     <.0001
region   3        1.4917     0.08325     215     17.92     <.0001
region   4        0.7306     0.1197      215      6.10     <.0001
region   5        2.0951     0.1985      215     10.55     <.0001

As can be seen, region has a significant effect on lnterrestrial (F = 14.65 on 3 and 215 df, p < 0.0001); the estimated mean for each region and the associated standard error are shown in the second table.

b.  The SEs of the estimated means are shown in the table above. Clearly, there is an effect due to region. The MSE for this model is 0.6307, which is higher than the MSE for the model with both explanatory variables (0.3967) or the model with only lncavenum (0.53463). The R² is quite low at 0.17.
c.  The table of means and SEMs is given in (a). Pairwise comparisons using Bonferroni’s adjustment for multiple comparisons indicate that the means of Regions 2 and 3 do not differ, but all other pairs do.

Differences of Least Squares Means

Effect   region   _region   Adj P
region   2        3         1.0000
region   2        4         0.0001
region   2        5         0.0101
region   3        4         <.0001
region   3        5         0.0332
region   4        5         <.0001
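A sketch of how adjusted p-values of this kind arise. In a one-way layout each mean has SE = sqrt(MSE / n_i), so the SE of a difference between two means is sqrt(SE_i² + SE_j²); the Bonferroni adjustment multiplies each raw p-value by the number of comparisons (6 here) and caps the result at 1. To stay within the Python standard library, the raw p-values below use a normal approximation to the t distribution with 215 df. That approximation is an assumption of this sketch, so the results are close to, but not identical with, the table.

```python
import math
from itertools import combinations

# Least squares means and SEs by region, copied from the ANOVA output.
means = {2: (1.3931, 0.09631), 3: (1.4917, 0.08325),
         4: (0.7306, 0.1197), 5: (2.0951, 0.1985)}

pairs = list(combinations(sorted(means), 2))
m = len(pairs)  # number of pairwise comparisons


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal (approximates t, 215 df)."""
    return math.erfc(abs(t) / math.sqrt(2))


adjusted = {}
for i, j in pairs:
    (mean_i, se_i), (mean_j, se_j) = means[i], means[j]
    t = (mean_i - mean_j) / math.sqrt(se_i**2 + se_j**2)
    adjusted[(i, j)] = min(1.0, m * two_sided_p_normal(t))  # Bonferroni
    print(f"region {i} vs {j}: t = {t:6.2f}, adj p ~ {adjusted[(i, j)]:.4f}")
```

Under this approximation the conclusions match the table: only the Region 2 versus Region 3 comparison has an adjusted p-value above 0.05.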

d.  The assumptions are the same as for the regression model. A plot of the residuals for each region does not indicate any evidence of heteroscedasticity, but Levene’s test (median version) rejected the null hypothesis of equal variances (F = 5.368, p = 0.001). The test of normality indicates that the residuals are not normally distributed (Shapiro-Wilk W = 0.97876, p = 0.0022), and the stem-and-leaf plot of the residuals shows some skew (see below).

[Figure: side-by-side box plots of the residuals by region (2, 3, 4, 5); vertical axis from -2 to 1.5]

Stem Leaf # Boxplot

20 06 2 |

18 4 1 |

16 03 2 |

14 65 2 |

12 2222514 7 |

10 000655 6 |

8 0001118811111111119 19 |

6 166666999111 12 |

4 000000055555544555559999 24 +-----+

2 2222222200000007777 19 | |

0 022222 6 | + |

-0 5111111144444444444444444444211111 34 *-----*

-2 9999999999999999999999 22 | |

-4 99 2 | |

-6 33333333333110000000000000 26 +-----+

-8 000000000000000 15 |

-10 |

-12 999999999 9 |

-14 99999999900 11 |

----+----+----+----+----+----+----

Multiply Stem.Leaf by 10**-1
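The MSE and R² values quoted for the three models can be put on a common footing by converting each MSE back to an error sum of squares (SSE = MSE × error df). The sketch below does this using the error df from the SAS tables (215 and 211) and an assumed 217 for the lncavenum-only model (n = 219 less two coefficients). The total sum of squares is backed out from the lncavenum-only fit. The R² for the separate-slopes model is not reported in the text, so the value computed here is a derived figure subject to rounding.

```python
# (MSE, error df) for each model, using the values quoted in the text.
models = {
    "lncavenum only": (0.53463, 217),  # df assumed: n = 219 minus 2
    "region only": (0.6307, 215),
    "region + lncavenum, separate slopes": (0.3967, 211),
}

sse = {name: mse * df for name, (mse, df) in models.items()}

# Total SS recovered from the first model: SSE = (1 - R^2) * SST.
r2_reported = 0.2897
sst = sse["lncavenum only"] / (1 - r2_reported)

r2 = {name: 1 - value / sst for name, value in sse.items()}
for name in models:
    print(f"{name}: SSE = {sse[name]:.2f}, R^2 = {r2[name]:.3f}")
```

The region-only R² recovered this way rounds to the reported 0.17, and the separate-slopes model comes out at a little under 0.49, which is consistent with it having the smallest MSE of the three.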

4.  ANCOVA-type Analysis

a.  The results of the modeling are shown in Question 2 and so are not repeated here.
b.  The LSmeans (adjusted to the overall mean lncavenum) and their SEs are given in the next table:

Least Squares Means

Effect   Region   Estimate   Std Error    DF   t Value   Pr > |t|
region   2        1.4313     0.07655     211     18.70     <.0001
region   3        1.4785     0.06604     211     22.39     <.0001
region   4        0.7274     0.09496     211      7.66     <.0001
region   5        2.0737     0.1576      211     13.16     <.0001

As expected, the means are similar to those from the model without lncavenum (lncavenum seems to have similar values in all regions), but the SEs are smaller. Are these valid to use? No. Because the slopes are not parallel, the differences among the LSmeans depend strongly on the value of lncavenum at which they are compared, so these means should not be used.
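To see how strongly the comparison depends on the chosen lncavenum, the sketch below evaluates the fitted region lines from Question 2 at several covariate values. It first backs out the overall mean lncavenum from the LSmeans (LSmean = intercept + slope × mean), a value that is not stated in the text. The comparison points 2 and 6 are arbitrary choices for illustration, and whether they lie within the observed range of lncavenum is not known.

```python
# Region-specific intercepts and slopes (Question 2) and LSmeans (Question 4).
lines = {2: (-1.0114, 0.5752), 3: (-0.1911, 0.3932),
         4: (-0.02083, 0.1762), 5: (-0.1323, 0.5195)}
lsmeans = {2: 1.4313, 3: 1.4785, 4: 0.7274, 5: 2.0737}

# Each LSmean is the region's line evaluated at the overall mean lncavenum,
# so the mean can be recovered from every region and then averaged.
implied_mean = {r: (lsmeans[r] - a) / b for r, (a, b) in lines.items()}
x_bar = sum(implied_mean.values()) / len(implied_mean)
print(f"implied overall mean lncavenum ~ {x_bar:.3f}")


def predict(region, x):
    """Fitted lnterrestrial for a region at lncavenum = x."""
    a, b = lines[region]
    return a + b * x


# Difference between regions 2 and 3 at three covariate values.
for x in (2.0, x_bar, 6.0):
    diff = predict(2, x) - predict(3, x)
    print(f"lncavenum = {x:.2f}: region 2 - region 3 = {diff:+.3f}")

# The two lines cross where the intercept gap equals the slope gap times x.
crossing = (lines[3][0] - lines[2][0]) / (lines[2][1] - lines[3][1])
print(f"regions 2 and 3 cross at lncavenum ~ {crossing:.2f}")
```

The sign of the Region 2 minus Region 3 difference flips between lncavenum = 2 and lncavenum = 6, which illustrates why a single adjusted mean per region is misleading when the slopes differ.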