R Code for Clopper-Pearson `exact' CI for a binomial parameter.
This inverts two binomial tests.
------
exactci <- function(x, n, conflev) {
  alpha <- (1 - conflev)
  if (x == 0) {
    ll <- 0
    ul <- 1 - (alpha/2)^(1/n)
  } else if (x == n) {
    ll <- (alpha/2)^(1/n)
    ul <- 1
  } else {
    ll <- 1/(1 + (n - x + 1) / (x * qf(alpha/2, 2 * x, 2 * (n - x + 1))))
    ul <- 1/(1 + (n - x) / ((x + 1) * qf(1 - alpha/2, 2 * (x + 1), 2 * (n - x))))
  }
  c(ll, ul)
}
# Computes the Clopper-Pearson exact CI
# for a binomial success probability
# for x successes out of n trials with
# confidence coefficient conflev
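A quick way to sanity-check the function is to compare it with R's built-in binom.test, which reports the same Clopper-Pearson interval; the counts below (5 successes in 20 trials) are arbitrary illustrative values, not part of the examples that follow.
## Example: 5 successes out of 20 trials, 95% confidence
exactci(5, 20, 0.95)
## Should match the Clopper-Pearson interval from binom.test
binom.test(5, 20, conf.level = 0.95)$conf.int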
## One-sample test.
## Hollander & Wolfe (1973), 29f.
## Hamilton depression scale factor measurements in 9 patients with
## mixed anxiety and depression, taken at the first (x) and second
## (y) visit after initiation of a therapy (administration of a
## tranquilizer).
Method 1: Paired t-test
- The paired t-test appears in every elementary statistics text. It is the standard parametric approach for analyzing paired data. Interestingly, it handles the problem of paired data by essentially doing an end-around: we calculate the differences between the paired values and then do a one-sample t-test on those differences. Hence a paired t-test converts paired data into unpaired differences.
- Because a paired t-test is a t-test on differences, it is the differences that must satisfy the assumptions of a t-test, not the original values. For small samples the important assumption is that the differences should be a sample from a normal distribution. This is typically hard to assess with small samples unless the data are especially unusual in some way.
- In R we can carry out a paired t-test using the t.test function, either by first calculating the differences ourselves or by using the paired = TRUE option of t.test.
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
## Paired t-test
out1 <- t.test(y, x, paired = TRUE, conf.level = 0.95)
out1
## One sample t-test on differences
out2 <- t.test(y - x, conf.level = 0.95)
out2
Method 2: Wilcoxon signed rank test
- The Wilcoxon signed rank test is the nonparametric version of the paired t-test. It works with ranks instead of the raw data, and rather than relying on a normal distribution it calculates significance levels from the exact (permutation) distribution of the signed rank statistic. It is the preferred alternative to the paired t-test when the normality assumption is suspect.
- There are two implementations of the signed rank test in R. The traditional test is carried out by wilcox.test, but its results are only approximate when there are tied observations. A more modern version is wilcox.exact, found in the (no longer maintained) exactRankTests package. The syntax of both functions is identical to t.test, except that we need to set the conf.int argument to TRUE to get confidence intervals.
- When run on the Hamilton data, the wilcox.test function returns a series of warnings about ties and another stating that it was not possible to obtain a 95% interval exactly.
out3 <- wilcox.test(x, y, paired = TRUE, alternative = "less")
out3
out4 <- wilcox.test(y - x, alternative = "less") # The same.
out4
## large sample approximation
out5 <- wilcox.test(y - x, alternative = "less", exact = FALSE, correct = FALSE)
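As noted above, adding conf.int = TRUE to wilcox.test also returns a confidence interval; a minimal illustration on the Hamilton differences (the name out6 is simply continuing the numbering used above):
## Signed rank test with a confidence interval (conf.int = TRUE)
out6 <- wilcox.test(y - x, conf.int = TRUE, conf.level = 0.95)
out6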
## To compute the Hodges-Lehmann estimator associated with the WSR statistic:
The set of Walsh averages can be calculated easily in R by using the outer function to produce a matrix of Walsh averages and then the lower.tri and diag functions to extract the ones we need. Here is how the Hodges-Lehmann estimate of the median is calculated for the Hamilton dataset.
ham.diff <- y - x
# Walsh averages
Wmat <- outer(ham.diff, ham.diff, '+') / 2
Walsh <- c(Wmat[lower.tri(Wmat)], diag(Wmat))
ham.median <- median(Walsh)
print(ham.median)
To obtain a confidence interval for the Hodges-Lehmann estimate of the median, we sort the Walsh averages and then extract the values that lie at the appropriate positions of the sorted list. The positions are obtained from the quantiles of the signed rank distribution, which are available through the qsignrank function in R.
Walsh.sort <- sort(Walsh)
# number of paired observations
ham <- cbind(x, y)
n.ham <- dim(ham)[1]
# obtain the positions
q1 <- n.ham * (n.ham + 1)/2 + 1 - qsignrank(.985, n.ham)
q2 <- qsignrank(.985, n.ham)
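## (The signed rank distribution is discrete; with n = 9 the .985 quantile
## appears to be chosen so that the achieved coverage of the resulting
## interval is at least the nominal 95%.)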
ham.lower95 <- Walsh.sort[q1]
ham.upper95 <- Walsh.sort[q2]
c(ham.median, ham.lower95, ham.upper95)
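The point estimate and interval from this hand calculation can be compared with the wilcox.test(y - x, conf.int = TRUE) output shown earlier (out6), which reports the Hodges-Lehmann estimate as the "(pseudo)median" together with a signed-rank-based confidence interval and should give essentially the same results.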
## The data on annual salaries, Example 3.2
x <- c(11750, 20900, 14800, 29900, 21500, 18400, 14500, 17900, 21400, 43200, 15200, 14200)
y <- c(12500, 22300, 14500, 32300, 20800, 19200, 15800, 17500, 23300, 42100, 16800, 14500)
## Remember, there are ties in the data. Let's run:
wilcox.test(y - x, alternative = "greater", correct = FALSE)
## Install and load the exactRankTests package, then type
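## (Assumes the package has already been installed, e.g. with
## install.packages("exactRankTests"))
library(exactRankTests)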
wilcox.exact(y - x, alternative = "greater")
## Now, let's use the large sample results and see. exact = FALSE means you are using the approximate (normal) test;
## correct = FALSE means you are NOT using the continuity correction.
wilcox.test(y - x, alternative = "greater", exact = FALSE, correct = FALSE)
wilcox.test(y - x, alternative = "greater", exact = FALSE, correct = TRUE)
## Let's check Comment 11 in the book, pages 46-48.
## Check that the Z values are -12, -10, 10 and 12.
## So, we just type
wilcox.exact(c(-12,-10,10,12), alternative = "greater")
## or
wilcox.exact(c(-12,-10,10,12), alternative = "g")