BOOTSTRAPPING
Bootstrapping is a resampling technique that builds the sampling distribution of a statistic from the empirical data, rather than relying on a theoretical sampling distribution whose assumptions may not hold. The bootstrap procedure consists of the following steps:
- Pseudo-Population. Define a pseudo-population distribution for resampling. This is usually the empirical distribution of the sample data, or of some appropriate transformation of the data.
- Resampling. Draw, with replacement, N independent random observations from the pseudo-population, where N is the size of the original sample. These N observations comprise a bootstrap resample. Compute the statistics of interest (e.g., mean, median, mode, standard deviation, residual, r, R², b) for the resample.
- Evaluation. Repeat the resampling, typically at least 1000 times, to produce multiple sets of bootstrapped values for the statistics of interest. The distribution of these bootstrapped values is the bootstrap sampling distribution of the statistic. The mean or median of that distribution serves as the bootstrap estimate of the population value. The tails of the distribution can be used for significance testing: for a two-sided test at the 5% level, check whether the null hypothesis value falls below the 2.5th percentile or above the 97.5th percentile.
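The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes NumPy is available, takes the raw sample as the pseudo-population, and uses made-up data, a hypothetical helper name (bootstrap_distribution), and a hypothetical null value.

```python
import numpy as np

def bootstrap_distribution(sample, statistic, n_resamples=1000, seed=0):
    """Return n_resamples bootstrapped values of `statistic` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)      # Pseudo-population: the sample itself
    n = sample.size
    values = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resampling: draw N observations with replacement
        resample = rng.choice(sample, size=n, replace=True)
        values[i] = statistic(resample)
    return values

# Made-up data, for illustration only
sample = np.array([2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9])

# Evaluation: bootstrap sampling distribution of the mean
boot_means = bootstrap_distribution(sample, np.mean)
lower, upper = np.percentile(boot_means, [2.5, 97.5])

# Two-sided test at the 5% level against a hypothetical null value
null_value = 2.0
reject_null = bool(null_value < lower or null_value > upper)

print("bootstrap estimate of the mean:", boot_means.mean())
print("2.5th and 97.5th percentiles:", lower, upper)
print("reject null:", reject_null)
```

Any function that maps an array to a single number (for example np.median) can be passed as the statistic. The seed is fixed only so that the run is reproducible.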
The success of the bootstrap method depends on how well the sample distribution resembles the population distribution. This, in turn, depends on two things: (1) whether the sample is randomly drawn; and (2) the sample size N, with larger N giving a closer resemblance. This may make the bootstrap sound like just another asymptotic method. However, there is evidence that the bootstrap outperforms asymptotic methods for small samples. How small a sample can be is less certain.
Reading Results in SPSS:
• Bias is the difference between the average value of the statistic across the bootstrap samples and the value in the Statistic column. In this example, the mean of Churn within last month is computed for each of the 1000 bootstrap samples, these means are averaged, and the original sample mean is subtracted from that average.
• Std. Error is the bootstrap standard error of the mean of Churn within last month, that is, the standard deviation of the 1000 bootstrap means.
• The lower bound of the 95% bootstrap confidence interval is an interpolation of the 25th and 26th mean values of Churn within last month when the 1000 bootstrap means are sorted in ascending order. The upper bound is an interpolation of the 975th and 976th mean values.
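The three quantities above can be reproduced by hand, which may help when reading the SPSS table. The sketch below is only illustrative: it assumes NumPy, uses simulated 0/1 data as a stand-in for Churn within last month, and relies on NumPy's default linear interpolation for percentiles, which may not match the SPSS interpolation rule exactly.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated stand-in for the Churn within last month variable (1 = churned)
churn = rng.binomial(1, 0.27, size=200)
n = churn.size
statistic = churn.mean()            # value in the "Statistic" column

# 1000 bootstrap samples, each of size n, drawn with replacement
idx = rng.integers(0, n, size=(1000, n))
boot_means = churn[idx].mean(axis=1)

# Bias: average of the bootstrap means minus the original statistic
bias = boot_means.mean() - statistic

# Std. Error: standard deviation of the bootstrap means
std_error = boot_means.std(ddof=1)

# 95% percentile interval; with 1000 values, linear interpolation puts the
# bounds between the 25th/26th and the 975th/976th sorted means
lower, upper = np.percentile(boot_means, [2.5, 97.5])
sorted_means = np.sort(boot_means)

print("Statistic:", statistic)
print("Bias:", bias)
print("Std. Error:", std_error)
print("95% CI:", lower, upper)
```

With 1000 bootstrap means, the 2.5th percentile sits at sorted position 0.025 × 999 = 24.975 (zero-based), that is, between the 25th and 26th values, which agrees with the description of the lower bound above.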
References:
Yung, Y.-F., & Chan, W. (1999). Statistical analyses using bootstrapping: Concepts and implementation. In R. H. Hoyle (Ed.), Statistical strategies for small sample research. Thousand Oaks, CA: Sage Publications.
Mooney, C. Z., & Duval, R. D. (1993). Bootstrapping: A nonparametric approach to statistical inference. Newbury Park, CA: Sage Publications.