Statistics 510: Notes 24

Reading: Sections 8.2 and 8.4

I. Markov’s Inequality and Chebyshev’s Inequality (Section 8.2)

Sometimes we know the mean and/or variance of a random variable but not its entire distribution. Markov’s Inequality provides a bound on the probability that a nonnegative random variable will be greater than or equal to some value when only the mean of a distribution is known.

Proposition 8.2.1: If X is a random variable that takes only nonnegative values, then for any value $a > 0$,

$$P(X \geq a) \leq \frac{E[X]}{a}.$$

Proof: For $a > 0$, let

$$I = \begin{cases} 1 & \text{if } X \geq a \\ 0 & \text{otherwise,} \end{cases}$$

and note that since $X \geq 0$,

$$I \leq \frac{X}{a}.$$

Taking expectations of the preceding yields that

$$E[I] \leq \frac{E[X]}{a},$$

which, since $E[I] = P(X \geq a)$, proves the result.
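
As a quick numerical check of how loose the bound can be (a sketch; the Exponential(1) distribution, for which $E[X] = 1$ and $P(X \geq a) = e^{-a}$, is an arbitrary choice), the Markov bound can be compared with the exact tail probability in R:

> # Markov bound E[X]/a vs. exact tail P(X >= a) = exp(-a) for X ~ Exponential(1)
> a = c(1, 2, 4, 8)
> cbind(a, markov.bound = 1/a, exact = exp(-a))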

As a corollary, we obtain Chebyshev's Inequality, which provides a bound on the probability that a random variable departs from its mean by a certain amount when the variance of the random variable is known.

Proposition 8.2.2: If X is a random variable with finite mean $\mu$ and variance $\sigma^2$, then for any value $k > 0$,

$$P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}.$$

Proof: Since $(X - \mu)^2$ is a nonnegative random variable, we can apply Markov's inequality with $a = k^2$ to obtain

$$P\left((X - \mu)^2 \geq k^2\right) \leq \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}. \qquad (1.1)$$

But since $(X - \mu)^2 \geq k^2$ if and only if $|X - \mu| \geq k$, Equation (1.1) is equivalent to

$$P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2},$$

and the proof is complete.

Example 2: A fair die is tossed 100 times. Let $X_k$ denote the outcome on the kth roll. Use Chebyshev's Inequality to get a lower bound for the probability that $S = X_1 + \cdots + X_{100}$ is between 300 and 400.
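
A sketch of the solution: for a fair die, $E[X_k] = \frac{7}{2}$ and $E[X_k^2] = \frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6}$, so $\operatorname{Var}(X_k) = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}$. By independence, $E[S] = 350$ and $\operatorname{Var}(S) = 100 \cdot \frac{35}{12} = \frac{875}{3}$. Chebyshev's inequality then gives

$$P(300 < S < 400) = P(|S - 350| < 50) \geq 1 - \frac{875/3}{50^2} = 1 - \frac{7}{60} = \frac{53}{60} \approx 0.883.$$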

Example 3: The mean of a list of a million numbers is 10 and the mean of the squares of the numbers is 101. Find an upper bound on how many of the entries in the list are 14 or more.
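
A sketch of the solution: let $X$ be a number drawn at random from the list, so that $E[X] = 10$, $E[X^2] = 101$, and $\operatorname{Var}(X) = 101 - 10^2 = 1$. An entry of 14 or more deviates from the mean by at least 4, so Chebyshev's inequality gives

$$P(X \geq 14) \leq P(|X - 10| \geq 4) \leq \frac{1}{4^2} = \frac{1}{16},$$

and hence at most $10^6 / 16 = 62{,}500$ of the entries are 14 or more.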

As Chebyshev’s inequality is valid for all distributions of the random variable X, we cannot expect the bound on the probability to be very close to the actual probability in most cases.

Consider a normal random variable $X$ with mean $\mu$ and variance $\sigma^2$.

Probability / Chebyshev bound / Probability for $X \sim N(\mu, \sigma^2)$
$P(|X - \mu| \geq \sigma)$ / at most 1 / 0.3173
$P(|X - \mu| \geq 2\sigma)$ / at most 1/4 / 0.0455
$P(|X - \mu| \geq 3\sigma)$ / at most 1/9 / 0.00270
$P(|X - \mu| \geq 4\sigma)$ / at most 1/16 / 0.000063

As the table shows, Chebyshev's bound will be very crude for a distribution that is approximately normal. Its importance is that it holds no matter what the shape of the distribution, so it gives some information about two-sided tail probabilities whenever the mean and standard deviation of a distribution can be calculated.
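
For a normal random variable, $P(|X - \mu| \geq k\sigma) = 2\Phi(-k)$, so the table can be reproduced in R (the same software used for the Monte Carlo example below):

> # Chebyshev bound 1/k^2 vs. exact two-sided normal tail 2*pnorm(-k)
> k = 1:4
> cbind(k, chebyshev = 1/k^2, exact = 2*pnorm(-k))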

II. Convergence in Probability and the Weak Law of Large Numbers (Section 8.2)

Limit Theorems:

Limit theorems concern what happens in the limit to the probability distribution of a random variable $Z_n$ in an infinite sequence of random variables $Z_1, Z_2, \ldots$. In particular, we often consider $Z_n$ to be a random variable that is associated with a random sample of $n$ units, e.g., the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ for a sample $X_1, \ldots, X_n$. Although the notion of an infinite sample size is a theoretical artifact, it can often provide us with some useful approximations for the finite-sample case.

We will consider three types of convergence for the probability distribution of a sequence of random variables $Z_1, Z_2, \ldots$: (i) convergence in probability, (ii) almost sure convergence and (iii) convergence in distribution.

Convergence in Probability: A sequence of random variables $Z_1, Z_2, \ldots$ converges in probability to a number $c$ if, for every $\varepsilon > 0$,

$$\lim_{n \to \infty} P(|Z_n - c| \geq \varepsilon) = 0,$$

or equivalently $\lim_{n \to \infty} P(|Z_n - c| < \varepsilon) = 1$.

Note: The $Z_n$ are typically not independent and identically distributed random variables. The distribution of $Z_n$ changes as the subscript $n$ changes, and the convergence concepts we will discuss describe different ways in which the distribution of $Z_n$ "converges" as the subscript becomes large.
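
As a concrete illustration (a sketch; the running maximum $Z_n = \max(U_1, \ldots, U_n)$ of i.i.d. Uniform(0,1) variables is one simple example of a sequence that converges in probability to $c = 1$, since $P(|Z_n - 1| \geq \varepsilon) = (1 - \varepsilon)^n \to 0$):

> # Z_n = max(U_1, ..., U_n) converges in probability to 1:
> # P(|Z_n - 1| >= eps) = (1 - eps)^n -> 0 as n grows
> eps = 0.05
> for (n in c(10, 100, 1000)) {
+   zn = replicate(5000, max(runif(n)))
+   cat("n =", n, " P(|Z_n - 1| >= eps) approx.", mean(abs(zn - 1) >= eps), "\n")
+ }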

The weak law of large numbers:

Consider a sample of independent and identically distributed random variables $X_1, \ldots, X_n$. The relationship between the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ and the true mean of the $X_i$'s, $\mu = E[X_i]$, is a problem of pivotal importance in statistics. Typically, $\mu$ is unknown and we would like to estimate $\mu$ based on $\bar{X}_n$. The weak law of large numbers says that the sample mean $\bar{X}_n$ converges in probability to $\mu$.

This means that for a large enough sample size $n$, $\bar{X}_n$ will be close to $\mu$ with high probability.

Theorem 8.2.1. The weak law of large numbers.

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables, each having finite mean $E[X_i] = \mu$. Then, for any $\varepsilon > 0$,

$$P\left(\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \geq \varepsilon\right) \to 0 \quad \text{as } n \to \infty.$$

Proof: We prove the result only under the additional assumption that the random variables have a finite variance $\sigma^2$ (it can be proved without this assumption using more advanced techniques). Because

$$E\left[\frac{X_1 + \cdots + X_n}{n}\right] = \mu \quad \text{and} \quad \operatorname{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{\sigma^2}{n},$$

it follows from Chebyshev's inequality that

$$P\left(\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \geq \varepsilon\right) \leq \frac{\sigma^2}{n \varepsilon^2}.$$

Since $\sigma^2/(n\varepsilon^2) \to 0$ as $n \to \infty$, the result is proved.
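
To see the weak law at work numerically (a sketch; the Exponential(1) distribution, with $\mu = 1$, is an arbitrary choice), we can estimate $P(|\bar{X}_n - \mu| \geq 0.1)$ by simulation for increasing $n$:

> # By the weak law, P(|Xbar_n - 1| >= 0.1) should shrink toward 0 as n grows
> for (n in c(10, 100, 1000, 10000)) {
+   xbars = replicate(2000, mean(rexp(n)))   # 2000 simulated sample means
+   cat("n =", n, " P(|Xbar - 1| >= 0.1) approx.", mean(abs(xbars - 1) >= 0.1), "\n")
+ }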

Application of Weak Law of Large Numbers: Monte Carlo Integration.

Suppose that we wish to calculate

$$I(f) = \int_0^1 f(x)\,dx,$$

where the integration cannot be done by elementary means or evaluated using tables of integrals. The most common approach is to use a numerical method in which the integral is approximated by a sum; various schemes and computer packages exist for doing this. Another method, called the Monte Carlo method, works in the following way. Generate independent uniform random variables $X_1, X_2, \ldots, X_n$ on (0,1) and compute

$$\hat{I}(f) = \frac{1}{n} \sum_{i=1}^n f(X_i).$$

By the law of large numbers, for large n, this should be close to $E[f(X_1)]$, which is simply

$$E[f(X_1)] = \int_0^1 f(x)\,dx = I(f).$$

This simple scheme can easily be modified in order to change the range of integration and in other ways. Compared to the standard numerical methods, it is not especially efficient in one dimension, but becomes increasingly efficient as the dimensionality of the integral grows.

As a concrete example, let's consider the evaluation of

$$I(f) = \int_0^1 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx.$$

The integrand is the standard normal density, whose integral cannot be evaluated in closed form. From Table 5.1, an accurate numerical approximation is $I(f) = \Phi(1) - \Phi(0) \approx 0.3413$. The following code for the statistical computing software package R generates 1000 pseudorandom independent points $X_1, \ldots, X_{1000}$ from the uniform (0,1) distribution and computes

$$\hat{I}(f) = \frac{1}{1000} \sum_{i=1}^{1000} f(X_i).$$

> # Generate vector of 1000 independent uniform (0,1) random variables

> xvector=runif(1000);

> # Approximate I(f) by 1/1000 * sum from i=1 to 1000 of f(Xi)

> fxvector=(1/(2*pi)^.5)*exp(-xvector^2/2);

> Ihatf=1/1000*sum(fxvector);

> Ihatf

[1] 0.3430698
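
By the law of large numbers, the approximation should improve as more points are generated; rerunning with $n = 10^6$ (a sketch), the estimate should settle closer to 0.3413:

> # Rerun with a much larger sample; the estimate should settle near 0.3413
> xvector=runif(10^6);
> Ihatf=mean((1/(2*pi)^.5)*exp(-xvector^2/2));
> Ihatf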

III. Almost Sure Convergence and the Strong Law of Large Numbers (Section 8.4)

A type of convergence that is stronger than convergence in probability is almost sure convergence (sometimes confusingly known as convergence with probability 1). This type of convergence is similar to pointwise convergence of a sequence of functions, except that convergence is allowed to fail on a set of probability 0 (hence the "almost" sure).

Almost sure convergence: A sequence of random variables $Z_1, Z_2, \ldots$ defined on a common sample space converges almost surely to a number $c$ if, for every $\varepsilon > 0$,

$$P(|Z_n - c| > \varepsilon \text{ for infinitely many } n) = 0,$$

or equivalently,

$$P\left(\lim_{n \to \infty} Z_n = c\right) = 1.$$

Almost sure convergence is a much stronger convergence concept than convergence in probability and indeed implies convergence in probability.

Example of a sequence of random variables that converges in probability but not almost surely:

Consider an experiment with sample space the interval from 0 to 1 and a uniform probability distribution over the sample space. We construct a sequence of intervals $I_1, I_2, \ldots$ as follows: $I_1$ is the interval from 0 to 1/2, $I_2$ is the interval from 1/2 to 1, $I_3$ is the interval from 0 to 1/3, $I_4$ is the interval from 1/3 to 2/3, $I_5$ is the interval from 2/3 to 1, $I_6$ is the interval from 0 to 1/4, and so forth. Now for every point $\omega$ between 0 and 1, define the value of the random variable $Z_n(\omega)$ to be $-1$ if $\omega$ is in the first half of the interval $I_n$, $1$ if $\omega$ is in the second half of the interval, and $0$ if $\omega$ is not in the interval $I_n$.

We have $E[Z_n] = 0$ for all $n$. Moreover, for any $\varepsilon$ with $0 < \varepsilon \leq 1$, $P(|Z_n| \geq \varepsilon)$ is the probability that a uniform random variable is in the interval $I_n$, which converges to 0 since the lengths of the intervals converge to 0. Thus, the sequence $Z_1, Z_2, \ldots$ converges in probability to 0.

However, no matter how short the intervals become, every $\omega$ between 0 and 1 is in some $I_n$ during each left-to-right "progression" of these intervals as $n$ increases. Consequently, for each $\omega$ between 0 and 1, $Z_n(\omega)$ takes the values $-1$ or $1$ infinitely often, so the sequence $Z_n(\omega)$ does not converge, and hence $Z_n$ does not converge almost surely.
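
The construction can also be examined numerically (a sketch; the function Zn below, which locates the nth interval and evaluates $Z_n(\omega)$, is written just for this illustration):

> # Z_n(omega): -1 on the first half of I_n, +1 on the second half, 0 outside I_n.
> # The intervals come in blocks: block k (k = 2, 3, ...) consists of the k
> # intervals [0, 1/k], [1/k, 2/k], ..., [(k-1)/k, 1].
> Zn = function(n, omega) {
+   k = 2
+   while (n > k) { n = n - k; k = k + 1 }  # find the block containing index n
+   lo = (n - 1)/k; hi = n/k                # I_n = [lo, hi]
+   if (omega < lo || omega > hi) return(0)
+   if (omega < (lo + hi)/2) -1 else 1
+ }
> # For any fixed omega, values of -1 or 1 keep recurring as n grows:
> sapply(1:30, Zn, omega = 0.3)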

The Strong Law of Large Numbers: The strong law of large numbers states that for a sequence of independent and identically distributed random variables $X_1, X_2, \ldots$, the sample mean converges almost surely to the mean $\mu$ of the random variables $X_i$.

Theorem 8.4.1: Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables, each having a finite mean $\mu = E[X_i]$. Then, with probability 1,

$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{as } n \to \infty.$$

What is the strong law of large numbers adding to the weak law of large numbers?

The weak law of large numbers states that for any specified large value $n^*$, $(X_1 + \cdots + X_{n^*})/n^*$ is likely to be near $\mu$. However, it does not say that $(X_1 + \cdots + X_n)/n$ is bound to stay near $\mu$ for all values of $n$ larger than $n^*$. Thus, it leaves open the possibility that large values of $|(X_1 + \cdots + X_n)/n - \mu|$ can occur infinitely often (though at infrequent intervals). The strong law shows that this cannot occur. In particular, it implies that, with probability 1, for any positive value $\varepsilon$,

$$\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right|$$

will be greater than $\varepsilon$ only a finite number of times.

Application of the strong law of large numbers: Consider a sequence of independent trials of some experiment. Let $E$ be a fixed event of the experiment, and denote by $P(E)$ the probability that $E$ occurs on any particular trial. Letting

$$X_i = \begin{cases} 1 & \text{if } E \text{ occurs on the } i\text{th trial} \\ 0 & \text{otherwise,} \end{cases}$$

we have by the strong law of large numbers that, with probability 1,

$$\frac{X_1 + \cdots + X_n}{n} \to E[X] = P(E). \qquad (1.2)$$

Since $X_1 + \cdots + X_n$ represents the number of times that the event $E$ occurs in the first $n$ trials, Equation (1.2) states that, with probability 1, the limiting proportion of times that the event $E$ occurs in repeated, independent trials of the experiment is just $P(E)$.
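
A simulation illustrating Equation (1.2) (a sketch; the event $E$ = {a fair die shows a six}, with $P(E) = 1/6$, is an arbitrary choice):

> # Running proportion of trials on which E = {die shows 6} occurs
> rolls = sample(1:6, 10^5, replace = TRUE)
> prop = cumsum(rolls == 6) / (1:10^5)
> # The proportion settles near P(E) = 1/6 = 0.1667 (approximately)
> prop[c(10, 100, 1000, 10^5)]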

The strong law of large numbers is of enormous importance, because it provides a direct link between the axioms of probability (Section 2.3) and the frequency interpretation of probability. If we accept the interpretation that "with probability 1" means "with certainty," then we can say that $P(E)$ is the limit of the long-run relative frequency of times $E$ would occur in repeated, independent trials of the experiment.
