1
Files\courses\MonteCarlo\MonteCarlo1.docRevision 11.06.97 Marc Nerlove 1997
notes on
monte carlo, bootstrapping and estimation by simulation
Lest men suspect your tale untrue,
Keep probability in view.John Gay, Fables, I, 1727
1. Random Number Generation
Anyone who considers arithmetical methods of producing random digits, is, of course, in a state of sin. John von Neumann, 1951
Uniform and Normal Deviates
"It may seem perverse to use a computer, that most precise and deterministic of all machines conceived by the human mind, to produce 'random' numbers. More than perverse, it may seem to be a conceptual impossibility. Any program, after all, will produce output that is entirely predictable, hence, not truly 'random.'"(Press, et al., p.191) There is, nonetheless, a large literature on how one uses a computer to "simulate" a sequence of random numbers.[1] The basic building block of all "random" number generation is the uniform random deviate. These are random numbers which lie within an interval, usually (0,1), and for which any number in this interval is as likely as any other. All other sorts of random deviates (GAUSS lists commands for beta, RNDBETA, gamma, RNDGAM, negative binomial, RNDNB, and Poisson, RNDP, as well as uniform, RNDU and RNDUS, and normal, RNDN and RNDNS[2]) are generated by transforming uniform deviates, sometimes simply and sometimes by elaborate algorithms.[3] Standard normal deviates (zero mean, variance 1) may also be produced from uniform deviates by such transformations.[4] For example, Box and Muller suggest the following transformation which produces pairs of independent normal deviates:
(1) ,
where ui and ui+1 are consecutive uniform deviates in the interval (0,1).[5] At that point it is easy to generate vectors of numbers having a multivariate normal distribution with specified mean vector and variance covariance matrix through the use of an appropriate matrix transformation.
Uniform deviates are usually generated by so-called congruential generators. The first to be suggested (by Lehmer in 1951) was the multiplicative generator:
,
where the notation indicates that is the remainder when is divided by m. Such generators have to be started by an initial value called the seed. The numbers produced by such a recurrence relation will eventually start to repeat. The maximum period possible is m, but the multiplicative generator will generally have a period much shorter than this. A recursive relation with better properties is the linear congruential generator:
(2),
where m is called the modulus, and a and c are positive integers called the multiplier and the increment respectively. The reccurrence (2) will eventually repeat itself, but if m, a and c are properly chosen the period of the generator will be m, the maximum possible. The number m is usually chosen to be about the length of a machine word, making the generator machine dependent.[6] "Portable" random number generators (RNG) are generally much less efficient. Moreover, unless m, a and c are chosen very carefully in relation to one another, there may be a lot of serial correlation among values generated. The rules are complicated: (a) c and m may have no common divisor, (b) a = 1(mod p) for every prime factor of m, and (c) a = 1(mod 4) if m is a multiple of 4. The seed for each successive call of the generator is the last integer ui+1 returned. Usually this value is saved for the next use of the procedure. In the case of the GAUSS generators RNDU and RNDUS, the ending value of the seed is not available directly to the user, but could be computed from the last random number returned. RNDU takes its seed from the computer clock when GAUSS is first started up. To obtain a number in the interval [0,1) usually ui+1/m is returned rather than ui+1.[7]
What can go wrong with the RNG in common use? Clearly, that the choice of m, a and c may be botched is the greatest problem in situations in which the specifics of the RNG are not known. Press, et al., pp.194-195, suggest a kind of shuffling procedure to alleviate this problem But randomness, like beauty, is in the eye of the beholder. "...the deterministic program that produces the random sequence should be different from, and – in all measurable respects – statistically uncorrelated with, the computer program that uses its output. In other words, any two different random number generators ought to produce statistically the same results when coupled to your particular applications program." (p.191) More generally, Knuth (1969, pp. 34-100) gives a large number of tests for randomness, the most powerful of which is called the spectral test. Understanding and implementing this test requires knowledge of time series analysis, but Exercise 2 to this section suggest a couple of simpler tests which might be implemented in GAUSS: Chi-square and Kolmogorov-Smirnov. GAUSS's generators RNDU and RNDN pass these tests.
To obtain a series of vectors which can be considered to have come from a multivariate normal distribution with mean vector and variance-covariance matrix , we can make use of the Cholesky decomposition of the matrix and the characterization of the multivariate normal discussed in Econometrics for Applied Economists, Notes for AREC 624, Spring 1995, vol. 3, pp. APP/11 - 17.[8] Let u be a vector of iid normal variable with mean zero and variance 1; then,
(3)
has a multivariate normal distribution with mean vector and variance-covariance matrix T'T = .
Nonuniform Deviates in General[9]
One of the principal problems in doing Monte Carlo simulations of all kinds is the generation of nonnormal deviates. For example, in regression analysis, tests of significance are based on the assumption of an underlying normal distribution for the disturbances; if these are not normal, potentially serious errors can result in the assessment of the significance of the regression coefficients.. Similarly, maximum-likelihood estimation is generally based on the assumption of a normal distribution and heavy reliance is placed on the desirable asymptotic properties of the ML estimates; while many of these asymptotic properties hold for many nonnormal cases, we typically deal with samples which are not so large that we can be confident that the asymptotic properties are good approximations; moreover, methods of likelihood inference more generally depend crucially on the distributional assumptions made..
As remarked, all random number generation, including that for normal deviates, begins with the generation of uniform deviates. The problem is to get from uniform deviates to the deviates we want.[10] There are essentially three basic methods for generating nonuniform deviates: (1) the inverse transform method; (2) the composition method; and the acceptance-rejection method. Fishman (1996, pp. 179-187) deal with two other methods, as well as with refinements of the basic three.
(1)The inverse transform method is conceptually the easiest to explain and can be applied to discrete as well as continuously distributed variates: Since we know how to generate variates which are uniform on the interval [0,1), simply regard these as probabilities from a cumulative distribution function corresponding to the distribution we want and find the value of the variate corresponding to the "probability" turned up by the uniform generator. Thus, to obtain continuously distributed variates with the cumulative F( ), for which we have F-1( ) in closed form, first generate the uniform(0,1) variable, u, the compute z = F-1(u ). The z's so generated could have come from the distribution with density f( ). The applicability of this method depends on our ability to specify F-1( ), and its efficiency is problematic since we often can not avoid evaluating exponential, logarithmic or trigonometric functions. Fishman (1996, p.151) gives a useful table of common continuous distributions and their analytical inverses. Included are the following: uniform; beta; exponential, logistic, noncentral Cauchy, normal Pareto; and Weibull. David Baird has a series of programs, available at the GAUSS site at American University,
for generating nonuniform random variates, in which he also gives programs for the following inverse cumulatives: Normal; t; F; chi-square; and Gamma.
For a discrete distribution the inverse transform method is a bit more difficult because the cumulative cannot generally be obtained in closed analytical form, although the principle is the same. The essential idea is as follows: Suppose we want to generate a discrete random variate, z , which can take on values a, a+1, ..., b, coming from the cumulative distribution with probabilities { 0 < qa < qa+1 < ... < qb = 1}. qi is the probability that z i. Set z =a. Generate a uniform variate u, as long as u < qa z remains at a, but when u > qa augment z by 1 and continue with qz. If this seems difficult to follow, the example of Bernoulli trials discussed by Mooney (1997, pp. 31-33) is instructive: A Bernoulli trial is just a "one-shot" binomial; z takes on the value a with probability q and not a with probability 1-q; thus the cumulative distribution is {0 < q < 1}. Here is a little GAUSS program to generate n Bernoulli RVs with this distribution.
q = 0.4; /* for example */
y = RNDU(n,1);
p = q*ones(n,1);
z = y .le p; /* .le is the element by element logical operator, returning 1 if true, 0 if false */
z is an n by 1 vector of RVs from the distribution {0 < 0.4 < 1}. Another example involving only a two-point cumulative is the binomial distribution with t trials and probability p of success . Here
.
Obviously, this method, in general, applies to any univariate discrete distribution, although in general the probabilities qi can be specified in terms of a few underlying parameters and the method makes no use of this to increase efficiency. When the number of possible values is more than 2, the number of comparisons involved can be large. Fishman (1996, pp. 153-155) gives a method for improving efficiency and reducing the mean number of comparisons required to generate a random variable with the specified cumulative. GAUSS has commands for generating RVs for the negative binomial and the Poisson distributions, and Baird includes a program for the binomial. Fortunately, many common discrete distributions can be built up as the composition of Bernoulli variables.
(2) The composition method is the lazy approach, when it can be implemented, and is generally even more inefficient than the inverse transform method. We have already had an example of this approach in the Box-Muller method for generating random normal deviates from uniform deviates by transforming the latter. IID normal 0-1 variables can also be transformed to univariate normals with nonzero mean and variance different from one, as well as into nonindependent multivariate normals. Mooney (1997, pp. 18-23) suggests obtaining RVs from the lognormal, Chi-square and t distributions this way: lognormal by exponentiating a normal variable; Chi-square with df degrees of freedom by summing df independent normals squared; and t with df degrees of freedom as the ratio of a normal and the sqrt of an independent Chi-square with df degrees of freedom. The binomial, geometric and negative binomial distributions can also be treated in this way (Mooney, 1997, pp. 35-42). A standard exponential variable can be generated from a uniform [0,1) variable.[11] Notwithstanding its potential inefficiency, the composition method provides a very powerful tool for handling distributions that cannot be represented in terms of one of the standard distributions. The idea is to "mix" one or more distributions you already know something about. The key parameters in the implementation of this method are the mixing proportions. For example suppose we wanted to mix only two distributions for which we know how to create RVs easily, e.g., two normal distributions with different means and variances; We create a vector from one distribution of length np and one from the other of length n(1-p) where p is the mixing proportion; concatenate the two vectors so obtained vertically; then randomize the elements using the row index and the GAUSS commands RNDU and CEIL. Mooney (1997, pp. 24-25) gives the following GAUSS example for mixing two normal distributions:
y1 = RNDN(((1-p)*n),1); mean = a1; variance = b1; y1 = mean + (sqrt(variance)*y1);
y2 = RNDN((p*n),1); mean = a2; variance = b2; y2 = mean + (sqrt(variance)*y2);
y = y1|y2;
index = CEIL(RNDU(n,1)*n);/* See GAUSS manual, p. 1075. */
x = SUBMAT(y, index', 0);/* See GAUSS manual, p. 1599. Note that index has been
transposed. */
x is the desired vector of RVs having a distribution which is a mixture in proportion p of the two normal distributions specified.
(3) The acceptance-rejection method (AR method) is not only the most generally applicable but the basis for the most efficient and refined algorithms. It works when the inverse cumulative is intractable and when the distribution of the RV we want to generate is not a simple function of one or more distributions we know how to generate. The AR method is based on a theorem on conditional probability stated by von Neumann (1951, op. cit.):
Theorem: Let f(z) be a pdf defined on the interval [a,b], such that
Let Z be the RV with pdf h(z) and U be uniformly distributed on [0,1); then, if Ug(Z), Z has the pdf f(z).
This theorem suggests the following procedure for generating a vector, x, n by 1, of RVs from a distribution having the pdf f(z): Generate two independent RVs, distributed uniformly on [0,1), say z and u. Calculate f(z); if u f(z), insert the value z as the next element in x; if not, try again. Here g(z) and h(z) are both the uniform density. Go on until you have filled the vector x.[12] Here is a short GAUSS program to generate a vector of 10000 RVs coming from the exponential distribution using the von Neumann method:
SAMPLESZ = 10000;
i = 1; x = zeros(samplesz,1);
do while i le samplesz;
T = 0;
again:
u = rndu(1,1); sum = 0; N = 2;
do while sum < u;
v = rndu(1,1); sum = sum + v; N = N + 1;
endo;
if (N % 2 eq 0); T = T +1;
goto again;
endif; x[i,1] = T + u;
i = i + 1;
endo;
The algorithm used by GAUSS to generate normal 0-1 deviates is of the AR type (Kinderman and Ramage, op. cit.). The AR method can be made quite efficient by clever programming and is the basis for many of the algorithms which are used in practice The method, however, has problems with "thick-tailed" distributions, since, for such distributions, the ratio of rejections to acceptances will generally be high, leading to the necessity of generating a very large number of uniform RNs in order to achieve the desired final number. And, as is the case in general, formulations which require the evaluation of exponential, logarithmic, trigonometric, or other special functions may be inefficient simply because these evaluations are themselves time consuming.
--Continued next file: MonteCarlo2.doc
Exercises
Part 1: Random Number Generation
1. Show that the transformation (1) of two variables each independently distributed on the unit interval yields two independently identically distributed normal variates with unit variances and zero means, with positive probability on the whole real plane. Follow the development of Hogg and Craig, 4th edition, section 4.3, "Transformations of Variables of the Continuous Type." Example 8 is exactly this problem. The variables (u1, u2) are distributed with positive probability density on the unit square, [0,0], [0,1], [1,0], [1,1]; whereas the variables (y1 y2) are distributed with positive probability density on the whole real plane. Pay particular attention as to how the region of positive probability for the uniformly distributed variates is transformed by (1) into the region of positive probability for the normal variates. (See H&C, sec. 4.3, Examples 3-7.)
2. Write a GAUSS program to generate 1000 random numbers in the interval [0,1) using RNDU. Divide the interval [0,1) into 10 equal parts and find the frequency of numbers falling in each category. Compare this with the theoretical frequency for each category of 1/10 using a standard Chi-square test to assess the statistical significance of the deviations you find. Use the same random numbers to implement a Kolmogorov-Smirnov test. The K-S test is unfortunately not referenced in the 4th and later editions of H&C, but you can find a good discussion in A. M. Mood, F. A. Graybill, and D. C. Boes, Introduction to the Theory of Statistics, 3rd edition, New York: McGraw-Hill, 1974, pp.508-510. Knuth, pp.41-51, and Press, et al., pp. 472-475, also have an extended discussions. Critical values for the test are tabulated in CRC Standard Probability and Statistics Tables and Formulae, ed. by W. H. Beyer, Boca Raton: CRC Press, 1991, p. 334. How would you modify these tests to check whether RNDN produces variates which are independent standard normal?
3. Write a GAUSS program to generate 1000 3 by 1 vectors having a trivariate normal distribution with mean vector (1, 2, 3)' and variance-covariance matrix:
6.250 1.875 0.625
1.875 2.8125 0.9375
0.625 0.9375 1.3125.
How would you modify your program to produce random normal deviates with a specified correlation matrix?
4. Write a GAUSS program to generate 1000 random nonuniform numbers using the GAUSS commands
RNDN (normal), RNDBETA (beta, choose shape parameters both = 2), RNDGAM (gamma, choose parameter alpha = 2), and RNDNB (negative binomial, choose parameters = 1.5 and 0.5; note that this is a discrete distribution). The Poisson distribution is treated in Exercise 6 below. Compute the Kolmogorov-Smirnov test statistic against the null of the true distribution, and graph the empirical against the theoretical distribution. Do the same for the t-distribution for which there is no GAUSS generator, with degrees of freedom = 2. (See Baird's programs, available at the GAUSS site at American University,
Note that the cumulative of the t-distribution is required and that you can use the GAUSS routine, CFDTC.) GAUSS does not supply all of the other cumulatives you need; you will have to write these yourself or get them from Baird. Note that the Poisson distribution is a discrete distribution and that the Kolmogorov-Smirnov test based on continuous cumulative distribution will give you weird results. See Exercise 6, below.
5. Write a GAUSS program to generate 1000 RVs from a binomial distribution with 20 trials and probability 0.75 of success. Use a Chi-square test to check whether the variates you generated could have come from a binomial with probability 0.5 of success.
6. The Poisson distribution, which is widely used in the study of event data in econometrics (Greene, 2nd ed., pp. 676-679), has the pdf
(Hogg and Craig, 4th ed., pp. 99-102.) This discrete distribution has, among others, the following properties, which may be useful in formulating computer algorithms to generate Poisson variates from uniform variates (Johnson, Kotz and Kemp, Univariate Discrete Distributions, 2nd. ed., Wiley, 1992, pp. 158-162):