Statistics 550 Notes 13
Reading: Section 2.3.
Schedule:
1. Take home midterm due Wed. Oct. 25th
2. No class next Tuesday due to fall break. We will have class on Thursday.
3. The next homework will be assigned next week and due Friday, Nov. 3rd.
I. Asymptotic Relative Efficiency (Clarification from last class)
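Consider two estimators $T_n$ and $U_n$ of $\theta$ and suppose that
$\sqrt{n}(T_n - \theta) \to N(0, \sigma_T^2)$ in distribution
and that
$\sqrt{n}(U_n - \theta) \to N(0, \sigma_U^2)$ in distribution.
We define the asymptotic relative efficiency of $U_n$ to $T_n$ by $ARE(U_n, T_n) = \sigma_T^2 / \sigma_U^2$. For iid $N(\mu, \sigma^2)$ data, $ARE(\text{median}, \text{mean}) = \sigma^2 / (\pi\sigma^2/2) = 2/\pi \approx 0.64$.
The interpretation is that if person A uses the sample median as her estimator of $\mu$ and person B uses the sample mean as her estimator of $\mu$, person B needs a sample size that is only about 0.64 times as large as person A's to obtain the same approximate variance of the estimator.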
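The value $2/\pi$ can be checked by simulation. The following is a minimal sketch, assuming standard normal data; the sample size, number of replications, and seed are arbitrary illustrative choices, and the function name `simulate_are` is hypothetical.

```python
import math
import random
import statistics

def simulate_are(n=101, reps=4000, seed=0):
    """Estimate Var(sample mean) / Var(sample median) for iid N(0, 1) samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # Ratio of Monte Carlo variances approximates ARE(median, mean)
    return statistics.pvariance(means) / statistics.pvariance(medians)

are_hat = simulate_are()
print(f"simulated ARE(median, mean) = {are_hat:.3f}; theory 2/pi = {2 / math.pi:.3f}")
```

The simulated ratio should land near $2/\pi$, up to Monte Carlo error and a small finite-sample effect.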
Theorem: If $\hat{\theta}_{MLE}$ is the MLE and $\tilde{\theta}$ is any other estimator, then
$ARE(\tilde{\theta}, \hat{\theta}_{MLE}) \le 1$.
Thus, the MLE has the smallest asymptotic variance and we say that the MLE is asymptotically efficient and asymptotically optimal.
Comments: (1) We will provide an outline of the proof of this theorem when we study the Cramér-Rao (information) inequality in Chapter 3.4; (2) The result is actually more subtle than the stated theorem because it only covers a certain class of well-behaved estimators; more details will be studied in Stat 552.
II. Uniqueness and Existence of the MLE
For a finite sample, when does the MLE exist, when is it unique and how do we find the MLE?
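If $\Theta$ is open, the log likelihood $l_x(\theta) = \log p(x, \theta)$ is differentiable in $\theta$, and the MLE $\hat{\theta}$ exists, then $\hat{\theta}$ must satisfy the estimating equation
$\nabla_\theta l_x(\theta) = 0$.   (1)
This is known as the likelihood equation.
But solving (1) does not necessarily yield the MLE, as there may be solutions of (1) that are not maxima, or solutions that are only local maxima.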
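A small numerical illustration of a likelihood equation with several roots: the sketch below uses a Cauchy location model with two hypothetical observations, $-5$ and $5$. This model is not from these notes; it is chosen only because its score equation has roots that can be checked by hand, at $\theta = 0$ and $\theta = \pm\sqrt{24}$.

```python
import math

data = [-5.0, 5.0]  # hypothetical observations, chosen so the roots are known exactly

def loglik(theta):
    # Cauchy location log likelihood, dropping the constant -n*log(pi)
    return -sum(math.log(1.0 + (x - theta) ** 2) for x in data)

def score(theta):
    # Derivative of loglik with respect to theta
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in data)

roots = [-math.sqrt(24.0), 0.0, math.sqrt(24.0)]
for r in roots:
    print(f"theta = {r:+.4f}  score = {score(r):+.2e}  loglik = {loglik(r):.4f}")
```

All three points solve the likelihood equation, but $\theta = 0$ is a local minimum of the log likelihood, and the two outer roots tie for the maximum, so the equation alone identifies neither a maximum nor a unique one.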
Anomalies of maximum likelihood estimates:
Maximum likelihood estimates are not necessarily unique and do not even have to exist.
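Nonuniqueness of MLEs example: $X_1, \dots, X_n$ are iid Uniform($\theta - 1/2$, $\theta + 1/2$).
The likelihood is
$L(\theta) = 1$ if $\max_i X_i - 1/2 \le \theta \le \min_i X_i + 1/2$, and $L(\theta) = 0$ otherwise.
Thus any estimator $\hat{\theta}$ that satisfies $\max_i X_i - 1/2 \le \hat{\theta} \le \min_i X_i + 1/2$ is a maximum likelihood estimator.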
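To see the flat likelihood concretely, here is a minimal sketch assuming the Uniform($\theta - 1/2$, $\theta + 1/2$) model and a hypothetical three-point sample (values chosen to be exactly representable in floating point).

```python
def likelihood(theta, xs):
    """Likelihood under Uniform(theta - 1/2, theta + 1/2): 1 if all points fit, else 0."""
    return 1.0 if all(theta - 0.5 <= x <= theta + 0.5 for x in xs) else 0.0

xs = [0.875, 1.0, 1.25]          # hypothetical sample
lower = max(xs) - 0.5            # smallest theta with positive likelihood
upper = min(xs) + 0.5            # largest theta with positive likelihood
midrange = (min(xs) + max(xs)) / 2

for theta in (0.5, lower, midrange, upper, 1.5):
    print(theta, likelihood(theta, xs))
```

Every value in the interval from `lower` to `upper`, including the midrange, attains the maximal likelihood, so each is an MLE.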
Nonexistence of maximum likelihood estimator: The likelihood function can be unbounded. An important example is a mixture of normal distributions, which is frequently used in applications.
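$X_1, \dots, X_n$ iid with density
$f(x) = p \frac{1}{\sqrt{2\pi}\sigma_1} \exp\{-\frac{(x - \mu_1)^2}{2\sigma_1^2}\} + (1 - p) \frac{1}{\sqrt{2\pi}\sigma_2} \exp\{-\frac{(x - \mu_2)^2}{2\sigma_2^2}\}$. This is a mixture of two normal distributions. The unknown parameters are $(p, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$.
Let $\mu_1 = X_1$. Then $f(X_1) \to \infty$ as $\sigma_1 \to 0$, while for fixed $0 < p < 1$ the second component keeps the other factors $f(X_i)$ bounded away from zero, so that the likelihood function is unbounded.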
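The unboundedness can be seen numerically. The sketch below assumes a two-component normal mixture with mixing weight $p = 0.5$, centers the first component at the first observation, and shrinks its standard deviation; the sample and the fixed second component $N(0, 1)$ are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(xs, p, mu1, sigma1, mu2, sigma2):
    # Log likelihood of a two-component normal mixture
    return sum(
        math.log(p * normal_pdf(x, mu1, sigma1) + (1.0 - p) * normal_pdf(x, mu2, sigma2))
        for x in xs
    )

xs = [0.3, -1.2, 0.8, 2.1, -0.4]        # hypothetical sample
sigmas = (1e-1, 1e-2, 1e-4, 1e-8)       # sigma_1 shrinking toward 0
logliks = [mixture_loglik(xs, 0.5, xs[0], s, 0.0, 1.0) for s in sigmas]
for s, ll in zip(sigmas, logliks):
    print(f"sigma_1 = {s:.0e}  log likelihood = {ll:.3f}")
```

The log likelihood keeps increasing as the first component's standard deviation shrinks, so no maximizer exists.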
Example where the MLE exists and is unique: Normal distribution
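$X_1, \dots, X_n$ iid $N(\mu, \sigma^2)$, with log likelihood
$l(\mu, \sigma) = -n \log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^n (X_i - \mu)^2$.
The partials with respect to $\mu$ and $\sigma$ are
$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu)$,
$\frac{\partial l}{\partial \sigma} = -\frac{n}{\sigma} + \sigma^{-3} \sum_{i=1}^n (X_i - \mu)^2$.
Setting the first partial equal to zero and solving for the mle, we obtain
$\hat{\mu} = \bar{X}$.
Setting the second partial equal to zero and substituting the mle for $\mu$, we find that the mle for $\sigma$ is
$\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2}$.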
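As a quick check of the closed-form solution, the sketch below computes the normal MLE on simulated data and evaluates both partial derivatives there. It assumes the log likelihood is parametrized by $(\mu, \sigma)$; the true values 2 and 3, the sample size, and the function names are illustrative.

```python
import math
import random

def normal_mle(xs):
    """Closed-form MLE of (mu, sigma) for iid normal data (divisor n, not n - 1)."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat

def score(xs, mu, sigma):
    """Partial derivatives of the normal log likelihood with respect to mu and sigma."""
    n = len(xs)
    d_mu = sum(x - mu for x in xs) / sigma ** 2
    d_sigma = -n / sigma + sum((x - mu) ** 2 for x in xs) / sigma ** 3
    return d_mu, d_sigma

rng = random.Random(1)
xs = [rng.gauss(2.0, 3.0) for _ in range(500)]   # simulated data, mu = 2, sigma = 3
mu_hat, sigma_hat = normal_mle(xs)
print(mu_hat, sigma_hat, score(xs, mu_hat, sigma_hat))
```

Both components of the score are zero at the MLE up to floating-point rounding.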
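To verify that this critical point is a maximum, we need to check the following second derivative conditions at $(\hat{\mu}, \hat{\sigma})$:
(1) The two second-order partial derivatives are negative:
$\frac{\partial^2 l}{\partial \mu^2} < 0$ and $\frac{\partial^2 l}{\partial \sigma^2} < 0$.
(2) The Jacobian of the second-order partial derivatives is positive,
$\frac{\partial^2 l}{\partial \mu^2} \frac{\partial^2 l}{\partial \sigma^2} - \left( \frac{\partial^2 l}{\partial \mu \partial \sigma} \right)^2 > 0$.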
See attached notes from Casella and Berger for verification of (1) and (2) for normal distribution.
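For reference, a short worked version of this check, assuming the log likelihood is parametrized by $(\mu, \sigma)$ as $l(\mu, \sigma) = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum (X_i - \mu)^2$ (the attached notes may use a different parametrization):

```latex
\begin{align*}
\frac{\partial^2 l}{\partial \mu^2} &= -\frac{n}{\sigma^2}, \qquad
\frac{\partial^2 l}{\partial \sigma^2} = \frac{n}{\sigma^2} - \frac{3}{\sigma^4}\sum_{i=1}^n (X_i - \mu)^2, \qquad
\frac{\partial^2 l}{\partial \mu\,\partial \sigma} = -\frac{2}{\sigma^3}\sum_{i=1}^n (X_i - \mu). \\
\intertext{At $(\hat{\mu}, \hat{\sigma})$ we have $\sum (X_i - \hat{\mu}) = 0$ and $\sum (X_i - \hat{\mu})^2 = n\hat{\sigma}^2$, so}
\frac{\partial^2 l}{\partial \mu^2} &= -\frac{n}{\hat{\sigma}^2} < 0, \qquad
\frac{\partial^2 l}{\partial \sigma^2} = \frac{n}{\hat{\sigma}^2} - \frac{3n}{\hat{\sigma}^2} = -\frac{2n}{\hat{\sigma}^2} < 0, \qquad
\frac{\partial^2 l}{\partial \mu\,\partial \sigma} = 0, \\
\frac{\partial^2 l}{\partial \mu^2}\,\frac{\partial^2 l}{\partial \sigma^2}
 - \left(\frac{\partial^2 l}{\partial \mu\,\partial \sigma}\right)^2 &= \frac{2n^2}{\hat{\sigma}^4} > 0.
\end{align*}
```

Both conditions hold whenever $\hat{\sigma} > 0$, that is, whenever the observations are not all equal.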
Conditions for uniqueness and existence of the MLE: We now provide a general condition under which there is a unique maximum likelihood estimator that is the solution to the likelihood equation. The condition applies to many exponential families.
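Boundary of a parameter space: Suppose the parameter space $\Theta \subset R^p$ is an open set. Let $\partial\Theta = \bar{\Theta} - \Theta$ be the boundary of $\Theta$, where $\bar{\Theta}$ denotes the closure of $\Theta$ in $[-\infty, \infty]^p$. That is, $\partial\Theta$ is the set of points outside of $\Theta$ that can be obtained as limits of points in $\Theta$, including all points with $\pm\infty$ as a coordinate. For instance, for $X \sim N(\mu, \sigma^2)$ with $\Theta = \{(\mu, \sigma^2): \mu \in R, \sigma^2 > 0\}$,
$\partial\Theta = \{(a, b): a = \pm\infty, 0 \le b \le \infty\} \cup \{(a, b): a \in R, b \in \{0, \infty\}\}$.
Convergence of points to boundary: In general, for a sequence $\{\theta_m\}$ of points from $\Theta$ open, we define $\theta_m \to \partial\Theta$ as $m \to \infty$ to mean that for any subsequence $\{\theta_{m_k}\}$, either $\theta_{m_k} \to t$ with $t \notin \Theta$, or $\theta_{m_k}$ diverges with $|\theta_{m_k}| \to \infty$ as $k \to \infty$, where $|\cdot|$ denotes the Euclidean norm.
Example: In the $N(\mu, \sigma^2)$ case, for fixed $a \in R$ and $b > 0$, the sequences
$(a, m^{-1}), (m, b), (-m, b), (a, m), (m, m^{-1})$
all tend to $\partial\Theta$ as $m \to \infty$.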
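Lemma 2.3.1: Suppose we are given a function $l: \Theta \to R$ where $\Theta \subset R^p$ is open and $l$ is continuous. Suppose also that
$\lim \{l(\theta): \theta \to \partial\Theta\} = -\infty$.
Then there exists $\hat{\theta} \in \Theta$ such that
$l(\hat{\theta}) = \max \{l(\theta): \theta \in \Theta\}$.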
Proof: Problem 2.3.5.
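Proposition 2.3.1: Suppose our model is that $X$ has pdf or pmf $p(x, \theta)$, $\theta \in \Theta$ with $\Theta \subset R^p$ open, and that (i) $l_x(\theta)$ is strictly concave; (ii) $l_x(\theta) \to -\infty$ as $\theta \to \partial\Theta$. Then the maximum likelihood estimator $\hat{\theta}(x)$ exists and is unique.
Proof: $l_x$ is continuous because $l_x$ is concave, i.e., $-l_x$ is convex (see Appendix B.9). By Lemma 2.3.1, $\hat{\theta}$ exists. To prove uniqueness, suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are distinct maximizers of the likelihood. Then
$l_x(\frac{1}{2}(\hat{\theta}_1 + \hat{\theta}_2)) > \frac{1}{2}(l_x(\hat{\theta}_1) + l_x(\hat{\theta}_2)) = l_x(\hat{\theta}_1)$,
with the inequality following from the strict concavity of $l_x$; this contradicts $\hat{\theta}_1$ being a maximizer of the likelihood.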
Corollary: If the conditions of Proposition 2.3.1 are satisfied and $l_x(\theta)$ is differentiable in $\theta$, then $\hat{\theta}$ is the unique solution to the estimating equation:
$\nabla_\theta l_x(\theta) = 0$.   (2)
Application to Exponential Families:
1. Theorem 1.6.4, Corollary 1.6.5: For a full exponential family, the log likelihood is strictly concave.
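Consider the exponential family
$p(x, \eta) = h(x) \exp\{\eta^T T(x) - A(\eta)\}$, where $\eta^T T(x) = \sum_{j=1}^k \eta_j T_j(x)$.
Note that if $A(\eta)$ is convex, then the log likelihood
$l_x(\eta) = \log h(x) + \eta^T T(x) - A(\eta)$
is concave in $\eta$.
Proof that $A(\eta)$ is convex:
Recall that $A(\eta) = \log \int h(x) \exp\{\eta^T T(x)\} dx$. To show that $A$ is convex, we want to show that, for any two points $\eta^{(1)}, \eta^{(2)}$ in the natural parameter space,
$A(\alpha \eta^{(1)} + (1 - \alpha) \eta^{(2)}) \le \alpha A(\eta^{(1)}) + (1 - \alpha) A(\eta^{(2)})$ for $0 \le \alpha \le 1$,
or equivalently
$\exp\{A(\alpha \eta^{(1)} + (1 - \alpha) \eta^{(2)})\} \le \exp\{\alpha A(\eta^{(1)})\} \exp\{(1 - \alpha) A(\eta^{(2)})\}$.
We use Hölder's Inequality to establish this. Hölder's Inequality (B.9.4 on page 518 of Bickel and Doksum) states that for any two numbers $r$ and $s$ with $r, s > 1$ and $1/r + 1/s = 1$,
$E|XY| \le \{E|X|^r\}^{1/r} \{E|Y|^s\}^{1/s}$.
We have, taking $r = 1/\alpha$ and $s = 1/(1 - \alpha)$ for $0 < \alpha < 1$ (the cases $\alpha = 0, 1$ are trivial),
$\exp\{A(\alpha \eta^{(1)} + (1 - \alpha) \eta^{(2)})\} = \int \exp\{\alpha \eta^{(1)T} T(x)\} \exp\{(1 - \alpha) \eta^{(2)T} T(x)\} h(x) dx \le \left( \int \exp\{\eta^{(1)T} T(x)\} h(x) dx \right)^{\alpha} \left( \int \exp\{\eta^{(2)T} T(x)\} h(x) dx \right)^{1 - \alpha} = \exp\{\alpha A(\eta^{(1)})\} \exp\{(1 - \alpha) A(\eta^{(2)})\}$.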
For a full exponential family, the log likelihood is strictly concave.
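For a curved exponential family, the log likelihood as a function of the lower-dimensional parameter is not guaranteed to be strictly concave (in general it need not even be concave), so Proposition 2.3.1 may not apply directly.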
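The convexity of the log normalizer can be spot-checked numerically. The sketch below assumes a one-parameter family on a finite support with base measure $h(x) = \binom{3}{x}$ (the Binomial(3, .) family in canonical form, chosen only for illustration), where the log normalizer is the log of a finite sum.

```python
import math

support = [0, 1, 2, 3]
h = [math.comb(3, x) for x in support]   # base measure of the Binomial(3, .) family

def A(eta):
    # Log normalizer: log of sum_x h(x) * exp(eta * x)
    return math.log(sum(hx * math.exp(eta * x) for hx, x in zip(h, support)))

def convexity_gap(eta1, eta2, alpha):
    """alpha*A(eta1) + (1-alpha)*A(eta2) - A(alpha*eta1 + (1-alpha)*eta2); >= 0 if A is convex."""
    return alpha * A(eta1) + (1 - alpha) * A(eta2) - A(alpha * eta1 + (1 - alpha) * eta2)

gaps = [
    convexity_gap(e1, e2, a)
    for e1 in (-2.0, 0.0, 1.5)
    for e2 in (-1.0, 0.5, 3.0)
    for a in (0.1, 0.5, 0.9)
]
print(min(gaps))
```

Every gap is strictly positive on this grid, which is consistent with strict convexity of the log normalizer for this full-rank family; a grid check is of course not a proof.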
2. Theorem 2.3.1 and Corollary 2.3.2 spell out specific conditions under which $l_x(\eta) \to -\infty$ as $\eta \to \partial E$ for exponential families, where $E$ is the natural parameter space.
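Example 1: Gamma distribution
$p(x; \alpha, \lambda) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x}$, $x > 0$,
for the parameter space $\Theta = \{(\alpha, \lambda): \alpha > 0, \lambda > 0\}$.
The gamma distribution is a full two-dimensional exponential family so that the log likelihood is strictly concave.
The boundary of the parameter space is
$\partial\Theta = \{(a, b): a \in \{0, \infty\}, 0 \le b \le \infty\} \cup \{(a, b): 0 \le a \le \infty, b \in \{0, \infty\}\}$.
Can check that $\lim_{\theta \to \partial\Theta} l_x(\theta) = -\infty$ (provided the observations are not all equal).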
Thus, by Proposition 2.3.1, the MLE is the unique solution to the likelihood equation.
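The partial derivatives of the log likelihood
$l(\alpha, \lambda) = n\alpha \log\lambda + (\alpha - 1) \sum_{i=1}^n \log X_i - \lambda \sum_{i=1}^n X_i - n \log\Gamma(\alpha)$
are
$\frac{\partial l}{\partial \alpha} = n \log\lambda + \sum_{i=1}^n \log X_i - n \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$,
$\frac{\partial l}{\partial \lambda} = \frac{n\alpha}{\lambda} - \sum_{i=1}^n X_i$.
Setting the second partial derivative (with respect to $\lambda$) equal to zero, we find
$\hat{\lambda} = \frac{n\hat{\alpha}}{\sum_{i=1}^n X_i} = \frac{\hat{\alpha}}{\bar{X}}$.
When this solution is substituted into the first partial derivative, we obtain a nonlinear equation for the MLE of $\alpha$:
$n \log\hat{\alpha} - n \log\bar{X} + \sum_{i=1}^n \log X_i - n \frac{\Gamma'(\hat{\alpha})}{\Gamma(\hat{\alpha})} = 0$.
This equation cannot be solved in closed form.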
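Although there is no closed form, the equation for the shape parameter is easy to solve numerically. The sketch below is one illustrative approach, not necessarily the method of Chapter 2.4: it uses plain bisection, and it approximates $\Gamma'/\Gamma$ by a central difference of `math.lgamma` because the Python standard library has no digamma function. The function names, the bracket, and the simulated data (shape 3, rate 2) are assumptions.

```python
import math
import random

def digamma(x, rel_step=1e-5):
    """Central-difference approximation to d/dx log Gamma(x); not an exact digamma."""
    h = rel_step * min(1.0, x)
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def gamma_mle(xs, lo=1e-3, hi=1e3, iters=200):
    """Solve log(a) - digamma(a) = log(mean x) - mean(log x) for the shape a by bisection."""
    n = len(xs)
    xbar = sum(xs) / n
    c = math.log(xbar) - sum(math.log(x) for x in xs) / n  # > 0 unless all x are equal

    def g(alpha):
        # Likelihood equation for alpha divided by n; decreasing in alpha
        return math.log(alpha) - digamma(alpha) - c

    if not (g(lo) > 0.0 > g(hi)):
        raise ValueError("root not bracketed; widen [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid        # root lies to the right of mid
        else:
            hi = mid        # root lies to the left of mid
    alpha_hat = 0.5 * (lo + hi)
    return alpha_hat, alpha_hat / xbar   # rate MLE is alpha_hat / xbar

rng = random.Random(2)
# random.gammavariate takes (shape, scale); scale = 1 / rate
xs = [rng.gammavariate(3.0, 1.0 / 2.0) for _ in range(2000)]
alpha_hat, lam_hat = gamma_mle(xs)
print(alpha_hat, lam_hat)
```

Bisection is slow but safe here because the function being zeroed is monotone in the shape parameter, so the bracketed root is the unique solution.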
Next topic: Numerical methods for finding the MLE (Chapter 2.4).