Maximum Likelihood Estimation

Binomial Model

We will use a simple hypothetical example based on the binomial distribution to introduce the concepts of maximum likelihood estimation and the likelihood ratio test.

We have a bag with a large number of balls of equal size and weight. Some are white, the others are black. We want to estimate the proportion, $\theta$, of white balls; the chance of selecting a white ball is $\theta$. Suppose we select a ball $n$ times, replacing it and mixing after each selection ("sampling with replacement"). Then the chance of selecting a white ball is $\theta$ each time, and the individual trials (selections) are independent of each other.

The number, $x$, of white balls in the $n$ trials is binomially distributed, that is:

$$ p(x \mid \theta) \;=\; \binom{n}{x}\,\theta^{x}\,(1-\theta)^{\,n-x}, \qquad \theta \in [0,1] $$

where $\displaystyle \binom{n}{x} \;=\; \frac{n!}{x!\,(n-x)!} \;=\; \frac{n(n-1)\cdots(n-x+1)}{x!}$ \quad [$x$ factors in both numerator and denominator].

You will probably recognize this as the binomial distribution with parameters $n$ and $\theta$.
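If you want to evaluate this probability numerically, here is a minimal sketch (assuming a Python environment with scipy available; the values $n = 20$, $\theta = 0.65$ and $x = 13$ are just the illustrative ones used later on this page):

```python
# Minimal sketch, assuming Python with scipy installed.
from scipy.special import comb
from scipy.stats import binom

n, theta, x = 20, 0.65, 13   # illustrative values used later on this page

# Direct evaluation of p(x | theta) = C(n, x) * theta^x * (1 - theta)^(n - x)
p_direct = comb(n, x) * theta**x * (1 - theta)**(n - x)

# The same value via scipy's binomial distribution
p_scipy = binom.pmf(x, n, theta)

print(p_direct, p_scipy)     # both approximately 0.184
```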

Maximum Likelihood Estimator

Suppose now that we have conducted our trials; then we know the value of $x$ (and $n$, of course) but not $\theta$. This is the reverse of the situation we know from probability theory, where we assume we know the value of $\theta$ and from it work out the probability of the result $x$, i.e. the probability of $x$ given $\theta$, $p(x \mid \theta)$.

What is the likelihood of the parameter having the value $\theta$ when the result of the experiment is $x$? (We use the word likelihood instead of probability to avoid confusion.) Define the likelihood function:

$$ L(\theta) \;:=\; L(\theta \mid x) \;:=\; p(x \mid \theta) \;=\; \binom{n}{x}\,\theta^{x}\,(1-\theta)^{\,n-x} $$

The greater $L(\theta)$ is, the more "likely" it is that $\theta$ is the true value. To find the value of $\theta$ that maximizes $L(\theta)$, take logarithms and differentiate:

$$ l(\theta) \;=\; \ln L(\theta) \;=\; \ln\binom{n}{x} + x\ln\theta + (n-x)\ln(1-\theta) $$

$$ \frac{\partial\, l(\theta)}{\partial\theta} \;=\; \frac{x}{\theta} - \frac{n-x}{1-\theta} $$

$$ \frac{\partial\, l(\theta)}{\partial\theta} \;\;\begin{cases} >0, & \theta < x/n \\ =0, & \theta = x/n \\ <0, & \theta > x/n \end{cases} $$

So $\hat{\theta} = x/n$ is the value of $\theta$ for which $L(\theta)$ is greatest. This is called the maximum likelihood estimator (MLE) of $\theta$.
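This can be checked numerically; the sketch below (same assumed Python/scipy environment, with a purely illustrative grid search) confirms that the likelihood peaks at $x/n$:

```python
# Sketch: confirm numerically that L(theta) peaks at theta = x/n.
import numpy as np
from scipy.stats import binom

n, x = 20, 13                              # a hypothetical observed result
thetas = np.linspace(0.001, 0.999, 999)    # grid over the open interval (0, 1)
L = binom.pmf(x, n, thetas)                # likelihood L(theta) = p(x | theta)

theta_hat = thetas[np.argmax(L)]
print(theta_hat, x / n)                    # both approximately 0.65: the maximum sits at x/n
```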

Likelihood Ratio Test

Suppose that, instead of trying to estimate $\theta$, we have an a priori idea about its value, i.e. we propose the hypothesis $H\colon \theta = \theta_0$, and we want to test whether this is acceptable. Suppose we have conducted the trial and the number of white balls was $x$. Define the likelihood ratio as

$$ \mathrm{LR}(x) \;=\; \frac{L(\theta_0 \mid x)}{L(\hat{\theta} \mid x)} $$

where $\hat{\theta}$ is the maximum likelihood estimate defined above. So LR is the ratio of the likelihood under the hypothesis to the maximum likelihood. As $\hat{\theta}$ maximizes $L(\theta)$ over $\theta \in [0,1]$, and $\theta_0$ is just one of these values, we have $0 < \mathrm{LR} \le 1$. (Obviously $\mathrm{LR} = 1$ when $\theta_0 = \hat{\theta}$, but remember that we do not know the value of $\hat{\theta}$ until we have conducted the trial.)

In the case of the binomial example we have been looking at:

$$ \mathrm{LR}(x) \;=\; \frac{\theta_0^{\,x}\,(1-\theta_0)^{\,n-x}}{(x/n)^{x}\,\bigl((n-x)/n\bigr)^{\,n-x}} \;=\; \left(\frac{n\theta_0}{x}\right)^{\!x} \left(\frac{n(1-\theta_0)}{n-x}\right)^{\!n-x} $$

Clearly this has its maximum, $\mathrm{LR} = 1$, when $x = n\theta_0$, and falls off on either side of this value.

Plotting $\mathrm{LR}(x)$ for $n = 20$ and $H\colon \theta = 0.65$ shows a maximum (of 1) at $x = 20 \times 0.65 = 13$.
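Such a plot is easy to reproduce; a short sketch under the same assumptions (Python with numpy and scipy):

```python
# Sketch: likelihood ratio LR(x) for n = 20 and H: theta = 0.65.
import numpy as np
from scipy.stats import binom

n, theta0 = 20, 0.65
x = np.arange(0, n + 1)

L0 = binom.pmf(x, n, theta0)       # L(theta_0 | x)
Lhat = binom.pmf(x, n, x / n)      # L(theta_hat | x), with theta_hat = x/n
LR = L0 / Lhat

print(int(np.argmax(LR)), LR.max())   # maximum LR = 1 at x = 13
# LR could then be plotted against x, e.g. with matplotlib.
```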

Significance Probability

Note that the likelihood ratio $\mathrm{LR}(x)$ lies between 0 and 1, and the greater its value, the more acceptable the hypothesis. But what criterion do we use to decide whether or not to accept the hypothesis? We define the significance probability (SP) of $x$ as:

$$ \mathrm{SP}(x) \;=\; P\{\, y : \mathrm{LR}(y) \le \mathrm{LR}(x) \,\} $$

I.e. this is the probability, computed under the hypothesis, of obtaining a result that is as likely as or less likely than the result actually obtained.

It is possible, but messy, to work this out explicitly (see Calculating MLE Statistics), though modern computer packages make this a more realistic option. However, it is still quite common to use approximations for the SP, as demonstrated on the Chi-squared Test for Binomial page.
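For a small binomial example like this one, the SP can be computed exactly by brute force: enumerate every possible outcome $y$, keep those with $\mathrm{LR}(y) \le \mathrm{LR}(x)$, and add up their probabilities under the hypothesis. A minimal sketch (the helper name significance_probability is hypothetical; same assumed Python/scipy environment):

```python
# Sketch: exact significance probability for the binomial model.
import numpy as np
from scipy.stats import binom

def significance_probability(x, n, theta0):
    """SP(x) = P{ y : LR(y) <= LR(x) }, computed under H: theta = theta0."""
    y = np.arange(0, n + 1)
    LR = binom.pmf(y, n, theta0) / binom.pmf(y, n, y / n)  # likelihood ratios
    keep = LR <= LR[x]            # outcomes as likely or less likely than x
    return binom.pmf(y[keep], n, theta0).sum()

print(significance_probability(11, 20, 0.65))   # ~0.483, as in the example below
```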

However the SP is calculated, its main use is in deciding whether or not we accept the hypothesis. The usual procedure is to decide on an arbitrary level of the test, usually designated $\alpha$, where $\alpha$ = 5%, 1%, 0.5%, etc., and to reject the hypothesis if the SP falls below this level.

Example

Consider taking 20 samples with replacement in our white and black balls example, where we want to test the hypothesis that 65% of the balls are white ($\theta = 0.65$). What is the significance probability of getting a result of 11 white balls?

By looking at the plot of LR described in the section above, we can see that $\mathrm{LR}(x) \le \mathrm{LR}(11)$ for $0 \le x \le 11$ and for $15 \le x \le 20$.

$$ P\{\, 0 \le x \le 11 \ \text{or}\ 15 \le x \le 20 \,\} \;=\; B(11;\,20,\,0.65) + 1 - B(14;\,20,\,0.65) \;=\; 0.2376 + 1 - 0.7546 \;=\; 0.4830 $$

where $B(x;\,n,\,\theta)$ denotes the cumulative binomial distribution function, evaluated here under the hypothesis $\theta = 0.65$.

(These values were calculated using the Mathyma binomial distribution look-up facility.)
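Any binomial CDF routine gives the same figure; for instance, a short check in the same assumed Python/scipy environment:

```python
# Sketch: SP for x = 11 via the binomial CDF, B(x; n, theta).
from scipy.stats import binom

n, theta0 = 20, 0.65
sp = binom.cdf(11, n, theta0) + (1 - binom.cdf(14, n, theta0))
print(sp)                        # approximately 0.4830
```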

Confidence Interval

In the first section we showed that the MLE is $\hat{\theta} = x/n$. This is a point estimate for $\theta$. Although it is the most "likely" value for $\theta$ given the result, it is unlikely to be exactly the true value of $\theta$.

We define the $k$% confidence interval as the range of values of $\theta_0$ for which $\mathrm{SP} > (100 - k)\%$, i.e. the range of values of $\theta_0$ for which we would accept the hypothesis $H\colon \theta = \theta_0$ at the $(100 - k)\%$ level.

[E.g. the usual value for the level of the test, $\alpha$, is 0.05 or 5%. The corresponding confidence interval is then the $(100 - 5)\% = 95\%$ confidence interval.]

Example

Continuing with our example, suppose we select a ball from the bag 20 times, and the result is a white ball 7 times. From this we would conclude that the maximum likelihood estimate of $\theta$, the proportion of white balls in the bag, is $7/20$, i.e. $\hat{\theta} = 0.35$. What is the 95% confidence interval?

If we had been testing the hypothesis $H\colon \theta = 0.35$, then the significance probability of 7 white balls out of 20 would have been 100%.

For the hypothesis $H\colon \theta = 0.5$, the SP would be 26.32% and we would accept the hypothesis. At $\theta = 0.6$ the SP is 3.7%, which would lead to rejection.

To three decimal places, the upper boundary at which the SP crosses 5% is at $\theta = 0.580$. (In fact the SP for $\theta = 0.580$ is 6.73%, so accept, while the SP for $\theta = 0.581$ is 4.21%, so reject.)

On the lower side, the SP is 1.27% for $\theta = 0.135$ and 6.69% for $\theta = 0.136$, so the confidence interval is $0.136 \le \theta \le 0.580$. Note that the point estimate $\hat{\theta} = 0.35$ is not exactly in the middle of the interval.

This trial-and-error approach is a somewhat laborious way of determining the confidence interval. A more practical way of calculating it will be introduced after we have seen how to approximate a binomial distribution by a normal one.
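That said, the trial-and-error search above is easy to automate by scanning a fine grid of $\theta_0$ values and keeping those whose SP exceeds 5%; a brute-force sketch, reusing the hypothetical significance_probability helper from the Significance Probability section:

```python
# Sketch: 95% confidence interval by brute-force scan over theta_0.
# Reuses significance_probability() from the Significance Probability sketch.
import numpy as np

x, n, alpha = 7, 20, 0.05
grid = np.round(np.arange(0.001, 1.0, 0.001), 3)   # candidate theta_0 values
accepted = [t for t in grid if significance_probability(x, n, t) > alpha]

print(min(accepted), max(accepted))   # roughly 0.136 and 0.580, as found above
```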