Goodness of Fit testing

All of the models and approaches we have discussed so far make very specific assumptions concerning model fit that must be tested before using MARK. Thus, as a first step, you need to confirm that your starting (general) model adequately fits the data, using GOF tests.

There are two primary purposes for GOF testing:

·  It is a necessary first step to ensure that the most general model in your candidate model set adequately fits the data.

·  Comparing the relative fit of a general model with a reduced-parameter model provides good inference only if the more general model adequately fits the data.

What do you do if you run a GOF test and it shows that the model doesn't adequately fit the data?

Well, it forces you to look at the data and ask why it doesn't fit, and of course the answers can be quite revealing.

So what causes lack of fit, and what do we mean by it? We mean that the arrangement of the data does not meet the expectations determined by the assumptions underlying the model. In the context of simple mark-recapture, these assumptions, sometimes known as the ‘CJS assumptions’, are:

1. Every marked animal present in the population at time (i) has the same probability of recapture (pi)

2. Every marked animal in the population immediately after time (i) has the same probability of surviving to time (i+1)

3. Marks are not lost or missed.

4. All samples are instantaneous, relative to the interval between occasion (i) and (i+1), and each release is made immediately after the sample.

GOF testing is a diagnostic procedure for testing the assumptions underlying the model(s) we are trying to fit to the data. To accommodate (adjust for, correct for...) lack of fit, we first need some measure of how much extra-binomial ‘noise’ (variation) we have. The magnitude of this over-dispersion cannot be derived directly from the various significance tests that are available for GOF testing, and as such we need to come up with some way to quantify the amount of over-dispersion. This measure is known as a variance inflation factor (ĉ, or phonetically, ‘c-hat’).

C-hat is a measure of the lack of fit between the general and saturated models, and so as the general model gets ‘further away’ from the saturated model, c-hat increases above 1. Now, a saturated model is loosely defined as the model where the number of parameters equals the number of data points - as such, the fit of the saturated model to the data is effectively ‘perfect’ (or as good as it’s going to get).

Many different approaches to estimating c-hat are available, and you can find a full description of them in Chapter 5 of the MARK book. They include the program RELEASE GOF tests within MARK itself (only applicable to the recapture model), and the bootstrap and median c-hat approaches, which use simulation and re-sampling to generate the estimate of c-hat. Rather than assuming that the distribution of the model deviance is in fact χ2 distributed (since it generally isn’t, for typical ’MARK data’), the bootstrap and median c-hat approaches generate the distribution of model deviances, given the data, and compare the observed value against this generated distribution. The disadvantage of the bootstrap and median c-hat approaches (beyond some technical issues) is that both merely estimate c-hat. While this is useful (in a practical sense), it reveals nothing about the underlying sources of lack of fit.

Program RELEASE GOF has two tests associated with it:

Test 2 – tests the assumption that all marked animals are equally detectable - the first CJS assumption.

Test 3 – tests the assumption that all marked animals alive at (i) have the same probability of surviving to (i+1) - the second CJS assumption.
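Under the hood, these are essentially contingency-table χ2 tests built from the table of releases and recaptures (the m-array). As a rough illustration only - the counts below are invented, and the table is deliberately simplified rather than following RELEASE’s actual pooling rules - a test of this kind looks like:

```python
# Illustrative sketch of a RELEASE-style contingency-table chi-square test.
# Rows split a release cohort into newly marked vs. previously marked animals;
# columns split each group into "seen again" vs. "never seen again". Under the
# CJS assumptions both groups should behave the same, so a large chi-square
# (small P) signals lack of fit.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([
    [120, 60],   # newly marked:      seen again, never seen again
    [ 80, 90],   # previously marked: seen again, never seen again
])

chi2, p, df, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p:.4f}")
```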

It is easy to run: simply highlight your most parameterised model (e.g., in the dipper case, Phi(t)p(t)) and select RELEASE GOF under the “tests” tab. It will run automatically and spawn a Notepad window. BUT I don’t want you to use this, and if you want to know more then read pages 147-155 (Chp5-9 – Chp 5-17).

POINT TO REMEMBER – USE THESE TESTS ONLY ON YOUR MOST GENERAL MODEL, I.E. THE ONE WITH THE MOST PARAMETERS.

e.g. IN THE DIPPER CASE Phi(t)p(t).

e.g. IN THE SWIFT CASE Phi(c*t)p(c*t).

BOOTSTRAP APPROACH.

The bootstrap approach simulates data based on the parameter estimates of the model. These simulated data exactly meet the assumptions of the model, i.e., no over-dispersion is included, animals are totally independent, and no violations of model assumptions are included. Data are simulated based on the number of animals released at each occasion.

So basically the bootstrap will generate a whole set of deviances, c-hat estimates, AICc values, etc., and you would look at where your model deviance falls in the distribution. So if you ran 100 simulations and your model fell between the 80th and 81st simulation (when sorted by deviance values from lowest to highest), you would have an approximation that your model deviance was reasonably likely to be observed: P ≈ 0.20 (20 simulations with a higher deviance/100).

How many simulations to use? Run 100 simulations, and do a rough comparison of where the observed deviance falls on the distribution of these 100 values. If the “P value” is > 0.2, then doing more simulations is probably a waste of time - the results are unlikely to change much (although obviously the precision of your estimate of the P-value will improve). However, as the value gets closer to nominal significance (say, if the observed P-value is < 0.2), then it is probably worth doing ≫ 100 simulations (say, 500 or 1000). Note that this is likely to take a long time (relatively speaking, depending on the speed of your computer).
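As a minimal sketch of the ranking logic just described (the deviance values here are randomly generated stand-ins; in practice they come from MARK’s bootstrap results file):

```python
# Bootstrap GOF logic: where does the observed deviance fall among the
# deviances from data simulated under the model?
import numpy as np

rng = np.random.default_rng(0)
simulated_deviances = rng.gamma(shape=20.0, scale=2.0, size=100)  # stand-in for 100 simulations
observed_deviance = 52.3                                          # stand-in for your model's deviance

# Approximate P-value: the proportion of simulated deviances that are at
# least as large as the observed one.
p_value = np.mean(simulated_deviances >= observed_deviance)
print(f"approximate bootstrap P = {p_value:.2f}")

# Per the rule of thumb above: if P > 0.2, 100 simulations are probably
# enough; if P < 0.2, rerun with 500-1000 simulations for a sharper estimate.
```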

To get the bootstrap approach running, simply select this option under the “tests” tab. You will get a window like this. Click the top option and hit OK.

You will then be asked to name the bootstrap results dbf file.

Give it a unique identifier – normally in relation to the dataset and model concerned – and click Save. You will then get a little confirmation window. After this you will get the following. Now select the number of simulations (start with 100, but see above) and leave the random number seed at 0. The next time you come to do this, choose a different random number seed, e.g., 1, 2, etc.

As stated above, the time taken for this to run will vary with the model, the number of simulations, and your computer power – so it could take a long time.

When it has completed, you need to choose “view simulation results” under the simulations tab. Choose the appropriate file (the one you just named) and you will get a window like the following.

You can sort the data using the AZ↓ key, choosing which column header to use – I would use the deviance. You can then check where your model deviance falls in the distribution. You can also generate summary data from the simulations using the little calculator key – very useful.

BUT HOW DO I GET MY ESTIMATE OF C HAT?

Either use:

1) Observed deviance / mean deviance of the simulations to estimate c-hat, or

2) Observed model deviance / deviance df, which will give you an observed model c-hat that you can then divide by the mean of the simulated c-hats. You can obtain the observed model c-hat by selecting “median c hat” under the “tests” tab (see below).

Which one should I choose? I would calculate both and use the higher value of the two, as this will make your estimates more conservative.
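As a sketch of the two calculations (all numbers below are invented stand-ins for values you would read off MARK’s bootstrap output and the median c-hat window):

```python
# Two ways of estimating c-hat from the bootstrap results, as described above.
import numpy as np

rng = np.random.default_rng(1)
simulated_deviances = rng.gamma(shape=20.0, scale=2.0, size=100)  # from the bootstrap dbf
simulated_chats = rng.normal(loc=1.0, scale=0.1, size=100)        # from the bootstrap dbf
observed_deviance = 52.3
deviance_df = 38.0   # observed deviance df, shown at the top of the median c-hat window

# 1) Observed deviance over the mean simulated deviance.
chat_1 = observed_deviance / simulated_deviances.mean()

# 2) Observed model c-hat over the mean simulated c-hat.
observed_chat = observed_deviance / deviance_df
chat_2 = observed_chat / simulated_chats.mean()

# Conservative choice: take the larger of the two.
chat = max(chat_1, chat_2)
print(f"method 1: {chat_1:.3f}, method 2: {chat_2:.3f}, use c-hat = {chat:.3f}")
```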

Recently, the median c-hat approach has been used as an alternative (see the MARK book for a full description).

MEDIAN C HAT APPROACH. This gives you an estimate of c-hat. Simply select “median c hat” under the “tests” tab. You will get a window like this, which gives you the observed model deviance / deviance df at the top. This is the observed model c-hat that you would need in the above.

You should always set the lower bound at 1 and the upper bound at 3. Why these values? 1 because a c-hat = 1 would be a perfect fit, and if c-hat > 3 then there are probably some fundamental problems with your model. I would set the number of intermediate points to 3, and then it is entirely up to you how many replicates you choose - I would use 100. Click OK and it will run and then spawn a Notepad window which will give you an estimate of c-hat at the very top, plus a standard error. It will also produce a graph within MARK which will also show the estimate of c-hat.
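Conceptually, the median c-hat procedure simulates data at a grid of candidate c-hat values between the bounds and asks at which value the observed c-hat would sit at the median of the simulated distribution. The sketch below conveys that idea in a deliberately simplified form: MARK actually fits a logistic regression to the simulation results, and simulate_chat here is a hypothetical stand-in for simulating a data set with a given amount of over-dispersion and refitting the model.

```python
# Simplified illustration of the median c-hat idea, NOT MARK's actual algorithm.
import numpy as np

def simulate_chat(c: float, rng: np.random.Generator) -> float:
    """Hypothetical stand-in: simulate one data set with over-dispersion c,
    refit the model, and return its deviance / deviance df."""
    return rng.normal(loc=c, scale=0.15)

rng = np.random.default_rng(2)
observed_chat = 1.42                 # observed deviance / deviance df
grid = np.linspace(1.0, 3.0, 5)      # lower bound 1, upper bound 3, 3 intermediate points
replicates = 100                     # replicates per grid point

# For each candidate c, the fraction of simulated c-hats that fall below the
# observed value; the median c-hat estimate is the c at which this is 0.5.
frac_below = np.array([
    np.mean([simulate_chat(c, rng) < observed_chat for _ in range(replicates)])
    for c in grid
])

# frac_below decreases as c increases, so reverse both arrays for np.interp.
chat_estimate = float(np.interp(0.5, frac_below[::-1], grid[::-1]))
print(f"median c-hat estimate = {chat_estimate:.2f}")
```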

I have my c-hat estimates – now what do I do?

Well, you can adjust your original model estimates (which were based on c-hat = 1), so that you now account for some of that over-dispersion. To do this, you simply select the “c hat” option under the “Adjustments” tab and enter your value for c-hat.

Once you do this, your results browser will change - try typing in 1.5 for the Phi(t)p(t) model in the male dipper dataset.

This will result in quasi-AICc (QAICc) values, and the weighting of your models will also change. The best-fitting model has suddenly become a much better-fitting model compared to the others in this example. If you get estimates for these new adjusted models, you will notice that the parameter estimates are still the same, but the standard errors of those estimates will have changed.
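For reference, the adjustment works by dividing the likelihood term of AICc by c-hat: QAICc = (-2 ln L)/c-hat + 2K + 2K(K+1)/(n - K - 1). A sketch with invented numbers follows (MARK does all of this for you once you enter a c-hat):

```python
# How a c-hat adjustment turns AICc into QAICc and reshuffles model weights.
import numpy as np

neg2lnL = np.array([330.1, 335.8, 341.2])   # -2 log-likelihood for each model (invented)
K = np.array([7, 4, 2])                     # number of parameters per model
n = 120                                     # effective sample size
chat = 1.5                                  # your estimated c-hat

qaicc = neg2lnL / chat + 2 * K + (2 * K * (K + 1)) / (n - K - 1)
delta = qaicc - qaicc.min()
weights = np.exp(-delta / 2) / np.exp(-delta / 2).sum()
print("QAICc:  ", np.round(qaicc, 2))
print("weights:", np.round(weights, 3))

# Note: the parameter estimates themselves are unchanged; their standard
# errors are inflated by sqrt(c-hat).
```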

How big a c-hat can you use?

When should you apply a new c-hat? Is 1.5 really different from the null, default value of 1.0? At what point is c-hat too large to be useful? If the model fits perfectly, then c-hat = 1. What about if c-hat = 2, or c-hat = 10? Is there a point of diminishing utility? As a working “rule of thumb”, provided c-hat ≤ 3, you should feel relatively safe. Above 3, you need to consider whether your model is adequate.
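If it helps to have the rule of thumb written down explicitly (a direct coding of the advice above, nothing more):

```python
def chat_advice(chat: float) -> str:
    """Working rule of thumb for interpreting an estimated c-hat."""
    if chat <= 1.0:
        return "c-hat <= 1: effectively no lack of fit to adjust for"
    if chat <= 3.0:
        return "adjust with this c-hat; you should feel relatively safe"
    return "c-hat > 3: consider whether the model is adequate"
```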