Design, data weighting and design effects in Dutch regional health surveys
Previously published in Dutch as: Uitenbroek DG. Design, wegen en het designeffect in GGD gezondheidsenquêtes. Tijdschrift voor Gezondheidswetenschappen (TSG). 2009(2): 64-8.
Health surveys carried out by regional public health authorities in the Netherlands frequently use a stratified design (Ten Brinke and Verhagen, 2003; GGD Hollands Midden, 2006; Heemskerk and Poort, 2007; Acker, 2005). The population in the health authority's working area is divided into groups (strata), and in each group a predetermined number of individuals is surveyed (Cochran, 1977). One reason for this design is that health authorities often want to compare the local authority areas within their working area, which calls for about equal numbers of cases in each local authority area. Stratified sampling designs developed with the combined aim of providing reliable statistics at both the local and the overall regional level are common internationally. Strata need not be geographical: the Amsterdam health survey (Uitenbroek et al., 2006), for example, was stratified by age and ethnicity to enable the study of health service needs among older minority groups compared with those of the majority ethnic Dutch population, while simultaneously providing information on health in the city of Amsterdam in general.
To provide statistics for the full health authority area the data has to be weighted to account for differences in population size and sampling fraction between the strata used in the design. Given a similar number of cases, statistics and estimates at the health authority level are less reliable in weighted than in unweighted data, which translates into wider confidence intervals and differences between groups reaching statistical significance less easily (Kish, 1995). Although weighting reduces the reliability of statistics, not weighting is often not an option, as statistics from data collected with a stratified design can be seriously biased compared with those from a simple random sample (Kish, 1957).
When weighted data is analyzed, the decrease in reliability due to the weighting must therefore be taken into account. Basic statistical computer packages mostly do not do this, although more complex methods are nowadays available in a number of dedicated computer applications. This paper discusses the cell weighting procedure, with specific attention to the design effect it causes. The design effect is the statistic that measures the change in reliability caused by (among other things) data weighting. A number of simple formulas are introduced and the extent of the design effect in Dutch regional health surveys is studied.
Methods

For this article a secondary analysis was done of health authority reports on local health surveys. The health authorities were asked for additional information when required. The weighting in this paper follows the cell weighting procedure described by Kalton and Flores-Cervantes (2003); the formulas by Kish (1992) are used to estimate the design effect for the mean.

Weighting

The purpose of most weighting is to restore in the sample the distribution observed in the population on a number of (socio-demographic) variables. There are several methods to do this (Kalton & Flores-Cervantes, 2003); this paper uses two of them, both based on the cell weighting procedure. Both produce the same result in the weighted sample, but the weights they produce are interpreted differently. In the first method the weight Wi for each of the k strata is the reciprocal of the sampling fraction. The sampling fraction is the number of sampled individuals ni in stratum i divided by the number of people Ni in the same stratum in the population. Thus:

$$W_i = \frac{1}{\text{sampling fraction}} = \frac{1}{n_i / N_i} = \frac{N_i}{n_i} \qquad (1)$$

After weighting according to this method the sum of all individual weights in the sample equals the total population size: Σ(Wi*ni) = N+. The weight Wi can be interpreted as the number of individuals in the population that each respondent in the i-th stratum represents. A practical advantage is that many statistical programmes for complex designs take these weights as the basis of their calculations.

A second method is to divide the proportion of each stratum in the population (Pi = Ni/N+) by the proportion of the same stratum observed in the sample (pi = ni/n+), thus:

$$w_i = \frac{P_i}{p_i} \qquad (2)$$

The sum of all individual weights for these weights equals the sample size: Σ(wi*ni) = n+. These weights give the multiplication factor by which groups in the sample become more or less important because of the weighting. They also give an impression of the effect of weighting on the design effect: weights that are (much) larger than one increase the design effect in particular. For this reason weights above a certain value are often trimmed (Potter, 1999), which introduces some bias in the process.
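The two weighting methods can be compared side by side with a short calculation. The sketch below is illustrative only: it uses the stratum totals of the Amstelland de Meerlanden design from Table 1, and the variable and function names are assumptions made for this example, not part of the original analysis.

```python
# Stratum totals from Table 1: n = designed sample size, N = population size.
# All names below are illustrative and not taken from the original paper.
strata = {
    "Aalsmeer":       {"n": 750,  "N": 16559},
    "Amstelveen":     {"n": 1500, "N": 55283},
    "Haarlemmermeer": {"n": 1500, "N": 87232},
    "Ouder-Amstel":   {"n": 750,  "N": 9234},
    "Uithoorn":       {"n": 750,  "N": 19060},
}

n_total = sum(s["n"] for s in strata.values())
N_total = sum(s["N"] for s in strata.values())


def population_weight(s):
    """Formula 1: reciprocal of the sampling fraction, N_i / n_i."""
    return s["N"] / s["n"]


def relative_weight(s):
    """Formula 2: population proportion divided by sample proportion."""
    return (s["N"] / N_total) / (s["n"] / n_total)


W = {name: population_weight(s) for name, s in strata.items()}
w = {name: relative_weight(s) for name, s in strata.items()}

# Formula 1 weights sum to the population size, formula 2 weights to the sample size.
sum_W = sum(W[name] * s["n"] for name, s in strata.items())
sum_w = sum(w[name] * s["n"] for name, s in strata.items())

for name in strata:
    print(f"{name:15s} W = {W[name]:7.2f}   w = {w[name]:5.2f}")
print(f"sum of W = {sum_W:.0f}, sum of w = {sum_w:.0f}")
```

The smallest and largest relative weights come out at about 0.34 and 1.63, the range reported for this survey in Table 2.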

Design effects

Because of the weighting the reliability of the sample decreases: the real reliability is lower than the number of cases collected suggests. The design effect (DEFF) measures this and is defined as the factor by which the variance calculated under the assumption of a simple random sample (SRS) changes:

$$\text{Variance after weighting } (\hat{v}) = \text{variance SRS } (v) \times \mathrm{DEFF} \qquad (3)$$

The design effect is sometimes also defined as the factor by which the observed number of cases is reduced to an effective number of cases:

$$\text{Effective } \hat{n} = \frac{\text{observed } n}{\mathrm{DEFF}} \qquad (4)$$

The DEFF in formula 3 is equivalent to the DEFF in formula 4. As the design effect is almost always larger than one, the formulas imply that the design effect increases the sample variance and decreases the effective number of cases.

In the case of a data file in which the weights are included as a separate variable, with one weight for each respondent, formula 5 can be used to calculate the design effect for a sample mean:

$$\mathrm{DEFF} = \frac{n \sum_{j=1}^{n} w_j^2}{\left(\sum_{j=1}^{n} w_j\right)^2} \qquad (5)$$

where $w_j$ is the weight of the j-th respondent out of n respondents and n is the sample size.
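Formula 5 translates directly into a few lines of code. The following is a minimal sketch; the function name `kish_deff` and the example weights are hypothetical and serve only to illustrate the calculation.

```python
def kish_deff(weights):
    """Design effect for a sample mean from per-respondent weights (formula 5)."""
    weights = list(weights)
    n = len(weights)
    if n == 0:
        raise ValueError("at least one weight is required")
    sum_w = sum(weights)
    sum_w2 = sum(wj * wj for wj in weights)
    return n * sum_w2 / sum_w ** 2


# Hypothetical weights: a self-weighting sample and a sample with two weight levels.
equal_weights = [1.0] * 100
unequal_weights = [0.5] * 50 + [1.5] * 50

print(kish_deff(equal_weights))    # equal weights give a design effect of 1
print(kish_deff(unequal_weights))  # unequal weights give a design effect above 1

# Multiplying all weights by a constant leaves the design effect unchanged.
rescaled = [10 * wj for wj in unequal_weights]
print(kish_deff(rescaled))
```

Because the formula is unaffected by rescaling, weights according to formula 1 and weights according to formula 2 lead to the same design effect.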

If totals for the strata in the sample and population are available, the following formula can be used to calculate the design effect for the sample mean. This way of calculating the design effect is particularly practical in the design phase of a study, before data collection, for example to take design effects into account when calculating sample sizes.

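$$\mathrm{DEFF} = \sum_{i} \frac{N_i^2}{n_i} \times \frac{n}{N^2} \qquad (6)$$

where $N_i$ and $n_i$ are the totals for each of the "i" strata in the population and sample respectively, and N and n are the corresponding overall totals.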
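As a check on formula 6, the sketch below applies it to the stratum totals of the Amstelland de Meerlanden design from Table 1 and compares the result with formula 5, applied to one weight Ni/ni per sampled person. The function names are assumptions made for this illustration.

```python
# (n_i, N_i) per stratum, taken from Table 1.
design = [
    (750, 16559),   # Aalsmeer
    (1500, 55283),  # Amstelveen
    (1500, 87232),  # Haarlemmermeer
    (750, 9234),    # Ouder-Amstel
    (750, 19060),   # Uithoorn
]


def deff_from_strata(design):
    """Formula 6: design effect from stratum totals in sample and population."""
    n = sum(n_i for n_i, _ in design)
    N = sum(N_i for _, N_i in design)
    return sum(N_i ** 2 / n_i for n_i, N_i in design) * n / N ** 2


def deff_from_weights(weights):
    """Formula 5: design effect from one weight per respondent."""
    n = len(weights)
    return n * sum(wj * wj for wj in weights) / sum(weights) ** 2


# Expand the design into one weight N_i / n_i for each sampled person.
weights = [N_i / n_i for n_i, N_i in design for _ in range(n_i)]

deff_strata = deff_from_strata(design)
deff_weights = deff_from_weights(weights)
print(f"formula 6: {deff_strata:.2f}, formula 5: {deff_weights:.2f}")
```

Both routes give a design effect of about 1.21, which illustrates that formula 6 is formula 5 with the respondents grouped by stratum.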

Besides the design effect there is the design factor (DEFT). The design factor is defined as the square root of the design effect:

$$\mathrm{DEFT} = \sqrt{\mathrm{DEFF}} \qquad (7)$$

The design factor is the factor by which the standard error of estimates changes due to the sample design. It also gives the multiplication factor by which the confidence interval around an estimate changes due to the sample design.
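The relation between the design effect, the effective number of cases and the width of a confidence interval can be made concrete with a small calculation. This is a sketch with hypothetical input values (a proportion of 0.20, 1000 respondents and a design effect of 1.5); the function names are likewise assumptions for illustration.

```python
import math


def design_factor(deff):
    """Formula 7: DEFT is the square root of DEFF."""
    return math.sqrt(deff)


def effective_n(n, deff):
    """Formula 4: effective number of cases."""
    return n / deff


def ci_half_width(p, n, deff=1.0, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n * deff)


# Hypothetical values, chosen only to illustrate the formulas.
p, n, deff = 0.20, 1000, 1.5

srs_half_width = ci_half_width(p, n)
weighted_half_width = ci_half_width(p, n, deff=deff)

print(f"DEFT = {design_factor(deff):.3f}")
print(f"effective n = {effective_n(n, deff):.1f}")
print(f"half-width under SRS = {srs_half_width:.4f}")
print(f"half-width after weighting = {weighted_half_width:.4f}")
```

The ratio of the two half-widths equals the design factor, and using the effective n in the simple random sample formula gives the same interval as multiplying the variance by DEFF.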

Example and results


Table 1 gives an overview of the health survey done by the Amstelland de Meerlanden regional public health authority (Ten Brinke and Verhagen, 2003 & 2004). The design of this survey was to take a sample of 750 individuals from each of the smaller local authorities and a sample of 1500 from each of the larger authorities. In total this resulted in a sample of 5250 persons.

Table 1. Design for the health survey of the Amstelland de Meerlanden public health authority, 2002. Calculation of sample weights and design effect, and the effect of weighting and design on the estimated percentage of citizens reporting noise disturbance due to overflying airplanes.

Local authority / ni, size of design / Ni, population size / Ni²/ni / mi, response / Wi $ / Number in sample experiencing airplane noise / Estimated number in population who experience noise
Aalsmeer / 750 / 16559 / 365578.6 / 483 / 34.3 / 89 / 3063
Amstelveen / 1500 / 55283 / 2037437 / 935 / 59.1 / 182 / 10780
Haarlemmermeer / 1500 / 87232 / 5072948 / 907 / 96.2 / 110 / 10555
Ouder-Amstel / 750 / 9234 / 113689 / 447 / 20.7 / 58 / 1191
Uithoorn / 750 / 19060 / 484378.1 / 492 / 38.7 / 88 / 3393
Total / 5250 / 187367 / 8074030 / 3264 / - / 528 / 28982
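Design effect: $\mathrm{DEFF} = \sum_i (N_i^2 / n_i) \times n / N^2 = 8074030 \times 5250 / (187367 \times 187367) = 1.21$
95% CI unweighted: $16.2 \pm 1.96 \times \sqrt{p(1-p)/m} \times 100 = 16.2 \pm 1.96 \times \sqrt{0.162 \times (1-0.162)/3264} \times 100 = 16.2 \pm 1.26$
95% CI weighted: $15.4 \pm 1.96 \times \sqrt{p(1-p)/m \times \mathrm{DEFF}} \times 100 = 15.4 \pm 1.96 \times \sqrt{0.154 \times (1-0.154)/3264 \times 1.21} \times 100 = 15.4 \pm 1.36$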

$ According to formula 1, calculated with the response mi in place of ni (Wi = Ni/mi).
This table is based on table 2.1 from Ten Brinke JM., Verhagen CE. Hoe gezond is de regio? health peiling 2002; and table 5.3 from: Hoe gezond is de regio? Supplement. Health peiling 2002. Both: Amstelveen: GGD Amstelland de Meerlanden (Ten Brinke and Verhagen, 2003 & 2004).

The design effect of the Amstelland de Meerlanden public health authority survey is estimated to be 1.21. The effective n for calculating the variance and confidence interval around a mean for this design equals 5250/1.21=4338. In 2002 the health authority collected data from 3264 respondents, which results in an effective n for analysis of 3264/1.21=2698 respondents. The design factor is the square root of the design effect, thus √1.21=1.1. After weighting, the confidence interval around a mean is therefore about 10% wider than a simple random sample confidence interval.

The health authority area is near Schiphol-Amsterdam international airport and airplane noise is an important problem in the area. Data on airplane noise is used here to demonstrate how the design effect enters the calculations. Table 1 gives, for each local authority, the number of respondents reporting disturbance due to airplane noise. On the basis of the unweighted data about 16.2% (528/3264*100) of citizens in the area are disturbed by airplane noise, with a 95% confidence interval calculated according to a basic method (Blalock, 1960) ranging from 14.9 to 17.4%. Using the sample weights in the table, the numbers of respondents are converted into the estimated number of people in the population disturbed by noise. The result is used to estimate the weighted percentage of people in the overall health authority area who are disturbed by airplane noise: 15.4% (28982/187367*100), with a confidence interval ranging from 14.0 to 16.8%. The weighted percentage is lower than the unweighted percentage because the larger local authority areas, which have larger weights, generally have less airplane noise than the smaller areas.
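The steps of this example can be retraced from the rows of Table 1. The sketch below is an illustration under stated assumptions: it recomputes the weights as Ni/mi from the unrounded row values and takes the design effect of 1.21 as given, so the recomputed figures land close to, but not exactly on, the published ones. All names are hypothetical.

```python
import math

# (local authority, N_i population, m_i respondents, respondents reporting noise),
# taken from the rows of Table 1.
rows = [
    ("Aalsmeer", 16559, 483, 89),
    ("Amstelveen", 55283, 935, 182),
    ("Haarlemmermeer", 87232, 907, 110),
    ("Ouder-Amstel", 9234, 447, 58),
    ("Uithoorn", 19060, 492, 88),
]
DEFF = 1.21  # design effect as reported in Table 1

m = sum(m_i for _, _, m_i, _ in rows)
N = sum(N_i for _, N_i, _, _ in rows)

# Unweighted proportion: every respondent counts equally.
unweighted_p = sum(x_i for _, _, _, x_i in rows) / m

# Weighted proportion: each respondent represents N_i / m_i people.
estimated_population = sum(x_i * N_i / m_i for _, N_i, m_i, x_i in rows)
weighted_p = estimated_population / N


def half_width(p, m, deff=1.0, z=1.96):
    """Half-width of the 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / m * deff)


hw_unweighted = half_width(unweighted_p, m)
hw_weighted = half_width(weighted_p, m, deff=DEFF)

print(f"unweighted: {100 * unweighted_p:.1f}% +/- {100 * hw_unweighted:.2f}")
print(f"weighted:   {100 * weighted_p:.1f}% +/- {100 * hw_weighted:.2f}")
```

The weighted estimate is lower than the unweighted one and has a wider interval, in line with the example in the text.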

Table 2. Examples of health survey designs as done by Dutch regional health authorities.

Survey / Approximate design / wi, range * / DEFF / Reference
Health monitor Zuid Holland Zuid, 2006 / Age 19+, 4% from 14 municipalities / 1.00-1.00 / 1.00 / Terpstra et al., 2006
Health profile Groningen, 2006 / Age 20+, 2% from 25 municipalities / 1.00-1.00 / 1.00 / Broer and Spijkers, 2006
Health profile Groningen, 2002 / Age 20-64: 1% from 21 authorities, 2% from 4 authorities; age 65+: 2% from 22 authorities, 4% from 2 authorities, 5% from 1 authority / 0.33-1.64 / 1.14 / Broer and Spijkers, 2002
Amstelland de Meerlanden, health monitor, 2002 / See Table 1 / 0.34-1.63 / 1.21 / Ten Brinke and Verhagen, 2003
Noord Kennemerland, health monitor, 2006 / About 480 respondents each from 8 local authorities, age 19-65 years / 0.14-2.97 / 1.71 / Heemskerk and Poort, 2007
Gooi and Vechtstreek, health monitor, 2004 / About 1500 respondents each from 9 local authorities, age 19+ / 0.24-3.17 / 1.72 / Acker, 2005
Health survey GGD Hollands Midden, 2005 / About 500 respondents each from 13 local authorities, age 19-65 years / 0.42-3.92 / 1.80 / GGD Hollands Midden, 2006
Amsterdam health monitor, 2004 / About 200 each from 5 age and 4 ethnicity groups, 20 groups in total, age 18+ / 0.04-3.21 / 1.85 / Uitenbroek et al., 2006

* According to formula 2

Table 2 gives the design effect in the health surveys as published by a number of health authorities in the Netherlands. In two health surveys (Terpstra et al., 2006; Broer and Spijkers, 2006) there is no design effect, as these are self-weighting designs in which a fixed proportion is sampled from each stratum. Weighting is not required and therefore there is no design effect due to weighting. In the health surveys from Groningen (Broer & Spijkers, 2002) and Amstelland de Meerlanden (Ten Brinke & Verhagen, 2003), both done in 2002, higher numbers of cases were selected from the larger local authorities. The design effects for these studies are 1.14 and 1.21 respectively. The design effect is larger in surveys where a fixed number of cases is taken from strata whose populations differ in size. In these surveys it ranges from 1.71 in the 2006 health survey from Noord Kennemerland (Heemskerk & Poort, 2007) to 1.85 in the 2004 Amsterdam health survey (Uitenbroek et al., 2006).
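How the allocation of cases over strata drives the design effect can be illustrated with formula 6 and the population sizes from Table 1. In the sketch below only the 750/1500 allocation is the real design; the proportional and the equal allocation are hypothetical alternatives with the same total of 5250 cases, added purely for comparison.

```python
# Population sizes N_i of the five local authorities in Table 1.
populations = [16559, 55283, 87232, 9234, 19060]


def deff(samples, populations):
    """Formula 6: design effect from stratum sample sizes and population sizes."""
    n = sum(samples)
    N = sum(populations)
    return sum(N_i ** 2 / n_i for n_i, N_i in zip(samples, populations)) * n / N ** 2


n_total = 5250
N_total = sum(populations)

# Hypothetical self-weighting design: the same sampling fraction in every stratum
# (fractional sample sizes are acceptable for this illustration).
proportional = [n_total * N_i / N_total for N_i in populations]

# Design actually used in 2002: 750 or 1500 cases per local authority.
actual = [750, 1500, 1500, 750, 750]

# Hypothetical design with the same fixed number of cases in every stratum.
equal = [n_total / len(populations)] * len(populations)

deff_proportional = deff(proportional, populations)
deff_actual = deff(actual, populations)
deff_equal = deff(equal, populations)

print(f"proportional allocation: {deff_proportional:.2f}")
print(f"actual 750/1500 design:  {deff_actual:.2f}")
print(f"equal allocation:        {deff_equal:.2f}")
```

Under these assumptions the self-weighting allocation gives a design effect of 1, the actual design about 1.21, and the equal allocation about 1.62, the same ordering as seen across the surveys in Table 2.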

Discussion

This article has considered the design, data weighting and design effects in health surveys done by regional public health authorities in the Netherlands. Mostly stratified designs are used, whereby the population in the region is divided into groups and a predetermined number of cases is sampled from each group. A number of the designs are self-weighting; in these cases there is no design effect because of weighting, as the data does not have to be weighted. The largest design effects were observed where fixed numbers of cases were taken from strata whose populations differed in size. In many health surveys the design effect will not cause serious problems, as these surveys tend to be very large. However, the design effect needs to be considered when surveys are smaller, and also in subgroup analyses where weighting is used to make the sampled subgroup representative of the same subgroup in the population.