STATISTICS IN MEDICINE

Statist. Med. 2003; 22:1069-1082 (DOI: 10.1002/sim.1388)

http://www.biostat.washington.edu/~yanez/b578D/materials/tostesonSS.pdf

Power and sample size calculations for generalized regression models with covariate measurement error

Tor D. Tosteson, Jeffrey S. Buzas, Eugene Demidenko and Margaret Karagas

Lixia Zhang

Statistics Department

North Carolina State University

September 2011

Contents

1. Introduction

2. Model and Assumption

3. Methodology

4. Simulation Studies

5. Conclusions

1.  Introduction

Measurement error is an important consideration in designing an epidemiologic study, especially for exposure risk factors. If covariate measurement error is ignored, power and sample size calculations tend to overestimate power and underestimate the sample size actually required to achieve a desired power. The paper introduces a method that takes covariate measurement error into account and corrects the power function for generalized linear models, using a generalized score test based on quasi-likelihood methods.

2.  Model and Assumption

In epidemiologic studies, the increasing use of sophisticated measurement techniques and the need to account for confounding risk factors mean that the final planned analysis typically involves a multiple logistic regression model or another generalized linear model with discrete or continuous covariates. Some of the covariates, $x_i$, are contaminated with measurement error, while others, $z_i$, can be considered free of error. With $y_i$ denoting the outcome, a generalized regression model can be set up as

$$E(y_i \mid x_i, z_i) = f(\beta_0 + \beta_z' z_i + \beta_x' x_i)$$

with variance function

$$\mathrm{var}(y_i \mid x_i, z_i) = \sigma^2 g^2(\beta_0 + \beta_z' z_i + \beta_x' x_i; \theta)$$

where θ is a variance function parameter. The measurement error model considered here is any measurement error structure satisfying the conditional independence assumption

$$P_{y,w|z,x}(y_i, w_i \mid z_i, x_i) = P_{y|z,x}(y_i \mid z_i, x_i)\, P_{w|z,x}(w_i \mid z_i, x_i)$$

Here $P_{\cdot|\cdot}$ denotes a conditional probability density or mass function, and $w_i$ is a surrogate variable for the true value $x_i$. All that this specification requires is that the surrogate exposure be independent of the outcome given the true exposure and the error-free covariates. The measurement error structure can therefore be a classical error model, a Berkson error model, or any general surrogate measurement error model.
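
To make the two named error structures concrete, the following minimal sketch simulates a classical model (w = x + u, with u independent of x) and a Berkson model (x = w + u, with u independent of w). The function names, the normal distributions, the unit variances, and the sample size are illustrative assumptions and are not taken from the paper.

```python
import random
import statistics


def simulate_classical(n, sigma_x=1.0, sigma_u=1.0, seed=0):
    """Classical error: surrogate w = x + u, with u independent of the true x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]
    return x, w


def simulate_berkson(n, sigma_w=1.0, sigma_u=1.0, seed=0):
    """Berkson error: true x = w + u, with u independent of the surrogate w."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, sigma_w) for _ in range(n)]
    x = [wi + rng.gauss(0.0, sigma_u) for wi in w]
    return x, w


x_c, w_c = simulate_classical(20000)
x_b, w_b = simulate_berkson(20000)
print("classical: var(x) =", statistics.variance(x_c), " var(w) =", statistics.variance(w_c))
print("Berkson:   var(x) =", statistics.variance(x_b), " var(w) =", statistics.variance(w_b))
```

Under the classical model the surrogate is more variable than the true exposure, while under the Berkson model the true exposure is more variable than the surrogate. In either case, an outcome generated from x alone (plus independent noise) satisfies the conditional independence assumption above.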

3.  Methodology

The paper derives the power function based on the generalized score test, together with the asymptotic distribution of the score test statistic. In addition, it introduces the asymptotic relative efficiency (ARE) of the score test using the surrogate relative to the test using the true value, under local alternatives.

The power function for a fixed alternative under the measurement error assumptions is based on the generalized score test, which tests hypotheses about $(\beta_0, \beta_z', \beta_x)$ using the quasi-likelihood score equations. To simplify the model, the paper assumes that only one covariate is subject to measurement error, so $\beta_x$ is a scalar. The quasi-likelihood score for testing $H_0: \beta_x = 0$ in the presence of measurement error is

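$$L(\beta_0, \beta_z) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} d_i(\beta_0, \beta_z, \theta)\,\{y_i - f(\beta_0 + \beta_z' z_i)\}\, E[x_i \mid z_i, w_i; \tau]$$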

where $d_i(\beta_0, \beta_z, \theta) = f_1(\beta_0 + \beta_z' z_i) / g^2(\beta_0 + \beta_z' z_i; \theta)$ and $f_1(x) = \frac{d}{dx} f(x)$.
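
As a worked special case (an illustration, not an expression quoted from the paper), consider logistic regression, where f is the inverse logit, the variance function is $g^2 = f(1 - f)$, and $\sigma^2 = 1$. The derivative of f then equals the variance function, so the weight reduces to 1:

```latex
f(\eta) = \frac{e^{\eta}}{1 + e^{\eta}}, \qquad
f_1(\eta) = f(\eta)\{1 - f(\eta)\} = g^2(\eta), \qquad
d_i = \frac{f_1(\beta_0 + \beta_z' z_i)}{g^2(\beta_0 + \beta_z' z_i)} = 1 .
```

In this case each summand of the score is simply the null-model residual $y_i - f(\beta_0 + \beta_z' z_i)$ multiplied by the calibrated exposure $E[x_i \mid z_i, w_i; \tau]$.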

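The score test statistic is

$$\frac{L^2(\hat\beta_0, \hat\beta_z)}{\hat\sigma^2 \hat\sigma_0^2}$$

where $(\hat\beta_0, \hat\beta_z')$ are consistent regression parameter estimates satisfying the $q+1$ non-linear equations

$$\sum_{i=1}^{n} d_i(\hat\beta_0, \hat\beta_z, \theta)\,\{y_i - f(\hat\beta_0 + \hat\beta_z' z_i)\} \begin{pmatrix} 1 \\ z_i \end{pmatrix} = 0$$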

The expressions for the normalizing scalars $\sigma^2$ and $\sigma_0^2$ are lengthy and are given in the Appendix of the paper.

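Under the alternative, the score test statistic is shown to be asymptotically distributed as a scaled non-central chi-square with one degree of freedom and non-centrality parameter $\phi$:

$$\frac{L^2(\hat\beta_0, \hat\beta_z)}{\hat\sigma^2 \hat\sigma_0^2} \sim k(\beta_0, \beta_z', \beta_x)^{-1}\, \chi_1^2\big(\phi(\beta_0, \beta_z', \beta_x)\big).$$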

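The asymptotic power function is therefore

$$\Pr\{\chi_1^2(\phi) > k\,\chi_\alpha^2\},$$

where $\chi_\alpha^2$ is the level-$\alpha$ critical value of the central chi-square distribution with one degree of freedom. Details of the functions $k$ and $\phi$ are given in the Appendix of the paper.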
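
The power function above can be evaluated numerically once k and the non-centrality parameter are known. The sketch below is a minimal illustration using only the Python standard library. It treats `phi` and `k` as user-supplied inputs (their expressions are in the paper's Appendix and are not reproduced here), and it assumes the common convention in which the non-centrality parameter of a one-degree-of-freedom chi-square is the squared mean shift, so that the variable equals (Z + sqrt(phi))^2 with Z standard normal. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def noncentral_chi2_1df_sf(c, ncp):
    """Pr(chi2_1(ncp) > c), using chi2_1(ncp) = (Z + sqrt(ncp))**2 with Z ~ N(0, 1)."""
    if c <= 0:
        return 1.0
    root_c, shift = math.sqrt(c), math.sqrt(ncp)
    upper_tail = 1.0 - _STD_NORMAL.cdf(root_c - shift)   # Z + shift > sqrt(c)
    lower_tail = _STD_NORMAL.cdf(-root_c - shift)        # Z + shift < -sqrt(c)
    return upper_tail + lower_tail


def score_test_power(phi, k=1.0, alpha=0.05):
    """Asymptotic power Pr(chi2_1(phi) > k * chi2_alpha); phi and k are assumed inputs."""
    # Level-alpha critical value of the central chi-square with 1 df.
    crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) ** 2
    return noncentral_chi2_1df_sf(k * crit, phi)


print(score_test_power(phi=0.0))    # with phi = 0 and k = 1 the power equals the level alpha
print(score_test_power(phi=10.507)) # roughly 0.90 at alpha = 0.05 when k = 1
```

Setting k = 1 recovers the usual non-central chi-square power calculation; a value of k above 1 raises the effective critical value and lowers the power for the same non-centrality.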

The ARE of the score test is calculated under a local alternative condition. In other words, the approximation is suitable for small values of $\beta_x$, for which, when $n$ is large,

$$\frac{L^2(\hat\beta_0, \hat\beta_z)}{\hat\sigma^2 \hat\sigma_0^2} \sim \chi_1^2(\lambda)$$

where $\lambda = n\beta_x^2\sigma^2\sigma_0^2/2$. The associated power function is then

$$\Pr\{\chi_1^2(\lambda) > \chi_\alpha^2\}$$

For normal exposure and measurement error distributions, the derived ARE is

$$\mathrm{ARE} = \frac{n_x}{n_w} = \rho_{xw}^2$$

where $n_x$ is the sample size required to achieve a given power using the true exposure, $n_w$ is the sample size required using the surrogate, and $\rho_{xw}^2$ is the squared correlation between $x$ and $w$.
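
The ARE relation can be turned into a rough sample size calculation. The sketch below is an illustration under stated assumptions, not the paper's procedure: it takes the per-subject non-centrality `ncp_per_subject` (that is, lambda divided by n for the true exposure) as a hypothetical user-supplied input, uses the squared-mean-shift convention for the non-centrality parameter, and models the surrogate by multiplying that quantity by the squared correlation. The helper `classical_rho_sq` uses the standard result that, for a classical error model w = x + u with u independent of x, the squared correlation is var(x) / (var(x) + var(u)).

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def power_local(n, ncp_per_subject, alpha=0.05):
    """Pr(chi2_1(lambda) > chi2_alpha) with lambda = n * ncp_per_subject."""
    root_c = _Z.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(n * ncp_per_subject)
    return (1.0 - _Z.cdf(root_c - shift)) + _Z.cdf(-root_c - shift)


def required_n(ncp_per_subject, power=0.9, alpha=0.05):
    """Smallest integer n whose local-alternative power reaches the target."""
    hi = 1
    while power_local(hi, ncp_per_subject, alpha) < power:
        hi *= 2                      # bracket the answer by doubling
    lo = hi // 2                     # power at lo is below the target
    while hi - lo > 1:               # integer bisection (power increases with n)
        mid = (lo + hi) // 2
        if power_local(mid, ncp_per_subject, alpha) >= power:
            hi = mid
        else:
            lo = mid
    return hi


def classical_rho_sq(var_x, var_u):
    """Squared correlation between x and w = x + u when u is independent of x."""
    return var_x / (var_x + var_u)


rho_sq = classical_rho_sq(var_x=1.0, var_u=1.0)   # 0.5
n_x = required_n(0.05)                            # true exposure
n_w = required_n(0.05 * rho_sq)                   # surrogate, non-centrality scaled by rho^2
print(n_x, n_w, n_x / n_w)
```

With these illustrative inputs the ratio n_x / n_w is close to the squared correlation of 0.5, so the surrogate roughly doubles the required sample size.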

4.  Simulation Studies

The paper includes two simulation studies: one examines empirical power and the other checks robustness. Both are based on 5000 simulated data sets and 2 × 1 × 2 × 3 × 3 = 36 configurations of the parameters $(\beta_0, \beta_z, \beta_x, \rho, v)$. In the first study, the empirical power is close to the nominal power of 0.9 in each case. The required sample size can also increase dramatically as measurement error increases. The results indicate that the power function yields accurate sample sizes for detecting both small and large alternatives.

The second study uses the sample sizes calculated in the first study, which were computed assuming (z,x,u) are normal, and generates data sets in which the marginal distributions of z and x were either both skewed, both symmetric, or one skewed and the other symmetric. Additionally, the distribution of r was varied among skewed and symmetric distributions. The results are as follows. When the distribution of x is skewed and z is skewed or symmetric, the power function computed assuming z and x are jointly normal generally underestimated the actual power. When x is symmetric and z is skewed, the power is moderately affected. This suggests that the joint distribution needs to be correctly specified for the power function to be accurate.
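
The idea of checking power empirically can be illustrated with a toy simulation. The sketch below is not the paper's design: it assumes a simple linear model without intercept or additional covariates (y = beta * x + noise), a classical error model w = x + u, standard normal x and noise, and a score-type statistic for H0: beta = 0. All names, parameter values, and the number of replications are illustrative assumptions.

```python
import random
from statistics import NormalDist


def score_stat(pred, y):
    """Score-type statistic for H0: slope = 0 in y = beta * pred + noise (no intercept)."""
    n = len(y)
    s_py = sum(p * yi for p, yi in zip(pred, y))
    s_pp = sum(p * p for p in pred)
    sigma2_null = sum(yi * yi for yi in y) / n   # residual variance estimated under H0
    return s_py * s_py / (sigma2_null * s_pp)


def empirical_power(n, beta, sigma_u, reps=2000, alpha=0.05, seed=1):
    """Rejection rates of the test using the true x and using the surrogate w."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2   # chi-square(1) critical value
    reject_x = reject_w = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        w = [xi + rng.gauss(0.0, sigma_u) for xi in x]       # classical error
        y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]    # outcome depends on x only
        reject_x += score_stat(x, y) > crit
        reject_w += score_stat(w, y) > crit
    return reject_x / reps, reject_w / reps


power_x, power_w = empirical_power(n=100, beta=0.3, sigma_u=1.0)
print(f"empirical power with true x: {power_x:.3f}, with surrogate w: {power_w:.3f}")
```

In this toy setting the test based on the surrogate rejects noticeably less often than the test based on the true exposure at the same sample size, which is the loss of power that a measurement-error-corrected sample size calculation is meant to anticipate.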

5.  Conclusions

In the conclusion, the authors state that a great advantage of their method is that it is based on a generalized score test modified for measurement error corrections and does not require further assumptions about the size of the relative risk regression coefficients or other modeling restrictions.
