Supplemental Materials for Carter Ms

ADS-0925

Materials and Methods

This study is a secondary analysis of data collected under the GOG-0171 study protocol [5], as revised in 2003 when HPV testing began on liquid-based cytology specimens. Diagnosis of Atypical Glandular Cells (AGC) was based on the 1991 Bethesda System for classifying conventional Pap smear results. The revised protocol was approved by the Division of Cancer Prevention of the National Cancer Institute and by the GOG Human Research Committee before patient accrual began, and it was reviewed annually by the Institutional Review Boards at participating institutions. The protocol was closed to accrual in 2005. A thorough presentation of materials and procedures has been published previously by Liao et al. [5]. We present a summary here, along with complete details of the additional or different methods employed in the current study.

Patients

Patients with a cytological diagnosis of AGC not otherwise specified (AGC-NOS), or of undetermined significance, who were 18 years of age or older and who consented to participate were included in the study. The exclusion criteria given by Liao et al. [5] were: previous hysterectomy; a history of endometrial hyperplasia or of cancer of the endometrium, vagina, or cervix; previous or concurrent radiation treatment to the vagina or cervix; positive HIV status; or pregnancy with a perceived high risk of excessive bleeding following a cone biopsy. Patients were also excluded if a histological evaluation of the cervix was not performed within six months of the AGC-NOS diagnosis.

Diagnostic Assessments

Diagnostic assessments included histological evaluation of cervical biopsies (cone biopsy, Loop Electrosurgical Excision Procedure, or standard biopsy if positive) or hysterectomy specimens; H-HPV testing and HPV genotyping by Roche Linear Array (RLA), performed on liquid-based (ThinPrep, Cytyc/Hologic, Marlborough, MA) cytology specimens; and immunochemical determination of carbonic anhydrase IX (CA-IX) expression, performed on conventional Pap smear specimens. Liao et al. [5] used the histology-based diagnostic test as the reference “Gold Standard” (“G.S.”) method of diagnosis in their evaluation of the diagnostic accuracy of HPV and CA-IX. We treat their histology-based diagnoses as imperfect and evaluate the accuracy of the “G.S.” test along with that of CA-IX and of three HPV-based diagnoses obtained from liquid-based Pap specimens. Specimens for both the CA-IX and HPV tests were collected using a spatula and cytobrush before surgical procedures were performed. In all cases, evaluators were blinded to the results of the other tests.

Histological diagnosis: A cone biopsy or Loop Electrosurgical Excision Procedure (LEEP) biopsy of the cervix that included the cervical transformation zone, or a hysterectomy, was performed within six months of the AGC-NOS diagnosis for each enrolled patient. Participating institutions submitted hematoxylin and eosin (H&E) stained slides and the corresponding pathology reports for GOG Central Pathology review for each enrolled patient. Slides of the cone/LEEP biopsy or hysterectomy, and of any other biopsies performed prior to the final excisional procedure, were also submitted for review. A complete evaluation of the cervical transformation zone was required for a negative diagnosis. Two pathologists reviewed each slide, and a third reviewer resolved cases in which they disagreed. A positive diagnosis resulted from consensus observation of CIN2, CIN3, AIS, or invasive carcinoma in tissue taken either by a biopsy prior to the excisional procedure or by the excisional procedure itself. The highest-grade lesion observed determined the “G.S.” diagnosis. Women with positive diagnoses were coded as having an SCL. Women with only lower-grade lesions, such as CIN1 and atypia (defined as glandular and squamous lesions in which cellular atypia falls short of AIS and CIN1), women with no lesions, and women with significant lesions outside the cervix were recorded as having no SCL and were classified as “test negative” by the histology-based diagnostic test.
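The histology-based classification rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the label strings, and the input format (a list of consensus findings, each a grade paired with an anatomical site) are hypothetical and are not taken from the GOG-0171 protocol.

```python
# Illustrative sketch only; names and input format are hypothetical.
SIGNIFICANT_GRADES = {"CIN2", "CIN3", "AIS", "invasive carcinoma"}

def has_scl(lesions):
    """Return True if any consensus finding is a significant lesion of the cervix.

    `lesions` is a list of (grade, site) pairs pooled over the pre-excision
    biopsies and the excisional specimen. An empty list means no lesions.
    """
    return any(
        grade in SIGNIFICANT_GRADES and site == "cervix"
        for grade, site in lesions
    )

print(has_scl([("CIN1", "cervix"), ("CIN3", "cervix")]))  # a CIN3 lesion is present
print(has_scl([("atypia", "cervix")]))                     # lower-grade lesion only
print(has_scl([("invasive carcinoma", "endometrium")]))    # significant, but not cervical
```

Because the rule only asks whether any significant cervical lesion is present, the sketch does not need an ordering of the lesion grades.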

CA-IX-based diagnosis: Spray-fixed conventional study Pap smear specimens were immunostained for CA-IX expression. Immunostaining was performed in Dr. Stanbridge’s Laboratory at the University of California, Irvine, using the anti-CA-IX mouse monoclonal antibody M75, as previously described [5,7,11,12]. Positive immunohistochemical staining was defined by the presence of specific brown reaction product on the plasma membrane under 40X magnification. Faint or no staining of the cytoplasm was considered negative.

Determinations of CA-IX status were recorded independently by three cyto/gynecologic pathologists without the knowledge of histological diagnosis or HPV status. Discrepant cases were reviewed simultaneously by the same three study pathologists, using a multi-headed microscope to achieve majority agreement and render a final CA-IX result.

HC2 H-HPV-based diagnosis: Liquid-based cytology (LBC) specimens were used for HPV testing. The presence of H-HPV types 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, and 68 in LBC specimens, collected from 2003 onward, was detected using the Hybrid Capture II assay (HC2; Digene Corp., Gaithersburg, MD). This test does not distinguish between the different H-HPV types. The result of the diagnostic rule based on this assessment is denoted in this paper by HC2 H-HPV.

RLA genotype-based diagnosis: The majority of specimens tested for H-HPV with the HC2 method were also genotyped using the Roche LINEAR ARRAY HPV Genotyping Test according to the manufacturer’s directions. This test uses a combination of polymerase chain reaction (PCR) amplification of the L1 gene of HPV and nucleic acid hybridization to detect up to 37 different HPV genotypes (13 high-risk and 24 low-risk) associated with anogenital lesions, and it identifies the individual HPV type(s) present in positive samples [12-13]. Two diagnostic tests were defined based on the RLA HPV results. The first, denoted by RLA H-HPVG, is positive if one or more of the 13 high-risk HPV types (16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, and 68) is present and is negative if none are present. The second, denoted by RLA HPVG, is positive if one or more of the 37 HPV types is observed and negative otherwise.
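As a minimal sketch, the two RLA-based rules can be written as a function of the set of genotypes detected in a specimen. The function name and the input format are hypothetical; the sketch assumes that the input contains only genotypes from the 37-type panel, so that any detected type makes RLA HPVG positive.

```python
# The 13 high-risk types listed in the text.
HIGH_RISK_TYPES = {16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 68}

def rla_rules(detected_types):
    """Map the genotypes detected by RLA to the two diagnostic test results.

    Assumption: `detected_types` holds only genotypes from the 37-type panel.
    """
    detected = set(detected_types)
    return {
        "RLA H-HPVG": bool(detected & HIGH_RISK_TYPES),  # any high-risk type
        "RLA HPVG": bool(detected),                      # any panel type at all
    }

# Hypothetical specimens: high-risk plus low-risk, low-risk only, none detected.
print(rla_rules({16, 6}))
print(rla_rules({6, 11}))
print(rla_rules(set()))
```

A specimen carrying only low-risk types is therefore positive by RLA HPVG but negative by RLA H-HPVG, which is the only way the two rules can disagree.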

Statistical Methods

Three Latent Class Model (LCM) analyses [15,16] were performed, each including three diagnostic variables. Two of the three variables in each analysis were “G.S.”, the histology-based assessment of the presence or absence of SCL, and CA-IX. The third diagnostic variable was HC2 H-HPV, RLA HPVG, and RLA H-HPVG in the first, second, and third analyses, respectively. Classical LCM analysis requires that the diagnostic variables be independent given true disease status [15,16]. We do not believe this to be the case for HC2 H-HPV, RLA HPVG, and RLA H-HPVG, because the associations among these variables are not solely due to their association with true disease status. To avoid violating the conditional independence assumption, we therefore performed three separate LCM analyses, each using one of these as the third variable along with “G.S.” and CA-IX. It is reasonable to assume that the variables in each of these triplets are conditionally independent. When the diagnostic tests are conditionally independent given true disease status, LCM analysis has been recommended for routine use in assessing diagnostic accuracy in the absence of a perfect gold standard [17].
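To make the classical model concrete, the following Python sketch fits a two-class LCM for three conditionally independent binary tests with a plain EM algorithm. It is a simplified illustration and not the procedure used in this study: it has a single prevalence parameter with no covariates, it uses an exact E-step, and the function names, starting values, and demonstration parameters are all hypothetical.

```python
from itertools import product

def pattern_prob(pattern, prev, sens, spec):
    """Joint probabilities P(pattern, D=1) and P(pattern, D=0).

    Tests are assumed conditionally independent given disease status D.
    """
    p_dis, p_non = prev, 1.0 - prev
    for y, se, sp in zip(pattern, sens, spec):
        p_dis *= se if y else 1.0 - se
        p_non *= 1.0 - sp if y else sp
    return p_dis, p_non

def fit_lcm(counts, n_iter=2000):
    """EM for the classical LCM; `counts` maps 0/1 result tuples to frequencies."""
    k = len(next(iter(counts)))
    prev, sens, spec = 0.5, [0.7] * k, [0.7] * k  # arbitrary starting values
    total = sum(counts.values())
    for _ in range(n_iter):
        # E-step: posterior probability of disease for each result pattern.
        post = {}
        for pat in counts:
            p_dis, p_non = pattern_prob(pat, prev, sens, spec)
            post[pat] = p_dis / (p_dis + p_non)
        # M-step: update parameters as posterior-weighted proportions.
        n_dis = sum(counts[p] * post[p] for p in counts)
        n_non = total - n_dis
        prev = n_dis / total
        sens = [sum(counts[p] * post[p] * p[j] for p in counts) / n_dis
                for j in range(k)]
        spec = [sum(counts[p] * (1 - post[p]) * (1 - p[j]) for p in counts) / n_non
                for j in range(k)]
    return prev, sens, spec

# Demonstration on expected counts generated from hypothetical parameters.
true_prev, true_sens, true_spec = 0.3, [0.9, 0.8, 0.85], [0.95, 0.9, 0.85]
counts = {}
for pat in product((0, 1), repeat=3):
    p_dis, p_non = pattern_prob(pat, true_prev, true_sens, true_spec)
    counts[pat] = 1000 * (p_dis + p_non)
prev, sens, spec = fit_lcm(counts)
print(round(prev, 3), [round(s, 3) for s in sens], [round(s, 3) for s in spec])
```

With three tests, the eight result patterns supply seven degrees of freedom for seven parameters (three sensitivities, three specificities, and the prevalence), so the classical model is just identified. This is one reason each analysis here uses exactly three diagnostic variables that can plausibly be treated as conditionally independent.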

The parameters of a classical LCM include the sensitivities and one minus the specificities of each diagnostic test and the prevalence of disease. In the current application, it is important to allow the prevalence to vary as a function of age. Thus, we extended the classical LCM by replacing its prevalence parameter with a prevalence model that allows covariates. The EM algorithm [18], with a Monte Carlo approximation in the E-step [19], was used to obtain the maximum likelihood estimates of these parameters. The prevalence of SCL, given age, was modeled as a logistic function of age (in decades) and age squared. Bayes’ rule was used to obtain 1-PPV and 1-NPV as functions of prevalence [20]. Bootstrap methods [21] were used to estimate the standard errors of the estimators, and these were used with the usual formulas to calculate 95% confidence intervals for the sensitivities and specificities of each diagnostic test and 95% confidence bands for 1-PPV and 1-NPV as functions of age.
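The step from an age-dependent prevalence to 1-PPV and 1-NPV is a direct application of Bayes’ rule, sketched below in Python. The function names are hypothetical, and the coefficients and accuracy values in the demonstration are arbitrary numbers chosen for illustration; they are not estimates from this study.

```python
import math

def prevalence(age_years, b0, b1, b2):
    """Logistic prevalence model in age (in decades) and age squared."""
    a = age_years / 10.0
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * a + b2 * a * a)))

def one_minus_ppv(prev, sens, spec):
    """P(no disease | test positive) by Bayes' rule."""
    false_pos = (1.0 - spec) * (1.0 - prev)
    true_pos = sens * prev
    return false_pos / (true_pos + false_pos)

def one_minus_npv(prev, sens, spec):
    """P(disease | test negative) by Bayes' rule."""
    false_neg = (1.0 - sens) * prev
    true_neg = spec * (1.0 - prev)
    return false_neg / (false_neg + true_neg)

# Hypothetical coefficients and test accuracy, for illustration only.
B0, B1, B2 = -5.0, 2.8, -0.5
SENS, SPEC = 0.9, 0.8
for age in (20, 30, 40, 50, 60, 70):
    p = prevalence(age, B0, B1, B2)
    print(age, round(p, 3),
          round(one_minus_ppv(p, SENS, SPEC), 3),
          round(one_minus_npv(p, SENS, SPEC), 3))
```

Because sensitivity and specificity are held fixed across ages in this sketch, all of the age dependence in 1-PPV and 1-NPV enters through the prevalence curve: as prevalence falls, 1-PPV rises and 1-NPV falls.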

A Receiver Operating Characteristic (ROC) analysis [20] was performed to determine how best to combine the information in the HC2 H-HPV diagnostic test results with that of the CA-IX test. There are four possible outcomes for the combination of HC2 H-HPV and CA-IX results (HPV positive, CA-IX positive; HPV positive, CA-IX negative; HPV negative, CA-IX positive; and HPV negative, CA-IX negative). Each classification rule labels each of these four outcomes as either test positive or test negative, so there are 16 (2^4) classification rules that could each define a diagnostic test based on both HC2 H-HPV and CA-IX. For example, one rule would be: classify an individual as test positive if either HPV=+ or CA-IX=+; otherwise, if both HPV=- and CA-IX=-, classify the individual as test negative. This was the rule chosen by Liao et al. [5] when defining their test based on the combination of HPV and CA-IX results. In the current study, we considered each of the 16 ways to define a diagnostic test based on the combination of HC2 H-HPV and CA-IX results. For each of the 16 tests, we plotted the sensitivity versus 1-specificity on an ROC plot. The “best” combined test was then selected as the one whose point on the plot had the shortest Euclidean distance to the point of perfect sensitivity and perfect specificity (sensitivity = 1, 1-specificity = 0).
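The enumeration of the 16 rules and the distance criterion can be sketched in Python as follows. This is an illustration under stated assumptions: the sensitivities and specificities are hypothetical values rather than estimates from this study, the two tests are taken to be conditionally independent given disease status, and the function names are invented for the example.

```python
from itertools import product
import math

# The four joint outcomes (HPV, CA-IX), coded 1 = positive and 0 = negative.
OUTCOMES = list(product((1, 0), repeat=2))

def outcome_prob(outcome, p_pos):
    """P(joint outcome) from per-test positive probabilities, assuming independence."""
    prob = 1.0
    for y, p in zip(outcome, p_pos):
        prob *= p if y else 1.0 - p
    return prob

def evaluate_rules(sens, spec):
    """Return (distance, positive outcomes, sensitivity, 1-specificity) per rule."""
    false_pos_rates = [1.0 - s for s in spec]
    results = []
    # Each rule flags a subset of the four outcomes as "test positive".
    for flags in product((0, 1), repeat=len(OUTCOMES)):
        positive = [o for o, f in zip(OUTCOMES, flags) if f]
        se = sum(outcome_prob(o, sens) for o in positive)
        fpr = sum(outcome_prob(o, false_pos_rates) for o in positive)
        # Distance to the perfect point: sensitivity = 1, 1-specificity = 0.
        results.append((math.hypot(1.0 - se, fpr), positive, se, fpr))
    return sorted(results, key=lambda r: r[0])

# Hypothetical accuracies for (HC2 H-HPV, CA-IX), for illustration only.
rules = evaluate_rules(sens=[0.9, 0.6], spec=[0.7, 0.8])
best = rules[0]
print(len(rules), "rules; closest to the perfect point:", best[1])
```

The 16 rules include degenerate ones, such as calling everyone positive or everyone negative, as well as each single test used alone, so the comparison shows directly whether any combined rule lies closer to the perfect point than HC2 H-HPV by itself.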

Additional Discussion

While the combination of CA-IX with HC2 H-HPV testing does not improve the diagnostic accuracy for cervical neoplasia in women with an AGC-NOS diagnosis over that of HC2 H-HPV testing alone in the U.S. population, it should be noted that CA-IX and RLA H-HPVG combined had improved sensitivity in the GOG-0171 Japanese cohort compared with RLA H-HPVG alone. Liao et al. [27] conjectured that this may be because some SCL in Japan (specifically, lobular endocervical glandular hyperplasia, LEGH) arise independently of HPV infection or, alternatively, because there is an undiscovered oncogenic strain of HPV in Japan. The former explanation is favored by Liao et al. [27], but it is inconsistent with the commonly held belief that HPV infection is involved in the etiology of almost all cervical cancers [27]. In Figure 1, we observed a peak prevalence of SCL at about 28 years of age. LEGH usually occurs in postmenopausal women. Thus, when we repeat the current study with Japanese data, it will be of particular interest to see whether the peak prevalence is shifted to an older age and whether the shift remains when the eight cases of SCL in Japan that were not detected by RLA H-HPVG are removed from the analysis.

Excessive test positivity is a concern with HPV screening tests and cytology cotesting [29], especially among cases that are HPV positive and cytology negative [30]. This problem is manifested in a low PPV. Even in our very high-risk population with an AGC-NOS positive cytological result, the PPV is notably low. It is not clear that the clinical decision to treat with an excisional procedure should be based on a positive HPV test alone. A false test positive rate (1-PPV) of 17 to 70% was observed from 20 to 70 years of age. Nearly 20% of women in their reproductive years (age 20-40) with HC2 H-HPV positive test results would be over-treated and exposed to the risk of cesarean section, preterm delivery, and infant morbidity and mortality [31], or to the loss of reproductive ability. The purpose of the ongoing GOG follow-up study to GOG-0171 (i.e., GOG-0237) is to investigate whether including additional biomarkers such as p16, Ki-67, or MCM2 in a diagnostic strategy together with HC2 H-HPV might increase specificity and reduce 1-PPV to an acceptably low level.