Racial comparison of receptor-defined breast cancer in Southern African women: subtype prevalence and age-incidence analysis of nationwide cancer registry data
Caroline Dickens1 2, Raquel Duarte2, Annelle Zietsman3, Herbert Cubasch4 5, Patricia Kellett6, Joachim Schuz1, Danuta Kielkowski6, Valerie McCormack1
Supplementary Data
Supplementary methods
Immunohistochemistry
Histological confirmation of breast cancer diagnosis as well as immunohistochemistry for ER, PR and HER2 receptors was performed in South Africa at one of 13 NHLS laboratories located country-wide or in Namibia at the Namibia Institute of Pathology. The IHC systems used by each laboratory and their accreditation status are provided in Table S1.1. Standard operating procedures (SOPs) are available for each of the laboratories on request.
Supplementary Table S1.1: Namibian and NHLS laboratories (SA) where ER, PR and HER2 immunohistochemistry is performed
Laboratory / N (%) / SA Province / SANAS accreditation* / IHC automation system
Namibia Institute of Pathology / 440 (100) / - / Pending / Ventana Benchmark XT
NHLS, South Africa
Johannesburg / 1875 (15.7) / Gauteng / Accredited / Dako Autostainer
Tshwane / 1162 (9.8) / Gauteng / Accredited / †
Baragwanath / 753 (6.3) / Gauteng / Accredited / Ventana Benchmark XT
George Mukhari / 527 (4.4) / Gauteng / Accredited / Dako Autostainer
Groote Schuur / 1697 (14.2) / Western Cape / Accredited / Ventana Benchmark XT
Tygerberg / 1341 (11.3) / Western Cape / Accredited / Ventana Benchmark XT
Universitas / 1235 (10.4) / Free State / Accredited / Ventana Benchmark XT
Durban / 1014 (8.5) / KwaZulu-Natal / † / †
East London / 593 (5.0) / Eastern Cape / Pending / Ventana Benchmark XT
Port Elizabeth / 485 (4.1) / Eastern Cape / Accredited / †
Umtata / 208 (1.7) / Eastern Cape / Accredited / †
Polokwane / 814 (6.8) / Limpopo / Accredited / †
Kimberley / 218 (1.8) / Northern Cape / † / ‡
* SANAS: South African National Accreditation System
† Information not available to us at present
‡ IHC outsourced to Universitas
Details of receptor-extraction algorithm
Data from Namibia was extracted from patient files manually and entered into a database.
For South Africa, electronic versions of the laboratory reports (n=23258) were provided by the NCR as free-text documents in non-standardized formats. Each patient had between one and eight laboratory reports (mean 2.5 records); laboratory reports were thus available for 12093 patients. We developed an algorithm to search automatically for, and extract, information on ER, PR and HER2 receptor status and tumour grade. The status of each of the three receptors and tumour grade were extracted independently from each laboratory report. The algorithm located keywords indicating possible receptor information for each of the three receptors and for tumour grade. Examples of keywords for ER were ‘oestrogen’, ‘ ER’, ‘receptor’, ‘immuno*’, ‘triple*’, ‘hormone’, ‘prognostic marker’, ‘positiv*’, ‘stain’ and ‘nuclear’. Variations in spelling and Afrikaans versions of these keywords were also included. Once keywords were located, the algorithm assigned a receptor status (positive/negative, HER2 score if given) by considering the context in which the keyword was found, e.g. ER+/ER-, positive/negative, staining intensity descriptions (none/weak/mild/moderate/strong), percentages and scores (0,1,2,3,+,++,+++), H scores (0-300), proportions (0-5) and intensities (0-3). If the keyword context was ambiguous, the record was flagged and read manually to assign receptor status. When multiple reports were available for a single woman, the earliest record with known ER status was used. Using this automated procedure, ER status was assigned for 9621/12093 (80%) women. For the remaining 2472 women, reports were read individually by one of us (CD), and receptor status was assigned manually if found.
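The keyword-and-context step described above can be illustrated with a minimal sketch. It is not the study's algorithm: the patterns, the 40-character context window and the function name `extract_er_status` are assumptions made for illustration, and the real keyword list (including spelling variants and Afrikaans terms) was considerably broader.

```python
import re

# Hypothetical patterns; the study's actual keyword list is not reproduced here.
ER_KEYWORD = re.compile(r"\b(?:(?i:o?estrogen)|ER)\b")
OTHER_RECEPTOR = re.compile(r"\b(?:PR|HER-?2|(?i:progesterone))\b")
POSITIVE = re.compile(r"positiv\w*|\+", re.IGNORECASE)
# A hyphen counts as "negative" only when it is not joining two words.
NEGATIVE = re.compile(r"negativ\w*|-(?!\w)", re.IGNORECASE)


def extract_er_status(report, window=40):
    """Return 'positive', 'negative', 'manual review' or 'not found'."""
    statuses = set()
    for match in ER_KEYWORD.finditer(report):
        # Look only at the text just after the keyword (assumed window size).
        context = report[match.end():match.end() + window]
        # Stop at the next receptor keyword so "ER+ PR-" is not misread.
        other = OTHER_RECEPTOR.search(context)
        if other:
            context = context[:other.start()]
        is_pos = bool(POSITIVE.search(context))
        is_neg = bool(NEGATIVE.search(context))
        if is_pos and is_neg:
            statuses.add("ambiguous")
        elif is_pos:
            statuses.add("positive")
        elif is_neg:
            statuses.add("negative")
    if not statuses:
        return "not found"
    if statuses == {"positive"}:
        return "positive"
    if statuses == {"negative"}:
        return "negative"
    # Conflicting or ambiguous contexts are flagged for manual reading.
    return "manual review"


print(extract_er_status("ER- PR+ HER2 2+"))
```

Returning an explicit "manual review" value mirrors the flag-and-read step in the text: the sketch never guesses when the context contains conflicting signals.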
Similar methods were used to extract PR and HER2 status from a laboratory report but these statuses were only assigned to a woman if the ER-status was known and the PR and/or HER2 status was from a report within one month of the report from which ER status was extracted. This limited time frame was applied to ensure that all receptor statuses for a single woman came from a single earliest time point in her diagnosis.
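The report-selection rule can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the record layout (dicts with `date`, `ER`, `PR` and `HER2` keys), the function name `assign_statuses`, the reading of "one month" as 31 days, and the choice of the closest eligible report when several qualify are all hypothetical.

```python
from datetime import date, timedelta


def assign_statuses(reports, window_days=31):
    """Pick one ER, PR and HER2 status per woman from her laboratory reports.

    reports: list of dicts with keys 'date', 'ER', 'PR', 'HER2'
    (a status of None means it was not found in that report).
    """
    result = {"ER": None, "PR": None, "HER2": None}
    er_reports = sorted(
        (r for r in reports if r["ER"] is not None), key=lambda r: r["date"]
    )
    if not er_reports:
        # PR and HER2 are assigned only when ER status is known.
        return result
    index_report = er_reports[0]  # earliest report with known ER status
    result["ER"] = index_report["ER"]
    window = timedelta(days=window_days)  # assumed meaning of "one month"
    for receptor in ("PR", "HER2"):
        candidates = [
            r for r in reports
            if r[receptor] is not None
            and abs(r["date"] - index_report["date"]) <= window
        ]
        if candidates:
            # Assumption: prefer the report closest in time to the ER report.
            closest = min(
                candidates, key=lambda r: abs(r["date"] - index_report["date"])
            )
            result[receptor] = closest[receptor]
    return result


example = [
    {"date": date(2010, 3, 10), "ER": "positive", "PR": "positive", "HER2": None},
    {"date": date(2010, 3, 25), "ER": "positive", "PR": None, "HER2": "2+"},
    {"date": date(2010, 9, 1), "ER": "negative", "PR": "negative", "HER2": "3+"},
]
print(assign_statuses(example))
```

In this example the September report is ignored for all three receptors, because it falls outside the window around the earliest report with known ER status.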
A random validation sample (n=200) was checked manually. For ER, the extraction rate was 100% (as all unassigned ER statuses were manually extracted) and agreement with the extracted status was 100%. For PR and HER2, the algorithm did not extract 1.8% and 2.3% of statuses respectively, but agreement was 100% for PR and 99.4% for HER2 where a status was extracted.
Demographic data
Demographic information, including date of birth, age, race, date of diagnosis, hospital and NHLS laboratory, was obtained from the report from which the oestrogen receptor status was extracted. Patients came from 311 hospitals: 13% from primary hospitals, 21% from secondary hospitals and 66% from tertiary hospitals. Province was assigned according to the location of the hospital. Race was predicted by the NCR using a hot-deck imputation method that allocates a population group to each cancer case by matching the patient’s name with a reference database of surnames with known racial group. Surnames not appearing in the database were coded as unknown; this group was more likely to include recent immigrants from neighbouring Southern African countries, whose surnames had not been entered in the database. For women missing an ER status (n=2317), demographic information was obtained from the laboratory report with the earliest date. Finally, we restricted the dataset to women with known age ≥20 years, resulting in data from 11921 patients, of whom 81% (n=9642) had known ER status.
Statistical Methods
Age-incidence rate curves
Proportional rates were calculated as the number of imputed ER-specific cases divided by the population. Population counts were obtained from the 2011 SA census in 5-year age categories by race and province (1). ER status was imputed for those cases where it was unknown by multiplying the total number of cases by the proportions of ER⁺ and ER⁻ cases among those with known status, respectively. Total imputed numbers of ER⁺ cases were thus equal to the known ER⁺ cases plus the ER⁺ cases imputed from the unknowns. Total imputed numbers of ER⁻ cases were calculated in the same way. Proportional rather than actual rates are presented, relative to the ER⁺ rate at age 50-54 for each race individually. Poisson regression models of imputed cases (with the log population count as offset) and a linear term for age were used to estimate the percentage increase in incidence rate per year of age, $100(e^{\beta}-1)$, with a priori stratification at age 50 to examine Clemmesen’s hook.
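The imputation arithmetic and the conversion of a Poisson regression coefficient into a percentage increase can be sketched as below. The function names and the example counts are hypothetical, and the sketch assumes the ER⁺ proportion is taken among cases with known status within whichever stratum is being analysed; the source does not state the stratum used.

```python
import math


def impute_er_cases(er_pos, er_neg, er_unknown):
    """Distribute cases of unknown ER status in proportion to the known split.

    Returns the total imputed (ER-positive, ER-negative) case counts.
    """
    known = er_pos + er_neg
    if known == 0:
        raise ValueError("no cases with known ER status in this stratum")
    p_pos = er_pos / known  # proportion ER-positive among known cases
    imputed_pos = er_pos + er_unknown * p_pos
    imputed_neg = er_neg + er_unknown * (1 - p_pos)
    return imputed_pos, imputed_neg


def percent_increase_per_year(beta):
    """Convert a log-rate coefficient per year of age to a % increase."""
    return 100 * (math.exp(beta) - 1)


# Made-up counts for illustration only (not study data).
pos, neg = impute_er_cases(er_pos=60, er_neg=40, er_unknown=25)
print(round(pos, 1), round(neg, 1))
print(round(percent_increase_per_year(math.log(1.05)), 1))
```

Because the unknowns are split in the same proportion as the knowns, the imputed ER⁺ count equals the total number of cases multiplied by the ER⁺ proportion, which is why the two descriptions in the text are equivalent.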
Comparisons to published data
The most recent year of complete cancer incidence data (pathologically confirmed cases in both the public and private sectors) published by the NCR is 2006. Although incidence rates may have changed slightly between 2006 and the period of this study (2009-11), the 2006 total breast cancers in women can be used to assess, by age, the proportion of annual cases captured in the 2009-11 data. For all ages combined, the annual number of public-sector cases in the 2009-11 data was 1.02 times the number of cases in 2006, i.e. almost identical to 2006. Proportions were similar when restricted to black patients, as only a small proportion of cases from this racial group are diagnosed in the private sector. For white, mixed ancestry and Indian/Asian women, all ages combined, the annual numbers of cases in the 2009-11 data were 0.36, 0.76 and 0.79 times those in 2006, respectively. These proportions are entirely consistent with the proportion of the population without private health insurance: 90% of black people, 29% of white, 78% of mixed race and 53% of the Indian population (2). Further, there were no major differences in these proportions by age (Figure S1.1).
Supplementary Figure S1.1: Annual number of public sector breast cancers in women in SA in 2009-10 as a percentage of annual number of total cases in 2006
References
1. Statistics South Africa. 2011 National Census. http://interactive.statssa.gov.za/superweb/login.do 2013 [cited 2013 Feb 4].
2. Mayosi BM, Lawn JE, van Niekerk A, Bradshaw D, Abdool Karim SS, Coovadia HM. Health in South Africa: changes and challenges since 2009. Lancet 2012;380:2029-43.
Supplementary Results
Supplementary Table S2.1: Characteristics of 12361 histologically-confirmed breast cancer patients in South Africa (2009-11, public sector) and Namibia (2011-13), by race
Black / White / Mixed-Ancestry / Indian/Asian / Other/Unknown
N / % / N / % / N / % / N / % / N / %
Country
(N, row %) / All / 6926 / 56.0 / 2308 / 18.7 / 1812 / 14.7 / 490 / 4.0 / 825 / 6.7
South Africa / 6633 / 55.6 / 2224 / 18.7 / 1758 / 14.8 / 490 / 4.1 / 816 / 6.9
Namibia / 293 / 66.6 / 84 / 19.1 / 54 / 12.3 / 0 / 0.0 / 9 / 2.0
Age at diagnosis
(N, col %) / 20-49 / 2836 / 41.0 / 626 / 27.1 / 626 / 34.6 / 136 / 27.8 / 310 / 37.6
≥50 / 4090 / 59.1 / 1682 / 72.9 / 1186 / 65.5 / 354 / 72.2 / 515 / 62.4
Mean (SD) / 54.2 (14.5) / 58.7 (13.8) / 56.0 (13.5) / 57.5 (12.4) / 55.9 (14.8)
SA Province*
(N, col %) / Eastern Cape / 887 / 13.4 / 249 / 11.2 / 201 / 11.4 / 20 / 4.1 / 175 / 21.5
Free State / 558 / 8.4 / 150 / 6.7 / 36 / 2.1 / 5 / 1.0 / 49 / 6.0
Gauteng / 2484 / 37.5 / 573 / 25.8 / 199 / 11.3 / 87 / 17.8 / 281 / 34.4
KwaZulu-Natal / 565 / 8.5 / 65 / 2.9 / 19 / 1.1 / 301 / 61.4 / 64 / 7.8
Limpopo / 774 / 11.7 / 24 / 1.1 / 9 / 0.5 / 15 / 3.1 / 65 / 8.0
Mpumalanga / 348 / 5.3 / 28 / 1.3 / 8 / 0.5 / 0 / 0.0 / 16 / 2.0
North West / 401 / 6.1 / 57 / 2.6 / 30 / 1.7 / 6 / 1.2 / 38 / 4.7
Northern Cape / 126 / 1.9 / 99 / 4.5 / 103 / 5.9 / 1 / 0.2 / 24 / 2.9
Western Cape / 490 / 7.4 / 979 / 44.0 / 1153 / 65.6 / 55 / 11.2 / 104 / 12.8
Receptor status known
(N, % known) / ER / 5559 / 80.3 / 1963 / 85.1 / 1556 / 85.9 / 313 / 63.9 / 656 / 79.5
PR / 5173 / 74.7 / 1530 / 66.3 / 1057 / 58.3 / 268 / 54.7 / 592 / 71.8
HER2 / 4686 / 67.7 / 1569 / 69.0 / 1249 / 68.9 / 244 / 49.8 / 503 / 61.0
*SA only
Supplementary Table S2.2: Characteristics of women with missing ER status
n / n missing / % missing / OR of missing ER status* / 95% CI / p-value
All / 12361 / 2314 / 18.7
Age (years)
20-29 / 247 / 72 / 29.2 / 1.83 / (1.33-2.53) / <0.001
30-39 / 1442 / 282 / 19.6 / 1.17 / (0.98-1.40) / 0.078
40-49 / 2845 / 543 / 19.1 / 1.18 / (1.02-1.36) / 0.029
50-59 / 3020 / 530 / 17.6 / 1 (Ref) / - / -
60-69 / 2507 / 439 / 17.5 / 0.97 / (0.83-1.13) / 0.725
70+ / 2300 / 448 / 19.5 / 1.07 / (0.92-1.25) / 0.365
Race
Black / 6926 / 1367 / 19.7 / 1 (Ref) / - / -
White / 2308 / 345 / 15.0 / 0.95 / (0.81-1.10) / 0.479
Mixed-ancestry / 1812 / 256 / 14.1 / 1.02 / (0.86-1.23) / 0.797
Indian/Asian / 490 / 177 / 36.1 / 0.97 / (0.76-1.24) / 0.817
Unknown / 825 / 169 / 20.5 / 1.11 / (0.91-1.35) / 0.325
Grade
1 / 980 / 63 / 6.43 / 1 (Ref) / - / -
2 / 3644 / 303 / 8.32 / 1.01 / (0.75-1.34) / 0.97
3 / 2258 / 191 / 8.46 / 0.94 / (0.69-1.27) / 0.67
Unknown / 5480 / 1756 / 32.04 / 5.18 / (3.96-6.76) / <0.01
Province
Eastern Cape / 1534 / 188 / 12.26 / 0.63 / (0.52-0.76) / <0.001
Free State / 797 / 188 / 23.59 / 1.95 / (1.60-2.38) / <0.001
Gauteng / 3624 / 513 / 14.16 / 1 (Ref) / - / -
KwaZulu-Natal / 1014 / 547 / 53.94 / 4.00 / (3.35-4.79) / <0.001
Limpopo / 887 / 141 / 15.90 / 1.33 / (1.08-1.66) / 0.009
Mpumalanga / 400 / 102 / 25.50 / 1.29 / (0.99-1.66) / 0.055
North West / 532 / 146 / 27.44 / 1.76 / (1.40-2.21) / <0.001
Northern Cape / 353 / 143 / 40.51 / 3.18 / (2.46-4.11) / <0.001
Western Cape / 2781 / 310 / 11.15 / 0.47 / (0.40-0.56) / <0.001
Namibia / 440 / 35 / 7.95 / 0.46 / (0.31-0.66) / <0.001
Year of Diagnosis**
2009 / 3756 / 626 / 16.67 / 1 (Ref) / - / -
2010 / 3612 / 600 / 16.61 / 0.99 / (0.88-1.12) / 0.915
2011 / 3540 / 505 / 14.27 / 0.83 / (0.73-0.95) / 0.005
*Adjusted for age in 10 year categories, race, province and year of diagnosis except for where this is the variable of interest
**South Africa only, excluding the province of KwaZulu-Natal for which 2011 data were not available
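As a rough check on how the odds ratios in Table S2.2 relate to the raw counts, a crude (unadjusted) odds ratio can be computed directly from the n and n missing columns. The helper below is a hypothetical illustration; the table itself reports odds ratios adjusted for age, race, province and year of diagnosis, so crude values will differ from those shown.

```python
def crude_odds_ratio(missing_group, total_group, missing_ref, total_ref):
    """Unadjusted odds ratio of missing ER status versus a reference group."""
    odds_group = missing_group / (total_group - missing_group)
    odds_ref = missing_ref / (total_ref - missing_ref)
    return odds_group / odds_ref


# Ages 20-29 (72 of 247 missing) versus the reference 50-59 (530 of 3020 missing).
crude = crude_odds_ratio(72, 247, 530, 3020)
print(round(crude, 2))  # about 1.93, compared with the adjusted OR of 1.83
```

The gap between the crude and adjusted values reflects the adjustment for the other covariates listed in the table footnote.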
Supplementary Table S2.3: Breast cancer subtype distribution by race
Black / White / Mixed Ancestry / Indian/Asian / Unknown
N / Luminal A (%) / Luminal B (%) / HER2 enriched (%) / Triple negative (%) / N / Luminal A (%) / Luminal B (%) / HER2 enriched (%) / Triple negative (%) / N / Luminal A (%) / Luminal B (%) / HER2 enriched (%) / Triple negative (%) / N / Luminal A (%) / Luminal B (%) / HER2 enriched (%) / Triple negative (%) / N / Luminal A (%) / Luminal B (%) / HER2 enriched (%) / Triple negative (%)
Total / 4425 / 54.6 / 13.8 / 10.7 / 20.9 / 1261 / 60.8 / 13.9 / 7.8 / 17.5 / 878 / 55.4 / 11.8 / 10.9 / 21.9 / 219 / 64.8 / 9.6 / 8.2 / 17.4 / 463 / 54.2 / 12.1 / 10.4 / 23.3
Age / 20-29 / 87 / 56.3 / 20.7 / 3.4 / 19.5 / 14 / 35.7 / 21.4 / 21.4 / 21.4 / 18 / 27.8 / 0 / 16.7 / 55.6 / 2 / 50 / 0 / 0 / 50 / 9 / 55.6 / 0 / 11.1 / 33.3
30-39 / 599 / 50.6 / 17.5 / 12.5 / 19.4 / 107 / 65.4 / 12.1 / 7.5 / 15 / 80 / 55 / 18.8 / 5 / 21.3 / 19 / 57.9 / 21.1 / 10.5 / 10.5 / 59 / 45.8 / 28.8 / 11.9 / 13.6
40-44 / 510 / 54.9 / 16.5 / 11.0 / 17.6 / 94 / 53.2 / 23.4 / 5.3 / 18.1 / 84 / 56 / 11.9 / 9.5 / 22.6 / 21 / 61.9 / 9.5 / 4.8 / 23.8 / 55 / 58.2 / 7.3 / 10.9 / 23.6