Documentation for CanCORS Medical Record Abstraction
SAS Data Sets, Version 1.7
by
Paul Catalano and Yuhsia Boothroyd
CanCORS SCC
October 17, 2007
- General Information
This version of the medical record abstraction (MRA) data consists of raw data from MedQuest with no imputation. These data correspond to the records labeled _IMPUTE_ = 0 in the patient and provider survey data. Imputations will be added to the MRA data sets in a future release.
The structure of the individual SAS MRA data sets depends on the data they contain. For example, some data sets have one row per participant, while others have multiple rows per participant. The following section describes the overall structure of each data set. Please refer to the individual data sets for merge keys that can be used to link the data sets together. Use the CanCORS ID to link the MRA data to previously distributed core and patient survey data sets.
The data cover MRA data submitted to the SCC as of October 3, 2007, and include all PDCR sites. Please also note that the data have been masked using established SCC guidelines. In particular, all dates of service in the MRA data have been replaced with the number of days since RCA diagnosis, taken from the patient tracking data.
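A minimal Python sketch of the masking arithmetic, assuming that each service date is replaced by the signed number of calendar days between the RCA diagnosis date and the service date, so that the diagnosis date itself maps to 0. The function name and the example dates are hypothetical; the SCC's actual masking code is not reproduced here.

```python
from datetime import date


def days_since_diagnosis(service_date: date, diagnosis_date: date) -> int:
    """Masked value for a date of service: days elapsed since RCA diagnosis.

    Assumption: negative values mean the service occurred before diagnosis.
    """
    return (service_date - diagnosis_date).days


# Hypothetical example dates, not taken from the data.
dx = date(2004, 3, 1)
print(days_since_diagnosis(date(2004, 3, 31), dx))  # 30
print(days_since_diagnosis(date(2004, 2, 20), dx))  # -10 (2004 is a leap year)
```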
- Data Sets and Data Structure
Below are brief descriptions of each MRA SAS data set. Please refer to the SAS data sets themselves for variable labels and associated variable formats. Note that some data sets are cancer-specific: even though the variables are similar across these data sets, their formats differ. Please see the data sets, the associated SAS format file (mrformats.sas), and the Excel data dictionary (mra_data_dictionary.xls, included with the data distribution) for details.
Note: some participants may have multiple abstractions and many of the SAS data sets reflect these multiple abstractions through additional rows in the data sets. Multiple abstractions are identified by the QID variable.
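The following Python sketch shows how a one-record-per-case data set can be joined to a data set with one row per item per abstraction. The key name cancors_id and the field cea_day are hypothetical placeholders for the CanCORS ID and a CEA result variable; check the data dictionary for the real merge keys. In SAS, the equivalent is a DATA step MERGE with a BY statement after sorting both data sets by the key.

```python
# Hypothetical rows; "cancors_id" stands in for the CanCORS ID variable name.
mr_derived = [
    {"cancors_id": "A001", "scc_stage": 4},
    {"cancors_id": "A002", "scc_stage": 10},
]
mr1_cea = [
    {"cancors_id": "A001", "qid": 1, "cea_day": 5},
    {"cancors_id": "A001", "qid": 2, "cea_day": 40},
    {"cancors_id": "A002", "qid": 1, "cea_day": 12},
]


def merge_one_to_many(one_per_case, many_per_case, key="cancors_id"):
    """Attach case-level variables to every row of a multi-row data set."""
    lookup = {row[key]: row for row in one_per_case}
    merged = []
    for row in many_per_case:
        case = lookup.get(row[key])
        if case is not None:  # keep only cases present in both (inner join)
            merged.append({**case, **row})
    return merged


for row in merge_one_to_many(mr_derived, mr1_cea):
    print(row)
```

Each CEA row keeps its own QID, so results from different abstractions of the same case remain distinguishable after the merge.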
Data Set Name: main.sas7bdat
Description: master MRA data set
Structure: one record (observation) per case (see consolidation note below)
Data Set Name: mr_derived.sas7bdat
Description: derived variables
Structure: one record per case (see consolidation note below). See below for variable definitions.
Data Set Name: mr1_cea.sas7bdat
Description: CEA data set
Structure: one record per CEA test result per abstraction
Data Set Name: mr1_colonoscopy.sas7bdat
Description: Colonoscopy data set
Structure: one record per colonoscopy per abstraction
Data Set Name: mr1_pet.sas7bdat
Description: PET scan data set
Structure: one record per PET scan per abstraction
Data Set Name: mr1_prior_cancer_hist.sas7bdat
Description: Prior Cancer History data set
Structure: one record per prior cancer per abstraction
Data Set Name: mr2_chemo_regimen.sas7bdat
Description: parent chemotherapy regimens data set
Structure: one record per chemo regimen per abstraction
Data Set Name: mr2_chemo_drug.sas7bdat
Description: chemotherapy drugs data set
Structure: one record per chemotherapy drug per regimen per abstraction
Data Set Name: mr2_chemo_growth_factor.sas7bdat
Description: growth factors data set
Structure: one record per growth factor per regimen per abstraction
Data Set Name: mr2_clinical_trial.sas7bdat
Description: clinical trials data set
Structure: one record per clinical trial per abstraction
Data Set Name: mr2_regional_chemo.sas7bdat
Description: regional chemotherapy data set
Structure: one record per regional chemotherapy drug per abstraction
Data Set Name: mr2_med_event.sas7bdat
Description: medical events data set
Structure: one record per medical event per abstraction
Data Set Name: mr2_vital.sas7bdat
Description: vital status data set
Structure: one record per case (see consolidation note below)
Data Set Name: mr2_new_mal_recurr.sas7bdat
Description: new malignancies and recurrence data set
Structure: one record per case (see consolidation note below)
Data Set Name: mr2_bisph.sas7bdat
Description: Bisphosphonates use data set
Structure: one record per bisphosphonate per abstraction
Data Set Name: mr2_other_surgery_crc.sas7bdat
Description: Other surgeries data set
Structure: one record per other surgery per colorectal abstraction
Data Set Name: mr2_other_surgery_lung.sas7bdat
Description: Other surgeries data set
Structure: one record per other surgery per lung abstraction
Data Set Name: mr2_prim_surgery_crc.sas7bdat
Description: Primary surgeries data set
Structure: one record per primary surgery per colorectal abstraction
Data Set Name: mr2_prim_surgery_lung.sas7bdat
Description: Primary surgeries data set
Structure: one record per primary surgery per lung abstraction
Data Set Name: mr2_prim_sur_med_given_crc.sas7bdat
Description: Primary surgery medicine data set
Structure: one record per drug per surgery per colorectal abstraction
Data Set Name: mr2_prim_sur_med_given_lung.sas7bdat
Description: Primary surgery medicine data set
Structure: one record per drug per surgery per lung abstraction
Data Set Name: mr2_radiation.sas7bdat
Description: Radiation data set
Structure: one record per RT regimen per abstraction
Data Set Name: vis_hosp_adm.sas7bdat
Description: hospitalizations data set
Structure: one record per admission per abstraction
Data Set Name: vis_hosp_dec_mak.sas7bdat
Description: hospitalizations decision making data set
Structure: one record per decision making event per hospitalization per abstraction
Data Set Name: vis_hosp_icd9dx_code.sas7bdat
Description: ICD-9 diagnosis codes data set
Structure: one record per ICD-9 diagnosis code per hospitalization per abstraction
Data Set Name: vis_hosp_icd9px_code.sas7bdat
Description: ICD-9 procedure codes data set
Structure: one record per ICD-9 procedure code per hospitalization per abstraction
Data Set Name: vis_med_onc.sas7bdat
Description: Medical oncology visits data set
Structure: one record per medical oncology visit per abstraction
Data Set Name: vis_med_onc_dec_mak.sas7bdat
Description: Decision making for medical oncology visits data set
Structure: one record per decision making event per medical oncology visit per abstraction
Data Set Name: vis_rad_onc.sas7bdat
Description: Radiation oncology visits data set
Structure: one record per radiation oncology visit per abstraction
Data Set Name: vis_rad_onc_dec_mak.sas7bdat
Description: Decision making for radiation oncology visits data set
Structure: one record per decision making event per radiation oncology visit per abstraction
Data Set Name: vis_surgery_visit.sas7bdat
Description: Surgeon visits data set
Structure: one record per surgeon visit per abstraction
Data Set Name: vis_surgery_dec_mak.sas7bdat
Description: Decision making for surgeon visits data set
Structure: one record per decision making event per surgeon visit per abstraction
Data Set Name: vis_prim_care.sas7bdat
Description: Primary care visits data set
Structure: one record per primary care visit per abstraction
Data Set Name: vis_pcp_dec_mak.sas7bdat
Description: Decision making for primary care visits data set
Structure: one record per decision making event per primary care visit per abstraction
Data Set Name: vis_gastro_visit.sas7bdat
Description: Gastroenterology visits data set
Structure: one record per gastroenterology visit per abstraction
Data Set Name: vis_gast_dec_mak.sas7bdat
Description: Decision making for gastroenterology visits data set
Structure: one record per decision making event per gastroenterology visit per abstraction
Data Set Name: vis_pulmonary.sas7bdat
Description: Pulmonology visits data set
Structure: one record per pulmonology visit per abstraction
Data Set Name: vis_pul_dec_mak.sas7bdat
Description: Decision making for pulmonology visits data set
Structure: one record per decision making event per pulmonology visit per abstraction
Data Set Name: vis_pal_pain_hspce.sas7bdat
Description: Palliative pain mgmt and hospice data set
Structure: one record per visit per abstraction
Data Set Name: vis_pal_pain_hsp_dec_mak.sas7bdat
Description: Decision making for palliative pain mgmt and hospice data set
Structure: one record per decision making event per visit per abstraction
Data Set Name: vis_oth_spec.sas7bdat
Description: Other specialty visits data set
Structure: one record per other specialty visit per abstraction
Data Set Name: vis_oth_spec_dec_mak.sas7bdat
Description: Decision making for other specialty visits data set
Structure: one record per decision making event per other specialty visit per abstraction
Data Set Name: vis_key_referral.sas7bdat
Description: Key non-contact referrals data set
Structure: one record per key non-contact referral per abstraction
Data Set Name: vis_key_ref_dec_mak.sas7bdat
Description: Decision making for key non-contact referrals data set
Structure: one record per decision making event per key non-contact referral per abstraction
Consolidation of multiple abstractions:
The following data sets have been consolidated by the SCC across multiple abstractions. Consolidation combines the data items from all abstractions of a case into one record; where the same data item is available from more than one abstraction, the value with the highest source summary value is used. The four consolidated data sets, each containing one record per case, are: main, mr_derived, mr2_vital and mr2_new_mal_recurr.
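A hedged Python sketch of the consolidation rule, assuming each abstraction row carries a single numeric source summary value (the field name source_summary is hypothetical) and that a missing item is represented as None. For each item, the value from the highest-ranked abstraction that has the item is kept.

```python
def consolidate(abstractions, items):
    """Collapse one case's abstraction rows into a single record.

    Assumption: a higher "source_summary" means a better source; ties keep
    the original row order because sorted() is stable.
    """
    ranked = sorted(abstractions, key=lambda a: a["source_summary"], reverse=True)
    record = {}
    for item in items:
        # First non-missing value, scanning from the best source downward.
        record[item] = next((a[item] for a in ranked if a.get(item) is not None), None)
    return record


# Hypothetical case with two abstractions (QID 1 and QID 2).
abstractions = [
    {"qid": 1, "source_summary": 2, "vital": 1, "recurrence": 1},
    {"qid": 2, "source_summary": 3, "vital": 2, "recurrence": None},
]
print(consolidate(abstractions, ["vital", "recurrence"]))  # {'vital': 2, 'recurrence': 1}
```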
- Definitions of Derived Variables
The data set named mr_derived.sas7bdat contains several derived variables for general use in analyses. Below are the variables and their definitions:
- Variable: crc_site
Colon versus rectum primary site. Rectal cancer is defined as: Rectosigmoid junction (code 11) or Rectum NOS (code 12) in the primary tumor location of the histology tab under diagnosis and CS staging (CRC). Any other primary site is coded as colon. Missing crc_site occurs for lung cases and any CRC cases with unknown primary tumor location.
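The crc_site rule can be sketched in Python as follows. The text labels "colon" and "rectum", the cancer_type argument, and the use of None for an unknown tumor location are assumptions; the distributed variable uses its own SAS format codes.

```python
RECTAL_CODES = {11, 12}  # Rectosigmoid junction (11), Rectum NOS (12)


def derive_crc_site(cancer_type, primary_location):
    """Return "rectum", "colon", or None (missing) for one case."""
    if cancer_type != "crc" or primary_location is None:
        return None  # lung case, or CRC case with unknown primary location
    return "rectum" if primary_location in RECTAL_CODES else "colon"
```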
- Variable: lung_hist
Histology of lung cancer. Small cell cancer is defined as: 8041/3 Small cell carcinoma NOS (code 21), 8042/3 Oat cell carcinoma (code 22), 8043/3 Small cell carcinoma fusiform cell (code 23), 8044/3 Small cell carcinoma intermediate cell (code 24) or 8045/3 Combined small cell carcinoma (code 25) in the histology tab under diagnosis and CS staging. Non-small cell lung cancer is defined as any other histology code. Missing lung_hist occurs for CRC cases and any lung cases with unknown histology.
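A similar Python sketch for lung_hist, assuming the histology tab codes listed above and using None for unknown histology. The labels "SCLC" and "NSCLC" and the cancer_type argument are illustrative, not the distributed format values.

```python
SMALL_CELL_CODES = {21, 22, 23, 24, 25}  # ICD-O-3 8041/3 through 8045/3


def derive_lung_hist(cancer_type, histology_code):
    """Return "SCLC", "NSCLC", or None (missing) for one case."""
    if cancer_type != "lung" or histology_code is None:
        return None  # CRC case, or lung case with unknown histology
    return "SCLC" if histology_code in SMALL_CELL_CODES else "NSCLC"
```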
- Variable: comorbidity
Each patient is assigned an overall Comorbidity Score on an ordinal scale of mild, moderate, and severe according to the highest-ranked single ailment (one of the 25 ailments in the ACE-27), except when two or more ailments in different organ systems are scored "moderate"; in that situation, the overall Comorbidity Score is designated severe. The organ systems, in order of listing (with the number of ailments in each system in parentheses), are: cardiovascular (7), respiratory (1), gastrointestinal (3), renal (1), endocrine (1), neurological (3), psychiatric (1), rheumatic (1), immunologic (1), malignancy (3), substance abuse (2), and body weight (1). No comorbidity (code 0) means that none of the 25 ailments received any grade, which corresponds to all of these variables holding their default values (by design, MedQuest was programmed with defaults of “No record of X to this degree” on the Comorbidities tab).
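A minimal Python sketch of the overall scoring rule described above. It assumes numeric grades of 0 (none), 1 (mild), 2 (moderate) and 3 (severe) and represents each graded ailment as an (organ system, grade) pair; these encodings are illustrative, not the MedQuest variable layout.

```python
def overall_comorbidity(grades):
    """Overall ACE-27 score from (organ_system, grade) pairs.

    Assumed grade codes: 0 = none, 1 = mild, 2 = moderate, 3 = severe.
    """
    highest = max((g for _, g in grades), default=0)  # no graded ailments -> 0
    moderate_systems = {system for system, g in grades if g == 2}
    if len(moderate_systems) >= 2:  # moderate ailments in 2+ different systems
        return 3
    return highest


print(overall_comorbidity([("cardiovascular", 2), ("respiratory", 2)]))     # 3
print(overall_comorbidity([("cardiovascular", 2), ("cardiovascular", 2)]))  # 2
```

Two moderate ailments within the same organ system do not trigger the upgrade, which is why the sketch counts distinct systems rather than ailments.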
- Variable: scc_adj_chem
Indicator of whether patient received adjuvant chemotherapy. See appendix for definition.
- Variable: scc_neoadj_chem
Indicator of whether patient received neoadjuvant chemotherapy. See appendix for definition.
- Variable: scc_adj_rad
Indicator of whether patient received adjuvant radiation therapy. See appendix for definition.
- Variable: scc_neoadj_rad
Indicator of whether patient received neoadjuvant radiation therapy. See appendix for definition.
- Variable: scc_met_chem_3mths
Indicator of whether patient received chemotherapy for metastatic disease within 3 months. See appendix for definition.
- Variable: scc_met_chem
Indicator of whether patient received any chemotherapy for metastatic disease (no time window applies). See appendix for definition.
- Variable: scc_medonc_mra_surv
Indicator of whether patient saw a medical oncologist using evidence from MRA and patient survey. See appendix for definition.
- Variable: scc_medonc_mra
Indicator of whether patient saw a medical oncologist using evidence from MRA only. See appendix for definition.
- Variable: scc_radonc_mra_surv
Indicator of whether patient saw a radiation oncologist using evidence from MRA and patient survey. See appendix for definition.
- Variable: scc_radonc_mra
Indicator of whether patient saw a radiation oncologist using evidence from MRA only. See appendix for definition.
- Variable: scc_stage
Summary stage of disease derived from the following hierarchy, listed in order of precedence from highest (1) to lowest (7):
1. Collaborative Stage
(calculated AJCC stage from the histology [Primary tumor location, Histology, Histologic Grade variables] and collaborative stage elements tabs [CS Tumor Size, CS Extension, TS/Ext Eval, Lymph Nodes, Regional Nodes Evaluation, Regional LN Positive, Regional LN Examined, Metastases at Diagnosis, Metastatic Tissue Eval variables] in MR1)
2. Registry Group Stage (Registry TNM tab in MR1)
3. Registry Path TNM Stage (Registry TNM tab in MR1)
4. Registry Clinical TNM Stage (Registry TNM tab in MR1)
5. Physician Group Stage (Physician Stage Variables tab in MR1)
6. Physician TNM Stage (Physician Stage Variables tab in MR1)
7. Stage of disease from tracking data.
- Variable: scc_stage_src: Identifies the highest-ranked data source in the scc_stage hierarchy used to calculate stage for each case. Value range is 1-7. See also the format for this variable.
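The cascade can be sketched as a first-non-missing search over the seven sources. The dictionary keys below are hypothetical names for the source stage variables, and None stands for a missing stage; the position in the list plays the role of scc_stage_src.

```python
# Hypothetical source names, listed in scc_stage precedence order (1..7).
STAGE_SOURCES = [
    "collaborative_stage",
    "registry_group_stage",
    "registry_path_tnm",
    "registry_clinical_tnm",
    "physician_group_stage",
    "physician_tnm",
    "tracking_stage",
]


def derive_scc_stage(case):
    """Return (scc_stage, scc_stage_src) from the first non-missing source."""
    for src, name in enumerate(STAGE_SOURCES, start=1):
        value = case.get(name)
        if value is not None:
            return value, src
    return None, None  # no source available
```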
- Variable: abs_status: Provides the best available information from the accession/abstraction tracking data on the status of medical record abstraction. Codes are Closed, Open/Pending and “.” (missing). Closed refers to cases whose tracked records are all marked as closed in the abstraction tracking database. Open/Pending refers to cases with at least one tracked medical record marked as open/pending. Missing indicates cases that do not have any abstraction status records in the accession/abstraction tracking database submitted to the SCC.
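A short Python sketch of the abs_status logic, assuming each tracked record for a case has a status of either "closed" or "open" (hypothetical string values) and that None represents the missing code.

```python
def derive_abs_status(statuses):
    """statuses: tracking-record statuses for one case ("closed" or "open")."""
    if not statuses:
        return None  # missing: no abstraction status records for the case
    if any(s == "open" for s in statuses):
        return "Open/Pending"  # at least one record still open/pending
    return "Closed"  # every tracked record is closed
```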
- Questions
For any questions regarding the construction and use of the MRA data sets, please post queries to the analytic discussion forum on the CanCORS web site. Go to and follow the links to Data Distribution and then Analytic Discussion Forums.
- Updates to this document / Change log
- v1.7:
Minor updates to this document for clarity. Variables and data formats have not changed since the last data release.
- v1.6:
Added scc_adj_rad, scc_neoadj_rad, scc_medonc_mra_surv, scc_medonc_mra, scc_radonc_mra_surv and scc_radonc_mra derived variables.
Removed the QID variable from the mr_derived data set, where it is unnecessary. Because mr_derived contains one record per participant, users should merge it to other data by case.
- v1.5:
scc_stage logic has been modified to more properly allow for multiple abstractions and to place Registry Path stage above Registry Clinical stage in the hierarchy.
Added scc_stage_src variable to indicate highest data source for staging.
Redefined adj_chem, neoadj_chem, met_chem, met_chem_3mths (see appendix) and renamed these variables scc_adj_chem, scc_neoadj_chem, scc_met_chem and scc_met_chem_3mths.
Dropped adj_rad and neoadj_rad variables from data set. These variables are in the process of being redefined and will be distributed in a future data release.
Added other text specify field for chemotherapy drugs.
Added chemotherapy drug format to the supplied format library.
Updated comorbidity scoring algorithm to set cases with defaults of no comorbidities to 0 (None) rather than . (missing).
Added abs_status variable to mr_derived data set.
- v1.4:
The version number of the data was brought up to 1.4 to coincide with patient and provider survey data release during the same time period (hence there are no versions 1.1, 1.2 or 1.3 of the MRA data).
The variable scc_stage has been corrected to identify a larger number of stage IV cases and corrections to the cascading logic now more properly identify lung cases that were staged using physician group stage in the abstraction tool. Previously, these cases were cascading to the level of tracking stage.
The definitions of the variables adj_chem, neoadj_chem, adj_rad and neoadj_rad have been modified to remove the “intent” specification. This results in a larger number of cases being classified into the (neo)adjuvant chemo and RT modalities.
The variables crc_site, lung_hist, met_chem_3mos and met_chem have been added to the derived data set.
The following data sets have been consolidated (i.e., “rolled up”) into one record per case based on available data and the highest source summary available in the medical record: main, mr_derived, mr2_vital, mr2_new_mal_recurr.
The latest data now include UIowa.
The MRA data distribution now includes a full data dictionary for all data sets in Excel format. The file is named mra_data_dictionary.xls.
- v1.0: This is the first version of this document.
Appendix. Definitions of select derived variables.
Definitions of Adjuvant and Neoadjuvant Chemotherapy
Adjuvant Chemotherapy within 6 months
LUNG CANCER: Adjuvant Chemotherapy within 6 months of surgery ("YES"), is defined as follows:
stage I-IIIA (scc_stage in (3,4,5,6,7,8,10,17,18))
and
NSCLC (histlung not in (21,22,23,24,25))
and
earliest primary cancer directed surgery (prmsrgln_f = YES) and known surgery date
and
earliest received chemo regimen (rcvchemreg=1) and known chemo start date (dtstchemo) and chemo start date after diagnosis and chemo start date within 6 months after surgery date
or
received chemo regimen (rcvchemreg=1) and unknown chemo start date and intent = adjuvant (chemintent=2)
or
received chemo regimen (rcvchemreg=1) and unknown chemo start date and chemo confirmed by patient survey with surgery date prior to survey (Note 1)
and
no recurrence or unknown recurrence or known earliest recurrence date is after chemo start date.
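The YES rule above can be sketched in Python. Several details are assumptions: dates are the masked day counts (days since diagnosis), "6 months" is approximated as 183 days, the earliest surgery and earliest chemo regimen are assumed to have been selected already, prmsrgln_f is treated as a boolean, and the fields surgery_day, chemo_day, recurrence_day, survey_confirms_chemo and surgery_before_survey are hypothetical. The rule does not say how a known recurrence combines with an unknown chemo start date; this sketch treats that case as failing the recurrence condition.

```python
EARLY_STAGE = {3, 4, 5, 6, 7, 8, 10, 17, 18}  # scc_stage codes for I-IIIA
SCLC = {21, 22, 23, 24, 25}                   # small cell histlung codes
SIX_MONTHS = 183                              # days; assumed approximation


def adjuvant_chemo_yes(c):
    """Sketch of the lung "YES" rule. All days are counted from diagnosis."""
    eligible = (
        c["scc_stage"] in EARLY_STAGE
        and c["histlung"] not in SCLC
        and c["prmsrgln_f"]              # primary cancer-directed surgery
        and c["surgery_day"] is not None  # known surgery date
    )
    if not eligible or c["rcvchemreg"] != 1:
        return False
    chemo = c["chemo_day"]  # dtstchemo as a day count, or None if unknown
    if chemo is not None:
        timing_ok = chemo > 0 and 0 <= chemo - c["surgery_day"] <= SIX_MONTHS
    else:
        timing_ok = c["chemintent"] == 2 or (
            c["survey_confirms_chemo"] and c["surgery_before_survey"]
        )
    recur = c["recurrence_day"]  # None = no recurrence or unknown recurrence
    # Assumption: a known recurrence with an unknown chemo date fails here.
    recurrence_ok = recur is None or (chemo is not None and recur > chemo)
    return timing_ok and recurrence_ok
```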
LUNG CANCER: Adjuvant Chemotherapy within 6 months of surgery ("NO"), is defined as follows:
stage I-IIIA (scc_stage in (3,4,5,6,7,8,10,17,18))
and
NSCLC (histlung not in (21,22,23,24,25))
and
earliest primary cancer directed surgery (prmsrgln_f = YES) and known surgery date
and
no evidence of received chemo regimen (rcvchemreg not equal to 1)
or
earliest known chemo start date (dtstchemo) and (chemo start before diagnosis or chemo start before surgery or chemo start after recurrence date or chemo start beyond 6 months from surgery).
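The NO rule can be sketched the same way. This sketch reads it as the eligibility conditions combined with either no evidence of a chemo regimen or a known chemo start date that fails one of the timing checks; that grouping of the "and"/"or" clauses is an assumption. As before, days are counted from diagnosis, "6 months" is approximated as 183 days, and surgery_day, chemo_day and recurrence_day are hypothetical field names.

```python
EARLY_STAGE = {3, 4, 5, 6, 7, 8, 10, 17, 18}  # scc_stage codes for I-IIIA
SCLC = {21, 22, 23, 24, 25}                   # small cell histlung codes
SIX_MONTHS = 183                              # days; assumed approximation


def adjuvant_chemo_no(c):
    """Sketch of the lung "NO" rule. All days are counted from diagnosis."""
    eligible = (
        c["scc_stage"] in EARLY_STAGE
        and c["histlung"] not in SCLC
        and c["prmsrgln_f"]
        and c["surgery_day"] is not None
    )
    if not eligible:
        return False
    if c["rcvchemreg"] != 1:
        return True  # no evidence of a received chemo regimen
    chemo, surgery, recur = c["chemo_day"], c["surgery_day"], c["recurrence_day"]
    if chemo is None:
        return False  # unknown start date: the timing tests cannot apply
    return (
        chemo < 0                                  # before diagnosis
        or chemo < surgery                         # before surgery
        or (recur is not None and chemo > recur)  # after recurrence
        or chemo - surgery > SIX_MONTHS            # beyond 6 months of surgery
    )
```

Under this reading, a case with a received regimen but an unknown start date is classified as neither YES nor NO unless the YES rule's intent or survey conditions apply.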