Documentation for CanCORS Medical Record Abstraction

SAS Data Sets, Version 1.7

by

Paul Catalano and Yuhsia Boothroyd

CanCORS SCC

October 17, 2007

  1. General Information

This version of the medical record abstraction (MRA) data consists of raw data from MedQuest with no imputation. These data correspond to what are labeled _IMPUTE_ = 0 records in the patient and provider survey data. Imputations will be added to the MRA data sets at some future time.

The structure of the individual SAS MRA data sets differs depending upon the data. For example, some data sets have one row per participant but other data sets will have multiple rows per participant. The following section describes the overall structure of each data set. Please refer to the individual data sets for merge keys that can be used to link the data sets together. Use the CanCORS ID to link the MRA data to previously distributed core and patient survey data sets.

The data cover MRA data submitted to the SCC as of October 3, 2007. All PDCR sites are included. Please note also that the data have been masked using established SCC guidelines. In particular, all dates of service in the MRA data have been replaced with days since RCA diagnosis taken from the patient tracking data.

  2. Data Sets and Data Structure

Below are brief descriptions of each MRA SAS data set. Please refer to the SAS data sets themselves for variable labels and associated variable formats. Please note that in some cases the data sets are cancer specific and even though the variables are similar across data sets, the formats differ. Please see the data sets, the associated SAS format file (mrformats.sas), and the Excel data dictionary (mra_data_dictionary.xls, included with the data distribution) for details.

Note: some participants may have multiple abstractions and many of the SAS data sets reflect these multiple abstractions through additional rows in the data sets. Multiple abstractions are identified by the QID variable.
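As a rough illustration of how a one-record-per-case data set relates to a multiple-row data set, the sketch below groups child rows under each case. It is written in Python rather than SAS purely for illustration. The key name `cancors_id`, the `site` and `cea_value` fields, and the sample rows are hypothetical; only QID is a variable named in this document, and the actual merge keys should be taken from the individual data sets.

```python
from collections import defaultdict

# Hypothetical rows standing in for main (one row per case) and
# mr1_cea (one row per CEA result per abstraction).
main_rows = [
    {"cancors_id": "A001", "site": "lung"},
    {"cancors_id": "A002", "site": "crc"},
]
cea_rows = [
    {"cancors_id": "A002", "QID": 1, "cea_value": 3.1},
    {"cancors_id": "A002", "QID": 1, "cea_value": 4.0},
    {"cancors_id": "A002", "QID": 2, "cea_value": 3.9},
]

def group_by_case(rows, key="cancors_id"):
    """Collect all rows that share the same case identifier."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped

def link(one_per_case, many_per_case, key="cancors_id"):
    """Attach the (possibly empty) list of child rows to each case."""
    grouped = group_by_case(many_per_case, key)
    return {
        row[key]: {"case": row, "children": grouped.get(row[key], [])}
        for row in one_per_case
    }

linked = link(main_rows, cea_rows)
```

In this sketch a case with no child rows keeps an empty list rather than being dropped, and the QID value on each child row still distinguishes the abstractions.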

Data Set Name: main.sas7bdat

Description: master MRA data set

Structure: one record (observation) per case (see consolidation note below)

Data Set Name: mr_derived.sas7bdat

Description: derived variables

Structure: one record per case (see consolidation note below). See below for variable definitions.

Data Set Name: mr1_cea.sas7bdat

Description: CEA data set

Structure: one record per CEA test result per abstraction

Data Set Name: mr1_colonoscopy.sas7bdat

Description: Colonoscopy data set

Structure: one record per colonoscopy per abstraction

Data Set Name: mr1_pet.sas7bdat

Description: PET scan data set

Structure: one record per PET scan per abstraction

Data Set Name: mr1_prior_cancer_hist.sas7bdat

Description: Prior Cancer History data set

Structure: one record per prior cancer per abstraction

Data Set Name: mr2_chemo_regimen.sas7bdat

Description: parent chemotherapy regimens data set

Structure: one record per chemo regimen per abstraction

Data Set Name: mr2_chemo_drug.sas7bdat

Description: chemotherapy drugs data set

Structure: one record per chemotherapy drug per regimen per abstraction

Data Set Name: mr2_chemo_growth_factor.sas7bdat

Description: growth factors data set

Structure: one record per growth factor per regimen per abstraction

Data Set Name: mr2_clinical_trial.sas7bdat

Description: clinical trials data set

Structure: one record per clinical trial per abstraction

Data Set Name: mr2_regional_chemo.sas7bdat

Description: regional chemotherapy data set

Structure: one record per regional chemotherapy drug per abstraction

Data Set Name: mr2_med_event.sas7bdat

Description: medical events data set

Structure: one record per medical event per abstraction

Data Set Name: mr2_vital.sas7bdat

Description: vital status data set

Structure: one record per case (see consolidation note below)

Data Set Name: mr2_new_mal_recurr.sas7bdat

Description: new malignancies and recurrence data set

Structure: one record per case (see consolidation note below)

Data Set Name: mr2_bisph.sas7bdat

Description: Bisphosphonates use data set

Structure: one record per bisphosphonate per abstraction

Data Set Name: mr2_other_surgery_crc.sas7bdat

Description: Other surgeries data set

Structure: one record per other surgery per colorectal abstraction

Data Set Name: mr2_other_surgery_lung.sas7bdat

Description: Other surgeries data set

Structure: one record per other surgery per lung abstraction

Data Set Name: mr2_prim_surgery_crc.sas7bdat

Description: Primary surgeries data set

Structure: one record per primary surgery per colorectal abstraction

Data Set Name: mr2_prim_surgery_lung.sas7bdat

Description: Primary surgeries data set

Structure: one record per primary surgery per lung abstraction

Data Set Name: mr2_prim_sur_med_given_crc.sas7bdat

Description: Primary surgery medicine data set

Structure: one record per drug per surgery per colorectal abstraction

Data Set Name: mr2_prim_sur_med_given_lung.sas7bdat

Description: Primary surgery medicine data set

Structure: one record per drug per surgery per lung abstraction

Data Set Name: mr2_radiation.sas7bdat

Description: Radiation data set

Structure: one record per RT regimen per abstraction

Data Set Name: vis_hosp_adm.sas7bdat

Description: hospitalizations data set

Structure: one record per admission per abstraction

Data Set Name: vis_hosp_dec_mak.sas7bdat

Description: hospitalizations decision making data set

Structure: one record per decision making event per hospitalization per abstraction

Data Set Name: vis_hosp_icd9dx_code.sas7bdat

Description: ICD-9 diagnosis codes data set

Structure: one record per ICD-9 diagnosis code per hospitalization per abstraction

Data Set Name: vis_hosp_icd9px_code.sas7bdat

Description: ICD-9 procedure codes data set

Structure: one record per ICD-9 procedure code per hospitalization per abstraction

Data Set Name: vis_med_onc.sas7bdat

Description: Medical oncology visits data set

Structure: one record per medical oncology visit per abstraction

Data Set Name: vis_med_onc_dec_mak.sas7bdat

Description: Decision making for medical oncology visits data set

Structure: one record per decision making event per medical oncology visit per abstraction

Data Set Name: vis_rad_onc.sas7bdat

Description: Radiation oncology visits data set

Structure: one record per radiation oncology visit per abstraction

Data Set Name: vis_rad_onc_dec_mak.sas7bdat

Description: Decision making for radiation oncology visits data set

Structure: one record per decision making event per radiation oncology visit per abstraction

Data Set Name: vis_surgery_visit.sas7bdat

Description: Surgeon visits data set

Structure: one record per surgeon visit per abstraction

Data Set Name: vis_surgery_dec_mak.sas7bdat

Description: Decision making for surgeon visits data set

Structure: one record per decision making event per surgeon visit per abstraction

Data Set Name: vis_prim_care.sas7bdat

Description: Primary care visits data set

Structure: one record per primary care visit per abstraction

Data Set Name: vis_pcp_dec_mak.sas7bdat

Description: Decision making for primary care visits data set

Structure: one record per decision making event per primary care visit per abstraction

Data Set Name: vis_gastro_visit.sas7bdat

Description: Gastroenterology visits data set

Structure: one record per gastroenterology visit per abstraction

Data Set Name: vis_gast_dec_mak.sas7bdat

Description: Decision making for gastroenterology visits data set

Structure: one record per decision making event per gastroenterology visit per abstraction

Data Set Name: vis_pulmonary.sas7bdat

Description: Pulmonology visits data set

Structure: one record per pulmonology visit per abstraction

Data Set Name: vis_pul_dec_mak.sas7bdat

Description: Decision making for pulmonology visits data set

Structure: one record per decision making event per pulmonology visit per abstraction

Data Set Name: vis_pal_pain_hspce.sas7bdat

Description: Palliative pain mgmt and hospice data set

Structure: one record per visit per abstraction

Data Set Name: vis_pal_pain_hsp_dec_mak.sas7bdat

Description: Decision making for palliative pain mgmt and hospice data set

Structure: one record per decision making event per visit per abstraction

Data Set Name: vis_oth_spec.sas7bdat

Description: Other specialty visits data set

Structure: one record per other specialty visit per abstraction

Data Set Name: vis_oth_spec_dec_mak.sas7bdat

Description: Decision making for other specialty visits data set

Structure: one record per decision making event per other specialty visit per abstraction

Data Set Name: vis_key_referral.sas7bdat

Description: Key non-contact referrals data set

Structure: one record per key non-contact referral per abstraction

Data Set Name: vis_key_ref_dec_mak.sas7bdat

Description: Decision making for key non-contact referrals data set

Structure: one record per decision making event per key non-contact referral per abstraction.

Consolidation of multiple abstractions:

The following data sets have been consolidated by the SCC across multiple abstractions. The consolidation combines data items across abstractions into one record per case and in the situation of redundant data items, the data item with the highest source summary value is used in the consolidated record. The four consolidated data sets are: main, mr_derived, mr2_vital, mr2_new_mal_recurr. These data sets contain one record per case.
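The consolidation rule can be pictured with a minimal Python sketch. It assumes each abstraction carries a single numeric source summary value and that a missing item is represented as `None`; the field names `source_summary` and `items` are hypothetical, and the SCC's actual consolidation program may differ in detail.

```python
def consolidate(abstractions):
    """Combine the abstractions of one case into a single record.

    For each data item, keep the non-missing value from the abstraction
    with the highest source summary value (an assumed reading of the rule).
    """
    consolidated = {}
    # Visit abstractions from highest to lowest source summary value.
    ranked = sorted(abstractions, key=lambda a: a["source_summary"], reverse=True)
    for abstraction in ranked:
        for item, value in abstraction["items"].items():
            # The first non-missing value seen comes from the best source.
            if value is not None and item not in consolidated:
                consolidated[item] = value
    return consolidated

example = consolidate([
    {"source_summary": 1, "items": {"vital": "alive", "recur": "no"}},
    {"source_summary": 3, "items": {"vital": "dead", "recur": None}},
])
```

Here the redundant item `vital` is taken from the abstraction with the higher source summary value, while `recur` is filled from the only abstraction that recorded it.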

  3. Definitions of Derived Variables

The data set named mr_derived.sas7bdat contains several derived variables for general use in analyses. Below are the variables and their definitions:

  • Variable: crc_site

Colon versus rectum primary site. Rectal cancer is defined as: Rectosigmoid junction (code 11) or Rectum NOS (code 12) in the primary tumor location of the histology tab under diagnosis and CS staging (CRC). Any other primary site is coded as colon. Missing crc_site occurs for lung cases and any CRC cases with unknown primary tumor location.

  • Variable: lung_hist

Histology of lung cancer. Small cell cancer is defined as: 8041/3 Small cell carcinoma NOS (code 21), 8042/3 Oat cell carcinoma (code 22), 8043/3 Small cell carcinoma fusiform cell (code 23), 8044/3 Small cell carcinoma intermediate cell (code 24) or 8045/3 Combined small cell carcinoma (code 25) in the histology tab under diagnosis and CS staging. Non-small cell lung cancer is defined as any other histology code. Missing lung_hist occurs for CRC cases and any lung cases with unknown histology.
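The crc_site and lung_hist definitions above both amount to testing a code against a fixed set. A minimal Python sketch follows, assuming unknown values arrive as `None` and using descriptive text labels instead of the actual formatted values, which are not given here; the function names and the `cancer_type` argument are hypothetical.

```python
RECTUM_CODES = {11, 12}                   # Rectosigmoid junction, Rectum NOS
SMALL_CELL_CODES = {21, 22, 23, 24, 25}   # histology codes for 8041/3 to 8045/3

def derive_crc_site(cancer_type, tumor_location_code):
    """Return 'rectum' or 'colon'; None for lung cases or unknown location."""
    if cancer_type != "crc" or tumor_location_code is None:
        return None
    return "rectum" if tumor_location_code in RECTUM_CODES else "colon"

def derive_lung_hist(cancer_type, histology_code):
    """Return 'small cell' or 'non-small cell'; None for CRC or unknown histology."""
    if cancer_type != "lung" or histology_code is None:
        return None
    return "small cell" if histology_code in SMALL_CELL_CODES else "non-small cell"
```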

  • Variable: comorbidity

Each patient is assigned an overall Comorbidity Score on an ordinal scale of mild, moderate, and severe according to the highest ranked single ailment (one of the 25 ailments in the ACE-27), except in the case where two or more ailments in different organ systems are scored "moderate." In this situation, the overall Comorbidity Score is designated Severe. The systems are, in order of listing (and with number of ailments per that system in parentheses): cardiovascular (7), respiratory (1), gastrointestinal (3), renal (1), endocrine (1), neurological (3), psychiatric (1), rheumatic (1), immunologic (1), malignancy (3), substance abuse (2), and body weight (1).

No comorbidity (code 0) means none of the 25 ailments had any grade, which corresponds to all default values for these variables (by design MedQuest was programmed with defaults of “No record of X to this degree” for the Comorbidities tab).
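The scoring rule can be sketched in Python as follows. The numeric grades 1 (mild), 2 (moderate) and 3 (severe) are an assumption made for illustration; only code 0 (none) is stated in this document. The function name and the input layout are hypothetical.

```python
def comorbidity_score(ailments):
    """Overall score from a list of (organ_system, grade) pairs.

    Grades are assumed to be 0 = none, 1 = mild, 2 = moderate, 3 = severe.
    """
    ailments = list(ailments)
    highest = max((grade for _, grade in ailments), default=0)
    # Organ systems that contain at least one ailment graded moderate.
    moderate_systems = {system for system, grade in ailments if grade == 2}
    # Two or more moderate ailments in different systems are upgraded to severe.
    if highest == 2 and len(moderate_systems) >= 2:
        return 3
    return highest
```

Two moderate ailments within the same organ system do not trigger the upgrade in this sketch, since the rule refers to different organ systems.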

  • Variable: scc_adj_chem

Indicator of whether patient received adjuvant chemotherapy. See appendix for definition.

  • Variable: scc_neoadj_chem

Indicator of whether patient received neoadjuvant chemotherapy. See appendix for definition.

  • Variable: scc_adj_rad

Indicator of whether patient received adjuvant radiation therapy. See appendix for definition.

  • Variable: scc_neoadj_rad

Indicator of whether patient received neoadjuvant radiation therapy. See appendix for definition.

  • Variable: scc_met_chem_3mths

Indicator of whether patient received chemotherapy for metastatic disease within 3 months. See appendix for definition.

  • Variable: scc_met_chem

Indicator of whether patient received any chemotherapy for metastatic disease (no time window applies). See appendix for definition.

  • Variable: scc_medonc_mra_surv

Indicator of whether patient saw a medical oncologist using evidence from MRA and patient survey. See appendix for definition.

  • Variable: scc_medonc_mra

Indicator of whether patient saw a medical oncologist using evidence from MRA only. See appendix for definition.

  • Variable: scc_radonc_mra_surv

Indicator of whether patient saw a radiation oncologist using evidence from MRA and patient survey. See appendix for definition.

  • Variable: scc_radonc_mra

Indicator of whether patient saw a radiation oncologist using evidence from MRA only. See appendix for definition.

  • Variable: scc_stage

Summary stage of disease, derived from the following hierarchy of data sources, listed from highest priority (1) to lowest (7):

1. Collaborative Stage

(calculated AJCC stage from the histology [Primary tumor location, Histology, Histologic Grade variables] and collaborative stage elements tabs [CS Tumor Size, CS Extension, TS/Ext Eval, Lymph Nodes, Regional Nodes Evaluation, Regional LN Positive, Regional LN Examined, Metastases at Diagnosis, Metastatic Tissue Eval variables] in MR1)

2. Registry Group Stage (Registry TNM tab in MR1)

3. Registry Path TNM Stage (Registry TNM tab in MR1)

4. Registry Clinical TNM Stage (Registry TNM tab in MR1)

5. Physician Group Stage (Physician Stage Variables tab in MR1)

6. Physician TNM Stage (Physician Stage Variables tab in MR1)

7. Stage of disease from tracking data.
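The hierarchy above behaves like a cascade: the first source in the list with a non-missing stage supplies the value. A minimal Python sketch, assuming missing stages are `None`; the field names are hypothetical labels for the seven sources, not actual variable names.

```python
# Hypothetical field names, in hierarchy order (rank 1 to rank 7).
STAGE_SOURCES = [
    "collaborative_stage",
    "registry_group_stage",
    "registry_path_tnm_stage",
    "registry_clinical_tnm_stage",
    "physician_group_stage",
    "physician_tnm_stage",
    "tracking_stage",
]

def derive_stage(record):
    """Return (stage, source_rank) from the highest-ranked non-missing source."""
    for rank, source in enumerate(STAGE_SOURCES, start=1):
        value = record.get(source)
        if value is not None:
            return value, rank
    # No source has a stage for this case.
    return None, None
```

The second element of the returned pair plays the role of a source indicator with values 1 to 7.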

  • Variable: scc_stage_src: Provides the highest data source for calculating stage for each case. Value range is 1-7. See also the format for this variable.
  • Variable: abs_status: Provides best available data from accession/abstraction tracking data on status of medical record abstraction. Codes are Closed, Open/Pending and “.” (missing). The code of Closed refers to all records marked as closed in the abstraction tracking database. Open/Pending refers to cases with at least one tracked medical record marked as open/pending. Missing is used to indicate cases that do not have any abstraction status records in the accession/abstraction tracking database submitted to the SCC.
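One possible reading of the abs_status rule, sketched in Python. The status strings "closed" and "open/pending" are hypothetical stand-ins for the tracking database values, and the handling of any other status is an assumption.

```python
def derive_abs_status(tracked_statuses):
    """Summarize the abstraction status of one case from its tracked records."""
    if not tracked_statuses:
        return None  # no status records submitted: missing
    if any(status == "open/pending" for status in tracked_statuses):
        return "Open/Pending"
    if all(status == "closed" for status in tracked_statuses):
        return "Closed"
    # Statuses outside the two documented values are treated as missing here.
    return None
```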
  4. Questions

For any questions regarding the construction and use of the MRA data sets, please post queries to the analytic discussion forum on the CanCORS web site. Go to and follow the links to Data Distribution and then Analytic Discussion Forums.

  5. Updates to this document / Change log
  • v1.7:

Minor updates to this document for clarity. Variables and data formats have not changed since the last data release.

  • v1.6:

Added scc_adj_rad, scc_neoadj_rad, scc_medonc_mra_surv, scc_medonc_mra, scc_radonc_mra_surv and scc_radonc_mra derived variables.

Removed QID variable from mr_derived data set. QID is unnecessary in this data set. Users should merge to other data on case because mr_derived is a one record per participant data set.

  • v1.5:

scc_stage logic has been modified to more properly allow for multiple abstractions and to place Registry Path stage above Registry Clinical stage in the hierarchy.

Added scc_stage_src variable to indicate highest data source for staging.

Redefined adj_chem, neoadj_chem, met_chem, met_chem_3mths (see appendix) and renamed these variables scc_adj_chem, scc_neoadj_chem, scc_met_chem and scc_met_chem_3mths

Dropped adj_rad and neoadj_rad variables from data set. These variables are in the process of being redefined and will be distributed in a future data release.

Added other text specify field for chemotherapy drugs.

Added chemotherapy drug format to the supplied format library.

Updated comorbidity scoring algorithm to set cases with defaults of no comorbidities to 0 (None) rather than . (missing).

Added abs_status variable to mr_derived data set.

  • v1.4:

The version number of the data was brought up to 1.4 to coincide with patient and provider survey data release during the same time period (hence there are no versions 1.1, 1.2 or 1.3 of the MRA data).

The variable scc_stage has been corrected to identify a larger number of stage IV cases and corrections to the cascading logic now more properly identify lung cases that were staged using physician group stage in the abstraction tool. Previously, these cases were cascading to the level of tracking stage.

The definitions of the variables adj_chem, neoadj_chem, adj_rad and neoadj_rad have been modified to remove the “intent” specification. This results in a larger number of cases being classified into the (neo)adjuvant chemo and RT modalities.

The variables crc_site, lung_hist, met_chem_3mos and met_chem have been added to the derived data set.

The following data sets have been consolidated (i.e., “rolled up”) into one record per case based on available data and highest source summary available in the medical record: main, mr_derived, mr2_vital, mr2_new_mal_recurr.

The latest data now include UIowa.

The MRA data distribution now includes a full data dictionary for all data sets in Excel format. The file is named mra_data_dictionary.xls.

  • v1.0: This is the first version of this document.

Appendix. Definitions of select derived variables.

Definitions of Adjuvant and Neoadjuvant Chemotherapy

Adjuvant Chemotherapy within 6 months

LUNG CANCER: Adjuvant Chemotherapy within 6 months of surgery ("YES"), is defined as follows:

stage I-IIIA (scc_stage in (3,4,5,6,7,8,10,17,18))

and

NSCLC (histlung not in (21,22,23,24,25))

and

earliest primary cancer directed surgery (prmsrgln_f = YES) and known surgery date

and

earliest received chemo regimen (rcvchemreg=1) and known chemo start date (dtstchemo) and chemo start date after diagnosis and chemo start date within 6 months after surgery date

or

received chemo regimen (rcvchemreg=1) and unknown chemo start date and intent = adjuvant (chemintent=2)

or

received chemo regimen (rcvchemreg=1) and unknown chemo start date and chemo confirmed by patient survey with surgery date prior to survey (Note 1)

and

no recurrence or unknown recurrence or known earliest recurrence date is after chemo start date.

LUNG CANCER: Adjuvant Chemotherapy within 6 months of surgery ("NO"), is defined as follows:

stage I-IIIA (scc_stage in (3,4,5,6,7,8,10,17,18))

and

NSCLC (histlung not in (21,22,23,24,25))

and

earliest primary cancer directed surgery (prmsrgln_f = YES) and known surgery date

and

no evidence of received chemo regimen (rcvchemreg not equal to 1)

or

earliest known chemo start date (dtstchemo) and (chemo start before diagnosis or chemo start before surgery or chemo start after recurrence date or chemo start beyond 6 months from surgery).
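The lung "YES" and "NO" rules above can be read as a single three-way classification. The Python sketch below is one interpretation, not the SCC's program. It assumes that dates are expressed as days since diagnosis, that 6 months is approximated as 183 days, that day 0 does not count as before diagnosis, and that the three chemotherapy alternatives are grouped together before the recurrence condition is applied. The fields `surgery_day`, `recurrence_day` and `survey_confirms_chemo` are hypothetical; the other names follow the definitions above. The input is assumed to already reflect the earliest surgery, regimen and recurrence.

```python
ELIGIBLE_STAGES = {3, 4, 5, 6, 7, 8, 10, 17, 18}   # stage I-IIIA codes
SMALL_CELL_CODES = {21, 22, 23, 24, 25}
SIX_MONTHS_DAYS = 183  # assumed day count for "6 months"

def adjuvant_chemo_lung(case):
    """Return 'YES', 'NO', or None (neither definition applies)."""
    eligible = (
        case.get("scc_stage") in ELIGIBLE_STAGES
        and case.get("histlung") not in SMALL_CELL_CODES
        and case.get("prmsrgln_f") is True
        and case.get("surgery_day") is not None
    )
    if not eligible:
        return None
    surgery = case["surgery_day"]
    start = case.get("dtstchemo")        # chemo start, days since diagnosis
    recur = case.get("recurrence_day")   # None = no or unknown recurrence
    if case.get("rcvchemreg") != 1:
        return "NO"                      # no evidence of a received regimen
    if start is not None:
        timely = start >= 0 and 0 <= start - surgery <= SIX_MONTHS_DAYS
        if timely and (recur is None or recur > start):
            return "YES"
        if not timely or start > recur:
            return "NO"                  # wrong timing or chemo after recurrence
        return None                      # chemo start equals recurrence day
    # Unknown chemo start date: rely on intent or on the patient survey.
    if case.get("chemintent") == 2 or case.get("survey_confirms_chemo"):
        return "YES"
    return None

base = {"scc_stage": 5, "histlung": 30, "prmsrgln_f": True,
        "surgery_day": 20, "rcvchemreg": 1, "dtstchemo": 60}
```

With the sample case `base`, chemotherapy starts 40 days after surgery with no recurrence recorded, so the sketch classifies it as "YES".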