Dataset of Observed Features on Endoscopic Colorectal Biopsies from Normal Subjects and Patients With Chronic Inflammatory Bowel Disease (Crohn’s Disease and Ulcerative Colitis)

Dr Simon S Cross

Senior Lecturer

Department of Pathology

University of Sheffield Medical School

Beech Hill Road

Sheffield S10 2RX, Great Britain

E-mail:

Introduction:

Ulcerative colitis and Crohn’s disease are disorders of the digestive tract, which extends from the mouth, through the stomach and intestines, to the back passage (anus) (Anonymous 1996). Both disorders are grouped under the more generic term ‘chronic idiopathic inflammatory bowel disease’ (CIIBD), which describes diseases of the bowel that are characterised by acute and chronic inflammation and that have no identified aetiological agent (such as an infective agent). Ulcerative colitis (UC) is restricted to the large intestine, which may be inflamed over a variable proportion of its length; if only the rectum is inflamed the condition is described as proctitis. The lining of the intestine becomes red and ulcerated, and the inflamed area bleeds readily. For those with extensive colitis there is an increased risk of bowel cancer, which can develop in a young person. Crohn’s disease most commonly involves the lower part of the small intestine (ileum) at its junction with the large intestine (‘regional ileitis’), but it may involve both the small and large intestine, or the colon alone. Swelling of the lips and ulceration of the mouth can occur, and ulceration and infection around the anus are common. In Crohn’s disease, not only is the lining of the gut swollen and ulcerated, but the wall of the intestine is also thickened. The inflammation may spread through the wall to involve neighbouring structures. Local perforation of the wall can lead to widespread or localised infection, or to an opening (fistula) on the skin through which intestinal contents emerge. There is an increased risk of cancer, particularly when the large intestine is extensively inflamed.

Histopathology is usually recognised as the 'gold standard' for the diagnosis of CIIBD. The two major diagnostic decisions to be made in this area are: (1) does the subject have CIIBD or not, and (2) if the subject does have CIIBD, is it Crohn's disease or ulcerative colitis? These decisions have important implications for patient management. The division between CIIBD and not CIIBD determines whether or not a patient requires long-term follow-up. The follow-up for CIIBD could include yearly examination of the colon by colonoscopy (in which a flexible tube is passed round the bowel from the anus), a procedure that has a small associated mortality, relatively high patient discomfort and a high cost. However, lack of follow-up in a patient who has CIIBD could result in a missed opportunity to detect colorectal carcinoma at an earlier, and thus more treatable, stage. The distinction between Crohn's disease and UC becomes important if a patient requires surgical removal of their colon for disease that does not respond to medical therapy. In UC a total colectomy will be curative (since the disease only affects the colon) and the surgical procedure can include formation of an ileoanal pouch, providing continence for the patient, rather than an ileostomy (a stoma on the abdominal wall with an attached bag). In Crohn's disease such reconstructive surgery is not advised because of the risk of fistula formation between loops of small bowel.

Samples for the histopathological diagnosis of CIIBD are taken from the colon at endoscopic examination. These small biopsies (2 mm in diameter) are embedded in paraffin wax, and thin sections are stained with haematoxylin and eosin to be examined by light microscopy. The histopathological diagnosis is made subjectively by trained histopathologists. Histopathologists acquire their knowledge and decision-making processes from textbooks and from teaching by more experienced histopathologists, often using a double-headed microscope so that teacher and pupil are viewing the same image. In Britain a trainee histopathologist (who will be medically qualified) is required to have 5 years of postgraduate training in recognised laboratories before she/he can take the final examinations of the Royal College of Pathologists and be eligible to become a consultant histopathologist. The diagnostic process in histopathology is poorly understood but is believed to be a combination of pattern recognition and some form of heuristic logic (Underwood, 1987). The performance of histopathologists in the diagnosis of CIIBD has been investigated by a few published studies, but most of these have been carried out in specialist centres for identified studies and so are likely to represent better performance than the overall standard. Even so, these studies produce a sensitivity for the diagnosis of Crohn's disease or ulcerative colitis in the range of 40% to 82% and a specificity for these diagnoses in the range of 73% to 98% (Frei and Morson, 1981; Thompson et al. 1985; Jenkins, 1988; Seldenrijk et al. 1991; Surawicz et al. 1994). There is thus scope for a decision-support system in the histopathological diagnosis of CIIBD, both to improve the sensitivity and specificity of fully trained histopathologists and for use in the long training period required for histopathology novices.

The Dataset

Study population

The study population was drawn from large bowel endoscopic biopsies reported in the Department of Histopathology, Royal Hallamshire Hospital, Sheffield between 1990 and 1995 (inclusive) (Dube et al. 1998). Biopsies originating in diverted bowel, rectal stumps or pouches were excluded, as were those with a diagnosis of neoplasm. The diagnosis was confirmed by the finding of typical endoscopic appearances seen on video photographs in the clinical notes, subsequent bowel resection, pattern of disease on radiological investigation or microbiological culture results. In cases without confirmation by subsequent resection specimens, the final diagnostic outcome was determined by review of the patient's case notes. The biopsies were a mixed population of single distal biopsies and colonoscopic series from initial presentation and follow-up of disease.

The observed features

The biopsies were examined (blind to all clinical details) by a single experienced observer (SSC) using a computer interface which implements the BSG Guidelines for the Initial Biopsy Diagnosis of Suspected Chronic Idiopathic Inflammatory Bowel Disease (Jenkins et al. 1997) with digitised images representing examples of each histopathological feature (Cross et al. 1997). Some of the features are dichotomous variables, e.g. the presence or absence of mucosal granulomas, whilst others are ordinal categories, e.g. mucin depletion classified into none, mild, moderate or severe. The observed features and their coding are given in Table 1. Observation was spread over a period of 9 months, with no more than 30 biopsies observed in a single day.

Table 1. The observed features and other descriptors in the dataset.

Feature / Type / Range / Code 0 / Code 1 / Code 2 / Code 3
Year (=year in which the biopsy was taken)
Lab No (=unique laboratory accession number for specified year)
Age / Real integer / 14-84
Sex / Binary / 0,1 / Male / Female
Active inflammation (subset classifier, not observed feature) / Binary / 0,1 / No / Yes
Mucosal surface / Ordinal categorical / 0,1,2 / Flat / Irregular / Villous projections
Crypt architecture / Ordinal categorical / 0,1,2,3 / Normal / Mild / Moderate / Severe
Crypt profiles / Real integer / 2-7
Increased lamina propria cellularity / Binary / 0,1 / No / Yes
Mild & superficial increase in lamina propria cellularity / Binary / 0,1 / No / Yes
Increased lymphoid aggregates in lamina propria? / Binary / 0,1 / No / Yes
Patchy lamina propria cellularity? / Binary / 0,1 / No / Yes
Marked & transmucosal increase in lamina propria cellularity / Binary / 0,1 / No / Yes
Cryptitis extent / Ordinal categorical / 0,1,2,3 / None / Little / Moderate / Marked
Cryptitis polymorphs / Ordinal categorical / 0,1,2,3 / None / Few / Several / Many
Crypt abscesses extent / Ordinal categorical / 0,1,2,3 / None / Little / Moderate / Marked
Crypt abscesses polymorphs / Ordinal categorical / 0,1,2,3 / None / Few / Several / Many
Lamina propria polymorphs / Ordinal categorical / 0,1,2 / Absent / Focal / Diffuse
Epithelial changes / Ordinal categorical / 0,1,2,3 / Normal / Flattening / Degeneration / Erosion
Mucin depletion / Ordinal categorical / 0,1,2,3 / Normal / Mild / Moderate / Severe
Intraepithelial lymphocytes / Binary / 0,1 / Normal / Increased
Subepithelial collagen / Binary / 0,1 / Normal / Increased
Lamina propria granulomas / Binary / 0,1 / Absent / Present
Submucosal granulomas / Binary / 0,1 / Absent / Present
Basal histiocytic cells / Binary / 0,1 / Absent / Present
Confirmed diagnosis
Method of confirmation
Initial pathologists diagnosis
Observing pathologists diagnosis
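
The coding in Table 1 can be turned into a small lookup for decoding worksheet values. The sketch below is illustrative only: the dictionary `CODES` covers just a few features copied from Table 1, and the names `CODES` and `decode` are assumptions introduced here, not part of the dataset.

```python
# Code-to-label lookup for a few features, copied from Table 1.
# The dictionary layout and the names CODES/decode are illustrative assumptions.
CODES = {
    "Sex": ["Male", "Female"],
    "Mucosal surface": ["Flat", "Irregular", "Villous projections"],
    "Crypt architecture": ["Normal", "Mild", "Moderate", "Severe"],
    "Mucin depletion": ["Normal", "Mild", "Moderate", "Severe"],
    "Lamina propria granulomas": ["Absent", "Present"],
}


def decode(feature, value):
    """Return the Table 1 label for an integer code, or raise if out of range."""
    labels = CODES[feature]
    if not 0 <= value < len(labels):
        raise ValueError(f"{feature}: code {value} outside 0-{len(labels) - 1}")
    return labels[value]


print(decode("Mucin depletion", 2))  # Moderate
```

Rejecting out-of-range codes early may help catch transcription errors before any analysis is run.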

Partitioning of the dataset

The Excel file contains four worksheets of data: one with all cases and three subsets for analysis.

All cases

This worksheet contains all the cases in the study in the order in which they were observed. It is included only as a reference because it contains cases that are not suitable for analysis. These are cases of other diseases, such as mucosal prolapse and melanosis coli, which have small numbers of cases (and so are unlikely to be adequately represented in training and test sets) and whose specific diagnostic features are not covered by the BSG-defined observations (e.g. smooth muscle passing up between crypts in mucosal prolapse, pigment-containing macrophages in the lamina propria in melanosis coli).

All IBD&normal

This worksheet contains 809 cases of which 165 are normal, 473 UC and 171 Crohn’s disease. These cases are all verified cases of these outcomes with active or inactive inflammation. The cases are in a randomised order and this order should be retained when training and testing. The first 270 cases should be used as a training set, the next 270 cases as an optimisation/verification set and the final 269 cases as a test set. If the methodology does not require a separate optimisation/verification set then the training set can be the first 540 cases. The outcome can be taken as normal v. CIIBD (i.e. UC and Crohn’s combined into a single CIIBD category) or as a tripartite normal v. Crohn’s v. UC outcome. A sequential process could also be applied to this dataset, with an initial division into normal or CIIBD and then a subsequent division of the CIIBD set into UC or Crohn’s (or into inactive and active inflammation, followed by a third division of actively inflamed cases into UC or Crohn’s).
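
The binary and sequential outcome schemes described for this worksheet can be sketched as below. This is a minimal illustration under stated assumptions: the outcome strings "Normal", "UC" and "Crohn's" are placeholders (the worksheet's actual coding of the confirmed diagnosis may differ), and `is_ciibd` and `is_crohns` are hypothetical callables standing in for whatever classifier is trained at each stage.

```python
# Outcome labels are assumed strings; the worksheet's own coding may differ.
CIIBD = {"UC", "Crohn's"}


def binary_outcome(diagnosis):
    """Normal v. CIIBD: UC and Crohn's are merged into a single category."""
    return "CIIBD" if diagnosis in CIIBD else "Normal"


def sequential_classify(case, is_ciibd, is_crohns):
    """Two-stage process: normal v. CIIBD first, then UC v. Crohn's.

    is_ciibd and is_crohns are placeholder callables (hypothetical) that
    each take a case and return True or False.
    """
    if not is_ciibd(case):
        return "Normal"
    return "Crohn's" if is_crohns(case) else "UC"


print(binary_outcome("UC"))  # CIIBD
```

The second stage is only consulted for cases the first stage calls CIIBD, which mirrors the order of the two diagnostic decisions set out in the introduction.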

All IBD

This worksheet contains 644 cases of which 473 are UC and 171 Crohn’s disease. These cases are all verified cases of these outcomes with active or inactive inflammation. The cases are in a randomised order and this order should be retained when training and testing. The first 215 cases should be used as a training set, the next 215 cases as an optimisation/verification set and the final 214 cases as a test set. If the methodology does not require a separate optimisation/verification set then the training set can be the first 430 cases. The outcome will be Crohn’s v. UC.

Active IBD

This worksheet contains 370 cases of which 283 are UC and 87 Crohn’s disease. These cases are all verified cases of these outcomes with active inflammation. The cases are in a randomised order and this order should be retained when training and testing. The first 124 cases should be used as a training set, the next 123 cases as an optimisation/verification set and the final 123 cases as a test set. If the methodology does not require a separate optimisation/verification set then the training set can be the first 247 cases. The outcome will be Crohn’s v. UC.
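
The fixed partitions for the three analysis worksheets can be expressed as a simple order-preserving split. The sketch below is an illustration, not a prescribed implementation: the set sizes are copied from the descriptions above, while the names `SPLITS` and `partition` and the representation of cases as a Python list are assumptions.

```python
# Partition sizes copied from the worksheet descriptions above.
# The names SPLITS/partition and the list-of-cases input are assumptions.
SPLITS = {
    "All IBD&normal": (270, 270, 269),
    "All IBD": (215, 215, 214),
    "Active IBD": (124, 123, 123),
}


def partition(cases, worksheet, use_verification=True):
    """Split cases, in their given order, into training/verification/test sets."""
    n_train, n_verify, n_test = SPLITS[worksheet]
    if len(cases) != n_train + n_verify + n_test:
        raise ValueError("unexpected number of cases for " + worksheet)
    if not use_verification:
        # Without a separate verification set, training absorbs those cases.
        return cases[:n_train + n_verify], [], cases[n_train + n_verify:]
    return (
        cases[:n_train],
        cases[n_train:n_train + n_verify],
        cases[n_train + n_verify:],
    )


train_set, verify_set, test_set = partition(list(range(809)), "All IBD&normal")
print(len(train_set), len(verify_set), len(test_set))  # 270 270 269
```

No shuffling is performed, so the randomised order already present in each worksheet is retained.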

Working practices for analysis:

  1. Use the randomised order of data as given in the worksheets.
  2. Use the partitioning into training, optimisation/verification and test sets as specified above for each set.
  3. For any process with a variable threshold, express the results on a receiver operating characteristic (ROC) curve and compare the area under the curve with other similar processes using a test appropriate for areas under ROC curves (e.g. the method of Hanley and McNeil). Store the coordinates of the ROC curves in clearly labelled Excel worksheets so that these can be imported into graphing programmes.
  4. At an optimal threshold calculate the sensitivity, specificity, predictive value of a positive result (PV +ve), predictive value of a negative result (PV -ve) and the kappa statistic, all with 95% confidence intervals. Discuss what these optimal thresholds might be with SSC, e.g. is sensitivity or specificity most important in separating UC from Crohn’s in biopsies with active inflammation?
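
The statistics requested in the last item can be computed from a 2x2 table as sketched below. This is a minimal illustration under stated assumptions: the function name `two_by_two_stats` is hypothetical, the confidence intervals use the normal (Wald) approximation, which may not be the method used for the intervals reported in this document, and kappa is returned without an interval.

```python
import math


def two_by_two_stats(tp, fn, fp, tn, z=1.96):
    """Diagnostic statistics from a 2x2 table of test result v. confirmed outcome.

    Intervals use the normal (Wald) approximation, which is an assumption
    here; kappa is returned as a point estimate only.
    """
    def prop(k, n):
        p = k / n
        half = z * math.sqrt(p * (1 - p) / n)
        return p, max(0.0, p - half), min(1.0, p + half)

    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": prop(tp, tp + fn),
        "specificity": prop(tn, tn + fp),
        "pv_pos": prop(tp, tp + fp),
        "pv_neg": prop(tn, tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts, for illustration only.
stats = two_by_two_stats(tp=80, fn=20, fp=10, tn=90)
print(round(stats["sensitivity"][0], 2), round(stats["kappa"], 2))  # 0.8 0.7
```

Each proportion is returned as a (estimate, lower, upper) tuple. The sketch does not guard against empty rows or columns, which would cause a division by zero.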

Human performance:

All IBD&normal initial pathologists' diagnosis

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Normal / Crohn's disease / Ulcerative colitis / Totals
Normal / 157 / 24 / 22 / 203
Crohn's disease - highly suggestive / 0 / 62 / 0 / 62
Crohn's disease - suggestive / 0 / 14 / 5 / 19
Ulcerative colitis - highly suggestive / 0 / 0 / 219 / 219
Ulcerative colitis - suggestive / 0 / 2 / 54 / 56
CIIBD indeterminate / 3 / 41 / 146 / 190
Infective type colitis / 0 / 0 / 3 / 3
Lymphocytic colitis / 0 / 0 / 0 / 0
Melanosis coli / 2 / 1 / 0 / 3
Inflammation - unclassified / 3 / 27 / 24 / 54
Totals / 165 / 171 / 473 / 809

If the CIIBD categories are combined and all other categories combined as not CIIBD this table results:

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not CIIBD / CIIBD / Totals
Not CIIBD / 162 / 101 / 263
CIIBD / 3 / 543 / 546
Totals / 165 / 644 / 809

Sensitivity / 84% / (82-87%)
Specificity / 98% / (96-99%)
PV +ve result / 99% / (98-99%)
PV -ve result / 62% / (56-67%)
Kappa / 0.68 / (0.62-0.73)
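
The collapse from the full table of initial pathologists' diagnoses to the not CIIBD v. CIIBD table can be checked with a short script. The sketch below copies the counts from the All IBD&normal initial pathologists' table; the names `TABLE` and `collapse` and the dictionary layout are assumptions made for illustration.

```python
# Rows copied from the "All IBD&normal initial pathologists' diagnosis" table:
# diagnosis -> (confirmed normal, confirmed Crohn's disease, confirmed UC).
TABLE = {
    "Normal": (157, 24, 22),
    "Crohn's disease - highly suggestive": (0, 62, 0),
    "Crohn's disease - suggestive": (0, 14, 5),
    "Ulcerative colitis - highly suggestive": (0, 0, 219),
    "Ulcerative colitis - suggestive": (0, 2, 54),
    "CIIBD indeterminate": (3, 41, 146),
    "Infective type colitis": (0, 0, 3),
    "Lymphocytic colitis": (0, 0, 0),
    "Melanosis coli": (2, 1, 0),
    "Inflammation - unclassified": (3, 27, 24),
}


def collapse(table, positive_rows, positive_cols):
    """Collapse a diagnosis-by-outcome table to 2x2; returns (tp, fn, fp, tn)."""
    tp = fn = fp = tn = 0
    for diagnosis, counts in table.items():
        called_positive = diagnosis in positive_rows
        for col, count in enumerate(counts):
            truly_positive = col in positive_cols
            if called_positive and truly_positive:
                tp += count
            elif called_positive:
                fp += count
            elif truly_positive:
                fn += count
            else:
                tn += count
    return tp, fn, fp, tn


# CIIBD rows: both Crohn's rows, both UC rows and "CIIBD indeterminate".
ciibd_rows = {r for r in TABLE if r.startswith(("Crohn", "Ulcerative", "CIIBD"))}
tp, fn, fp, tn = collapse(TABLE, ciibd_rows, positive_cols={1, 2})
print(tp, fn, fp, tn)  # 543 101 3 162
print(round(100 * tp / (tp + fn)), round(100 * tn / (tn + fp)))  # 84 98
```

Changing `positive_rows` to the two Crohn's disease rows and `positive_cols` to `{1}` gives the Crohn's disease v. not Crohn's disease collapse in the same way.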

If the highly suggestive and suggestive of Crohn's disease categories are combined and all other categories combined as not Crohn's disease this table results:

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not Crohn's disease / Crohn's disease / Totals
Not Crohn's disease / 633 / 95 / 728
Crohn's disease / 5 / 76 / 81
Totals / 638 / 171 / 809

Sensitivity / 44% / (37-52%)
Specificity / 99% / (98-99%)
PV +ve result / 94% / (89-99%)
PV -ve result / 87% / (85-89%)
Kappa / 0.54 / (0.46-0.63)

If the highly suggestive and suggestive of UC categories are combined and all other categories combined as not UC this table results:

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not UC / UC / Totals
Not UC / 334 / 200 / 534
UC / 2 / 273 / 275
Totals / 336 / 473 / 809

Sensitivity / 58% / (53-62%)
Specificity / 99% / (98-99%)
PV +ve result / 99% / (98-99%)
PV -ve result / 63% / (58-67%)
Kappa / 0.53 / (0.47-0.58)

All IBD&normal observing pathologist's diagnosis

Observing pathologist's diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Normal / Crohn's disease / Ulcerative colitis / Totals
Normal / 151 / 64 / 90 / 305
Crohn's disease - highly suggestive / 0 / 15 / 2 / 17
Crohn's disease - suggestive / 0 / 11 / 12 / 23
Ulcerative colitis - highly suggestive / 1 / 14 / 138 / 153
Ulcerative colitis - suggestive / 0 / 9 / 64 / 73
CIIBD indeterminate / 3 / 30 / 108 / 141
Infective type colitis / 0 / 3 / 2 / 5
Lymphocytic colitis / 0 / 0 / 1 / 1
Melanosis coli / 1 / 1 / 1 / 3
Inflammation - unclassified / 9 / 24 / 55 / 88
Totals / 165 / 171 / 473 / 809

If the CIIBD categories are combined and all other categories combined as not CIIBD this table results:

Observing pathologist's diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not CIIBD / CIIBD / Totals
Not CIIBD / 315 / 87 / 402
CIIBD / 4 / 403 / 407
Totals / 319 / 490 / 809

Sensitivity / 82% / (79-86%)
Specificity / 99% / (98-99%)
PV +ve result / 99% / (98-99%)
PV -ve result / 78% / (74-82%)
Kappa / 0.77 / (0.73-0.82)

If the highly suggestive and suggestive of Crohn's disease categories are combined and all other categories combined as not Crohn's disease this table results:

Observing pathologist's diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not Crohn's disease / Crohn's disease / Totals
Not Crohn's disease / 624 / 145 / 769
Crohn's disease / 14 / 26 / 40
Totals / 638 / 171 / 809

Sensitivity / 15% / (10-21%)
Specificity / 98% / (97-99%)
PV +ve result / 65% / (50-80%)
PV -ve result / 81% / (78-84%)
Kappa / 0.18 / (0.07-0.29)

If the highly suggestive and suggestive of UC categories are combined and all other categories combined as not UC this table results:

Observing pathologist's diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not UC / UC / Totals
Not UC / 312 / 271 / 583
UC / 24 / 202 / 226
Totals / 336 / 473 / 809

Sensitivity / 43% / (38-47%)
Specificity / 93% / (90-96%)
PV +ve result / 89% / (85-93%)
PV -ve result / 54% / (49-58%)
Kappa / 0.32 / (0.26-0.38)

All IBD initial pathologists' diagnosis

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Crohn's disease / Ulcerative colitis / Totals
Normal / 25 / 22 / 47
Crohn's disease - highly suggestive / 62 / 0 / 62
Crohn's disease - suggestive / 14 / 5 / 19
Ulcerative colitis - highly suggestive / 0 / 219 / 219
Ulcerative colitis - suggestive / 2 / 54 / 56
CIIBD indeterminate / 41 / 146 / 187
Infective type colitis / 0 / 3 / 3
Melanosis coli / 1 / 0 / 1
Inflammation - unclassified / 26 / 24 / 50
Totals / 171 / 473 / 644

If the highly suggestive and suggestive of Crohn's disease categories are combined and all other categories combined as not Crohn's disease this table results:

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not Crohn's disease / Crohn's disease / Totals
Not Crohn's disease / 468 / 95 / 563
Crohn's disease / 5 / 76 / 81
Totals / 473 / 171 / 644

Sensitivity / 44% / (37-52%)
Specificity / 99% / (98-99%)
PV +ve result / 94% / (89-99%)
PV -ve result / 83% / (80-86%)
Kappa / 0.52 / (0.44-0.61)

If the highly suggestive and suggestive of UC categories are combined and all other categories combined as not UC this table results:

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not UC / UC / Totals
Not UC / 169 / 200 / 369
UC / 2 / 273 / 275
Totals / 171 / 473 / 644

Sensitivity / 58% / (53-62%)
Specificity / 99% / (97-99%)
PV +ve result / 99% / (98-99%)
PV -ve result / 46% / (41-51%)
Kappa / 0.41 / (0.35-0.48)

All IBD observing pathologist's diagnosis

Observing pathologist's diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Crohn's disease / Ulcerative colitis / Totals
Normal / 64 / 90 / 154
Crohn's disease - highly suggestive / 15 / 2 / 17
Crohn's disease - suggestive / 11 / 12 / 23
Ulcerative colitis - highly suggestive / 14 / 138 / 152
Ulcerative colitis - suggestive / 9 / 64 / 73
CIIBD indeterminate / 30 / 108 / 138
Infective type colitis / 3 / 2 / 5
Lymphocytic colitis / 0 / 1 / 1
Melanosis coli / 1 / 1 / 2
Inflammation - unclassified / 24 / 55 / 79
Totals / 171 / 473 / 644

If the highly suggestive and suggestive of Crohn's disease categories are combined and all other categories combined as not Crohn's disease this table results:

Observing pathologist's diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not Crohn's disease / Crohn's disease / Totals
Not Crohn's disease / 459 / 145 / 604
Crohn's disease / 14 / 26 / 40
Totals / 473 / 171 / 644

Sensitivity / 15% / (10-21%)
Specificity / 97% / (96-99%)
PV +ve result / 65% / (50-80%)
PV -ve result / 76% / (73-79%)
Kappa / 0.16 / (0.05-0.28)

If the highly suggestive and suggestive of UC categories are combined and all other categories combined as not UC this table results:

Observing pathologist's diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not UC / UC / Totals
Not UC / 148 / 271 / 419
UC / 23 / 202 / 225
Totals / 171 / 473 / 644

Sensitivity / 43% / (38-47%)
Specificity / 87% / (81-92%)
PV +ve result / 90% / (86-94%)
PV -ve result / 35% / (31-40%)
Kappa / 0.20 / (0.13-0.27)

Active IBD initial pathologists' diagnosis

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Crohn's disease / Ulcerative colitis / Totals
Normal / 0 / 0 / 0
Crohn's disease - highly suggestive / 48 / 0 / 48
Crohn's disease - suggestive / 13 / 4 / 17
Ulcerative colitis - highly suggestive / 0 / 143 / 143
Ulcerative colitis - suggestive / 2 / 40 / 42
CIIBD indeterminate / 20 / 85 / 105
Infective type colitis / 0 / 3 / 3
Inflammation - unclassified / 4 / 8 / 12
Totals / 87 / 283 / 370

If the highly suggestive and suggestive of Crohn's disease categories are combined and all other categories combined as not Crohn's disease this table results:

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not Crohn's disease / Crohn's disease / Totals
Not Crohn's disease / 279 / 26 / 305
Crohn's disease / 4 / 61 / 65
Totals / 283 / 87 / 370

Sensitivity / 70% / (60-80%)
Specificity / 99% / (97-99%)
PV +ve result / 94% / (88-99%)
PV -ve result / 91% / (88-95%)
Kappa / 0.75 / (0.67-0.84)

If the highly suggestive and suggestive of UC categories are combined and all other categories combined as not UC this table results:

Initial pathologists' diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not UC / UC / Totals
Not UC / 85 / 100 / 185
UC / 2 / 183 / 185
Totals / 87 / 283 / 370

Sensitivity / 65% / (59-70%)
Specificity / 98% / (95-99%)
PV +ve result / 99% / (97-99%)
PV -ve result / 46% / (39-53%)
Kappa / 0.45 / (0.36-0.54)

Active IBD observing pathologist's diagnosis

Observing pathologist's diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Crohn's disease / Ulcerative colitis / Totals
Normal / 0 / 1 / 1
Crohn's disease - highly suggestive / 13 / 2 / 15
Crohn's disease - suggestive / 11 / 10 / 21
Ulcerative colitis - highly suggestive / 13 / 130 / 143
Ulcerative colitis - suggestive / 8 / 43 / 51
CIIBD indeterminate / 25 / 67 / 92
Infective type colitis / 3 / 2 / 5
Inflammation - unclassified / 14 / 28 / 42
Totals / 87 / 283 / 370

If the highly suggestive and suggestive of Crohn's disease categories are combined and all other categories combined as not Crohn's disease this table results:

Observing pathologist's diagnosis (rows) against confirmed outcome (columns)
Diagnosis / Not Crohn's disease / Crohn's disease / Totals
Not Crohn's disease / 262 / 61 / 323
Crohn's disease / 21 / 26 / 47
Totals / 283 / 87 / 370

Sensitivity / 30% / (20-40%)