Appendix B: Set Theory and Probability

B.1 Concepts from Set Theory

Set theory describes the membership of elements in sets, and underpins probability theory. If set A contains the elements a1, a2, …, an, then

A = {a1, a2, …, an} (B.1)

a1 is an element of A, thus

a1 ∈ A (B.2)

but b1 is not an element of A, thus

b1 ∉ A (B.3)

An empty or null set is denoted by

φ = { } (B.4)

If set A is a subset of a larger set D, then

A ⊂ D (B.5)

Sets can be represented in a Venn diagram, where everything is traditionally placed inside a large rectangle that represents the universal set U. Subsets of U appear inside the rectangle as quasi-circles. The complement of a set A is given by that portion of the rectangle outside of A and is denoted AC. The notation

AC = {w | w ∉ A} (B.6)

reads as “AC is the set of elements, w, such that w is not an element of A”.

The union of two sets A and B is the set of all elements belonging to either A or B, or both, and is denoted by

A ∪ B = {w | w ∈ A or w ∈ B} (B.7)

and is shown in figure B.1(i). And the intersection of two sets A and B is the set of all elements belonging to both A and B, and is denoted by

A ∩ B = {w | w ∈ A and w ∈ B} (B.8)

and is shown in figure B.1(ii).

Figure B.1 (i) Union and (ii) intersection of sets A and B, shown as light gray regions.
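These operations map directly onto Python’s built-in set type, which can make the definitions concrete; a brief sketch (the element values are arbitrary):

```python
# Membership, union, intersection, complement and subset tests
# using Python's built-in set type.
A = {1, 2, 3}
B = {3, 4, 5}
U = {1, 2, 3, 4, 5, 6}   # a small universal set, for the complement

print(1 in A)     # membership (eq. B.2): True
print(7 in A)     # non-membership (eq. B.3): False
print(A | B)      # union (eq. B.7): {1, 2, 3, 4, 5}
print(A & B)      # intersection (eq. B.8): {3}
print(U - A)      # complement of A within U (eq. B.6): {4, 5, 6}
print(A <= U)     # A is a subset of U (eq. B.5): True
```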

Morphological operators such as dilate and erode can be defined in terms of set theory [Gonzalez and Woods, 2008], where A is an image and B is a structuring element. For example, the dilation of A by B can be written as

A ⊕ B = {z | (B̂)z ∩ A ≠ φ} (B.9)

where B̂ is the reflection of B, defined as

B̂ = {w | w = −b, for b ∈ B} (B.10)

and (B)z indicates a translation by z, such that

(B)z = {c | c = b + z, for b ∈ B} (B.11)

Equation (B.9) is based on reflecting the structuring element B and shifting the reflection by z, and then finding the set of all displacements, z, such that (B̂)z and A overlap by at least one element. This is analogous to convolution, except that logic operations replace the arithmetic operations. In practice, however, we will use a more intuitive definition of dilation and the other morphological operators (Chapter 9).
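Equation (B.9) can be transcribed directly for small binary images represented as sets of (row, column) coordinates. The sketch below is illustrative rather than the intuitive definition used in Chapter 9, and the function name and test pixels are our own:

```python
def dilate(A, B):
    """Dilation of A by structuring element B (eq. B.9).

    Reflecting B (eq. B.10) and shifting by z (eq. B.11), the shifted
    reflection overlaps A exactly when z = a + b for some a in A and
    b in B, so the dilation is the set of all such pairwise sums.
    """
    return {(ar + br, ac + bc) for (ar, ac) in A for (br, bc) in B}

# A two-pixel image dilated by a two-pixel horizontal structuring element:
A = {(1, 1), (1, 2)}
B = {(0, 0), (0, 1)}
print(sorted(dilate(A, B)))   # [(1, 1), (1, 2), (1, 3)]
```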

B.2 Boolean Algebra/Logic Operations

Boolean algebra captures the essential properties of both set and logic operations. Specifically, it deals with the set operations of intersection, union and complement, and the logic operations of AND, OR and NOT. Logic operations are very useful when dealing with binary images and provide a powerful tool when implementing morphological operations (Chapter 9). They are used in the design of integrated circuits (ICs) in electronics, where an output voltage depends on one or more input voltages. The two states 0 and 1 represent low and high voltages, although more generally in logic they represent “false” and “true”. Logic operations are described by logic gates and their corresponding truth tables.

The output of the logical AND operation is “true” (1) only when both inputs are “true” (1), i.e., the output is true if input A and input B are true (Fig. B.2). The AND operation is written as A.AND.B (or as A.B). The output of the OR operation is “true” (1) when either input A or input B, or both, is “true” (1) (Fig. B.2). It is denoted by A.OR.B (or as A+B). An exclusive OR operator (XOR) exists, whose output is “true” (1) when either input A or input B, but not both, is “true” (1). The NOT operation produces the complement of the (single) input, and can be incorporated into other gates to produce the NAND and NOR operations (Fig. B.2). In fact it can be shown that all operations can be built from NAND operations, or alternatively from NOR operations. When used with binary images these operations operate on the images on a pixel-by-pixel basis, taking a pixel from an image A with a correspondingly positioned pixel from image B to produce a pixel in an output image.

The logic operations have a one-to-one correspondence with set operations, except that the former operate only on binary variables. For example, the intersection operation (∩) in set theory reduces to the AND operation for binary variables; and the union (∪) reduces to the OR operation for binary variables.

Figure B.2 Logic operators and truth tables.
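For binary images, these gate definitions apply pixel-by-pixel. A minimal Python sketch, with images represented as lists of rows of 0/1 values (the pixel values are arbitrary):

```python
# Pixel-wise logic operations on two small binary images.
A = [[1, 1, 0],
     [0, 1, 0]]
B = [[1, 0, 0],
     [0, 1, 1]]

# Each output pixel combines the correspondingly positioned pixels of A and B.
AND = [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
OR  = [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
XOR = [[a ^ b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
NOT_A = [[1 - a for a in ra] for ra in A]   # complement of A

print(AND)    # [[1, 0, 0], [0, 1, 0]]
print(OR)     # [[1, 1, 0], [0, 1, 1]]
print(XOR)    # [[0, 1, 0], [0, 0, 1]]
print(NOT_A)  # [[0, 0, 1], [1, 0, 1]]
```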

B.3 Probability

In probability a random experiment is the process of observing the outcome of a chance event. The elementary outcomes are all the possible results of the experiment, and they comprise the sample space, S. As examples, the sample space for the single toss of a coin is

S = {H, T}, where H is heads and T is tails;

for two tosses of a coin it is

S = {(H, H), (H, T), (T, H), (T, T)};

for the throw of a single die it is

S = {1, 2, 3, 4, 5, 6}.

An event consists of one or more possible outcomes of the experiment. The probability or likelihood of an event occurring is the proportion of its occurrence in a large number of experiments. For example, if there is a finite number of outcomes, N, and each is equally likely, then the probability of each outcome, P, is 1/N; and the probability of an event consisting of M outcomes is M/N. A probability of 0 indicates that an event is impossible; and a probability of 1 indicates that it is certain.

Thus for an event A

0 ≤ P(A) ≤ 1 (B.12)

P(S) = 1 (B.13)

P(AC) = 1 – P(A) (B.14)

Worked example Throwing two dice

Throw two dice, a white die and a black die. What is the probability of event A, that the black die shows a 6? What is the probability of event B, that the numbers on the dice add to 4?

The sample space, S, is shown in Fig. B.3. Event A comprises the 6 elements with the black die showing 6, so that the probability of event A is 6/36, i.e. 1/6. Event B comprises the 3 outcomes where the numbers on the two dice add to 4, so that its probability is 3/36, i.e. 1/12.

Figure B.3 Throwing two dice.
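The worked example can be checked by enumerating the 36-element sample space, e.g. in Python:

```python
from itertools import product

# Enumerate the 36 equally likely outcomes (white die, black die).
S = list(product(range(1, 7), range(1, 7)))

A = [s for s in S if s[1] == 6]          # event A: black die shows 6
B = [s for s in S if s[0] + s[1] == 4]   # event B: faces sum to 4

print(len(A), len(A) / len(S))   # 6 outcomes, probability 1/6
print(len(B), len(B) / len(S))   # 3 outcomes, probability 1/12
```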

Events, A and B, are independent if

P(A.AND.B) = P(A).P(B) (B.15)

This would be true, for example, when throwing two dice; the score on the second die is independent of the score on the first.

Worked example

What is the probability of event E, getting at least one 6 in four rolls of a single die?

This type of problem is best solved by finding the probability of not getting any 6’s, and then taking the complement. The probability of not getting any 6’s in four rolls is (5/6 × 5/6 × 5/6 × 5/6), i.e. 0.482. The complement of this, the probability of getting at least one 6, is 0.518.
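A quick check of this calculation:

```python
# P(no 6 in four rolls) and its complement, P(at least one 6).
p_no_six = (5 / 6) ** 4
p_at_least_one = 1 - p_no_six
print(round(p_no_six, 3))        # 0.482
print(round(p_at_least_one, 3))  # 0.518
```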

The sample space, S, for throwing three dice comprises outcomes such as (1, 1, 1), (1, 2, 1), etc., and can be considered as a three-dimensional sample space comprising six slices. The first slice sets die #3 to show 1, and comprises the 36 outcomes of die #1 and die #2; the second slice sets die #3 to show 2, and comprises the 36 outcomes of die #1 and die #2; and so on.

The general addition rule is illustrated in figure B.4:

P(A.OR.B) = P(A) + P(B) – P(A.AND.B) (B.16a)

or, equivalently,

P(A ∪ B) = P(A) + P(B) – P(A ∩ B) (B.16b)

where the third term on the right hand side has to be subtracted because it has already been included twice (Fig. B.4).

Figure B.4 The general addition rule
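The addition rule can be verified by counting outcomes in the two-dice sample space; here events A and B are chosen (arbitrarily) as each die showing a 6:

```python
from itertools import product

S = list(product(range(1, 7), range(1, 7)))   # (white, black) outcomes

A = {s for s in S if s[1] == 6}   # event A: black die shows 6
B = {s for s in S if s[0] == 6}   # event B: white die shows 6

# Counting outcomes, |A or B| = |A| + |B| - |A and B| (eq. B.16):
print(len(A | B))                      # 11
print(len(A) + len(B) - len(A & B))    # 11, so P(A.OR.B) = 11/36
```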

Mutually exclusive events are those that cannot occur in the same experiment, e.g. throw a die and get A = number is even, B = number is 5. If events A, B, C… are mutually exclusive, they do not overlap (Fig. B.5) and the addition rule reduces to

P(A.OR.B.OR.C) = P(A) + P(B) + P(C) (B.17)

Figure B.5 Mutually exclusive events.

A contingency table (Fig. B.6) is used to record and analyze the relationship between two or more variables, most usually nominal or categorical variables, i.e. variables which are classifying labels, such as sex, race, birthplace. The numbers in the right-hand column and the bottom row are called marginal totals and the figure in the bottom right-hand corner is the grand total.

Figure B.6 A contingency table

Provided the entries in the table represent a random sample from the population, the probabilities of various events can be read from it or calculated, e.g. the probability of selecting a male, P(M), is 120/200, i.e. 0.6; the probability of selecting a person under 30 years old, P(U), is 100/200, i.e. 0.5. The probability of selecting a person who is female and under 30 years old, P(F.AND.U) or P(F∩U), is 40/200, i.e. 0.2. This is often called the joint probability. (Note that the events F and U are independent of each other, so that the joint probability is equal to the product of the individual probabilities, 0.4 × 0.5.) The probability of selecting a person who is male or under 30 years old, P(M.OR.U) or P(M∪U), is 160/200, i.e. 0.8.
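These calculations can be reproduced from the counts quoted above (the cell for males under 30 is derived from the marginal totals):

```python
# Counts from the contingency table of Fig. B.6, as quoted in the text:
# 200 people in total; 120 male; 100 under 30; 40 female AND under 30.
total = 200
male = 120
under30 = 100
female_under30 = 40
male_under30 = under30 - female_under30        # 100 - 40 = 60

P_M = male / total                             # 0.6
P_U = under30 / total                          # 0.5
P_F_and_U = female_under30 / total             # 0.2, the joint probability
P_M_or_U = (male + under30 - male_under30) / total   # 160/200 = 0.8

# F and U are independent: P(F.AND.U) equals P(F).P(U) = 0.4 * 0.5
print(P_F_and_U == ((total - male) / total) * P_U)   # True
```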

Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is written P(A|B), and is read "the probability of A, given that B is true". Conditional probability can be explained using the Venn diagram shown in figure B.7. The probability that B is true is P(B), and the area within B where A is true is P(A.AND.B), so that the conditional probability of A given that B is true is

P(A|B) = P(A.AND.B) / P(B) (B.18a)

Figure B.7 Conditional probability

Note that the conditional probability of event B, given A, is

P(B|A) = P(A.AND.B) / P(A) (B.18b)

(If the events A and B are independent, equations B.18a and B.18b reduce to

P(A|B) = P(A) (B.19a)

and

P(B|A) = P(B) (B.19b)

which can be used as an alternative to equation B.15 as a test for independence).

The general conditional probability definition, equation B.18, can be cross-multiplied to give the so-called multiplicative rule

P(A.AND.B) = P(A|B).P(B)

= P(B|A).P(A) (B.20)

Manipulating this further, by equating the two equivalent terms on the right-hand side and re-arranging, gives Bayes’ Rule:

P(A|B) = P(B|A).P(A) / P(B) (B.21)

where P(A|B) is known as the posterior probability.
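As a numerical sketch of equation (B.21), with made-up values for a disease prevalence and test characteristics (the numbers are purely illustrative):

```python
def bayes(p_b_given_a, p_a, p_b):
    """Posterior probability P(A|B) from Bayes' Rule (eq. B.21)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: 1% disease prevalence; a test that is positive
# for 90% of diseased people and for 5% of disease-free people.
p_a = 0.01                                    # P(A), the prior
p_b_given_a = 0.90                            # P(B|A)
p_b = p_b_given_a * p_a + 0.05 * (1 - p_a)    # P(B), by total probability
print(round(bayes(p_b_given_a, p_a, p_b), 3))   # 0.154
```

Note that even with a fairly accurate test, the low prevalence makes the posterior probability of disease, given a positive result, much smaller than the test’s 90% detection rate.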

Bayes’ theorem has many applications, one of which is in diagnostic testing. Diagnostic testing of a person for a disease typically delivers a score, e.g. a red blood cell count, which is compared with the range of scores that random samples from normal and abnormal (diseased) populations obtain on the test. The situation is shown in figure B.8, where the ranges of scores for the normal and abnormal population samples are shown as Gaussian distributions. We would like to have a decision threshold, below which a person tested would be diagnosed disease-free and above which the person would be diagnosed as having the disease. The complication is that the two ranges of scores overlap, and the degree of overlap will affect the goodness of our diagnosis of the tested patient, viz. the more the overlap, the less likely that our diagnosis will be definitive.

Figure B.8 Diagnostic test score distributions for normal (1) and abnormal (2) population samples.

The decision threshold should be in the region of overlap, i.e. between max 1 and min 2. It distinguishes between those who will receive a negative diagnosis (test negative) from those who will receive a positive diagnosis (test positive). The distribution of the scores from the normal population (1) is split into two regions, “d” (below the threshold) and “c” (above the threshold), and the distribution of the scores from the abnormal or diseased population (2) is split into “b” (below the threshold) and “a” (above the threshold).

The sample space thus can be arranged in a contingency table (Fig. B.9), showing event A (actually having the disease) and event B (having a positive result that indicates having the disease). Thus a person may or may not have the disease, and the test may or may not indicate that he/she has the disease. There are four mutually exclusive events. A person from region “a” tests positive and actually has the disease, and is known as a true positive (TP). A person from region “b” tests negative although he/she actually has the disease, and is known as a false negative (FN). A person from region “c” tests positive but does not have the disease, and is known as a false positive (FP). And a person from region “d” tests negative and does not have the disease, and is known as a true negative (TN). A good test would have a large TP and TN, and a small FP and FN, i.e. little overlap of the two distributions.

Figure B.9 Contingency table for a diagnostic test.

The corresponding Venn diagram is shown in figure B.10, although the areas are not drawn to scale.

Figure B.10 Venn diagram for diagnostic testing.

The traditional measures of the diagnostic value of a test are its sensitivity (the (conditional) probability of the test identifying those with the disease, given that they have the disease) and its specificity (the (conditional) probability of the test identifying those free of the disease, given that they do not have the disease). With reference to Figs. B.8 and B.9,

sensitivity, P(B|A) = TP/(TP + FN) = a/(a + b) (B.22)

specificity, P(BC|AC) = TN/(TN + FP) = d/(d + c) (B.23)
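Equations (B.22) and (B.23) are straightforward to compute from the four region counts; the counts below are hypothetical:

```python
def sensitivity(tp, fn):
    """Eq. (B.22): TP / (TP + FN) = a / (a + b)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Eq. (B.23): TN / (TN + FP) = d / (d + c)."""
    return tn / (tn + fp)

# Hypothetical counts for regions a, b, c, d of Fig. B.8:
a, b, c, d = 90, 10, 5, 95   # TP, FN, FP, TN
print(sensitivity(a, b))     # 0.9
print(specificity(d, c))     # 0.95
```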