
Exam of 8D030:

Medical Image Analysis: Techniques and Applications

17-06-2005

The exam contains 4 questions. Not all questions have the same weight:

Q1 = 5, Q2 = 2, Q3 = 1, Q4 = 2

The questions refer to figures which are in the last pages of the exam.
Keep your answers concise and to the point. Use equations whenever you can to clarify your answers, and make sure that all variables are described.
If an algorithm or method has to be described, give a step by step and structured solution.

Good luck!

Q1. We want to distinguish between malign and benign tumors in mammography. Assume that malign and benign tumors appear equally often. Tumor images can be found in figure 1 and their segmentations are shown in figure 2. The names of benign tumors start with b and of malign tumors with m. The malign tumors have been marked with a square. The order of the images in figure 1 is not the same as in figure 2, but they are the same tumors. Notice that the original images have been normalized to a standard size in figures 1 and 2.

  1. Define a segmentation technique to segment the tumors of figure 1 and obtain the contours of figure 2 as automatically as possible. Explain which assumptions you are making and why you made this choice. Explain the pros and cons of using this technique.
  2. Give three features that would allow you to distinguish between these classes. Describe why these can be good features.
  3. What is the computational complexity of calculating these features?
  4. Given these features, design a classifier. Assume that the cost of misclassifying a malign tumor is 3 times the cost of misclassifying a benign one. Explain what other assumptions you are making.
  5. How can we evaluate the performance of this classifier with the information and data that you have? What would you be able to tell with the results of this evaluation?
    Which assumptions did you need to make if any?
  6. We want to compare our computer aided diagnoses system with the performance achieved nowadays in clinical practice by radiologists. What information would you need to gather to be able to do the comparison? How could you make the comparison to decide whether the system performs better than the radiologists?
  7. Imagine that, looking at the feature space, you find out that each class seems to cluster in more than one location.
    Would that influence your choice of classifier? If you would choose another classifier, which one? Which assumptions are you making? Describe how you would implement it. What are the disadvantages of this method?

Q2. Assume that we are able to produce vertebra prostheses. The prostheses are adapted to each patient and based on a CT scan of the patient's vertebra (figure 3). We have a model of our ideal vertebra based on the segmentation of this CT scan. Once the prosthesis has been generated (figure 4), we also make a CT scan of it and segment the vertebra in the scan. The shape of the prosthesis usually has some imperfections. We want to use this scan to control the quality of the prosthesis, meaning its similarity to the model. Assume that we checked the volume with relation to the original and it is the same, V. However, the same volume can still give different shapes. We consider a prosthesis correct if the difference in shape (the amount of volume that is wrongly placed or missing in the prosthesis) does not exceed 10% of the total volume.

Describe a method to calculate the shape difference using the segmented data sets of the vertebra (i.e., the segmented data set of the prosthesis and the segmented data set of the patient's vertebra).

Q3. We want to distinguish between two states of nature A and B. A appears 2 times more frequently than B. We calculate two features which we know are statistically independent. We also know that each state of nature has one unique characteristic value per feature, but due to random noise the actual values differ. We also know that the distribution of the noise for each feature is the same for both classes.
We obtained the following values as training set:

          A                   B
    x1       x2         x1       x2
   1.92     1.44        2.55     6.43
   4.70     2.65        2.19     9.51
   3.41    -0.57        0.48     6.44
   4.50     0.13        0.91     5.56
   3.32     1.66        2.12     4.49
  1. Using this information, design a classifier. Explain what assumptions you are making.

(This is the same as the 15-06-2006 exam, but with different numbers.)

  2. Based on this classifier, what would be the state of nature of the feature vectors (x1, x2) = (1.97, 2.18) and (x1, x2) = (3.13, 4.61)?

Q4. We want to distinguish between two states of nature A and B. Class A appears 3 times more often than class B. We know that a good feature to distinguish these classes is a discrete feature x, which is known to have a Poisson distribution for both A and B. A Poisson distribution is given by

$P(x \mid \theta) = \frac{\theta^x e^{-\theta}}{x!}$

where $\theta$ is a parameter. We also know a priori that the probability density of the parameter for class A, $p(\theta \mid A)$, is a normal distribution with mean $\mu_A$ and standard deviation $\sigma_A$. For class B, $p(\theta \mid B)$ is likewise a normal distribution, with mean $\mu_B$ and standard deviation $\sigma_B$.

  1. Design a classifier based on this information. Assume you have as much training data as you need.
  2. What will be the error of this classifier, assuming that you have an infinite amount of data to train it?

FIGURES______

Figure 1: the tumor images (normalized to a standard size)

Figure 2: the segmentations of the tumors

Figure 3: CT scan of the patient vertebra

Figure 4: CT scan of the generated prosthesis

Q1.

a.

There is more than one correct answer here. As long as a good justification is given, based on the evidence presented in the question, the answer will be considered correct. I will just give one possible solution.

If we look at the images in figure 1, they have different grey intensities. The tumors appear to have a concentration of higher values and a more homogeneous texture in comparison to the rest of the image. It also looks like edge detection in these images will not give closed contours for the tumors, although it might indicate parts of them.

Assumptions

- The pixel with the highest intensity value is inside the tumor, at a minimum distance r from the border.

- Part of the border of the tumor can be extracted or indicated by a high gradient magnitude (edge detection).

Method

A technique that can be used under these conditions is deformable models; we can introduce some knowledge to keep the contour closed and connected.

We can use geometric or parametric deformable models. Since the data looks noisy, I would choose parametric deformable models, where we are sure we will obtain just one object. However, we will have the problem that there are cavities, and parametric deformable models cannot deal so well with cavities.

We define an initial curve $X(s,0)$, a parameterized circle with a radius smaller than r. The circle is in a discrete form, such that we just use points defined at $X(s_i,0)$, where $s_i = i/n$, $i = 0, \dots, n-1$.

Algorithm

- Search for the pixel with the maximum value.

- Center the initial curve $X(s,0)$ (circle) on it, with a small radius.

- Define internal forces $F_i(X(s_i))$ and external forces $F_e(X(s_i))$ for each discrete point i of the contour.

- Until $|F_i(X(s_i)) + F_e(X(s_i))| < \epsilon$ for all i:

- For each i do

$X(s_i) = X(s_i) + \text{step} \cdot \left( F_i(X(s_i)) + F_e(X(s_i)) \right)$

We need to define the internal and external forces.

- As internal forces we can use elasticity (to avoid too much stretching), curvature (to keep the contour as smooth as possible) and a pressure force (to move outwards):

$F_i(X(s)) = \alpha \frac{\partial^2 X}{\partial s^2} - \beta \frac{\partial^4 X}{\partial s^4} - w_p N(s)$

where $N(s)$ is the inward normal of the contour.

The derivatives of the contour are approximated with central differences, and $\alpha$, $\beta$ and $w_p$ are parameters that need to be tuned.

- As external forces, we can use the gradient magnitude of the smoothed image:

$F_e(X) = w_e \nabla \left| \nabla (G_\sigma * f)(X) \right|^2$

where $(G_\sigma * f)$ is the convolution of the image $f(x)$ with a Gaussian with standard deviation $\sigma$, and $w_e$ is a parameter that needs to be tuned. This force attracts the contour to locations of high gradient magnitude (edges).

Finally, the total force on each contour point is $F(X(s_i)) = F_i(X(s_i)) + F_e(X(s_i))$.

Since we expect that cavities will be present, we can improve the algorithm by implementing one of the possible solutions, e.g. adding a force based on a signed distance map (dynamic distance force).
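A minimal sketch of this evolution loop in Python/NumPy, assuming the internal and external forces above are available as callables (the names `internal_force` and `external_force`, the step size and the threshold are illustrative, not from the exam):

    import numpy as np

    def evolve_snake(X, internal_force, external_force,
                     step=0.1, eps=1e-3, max_iter=5000):
        """Gradient-descent evolution of a discrete contour X (n x 2 array).

        internal_force, external_force: callables mapping the contour to an
        (n x 2) array of force vectors, as defined in the text above.
        """
        for _ in range(max_iter):
            F = internal_force(X) + external_force(X)      # total force per point
            if np.all(np.linalg.norm(F, axis=1) < eps):    # |Fi + Fe| < eps for all i
                break
            X = X + step * F                               # move each contour point
        return X

    def initial_circle(image, radius=5.0, n=50):
        """Initial curve X(s,0): a small circle centered on the brightest pixel."""
        cy, cx = np.unravel_index(np.argmax(image), image.shape)
        s = 2 * np.pi * np.arange(n) / n
        return np.column_stack([cx + radius * np.cos(s), cy + radius * np.sin(s)])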

pros

+ The method will always give a unique closed contour.

+ We are able to introduce some shape constraints to compensate for the noisiness of the images.

contras

- There are 5 parameters ($\alpha$, $\beta$, $w_p$, $w_e$ and $\sigma$) that need to be tuned. It will be difficult to find a good set of parameter values.

- Parametric deformable models can have problems dealing with cavities. Some more complexity has to be added to the algorithm to be able to deal with them. Malign tumors do seem to have very irregular shapes, and it is important to capture this.

b.

There are several possible answers here. I just give one possible solution.

Feature 1 - f1: Circularity:

$f_1 = \frac{4 \pi A}{P^2}$

where A is the area and P the perimeter of the segmented tumor.

It looks like benign tumors have a more circular form than malign tumors. Malign tumors have a more star-like form.

We calculate A as the number of pixels that are white (foreground) in the mask. P is calculated by counting the foreground pixels that have at least one background pixel among their 4-neighbours (the contour pixels).

This feature is invariant to rotation, translation and isotropic scaling. It is also invariant to grey value changes.

Feature 2 - f2: Entropy of the image histogram. I assume that in the malign tumors there is more variation in intensities than in the benign ones (due to more blood vessels).

The normalized histogram is calculated using:

$h(b_i) = \frac{N(b_i)}{N}$

where $b_i$ is grey value i, $N(b_i)$ is the number of pixels with grey value $b_i$ inside the contour of the mask, and N is the total number of pixels inside the contour.

The entropy is calculated using:

$f_2 = -\sum_{i=0}^{L-1} h(b_i) \log_2 h(b_i)$

where L is the number of grey values.

This feature is invariant to any affine transformation. It is also invariant to grey value shifts.

Feature 3 - f3: We assume that malign tumors have higher intensity values (again due to more blood vessels) than the benign tumors, so we can use the mean of the intensity values as a feature. We generate a histogram as done in feature 2 and then calculate the mean of this histogram as the feature.

The mean is calculated using:

$f_3 = \sum_{i=0}^{L-1} b_i\, h(b_i)$

where L is the number of grey values.

This feature is invariant to any affine transformation. It is not invariant to grey value shifts or changes.
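As an illustration, a possible NumPy sketch of the three features, assuming `mask` is a binary segmentation, `img` the grey-value image, and L = 256 grey levels (the 4-neighbourhood definition of the perimeter is an assumption):

    import numpy as np

    def features(img, mask, L=256):
        """f1: circularity, f2: histogram entropy, f3: mean intensity (in mask)."""
        mask = mask.astype(bool)
        A = mask.sum()                                   # area: foreground count
        padded = np.pad(mask, 1)
        # interior pixels: all four 4-neighbours are also foreground
        interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                    padded[1:-1, :-2] & padded[1:-1, 2:])
        P = (mask & ~interior).sum()                     # contour pixel count
        f1 = 4 * np.pi * A / P ** 2                      # circularity

        vals = img[mask]
        h, _ = np.histogram(vals, bins=L, range=(0, L))  # histogram N(b_i)
        h = h / h.sum()                                  # normalize: h(b_i) = N(b_i)/N
        nz = h[h > 0]
        f2 = -(nz * np.log2(nz)).sum()                   # entropy
        f3 = vals.mean()                                 # mean intensity
        return f1, f2, f3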

c.

1. $O(N \cdot M)$, where N and M are the dimensions of the picture: every pixel has to be visited once to count the area and contour pixels.

2-3. Also $O(N \cdot M)$, where N and M are the dimensions of the picture. In this case the number of computations is $N \cdot M + L$, but the order is still $O(N \cdot M)$, which overrules the L operations of the entropy/mean calculation, since usually $L < N \cdot M$. So the complexity is ruled by building the histogram rather than by computing the entropy or the mean.

d.

There are several answers possible here. I just give a possible solution.

We can use a Bayesian decision rule to minimize the risks (costs). To start with, we need Bayes' rule to calculate the posterior probabilities:

$P(w_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid w_i) P(w_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_j p(\mathbf{x} \mid w_j) P(w_j)$

where $w_i$ represents the state of nature ($w_1$ = benign, $w_2$ = malign) and $\mathbf{x}$ is the feature vector.

We need to be able to define the likelihood $p(\mathbf{x} \mid w_i)$ and the prior probability $P(w_i)$.

For the prior probability we have no information apart from the problem statement and the training data. If we assume that the training data is a sample of the real clinical situation, we can estimate the prior probability by:

$P(w_i) = \frac{n_i}{n}$

where $n_i$ is the number of training samples of class $w_i$ and n the total number of training samples. (Given that both tumor types appear equally often, this gives $P(w_1) = P(w_2) = 0.5$.)

We can also assume that the feature probability density function for each class behaves as a Gaussian: each class has a prototype feature value, but due to noise this is spread out. So the likelihood has this form:

$p(\mathbf{x} \mid w_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \Sigma_i^{-1} (\mathbf{x} - \mathbf{m}_i) \right)$

where $\mathbf{m}_i$ and $\Sigma_i$ are the parameters that need to be estimated for each state of nature $w_i$, and d is the number of features.

In order to estimate these parameters, we can use maximum-likelihood estimation for a multivariate Gaussian. This is done by taking our training data of each state of nature and calculating the feature vectors $\mathbf{x}_k^{(i)}$ for each sample k of state of nature i. Once we have the feature vectors we can calculate:

$\mathbf{m}_i = \frac{1}{n_i} \sum_{k=1}^{n_i} \mathbf{x}_k^{(i)}, \qquad \Sigma_i = \frac{1}{n_i} \sum_{k=1}^{n_i} (\mathbf{x}_k^{(i)} - \mathbf{m}_i)(\mathbf{x}_k^{(i)} - \mathbf{m}_i)^T$

Given that, we can now calculate the posterior probability for each new element.

We can then also calculate the conditional risks:

$R(\alpha_1 \mid \mathbf{x}) = \lambda_{11} P(w_1 \mid \mathbf{x}) + \lambda_{12} P(w_2 \mid \mathbf{x}) = 3C \cdot P(w_2 \mid \mathbf{x})$

$R(\alpha_2 \mid \mathbf{x}) = \lambda_{21} P(w_1 \mid \mathbf{x}) + \lambda_{22} P(w_2 \mid \mathbf{x}) = C \cdot P(w_1 \mid \mathbf{x})$

where $\alpha_1$ is the action of choosing benign and $\alpha_2$ is the action of choosing malign. The cost of making the right decision is 0 ($\lambda_{11} = \lambda_{22} = 0$). C is the cost of classifying a tumor as malign when it is in fact benign ($\lambda_{21} = C$), and misclassifying a malign tumor costs 3 times as much ($\lambda_{12} = 3C$).

The final classification rule can be written as: decide malign ($w_2$) if $R(\alpha_2 \mid \mathbf{x}) < R(\alpha_1 \mid \mathbf{x})$, i.e. if $3\, P(w_2 \mid \mathbf{x}) > P(w_1 \mid \mathbf{x})$; otherwise decide benign ($w_1$).
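A small sketch of this rule in Python, assuming SciPy's `multivariate_normal` for the Gaussian likelihoods; `fit_class` implements the ML estimates above, and the equal priors follow from the problem statement:

    import numpy as np
    from scipy.stats import multivariate_normal

    def fit_class(X):
        """ML estimates m_i, Sigma_i from an (n x d) array of training vectors."""
        return X.mean(axis=0), np.cov(X, rowvar=False, bias=True)  # bias=True: /n

    def classify(x, benign, malign, P1=0.5, P2=0.5):
        """Decide malign if 3*P(w2|x) > P(w1|x); the factor C cancels out."""
        p1 = multivariate_normal.pdf(x, *benign) * P1    # p(x|w1) P(w1)
        p2 = multivariate_normal.pdf(x, *malign) * P2    # p(x|w2) P(w2)
        return 'malign' if 3 * p2 > p1 else 'benign'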

e.(very similar to what was asked in exam 15-06-2006)

We can calculate the performance of the classifier using a leave-one-out method (or cross-validation).

We can calculate the False Positives, True Positives, False Negatives and True Negatives. Then sensitivity and specificity can be calculated for different thresholds of the risks, and the ROC curve representative of the classifier can be built.

We can see how well the method performs in relation to the ideal classifier (specificity = 1 and sensitivity = 1) and to a guessing algorithm (the diagonal of the ROC plot).

We are assuming that the amount of data we have is fully representative of the problem, that the elements are correctly labeled (the ground truth is really true), and that the training set is rich enough to estimate the parameters of the Gaussians accurately.
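A sketch of this evaluation, assuming a `fit` function that trains the classifier on the remaining samples and a `score` function that returns the quantity to threshold (e.g. the ratio $3P(w_2 \mid \mathbf{x}) / P(w_1 \mid \mathbf{x})$); the function names are illustrative:

    import numpy as np

    def loo_scores(X, y, fit, score):
        """Leave-one-out: train on all samples but one, score the left-out one."""
        out = []
        for i in range(len(X)):
            keep = np.arange(len(X)) != i
            model = fit(X[keep], y[keep])
            out.append(score(model, X[i]))
        return np.array(out)

    def roc_points(scores, y, thresholds):
        """(sensitivity, specificity) pairs for a range of decision thresholds."""
        pts = []
        for t in thresholds:
            pred = scores > t                           # classify as malign above t
            tp = np.sum(pred & (y == 1)); fn = np.sum(~pred & (y == 1))
            tn = np.sum(~pred & (y == 0)); fp = np.sum(pred & (y == 0))
            pts.append((tp / (tp + fn), tn / (tn + fp)))
        return pts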

f. (the same answer as for the 15-06-2006 exam is also valid here)

In order to compare it to the performance of the physicians, we need to obtain an evaluation of the physicians' performance.

We would like to build a ROC curve of human observers. We ask human observers to rate several elements on a scale from definitely benign to definitely malign.

Once this information is gathered, a ROC curve can again be built by varying the classification threshold.

We can calculate the area under the curve (AUC) for the human performance and for the classifier (see e.) and compare them. If one is higher than the other we can say that one performs better than the other in general.

g.

The fact that the classes have more than one cluster means that there is more than one mode or peak, so the probability density function of the likelihood cannot have the form of a single multivariate Gaussian. So yes, it will influence the choice we made.

In this case, we can assume that we do not know anything about the form of the probability density function, and therefore a non-parametric method would be more adequate. For example, we could use Parzen windows. There are not many assumptions made using this method, just that enough training data is present to estimate the likelihood.

The method is implemented by estimating the likelihood using:

$p(\mathbf{x} \mid w_i) = \frac{1}{n_i} \sum_{k=1}^{n_i} \frac{1}{V}\, \varphi\!\left( \frac{\mathbf{x} - \mathbf{x}_k^{(i)}}{h} \right)$

where $\varphi$ is the window function, h the window width and $V = h^3$ (the feature space is three-dimensional).

Then the rest of the classifier would be as described in point d.

The main disadvantage of this method is that we have to choose the window function and its width, which are arbitrary choices. If the window function fits the local behavior of the real likelihood probability density function and $n_i$ is large enough, the real likelihood can be well approximated. However, if that is not the case, the approximation of the real probability density function can be very poor. We can only ensure convergence to the real likelihood function if we have an infinite amount of samples.

A clear disadvantage compared with the first proposed method is that more samples are needed to approximate the distribution.
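A minimal sketch of the Parzen estimate with a Gaussian window, under the assumption of the three-dimensional feature space above (h is the arbitrary width discussed):

    import numpy as np

    def parzen_likelihood(x, samples, h):
        """Estimate p(x|w_i) from (n_i x d) class samples; Gaussian window, V = h^d."""
        d = samples.shape[1]                          # d = 3 for the three features
        u = (x - samples) / h                         # (x - x_k) / h for every sample
        phi = np.exp(-0.5 * (u ** 2).sum(axis=1)) / (2 * np.pi) ** (d / 2)
        return phi.mean() / h ** d                    # (1/n_i) sum (1/V) phi(.)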

Q2.

- We calculate the centroid for each of the images:

$\mathbf{c}_i = \frac{\sum_{\mathbf{x}} \mathbf{x}\, f_i(\mathbf{x})}{\sum_{\mathbf{x}} f_i(\mathbf{x})}$

where $f_i(\mathbf{x})$ is one of the (binary) segmented images of the vertebra.

- We can calculate the inertia matrix for each image:

$M_i = \sum_{\mathbf{x}} f_i(\mathbf{x})\, (\mathbf{x} - \mathbf{c}_i)(\mathbf{x} - \mathbf{c}_i)^T$

- If we do an eigenanalysis of $M_i$, we obtain three eigenvectors and eigenvalues that indicate the main directions of the vertebra.

- We can use the eigenvectors and the centroid of each vertebra image to define a frame for each vertebra.

We define:

$A_i = \left[ \lambda_{i1} \mathbf{e}_{i1} \;\; \lambda_{i2} \mathbf{e}_{i2} \;\; \lambda_{i3} \mathbf{e}_{i3} \right]$ and $\mathbf{t}_i = \mathbf{c}_i$

where $\mathbf{e}_{i1}$, $\mathbf{e}_{i2}$ and $\mathbf{e}_{i3}$ are the eigenvectors multiplied by the corresponding eigenvalues and ordered by eigenvalue ($\lambda_{i1} \geq \lambda_{i2} \geq \lambda_{i3}$), and $\mathbf{c}_i$ is the centroid of vertebra image i.

We also have to make sure that the signs of the eigenvectors correspond to a right-handed frame.

So imagine we want to transform image one to the frame of image two. To achieve that we do the following:

$f_1'(\mathbf{x}) = f_1\!\left( A_1 A_2^{-1} (\mathbf{x} - \mathbf{c}_2) + \mathbf{c}_1 \right)$

Then we can calculate an image with just the difference between the two vertebrae:

$g(\mathbf{x}) = \left| f_1'(\mathbf{x}) - f_2(\mathbf{x}) \right|$

We can sum all the pixels that are not zero in $g(\mathbf{x})$; this is an approximation of the misplaced/missing volume.

We can calculate the percentage of V that it represents and see whether it is larger or smaller than 10%.
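A sketch of the whole procedure for binary 3-D volumes in NumPy; nearest-neighbour resampling and unit eigenvectors for the rotation are simplifying assumptions:

    import numpy as np

    def frame(vol):
        """Centroid and eigen-frame (main inertia axes) of a binary 3-D volume."""
        pts = np.argwhere(vol)                     # coordinates of foreground voxels
        c = pts.mean(axis=0)                       # centroid
        M = np.cov((pts - c).T)                    # inertia (scatter) matrix
        w, E = np.linalg.eigh(M)                   # eigenpairs, ascending order
        E = E[:, ::-1]                             # reorder by decreasing eigenvalue
        if np.linalg.det(E) < 0:                   # enforce a right-handed frame
            E[:, -1] *= -1
        return c, E

    def shape_difference(model, prosthesis):
        """Fraction of misplaced/missing volume after aligning the two frames."""
        c1, E1 = frame(model)
        c2, E2 = frame(prosthesis)
        pts2 = np.argwhere(prosthesis)
        # map prosthesis voxels into the model frame: x1 = E1 E2^T (x2 - c2) + c1
        mapped = np.rint((pts2 - c2) @ E2 @ E1.T + c1).astype(int)
        inside = np.all((mapped >= 0) & (mapped < model.shape), axis=1)
        aligned = np.zeros_like(model)
        aligned[tuple(mapped[inside].T)] = 1
        g = aligned ^ model                        # difference image g(x)
        return g.sum() / model.sum()               # compare with the 10% tolerance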

Q3

a.

- We assume that the feature distributions for both classes have a Gaussian form.

- We assume we have enough data to estimate the likelihood p(x|A) and p(x|B).

- We use maximum likelihood to estimate the parameters, so we calculate the mean and the (co)variance from the training data.

- Since the features are statistically independent, we can deal with them independently; the covariance matrix is diagonal. We just need the mean per class and feature, and the standard deviation per feature (the noise is the same for both classes, so we can pool it over the classes):

$\hat{\mu}_{i,j} = \frac{1}{n} \sum_{k=1}^{n} x_{j,k}^{(i)}$ and $\hat{\sigma}_j^2 = \frac{1}{2n} \sum_{i \in \{A,B\}} \sum_{k=1}^{n} \left( x_{j,k}^{(i)} - \hat{\mu}_{i,j} \right)^2$

where i is the class, j the feature, and n = 5 the number of training samples per class.

- Since class A appears 2 times more frequently than B, the a priori probabilities are:

$P(A) = \frac{2}{3}$ and $P(B) = \frac{1}{3}$

Then we can calculate the posterior probability by using the Bayes formula:

$P(A \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid A) P(A)}{p(\mathbf{x})}$, and similarly for class B, where $p(\mathbf{x}) = p(\mathbf{x} \mid A) P(A) + p(\mathbf{x} \mid B) P(B)$.

Because the features are independent, $p(\mathbf{x} \mid A) = p(x_1 \mid A)\, p(x_2 \mid A)$.

Then the decision for a new element $\mathbf{x}$ is: decide A if $P(A \mid \mathbf{x}) > P(B \mid \mathbf{x})$, otherwise decide B.

b. We just apply the previous formulas and obtain:

The first element would be class A (posterior probability 0.90).

The second element would be class B (posterior probability 0.77).
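A numeric check of a and b in Python, using the training table, the priors above and the pooled per-feature variance; note that the exact posterior values depend on the variance estimator used, so they may deviate slightly from the 0.90 and 0.77 quoted:

    import numpy as np

    A = np.array([[1.92, 1.44], [4.70, 2.65], [3.41, -0.57],
                  [4.50, 0.13], [3.32, 1.66]])
    B = np.array([[2.55, 6.43], [2.19, 9.51], [0.48, 6.44],
                  [0.91, 5.56], [2.12, 4.49]])
    pA, pB = 2 / 3, 1 / 3                       # priors: A twice as frequent as B

    mA, mB = A.mean(axis=0), B.mean(axis=0)     # class means per feature
    # the noise is identical for both classes: pool the deviations per feature
    var = (((A - mA) ** 2).sum(axis=0) + ((B - mB) ** 2).sum(axis=0)) / (2 * len(A))
    sig = np.sqrt(var)

    def lik(x, m):
        """p(x|class): product of independent 1-D Gaussians."""
        return np.prod(np.exp(-0.5 * ((x - m) / sig) ** 2) / (sig * np.sqrt(2 * np.pi)))

    for x in (np.array([1.97, 2.18]), np.array([3.13, 4.61])):
        post_A = lik(x, mA) * pA / (lik(x, mA) * pA + lik(x, mB) * pB)
        print(x, 'A' if post_A > 0.5 else 'B', round(max(post_A, 1 - post_A), 2))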

Q4.

a.

(The answer to this question is to use the Bayesian estimator. This was not covered in the course 2005-2006; the information is in the handouts, but it was not explained and will not be part of the exam.)

- The a priori probabilities are $P(A) = 0.75$ and $P(B) = 0.25$.

- We need to estimate the parameter $\theta$ given the form of the distribution and the priors $p(\theta \mid A)$ and $p(\theta \mid B)$.
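As a sketch of where this leads, the class-conditional likelihood is the Poisson averaged over the prior on $\theta$, $p(x \mid w) = \int P(x \mid \theta)\, p(\theta \mid w)\, d\theta$, which can be evaluated numerically. The prior parameters below are placeholders, since the exam's actual values are not preserved in this copy:

    import numpy as np
    from scipy.stats import norm, poisson

    # Placeholder prior parameters (the exam's actual values are not in this copy):
    mu_A, s_A, mu_B, s_B = 4.0, 1.0, 8.0, 1.0
    P_A, P_B = 0.75, 0.25                      # priors: A three times as often as B

    theta = np.linspace(0.01, 25, 2500)        # grid over theta (truncated to > 0)
    dtheta = theta[1] - theta[0]

    def class_likelihood(x, mu, s):
        """p(x|w) = integral of P(x|theta) p(theta|w) dtheta (Riemann sum)."""
        return (poisson.pmf(x, theta) * norm.pdf(theta, mu, s)).sum() * dtheta

    def classify(x):
        """Decide the class with the larger posterior, i.e. larger p(x|w) P(w)."""
        pa = class_likelihood(x, mu_A, s_A) * P_A
        pb = class_likelihood(x, mu_B, s_B) * P_B
        return 'A' if pa > pb else 'B'

    print([classify(x) for x in range(15)])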