Machine Learning Project


Background

In medical decision-making, a doctor observes ‘features’ of a patient and decides on the basis of these features whether the patient is healthy or has a particular disease. In this question, you are asked to train and test a machine learning classifier which can discriminate between people who are healthy and people who have Parkinson’s, based on the training data “parkinsons.csv”. There are two features (the dimension of y is d = 2), with feature y1 in column 1, y2 in column 2, and an associated training class x ∈ {0, 1} (i.e. 2 classes) in column 3 of the CSV file, where x = 0 denotes healthy and x = 1 denotes Parkinson’s.

1 Visualisation


Figure 1: MATLAB Code for visualising data.

The MATLAB code shown in Figure 1 makes use of logical statements to separate the entries where x = 0 from those where x = 1 (lines 3–4). This simplifies the code significantly because it removes the need for an “if-else” statement, which would make the code considerably more complex. These subsets were then used to plot a scatter graph of y1 against y2 for each of the two classes (lines 8–9). The plot was then given a legend, axis labels, and a title (lines 10–13).
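Since the listing itself appears only as an image, here is a minimal MATLAB sketch of the approach described above; the variable names are illustrative and the line numbering will not match Figure 1.

```matlab
% Load the training data: column 1 = y1, column 2 = y2, column 3 = class x
data = readmatrix('parkinsons.csv');
y1 = data(:, 1);
y2 = data(:, 2);
x  = data(:, 3);

% Logical indexing separates the two classes without any if-else logic
healthy    = (x == 0);
parkinsons = (x == 1);

% Scatter plot of y1 against y2 for each class
figure; hold on;
plot(y1(healthy), y2(healthy), 'k.');        % healthy: black dots
plot(y1(parkinsons), y2(parkinsons), 'bx');  % Parkinson's: blue crosses
legend('Healthy', 'Parkinson''s');
xlabel('y_1'); ylabel('y_2');
title('Parkinson''s training data');
hold off;
```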

In Figure 2 we can see the resulting graph, which shows healthy participants as black dots and participants with Parkinson’s as blue crosses.

Figure 2: Scatter plot of the training data (healthy participants as black dots, Parkinson’s participants as blue crosses).

2 Linear Discriminant


Figure 3: MATLAB Code for Linear Discriminant.


This question focuses on the Linear Discriminant (LD) and its parameters. The question asks us to estimate the LD parameters L(y | b0, b1, Σ, P0, P1). Figure 3 shows the code used to determine these parameters, and the equations used in this code are given below. Equation 1 was used to work out b0 and b1, the class means for x = 0 and x = 1; these were calculated on Lines 19–20 in Figure 3. Equation 2 was used to work out P0 and P1, the proportions of each class, as shown in Lines 22–23. The covariance matrix was calculated using the “cov” command in MATLAB (Line 25). Since we are told to assume that the covariance matrices of the individual classes (Σ0, Σ1) are the same, the covariance of the entire data set must equal this common class covariance. We know this because the combined covariance matrix is determined using the formula in Equation 4.
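The equations themselves appear only as images, so they are reconstructed here in standard form; Nk denotes the number of training entries in class k and N the total number of entries:

$$ \mathbf{b}_k = \frac{1}{N_k} \sum_{n:\,x_n = k} \mathbf{y}_n \tag{1} $$

$$ P_k = \frac{N_k}{N} \tag{2} $$

$$ L(\mathbf{y}) = (\mathbf{b}_0 - \mathbf{b}_1)^T \Sigma^{-1} \mathbf{y} - \tfrac{1}{2}\,\mathbf{b}_0^T \Sigma^{-1} \mathbf{b}_0 + \tfrac{1}{2}\,\mathbf{b}_1^T \Sigma^{-1} \mathbf{b}_1 + \ln\frac{P_0}{P_1} \tag{3} $$

$$ \Sigma = P_0\,\Sigma_0 + P_1\,\Sigma_1 \tag{4} $$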

Since P0 + P1 = 1, if Σ0 = Σ1 = Σ*, then:

$$ \Sigma = P_0\,\Sigma^* + P_1\,\Sigma^* = \Sigma^* (P_0 + P_1) = \Sigma^* \tag{5} $$

Therefore, the covariance of the entire data set is the same as the covariance of each individual class, provided that the class covariances are equal.

Finally, these parameters were used to determine the Linear Discriminant Function (L), which was done using Equation 3; Line 27 of the code implements this equation in MATLAB. The question informs us that if an L value is positive, then its corresponding value of y must satisfy y ∈ C0. Line 28 of the code identifies exactly which values these are and their location in the data. The values of all parameters L(y | b0, b1, Σ, P0, P1) are recorded in Table 1.
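A corresponding sketch of the parameter estimation and the LDF evaluation, continuing the illustrative variables from the previous snippet (again, not the original listing):

```matlab
% Continuing from the snippet above (data, healthy, parkinsons in scope)
N = size(data, 1);
Y = data(:, 1:2);

% Equation 1: class means (transposed to column vectors)
b0 = mean(Y(healthy, :))';
b1 = mean(Y(parkinsons, :))';

% Equation 2: class proportions
P0 = sum(healthy) / N;
P1 = sum(parkinsons) / N;

% Shared covariance matrix: with Sigma0 = Sigma1, the covariance of the
% full data set equals the common class covariance (Equations 4 and 5)
Sigma = cov(Y);

% Equation 3: Linear Discriminant Function evaluated at every data point
L = Y * (Sigma \ (b0 - b1)) ...
    - 0.5 * b0' * (Sigma \ b0) ...
    + 0.5 * b1' * (Sigma \ b1) ...
    + log(P0 / P1);

% Positive L values correspond to points classified as y in C0 (healthy)
idxC0 = find(L > 0);
```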


| Variable | Denotation | Value |
| --- | --- | --- |
| Total number of data entries | N | 195 |
| Class mean for x = 0 | b0 | (0.1230, 2.1545)^T |
| Class mean for x = 1 | b1 | (0.2338, 2.4561)^T |
| Covariance matrix | Σ | (0.0081 0.0166; 0.0166 0.1465) |
| Class proportion for x = 0 | P0 | 0.2462 |
| Class proportion for x = 1 | P1 | 0.7538 |
| Linear Discriminant | L | 195×1 column vector |

Table 1: Estimates of Linear Discriminant Parameters
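As a quick consistency check on Table 1 (a reconstruction from the reported values, not part of the original analysis): P0 = 0.2462 corresponds to 0.2462 × 195 ≈ 48 healthy entries, leaving 147 Parkinson’s entries, so P1 = 147/195 = 0.7538 and P0 + P1 = 1 as required.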

3 Posterior Probability

![Figure 4: MATLAB Code for Posterior Probabilities](https://i.imgur.com/ixyjlyS.png)

Figure 4: MATLAB Code for Posterior Probabilities.

This question asks us to calculate the posterior probability P(C0|y) for each training data pair (y1, y2) and class x = 0. Line 30 of the code shown in Figure 4 finds all the Linear Discriminant values for the healthy participants and stores them in the vector L1. The equations used and their derivation are given below and explained in detail; they are taken from Slide 6, Pages 13–17 of the lecture notes.

Consider the case when Σk = Σ, where the covariance model is identical for both classes. Then the general case for the LDF is:

$$ g(\mathbf{x}) = \ln\frac{P(\mathbf{x}\mid C_1)\,P(C_1)}{P(\mathbf{x}\mid C_2)\,P(C_2)} \tag{6} $$

In our case, x is y, m1 is b0, and m2 is b1. In addition, the priors P(C1) and P(C2) correspond to the class proportions P0 and P1 defined in Equation 2.

This is then rearranged to give:

$$ g(\mathbf{x}) = (\mathbf{m}_1 - \mathbf{m}_2)^T \Sigma^{-1} \mathbf{x} - \tfrac{1}{2}\,\mathbf{m}_1^T \Sigma^{-1} \mathbf{m}_1 + \tfrac{1}{2}\,\mathbf{m}_2^T \Sigma^{-1} \mathbf{m}_2 + \ln\frac{P(C_1)}{P(C_2)} \tag{7} $$

When both classes have identical covariance models Σ but different prior probabilities P(C1) ≠ P(C2), the class separation boundary is still linear. We must now give a Bayesian interpretation to this linear class separation. If we wish to predict (in our case) P(C0|y), then from Bayes’ theorem we have,

$$ P(C_0\mid\mathbf{y}) = \frac{P(\mathbf{y}\mid C_0)\,P(C_0)}{P(\mathbf{y}\mid C_0)\,P(C_0) + P(\mathbf{y}\mid C_1)\,P(C_1)} = \frac{1}{1 + \exp(-a)} \tag{8} $$

where we have simply divided the numerator and denominator by P(y|C0)P(C0), and defined

$$ a = \ln\frac{P(\mathbf{y}\mid C_0)\,P(C_0)}{P(\mathbf{y}\mid C_1)\,P(C_1)} \tag{9} $$

Now, we model the quantity a with some linear function g(x;w):

$$ a = g(\mathbf{x};\mathbf{w}) = \mathbf{w}^T \mathbf{x} + w_0 \tag{10} $$

We then substitute the Gaussian class-conditional densities P(y|C0) and P(y|C1) to obtain:

$$ a = (\mathbf{b}_0 - \mathbf{b}_1)^T \Sigma^{-1} \mathbf{y} - \tfrac{1}{2}\,\mathbf{b}_0^T \Sigma^{-1} \mathbf{b}_0 + \tfrac{1}{2}\,\mathbf{b}_1^T \Sigma^{-1} \mathbf{b}_1 + \ln\frac{P_0}{P_1} \tag{11} $$

So, applying the logistic sigmoid activation function to the discriminant gives

$$ y(\mathbf{x};\mathbf{w}) = \sigma\big(g(\mathbf{x};\mathbf{w})\big) = \frac{1}{1 + \exp\big(-g(\mathbf{x};\mathbf{w})\big)} \tag{12} $$

and we can interpret y(x;w) as the posterior probability P(C0|y). Since the linear function inside Equation 12 is identical to the LDF in Equation 7, this result also applies to the LDF as a “special case”, and we can use L in place of a (L = a), as shown in Equation 13:

$$ P(C_0\mid\mathbf{y}) = \sigma(L) = \frac{1}{1 + \exp(-L)} \tag{13} $$

We must note that this is done under the assumption that the two classes are generated with equal covariance matrices, but no assumptions have been made about the prior probabilities. The posterior probability is calculated in Lines 35–36 of the code. The posterior probability for healthy participants (x = 0) has been calculated, as well as the posterior probability for the entire data set; this was done to visualise the data and to be able to compare the two plots. The graph for healthy participants is shown in Figure 5, and the graph for all participants is shown in Figure 6.
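A sketch of this step, continuing the illustrative variables from the earlier snippets (the exact plotting commands behind Figures 5 and 6 are not recoverable from the text):

```matlab
% LD values for the healthy participants only (cf. Line 30 of Figure 4)
L1 = L(healthy);

% Equation 13: posterior probability via the logistic sigmoid
postHealthy = 1 ./ (1 + exp(-L1));  % P(C0|y) for the x = 0 entries
postAll     = 1 ./ (1 + exp(-L));   % P(C0|y) for the entire data set
```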

Figure 5: Posterior probability P(C0|y) for healthy participants (x = 0). Figure 6: Posterior probability P(C0|y) for the entire data set.
