Machine Learning Project
Background
In medical decision-making, a doctor observes ‘features’ of a patient and, on the basis of these features, decides whether the patient is healthy or has a particular disease. In this question, you are asked to train and test a machine learning classifier that can discriminate between people who are healthy and people who have Parkinson’s, based on some training data “parkinsons.csv”. There are two features (the dimension of y is d = 2), with feature y1 in column 1, y2 in column 2, and an associated training class x ∈ {0, 1} (i.e. 2 classes) in column 3 of the CSV file, where x = 0 denotes healthy and x = 1 denotes Parkinson’s.
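The file layout described above can be read with a short Python sketch. It assumes the CSV has no header row and that the class label in column 3 may be written either as an integer or as a float. `load_parkinsons` is a hypothetical helper, and the demo uses made-up values rather than the real data.

```python
import csv
import io

def load_parkinsons(stream):
    """Read (y1, y2, x) triples from CSV text with no header row (assumed).

    Column 1 -> feature y1, column 2 -> feature y2, column 3 -> class x,
    where x = 0 is healthy and x = 1 is Parkinson's.
    """
    ys, xs = [], []
    for record in csv.reader(stream):
        if not record:
            continue  # ignore blank lines
        ys.append((float(record[0]), float(record[1])))
        xs.append(int(float(record[2])))
    return ys, xs

# Stand-in for open("parkinsons.csv", newline="") using made-up values.
demo = io.StringIO("0.10,2.00,0\n0.25,2.50,1\n")
ys, xs = load_parkinsons(demo)
print(ys, xs)  # [(0.1, 2.0), (0.25, 2.5)] [0, 1]
```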
1 Visualisation
Figure 1: MATLAB Code for visualising data.
The MATLAB code shown in Figure 1 uses logical indexing to separate the entries where x = 0 and x = 1 (lines 3-4). This simplifies the code significantly because it removes the need for an “if-else” statement, which would make the code considerably more complex. These subsets were then used to plot a scatter graph of y1 against y2 for x = 0 and x = 1 (lines 8-9). The plot was then given a legend, labels, and a title (lines 10-13).
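The logical-indexing idea can be illustrated in Python with list comprehensions. This is not the MATLAB code of Figure 1. `split_by_class` is a hypothetical name, and the sample points are made up.

```python
def split_by_class(ys, xs):
    """Separate feature pairs by class label.

    Analogous to MATLAB logical indexing such as y(x == 0, :): no
    explicit if-else branch per row is needed.
    """
    healthy = [y for y, x in zip(ys, xs) if x == 0]
    parkinsons = [y for y, x in zip(ys, xs) if x == 1]
    return healthy, parkinsons

ys = [(0.10, 2.00), (0.25, 2.50), (0.12, 2.20)]  # made-up sample points
xs = [0, 1, 0]
healthy, parkinsons = split_by_class(ys, xs)
```

Each list can be unzipped into its y1 and y2 coordinates with `zip(*healthy)`. The coordinates can then be passed to a scatter-plot function such as matplotlib's `pyplot.scatter` to draw a plot like Figure 2.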
In Figure 2 we can see the resulting graph which shows healthy participants as black dots, and participants with Parkinson’s as blue crosses.
2 Linear Discriminant
Figure 3: MATLAB Code for Linear Discriminant.
This question focuses on the Linear Discriminant (LD) and its parameters. The question asks us to estimate the LD parameters L(y | b0, b1, Σ, P0, P1). Figure 3 shows the code used to determine these parameters, and the equations used in this code are given below. Equation 1 was used to work out bx=0 and bx=1, the class means for x = 0 and x = 1, which were calculated on Lines 19-20 in Figure 3. Equation 2 was used to work out Px=0 and Px=1, the proportions of each class, as shown in Lines 22-23. The covariance matrix was calculated using the “cov” command in MATLAB (Line 25). Since we are told to assume that the covariance matrices of the individual classes (Σ0, Σ1) are the same, the combined covariance matrix Σ must also equal this common matrix. We know this because the combined covariance matrix is determined using the formula in Equation 4, a proportion-weighted sum of the class covariances.
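The estimates can be sketched in plain Python under the following assumptions:

- Equation 1 is the per-class sample mean.
- Equation 2 is the class proportion Nk/N.
- Equation 4 is the proportion-weighted sum of class covariances, as in the expansion Σ = (Σ* × P0) + (Σ* × P1).
- Each class covariance uses the N − 1 normaliser, which is MATLAB `cov`'s default.

The helper names and sample points are hypothetical.

```python
def mean2(points):
    """Sample mean of a list of 2-D points (assumed form of Equation 1)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def cov2(points):
    """2x2 sample covariance with the N - 1 normaliser (MATLAB cov's default)."""
    n = len(points)
    m1, m2 = mean2(points)
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return [[s11, s12], [s12, s22]]

def estimate_ld_parameters(healthy, parkinsons):
    """Return (b0, b1, sigma, p0, p1) for the two-class equal-covariance model."""
    n0, n1 = len(healthy), len(parkinsons)
    n = n0 + n1
    b0, b1 = mean2(healthy), mean2(parkinsons)   # class means (Equation 1)
    p0, p1 = n0 / n, n1 / n                      # class proportions (Equation 2)
    c0, c1 = cov2(healthy), cov2(parkinsons)     # per-class covariances
    sigma = [[p0 * c0[i][j] + p1 * c1[i][j] for j in range(2)]
             for i in range(2)]                  # weighted sum (Equation 4)
    return b0, b1, sigma, p0, p1

# Made-up points: two squares with the same spread but different centres.
healthy = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
parkinsons = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0)]
b0, b1, sigma, p0, p1 = estimate_ld_parameters(healthy, parkinsons)
```

The two made-up classes share the same spread. As a result, the pooled Σ equals each class covariance (4/3 times the identity matrix), which is exactly the situation in which the combined and per-class covariances coincide.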
Since P0 + P1 = 1, if Σ0 = Σ1 = Σ*, then:

$$\Sigma = P_0\Sigma^* + P_1\Sigma^* = (P_0 + P_1)\Sigma^* = \Sigma^*$$

Therefore, the combined covariance matrix Σ equals the covariance matrix of each individual class whenever the two class covariance matrices are the same.
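The argument above concerns the combined matrix Σ of Equation 4. For comparison, the covariance of all data points pooled together obeys a standard identity that contains an extra between-class term. The identity below holds exactly for maximum-likelihood (1/N) estimates and approximately for the N − 1 normaliser:

$$\Sigma_{\text{all}} = P_0\Sigma_0 + P_1\Sigma_1 + P_0P_1\,(\mathbf b_1 - \mathbf b_0)(\mathbf b_1 - \mathbf b_0)^{\mathsf T}$$

With Σ0 = Σ1 = Σ*, this gives Σ_all = Σ* + P0P1(b1 − b0)(b1 − b0)ᵀ. The covariance of the whole data set therefore equals Σ* only when the class means coincide. Because b0 ≠ b1 here (Table 1), Σ* is estimated by the per-class covariances weighted as in Equation 4, not by the covariance of the full data set.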
Finally, these parameters were used to determine the Linear Discriminant Function (L) using Equation 3; Line 27 of the code implements this equation in MATLAB. The question states that if a value of L is positive, the corresponding y is assigned to class C0 (y ∈ C0). Line 28 of the code identifies exactly which values these are and where they occur in the data. The values of all parameters of L(y | b0, b1, Σ, P0, P1) are recorded in Table 1.
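Computing L and applying the sign rule can be sketched in plain Python. The sketch assumes Equation 3 is the standard equal-covariance discriminant L(y) = wᵀy + w0, where:

- w = Σ⁻¹(b0 − b1)
- w0 = −½ b0ᵀΣ⁻¹b0 + ½ b1ᵀΣ⁻¹b1 + ln(P0/P1)

With this form, L > 0 assigns y to C0. The helper names are hypothetical. The parameter values are those of Table 1, with P1 taken as 1 − P0.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def matvec(m, v):
    """2x2 matrix times 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def dot(u, v):
    """Inner product of two 2-vectors."""
    return u[0] * v[0] + u[1] * v[1]

def linear_discriminant(y, b0, b1, sigma, p0, p1):
    """L(y) = w^T y + w0 with w = Sigma^{-1}(b0 - b1); L > 0 favours C0."""
    s_inv = inv2(sigma)
    w = matvec(s_inv, (b0[0] - b1[0], b0[1] - b1[1]))
    w0 = (-0.5 * dot(b0, matvec(s_inv, b0))
          + 0.5 * dot(b1, matvec(s_inv, b1))
          + math.log(p0 / p1))
    return dot(w, y) + w0

def classify(points, b0, b1, sigma, p0, p1):
    """Label each point 0 (healthy, C0) if L > 0, otherwise 1 (Parkinson's, C1)."""
    return [0 if linear_discriminant(y, b0, b1, sigma, p0, p1) > 0 else 1
            for y in points]

# Parameter values from Table 1, with P1 = 1 - P0.
b0, b1 = (0.1230, 2.1545), (0.2338, 2.4561)
sigma = [[0.0081, 0.0166], [0.0166, 0.1465]]
p0 = 0.2462
p1 = 1 - p0
L = [linear_discriminant(y, b0, b1, sigma, p0, p1) for y in (b0, b1)]
```

Applied to the 195 rows of the data set, the list comprehension in `classify` plays the role of the MATLAB vector L and the positive-value test on Line 28.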
Variable | Denotation | Value |
---|---|---|
Total number of data entries | N | 195 |
Class mean for x = 0 | b0 | (0.1230, 2.1545)ᵀ |
Class mean for x = 1 | b1 | (0.2338, 2.4561)ᵀ |
Covariance matrix | Σ | [0.0081 0.0166; 0.0166 0.1465] |
Class proportion for x = 0 | P0 | 0.2462 |
Class proportion for x = 1 | P1 | 0.7538 |
Linear Discriminant | L | (195x1) column vector |
Table 1: Estimates of Linear Discriminant Parameters
3 Posterior Probability
![MATLAB code for posterior probabilities](https://i.imgur.com/ixyjlyS.png)
Figure 4: MATLAB Code for Posterior Probabilities.
This question asks us to calculate the posterior probability P(C0|y) for each training data pair (y1, y2) with class x = 0. Line 30 of the code shown in Figure 4 finds all the Linear Discriminant values for the healthy participants and stores them in the vector L1. The equations used and their derivation are given below and explained in detail; they are taken from Slide 6, pages 13-17, of the lecture notes.
Consider the case where Σk = Σ, i.e. the covariance model is identical for both classes. The general form of the LDF is then:
In our case, x corresponds to y, m1 to b0 and m2 to b1. In addition, the prior P(Cx) is the class proportion Px defined in Equation 2.
This then rearranges to give:
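When both classes have identical covariance models, Σ, but different prior probabilities, P(C0) ≠ P(C1), the class separation boundary is still linear. We now give this linear class separation a Bayesian interpretation. If we wish to predict (in our case) P(C0|y), then from Bayes’ theorem we have

$$P(C_0\mid\mathbf y) = \frac{P(\mathbf y\mid C_0)P(C_0)}{P(\mathbf y\mid C_0)P(C_0) + P(\mathbf y\mid C_1)P(C_1)} = \frac{1}{1 + e^{-a}},$$

where we have simply divided the numerator and denominator by P(y | C0)P(C0), and defined

$$a = \ln\frac{P(\mathbf y\mid C_0)P(C_0)}{P(\mathbf y\mid C_1)P(C_1)}.$$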
Now, we model the quantity a with some linear function g(x;w):
We then substitute for P(y | C0) to obtain:
So, applying the logistic sigmoid activation function to the discriminant gives

$$y(\mathbf x;\mathbf w) = \sigma\big(g(\mathbf x;\mathbf w)\big) = \frac{1}{1 + e^{-g(\mathbf x;\mathbf w)}},$$

and we can interpret y(x;w) as the posterior probability P(C0|y). Since Equation 12 is equal to Equation 7, this result also applies to the LDF as a “special case”: we can use L in place of a (L = a), as shown in Equation 13.
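The step from a to L can be made explicit. This sketch assumes Gaussian class-conditional densities P(y | Ck) = N(y | bk, Σ) with the shared covariance Σ and priors P(C0) = P0, P(C1) = P1. This is the standard LDF setting; the lecture notes' version may differ by rearrangement.

$$
\begin{aligned}
a &= \ln\frac{\mathcal N(\mathbf y\mid\mathbf b_0,\Sigma)\,P_0}{\mathcal N(\mathbf y\mid\mathbf b_1,\Sigma)\,P_1}\\
&= -\tfrac12(\mathbf y-\mathbf b_0)^{\mathsf T}\Sigma^{-1}(\mathbf y-\mathbf b_0) + \tfrac12(\mathbf y-\mathbf b_1)^{\mathsf T}\Sigma^{-1}(\mathbf y-\mathbf b_1) + \ln\frac{P_0}{P_1}\\
&= (\mathbf b_0-\mathbf b_1)^{\mathsf T}\Sigma^{-1}\mathbf y - \tfrac12\,\mathbf b_0^{\mathsf T}\Sigma^{-1}\mathbf b_0 + \tfrac12\,\mathbf b_1^{\mathsf T}\Sigma^{-1}\mathbf b_1 + \ln\frac{P_0}{P_1} = L(\mathbf y)
\end{aligned}
$$

The normalising constants of the two Gaussians cancel because Σ is shared. The quadratic terms yᵀΣ⁻¹y cancel for the same reason, which is why a, and hence L, is linear in y. A positive L therefore means P(C0|y) = σ(L) > 0.5, matching the classification rule used in Section 2.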
Note that this relies on the assumption that the two classes are generated with equal covariance matrices, but no assumption has been made about the prior probabilities. The posterior probability is calculated in Lines 35-36 of the code. It has been calculated for the healthy participants (x = 0) as well as for the entire data set, so that the two can be visualised and compared. The graph for healthy participants is shown in Figure 5, and the graph for all participants is shown in Figure 6.
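The posterior computation can be sketched in plain Python, assuming the same equal-covariance discriminant as in Section 2 and Gaussian class-conditional densities. The code does two things:

- It evaluates P(C0|y) = σ(L(y)) for all points and for the healthy subset only.
- It provides a direct application of Bayes' theorem to the two Gaussian densities, so the “special case” L = a can be checked numerically.

The helper names and the four sample rows are hypothetical stand-ins for the real data.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def quad(u, m, v):
    """Bilinear form u^T M v for 2-vectors u, v and a 2x2 matrix M."""
    return (u[0] * (m[0][0] * v[0] + m[0][1] * v[1])
            + u[1] * (m[1][0] * v[0] + m[1][1] * v[1]))

def sigmoid(a):
    """Numerically stable logistic sigmoid 1 / (1 + exp(-a))."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)

def ldf(y, b0, b1, sigma, p0, p1):
    """Equal-covariance linear discriminant; L > 0 favours class C0."""
    s = inv2(sigma)
    d = (b0[0] - b1[0], b0[1] - b1[1])
    return (quad(d, s, y) - 0.5 * quad(b0, s, b0)
            + 0.5 * quad(b1, s, b1) + math.log(p0 / p1))

def gaussian2(y, mean, sigma):
    """Bivariate normal density N(y | mean, sigma)."""
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    d = (y[0] - mean[0], y[1] - mean[1])
    return math.exp(-0.5 * quad(d, inv2(sigma), d)) / (2 * math.pi * math.sqrt(det))

def posterior_c0(y, b0, b1, sigma, p0, p1):
    """P(C0 | y) = sigmoid(L(y)), using L in place of a."""
    return sigmoid(ldf(y, b0, b1, sigma, p0, p1))

def posterior_c0_bayes(y, b0, b1, sigma, p0, p1):
    """P(C0 | y) from Bayes' theorem applied to the Gaussian densities."""
    j0 = gaussian2(y, b0, sigma) * p0
    j1 = gaussian2(y, b1, sigma) * p1
    return j0 / (j0 + j1)

# Parameter values from Table 1, with P1 = 1 - P0.
b0, b1 = (0.1230, 2.1545), (0.2338, 2.4561)
sigma = [[0.0081, 0.0166], [0.0166, 0.1465]]
p0 = 0.2462
p1 = 1 - p0

# Hypothetical (y1, y2, x) rows standing in for parkinsons.csv.
rows = [(0.12, 2.10, 0), (0.25, 2.50, 1), (0.15, 2.30, 0), (0.20, 2.40, 1)]
post_all = [posterior_c0((y1, y2), b0, b1, sigma, p0, p1) for y1, y2, _ in rows]
post_healthy = [posterior_c0((y1, y2), b0, b1, sigma, p0, p1)
                for y1, y2, x in rows if x == 0]
```

The sigmoid is written in its numerically stable form so that `math.exp` does not overflow when L is large and negative.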