# SENG474 Assignment 2


| Question | SENG474 | CSC578D |
|----------|---------|---------|
| 1        | 10      | 10      |
| 2        | 20      | 20      |
| 3        | 20      | 20      |
| 4        | 10      | 10      |
| Total    | 50      | 50      |
## Overview
The goal of this assignment is to explore classification accuracy, ROC curves, maximum-likelihood estimation, the perceptron algorithm, and the Naive Bayes classifier. Different marking schemes will be used for undergraduate (SENG474) and graduate (CSC578D) students.

All code questions use Python and the scikit-learn library. You may install it on your own computer or alternatively use the computers in the lab. Submissions in other programming languages or environments are welcome, but you will need to find the appropriate libraries yourself or write the code from scratch. Don't hesitate to contact the instructor via email or use the Mattermost channels for any questions/clarifications you might need.

**Submit a PDF for questions 1-2 and code for question 3 through Connex.**
## 1 Classifier Accuracy (SENG474 and CSC578D: 10 points)
Assume you were given a dataset built from random data, where attribute values have been randomly generated with no consideration of the class labels. The dataset has three classes: "rectangle", "triangle", and "circle". You were asked to build a classifier for this dataset, and told that 50% of the data will be used for training and 50% for testing. The testing set is balanced, so you can assume it has the same class distribution as the training set. Because you are smart, you will start by establishing a theoretical baseline for your classifier's performance.
• 1.1 (2 points)
Assume the data is equally split between the three classes (33.3% "rectangle", 33.3% "triangle", 33.3% "circle") and your classifier systematically predicts "rectangle" for every test instance. What is the expected error rate of the classifier?
• 1.2 (3 points)
What if, instead of always predicting "rectangle", the classifier predicted "rectangle" with a probability of 0.7 and "triangle" with a probability of 0.3? What is the expected error rate of the classifier in this case? (Show your calculations.)
• 1.3 (2 points)
Now let's assume that the data is not split equally, but has half (1/2) of its data labeled "rectangle", one-fourth (1/4) labeled "triangle", and one-fourth (1/4) labeled "circle". What is the expected error rate of the classifier if, as in question 1.1, the prediction is "rectangle" for every test instance?
• 1.4 (3 points)
With this dataset (half (1/2) labeled "rectangle", one-fourth (1/4) labeled "triangle", and one-fourth (1/4) labeled "circle"), what is the expected error rate of the classifier if, as in question 1.2, it predicted "rectangle" with a probability of 0.7 and "triangle" with a probability of 0.3?
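If you want to sanity-check your hand calculations for these questions, the expected error rate of a randomized predictor can be computed analytically and then confirmed by simulation. This sketch is not part of the assignment; the function names are mine, and it uses only the Python standard library.

```python
import random

def expected_error(class_priors, predict_probs):
    """Analytic expected error rate: a prediction is correct only when
    the predicted label happens to match the true label, so accuracy is
    the sum over classes of P(true=c) * P(predict=c)."""
    accuracy = sum(class_priors[c] * predict_probs.get(c, 0.0)
                   for c in class_priors)
    return 1.0 - accuracy

def simulated_error(class_priors, predict_probs, n=100_000, seed=0):
    """Monte Carlo check of the analytic formula."""
    rng = random.Random(seed)
    classes = list(class_priors)
    true_w = [class_priors[c] for c in classes]
    pred_w = [predict_probs.get(c, 0.0) for c in classes]
    errors = 0
    for _ in range(n):
        y = rng.choices(classes, weights=true_w)[0]
        yhat = rng.choices(classes, weights=pred_w)[0]
        errors += (y != yhat)
    return errors / n

# Balanced classes, classifier always predicting "rectangle":
priors = {"rectangle": 1/3, "triangle": 1/3, "circle": 1/3}
always_rect = {"rectangle": 1.0}
print(expected_error(priors, always_rect))   # analytic value
print(simulated_error(priors, always_rect))  # should be close to it
```

Swap in the priors and prediction probabilities for each sub-question; the simulation should agree with your analytic answer to within sampling noise.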
## 2 ROC Curves (SENG474 and CSC578D: 20 points)
### 2.1 Background
You have been introduced to different measures of performance for classifiers: accuracy, precision, recall, F-measure, etc. It is important to look at them all, as one measure can hide important information. For example, a classifier obtaining 60% accuracy on a problem with 20 classes (labels) is doing much better than one obtaining 60% accuracy on 2 classes. ROC graphs are two-dimensional graphs plotting the True Positive rate (TP rate) on the y-axis against the False Positive rate (FP rate) on the x-axis (see the paper in [1] for definitions). A discrete classifier produces a TP rate and an FP rate, represented by a single point in the ROC graph. The point at (0,0) represents a classifier that always predicts negative. This strategy ensures that the classifier never generates a false positive. However, it would also never generate a true positive. The upper left corner (0,1) represents a classifier that yields a perfect classification. Finally, the upper right corner (1,1) represents a classifier that always issues a positive classification, generating true positives and false positives equal to the total number of positive and negative instances in the test set (respectively). Thus, the TP rate and FP rate are both 1.
Classifiers such as Naive Bayes and Neural Networks typically issue an instance probability or score along with each of their predictions. For example, the Naive Bayes classification question on your last assignment chose the class with the highest probability. Imagine that instead of choosing the class with the highest probability, we choose a class when its probability is greater than some threshold t. ROC graphs are often used to visualize the performance of such classifiers at different score thresholds. Given a list of instances, their classes, and their scores, we can plot the ROC graph. Scores are processed one by one in descending order. Each instance is a step, moving up if positive and right if negative. The length of the steps going up depends on the total number of positive instances, and the length of the steps moving right depends on the number of negative instances. For example, if the list has 10 positive instances, the curve will move up by 0.1 each time. If it has 5 negative instances, it will move right by 0.2. Figure 1 is given as an example of a ROC graph built to analyze the performance of a classifier at different thresholds.

Figure 1: Example for ROC curve
Looking at Figure 1, we can see that if we choose to use the classifier with a threshold of 0.8, we get a precision of 100% (2/2) but a recall of only 20% (2/10). With a threshold of 0.54, precision drops to 83.3% (5/6), but recall has increased to 50% (5/10).
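The stepping procedure described above can be sketched directly in plain Python (the function name `roc_points` is mine; in practice `sklearn.metrics.roc_curve` computes the same points). Here it is run on classifier A's scores from the table in the next subsection:

```python
def roc_points(labels_scores):
    """Build ROC (FP rate, TP rate) points by the stepping rule:
    sort instances by score, descending; each positive instance moves
    the curve up by 1/P, each negative instance moves it right by 1/N."""
    ranked = sorted(labels_scores, key=lambda t: t[1], reverse=True)
    P = sum(1 for lab, _ in ranked if lab == "P")
    N = len(ranked) - P
    tp = fp = 0
    points = [(0.0, 0.0)]
    for lab, _ in ranked:
        if lab == "P":
            tp += 1
        else:
            fp += 1
        points.append((fp / N, tp / P))
    return points

# Classifier A's (label, score) pairs from the table:
a = [("P", 0.73), ("P", 0.69), ("N", 0.44), ("P", 0.55), ("N", 0.67),
     ("N", 0.47), ("P", 0.08), ("N", 0.15), ("N", 0.45), ("P", 0.35)]
for fp_rate, tp_rate in roc_points(a):
    print(f"({fp_rate:.1f}, {tp_rate:.1f})")
```

Connecting the printed points in order gives the step curve you are asked to draw; a plotting library such as matplotlib can render it.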
### 2.2 Problem

You are asked to evaluate the performance of two classifiers, A and B. The following table shows the ranking obtained by applying the classifiers to a test set of 10 instances.

| Instance | True Class | Classifier A | Classifier B |
|----------|------------|--------------|--------------|
| 1        | P          | 0.73         | 0.61         |
| 2        | P          | 0.69         | 0.03         |
| 3        | N          | 0.44         | 0.68         |
| 4        | P          | 0.55         | 0.31         |
| 5        | N          | 0.67         | 0.45         |
| 6        | N          | 0.47         | 0.09         |
| 7        | P          | 0.08         | 0.38         |
| 8        | N          | 0.15         | 0.05         |
| 9        | N          | 0.45         | 0.01         |
| 10       | P          | 0.35         | 0.04         |
• 2.1 (SENG474 10 points / CSC578D 5 points) Plot the ROC graphs for both A and B on the same graph, as in Figure 1.
• 2.2 (5 points) For classifier A, suppose you choose the cutoff threshold to be t = 0.5. In other words, any test instance whose score is greater than t will be classified as a positive example. Compute the precision, recall, and F-measure for the classifier at this threshold value.
• 2.3 (5 points) Plot the curve of an unbiased random classifier (equal probability of predicting positive or negative) on the graph from question 2.1. At what threshold does classifier A perform better than a random classifier? At what threshold does classifier B perform better than a random classifier?
• 2.4 (CSC578D only: 5 points)
Based on Tom Fawcett's paper [1], *ROC graphs: Notes and practical considerations for researchers*: consider two discrete classifiers whose performances have been placed on a ROC graph. Classifier A's coordinates are (0.3, 0.7) and classifier B is positioned at (0.8, 0.1). Which of the two classifiers would you choose, and why?
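A quick way to reason about a single discrete classifier's ROC point is to compare it against the chance diagonal TP rate = FP rate; Fawcett's paper also observes that a point below the diagonal becomes useful if you negate the classifier's predictions. A minimal sketch, with helper names of my own:

```python
def beats_random(fp_rate, tp_rate):
    """A discrete classifier beats random guessing exactly when its
    ROC point lies strictly above the diagonal TP rate = FP rate."""
    return tp_rate > fp_rate

def invert(fp_rate, tp_rate):
    """Negating every prediction swaps hits and misses, reflecting
    the ROC point through the center (0.5, 0.5)."""
    return 1.0 - fp_rate, 1.0 - tp_rate

print(beats_random(0.3, 0.7))  # the point (0.3, 0.7)
print(beats_random(0.8, 0.1))  # the point (0.8, 0.1)
print(invert(0.8, 0.1))        # where (0.8, 0.1) lands if predictions are flipped
```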
## 3 The Perceptron Algorithm
In this question you will implement a simple 2D perceptron classifier and visualize how it learns the decision boundary on some hand-crafted two-dimensional data. The training data (Xtrain) is sampled from two linearly separable Gaussian distributions with the same variance and different means. The training data is normalized to have zero mean and unit variance, and it is organized in matrix form (Xtrain) where each row is a training sample. The label of each training sample is determined by the corresponding generating distribution, and the labels are also organized in matrix form (Ytrain).
To answer this question you are provided with some skeleton code, and you are asked to implement the missing functions (do not modify the skeleton code!) and submit your modified Python file. In particular, you have to implement the stochastic gradient descent algorithm for the perceptron classifier. This function has to go through the entire dataset (a pass called a training epoch) and, for each training sample, update the parameters according to the perceptron update described in the slides. Please note that the skeleton of this function automatically adds a column of ones to the training data in order to take into account the intercept parameter. After each epoch, you can plot the learned decision boundary using the provided function.
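The epoch loop described above might look like the following sketch. This is plain Python on a tiny hypothetical dataset, with labels assumed to be ±1 and the bias column already prepended; the real skeleton's function signature and data will differ.

```python
def perceptron_epoch(X, y, w, lr):
    """One training epoch: visit every sample once and apply the
    perceptron update to misclassified points.  X is assumed to
    already include a leading column of ones for the intercept,
    and labels are +1 / -1."""
    for xi, yi in zip(X, y):
        activation = sum(wj * xj for wj, xj in zip(w, xi))
        if yi * activation <= 0:  # misclassified (or exactly on the boundary)
            w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w

# Tiny linearly separable toy set (hypothetical; bias column of ones prepended):
X = [[1.0, 2.0, 1.0], [1.0, 1.5, 2.0], [1.0, -1.0, -1.5], [1.0, -2.0, -1.0]]
y = [+1, +1, -1, -1]
w = [0.0, 0.0, 0.0]
for _ in range(25):  # nepochs = 25, as in question 3.1
    w = perceptron_epoch(X, y, w, lr=0.001)
print(w)
```

The learned decision boundary is the line w[0] + w[1]·x + w[2]·y = 0, which is what the provided plotting function draws after each epoch.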
• 3.1 (SENG474 10 points, CSC578D 5 points) Set nepochs=25 and learning rate = 0.001, and plot the first 5 iterations as well as the last decision boundary. Explain what happens.
• 3.2 (10 points) Set nepochs=25 and experiment with different learning rates (0.1, 0.01, 0.0001). Explain what happens.
• 3.3 (CSC578D only: 5 points) Using the same training set, plot the decision boundary computed by the perceptron algorithm in scikit-learn as well as the decision boundary computed by a linear support vector machine (SVM), also using scikit-learn. Show the resulting boundaries on top of the final decision boundary of your algorithm. Comment on what you observe.
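For question 3.3, the relevant scikit-learn estimators are `Perceptron` and `LinearSVC`; both expose `coef_` and `intercept_`, from which the decision line can be drawn. The sketch below uses two hypothetical Gaussian blobs as a stand-in for the assignment's Xtrain/Ytrain (which come from the skeleton code).

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import LinearSVC

# Hypothetical stand-in for the assignment's data: two well-separated
# Gaussian blobs in 2D (the real Xtrain/Ytrain come from the skeleton).
rng = np.random.default_rng(0)
Xtrain = np.vstack([rng.normal(loc=-2.0, scale=0.5, size=(50, 2)),
                    rng.normal(loc=+2.0, scale=0.5, size=(50, 2))])
Ytrain = np.array([0] * 50 + [1] * 50)

perc = Perceptron().fit(Xtrain, Ytrain)
svm = LinearSVC().fit(Xtrain, Ytrain)

# Each decision boundary is the line w0*x + w1*y + b = 0, so coef_ and
# intercept_ are all you need to overlay them on your own plot.
print("perceptron:", perc.coef_, perc.intercept_)
print("svm:       ", svm.coef_, svm.intercept_)
print("train accuracy:", perc.score(Xtrain, Ytrain), svm.score(Xtrain, Ytrain))
```

On linearly separable data both models should fit the training set perfectly, but the SVM's boundary is the one with maximum margin, which is worth commenting on in your answer.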
Figure 2: Histograms of the data for the two classes (A and B)
## 4 Maximum Likelihood Estimation
• 4.1 (5 points)
Let θ = P(X = T). Calculate the MLE for θ for the following dataset by finding the maximum of P(D|θ): D = {T, T, T, T, T, T, T, F, F, F}.
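As a reminder of the general recipe (this is the standard Bernoulli derivation for n observations with k of them equal to T, not a worked solution for the dataset above):

```latex
% Likelihood of k successes in n Bernoulli trials:
L(\theta) = P(D \mid \theta) = \theta^{k}\,(1-\theta)^{n-k}
% Log-likelihood:
\ell(\theta) = k \log \theta + (n-k)\log(1-\theta)
% Setting the derivative to zero:
\ell'(\theta) = \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0
\quad\Longrightarrow\quad
\hat{\theta} = \frac{k}{n}
```

Plug in the counts of T and F from D, and show the maximization steps in your answer.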
• 4.2 (5 points) Consider the two histograms shown in Figure 2, consisting of sets of data corresponding to two classes. Assume that you can treat these histograms as a crude empirical approximation to an underlying probability density function. Let's call the left histogram A and the right histogram B. Note that the shaded rectangles belong to B (the blue histogram) and are shown that way because they overlap with the histogram of A. Consider the following data: D = {0.1, 0.5, 0.6, 0.1, 0.15, 0.55, 0.68, 0.15, 0.12}. Which of the two models (A or B) is more likely to explain the data? Don't just provide the answer; show your calculations.
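The question amounts to asking which histogram assigns the higher likelihood to D. The sketch below shows the computation with PLACEHOLDER bin densities (they are not read from Figure 2, so it only illustrates the calculation you are asked to do by hand with the real histograms):

```python
import math

def log_likelihood(data, bin_edges, densities):
    """Log-likelihood of data under a piecewise-constant (histogram)
    density: the sum of log(density of the bin each point falls in)."""
    total = 0.0
    for x in data:
        for (lo, hi), d in zip(zip(bin_edges, bin_edges[1:]), densities):
            if lo <= x < hi:
                total += math.log(d) if d > 0 else float("-inf")
                break
    return total

D = [0.1, 0.5, 0.6, 0.1, 0.15, 0.55, 0.68, 0.15, 0.12]

# Hypothetical bin densities (each model integrates to 1 over [0, 1)):
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
model_A = [2.0, 1.0, 0.5, 0.5]   # placeholder histogram A
model_B = [1.0, 0.5, 2.0, 0.5]   # placeholder histogram B

print("log L(A):", log_likelihood(D, edges, model_A))
print("log L(B):", log_likelihood(D, edges, model_B))
```

Whichever model yields the larger (log-)likelihood is the one more likely to explain the data; working in log space avoids underflow when multiplying many small probabilities.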