
# SENG474 Assignment 1

(100 points)
Student Name:
Student Number:

| Question | SENG474 | CSC578D |
| -------- | ------- | ------- |
| 1        | 25      | 15      |
| 2        | 20      | 15      |
| 3        | 55      | 70      |
| Total    | 100     | 100     |
## 1 Overview
The goal of this assignment is to explore decision trees and the Naive Bayes classifier. Different marking schemes will be used for undergraduates (SENG474) and graduate students (CSC578D), as shown in the table above. All code questions use Python and the scikit-learn library. You may install it on your own computer or alternatively use the computers in the lab. Submissions in other programming languages or environments are welcome, but you will need to find the appropriate libraries yourself or write the code from scratch. Don't hesitate to contact the instructor via email or use the Mattermost channels for any questions or clarifications you might need.

Submit a PDF for Questions 1-2 and code for Question 3 through Connex.
## Question 1 (SENG474: 25 points / CSC578D: 15 points)
• Q1.1 (SENG474: 20 points, CSC578D: 10 points) By hand, construct the root and the first level of a decision tree for the contact-lenses data (attached to this assignment on Connex) using the ID3 algorithm. Show your work; no points will be given for solutions only.
• Q1.2 (SENG474: 5 points, CSC578D: 5 points) Using the tree.DecisionTreeClassifier module from Python's scikit-learn, fit a tree on the contact-lenses data using criterion='entropy'. Compare the entropy values obtained in Q1.1 with the ones calculated by the sklearn.tree module. Explain in detail why the trees are not the same. You may find the documentation for decision trees helpful:
http://scikit-learn.org/stable/modules/tree.html
Note: You can import the data directly from the contact-lenses.arff file using the Arff2Skl() converter from util2.py provided with this assignment, using these lines of code:

```python
from util2 import Arff2Skl

cvt = Arff2Skl('contact-lenses.arff')
label = cvt.meta.names()[-1]
X, y = cvt.transform(label)
```
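A sketch of the fitting step for Q1.2, using a toy one-hot dataset in place of the converter output (substitute the real X, y produced above; the feature values and labels here are made up):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy one-hot data standing in for the output of cvt.transform(label).
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]
y = ["soft", "hard", "none", "none"]

clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X, y)

# clf.tree_.impurity holds the entropy at each node; the root's value
# is what you would compare against the hand calculation from Q1.1.
print(clf.tree_.impurity[0])  # root entropy
```

With these toy labels (2/4, 1/4, 1/4) the root entropy is 1.5 bits; the contact-lenses data will of course give different values.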
## Question 2 (SENG474: 20 points; CSC578D: 15 points)
Calculate the probabilities needed for Naive Bayes using the contact-lenses dataset. Classify the instance (prepresbyopic, hypermetrope, yes, reduced, ?) using your calculated probabilities. Use additive smoothing, as described in https:
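For illustration, additive smoothing of one conditional probability could look like this (the counts below are hypothetical, not the actual contact-lenses counts):

```python
from collections import Counter

# Hypothetical counts of the "age" attribute within one class,
# illustrating additive (Laplace) smoothing with alpha = 1.
alpha = 1
counts = Counter({"young": 2, "prepresbyopic": 2, "presbyopic": 1})
n_values = 3                  # the attribute has 3 possible values
total = sum(counts.values())  # 5 instances of this class

p = {v: (counts[v] + alpha) / (total + alpha * n_values)
     for v in ("young", "prepresbyopic", "presbyopic")}
print(p["prepresbyopic"])  # (2 + 1) / (5 + 3) = 0.375
```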
## Question 3 (SENG474: 55 points; CSC578D: 70 points)
• Q3.1 (SENG474: 40 points, CSC578D: 30 points) Implement Naive Bayes, assuming that each feature is binary (can only take on the values 0 or 1). This is called Bernoulli Naive Bayes, because each feature is a Bernoulli random variable. Use the skeleton code provided, as we will be using the class for testing. (If you are not using Python you will have to write similar skeleton code.)
Recall that for Naive Bayes, the classification decision is:

$$\hat{y} = \arg\max_{y} \left( P(y) \prod_{i=1}^{p} P(x_i \mid y) \right)$$
where $y$ is the class, $p$ is the total number of attributes (features) in $x$, and $x_i$ is the $i$-th feature of the vector $x$. Using the training data, you must calculate $P(y)$ for each of the classes $y$, as well as $P(x_i = 0 \mid y)$ for each $y$ and every feature $x_i$. Note that $P(x_i = 1 \mid y) = 1 - P(x_i = 0 \mid y)$, so you don't need to explicitly calculate that value. Consult the class notes for information on how to calculate these values.
Implement additive smoothing with $\alpha = 1$. We will be testing your code by creating an instance of MyBayesClassifier and passing it a new binary dataset, so don't assume that you know the number of features or the number of classes.
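As a rough sketch of the quantities involved (the real interface is fixed by the provided MyBayesClassifier skeleton; the class and method names here are illustrative only):

```python
import numpy as np

class BernoulliNaiveBayes:
    """Illustrative Bernoulli Naive Bayes with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        n = len(y)
        # P(y): prior for each class, stored as log probabilities.
        self.log_prior_ = np.log(
            [np.sum(y == c) / n for c in self.classes_])
        # P(x_i = 1 | y) with smoothing over the two outcomes {0, 1}.
        self.theta_ = np.array([
            (X[y == c].sum(axis=0) + self.alpha)
            / (np.sum(y == c) + 2 * self.alpha)
            for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        # log P(y) + sum_i [x_i log theta + (1 - x_i) log(1 - theta)]
        ll = (self.log_prior_
              + X @ np.log(self.theta_).T
              + (1 - X) @ np.log(1 - self.theta_).T)
        return self.classes_[np.argmax(ll, axis=1)]

clf = BernoulliNaiveBayes(alpha=1.0)
clf.fit([[1, 1], [1, 0], [0, 1], [0, 0]], [1, 1, 0, 0])
print(clf.predict([[1, 1], [0, 0]]))
```

Note the constructor learns the number of features and classes from the data, as the testing setup above requires.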
• Q3.2 (SENG474: 15 points, CSC578D: 10 points) Modify your implementation of additive smoothing so that you can vary $\alpha$ in your code. Create (and hand in) a plot of the accuracy on the test set as a function of $\alpha$, varying from 0.01 to 1 (use steps of 0.01). Based on your experiment, what is the optimal value for $\alpha$?
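A sketch of the $\alpha$ sweep, using scikit-learn's BernoulliNB as a stand-in for your MyBayesClassifier and a synthetic binary dataset (the real train/test split comes with the skeleton code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # write the plot to a file, no display needed
import matplotlib.pyplot as plt
from sklearn.naive_bayes import BernoulliNB  # stand-in classifier

rng = np.random.default_rng(0)
# Synthetic binary data standing in for the assignment's dataset.
X = rng.integers(0, 2, size=(200, 20))
y = (X[:, 0] | X[:, 1]).astype(int)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

alphas = np.arange(0.01, 1.001, 0.01)  # 0.01, 0.02, ..., 1.00
acc = [BernoulliNB(alpha=a).fit(X_train, y_train).score(X_test, y_test)
       for a in alphas]

plt.plot(alphas, acc)
plt.xlabel("alpha")
plt.ylabel("test accuracy")
plt.savefig("alpha_vs_accuracy.png")
print(alphas[int(np.argmax(acc))])  # best alpha on this synthetic data
```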
• Q3.3 (SENG474: 0 points, CSC578D: 30 points) Implement a multinomial version of the Naive Bayes classifier that can accept non-binary input vectors (see the skeleton code). We will rewrite the probability of a feature given the class as:

$$P(x_i \mid y) = \frac{N_{yi} + \alpha}{N_y + \alpha p}$$

where $N_{yi}$ is the total number of times feature $i$ appeared for any instance with label $y$, $N_y = \sum_{i=1}^{p} N_{yi}$ is the total number of times any feature appeared in an instance with label $y$, $p$ is the total number of features, and $\alpha$ is the additive smoothing parameter. Note that in this implementation, if a feature doesn't appear in a test instance, we ignore it (that is, we don't model $P(x_i \mid y) = 0$; we just model the existence of features).
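The smoothed likelihood above can be computed in a vectorized way; a sketch with a made-up count matrix (rows are instances, columns are feature counts):

```python
import numpy as np

alpha = 1.0
X = np.array([[2, 0, 1],   # hypothetical feature counts
              [1, 1, 0],
              [0, 3, 1]])
y = np.array([0, 0, 1])
classes = np.unique(y)
p = X.shape[1]  # total number of features

# N_yi: total count of feature i over instances with label y.
Nyi = np.array([X[y == c].sum(axis=0) for c in classes])
Ny = Nyi.sum(axis=1, keepdims=True)       # N_y = sum_i N_yi
theta = (Nyi + alpha) / (Ny + alpha * p)  # P(x_i | y)
print(theta[0])  # likelihoods for class 0
```

Each row of `theta` sums to 1, as a proper per-class distribution over features should.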
Now you should be able to change the CountVectorizer to have the parameter binary=False. Which performs better on the test data, the (binary + Bernoulli) setup or the (non-binary + multinomial) setup? Report your findings.
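To see what the binary flag changes, a small CountVectorizer example (toy documents, not the assignment's dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog"]

# binary=False keeps raw counts (multinomial setup);
# binary=True clips them to 0/1 presence (Bernoulli setup).
counts = CountVectorizer(binary=False).fit_transform(docs).toarray()
binary = CountVectorizer(binary=True).fit_transform(docs).toarray()

print(counts[0].max(), binary[0].max())  # "the" appears twice: 2 vs 1
```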