
Assignment 1
SENG 474 / CSC 578D
Data Mining
Total: 60 marks
Worth 5% of your final grade
I will accept submissions up to 24
hours late for a 10% reduction in your mark. After that point, no submissions will
be accepted.
Show your work for all questions.
Different marking schemes will be used for undergrad (SEng 474) and grad (CSc 578D)
students. Undergrad students do not have to answer the grad questions.
All code questions use Python and the scikit-learn library. You may install scikit-learn, along with the NumPy and SciPy libraries, on your own computer. Alternatively, you can work in the lab.
http://scikit-learn.org/stable/install.html
http://www.numpy.org/
1: Decision Trees (SEng 474: 20 points; CSc 578D: 15 points)
a) (SEng 474: 15 points; CSc 578D: 10 points)
By hand, construct the root and the first level of a decision tree for the contact lenses data (attached to this assignment on connex) using the ID3 algorithm. Show the details of your construction and all your calculations; no points will be given for a final answer alone.
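For reference, here is a minimal sketch of the entropy and information-gain computations that ID3 performs at each split, which you can use to check your hand calculations (the function names are illustrative, not part of the assignment):

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (base 2) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    # Entropy reduction from splitting on the attribute at attr_index:
    # entropy of the whole set minus the weighted entropy of each partition.
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    return entropy(labels) - sum(
        (len(subset) / n) * entropy(subset) for subset in partitions.values())

At each node, ID3 splits on the attribute with the largest information gain.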
b) (SEng 474: 5 points; CSc 578D: 5 points)
Using the tree.DecisionTreeClassifier class from Python's scikit-learn, fit a tree to the contact-lenses data with criterion='entropy'.
Compare the entropy values obtained in part a) with the ones calculated by the sklearn.tree module. Explain in detail why the trees are not the same. You may find the documentation for decision trees helpful:
http://scikit-learn.org/stable/modules/tree.html
Note: You can import the data directly from the 'contact-lenses.arff' file using the Arff2Skl() converter from util2.py, provided with this assignment, with these lines of code:
from util2 import Arff2Skl

# Convert the ARFF file into scikit-learn-ready arrays.
cvt = Arff2Skl('contact-lenses.arff')
label = cvt.meta.names()[-1]  # the class attribute is the last one listed
X, y = cvt.transform(label)
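For part b), a minimal sketch of the fitting step, continuing from the loader above (inspecting clf.tree_.impurity is one way to read off the per-node entropy values; how you present the comparison is up to you):

from sklearn import tree
from util2 import Arff2Skl

cvt = Arff2Skl('contact-lenses.arff')
label = cvt.meta.names()[-1]
X, y = cvt.transform(label)

# Fit a decision tree using entropy (information gain) as the split criterion.
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(X, y)

# Per-node impurity (entropy) values of the fitted tree, for comparison
# with the values computed by hand in part a).
print(clf.tree_.impurity)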
2: Classifier Accuracy (SEng 474 and CSc 578D: 10 points)
Assume you were given a dataset built from random data, where attribute values have been randomly generated with no regard to the class labels. The dataset has three classes: "red", "blue", and "yellow". You were asked to build a classifier for this dataset and told that 50% of the
data will be used for training, and 50% for testing. The testing set is balanced, so you can assume
it has the same distribution as the training set. Because you are smart, you will start by
establishing a theoretical baseline for your classifier’s performance.
a) (2 points)
Assume the data is equally split between the three classes (33.3% "red", 33.3% "blue", and 33.3% "yellow") and that your classifier systematically predicts "red" for every test instance. What is the expected error rate of your classifier? (Show your work)
b) (3 points)
What if, instead of always predicting "red", the classifier predicted "red" with a probability of 0.7 and "blue" with a probability of 0.3? What is the expected error rate of the classifier in this case?
(Show your work)
c) (2 points)
Now let's assume that the data is not split equally, but has half (1/2) of its data labeled "red", one-fourth (1/4) labeled "blue", and one-fourth (1/4) labeled "yellow". What is the expected error rate of the classifier if, as in question a), the prediction is "red" for every test instance?
d) (3 points)
With this dataset (half (1/2) labeled "red", one-fourth (1/4) labeled "blue", and one-fourth (1/4) labeled "yellow"), what is the expected error rate of the classifier if, as in question b), it predicted "red" with a probability of 0.7 and "blue" with a probability of 0.3? (Show your work)
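For all four parts, note that when predictions are made independently of the true labels, the expected accuracy is the probability that a randomly drawn prediction matches a randomly drawn true label, i.e. the sum over classes of P(predict c) * P(true c), and the expected error rate is one minus that. A minimal sketch for checking your arithmetic (the distributions shown are just the setup from part a):

def expected_error(pred_dist, true_dist):
    # 1 - sum over classes of P(predict c) * P(true c), assuming the
    # classifier's predictions are independent of the true labels.
    return 1.0 - sum(pred_dist.get(c, 0.0) * p for c, p in true_dist.items())

# Part a) setup: always predict "red" on a balanced three-class test set.
print(expected_error({'red': 1.0}, {'red': 1/3, 'blue': 1/3, 'yellow': 1/3}))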
3: MLE and MAP estimates (SEng 474: 10 points; CSc 578D: 15 points)
a) Let θ = P(X=T). Calculate the MLE for θ for the following dataset by finding the
maximum of P(D| θ). Show your work. [4 marks]
D={T, T, T, T, T, T, T, F, F, F}
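As a reminder of the setup (this does not replace showing your work): for this dataset the likelihood is P(D | θ) = θ^h * (1 − θ)^t, where h and t count the T and F outcomes, and the MLE is the θ that maximizes it (equivalently, its log). A quick numerical sanity check:

import numpy as np

D = ['T'] * 7 + ['F'] * 3
h, t = D.count('T'), D.count('F')

# Log-likelihood of the Bernoulli sample: h*log(theta) + t*log(1 - theta).
thetas = np.linspace(0.01, 0.99, 9801)
loglik = h * np.log(thetas) + t * np.log(1 - thetas)
print(thetas[np.argmax(loglik)])  # should match the maximizer you find analytically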
b) Recall the PDF for a Beta random variable is proportional to θ^(β1−1) * (1 − θ)^(β2−1)
with parameters β1 and β2. Let’s say you have evidence from previous studies that
P(X=T) = ½. Let β1 = 4. Find β2 and then calculate the MAP estimate P(θ | D) for θ
with a Beta(β1, β2) prior and the dataset above. Show your work. [6 marks]
c) (578D students only) In class we used the mode to find β1 and β2. Repeat part b
using the mean of the beta distribution instead. [5 marks]
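For parts b) and c), the standard facts you will need are that a Beta(β1, β2) prior with Bernoulli evidence of h successes and t failures gives a Beta(β1 + h, β2 + t) posterior, that the mode of Beta(a, b) is (a − 1)/(a + b − 2) for a, b > 1, and that its mean is a/(a + b). A minimal sketch of these formulas (the example values are illustrative only, not the assignment's):

def beta_mode(a, b):
    # Mode of Beta(a, b); valid for a, b > 1.
    return (a - 1) / (a + b - 2)

def beta_mean(a, b):
    # Mean of Beta(a, b).
    return a / (a + b)

def map_estimate(beta1, beta2, h, t):
    # MAP estimate of theta: the mode of the Beta(beta1 + h, beta2 + t) posterior.
    return beta_mode(beta1 + h, beta2 + t)

# Illustrative only: a symmetric Beta(2, 2) prior with 7 T's and 3 F's.
print(map_estimate(2, 2, 7, 3))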
4: Gradient Descent (SEng 474: 20 points; CSc 578D: 20 points)
We have a new dataset where the input data has 2 continuous variables (x1 and x2), and
the task is to predict a continuous value as the output y (i.e. regression). We have reason
to believe the following new model for regression is a better fit to the problem domain:
ŷi = w0 + w1·xi,1 + w2·xi,2
A) Write down the error function for this model. You should use the sum of squared
error, as in class: [5 marks]
E(X) = (1/2N) Σ_{i=1}^{N} (yi − ŷi)^2

B) Derive the gradient descent update for w0 using learning rate κ. [5 marks]
C) Derive the gradient descent update for w1 using learning rate κ. [5 marks]
D) Derive the gradient descent update for w2 using learning rate κ. [5 marks]
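If you want a numerical check of your derivations, here is a minimal sketch of one batch gradient descent step under the error function above (variable names are mine, not the assignment's):

import numpy as np

def gradient_descent_step(w, X, y, kappa):
    # One batch update for E(X) = (1/2N) * sum_i (y_i - yhat_i)^2 with
    # yhat_i = w0 + w1*x_{i,1} + w2*x_{i,2}.
    # w is [w0, w1, w2]; X is an (N, 2) input matrix; y is an (N,) target vector.
    N = len(y)
    y_hat = w[0] + X @ w[1:]           # model predictions
    residual = y - y_hat               # (y_i - yhat_i)
    grad_w0 = -residual.sum() / N      # dE/dw0
    grad_w12 = -(X.T @ residual) / N   # dE/dw1 and dE/dw2
    return w - kappa * np.concatenate(([grad_w0], grad_w12))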
