Assignment 3: Naive Bayes/Logistic Regression Classification


Artificial Intelligence
In this assignment you will apply machine learning techniques to image and text classification tasks,
and apply a logistic regression classifier to binary classification.
Programming language
You may only use modules from the Python standard library and numpy.
• Part 1: Image Classification
• Part 2: Text Classification
• Part 3: Linear Classifier
• Extra Credit
• Provided Code Skeleton
• Deliverables
• Report checklist
Part 1: Digit image classification
Image from Baidu
Data: You are provided with part of the MNIST digit dataset. There are 55000 training examples and 10000 test
examples. The labels range from 0 to 9, representing the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. In this section, you
will apply a Naive Bayes model to this task.
Artificial Intelligence (Spring 2022) Assignment #3
Naive Bayes model
• Features: Each image consists of 28*28 pixels, which we represent as a flattened array of size 784,
where each feature/pixel Fi takes on intensity values from 0 to 255 (8-bit grayscale).
• Training: The goal of the training stage is to estimate the likelihoods P(Fi | class) for every pixel
location i and for every digit class (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). The likelihood estimate is defined as
P(Fi = f | class) = (# of times pixel i has value f in training examples from this class) / (Total # of
training examples from this class)
In addition, as discussed in the lecture, you have to smooth the likelihoods to ensure that there are no
zero counts. Laplace smoothing is a very simple method that increases the observation count of every
value f by some constant k. This corresponds to adding k to the numerator above, and k*V to the
denominator (where V is the number of possible values the feature can take on). The higher the value
of k, the stronger the smoothing. Experiment with different values of k (say, from 0.1 to 10) and find
the one that gives the highest classification accuracy.
You should also estimate the priors P(class) by the empirical frequencies of different classes in the
training set.
• Testing: You will perform maximum a posteriori (MAP) classification of the test digits according
to the learned Naive Bayes model. Suppose a test image has feature values f1, f2, … , f784. According to
this model, the posterior probability (up to scale) of each class given the digit is given by
P(class) ⋅ P(f1 | class) ⋅ P(f2 | class) ⋅ … ⋅ P(f784 | class)
Note that in order to avoid underflow, it is standard to work with the log of the above quantity:
log P(class) + log P(f1 | class) + log P(f2 | class) + … + log P(f784 | class)
After you compute the above decision function values for all ten classes for every test image, you will
use them for MAP classification.
• Evaluation: Report your performance in terms of average classification rate and the classification
rate for each digit (percentage of all test images of a given digit correctly classified). Also report
your confusion matrix. This is a 10×10 matrix whose entry in row r and column c is the percentage
of test images from class r that are classified as class c. In addition, for each class, show the test
examples from that class that have the highest and the lowest posterior probabilities according to your
classifier. You can think of these as the most and least “prototypical” instances of each digit class (and
the least “prototypical” one is probably misclassified).
• Likelihood visualization: When using classifiers in real domains, it is important to be able to inspect
what they have learned. One way to inspect a naive Bayes model is to look at the most likely features
for a given label. Another tool for understanding the parameters is to visualize the feature likelihoods
for high intensity pixels of each class. Here high intensities refer to pixel values from 128 to 255.
Therefore, the likelihood for the high intensity pixel feature Fi of class c1 is the sum of the probabilities of the top
128 intensities at pixel location i of class c1:
feature likelihood(Fi, c1) = ∑_{k=128}^{255} P(Fi = k | c1)
For each of the ten classes, plot their trained likelihoods for high intensity pixel features to see what
likelihood they have learned.
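As a rough illustration of the training and testing steps in Part 1, the sketch below estimates Laplace-smoothed log-likelihoods and priors, then performs MAP classification by summing logs. It uses plain Python on a tiny made-up dataset with two classes, three pixels, and two intensity values. The names train_naive_bayes and classify_map, their argument layout, and the toy data are assumptions for illustration and are not part of the provided skeleton; a real solution would likely use numpy arrays for the 784-pixel images.

```python
import math


def train_naive_bayes(images, labels, num_classes, num_values, k=1.0):
    """Return (log_prior, log_lik), where log_lik[c][i][f] = log P(Fi = f | class c).

    Hypothetical helper: assumes every class appears at least once in `labels`.
    """
    num_pixels = len(images[0])
    class_counts = [0] * num_classes
    # counts[c][i][f] = number of class-c training images whose pixel i has value f
    counts = [[[0] * num_values for _ in range(num_pixels)]
              for _ in range(num_classes)]
    for image, c in zip(images, labels):
        class_counts[c] += 1
        for i, f in enumerate(image):
            counts[c][i][f] += 1

    # Priors are the empirical class frequencies.
    log_prior = [math.log(n / len(labels)) for n in class_counts]
    # Laplace smoothing: add k to the numerator and k * V to the denominator.
    log_lik = [[[math.log((counts[c][i][f] + k)
                          / (class_counts[c] + k * num_values))
                 for f in range(num_values)]
                for i in range(num_pixels)]
               for c in range(num_classes)]
    return log_prior, log_lik


def classify_map(image, log_prior, log_lik):
    """Return (best_class, scores) using log P(class) + sum_i log P(fi | class)."""
    scores = [log_prior[c] + sum(log_lik[c][i][f] for i, f in enumerate(image))
              for c in range(len(log_prior))]
    best = max(range(len(scores)), key=lambda c: scores[c])
    return best, scores


# Toy data (made up): class 0 images are mostly dark, class 1 mostly bright.
toy_images = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1]]
toy_labels = [0, 0, 1, 1]
log_prior, log_lik = train_naive_bayes(toy_images, toy_labels,
                                       num_classes=2, num_values=2, k=1.0)
prediction, scores = classify_map([0, 0, 1], log_prior, log_lik)
print(prediction, scores)
```

The per-class scores returned by classify_map are the same decision function values that can be reused to find the most and least "prototypical" test examples of each class.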
Part 2: Text Classification
You are given a dataset consisting of texts which belong to 14 different classes. We have split the dataset into
a training set and a development dataset. The training set consists of 3865 texts and their corresponding class
labels from 1-14, with instances from each of the classes and the development set consists of 483 test
instances and their corresponding labels. We have already done the preprocessing of the dataset and extracted
into a Python list structure in Using the training set, you will learn a Naive Bayes classifier that
will predict the right class label given an unseen text. Use the development set to test the accuracy of your
learned model. Report the accuracy, recall, and F1-Score that you get on your development set. We will have
a separate (unseen) train/test set that we will use to run your code after you turn it in. No other non-standard Python libraries may be used.
Unigram Model
The bag of words model in NLP is a simple unigram model which considers a text to be represented as a bag of
independent words. That is, we ignore the position the words appear in, and only pay attention to their frequency
in the text. Here each text consists of a group of words. Using Bayes theorem, you need to compute the
probability of a text belonging to one of the 14 classes given the words in the text. Thus you need to estimate
the posterior probabilities:
P(Class = Ci | Words) = [P(Class = Ci) / P(Words)] ⋅ ∏_{all words} P(Word | Class = Ci)
It is standard practice to use the log probabilities so as to avoid underflow. Also, P(words) is just a
constant, so it will not affect your computation.
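A minimal sketch of the unigram model in log space may help make this concrete. It assumes each text is already a list of word strings, uses Laplace smoothing with a constant k over the training vocabulary, and simply skips words never seen in training. The function names, the handling of unseen words, and the toy texts are all illustrative assumptions, not part of the provided code.

```python
import math
from collections import Counter, defaultdict


def train_unigram_nb(texts, labels, k=1.0):
    """Return (log_prior, log_lik) with log_lik[c][w] = log P(Word = w | Class = c)."""
    word_counts = defaultdict(Counter)   # word_counts[c][w] = occurrences of w in class c
    class_counts = Counter(labels)       # number of training texts per class
    for words, label in zip(texts, labels):
        word_counts[label].update(words)

    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Laplace smoothing: k added per word, k * |vocab| added to the class total.
        denom = sum(word_counts[c].values()) + k * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + k) / denom) for w in vocab}
    return log_prior, log_lik


def predict_unigram_nb(words, log_prior, log_lik):
    """MAP class: argmax over classes of log P(c) + sum of log P(w | c).

    P(Words) is the same for every class, so it is dropped.
    Words outside the training vocabulary are ignored (an assumption of this sketch).
    """
    best_class, best_score = None, -math.inf
    for c in log_prior:
        score = log_prior[c] + sum(log_lik[c][w] for w in words if w in log_lik[c])
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data (made up): class 1 is sports-like, class 2 is finance-like.
toy_texts = [["goal", "match", "team"], ["team", "win"],
             ["stock", "market"], ["market", "price", "stock"]]
toy_labels = [1, 1, 2, 2]
log_prior, log_lik = train_unigram_nb(toy_texts, toy_labels, k=1.0)
print(predict_unigram_nb(["team", "goal"], log_prior, log_lik))
```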
Training and Development
• Training: To train the algorithm you will need to build a bag of words model using the texts.
After you build the model, you will need to estimate the log-likelihoods log P(Word | Class = Ci).
The variable Ci can only take on 14 values, 1-14. Additionally, you will need to make sure you
smooth the likelihoods to prevent zero probabilities. In order to accomplish this task, use Laplace
smoothing.
• Development: After you have computed the log-likelihoods, you will have your model predict the class
labels of the texts from the development set. To do this, you will perform MAP
classification using the equation shown above.
Use only the training set to learn the individual probabilities. The following results should be put in your
report:
1. Plot your confusion matrix. This is a 14×14 matrix whose entry in row r and column c is the
percentage of test texts from class r that are classified as class c.
2. Accuracy, recall, and F1 scores for each of the classes on the development set.
3. Top 20 feature words of each of the classes.
4. Calculate your accuracy without including the class prior into the Naive Bayes equation i.e. Only
computing the ML inference of each instance. Report the change in accuracy numbers, if any. Also
state your reasoning for this observation. Is including the class prior always beneficial? Change your
class prior to a uniform distribution. What is the change in result?
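The confusion matrix and per-class scores requested in the list above can be computed from the true and predicted labels alone. The sketch below is one possible way to do it in plain Python; the function names and the small example labels are hypothetical, and the convention of returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def confusion_counts(true_labels, predicted_labels, classes):
    """counts[r][c] = number of examples of class classes[r] predicted as classes[c]."""
    index = {label: i for i, label in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    return counts


def row_percentages(counts):
    """Convert each row of counts to percentages of that row's total."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([100.0 * v / total if total else 0.0 for v in row])
    return result


def per_class_metrics(counts):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(counts)
    metrics = []
    for i in range(n):
        tp = counts[i][i]
        actual = sum(counts[i])                        # examples truly in class i
        predicted = sum(counts[r][i] for r in range(n))  # examples predicted as class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


# Made-up labels for three classes, only to exercise the functions.
true_labels = [1, 1, 1, 2, 2, 3]
predicted_labels = [1, 1, 2, 2, 2, 3]
counts = confusion_counts(true_labels, predicted_labels, classes=[1, 2, 3])
print(row_percentages(counts))
print(per_class_metrics(counts))
```

Overall accuracy is the sum of the diagonal of the count matrix divided by the total number of examples.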
Part 3: Linear Classifier
Image by TA
There are some points on the 2-D plane, some of which are labeled as 1, and others are labeled as 0. Your task
is to find the boundary line that can correctly separate these two categories of points. As in the image
shown above, the solid line is the real boundary, and the dashed line is the boundary found by a logistic
regression classifier.
a) You need to implement a logistic regression classifier for this task. The are the only file you
need to modify in this section.
b) Although we only do classification on 2-D points in this task, your code should be working on arbitrary
Logistic regression model
The logistic regression model, also known as a differentiable perceptron, is as follows:
sigmoid(w^T f) = 1 / (1 + e^(−w^T f))
where w is the weight vector and f is the feature vector.
a) This logistic regression model is different from the one in the lecture slide. You should implement
this one in this task, NOT the one in the slide.
b) The derivative of the sigmoid function is f′(x) = f(x) × (1 − f(x))
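The derivative identity in b) is easy to sanity-check numerically. The short sketch below compares f(x) × (1 − f(x)) against a central finite difference; the helper names and the step size h are illustrative choices, and the two-branch form of sigmoid is just one common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Numerically stable sigmoid: avoids calling exp on a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x):
    """Closed form from the text: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(func, x, h=1e-6):
    """Central finite difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


for x in (-2.0, 0.0, 1.5):
    print(x, sigmoid_derivative(x), numerical_derivative(sigmoid, x))
```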
• Features: The coordinates of every point. Denoting the number of points as N and the dimension of
the coordinates as P, the feature matrix should be P*N.
• Training: Implement the training process of the logistic regression model. Recall the loss function of
logistic regression from the lecture slides, which is as follows: (Note: a better measurement would be
logistic loss, which is not required in this MP. If you are interested, see Logistic regression here.)
L(y1, … , yn, f1, … , fn) = ∑_{i=1}^{n} (yi − sigmoid(w^T fi))^2
• Testing: The provided code already implements the testing process for you. You do NOT need to
implement this, but do not forget to report the test results in your report.
• Evaluation: We repeat the process of training and testing many times, and take the average
training error and testing error as the evaluation of the model. This is also implemented in the skeleton
code for you.
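To make the training step of Part 3 more concrete, here is a small gradient-descent sketch. It assumes the loss is the sum of squared differences between each label yi and sigmoid(w^T fi), so the gradient with respect to w is the sum over points of −2 (yi − si) si (1 − si) fi, where si = sigmoid(w^T fi). The function names, the appended constant 1 used as a bias feature, the learning rate, the epoch count, and the toy points are all assumptions for illustration. The sketch also stores points as rows in plain Python lists rather than as the P*N matrix the skeleton expects.

```python
import math


def sigmoid(x):
    """Numerically stable sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score(w, point):
    """sigmoid(w^T f), where f is the point with a constant 1 appended as bias."""
    f = list(point) + [1.0]
    return sigmoid(sum(wj * fj for wj, fj in zip(w, f)))


def squared_loss(w, points, labels):
    return sum((y - score(w, p)) ** 2 for p, y in zip(points, labels))


def train_logistic(points, labels, lr=0.05, epochs=2000):
    """Batch gradient descent on the squared loss; works for any point dimension."""
    dim = len(points[0]) + 1
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for point, y in zip(points, labels):
            f = list(point) + [1.0]
            s = score(w, point)
            # Chain rule: d/dw (y - s)^2 = -2 (y - s) * s (1 - s) * f
            coeff = -2.0 * (y - s) * s * (1.0 - s)
            for j in range(dim):
                grad[j] += coeff * f[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, point):
    return 1 if score(w, point) >= 0.5 else 0


# Made-up, linearly separable 2-D points.
toy_points = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
toy_labels = [0, 0, 0, 1, 1, 1]
weights = train_logistic(toy_points, toy_labels)
print(weights, [predict(weights, p) for p in toy_points])
```

The learned boundary is the set of points where w^T f = 0, which is the line that would be drawn as the dashed boundary in a 2-D plot.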
Extra Credit Suggestion
Implement the naive Bayes algorithm over a bigram model as opposed to the unigram model. The bigram model is
defined as follows:
P(w1 .. wn) = P(w1) P(w2 | w1) .. P(wn | wn−1)
Then combine the bigram model and the unigram model into a mixture model defined with parameter λ:
(1 − λ) P(Y) ∏_i P(wi | Y) + λ P(Y) ∏_i P(bi | Y)
where the wi are the words (unigrams) and the bi are the bigrams of the text.
Did the bigram model help improve accuracy? Find the best parameter λ that gives the highest classification
accuracy. Report the optimal parameter λ and report your results (Accuracy number) on the bigram model and
optimal mixture model, and answer the following questions:
1. Running naive Bayes on the bigram model relaxes the naive assumption of the model a bit. However,
is this always a good thing? Why or why not?
2. What would happen if we did an N-gram model where N was a really large number?
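One practical detail of the mixture model is that it adds two products of probabilities, so it cannot be evaluated by summing logs alone. The sketch below shows one way to handle this with a log-sum-exp step. It assumes Laplace smoothing with constant k, treats each adjacent word pair as one bigram feature, and gives unseen tokens the smoothed zero-count probability. All function names, the data layout, and the toy texts are hypothetical, and the sketch assumes at least one bigram occurs in the training texts.

```python
import math
from collections import Counter, defaultdict


def bigrams(words):
    """Adjacent word pairs: ['a', 'b', 'c'] -> [('a', 'b'), ('b', 'c')]."""
    return list(zip(words, words[1:]))


def train_counts(texts, labels, featurize):
    """Return (counts, vocab_size), where counts[c] is a Counter of features in class c."""
    counts = defaultdict(Counter)
    for words, label in zip(texts, labels):
        counts[label].update(featurize(words))
    vocab_size = len({t for c in counts.values() for t in c})
    return counts, vocab_size


def log_likelihood(tokens, class_counts, vocab_size, k=1.0):
    """Sum of Laplace-smoothed log probabilities of the tokens for one class."""
    total = sum(class_counts.values())
    return sum(math.log((class_counts[t] + k) / (total + k * vocab_size))
               for t in tokens)


def log_mixture(log_uni, log_bi, lam):
    """log((1 - lam) * exp(log_uni) + lam * exp(log_bi)), computed stably."""
    terms = []
    if lam < 1.0:
        terms.append(math.log(1.0 - lam) + log_uni)
    if lam > 0.0:
        terms.append(math.log(lam) + log_bi)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def mixture_score(words, label, log_prior, uni, bi, lam, k=1.0):
    """Log of the mixture expression for one class label."""
    uni_counts, uni_vocab = uni
    bi_counts, bi_vocab = bi
    log_uni = log_prior[label] + log_likelihood(words, uni_counts[label], uni_vocab, k)
    log_bi = log_prior[label] + log_likelihood(bigrams(words), bi_counts[label], bi_vocab, k)
    return log_mixture(log_uni, log_bi, lam)


# Made-up two-class example.
toy_texts = [["good", "movie"], ["bad", "movie"]]
toy_labels = [1, 2]
log_prior = {1: math.log(0.5), 2: math.log(0.5)}
uni = train_counts(toy_texts, toy_labels, lambda words: words)
bi = train_counts(toy_texts, toy_labels, bigrams)
for label in (1, 2):
    print(label, mixture_score(["good", "movie"], label, log_prior, uni, bi, lam=0.5))
```

Searching for the best λ then amounts to evaluating accuracy for a grid of values between 0 and 1, where λ = 0 recovers the unigram model and λ = 1 the bigram model.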
Provided Code Skeleton
We have provided (zip file) all the code to get you started on your MP.
For part 1, you are provided the following. The doc strings in the python files explain the purpose of each
• This is the main file which loads the dataset and calls your Naive Bayes algorithms.
• This is the only file that needs to be modified.
• x_train.npy, y_train.npy, x_test.npy and y_test.npy - These files contain the training and testing
For part 2, you are provided the following. The doc strings in the python files explain the purpose of each
• This is the main file which loads the dataset and calls your Naive Bayes Algorithm.
• This is the only file that needs to be modified.
• train_text.csv - This file contains the training examples.
• dev_text.csv - This file contains the development examples for testing your model.
• stop_words.csv - This file contains the stop words which are required for preprocessing the dataset.
For part 3, you are provided the following. The doc strings in the python files explain the purpose of each
• This is the main file which loads the dataset and calls your Perceptron and
Logistic Regression Algorithm.
• This is the only file that needs to be modified to achieve your Logistic Regression
• – This is the file to make synthetic data for your algorithm.
• – This file is to plot the experiment result of your perceptron model.
• This file is to plot the experiment result of your Logistic Regression model.
This MP will be submitted via blackboard.
Please upload only the following files to blackboard.
1. – your solution python file to part 1
2. – your solution python file to part 2
3. – your solution python file to part 3
4. report.pdf – your project report in pdf format
Report Checklist
Your report should briefly describe your implemented solution and fully answer the questions for every part
of the assignment. Your description should focus on the most “interesting” aspects of your solution, i.e., any
non-obvious implementation choices and parameter settings, and what you have found to be especially
important for getting good performance. Feel free to include pseudocode or figures if they are needed to
clarify your approach. Your report should be self-contained and it should (ideally) make it possible for us to
understand your solution without having to run your source code.
Kindly structure the report as follows:
1. Title Page:
List of all team members, course number and section for which each member is registered, date on
which the report was written
2. Section I:
Image Classification. Report average classification rate, the classification rate for each class and the
confusion matrix. For each class, show the test examples from that class that have the highest and
lowest posterior probabilities or perceptron scores according to your classifier. Show the ten
visualization plots of the feature likelihoods.
3. Section II:
Text Classification. Report all your results: confusion matrix, recall, precision, and F1 score for all the 14
classes. Include the top feature words for each of the classes. Also, report the change in accuracy
results when the class prior is changed to a uniform distribution and when it is removed. Provide the
reasoning for these observations.
4. Section III:
Linear Classifier. Report the average training and test error rates for your Logistic
Regression model. Show the visual results of the model.
5. Extra Credit:
If you have done any work which you think should get extra credit, describe it here
6. Statement of Contribution:
Specify which team-member performed which task. You are encouraged to make this a many-to-many
mapping, if applicable. e.g., You can say that “Rahul and Jason both implemented the BFS function,
their results were compared for debugging and Rahul’s code was submitted. Jason and Mark both
implemented the DFS function, Mark’s code never ran successfully, so Jason’s code was submitted.
Section I of the report was written by all 3 team members. Section II by Mark and Jason, Section III
by Rahul and Jason.”… and so on.
Only attach files that are the required deliverables in blackboard. Your report must be a formatted pdf
document. Pictures and example outputs should be incorporated into the document. Exception: items which
are very large or unsuitable for inclusion in a pdf document (e.g. videos or animated gifs) may be put on the
web and a URL included in your report. You can write your report either in English or Chinese.
Extra credit:
We reserve the right to give bonus points for any advanced exploration or especially challenging or
creative solutions that you implement. This includes, but is not restricted to, the extra credit suggestion
given above.
