EECS 491 Assignment 4

100 points total.

Submitting assignments to Canvas

For jupyter notebooks, submit the .ipynb file and a pdf export of the notebook.

Make sure you check that the pdf export represents the latest state of your notebook and that the equations and

figures are properly rendered.

If your are not using notebooks, writeup your assignment using latex and submit a pdf with your code. The writeup

should include relevant code with description if it can fit on a page.

Use the following format for filenames:

EECS491-A4-yourcaseid.ipynb

EECS491-A4-yourcaseid.pdf

If you have more than these two files, put any additional files in a directory named EECS491-A4-yourcaseid . Do

not include binaries or large data files. Then zip this directory and submit it with the name EECS491-A4-

yourcaseid.zip . Do not use other compression formats. The .ipynb file can be included in the zipped

directory, but make sure you submit the .pdf file along with the .zip file. This is so it appears at the top level on

canvas, which allows for easier grading.

Exercise 1. Multivariate Gaussians (10 points)

1.1 (5 pts) Consider the 2D normal distribution

Define three separate 2D covariance matrices for each of the following cases: and are uncorrelated; and are

correlated; and and are anti-correlated. Plot samples from these distributions to show these properties. Use a

different mean for each. Make sure your plots show the density.

1.2 (5 pts) Compute the principal axes for each of these distributions, i.e. the eigenvectors of the covariance matrices.

Use can use a linear algebra package. Plot the samples again, but this time overlay the 1, 2, and 3-sigma contours and

with the scaled eigenvectors.

Exercise 2. Linear Gaussian Models (20 pts)

Consider two independent multi-dimensional Gaussian random vector variables

Now consider a third variable that is the sum of the first two:

2.1 (5 pts) What is the expression for the distribution ?

2.2 (5 pts) What is the expression for the condidtional distribution ?

2.3 (10 pts) Write code to illustrate the result in Q2.1. Show both the components of and that the sampling

from the analytic result is the same as adding two samples.

Exercise 3. Dimensionality Reduction and PCA (25 pts)

In this quesiton you will use principal component analysis to reduce the dimensionality of your data and analyze the

results.

3.1 (5 pts) Find a set of high dimensional data. It should be continuous and have at least 6 dimensions, e.g. stats for

sports teams, small sound segments or images patches also work. Note that if the dimensionality of the data is too

large, you might run into computational efficiency problems using standard methods. Describe the data and illustrate it,

if appropriate.

3.2 (5 pts) Compute the principal components of the data. Plot a few of the largest eigenvectors and interpret them in

terms of how there are modeling the structure of the data.

3.3 (5 pts) Plot, in decreasing order, the cumulative percentage of variance each eigenvector accounts for as a function

of the eigenvector number. These values should be in decreasing order of the eigenvalues. Interpret these results.

3.4 (10 pts) Plot the original data projected into the space of the two principal eigenvectors (i.e. the eigenvectors with

the largest two eigenvalues). Be sure to either plot relative to the mean, or subtract the mean when you do this. Interpret

your results. What insights can you draw? Interpret the dimensions of the two largest principal components. Which

dimensions of the data are correlated? Or anti-correlated?

Exercise 4. Gaussian Mixture Models (25 pts)

4.1 (10 pts) Use the EM equations for multivariate Gaussian mixture model to write a program that implements the

Gaussian Mixture Model to estimates from an ensemble of data the means, covariance matrices, and class probabilities.

Choose reasonable values for your initial values and a reasonable stopping criterion. Explain your code and the steps of

the algorithm. Do not assume a diagonal or isotropic covariance matrices.

4.2 (5 pts) Write code to plot the 3-sigma contours of each Gaussian overlayed on the data (try to find a library function

to plot ellipses). Illustrate with an example.

4.3 (5 pts) Define a two-model Gaussian mixture test case, synthesize the data, and verify that your algorithm infers the

(approximately) correct values based on training data sampled from the model and plotting the results.

4.4 (5 pts) Apply your model to the Old Faithful dataset (supplied with the assignment files). Run the algorithm for the

cases , , and . For each case, plot the progression of the solutions at the beginning, middle, and

final steps in the learning. For each your plots (you should have 9 total), you should also print out the corresponding

values of the mean, covariance, and class probabilities.

Exploration (20 points)

Like in previous assignments, in this exercise you have more lattiude and are meant to do creative exploration. The

intention is for you to teach yourself about a topic beyond what’s been covered above. Please consult the rubric below

for what is expected.

Exploration Grading Rubric

Exploration problems will be graded according the elements in the table below. The scores in the column headers

indicate the number of points possible for each rubric element (given in the rows). A score of zero for an element is

possible if it is missing entirely.

Substandard (+1) Basic (+2) Good (+3) Excellent (+5)

Pedagogical

Value

No clear statement of

idea or concept being

explored or explained;

lack of motivating

questions.

Simple problem with

adequate motivation; still

could be a useful

addition to an

assignment.

Good choice of problem with

effective illustrations of

concept(s). Demonstrates a

deeper level of understanding.

Problem also illustrates or

clarifies common conceptual

difficulties or misconceptions.

Novelty of

Ideas

Copies existing problem

or makes only a trivial

modification; lack of

citation(s) for source of

inspiration.

Concepts are similar to

those covered in the

assignment but with

some modifications of an

existing exericse.

Ideas have clear pedagogical

motivation; creates different type

of problem or exercise to explore

related or foundational concepts

more deeply.

Applies a technique or

explores concept not covered

in the assignment or not

discussed at length in lecture.

Clarity of

Explanation

Little or confusing

explanation; figures lack

labels or useful

captions; no explanation

of motivations.

Explanations are present,

but unclear, unfocused,

wordy or contain too

much technical detail.

Clear and concise explanations of

key ideas and motivations.

Also clear and concise, but

includes illustrative figures;

could be read and understood

by students from a variety of

backgrounds.

Depth of

Exploration

Content is obvious or

closely imitates

assignment problems.

Uses existing problem for

different data.

Applies a variation of a technique

to solve a problem with an

interesting motivation; explores a

concept in a series of related

problems.

Applies several concepts or

techniques; has clear focus of

inquiry that is approached

from multiple directions.

p(x, y) ∼ N (µ, Σ)

Σ x y x y

x y

p(x) = N (x|µx, Σx)

p(z) = N (z|µz, Σz)

y = x + z

p(y)

p(y|x)

y = x + z

K = 1 K = 2 K = 3

In [ ]:

Sale!