ECEN 758 Data Mining and Analysis

Assignment 3

Procedure: Please Read

Please follow these guidelines to ensure that your solutions reach me and to help me attribute your marks correctly.

• Format: solutions must be typeset (using e.g. Microsoft Word or LaTeX) and rendered as a PDF.

• Transmittal: email your PDF solutions to me at duffieldng AT tamu DOT edu using the required subject line for the assignment: "DMA Assignment n", where n is the number of the assignment (1, 2, 3, etc.).

• File name: use file name DMA-n-UIN.pdf, where n is the number of the assignment (1, 2, 3, etc.) and UIN is your UIN.

• Identification: please include your name and UIN near the top of the first page of your solutions.

• Numerical Computations: you may use packages or write your own code to do the numerical computations. You must include the function calls or your code in your solutions.

• Algebraic Computations: You must include your derivation to receive full credit.

1 K-means: [25 marks]

      X1    X2
x1    0     2
x2    0     0
x3    1.5   0
x4    5     0
x5    5     2

Data for Question 1

For the two-dimensional points in the table above, assume k = 2 clusters, initially assigned as C1 = {x1, x2, x4} and C2 = {x3, x5}. Apply the K-means algorithm until convergence, i.e., until the clusters do not change, using the usual Euclidean distance $\|x_i - x_j\|_2 = \left( \sum_{a=1,2} |x_{i,a} - x_{j,a}|^2 \right)^{1/2}$. Show the clusters at each stage of the iteration.
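As an illustration of the kind of code that may accompany a solution, here is a minimal Python sketch of K-means started from a given cluster assignment rather than from given centroids. The function name `kmeans_from_assignment`, its arguments, and the 0-based labels (0 for C1, 1 for C2) are illustrative choices, not part of the assignment. The sketch assumes that no cluster becomes empty and that ties are broken in favour of the lower cluster index.

```python
import math

def kmeans_from_assignment(points, labels, max_iter=100):
    """Run K-means from an initial assignment (hypothetical helper).

    points: list of coordinate tuples.
    labels: initial 0-based cluster index for each point.
    Returns the list of label vectors, one per stage, ending at convergence.
    """
    k = max(labels) + 1
    history = [list(labels)]
    for _ in range(max_iter):
        # Update step: each centroid is the mean of its current members.
        # Assumption: every cluster has at least one member.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        # Assignment step: move each point to its nearest centroid
        # under the Euclidean distance (math.dist).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # clusters did not change: converged
        labels = new_labels
        history.append(list(labels))
    return history

# Data for Question 1, with the initial assignment C1 = {x1, x2, x4}, C2 = {x3, x5}.
points = [(0, 2), (0, 0), (1.5, 0), (5, 0), (5, 2)]
initial = [0, 0, 1, 0, 1]
history = kmeans_from_assignment(points, initial)
for stage, labs in enumerate(history):
    print(f"stage {stage}: {labs}")
```

Each entry of `history` records the cluster membership at one stage, which is the form of output the question asks to be shown; a written solution would still need to report the centroids and distances behind each reassignment.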


2 Gaussian Mixture Models: [37 marks]

      X1    X2
x1    0.5   4.5
x2    2.2   1.5
x3    3.9   3.5
x4    2.1   1.9
x5    0.5   3.2
x6    0.8   4.3
x7    2.7   1.1
x8    2.5   3.5
x9    2.8   3.9
x10   0.1   4.1

Data for Question 2

x1, ..., x10 are ten data points with two attributes; see the table above. This question uses three Gaussian clusters with initial means $\mu_1 = (0.5, 4.5)^T$, $\mu_2 = (2.2, 1.6)^T$ and $\mu_3 = (3, 3.5)^T$, initial covariance matrices $\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, and initial mixture probabilities $P(C_1) = P(C_2) = P(C_3) = 1/3$.

In the following parts (A), (C) and (D), quote the relevant general formulae, then apply them to the data.

(A) Compute the first EM iterates of the cluster means.

(B) Show the data on a scatter plot, together with the initial and iterated means. Comment on your answer.

(C) Compute the first EM iterates of the mixture probabilities.

(D) Compute the first iterates of the covariance matrices for the three clusters.
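For the numerical parts (A), (C) and (D), a single EM iteration might be organised as in the sketch below. It assumes the standard EM updates for a Gaussian mixture: posterior weights in the E-step, then weighted means, weighted covariances about the new means, and average weights as the new mixture probabilities in the M-step. Whether this matches the exact formulae intended for the course should be checked against the lecture notes. The helper names `gaussian_pdf_2d` and `em_step` are hypothetical, and the code is restricted to two attributes so that the matrix inverse can be written out explicitly.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian at x (hypothetical helper, 2x2 covariance only)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu) via the explicit 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def em_step(data, means, covs, priors):
    """One EM iteration; returns new means, covariances, priors and the weights."""
    k, n = len(means), len(data)
    # E-step: posterior probability of each cluster given each point.
    weights = []
    for x in data:
        joint = [priors[i] * gaussian_pdf_2d(x, means[i], covs[i]) for i in range(k)]
        total = sum(joint)
        weights.append([j / total for j in joint])
    # M-step: re-estimate the parameters from the weighted data.
    new_means, new_covs, new_priors = [], [], []
    for i in range(k):
        w = [weights[j][i] for j in range(n)]
        w_sum = sum(w)
        mu = tuple(sum(wj * x[a] for wj, x in zip(w, data)) / w_sum for a in range(2))
        # Assumption: the covariance is taken about the newly updated mean.
        cov = [[sum(wj * (x[a] - mu[a]) * (x[b] - mu[b]) for wj, x in zip(w, data)) / w_sum
                for b in range(2)] for a in range(2)]
        new_means.append(mu)
        new_covs.append(cov)
        new_priors.append(w_sum / n)
    return new_means, new_covs, new_priors, weights

# Data and initial parameters for Question 2.
data = [(0.5, 4.5), (2.2, 1.5), (3.9, 3.5), (2.1, 1.9), (0.5, 3.2),
        (0.8, 4.3), (2.7, 1.1), (2.5, 3.5), (2.8, 3.9), (0.1, 4.1)]
means = [(0.5, 4.5), (2.2, 1.6), (3.0, 3.5)]
covs = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]
priors = [1 / 3, 1 / 3, 1 / 3]

new_means, new_covs, new_priors, weights = em_step(data, means, covs, priors)
print(new_means, new_priors, new_covs, sep="\n")
```

Two quick sanity checks on any such computation are that each point's weights sum to one across the three clusters and that the updated mixture probabilities sum to one.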
