ECEN 758 Data Mining and Analysis Assignment 3


Procedure: Please Read
Please follow these guidelines to ensure your solutions reach me, and to help me attribute your marks correctly.
• Format: solutions must be typeset (using e.g. Microsoft Word or LaTeX) and rendered as a PDF.
• Transmittal: email your PDF solutions to me at duffieldng AT tamu DOT edu using the required subject line for
the assignment: "DMA Assignment n", where n is the number of the assignment (1, 2, 3, etc.).
• File name: use file name DMA-n-UIN.pdf where n is the number of the assignment (1,2,3, etc), and UIN is your
• Identification: please include your name and UIN near the top of the first page of your solutions.
• Numerical Computations: you may use packages or write code etc. to do the numerical computations. You
must include function calls or your code in your solutions.
• Algebraic Computations: You must include your derivation to receive full credit.
1 K-means: [25 marks]
      X1    X2
x1    0     2
x2    0     0
x3    1.5   0
x4    5     0
x5    5     2
Data for Question 1
For the two-dimensional points in the table above, assume k = 2 clusters, initially assigned as C1 = {x1, x2, x4}
and C2 = {x3, x5}. Apply the K-means algorithm until convergence, i.e., until the clusters do not change, using the usual
Euclidean distance $\|x_i - x_j\|_2 = \left(\sum_a |x_{i,a} - x_{j,a}|^2\right)^{1/2}$. Show the clusters at each stage of the iteration.
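The procedure described in Question 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a model solution: the function name `kmeans_from_assignment`, the 0/1 encoding of C1/C2, and the tie-breaking rule (the lowest cluster index wins) are choices made here and are not specified by the assignment.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_from_assignment(points, labels, k, max_iter=100):
    """K-means (Lloyd's algorithm) started from a given cluster assignment.

    Returns the label vector at each stage, with the initial one first.
    """
    labels = list(labels)
    history = [list(labels)]
    for _ in range(max_iter):
        # Update step: each centroid is the mean of its current members.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                raise ValueError(f"cluster {c} became empty")
            centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        # Assignment step: move each point to its nearest centroid.
        # min() returns the lowest index on ties (an assumption of this sketch).
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # clusters did not change: converged
            break
        labels = new_labels
        history.append(list(labels))
    return history


# Data for Question 1; label 0 stands for C1 and label 1 for C2.
points = [(0, 2), (0, 0), (1.5, 0), (5, 0), (5, 2)]
initial = [0, 0, 1, 0, 1]  # C1 = {x1, x2, x4}, C2 = {x3, x5}

history = kmeans_from_assignment(points, initial, k=2)
for stage, labs in enumerate(history):
    print(f"stage {stage}: {labs}")
```

Printing the label vector at every stage matches the request to show the clusters at each iteration; a written solution should still include the centroid and distance calculations behind each stage.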
2 Gaussian Mixture Models: [37 marks]
      X1    X2
x1    0.5   4.5
x2    2.2   1.5
x3    3.9   3.5
x4    2.1   1.9
x5    0.5   3.2
x6    0.8   4.3
x7    2.7   1.1
x8    2.5   3.5
x9    2.8   3.9
x10   0.1   4.1
Data for Question 2
x1, . . . , x10 are ten data points with two attributes: see the table above. This question will use three Gaussian clusters
with initial means $\mu_1 = (0.5, 4.5)^T$, $\mu_2 = (2.2, 1.6)^T$ and $\mu_3 = (3, 3.5)^T$, initial covariance matrices
$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and initial mixture probabilities P(C1) = P(C2) = P(C3) = 1/3.
In the following parts (A), (C) and (D), quote the relevant general formulae, then apply them to the data.
(A) Compute the first EM iterates of the cluster means.
(B) Show the data on a scatter plot, together with the initial and iterated means. Comment on your answer.
(C) Compute the first EM iterates of the mixture probabilities.
(D) Compute the first iterates of the covariance matrices for the three clusters.
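One way to carry out the numerical work for parts (A), (C) and (D) is a single EM iteration written out by hand. The sketch below uses only the Python standard library; the helper names `gaussian_pdf_2d` and `em_step` are hypothetical, and the covariance update is assumed to use the newly updated means, which is the usual convention but should be checked against the formulae quoted in your solution.

```python
from math import exp, pi, sqrt


def gaussian_pdf_2d(x, mean, cov):
    """Density of a bivariate normal with a general 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return exp(-0.5 * quad) / (2 * pi * sqrt(det))


def em_step(data, means, covs, priors):
    """One EM iteration for a 2-D Gaussian mixture."""
    k, n = len(means), len(data)
    # E-step: resp[j][i] is the posterior probability P(C_i | x_j).
    resp = []
    for x in data:
        joint = [priors[i] * gaussian_pdf_2d(x, means[i], covs[i]) for i in range(k)]
        total = sum(joint)
        resp.append([v / total for v in joint])
    # M-step: weighted means, weighted covariances, and mixture probabilities.
    new_means, new_covs, new_priors = [], [], []
    for i in range(k):
        w = [r[i] for r in resp]
        wsum = sum(w)
        mx = sum(wj * x[0] for wj, x in zip(w, data)) / wsum
        my = sum(wj * x[1] for wj, x in zip(w, data)) / wsum
        # Assumption: covariances are centred on the updated mean (mx, my).
        sxx = sum(wj * (x[0] - mx) ** 2 for wj, x in zip(w, data)) / wsum
        syy = sum(wj * (x[1] - my) ** 2 for wj, x in zip(w, data)) / wsum
        sxy = sum(wj * (x[0] - mx) * (x[1] - my) for wj, x in zip(w, data)) / wsum
        new_means.append((mx, my))
        new_covs.append(((sxx, sxy), (sxy, syy)))
        new_priors.append(wsum / n)
    return resp, new_means, new_covs, new_priors


# Data and initial parameters for Question 2.
data = [(0.5, 4.5), (2.2, 1.5), (3.9, 3.5), (2.1, 1.9), (0.5, 3.2),
        (0.8, 4.3), (2.7, 1.1), (2.5, 3.5), (2.8, 3.9), (0.1, 4.1)]
means = [(0.5, 4.5), (2.2, 1.6), (3.0, 3.5)]
covs = [((1.0, 0.0), (0.0, 1.0))] * 3
priors = [1 / 3] * 3

resp, new_means, new_covs, new_priors = em_step(data, means, covs, priors)
for i in range(3):
    print(f"cluster {i + 1}: mean={new_means[i]}, prior={new_priors[i]}")
    print(f"  covariance={new_covs[i]}")
```

The returned `resp` table is worth reporting alongside the updated parameters, since every M-step quantity is a weighted sum over its columns. The scatter plot in part (B) is not covered by this sketch.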

