CS 446 / ECE 449 — Homework 2

Instructions.

• Everyone must submit individually on Gradescope under hw2 and hw2code.

• The “written” submission at hw2 must be typed and submitted in any format Gradescope accepts (to be safe, submit a PDF). You may use LaTeX, markdown, Google Docs, MS Word, whatever you like; but it must be typed!

• When submitting at hw2, Gradescope will ask you to mark out boxes around each of your answers;

please do this precisely!

• Please make sure your NetID is clear and large on the first page of the homework.

• Your solution must be written in your own words.

• The lists of library routines given with the coding problems are only suggestive.

• When submitting to hw2code, upload hw2.py and hw2_utils.py. The CAFE Gamma directory will be provided on the autograder.

Version History.

1. Initial version.


1. Support Vector Machines.

Recall that the dual problem of an SVM is
\[
\max_{\alpha \in C} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \kappa\big(x^{(i)}, x^{(j)}\big),
\]
where the domain is $C = [0, \infty)^N = \{\alpha : \alpha_i \ge 0\}$ for a hard-margin SVM and $C = [0, C]^N = \{\alpha : 0 \le \alpha_i \le C\}$ for a soft-margin SVM.

Equivalently, we can frame the dual problem of the SVM as the minimization problem
\[
\min_{\alpha \in C} \; f(\alpha) := \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \kappa\big(x^{(i)}, x^{(j)}\big) - \sum_{i=1}^{N} \alpha_i.
\]

(a) Derive the dual problem and prove that the domain is $C = [0, C]^N = \{\alpha : 0 \le \alpha_i \le C\}$ for a soft-margin SVM. We assume that the optimization objective for this soft-margin SVM is
\[
\min_{w, \xi} \; \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad 1 - \xi_i \le y^{(i)} w^\top \phi\big(x^{(i)}\big), \;\; \xi_i \ge 0, \;\; \forall i \in \{1, 2, \ldots, N\}.
\]

Hint: A proof sketch is briefly given in the Lecture 6 slides. The purpose of this problem is to ensure that you understand the Lagrangian and are familiar with the basic notation.
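As a reminder of the standard starting point (using the notation above, with multipliers $\alpha_i \ge 0$ for the margin constraints and $\beta_i \ge 0$ for $\xi_i \ge 0$), the Lagrangian of this primal problem is
\[
\mathcal{L}(w, \xi, \alpha, \beta) = \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{N} \xi_i + \sum_{i=1}^{N} \alpha_i \Big(1 - \xi_i - y^{(i)} w^\top \phi\big(x^{(i)}\big)\Big) - \sum_{i=1}^{N} \beta_i \xi_i,
\]
and the dual is obtained by minimizing over $w, \xi$ and maximizing over $\alpha, \beta \ge 0$.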

(b) Prove some theorems related to the margins of the SVM, which connect the different variables in the SVM. In the questions below, we assume that the data are linearly separable and that all the parameters are subject to the same conditions as in the dual problem:
\[
\max_{\alpha \in C} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} \kappa\big(x^{(i)}, x^{(j)}\big).
\]

i. Denote the margin of the hard-margin SVM by $\rho$ and the optimal primal solution by $w^*$. Prove that
\[
\frac{1}{\rho^2} = \|w^*\|_2^2.
\]
Hint: Think about the definition of the margin and its expression. Also think about complementary slackness.

ii. Prove the following equation under the same conditions as in the previous question:
\[
\frac{1}{\rho^2} = \sum_{i=1}^{N} \alpha_i.
\]
Hint: Use the conclusion of the previous question and $\|w^*\|_2^2 = (w^*)^\top w^*$. Also recall what support vectors are, and recall complementary slackness.
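A standard intermediate identity that may help (it follows from the KKT stationarity condition $\nabla_w \mathcal{L} = 0$; stated here as a reminder, not as the full proof) is
\[
w^* = \sum_{i=1}^{N} \alpha_i y^{(i)} \phi\big(x^{(i)}\big).
\]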

(c) Prove some conclusions about kernel methods.

i. For arbitrary 2D vectors $x = [x_0, x_1]^\top$ and $z = [z_0, z_1]^\top$, we define the kernel $\kappa(x, z) = (1 + x^\top z)^2$. Derive the expressions for $\phi(x)$ and $\phi(z)$. To keep your answer consistent with the standard solution, please note that $\phi(x)$ and $\phi(z)$ are both vectors of monomials.

ii. Show that the dual objective for the RBF-kernel SVM is given by
\[
h(\alpha) = -\frac{1}{2} \alpha^\top A \alpha + \mathbf{1}^\top \alpha,
\]
where $\mathbf{1}$ is the vector of ones and $A \in \mathbb{R}^{N \times N}$ is the matrix whose $(i, j)$-th entry is given by
\[
A_{ij} = y^{(i)} y^{(j)} \exp\left(-\frac{\big\|x^{(i)} - x^{(j)}\big\|_2^2}{2\sigma^2}\right).
\]
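As a quick numerical sketch of this objective, the matrix $A$ can be assembled with a few tensor operations. Here rbf_gram is a hypothetical helper (not part of hw2.py), and the XOR-style points are made up for illustration:

```python
import torch

def rbf_gram(X, y, sigma):
    """Build A with A[i, j] = y_i * y_j * exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = torch.cdist(X, X) ** 2             # pairwise squared distances
    K = torch.exp(-sq_dists / (2 * sigma ** 2))   # RBF kernel matrix
    return torch.outer(y, y) * K                  # attach the label signs

X = torch.tensor([[1., 1.], [-1., 1.], [-1., -1.], [1., -1.]])  # toy XOR-style points
y = torch.tensor([1., -1., 1., -1.])
A = rbf_gram(X, y, sigma=1.0)
h = lambda a: -0.5 * a @ A @ a + a.sum()          # the dual objective h(alpha)
```

Note that $A$ is symmetric with unit diagonal (since $\kappa(x, x) = 1$ and $(y^{(i)})^2 = 1$), which is easy to verify numerically.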

Solution.


2. Implementing Support Vector Machine

(a) Recall the dual problem of the SVM in the previous problem and the domains $C = [0, \infty)^N = \{\alpha : \alpha_i \ge 0\}$ for a hard-margin SVM and $C = [0, C]^N = \{\alpha : 0 \le \alpha_i \le C\}$ for a soft-margin SVM. We can solve this dual problem by projected gradient descent, which starts from some $\alpha_0 \in C$ (e.g., $0$) and updates as follows:
\[
\alpha_{t+1} = \Pi_C\big[\alpha_t - \eta \nabla f(\alpha_t)\big].
\]

Here $\Pi_C[\alpha]$ is the projection of $\alpha$ onto $C$, defined as the closest point to $\alpha$ in $C$:
\[
\Pi_C[\alpha] := \arg\min_{\alpha' \in C} \|\alpha' - \alpha\|_2.
\]

If $C$ is convex, the projection is uniquely defined. With this information, prove that
\[
\big[\Pi_{[0, \infty)^n}[\alpha]\big]_i = \max\{0, \alpha_i\}, \qquad \big[\Pi_{[0, C]^n}[\alpha]\big]_i = \min\{\max\{0, \alpha_i\}, C\}.
\]

Note: Include this question in the written submission.

Hint: Show that the $i$-th component of any $\alpha' \in C$ is further from the $i$-th component of $\alpha$ than the $i$-th component of the projection is. Specifically, show that
\[
|\alpha'_i - \alpha_i| \ge |\max\{0, \alpha_i\} - \alpha_i| \quad \text{for } \alpha' \in [0, \infty)^n,
\]
and that
\[
|\alpha'_i - \alpha_i| \ge |\min\{\max\{0, \alpha_i\}, C\} - \alpha_i| \quad \text{for } \alpha' \in [0, C]^n.
\]
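These coordinatewise projections are exactly what torch.clamp computes; the snippet below is a small sanity check (the numbers are made up for illustration):

```python
import torch

alpha = torch.tensor([-1.5, 0.3, 2.7, 5.0])
C = 2.0

# Projection onto [0, inf)^n: zero out the negative coordinates.
proj_nonneg = torch.clamp(alpha, min=0.0)

# Projection onto [0, C]^n: clip every coordinate into the box [0, C].
proj_box = torch.clamp(alpha, min=0.0, max=C)
```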

(b) Implement svm_solver(), using projected gradient descent as formulated above. Initialize your α to zeros. See the docstrings in hw2.py for details.

Remark: Consider using the .backward() function in PyTorch. However, you may then have to use in-place operations like clamp_(), otherwise the gradient information is destroyed.

Library routines: torch.outer, torch.clamp, torch.autograd.backward, torch.tensor(..., requires_grad=True), with torch.no_grad():, torch.Tensor.grad.zero_, torch.Tensor.detach.
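The update loop can be sketched as follows. This is a minimal illustration, not the official svm_solver() (whose signature is specified in hw2.py); it assumes kernel(X, X) returns the N×N Gram matrix, and pgd_svm is a hypothetical name:

```python
import torch

def pgd_svm(X, y, kernel, lr=0.1, steps=1000, c=None):
    """Projected gradient descent on the SVM dual f(alpha).

    Hypothetical helper for illustration; c=None means the hard-margin
    domain [0, inf)^N, otherwise the box [0, c]^N.
    """
    N = X.shape[0]
    K = kernel(X, X) * torch.outer(y, y)   # entries y_i y_j kappa(x_i, x_j)
    alpha = torch.zeros(N, requires_grad=True)
    for _ in range(steps):
        loss = 0.5 * alpha @ K @ alpha - alpha.sum()  # dual objective f(alpha)
        loss.backward()
        with torch.no_grad():
            alpha -= lr * alpha.grad
            alpha.clamp_(min=0.0, max=c)   # in-place projection onto C
            alpha.grad.zero_()
    return alpha.detach()

# Tiny sanity run with a linear kernel on two separable points.
linear = lambda A, B: A @ B.t()
alpha_hat = pgd_svm(torch.tensor([[2.0], [-2.0]]),
                    torch.tensor([1.0, -1.0]),
                    linear, lr=0.05, steps=500, c=10.0)
```

The in-place clamp_ and grad.zero_() calls inside torch.no_grad() match the remark above: modifying α out-of-place would detach it from its accumulated gradient.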

(c) Implement svm_predictor(), using an optimal dual solution, the training set, and a test input. See the docstrings in hw2.py for details.
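For intuition, a kernel SVM scores a test point by a weighted sum of kernel evaluations against the training set. The helper below is a hypothetical sketch (the real svm_predictor() signature and any bias handling are specified in hw2.py):

```python
import torch

def dual_predict(alpha, X_train, y_train, kernel, X_test):
    """Score sum_i alpha_i y_i kappa(x_i, x) for each test point x.

    Hypothetical illustration; kernel(A, B) is assumed to return the
    Gram block of shape (len(A), len(B)).
    """
    K = kernel(X_train, X_test)        # (N_train, N_test)
    return (alpha * y_train) @ K       # weighted sum over training points

# Example with a linear kernel: both training points push the score positive.
scores = dual_predict(torch.tensor([1.0, 1.0]),
                      torch.tensor([[1.0], [-1.0]]),
                      torch.tensor([1.0, -1.0]),
                      lambda A, B: A @ B.t(),
                      torch.tensor([[2.0]]))
```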

(d) On the area [−5, 5] × [−5, 5], plot the contour lines of the following kernel SVMs, trained on the XOR data. The different kernels and the XOR data are provided in hw2_utils.py. A learning rate of 0.1 and 10000 steps should be enough. To draw the contour lines, you can use hw2_utils.svm_contour().

• The polynomial kernel with degree 2.

• The RBF kernel with σ = 1.

• The RBF kernel with σ = 2.

• The RBF kernel with σ = 4.

Include these four plots in your written submission.

Solution.
