# LIGN 167: Problem Set 3

Collaboration policy: You may collaborate with up to two other students on this problem
set. You must write up your own answers to the problems; do not just copy and paste from
your collaborators. You must also submit your work individually. If you do not submit a
copy of the problem set under your own name, you will not get credit. When you submit
your work, you must indicate who you worked with, and what each of your individual
contributions were.
Getting started: We will be uploading a file called pset3.py to Piazza. This file will contain some starter code for the problem set, including some functions that you should call in your answers.
In this problem set we will be implementing backpropagation for a multi-layer perceptron. This network is illustrated in Figure 1, and has the following mathematical definition. The vector $\vec{r}^{0}$ is defined in terms of the input $x$, which is a scalar, and the weight matrix $W^0$:

$$
\vec{r}^{0} =
\begin{bmatrix} r^0_0 \\ r^0_1 \\ r^0_2 \end{bmatrix}
=
\begin{bmatrix} w^0_0 \cdot x \\ w^0_1 \cdot x \\ w^0_2 \cdot x \end{bmatrix}
= W^0 \cdot x
\tag{1}
$$
Here we are using the following definition of $W^0$:

$$
W^0 = \begin{bmatrix} w^0_0 \\ w^0_1 \\ w^0_2 \end{bmatrix}
\tag{2}
$$
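The first hidden layer $\vec{h}^{0}$ is defined by applying a non-linearity (ReLU) to $\vec{r}^{0}$:

$$
\vec{h}^{0} =
\begin{bmatrix} h^0_0 \\ h^0_1 \\ h^0_2 \end{bmatrix}
=
\begin{bmatrix} \mathrm{ReLU}(r^0_0) \\ \mathrm{ReLU}(r^0_1) \\ \mathrm{ReLU}(r^0_2) \end{bmatrix}
= \mathrm{ReLU}(\vec{r}^{0})
\tag{3}
$$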
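The next layer, $\vec{r}^{1}$, is defined as follows:

$$
\vec{r}^{1} =
\begin{bmatrix} r^1_0 \\ r^1_1 \\ r^1_2 \end{bmatrix}
=
\begin{bmatrix}
w^1_{0,0} \cdot h^0_0 + w^1_{0,1} \cdot h^0_1 + w^1_{0,2} \cdot h^0_2 \\
w^1_{1,0} \cdot h^0_0 + w^1_{1,1} \cdot h^0_1 + w^1_{1,2} \cdot h^0_2 \\
w^1_{2,0} \cdot h^0_0 + w^1_{2,1} \cdot h^0_1 + w^1_{2,2} \cdot h^0_2
\end{bmatrix}
= W^1 \cdot \vec{h}^{0}
\tag{4}
$$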
Figure 1: Our multi-layer perceptron. (Diagram not reproduced. It shows the layers $\vec{r}^{0}$, $\vec{h}^{0}$, $\vec{r}^{1}$, $\vec{h}^{1}$ and the output $y_{\text{pred}}$, connected by the weights $W^0$, $W^1$, and $W^2$.)
The matrix $W^1$ in Equation 4 is defined by:

$$
W^1 =
\begin{bmatrix}
w^1_{0,0} & w^1_{0,1} & w^1_{0,2} \\
w^1_{1,0} & w^1_{1,1} & w^1_{1,2} \\
w^1_{2,0} & w^1_{2,1} & w^1_{2,2}
\end{bmatrix}
\tag{5}
$$
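The second hidden layer $\vec{h}^{1}$ is defined by applying a ReLU to $\vec{r}^{1}$:

$$
\vec{h}^{1} =
\begin{bmatrix} h^1_0 \\ h^1_1 \\ h^1_2 \end{bmatrix}
=
\begin{bmatrix} \mathrm{ReLU}(r^1_0) \\ \mathrm{ReLU}(r^1_1) \\ \mathrm{ReLU}(r^1_2) \end{bmatrix}
= \mathrm{ReLU}(\vec{r}^{1})
\tag{6}
$$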
Finally, the output $y_{\text{pred}}$, which is a scalar value, is defined by:

$$
y_{\text{pred}} = w^2_0 \cdot h^1_0 + w^2_1 \cdot h^1_1 + w^2_2 \cdot h^1_2 = W^2 \cdot \vec{h}^{1}
\tag{7}
$$
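The following minimal NumPy sketch makes Equations 1-7 concrete by running the forward pass. It is a hypothetical stand-in for the starter code's mlp and is not a copy of it. The function names relu and forward are assumptions, and so is every dictionary key except 'y_predicted'. The sketch stores $W^0$ and $W^2$ as length-3 arrays and $W^1$ as a 3 × 3 array.

```python
import numpy as np


def relu(v):
    """Element-wise ReLU, as in Equations 3 and 6."""
    return np.maximum(v, 0.0)


def forward(x, W0, W1, W2):
    """Forward pass of the network in Equations 1-7.

    Hypothetical stand-in for the starter code's mlp: every dictionary key
    except 'y_predicted' is an assumed name, not necessarily pset3.py's.
    W0 and W2 are length-3 arrays; W1 is a 3 x 3 array.
    """
    r0 = W0 * x                 # Equation 1: r^0 = W^0 . x (scalar x)
    h0 = relu(r0)               # Equation 3
    r1 = W1 @ h0                # Equation 4: r^1 = W^1 . h^0
    h1 = relu(r1)               # Equation 6
    y_pred = float(W2 @ h1)     # Equation 7: scalar output
    return {"x": x, "r0": r0, "h0": h0, "r1": r1, "h1": h1,
            "y_predicted": y_pred}


# Illustrative weights only.
W0 = np.array([1.0, -1.0, 0.5])
W1 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 0.0],
               [-1.0, 1.0, 1.0]])
W2 = np.array([1.0, 2.0, 3.0])
variable_dict = forward(2.0, W0, W1, W2)
print(variable_dict["y_predicted"])  # 4.0
```

With these illustrative weights, the ReLUs zero out $r^0_1 = -2$, $r^1_1 = 0$, and $r^1_2 = -1$. As a result, only $h^1_0 = 4$ contributes to the output.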
We have a dataset that consists of two parts: $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$. Each $x_i$ and $y_i$ is a scalar. The loss associated with a datapoint $(x_i, y_i)$ is defined by:

$$
\ell_i = (y_{\text{pred},i} - y_i)^2
\tag{8}
$$

Here we are writing $y_{\text{pred},i}$ for the neural network's prediction given input $x_i$. The total loss $L$ can be written:

$$
L(\theta \mid X, Y) = \sum_{i=1}^{n} \ell_i
\tag{9}
$$
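The following small sketch shows how Equations 8 and 9 fit together. The total loss is simply the per-datapoint squared error summed over the dataset. The helper names datapoint_loss and total_loss are hypothetical, and the predictions are passed in directly rather than computed by the network.

```python
def datapoint_loss(y_pred, y_obs):
    """Equation 8: squared error for one datapoint."""
    return (y_pred - y_obs) ** 2


def total_loss(y_preds, ys):
    """Equation 9: sum of the per-datapoint losses over the dataset."""
    return float(sum(datapoint_loss(p, y) for p, y in zip(y_preds, ys)))


print(datapoint_loss(4.0, 1.0))                       # 9.0
print(total_loss([4.0, 0.5, 2.0], [1.0, 0.5, 3.0]))   # 9 + 0 + 1 = 10.0
```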
The parameter term $\theta$ captures all of the model parameters that are being learned, in this case $W^0$, $W^1$, and $W^2$.
In the starter code that we have provided, we have given you an implementation of the forward direction of the neural network. That is, the provided function mlp will compute the output $y_{\text{pred}}$ of the network given a particular input $x$. In the problems, you will be implementing the *backwards* direction for the network, calculating the partial derivatives of the loss function with respect to the weight parameters.
The function mlp in the starter code returns a Python dictionary called variable_dict. The dictionary contains the values of all of the nodes in the network, after giving the network a particular input value $x_i$. We will be using this variable_dict throughout the rest of the problem set. You should spend some time reading through the code for mlp to understand how it is constructed.
Problem 1. In this problem we will begin implementing the backpropagation algorithm, starting from the top of the network. You should write a function d_loss_d_ypredicted, which calculates the partial derivative $\frac{\partial \ell_i}{\partial y_{\text{pred}}}$. The loss $\ell_i$ is defined by Equation 8.
The function should take two arguments: variable_dict and y_observed. variable_dict is a dictionary containing the values of all of the nodes of the network, for a particular input value $x_i$ (as discussed above). y_observed is a real number, which equals the value $y_i$ observed for the input $x_i$.
Hint: retrieve the network's predicted value $y_{\text{pred}}$ by looking up variable_dict['y_predicted'].
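Differentiating Equation 8 gives $\frac{\partial \ell_i}{\partial y_{\text{pred}}} = 2(y_{\text{pred}} - y_i)$. This is a minimal sketch that assumes variable_dict stores the prediction under the key 'y_predicted', as the hint suggests. The hand-built dictionary stands in for mlp's output.

```python
def d_loss_d_ypredicted(variable_dict, y_observed):
    """Derivative of Equation 8 with respect to the prediction:
    d/dy_pred (y_pred - y)^2 = 2 * (y_pred - y)."""
    y_pred = variable_dict["y_predicted"]
    return 2.0 * (y_pred - y_observed)


# Hand-built stand-in for the dictionary returned by mlp.
vd = {"y_predicted": 4.0}
print(d_loss_d_ypredicted(vd, 1.0))  # 6.0
```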
Problem 2. Write a function d_loss_d_W2 which takes two arguments, variable_dict
and y_observed. variable_dict is a dictionary of network node values, and y_observed is
a real number, as in the previous problem.
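The function should compute the partial derivative $\frac{\partial \ell_i}{\partial W^2}$, which is defined as follows:

$$
\frac{\partial \ell_i}{\partial W^2} =
\begin{bmatrix}
\frac{\partial \ell_i}{\partial w^2_0} & \frac{\partial \ell_i}{\partial w^2_1} & \frac{\partial \ell_i}{\partial w^2_2}
\end{bmatrix}
\tag{10}
$$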
These three partial derivatives should be returned as a 1 × 3 NumPy array, in the same
order as shown in the equation above.
Hint: call d_loss_d_ypredicted from the previous problem, and retrieve the network's value for the layer $\vec{h}^{1}$ from variable_dict. Then take partial derivatives of Equation 7.
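By Equation 7, $\frac{\partial y_{\text{pred}}}{\partial w^2_j} = h^1_j$, so the chain rule gives $\frac{\partial \ell_i}{\partial w^2_j} = \frac{\partial \ell_i}{\partial y_{\text{pred}}} \cdot h^1_j$. The sketch below has the following properties and assumptions:

- It stores $\vec{h}^{1}$ under a hypothetical key 'h1'. Check mlp for the real key name.
- It repeats the Problem 1 derivative so the block runs on its own.
- It returns a 1-D array of length 3. Call .reshape(1, 3) if a literal 1 × 3 shape is required.

```python
import numpy as np


def d_loss_d_ypredicted(variable_dict, y_observed):
    # Problem 1: derivative of (y_pred - y)^2 with respect to y_pred
    return 2.0 * (variable_dict["y_predicted"] - y_observed)


def d_loss_d_W2(variable_dict, y_observed):
    """Chain rule through Equation 7.

    y_pred = sum_j w^2_j * h^1_j, so d y_pred / d w^2_j = h^1_j and
    d loss / d w^2_j = (d loss / d y_pred) * h^1_j.
    """
    h1 = np.asarray(variable_dict["h1"], dtype=float)  # 'h1' is a hypothetical key
    return d_loss_d_ypredicted(variable_dict, y_observed) * h1


# Hand-built stand-in for mlp's output (values are illustrative only).
vd = {"y_predicted": 4.0, "h1": np.array([4.0, 0.0, 0.0])}
print(d_loss_d_W2(vd, 1.0))  # gradient values 24, 0, 0
```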
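Problem 3. Write a function d_loss_d_h1, which takes three arguments: variable_dict, W2, and y_observed. The arguments variable_dict and y_observed are the same as in previous problems. The argument W2 is a 1 × 3 NumPy array, which represents the weight matrix $W^2$ from Equation 7.
The function should compute the partial derivative $\frac{\partial \ell_i}{\partial \vec{h}^{1}}$, which is defined as follows:

$$
\frac{\partial \ell_i}{\partial \vec{h}^{1}} =
\begin{bmatrix}
\frac{\partial \ell_i}{\partial h^1_0} & \frac{\partial \ell_i}{\partial h^1_1} & \frac{\partial \ell_i}{\partial h^1_2}
\end{bmatrix}
\tag{11}
$$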
These three partial derivatives should be returned as a 1 × 3 NumPy array, in the same
order as the equation above. (For the remainder of the problems, when a NumPy array is
being returned, it should be in the same order as the corresponding equation.)
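Differentiating Equation 7 with respect to $h^1_j$ instead gives $\frac{\partial y_{\text{pred}}}{\partial h^1_j} = w^2_j$. In this sketch, W2 may be passed as a 1 × 3 array, and .ravel() flattens it so the result is a plain length-3 vector. The dictionary is a hand-built, illustrative stand-in for mlp's output.

```python
import numpy as np


def d_loss_d_h1(variable_dict, W2, y_observed):
    """Chain rule through Equation 7: d y_pred / d h^1_j = w^2_j."""
    dl_dy = 2.0 * (variable_dict["y_predicted"] - y_observed)  # Problem 1
    return dl_dy * np.asarray(W2, dtype=float).ravel()


vd = {"y_predicted": 4.0}
W2 = np.array([[1.0, 2.0, 3.0]])  # 1 x 3, as in the problem statement
print(d_loss_d_h1(vd, W2, 1.0))  # gradient values 6, 12, 18
```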
Problem 4. Write a function relu_derivative, which takes a single argument x. The value x is a real number.
It should return the derivative $\frac{d\,\mathrm{ReLU}}{dx}(x)$, where the ReLU function is defined by:

$$
\mathrm{ReLU}(x) =
\begin{cases}
x, & \text{if } x > 0 \\
0, & \text{otherwise}
\end{cases}
\tag{12}
$$
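Here is a sketch of relu_derivative. ReLU is not differentiable at x = 0. Returning 0 there is an assumed convention, since the problem statement does not specify a value; any value in [0, 1] is a valid subgradient.

```python
def relu_derivative(x):
    """Derivative of ReLU (Equation 12): 1 for x > 0, else 0.

    At the kink x = 0 this returns 0, an assumed convention.
    """
    return 1.0 if x > 0 else 0.0


print([relu_derivative(v) for v in (-2.0, 0.0, 3.5)])  # [0.0, 0.0, 1.0]
```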
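Problem 5. Write a function d_loss_d_r1, which takes three arguments: variable_dict, W2, and y_observed. These arguments should be the same as in Problem 3. The function should compute the partial derivative $\frac{\partial \ell_i}{\partial \vec{r}^{1}}$, which is defined as follows:

$$
\frac{\partial \ell_i}{\partial \vec{r}^{1}} =
\begin{bmatrix}
\frac{\partial \ell_i}{\partial r^1_0} & \frac{\partial \ell_i}{\partial r^1_1} & \frac{\partial \ell_i}{\partial r^1_2}
\end{bmatrix}
\tag{13}
$$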
These values should be returned as a 1 × 3 NumPy array.
Hint: Take partial derivatives in Equation 6, and use the function relu_derivative that
you defined in Problem 4.
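Equation 6 applies ReLU element-wise, so each $r^1_j$ affects only $h^1_j$. Therefore $\frac{\partial \ell_i}{\partial r^1_j} = \frac{\partial \ell_i}{\partial h^1_j} \cdot \mathrm{ReLU}'(r^1_j)$. The sketch computes the Problem 3 result inline and assumes a hypothetical key 'r1' for $\vec{r}^{1}$. The input values are illustrative.

```python
import numpy as np


def relu_derivative(x):
    # Problem 4: 1 for x > 0, else 0 (0 is used at the kink x = 0)
    return 1.0 if x > 0 else 0.0


def d_loss_d_r1(variable_dict, W2, y_observed):
    """Chain rule through Equation 6: d h^1_j / d r^1_j = ReLU'(r^1_j)."""
    dl_dy = 2.0 * (variable_dict["y_predicted"] - y_observed)  # Problem 1
    dl_dh1 = dl_dy * np.asarray(W2, dtype=float).ravel()       # Problem 3
    r1 = np.asarray(variable_dict["r1"], dtype=float)          # 'r1' is a hypothetical key
    relu_grad = np.array([relu_derivative(v) for v in r1])
    return dl_dh1 * relu_grad  # element-wise product, one entry per unit


vd = {"y_predicted": 4.0, "r1": np.array([4.0, 0.0, -1.0])}
W2 = np.array([1.0, 2.0, 3.0])
print(d_loss_d_r1(vd, W2, 1.0))  # gradient values 6, 0, 0
```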
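Problem 6. Write a function d_loss_d_W1, which takes three arguments: variable_dict, W2, and y_observed. These arguments should be the same as in Problem 3. The function should compute a matrix of partial derivatives $\frac{\partial \ell_i}{\partial W^1}$:

$$
\frac{\partial \ell_i}{\partial W^1} =
\begin{bmatrix}
\frac{\partial \ell_i}{\partial w^1_{0,0}} & \frac{\partial \ell_i}{\partial w^1_{0,1}} & \frac{\partial \ell_i}{\partial w^1_{0,2}} \\
\frac{\partial \ell_i}{\partial w^1_{1,0}} & \frac{\partial \ell_i}{\partial w^1_{1,1}} & \frac{\partial \ell_i}{\partial w^1_{1,2}} \\
\frac{\partial \ell_i}{\partial w^1_{2,0}} & \frac{\partial \ell_i}{\partial w^1_{2,1}} & \frac{\partial \ell_i}{\partial w^1_{2,2}}
\end{bmatrix}
\tag{14}
$$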
These partial derivatives should be returned as a NumPy array of dimension 3 × 3.
To do this you should take partial derivatives in Equation 4.
Hint: This is not necessary, but it may be convenient to use the NumPy function
np.outer, which computes the outer product of two (one-dimensional) arrays.
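Because $r^1_j = \sum_k w^1_{j,k} h^0_k$, we have $\frac{\partial r^1_j}{\partial w^1_{j,k}} = h^0_k$, and so $\frac{\partial \ell_i}{\partial w^1_{j,k}} = \frac{\partial \ell_i}{\partial r^1_j} \cdot h^0_k$. The whole matrix is therefore the outer product of $\frac{\partial \ell_i}{\partial \vec{r}^{1}}$ and $\vec{h}^{0}$, which is where np.outer comes in. In this sketch, the keys 'r1' and 'h0' are hypothetical. The boolean mask (r1 > 0) is a vectorized stand-in for calling relu_derivative on each entry.

```python
import numpy as np


def d_loss_d_W1(variable_dict, W2, y_observed):
    """Chain rule through Equation 4.

    r^1_j = sum_k w^1_{j,k} * h^0_k, so d r^1_j / d w^1_{j,k} = h^0_k and
    d loss / d w^1_{j,k} = (d loss / d r^1_j) * h^0_k: an outer product.
    """
    dl_dy = 2.0 * (variable_dict["y_predicted"] - y_observed)  # Problem 1
    dl_dh1 = dl_dy * np.asarray(W2, dtype=float).ravel()       # Problem 3
    r1 = np.asarray(variable_dict["r1"], dtype=float)          # hypothetical key
    dl_dr1 = dl_dh1 * (r1 > 0)      # Problem 5; (r1 > 0) acts as ReLU'(r1)
    h0 = np.asarray(variable_dict["h0"], dtype=float)          # hypothetical key
    return np.outer(dl_dr1, h0)     # entry [j, k] = dl/dr^1_j * h^0_k


vd = {"y_predicted": 4.0,
      "r1": np.array([4.0, 0.0, -1.0]),
      "h0": np.array([2.0, 0.0, 1.0])}
W2 = np.array([1.0, 2.0, 3.0])
print(d_loss_d_W1(vd, W2, 1.0))
```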
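Problem 7. Write a function d_loss_d_h0, which takes four arguments: variable_dict, W1, W2, and y_observed. The arguments variable_dict, W2, and y_observed should be the same as in previous problems. The argument W1 is a 3 × 3 matrix which represents the weight matrix $W^1$.
The function should compute the partial derivative $\frac{\partial \ell_i}{\partial \vec{h}^{0}}$, which is defined as follows:

$$
\frac{\partial \ell_i}{\partial \vec{h}^{0}} =
\begin{bmatrix}
\frac{\partial \ell_i}{\partial h^0_0} & \frac{\partial \ell_i}{\partial h^0_1} & \frac{\partial \ell_i}{\partial h^0_2}
\end{bmatrix}
\tag{15}
$$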
These partial derivatives should be returned as a 1 × 3 NumPy array.
Do this by taking partial derivatives in Equation 4.
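By Equation 4, each $h^0_k$ feeds all three entries of $\vec{r}^{1}$, so its partial derivative sums over them: $\frac{\partial \ell_i}{\partial h^0_k} = \sum_j \frac{\partial \ell_i}{\partial r^1_j} \cdot w^1_{j,k}$. That sum is a row vector times $W^1$, which is what this sketch computes. The key 'r1' is hypothetical, and (r1 > 0) stands in for relu_derivative. The values are illustrative.

```python
import numpy as np


def d_loss_d_h0(variable_dict, W1, W2, y_observed):
    """Chain rule through Equation 4, this time with respect to h^0.

    h^0_k feeds every r^1_j via w^1_{j,k}, so
    d loss / d h^0_k = sum_j (d loss / d r^1_j) * w^1_{j,k}.
    """
    dl_dy = 2.0 * (variable_dict["y_predicted"] - y_observed)  # Problem 1
    dl_dh1 = dl_dy * np.asarray(W2, dtype=float).ravel()       # Problem 3
    r1 = np.asarray(variable_dict["r1"], dtype=float)          # hypothetical key
    dl_dr1 = dl_dh1 * (r1 > 0)                                 # Problem 5
    return dl_dr1 @ np.asarray(W1, dtype=float)  # row vector times W^1 sums over j


vd = {"y_predicted": 4.0, "r1": np.array([4.0, 0.0, -1.0])}
W1 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 0.0],
               [-1.0, 1.0, 1.0]])
W2 = np.array([1.0, 2.0, 3.0])
print(d_loss_d_h0(vd, W1, W2, 1.0))  # gradient values 6, 0, 12
```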
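Problem 8. Write a function d_loss_d_r0, which takes four arguments: variable_dict, W1, W2, and y_observed. These four arguments should be the same as in Problem 7. The function should compute the partial derivative $\frac{\partial \ell_i}{\partial \vec{r}^{0}}$, which is defined as follows:

$$
\frac{\partial \ell_i}{\partial \vec{r}^{0}} =
\begin{bmatrix}
\frac{\partial \ell_i}{\partial r^0_0} & \frac{\partial \ell_i}{\partial r^0_1} & \frac{\partial \ell_i}{\partial r^0_2}
\end{bmatrix}
\tag{16}
$$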
These partial derivatives should be returned as a 1 × 3 NumPy array.
Do this by taking partial derivatives in Equation 3.
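Equation 3 is again element-wise, so $\frac{\partial \ell_i}{\partial r^0_k} = \frac{\partial \ell_i}{\partial h^0_k} \cdot \mathrm{ReLU}'(r^0_k)$. This sketch assumes hypothetical keys 'r1' and 'r0'. The illustrative values correspond to $x = 2$ with $W^0 = (1, -1, 0.5)$.

```python
import numpy as np


def d_loss_d_r0(variable_dict, W1, W2, y_observed):
    """Chain rule through Equation 3: d h^0_k / d r^0_k = ReLU'(r^0_k)."""
    dl_dy = 2.0 * (variable_dict["y_predicted"] - y_observed)  # Problem 1
    dl_dh1 = dl_dy * np.asarray(W2, dtype=float).ravel()       # Problem 3
    r1 = np.asarray(variable_dict["r1"], dtype=float)          # hypothetical key
    dl_dr1 = dl_dh1 * (r1 > 0)                                 # Problem 5
    dl_dh0 = dl_dr1 @ np.asarray(W1, dtype=float)              # Problem 7
    r0 = np.asarray(variable_dict["r0"], dtype=float)          # hypothetical key
    return dl_dh0 * (r0 > 0)  # (r0 > 0) acts as ReLU'(r0)


vd = {"y_predicted": 4.0,
      "r0": np.array([2.0, -2.0, 1.0]),
      "r1": np.array([4.0, 0.0, -1.0])}
W1 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 0.0],
               [-1.0, 1.0, 1.0]])
W2 = np.array([1.0, 2.0, 3.0])
print(d_loss_d_r0(vd, W1, W2, 1.0))  # gradient values 6, 0, 12
```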
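Problem 9. Write a function d_loss_d_W0, which takes four arguments: variable_dict, W1, W2, and y_observed. These four arguments should be the same as in Problems 7 and 8.
The function should compute the partial derivative $\frac{\partial \ell_i}{\partial W^0}$, which is defined as follows:

$$
\frac{\partial \ell_i}{\partial W^0} =
\begin{bmatrix}
\frac{\partial \ell_i}{\partial w^0_0} & \frac{\partial \ell_i}{\partial w^0_1} & \frac{\partial \ell_i}{\partial w^0_2}
\end{bmatrix}
\tag{17}
$$

These three partial derivatives should be returned as a 1 × 3 NumPy array.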
Do this by taking partial derivatives in Equation 1.
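Finally, Equation 1 gives $r^0_k = w^0_k \cdot x$, so $\frac{\partial \ell_i}{\partial w^0_k} = \frac{\partial \ell_i}{\partial r^0_k} \cdot x$. This sketch chains all the earlier steps. It assumes hypothetical keys 'x', 'r0', and 'r1', and uses illustrative values for $x = 2$.

```python
import numpy as np


def d_loss_d_W0(variable_dict, W1, W2, y_observed):
    """Chain rule through Equation 1: d r^0_k / d w^0_k = x."""
    dl_dy = 2.0 * (variable_dict["y_predicted"] - y_observed)  # Problem 1
    dl_dh1 = dl_dy * np.asarray(W2, dtype=float).ravel()       # Problem 3
    r1 = np.asarray(variable_dict["r1"], dtype=float)          # hypothetical key
    dl_dr1 = dl_dh1 * (r1 > 0)                                 # Problem 5
    dl_dh0 = dl_dr1 @ np.asarray(W1, dtype=float)              # Problem 7
    r0 = np.asarray(variable_dict["r0"], dtype=float)          # hypothetical key
    dl_dr0 = dl_dh0 * (r0 > 0)                                 # Problem 8
    return dl_dr0 * variable_dict["x"]                         # 'x' is a hypothetical key


vd = {"x": 2.0, "y_predicted": 4.0,
      "r0": np.array([2.0, -2.0, 1.0]),
      "r1": np.array([4.0, 0.0, -1.0])}
W1 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 0.0],
               [-1.0, 1.0, 1.0]])
W2 = np.array([1.0, 2.0, 3.0])
print(d_loss_d_W0(vd, W1, W2, 1.0))  # gradient values 12, 0, 24
```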
Comments on the problems: You have now computed the partial derivatives $\frac{\partial \ell_i}{\partial W^0}$, $\frac{\partial \ell_i}{\partial W^1}$, and $\frac{\partial \ell_i}{\partial W^2}$. This is all that you need in order to perform gradient descent and optimize the weight parameters.
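The total loss in Equation 9 is a sum, so its gradient is the sum of the per-datapoint gradients. The sketch below strings the formulas from Problems 1-9 into one vectorized gradients function. It then takes a single gradient-descent step on a two-point toy dataset. The learning rate, the toy data, and all function names are illustrative assumptions and are not part of the starter code.

```python
import numpy as np


def forward(x, W0, W1, W2):
    """Equations 1-7; returns the intermediate layers and y_pred."""
    r0 = W0 * x
    h0 = np.maximum(r0, 0.0)
    r1 = W1 @ h0
    h1 = np.maximum(r1, 0.0)
    return r0, h0, r1, h1, float(W2 @ h1)


def gradients(x, y, W0, W1, W2):
    """Per-datapoint gradients, following Problems 1-9 (vectorized)."""
    r0, h0, r1, h1, y_pred = forward(x, W0, W1, W2)
    dl_dy = 2.0 * (y_pred - y)           # Problem 1
    dW2 = dl_dy * h1                     # Problem 2
    dl_dr1 = dl_dy * W2 * (r1 > 0)       # Problems 3 and 5
    dW1 = np.outer(dl_dr1, h0)           # Problem 6
    dl_dr0 = (dl_dr1 @ W1) * (r0 > 0)    # Problems 7 and 8
    dW0 = dl_dr0 * x                     # Problem 9
    return dW0, dW1, dW2


def total_loss(X, Y, W0, W1, W2):
    """Equation 9."""
    return sum((forward(x, W0, W1, W2)[-1] - y) ** 2 for x, y in zip(X, Y))


def gradient_descent_step(X, Y, W0, W1, W2, lr):
    """One step on the total loss: sum the gradients over the dataset."""
    g0, g1, g2 = np.zeros_like(W0), np.zeros_like(W1), np.zeros_like(W2)
    for x, y in zip(X, Y):
        d0, d1, d2 = gradients(x, y, W0, W1, W2)
        g0 += d0
        g1 += d1
        g2 += d2
    return W0 - lr * g0, W1 - lr * g1, W2 - lr * g2


X, Y = [1.0, 2.0], [0.0, 1.0]
W0 = np.ones(3)
W1 = 0.5 * np.ones((3, 3))
W2 = np.ones(3)
before = total_loss(X, Y, W0, W1, W2)
W0, W1, W2 = gradient_descent_step(X, Y, W0, W1, W2, lr=0.001)
after = total_loss(X, Y, W0, W1, W2)
print(before, after)
```

With this toy data the total loss starts at 84.25 and decreases after the step. A learning rate that is too large can overshoot, so in practice it has to be tuned.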
We have also included PyTorch code for the model in the starter code. Entirely optional:
by slightly extending the starter code, you can compute gradients, and verify your solutions
to the problems.
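The PyTorch code from the starter code is not reproduced in this document. As a swapped-in, library-free alternative, the sketch below uses a central finite-difference check. It compares each analytic partial derivative with $\frac{f(w+\epsilon) - f(w-\epsilon)}{2\epsilon}$. The weights are illustrative. They are chosen so that no pre-activation sits near the ReLU kink at 0, because finite differences are unreliable there.

```python
import numpy as np


def loss(x, y, W0, W1, W2):
    """Equation 8 for one datapoint, via the forward pass of Equations 1-7."""
    h0 = np.maximum(W0 * x, 0.0)
    h1 = np.maximum(W1 @ h0, 0.0)
    return (float(W2 @ h1) - y) ** 2


def analytic_grads(x, y, W0, W1, W2):
    """Backpropagation following Problems 1-9, vectorized with NumPy."""
    r0 = W0 * x
    h0 = np.maximum(r0, 0.0)
    r1 = W1 @ h0
    h1 = np.maximum(r1, 0.0)
    dl_dy = 2.0 * (float(W2 @ h1) - y)
    dl_dr1 = dl_dy * W2 * (r1 > 0)
    dl_dr0 = (dl_dr1 @ W1) * (r0 > 0)
    return dl_dr0 * x, np.outer(dl_dr1, h0), dl_dy * h1


def numerical_grad(f, W, eps=1e-5):
    """Central differences: (f(W + eps*e_k) - f(W - eps*e_k)) / (2*eps)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (f(W_plus) - f(W_minus)) / (2 * eps)
    return grad


# Every pre-activation is positive and well away from the kink at 0.
x, y = 1.5, 0.7
W0 = np.array([0.5, 1.0, 1.5])
W1 = np.array([[0.2, 0.4, 0.1],
               [0.3, 0.1, 0.5],
               [0.6, 0.2, 0.3]])
W2 = np.array([1.0, -0.5, 2.0])

dW0, dW1, dW2 = analytic_grads(x, y, W0, W1, W2)
num0 = numerical_grad(lambda W: loss(x, y, W, W1, W2), W0)
num1 = numerical_grad(lambda W: loss(x, y, W0, W, W2), W1)
num2 = numerical_grad(lambda W: loss(x, y, W0, W1, W), W2)
print(np.allclose(dW0, num0), np.allclose(dW1, num1), np.allclose(dW2, num2))
```

While the ReLU activation pattern stays fixed, $y_{\text{pred}}$ is linear in any single weight, so $\ell_i$ is quadratic in that weight. For a quadratic, the central difference matches the analytic derivative up to floating-point rounding.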