CSCE 636: Deep Learning
1. You need to submit (1) a report in PDF and (2) your code files, both to eCampus.
2. Your PDF report should include (1) answers to the non-programming part, and (2) results
and analysis of the programming part. For the programming part, your PDF report should at
least include the results you obtained, for example the accuracy, training curves, parameters,
etc. You should also analyze your results as needed.
3. Please put all your files (PDF report and code files) into a compressed file named
“HW# FirstName LastName.zip”
4. An unlimited number of submissions is allowed on eCampus, and only the latest one will be graded.
5. Please read and follow submission instructions. No exception will be made to accommodate
incorrectly submitted files/reports.
6. All students are highly encouraged to typeset their reports using Word or LaTeX. If you
decide to hand-write, please make sure your answers are clearly readable in the scanned PDF.
7. Only write your code between the following lines. Do not modify other parts.
### YOUR CODE HERE
### END YOUR CODE
Required Reading Materials:
• Deep Residual Learning for Image Recognition (https://arxiv.org/abs/1512.03385)
• Identity Mappings in Deep Residual Networks (https://arxiv.org/abs/1603.05027)
• Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate
Shift (https://arxiv.org/abs/1502.03167)
1. (10 points) Exercise 7.7 (e-Chap:7-11) in the book “Learning from Data”.
2. (10 points) Exercise 7.8 (e-Chap:7-15) in the book “Learning from Data”.
3. (20 points) Consider the standard residual block and the bottleneck block in the case where
inputs and outputs have the same dimension (e.g., Figure 5 in the ResNet paper). In other words, the
residual connection is an identity connection. For the standard residual block, compute the
number of training parameters when the dimension of inputs and outputs is 128×16×16×32.
Here, 128 is the batch size, 16 × 16 is the spatial size of feature maps, and 32 is the number
of channels. For the bottleneck block, compute the number of training parameters when the
dimension of inputs and outputs is 128 × 16 × 16 × 128. Compare the two results and explain
the advantages and disadvantages of the bottleneck block.
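As a sanity check for your counting (not a substitute for the derivation), the totals can be computed programmatically. The sketch below assumes the standard block is two 3×3 convolutions at 32 channels and the bottleneck block is a 1×1/3×3/1×1 stack with a 4× channel reduction (128 → 32 → 32 → 128), ignoring biases and batch-normalization parameters; note that the batch size and the spatial size do not affect the parameter count.

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out

# Standard residual block on 32-channel feature maps: two 3x3 convolutions.
standard = conv_params(3, 32, 32) + conv_params(3, 32, 32)

# Bottleneck block on 128-channel feature maps: 1x1 down to 32,
# 3x3 at 32 channels, then 1x1 back up to 128 (assuming a 4x reduction).
bottleneck = (conv_params(1, 128, 32)
              + conv_params(3, 32, 32)
              + conv_params(1, 32, 128))

print(standard, bottleneck)  # -> 18432 17408
```

Under these assumptions the bottleneck block handles four times as many channels with slightly fewer parameters, which is the trade-off the question asks you to discuss.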
4. (20 points) Using batch normalization in training requires computing the mean and variance
of a tensor.
(a) (8 points) Suppose the tensor x is the output of a fully-connected layer and we want to
perform batch normalization on it. The training batch size is N and the fully-connected
layer has C output nodes. Therefore, the shape of x is N × C. What is the shape of the
mean and variance computed in batch normalization, respectively?
(b) (12 points) Now suppose the tensor x is the output of a 2D convolution and has shape
N × H × W × C. What is the shape of the mean and variance computed in batch
normalization, respectively?
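You can check your answers empirically with numpy: the axes being averaged over determine the resulting shapes. The sizes below are illustrative, not part of the problem.

```python
import numpy as np

# Fully-connected output: shape (N, C); batch norm averages over the batch axis.
x_fc = np.random.randn(64, 10)           # N = 64, C = 10 (illustrative sizes)
mean_fc = x_fc.mean(axis=0)              # reduces axis 0
var_fc = x_fc.var(axis=0)

# 2D-convolution output: shape (N, H, W, C); batch norm averages over
# the batch and both spatial axes.
x_conv = np.random.randn(64, 16, 16, 10)
mean_conv = x_conv.mean(axis=(0, 1, 2))  # reduces axes 0, 1, 2
var_conv = x_conv.var(axis=(0, 1, 2))

print(mean_fc.shape, mean_conv.shape)
```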
5. (50 points) We investigate the back-propagation of the convolution using a simple example. In
this problem, we focus on the convolution operation without any normalization and activation
function. For simplicity, we consider the convolution in 1D cases. Given 1D inputs with a
spatial size of 4 and 2 channels, i.e.,

X = [ x11 x12 x13 x14 ]
    [ x21 x22 x23 x24 ],                                  (1)

we perform a 1D convolution with a kernel size of 3 to produce the output Y with 2 channels.
No padding is involved. It is easy to see that

Y = [ y11 y12 ]
    [ y21 y22 ],                                          (2)

where each row corresponds to a channel. There are 12 training parameters involved in this
convolution, forming 4 different kernels of size 3:

Wij = [ w_1^{ij}, w_2^{ij}, w_3^{ij} ],  i = 1, 2,  j = 1, 2,        (3)
where Wij scans the i-th channel of inputs and contributes to the j-th channel of outputs.
Note that the notation here might be slightly different in that, one kernel/filter here connects
ONE input feature map (instead of ALL input feature maps) to ONE output feature map.
(a) (15 points) Now we flatten X and Y to vectors as
X˜ = [x11, x12, x13, x14, x21, x22, x23, x24]^T,
Y˜ = [y11, y12, y21, y22]^T.
Please write the convolution in the form of fully connected layer as Y˜ = AX˜ using the
notations above. You can assume there is no bias term.
Hint: Note that we discussed how to view convolution layers as fully connected layers in
the case of single input and output feature maps. This example asks you to extend that
to the case of multiple input and output feature maps.
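To recall the single-feature-map case mentioned in the hint, here is a small numpy sketch (with made-up numbers) showing a 1D convolution with input length 4 and kernel size 3, no padding, written as a matrix multiplication:

```python
import numpy as np

# Single-channel case: input length 4, kernel size 3, no padding -> output length 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, -1.0, 2.0])

# Direct convolution (cross-correlation convention, as used in deep learning).
y_direct = np.array([w @ x[i:i + 3] for i in range(2)])

# The same map as a fully connected layer: each row of A holds the kernel,
# shifted by one position per output element.
A = np.array([[w[0], w[1], w[2], 0.0],
              [0.0,  w[0], w[1], w[2]]])
y_matrix = A @ x

assert np.allclose(y_direct, y_matrix)
```

The problem asks you to extend this construction so that A maps the length-8 vector X˜ to the length-4 vector Y˜, with one such banded block per (input channel, output channel) pair.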
(b) (15 points) Next, for the back-propagation, assume we have already computed the gradients
of the loss L with respect to Y˜, i.e., ∂L/∂Y˜. Please write the back-propagation step of the
convolution in the form ∂L/∂X˜ = B ∂L/∂Y˜ using the notations above. Explain the
relationship between A and B.
(c) (20 points) While the forward propagation of the convolution from X to Y can be written
as Y˜ = AX˜, figure out whether the back-propagation step ∂L/∂X˜ = B ∂L/∂Y˜ also
corresponds to a convolution from ∂L/∂Y to ∂L/∂X. If yes, write down the kernels for this
convolution. If no, explain why.
6. (90 points)(Coding Task) Deep Residual Networks for CIFAR-10 Image Classification: In this assignment, you will implement advanced convolutional neural networks on
CIFAR-10 using Tensorflow. In this classification task, models will take a 32 × 32 image
with RGB channels as inputs and classify the image into one of ten pre-defined classes. The
“ResNet” folder provides the starting code. You must implement the model using the starting
code. In this assignment, you must use a GPU.
Requirements: Python 3.6, Tensorflow 1.10 (make sure you install and consult the documentation
for this particular version!), tqdm, numpy
(a) (10 points) Download the CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/
cifar.html) and complete “DataReader.py”. For the dataset, you can download any
version. But make sure you write corresponding code in “DataReader.py” to read it.
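If you choose the python version of the dataset, each batch file is a pickled dictionary whose b'data' entry is a uint8 array of shape (10000, 3072) (1024 red, then green, then blue values per image) and whose b'labels' entry is a list of class indices. A minimal reader sketch follows; the helper names `load_batch` and `load_cifar10` are made up here and your "DataReader.py" may organize this differently.

```python
import os
import pickle

import numpy as np


def load_batch(path):
    """Load one batch of the python-version CIFAR-10 (a pickled dict)."""
    with open(path, 'rb') as f:
        batch = pickle.load(f, encoding='bytes')  # keys are bytes, e.g. b'data'
    x = batch[b'data']                 # uint8, shape (num_images, 3072)
    y = np.array(batch[b'labels'])     # class indices in [0, 9]
    return x, y


def load_cifar10(data_dir):
    """Stack the five training batches; the test split lives in 'test_batch'."""
    xs, ys = [], []
    for i in range(1, 6):
        x, y = load_batch(os.path.join(data_dir, 'data_batch_%d' % i))
        xs.append(x)
        ys.append(y)
    x_train, y_train = np.concatenate(xs), np.concatenate(ys)
    x_test, y_test = load_batch(os.path.join(data_dir, 'test_batch'))
    return x_train, y_train, x_test, y_test
```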
(b) (10 points) Implement data augmentation. To complete “ImageUtils.py”, you will implement the augmentation process for a single image using numpy. Corresponding Tensorflow functions are given.
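A common augmentation scheme for CIFAR-10 is pad-by-4, random 32 × 32 crop, and random horizontal flip; the exact scheme you should reproduce is the one in the given Tensorflow functions. A numpy sketch for a single image (the `augment` helper name is illustrative):

```python
import numpy as np


def augment(image, rng=np.random):
    """Pad-and-crop plus random horizontal flip for one 32x32x3 image."""
    # Pad height and width by 4 pixels on each side, leaving channels alone.
    padded = np.pad(image, ((4, 4), (4, 4), (0, 0)), mode='constant')
    # Pick a random 32x32 window from the 40x40 padded image (offsets 0..8).
    top = rng.randint(0, 9)
    left = rng.randint(0, 9)
    cropped = padded[top:top + 32, left:left + 32, :]
    # Flip left-right with probability 0.5.
    if rng.rand() < 0.5:
        cropped = cropped[:, ::-1, :]
    return cropped
```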
(c) (30 points) Complete “Network.py”. Read the required materials carefully before this
step. You are asked to implement two versions of ResNet: version 1 uses original residual
blocks (Figure 4(a) in the Identity Mappings paper, https://arxiv.org/abs/1603.05027) and
version 2 uses full pre-activation residual blocks (Figure 4(e) in the same paper). In particular,
for version 2, implement the bottleneck blocks instead of standard residual blocks. In this step,
only basic Tensorflow APIs in tf.layers and tf.nn are allowed.
(d) (20 points) Complete “Model.py”. Note: for this step and the previous step, pay attention
to how to use batch normalization.
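The usual pitfall is that batch normalization behaves differently in training and inference: training normalizes with the current batch statistics and updates moving averages, while inference normalizes with the accumulated moving averages (in Tensorflow 1.x this is controlled by the `training` argument of `tf.layers.batch_normalization`, whose moving-average update ops live in the `tf.GraphKeys.UPDATE_OPS` collection and must be run explicitly). A minimal numpy sketch of the two modes; `SimpleBatchNorm` is illustrative only and omits the learned scale and shift:

```python
import numpy as np


class SimpleBatchNorm:
    """Conceptual batch normalization over axis 0 (no learned gamma/beta)."""

    def __init__(self, dim, momentum=0.9, eps=1e-5):
        self.moving_mean = np.zeros(dim)
        self.moving_var = np.ones(dim)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Training: normalize with batch statistics and update the
            # moving averages as a side effect.
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference: normalize with the accumulated statistics only.
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)
```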
(e) (20 points) Tune all the hyperparameters in “main.py” and report your final testing accuracy.