Intro to Image Understanding (CSC420)
Total: 150 marks
• You are allowed to work directly with one other person to discuss the questions. However, you are still expected to write the solutions/code/report in your own words; i.e.
no copying. If you choose to work with someone else, you must indicate this in your
assignment submission. For example, on the first line of your report file (after your
own name an information, and before starting your answer to Q1), you should have
a sentence that says: “In solving the questions in this assignment, I worked together
with my classmate [name & student number]. I confirm that I have written the solutions/code/report in my own words”.
• Your submission should be in the form of an electronic report (PDF), with the answers
to the specific questions (each question separately), and a presentation and discussion
of your results. For this, please submit a file named report.pdf to MarkUs directly.
• Submit documented codes that you have written to generate your results separately.
Please store all of those files in a folder called assignment1, zip the folder and then
submit the file assignment1.zip to MarkUs. You should include a README.txt
file (inside the folder) which details how to run the submitted codes.
• Do not worry if you realize you made a mistake after submitting your zip file; you can
submit multiple times on MarkUs.
Part I: Theoretical Problems (70 marks)
[Question 1] LTI Systems (10 marks)
Linear time-invariant (LTI) systems are a class of systems that are both linear and timeinvariant. In linear systems, the output for a linear combination of inputs is equal to the
linear combination of individual responses to those inputs. In other words, for a system T,
signals x1(n) and x2(n), and scalars a1 and a2, system T is linear if and only if:
T[a1x1(n) + a2x2(n)] = a1T[x1(n)] + a2T[x2(n)]
Also, a system is time-invariant if a shift in its input merely shifts the output; i.e. If T[x(t)] =
y(t), system T is time-invariant if and only if:
T[x(n n0)] = y(n n0)
Now, consider a linear time-invariant system T with input x(n) and impulse response h(n).
Recall that the impulse response is defined as the output of the system when the input is an
impulse function (n), i.e. T[(n)] = h(n), where:
(n) = (
1, if n = 0,
Prove that T[x(n)] = h(n) ⇤ x(n), where ⇤ denotes convolution operation.
Hint: represent signal x(n) as a function of (n).
[Question 2] Polynomial Multiplication and Convolution (15 marks)
Vectors can be used to represent polynomials. For example, 3rd-degree polynomial (a3x3 +
a2x2 + a1x + a0) can by represented by vector [a3, a2, a1, a0].
If u and v are vectors of polynomial coecients, prove that convolving them is equivalent to
multiplying the two polynomials they each represent.
Hint: You need to assume proper zero-padding to support the full-size convolution.
[Question 3] Image Pyramids (20 marks)
In Gaussian pyramids, the image at each level Ik is constructed by blurring the image at
the previous level Ik1 and downsampling it by a factor of 2. A Laplacian pyramid, on the
other hand, consists of the di↵erence between the image at each level (Ik) and the upsampled
version of the image in the next level of Gaussian pyramid (Ik+1).
Given an image of size 2n ⇥ 2n denoted by I0, and its Laplacian pyramid representation
denoted by L0, …, Ln1, show how we can reconstruct the original image, using the minimum
information from the Gaussian pyramid. Specify the minimum information required from
the Gaussian pyramid, and a closed-form expression for reconstructing I0.
Hint: The reconstruction follows a recursive process; What is the base-case that contains
the minimum information?
[Question 4] Laplacian Operator (25 marks)
The Laplace operator is a second-order di↵erential operator in the “n”-dimensional Euclidean
space, defined as the divergence (r.) of the gradient (rf). Thus if f is a twice-di↵erentiable
real-valued function, then the Laplacian of f is defined by:
f = r2
f = r · rf = Xn
where the latter notations derive from formally writing:
Now, consider a 2D image I(x, y) and its Laplacian, given by I = Ixx+Iyy. Here the second
partial derivatives are taken with respect to the directions of the variables x, y associated
with the image grid for convenience. Show that the Laplacian is in fact rotation invariant.
In other words, show that I = Irr + Ir0r0, where r and r0 are any two orthogonal directions.
Hint: Start by using polar coordinates to describe a chosen location (x, y). Then use the
Part II: Implementation Tasks (80 marks)
[Question 4] Edge Detection
In this question, the goal is to implement a rudimentary edge detection process which uses
derivative of Gaussian, through a series of steps. For each step (excluding step 1) you are
supposed to test your implementation on the provided image, and also on one image of your
own choice. Include the results in your report.
Step I – Gaussian Blurring (20 marks): Implement a function that returns a 2D Gaussian matrix for input size and scale . Please note that you should not use any of the
existing libraries to create the filter, e.g. cv2.getGaussianKernel(). Moreover, visualize this
2D Gaussian matrix for two choices of with appropriate filter sizes. For the visualization,
you may consider a 2D image with colormap, or a 3D graph. Make sure to include the color
bar or axis values.
Step II – Gradient Magnitude (20 marks): In the lectures, we discussed how partial
derivatives of an image is computed. We know that the edges in an image are from the
sudden changes of intensity and one way to capture that sudden change is to calculate the
gradient magnitude at each pixel. The edge strength or gradient magnitude is defined as:
g(x, y) = |rf(x, y)| =
x + g2
where gx and gy are the gradients of image f(x, y) along x and y-axis direction respectively.
Using the Sobel operator, gx and gy can be computed as:
5 ⇤ f(x, y) and gy =
1 2 1
5 ⇤ f(x, y)
Implement a function that receives an image f(x, y) as input and returns its gradient g(x, y)
magnitude as output using the Sobel operator. You are supposed to implement the convolution required for this task from scratch, without using any existing libraries.
Step III – Threshold Algorithm (30 marks): After finding the image gradient, the
next step is to automatically find a threshold value so that edges can be determined. One
algorithm to automatically determine image-dependent threshold is as follows:
1. Let the initial threshold ⌧0 be equal to the average intensity of gradient image g(x, y),
as defined below:
i=1 g(x, y)
h ⇥ w
where, h and w are height and width of the image under consideration.
2. Set iteration index i = 0, and categorize the pixels into two classes, where the lower
class consists of the pixels whose gradient magnitudes are less than ⌧0, and the upper
class contains the rest of the pixels.
3. Compute the average gradient magnitudes mL and mH of lower and upper classes,
4. Set iteration i = i + 1 and update threshold value as:
⌧i = mL + mH
5. Repeat steps 2 to 4 until |⌧i ⌧i1| ✏ is satisfied, where ✏ ! 0; take ⌧i as final
threshold and denote it by ⌧ .
Once the final threshold is obtained, each pixel of gradient image g(x, y) is compared
with ⌧ . The pixels with gradient higher than ⌧ are considered as edge point and is
represented as a white pixel; otherwise it is designated as black. The edge-mapped
image E(x, y), thus obtained is:
E(x, y) = (
255, if g(x, y) ⌧
Implement the aforementioned threshold algorithm. The input to this algorithm is the gradient image g(x, y) obtained from step II, and the output is a black and white edge-mapped
image E(x, y).
Step IV – Test (10 marks): Use the image provided along with this assignment, and also
one image of your choice to test all the previous steps (I to III) and to visualize your results
in the report. Convert the images to grayscale first. Please note that the input to each step
is the output of the previous step. In a brief paragraph, discuss how the algorithm works for
these two examples and highlight its strengths and/or its weaknesses.