Lab 7: Multirate Signal Processing and Signal Processing Applications

We will primarily focus on multirate signal processing – upsampling and downsampling – in this lab. We will wrap up the lab with a look at a signal processing application beyond the scope of ECE 310, but of interest in future signal processing courses.

Multirate Signal Processing

In many signal processing applications, we would like to change the sampling rate of our digital system. This could mean either reducing or increasing our effective sampling rate. One possibility is to perform digital-to-analog (D/A) conversion, then resample our signal in the time domain to the desired rate. However, this process requires access to the desired resampling circuitry in addition to potential artifacts introduced by the D/A conversion and quantization effects when resampling the signal.

The preferred method is to perform rate conversion in the discrete sample domain. We may increase our sampling rate via a process known as upsampling or interpolation, while we may decrease our sampling rate using downsampling or decimation. In the following background sections, we will discuss the mathematical construction of upsampling and downsampling, and also discuss some practical considerations that must be made when working on multirate signal processing.

Upsampling

Upsampling (or interpolation) by an integer factor of U is accomplished by interpolating U−1 zeros between every entry in our original sequence. Let x[n] be our original signal. Then our upsampled signal y[n] will be as follows:

y[n]={

x[

n

U

],n=±U,2U,…,kU

0,else

y[n]=[x[0]

00…0

⏟

U−1 zeros

x[1]

00…0

⏟

U−1 zeros

x[2]…]

The resulting DTFT for y[n] may be derived as follows:

Yd(ω)=

∞

∑

n=−∞

y[n]e−jωn

=

∞

∑

n=−∞

x[n]e−jωnU

Yd(ω)=Xd(Uω).

The second line above follows from the definition of y[n] since the n’th value of x[n] is actually the Un’th value of y[n] and all other values of y[n] are zero. Thus we see the resulting spectrum will be “compressed” along the frequency axis by a factor of U while maintaining the same height. The issue here is that some of our adjacent 2π periodic copies of Xd(ω) will cross over into the central copy of Yd(ω) after upsampling. This is readily apparent when we check that Yd(π)=Xd(πU), which must be greater than the π limits of our central copy.

The simple solution is to place a low-pass filter after our upsampler to make sure we only extract the frequencies corresponding to the central copy of our original spectrum. We see that the bandwidth of this filter should be such that our cutoff frequency ωc maps to π in our original specturm Xd(ω). Thus, we would like ωc=

π

U

. Formally, our low-pass filter A(ω) will be given by

A(ω)={

1,|ω|≤

π

U

0,

π

U

<|ω|≤π

.

This low-pass filter accomplishes the very important task of interpolating over the zeros we placed in between our original signal values. Think of the low-pass filter as “filling in the blanks” or inferring the newly interpolated information. In summary, upsampling by a factor U involves interpolating U−1 zeros between each sample in our original signal and then low-pass filtering with a cutoff frequency of

π

U

to prevent the inclusion of spurious frequencies presented by adjacent DTFT copies in the original signal.

Downsampling

Downsampling (or decimation) by an integer factor D is performed by only keeping every D’th sample in the original signal. Let x[n] be our original signal. Our downsampled signal y[n] is given by

y[n]=x[Dn]

y[n]=[x[0]x[D]x[2D]…]

We will skip the derivation of Yd(ω) since it is more involved than the derivation for the interpolation case. Refer to your ECE 310 textbook for the full details. The DTFT of our decimated signal Yd(ω) is given by

Yd(ω)=

1

D

D−1

∑

k=0

Xd(

ω−2πk

D

).

Qualitatively, we see that the input spectrum will be scaled down by a factor of

1

D

and the frequency axis is stretched by a factor of D. It is very important, however, to note that the above form assumes an appropriate anti-aliasing filter has been applied to the input signal prior to decimation. What should this filter look like? The anti-aliasing filter should remove all frequencies that would push into adjacent DTFT copies after decimation and cause aliasing. We know that the frequency axis will expand by a factor of D; thus, we know any frequencies beyond

π

D

would stretch beyond π and alias. Consequently, our anti-aliasing filter H(ω) should have the following form:

H(ω)={

1,|ω|≤

π

D

0,

π

D

<|ω|≤π

.

We observe that if we have a multirate signal processing system that interpolates and decimates by the same factor, we will be able to use one filter for both upsampling and downsampling! The difference, of course, is that we filter before decimation and after interpolation. In summary, downsampling by a factor of D requires us to first apply a low-pass anti-aliasing filter with cutoff frequency

π

D

then discard (or decimate) every sample that is not located at a multiiple of D.

Rate Conversion by

U

D

One final topic we should mention is the concatenation of multiple rate conversion systems to achieve an arbitrary sampling rate. Suppose we would like to increase our sampling rate by a non-integer factor like

4

3

. How can you interpolate

1

3

of a zero? You cannot! Instead, we may accomplish this new sampling rate by first upsampling by a factor of 4, then downsampling by a factor of 3. Furthermore, note that in this case the post-interpolation filter will have a cutoff frequency of

π

4

and thus eliminate the need for our anti-aliasing filter before decimation since

π

4

<

π

3

! We may achieve any new sampling rate by the same procedure with appropriate anti-aliasing filters.

#import libraries

import numpy as np

import matplotlib.pyplot as plt

from scipy import signal

from skimage.io import imread

from skimage.io import imsave

#Provided function with triangle magnitude spectrum

def toy_signal(w_c):

N = 1000

gain = 2*np.pi/w_c

return gain * np.array([(np.sin(w_c/2*(n+1))/(np.pi*(n+1)))**2 if n+1 != 0 else (w_c/(2*np.pi))**2 for n in range(-N,N+1)])

%matplotlib inline

Exercise 1: Upsampling

We will begin by exploring and verifying the above theoretical discussion regarding upsampling.

a. Fill in the below function upsample(U,x) to perform interpolation by a factor of U on an input signal x. Verify your function correctly interpolates zeros by printing the provided test signal before and after upsampling for some integer U.

b. The provided function toy_signal(ωc) creates a toy signal that has a triangular magnitude response with bandwidth ωc. More precisely, if our toy signal is given by x, the magnitude of the DTFT of this toy signal will be

|Xd(ω)|={

−|

ω

ωc

|+1|ω|≤ωc

0,ωc≤|ω|≤π

.

Use the provided toy signal function to create a signal with bandwidth

π

2

(ωc=

π

2

). Upsample this toy signal by a factor of U=3. Plot the magnitude of the FFT of the original and upsampled toy signals on separate subplots using np.fft.fft(). Hint: use np.fft.fftshift() to zero-center the frequency spectrum and np.linspace() to create your frequencies from −π to π for plotting.

c. Fill in the below function lowpass(C) to create an appropriate low-pass filter that should follow interpolation by a factor of C. You may use signal.remez() like in Lab 6 for Parks-McClellan filter design of a length 50 filter. You may also assume a transition bandwidth of

π

10

(0.1 in normalized frequency). Apply your low-pass filter to your upsampled toy signal from part 1.(b). Plot the magnitude of the FFT of the filtered signal to verify the spurious DTFT copies have been removed by your filter. Note that this filtering function will also work as an anti-aliasing filter for decimation by a factor of D.

#Fill in this function for part 1.a:

def upsample(U,x):

upsampled = []

for n in range (len(x)*U):

if (n%U == 0):

upsampled.append(x[int(n/U)])

else:

upsampled.append(0)

return upsampled

#Fill in this function for part 1.c:

def lowpass(C):

N = 50

a = [1,0]

lpf_bands = [0,1/C,1/C+.1,1]

lpf_desired = [1,0]

lpf = signal.remez(N,lpf_bands,lpf_desired, fs=2)

return lpf

#Test code for part 1.a:

test_signal = np.array([1,2,3,4,5,6])

upsampled_test = upsample(3, test_signal)

print(upsampled_test)

#Code for part 1.b:

tsignal = toy_signal(np.pi/2)

tsignal_up = upsample(3, tsignal)

tsignal_fft = np.fft.fft(tsignal)

tsignal_fft_up = np.fft.fft(tsignal_up)

tsignal_fft_shift = np.fft.fftshift(tsignal_fft)

tsignal_fft_up_shift = np.fft.fftshift(tsignal_fft_up)

x_axis = np.linspace(-np.pi, np.pi,len(tsignal_fft_shift))

x_axis_up = np.linspace(-np.pi, np.pi,len(tsignal_fft_up_shift))

plt.figure(figsize = (20,20))

plt.subplot(311)

plt.plot(x_axis, abs(tsignal_fft_shift))

plt.title(‘FFT for original signal’)

plt.xlabel(‘$\omega$’)

plt.ylabel(‘Magnitude’)

plt.subplot(312)

plt.plot(x_axis_up, abs(tsignal_fft_up_shift))

plt.title(‘FFT for upsampled signal’)

plt.xlabel(‘$\omega$’)

plt.ylabel(‘Magnitude’)

#Code for part 1.c:

lpf = lowpass(3)

signal_lpf = signal.convolve(lpf,tsignal_up)

signal_lpf_fft = np.fft.fft(signal_lpf)

signal_lpf_fft_shift = np.fft.fftshift(signal_lpf_fft)

x_axis_up_lpf = np.linspace(-np.pi,np.pi,len(signal_lpf_fft_shift))

plt.subplot(313)

plt.plot(x_axis_up_lpf, abs(signal_lpf_fft_shift))

plt.title(‘LPF with upsampled signals’)

plt.xlabel(‘$\omega$’)

plt.ylabel(‘Magnitude’)

[1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0, 0, 6, 0, 0]

Text(0, 0.5, ‘Magnitude’)

Exercise 2: Downsampling

Now we will turn our attention to downsampling.

a. Fill in the below function downsample(D,x) to perform decimation by a factor of D on an input signal x. Verify your function correctly decimates by printing the provided test signal before and after downsampling for some integer D.

b. Use the provided toy signal function to create a signal with bandwidth

π

2

. Downsample this toy signal by a factor of D=2. Plot the magnitude of the FFT of the original and downsampled toy signals on separate subplots using np.fft.fft() (do not forget to use np.fft.fftshift() to center you FFT!). Do not worry about anti-aliasing filtering.

c. Use the provided toy signal function to create a signal with bandwidth

π

2

. Downsample this toy signal by a factor of D=3. Plot the magnitude of the FFT of the original and downsampled toy signals on separate subplots using np.fft.fft(). Do not worry about anti-aliasing filtering. Explain what is happening in the magnitude of the decimated signal’s FFT.

d. Finally, let’s use our upsampling, downsampling, and filtering functions to perform sampling rate conversion by a factor of

3

5

. This may be accomplished with two different schemes: downsampling followed by upsampling or upsampling followed by downsampling. These two schemes are not necessarily equivalent for a given input signal! Perform sampling rate conversion by a factor of

3

5

using both schemes on a toy signal with bandwidth

π

2

. Plot the magnitude of the FFT for the final system output for each scheme. What are the differences between the two outputs? Which scheme do you think works best? Why? Note: remember to apply appropriate anti-aliasing and post-interpolation filters before and after downsampling and upsampling, respectively!

#Fill in this function for part 2.a:

def downsample(D,x):

downsampled = []

for n in range (int(len(x)/D)):

downsampled.append(x[D*n])

return downsampled

#Test code for part 2.a:

test_signal = np.array([1,2,3,4,5,6])

test_signal_down = downsample(2,test_signal)

print(test_signal_down)

#Code for part 2.b:

tsig = toy_signal(np.pi/2)

tsig_down = downsample(2,tsig)

tsignal_fft = np.fft.fft(tsig)

tsignal_fft_down = np.fft.fft(tsig_down)

tsignal_fft_shift = np.fft.fftshift(tsignal_fft)

tsignal_fft_shift_down = np.fft.fftshift(tsignal_fft_down)

x_axis = np.linspace(-np.pi, np.pi,len(tsignal_fft_shift))

x_axis_down = np.linspace(-np.pi, np.pi,len(tsignal_fft_shift_down))

plt.figure(figsize = (20,20))

plt.subplot(321)

plt.plot(x_axis, abs(tsignal_fft_shift))

plt.title(‘FFT for original signal’)

plt.xlabel(‘$\omega$’)

plt.ylabel(‘Magnitude’)

plt.subplot(322)

plt.plot(x_axis_down, abs(tsignal_fft_shift_down))

plt.title(‘FFT for upsampled signal’)

plt.xlabel(‘$\omega$’)

plt.ylabel(‘Magnitude’)

#Code for part 2.c:

tsig = toy_signal(np.pi/2)

tsig_down = downsample(3,tsig)

tsignal_fft = np.fft.fft(tsig)

tsignal_fft_down = np.fft.fft(tsig_down)

tsignal_fft_shift = np.fft.fftshift(tsignal_fft)

tsignal_fft_shift_down = np.fft.fftshift(tsignal_fft_down)

x_axis = np.linspace(-np.pi, np.pi,len(tsignal_fft_shift))

x_axis_down = np.linspace(-np.pi, np.pi,len(tsignal_fft_shift_down))

plt.figure(figsize = (20,20))

plt.subplot(323)

plt.plot(x_axis, abs(tsignal_fft_shift))

plt.title(‘FFT for original signal’)

plt.xlabel(‘$\omega$’)

plt.ylabel(‘Magnitude’)

plt.subplot(324)

plt.plot(x_axis_down, abs(tsignal_fft_shift_down))

plt.title(‘FFT for upsampled signal’)

plt.xlabel(‘$\omega$’)

plt.ylabel(‘Magnitude’)

#Code for part 2.d:

# downsampling then upsampling:

tsig = toy_signal(np.pi/2)

lpf = lowpass(5)

tsig_lpf = signal.convolve(lpf,tsig)

tsig_lpf_down = downsample(5,tsig_lpf)

tsig_lpf_down_up = upsample(3,tsig_lpf_down)

tsig_final = signal.convolve(lowpass(3),tsig_lpf_down_up)

tsig_fft = np.fft.fft(tsig_final)

tsignal_fft_shift = np.fft.fftshift(tsig_fft)

x_axis = np.linspace(-np.pi, np.pi,len(tsignal_fft_shift))

plt.subplot(325)

plt.plot(x_axis, abs(tsignal_fft_shift))

plt.title(‘Down then Up’)

plt.xlabel(‘$\omega$’)

plt.ylabel(‘Magnitude’)

# upsampling then downsampling:

tri_2_first = upsample(3,tsig)

tri_2_first = signal.convolve(lowpass(3), tri_2_first)

tri_2_first = signal.convolve(lowpass(5), tri_2_first)

tri_2_first = downsample(5,tri_2_first)

TRI_2_FIRST = np.fft.fftshift(np.fft.fft(tri_2_first))

plt.subplot(326)

plt.title(‘Up then Down’)

plt.plot(np.linspace(-np.pi, np.pi, len(TRI_2_FIRST)),np.abs(TRI_2_FIRST))

[1, 3, 5]

[<matplotlib.lines.Line2D at 0x7f9ebc68b518>]

Comments for part 2.c:

It is clear that there is aliasing from the end of central copy with end of other copies. The height decreases as 1/D.

Comments for part 2.d:

After applying the two sampling in different order to the toy signal, up then down yields the better result. It also appears that the second sampling down dictates the end result. In addition, since downsampling would generally result in loss of data, it could explain why doing downsampling first gives a weird shape.

Image Resizing

One common application of multirate signal processing is image resizing. Have you ever considered what happens when you stretch or shrink an image while placing images in a document? This operation extends our previous discussion of decimation and interpolation to two dimensions. For this lab, we will focus on increasing the size of a small image.

To upsample an image, we interpolate zeros like we would for a 1D signal, except we now do this is along both the rows and columns. In this exercise, we assume we will upsample by the same factor for both rows and columns (maintain aspect ratio) though it is not difficult to extend these concepts to unique scaling for rows and columns. Consider the following example where x is our original image and y is the result of upsampling by a factor of two along the rows and columns:

x=[

1 9 5

5 3 5

7 9 1

]y=[

1 0 9 0 5 0

0 0 0 0 0 0

5 0 3 0 5 0

0 0 0 0 0 0

7 0 9 0 1 0

0 0 0 0 0 0

]

Nearest Neighbor Interpolation

After upsampling our image, we must consider how we can fill in these new blank values. We will consider two options in this lab: nearest neighbor and bilinear intperpolation. Nearest neighbor interpolation works based on the simple assumption that the nearest original pixel is the best guess for the interpolated pixel. The assignment rule is as follows

y(i,j)=x(⌊

i

U

⌋,⌊

j

U

⌋),

1D Example:[

1 0 0 10 0 0 4

]→nearest neighbor interpolation→[

1 1 1 10 10 10 4

]

where U is our upsampling factor and ⌊⋅⌋ is the floor operator. If we apply nearest neighbor interpolation to our upsampled image y, we have the following interpolated image znn:

znn=[

1 1 9 9 5 5

1 1 9 9 5 5

5 5 3 3 5 5

5 5 3 3 5 5

7 7 9 9 1 1

7 7 9 9 1 1

].

Bilinear Interpolation

Bilinear interpolation is a little more complicated. Let’s first consider linear interpolation in one dimension. Linear interpolation makes the assumption that adjacent original values may be connected with a straight line. The interpolated values will be assigned according to the value of the line at that interpolation location. We have actually already had experience with linear interpolation. When we plot lines in Python using Matplotlib, the points are connected using linear interpolation! Let’s consider a toy 1D example of linear interpolation on a signal a:

a=[

1 0 0 10 0 0 4

]→linear interpolation→[

1 4 7 10 8 6 4

]

Mathematically, if we would like to interpolate between indices i and j in our toy signal a at index k such that i<k<j, we will assign the value at k as follows (assuming upsampling by U):

a[k]=

(k−i)

⏟

Δx

⋅

a[j]−a[i]

U

⏟

Δy

Δx

or slope

+

a[i]

⏟

offset/intercept

.

This equation should remind you of an equation of a line where we have a slope, intercept, and an independent variable in the distance from the first index i. Now if we would like to expand linear interpolation to become bilinear interpolation, we may simply perform linear interpolation at each row and at each column: the order does not matter! The below figure demonstrates bilinear interpolation via rows-then-colums and columns-then-rows to verify they produce the same result. Note that we will assume any values outside the image are zero!

y=[

1 0 9 0 5 0

0 0 0 0 0 0

5 0 3 0 5 0

0 0 0 0 0 0

7 0 9 0 1 0

0 0 0 0 0 0

]→rows→[

1 5 9 7 5 2.5

0 0 0 0 0 0

5 4 3 4 5 2.5

0 0 0 0 0 0

7 8 9 5 1 0.5

0 0 0 0 0 0

]→columns→[

1 5 9 7 5 2.5

3 4.5 6 5.5 5 2.5

5 4 3 4 5 2.5

6 6 6 4.5 3 1.5

7 8 9 5 1 0.5

3.5 4 4.5 2.5 0.5 0.25

]

y=[

1 0 9 0 5 0

0 0 0 0 0 0

5 0 3 0 5 0

0 0 0 0 0 0

7 0 9 0 1 0

0 0 0 0 0 0

]→columns→[

1 0 9 0 5 0

3 0 6 0 5 0

5 0 3 0 5 0

6 0 6 0 3 0

7 0 9 0 1 0

3.5 0 4.5 0 0.5 0

]→rows→[

1 5 9 7 5 2.5

3 4.5 6 5.5 5 2.5

5 4 3 4 5 2.5

6 6 6 4.5 3 1.5

7 8 9 5 1 0.5

3.5 4 4.5 2.5 0.5 0.25

]

Exercise 3: Image Resizing

We have provided a small test image named small-img.jpg (shown below) that we would like to upsample.

a. Fill in the upsample_image function below which will interpolate zeros to perform 2D upsampling. We will assume that we are upsampling our image by the same factor U in both dimensions. You may re-use your upsample() function from Exercise 1 or write new upsampling code here. Apply your upsampling code to the small image with U=8 and plot the result.

b. Fill in the nn() function below, which performs nearest neighbor interpolation on an upsampled 1D-signal. Print the provided test signal and the result of applying your nearest neighbor interpolation function on the test signal. Hint: one possible solution is to convolve with a length U filter. Consider what this filter may look like to perform nearest neighbor interpolation. Think about a zero-order hold!

c. Fill in the linear() function below, which performs linear interpolation on an upsampled 1D-signal. Print the provided test signal and the result of applying your linear interpolation function on the test signal. Hint: one possible solution is to convolve with a length 2U−1 filter. Consider what this filter may look like to perform linear interpolation. Think about a first-order hold!

d. Apply your nearest neighbor and linear interpolation functions to the your upsampled image. Note that you may simply apply each function along the rows and columns separately to perform 2D nearest neighbor inteprolation and bilinear interpolation. Plot the two resulting images and comment on the differences. We recommend you make two copies of the upsampled image to separate your nearest neighbor and bilinear inteprolation computation/results.

#Fill in this function for part 3.a:

def upsample_image(U,img):

n_rows,n_cols = img.shape

up_img = np.zeros((n_rows*U,n_cols*U))

for i in range(n_rows):

for j in range(n_cols):

up_img[U*i,U*j] = img[i,j]

return up_img

#Fill in this function for part 3.b

#Assume x has already been upsampled

def nn(U,x):

for n in range(len(x)):

remainder = n%U

copy = n – remainder

x[n] = x[copy]

return x

#Fill in this function for part 3.c:

#Assume x has already been upsampled

def linear(U,x):

for n in range(len(x)):

if n == len(x) – U:

for j in range(U-1):

j+=1

x[n+j]=x[n+j-1]/2

remainder=n%U

if remainder==0:

n_last=n

n_next=n+U

if n==len(x)-U:

break

for k in range(U):

k+=1

delta_x=n+k-n_last

slope=(x[n_next]-x[n_last])/U

bias=x[n_last]

x[n+k]=delta_x*slope+bias

return x

#Code for part 3.a:

small_img = imread(‘small-img.jpg’)

upsample_dog = upsample_image(8,small_img)

plt.figure(figsize = (15,15))

plt.imshow(upsample_dog)

#Test signal for parts 3.b and 3.c:

test_signal = np.array([1.,0,0,10,0,0,4,0,0]) #U = 3

#Code for part 3.b:

sig_nn = nn(3, test_signal)

print(sig_nn)

#Code for part 3.c:

sig_linear = linear(3,test_signal)

print(sig_linear)

#Code for part 3.d:

up_nn = upsample_image(8, small_img)

up_linear = up_nn

rows, cols = up_nn.shape

##nn intropolation

for i in range(rows):

up_nn[i] = nn(8, up_nn[i])

for i in range(cols):

up_nn[:,i] = nn(8, up_nn[:,i])

plt.figure(figsize=(15,15))

plt.imshow(up_nn,’gray’)

#linear

for i in range(rows):

up_linear[i] = linear(8, up_linear[i])

for i in range(cols):

up_linear[:,i] = linear(8, up_linear[:,i])

plt.figure(figsize=(15,15))

plt.imshow(up_linear,’gray’)

[ 1. 1. 1. 10. 10. 10. 4. 4. 4.]

[ 1. 4. 7. 10. 8. 6. 4. 2. 1.]

<matplotlib.image.AxesImage at 0x7f9ebbdc0cc0>

Comments for 3.d:

Linear yields much better results than nearest neighbor. This could be due to the fact that linear interpolation calculates a sweeping average of the nearest neighbour, giving rise to a much smoother transitions between pixels.

Least Squares and Linear Regression

Many fields of signal processing work with optimization problems know as least squares problems. A least square problem may be stated many different ways. One common such way is as follows: suppose we have N input/output pairs where (xi,yi) is the i’th such pair, xi is P-dimensional and yi is a scalar. The least squares problem may be given by:

min

w

1

N

N

∑

i=1

(w⊤xi−yi)2

Intuitively, the least squares problem is asking for the vector w that minimizes the sum of our squared errors or, equivalently, our mean squared error (MSE) across all N input/output pairs where we estimate yi from ∑

p

j=1

x[p]w[p]. In any context, we would like to minimize the sum of our squared errors or equivalently, our mean squared error (MSE). In vectorized form, the least squares problem may be stated as

min

w

1

N

‖Xw−Y‖2,

where X is an N×P matrix, w is a length P vector, Y is a length N vector, and ‖⋅‖ is the L2 norm (Euclidean distance).

Linear Regression

One excellent example of a least squares problem is linear regression. Conceptually, linear regression seeks to find a “line of best fit” for a given set of data points. The image below illustrates a linear regression solution.

The notion of “best” line is defined by a least squares problem for N data points

min

w,b

1

N

N

∑

i=1

(xiw+b−yi)2,

where w is the slope of our line and b is our y-intercept and our input data is one-dimensional. We may again formulate our least squares problem in a vectorized form:

min

→

w

1

N

‖X

→

w

−Y‖2,

where

→

w

=[w,b] and X is augmented with a column of ones. Thus X looks like the following

X=[

x1 1

x2 1

⋮

xn 1

].

Take a couple minutes to verify the summation and vectorized formulations of the problems are equivalent! Furthermore, it is important to note the importance of the intercept b and the resulting augmentation of the matrix X to accommodate the intercept. Without this intercept, we would only be able to express solutions that pass through the origin of our cooridnate system. In this case, the intercept value allows us to express any line in the x−y plane. Recall from our discussion of adaptive filtering in Lab 6 that we may minimize our objective function by setting its gradient with respect to

→

w

to zero. This derivation is as follows:

∇

→

w

1

N

‖X

→

w

−Y‖2=

2

N

X⊤(X

→

w

−Y)=0

X⊤X

→

w

=X⊤Y

→

w

=(X⊤X)−1X⊤Y

Remember that X⊤ above refers to the transpose of the matrix X and X−1 refers to the inverse of a matrix. Thus, we have our solution vector in a relatively simple, closed form. Now, let’s try applying this solution to a linear regression problem!

Exercise 4: Linear Regression

We have provided four datasets in the files winter-data.csv, spring-data.csv, summer-data.csv, and fall-data.csv that capture the high and low temperatures of each day in Champaign, IL for each season in 2017. Therefore, each dataset has roughly 90 low, high temperature pairs. We would like to apply linear regression to predict the high temperature for a given day using only knowledge of the low temperature for that day (low temperatures are X, high temperatures are Y). Furthermore, it may be interesting to see which season’s low temperatures are most predictive of their high temperatures.

We have provided the code below that loads the data for each season. Note that the loaded data for the low temperatures already has a “1” appended to each row to augment the data for linear regression. We have also provided a function in visualize_solution() that plots the provided seasonal temperature data along with your linear regression solution. Refer to the below documentation on how to use this function.

a. Fill in the function linear_regression() which takes the low temperature data and high temperature data and computes the solution to the linear regression problem. Refer to the previous technical background which gives the closed form solution. Also, fill in the function mse() which takes the low and high temperature data and your linear regression solution to compute the mean squared error (MSE) for your solution. Note that the MSE is the same as evaluating the least-squares objective using your solution.

b. Compute the linear regression solution and corresponding mean squared error for Winter. Plot the data and your solution using the provided function and print the mean squared error.

c. Compute the linear regression solution and corresponding mean squared error for Spring. Plot the data and your solution and print the mean squared error.

d. Compute the linear regression solution and corresponding mean squared error for Summer. Plot the data and your solution and print the mean squared error.

e. Compute the linear regression solution and corresponding mean squared error for Fall. Plot the data and your solution and print the mean squared error.

f. In which season is it easiest to predict the high temperature from the low temperature? Which season is the hardest? Do these results surprise you? Do you have any theories or ideas to explain the results?

def load_temp_data(season_string):

file_name = season_string + ‘.npy’

data = np.load(file_name)

lows = np.zeros(data.shape)

lows[:,0] = data[:,0]

lows[:,1] = np.ones(data.shape[0]) #augment column of ones

highs = data[:,1]

return lows,highs

winter_lows,winter_highs = load_temp_data(‘winter’)

spring_lows,spring_highs = load_temp_data(‘spring’)

summer_lows,summer_highs = load_temp_data(‘summer’)

fall_lows,fall_highs = load_temp_data(‘fall’)

#Provided function

“””

Inputs:

low_data – One-augmented low temperature data for the given season.

high_data – High temperature data for the given season.

w – Solution vector for the given season (Should be length two!).

title – String for the title of your plot.

Example usage: visualize_solution(winter_low,winter_highs,winter_w,’Winter Solution and Data’)

“””

def visualize_solution(low_data,high_data,w,title):

if len(w) != 2:

print(‘Incorrect solution dimension!’)

return

min_low = np.min(low_data[:,0])

max_low = np.max(low_data[:,0])

x = np.linspace(min_low,max_low,1000)

y = w[0]*x + w[1]

plt.figure(figsize=(10,6))

plt.title(title)

plt.scatter(low_data[:,0],high_data,label=’Data’)

plt.plot(x,y,’r–‘,label=’Linear Regression’)

plt.xlabel(‘Low Temperatures’)

plt.ylabel(‘High Temperatures’)

plt.legend()

#Fill in these functions for part 4.a:

def linear_regression(low,high):

#Hint: use np.linalg.inv() to invert a matrix

return (np.linalg.inv(np.transpose(low)@low))@np.transpose(low)@high

def mse(low,high,w):

#Hint: use np.linalg.norm()

mse = (1 / len(low)) * np.linalg.norm([email protected] -high)**2

return mse

#Code for 4.b:

print(“MSE for Winter :”, mse(winter_lows,winter_highs,linear_regression(winter_lows,winter_highs)))

visualize_solution(winter_lows,winter_highs,linear_regression(winter_lows,winter_highs),”Winter Solution and Data”)

#Code for 4.c:

print(“MSE for Spring :”, mse(spring_lows,spring_highs,linear_regression(spring_lows,spring_highs)))

visualize_solution(spring_lows,spring_highs,linear_regression(spring_lows,spring_highs),”Spring Solution and Data”)

#Code for 4.d:

print(“MSE for Summer :”, mse(summer_lows,summer_highs,linear_regression(summer_lows,summer_highs)))

visualize_solution(summer_lows,summer_highs,linear_regression(summer_lows,summer_highs),”Summer Solution and Data”)

#Code for 4.e:

print(“MSE for Fall :”, mse(fall_lows,fall_highs,linear_regression(fall_lows,fall_highs)))

visualize_solution(fall_lows,fall_highs,linear_regression(fall_lows,fall_highs),”Fall Solution and Data”)

MSE for Winter : 69.71576908363453

MSE for Spring : 51.27564085101072

MSE for Summer : 19.454565150071687

MSE for Fall : 58.87896918882463

Comments for 4.f:

Summer is the easiest season to predict while winter is the hardest. This makes sense because the winter season has a lot more variability when determining the temperature such as wind chill or rainfall, while summer has less factors that determine the temperature variance. This observation is shown quantitatively in the MSE.