Project 3: Panoramic Mosaicing

What to Submit

Submit this iPython Notebook–containing all your code for the programming exercises below–on learning suite.

Your notebook file should produce the relevant plots and also provide a short write-up with answers to the questions below.

Please also fill in here the time that each part took you:

Part A: FILL IN TIME

Part B: FILL IN TIME

Part C: FILL IN TIME

Part D: FILL IN TIME

Write-up: FILL IN TIME

Programming Exercise

For this assignment, you will be writing a program that creates a panoramic mosaic from 2 or more images. In general this technique should be applicable to any number of photographs. The approach described below will work well for collective fields of up to 90 or even 120°, but won’t produce ideal results for large fields of view approaching or surpassing 180°. For large fields of view cylindrical or spherical projection is required.

When we construct a panorama, we assume that all of the photographs were taken from the exact same location and that the images are related by pure rotation (no translation of the camera). The easiest way to create the panorama is to project all of the photos onto a plane. One photo must be selected (either manually or by your program) to be the base photo. The other photos are aligned to this base photo by identifying a homography (a planar warp specified by 4 pairs of source/destination points) relating each pair. Each of the other images is appropriately warped and composited onto the plane (the base image doesn’t need to be warped).

In describing what you need to do, there will be a running example using the three photos below:

Part A: Find Interest Points/Descriptors in each Input Image

We will be using OpenCV for this project, which you should already have installed. However, you may need to install the contrib version–which comes seperate due to the SIFT algorithm being patented–by running the command:pip install opencv-contrib-python. A good tutorial on how to use SIFT features in OpenCV is found here. The first step to registering or aligning two images is to identify locations in each image that are distinctive or stand out. The sift.detectAndCompute() routine produces both these interest points and their corresponding SIFT descriptors. The first step of producing a panorama is to load all of the relevant images and find the interest points and their descriptors.

See the red circles on each image below indicating the sift keypoints that were found (note that we downsampled the images to 600 x 600 pixels before extracting SIFT). We scaled the circles according to the scale at which each keypoint was detected at.

Part B: Matching Features

Next, given the features present in each image, you need to match the features so as to determine corresponding points between adjacent/overlapping images. This page provides details to do feature matching using cv2.BFMatcher(), analogous to the approach proposed by David Lowe in his original implementation. Be aware that the resulting match is one directional. You want to find putative pairs–pairs of points which are each other’s best match (e.g. there might be 3 points in image I1 for which a point q in image I2 are the best match, only one of these could be the best matching point p in I1 for that point q in I2). In this part you need to compute the set of putative matches between each pair of images.

Look at the pairs of images and the lines showing the estimated matches (putative matches are green lines, one way matches are cyan or blue).

Part C: Estimating Pairwise Homographies using RANSAC

Use the RANSAC algorithm (Szeliski, Ch 6.1.4), estimate the homography between each pair of images. You will need to decide whether you’re going to manually specify the base image or determine in programmatically. Along with identifying the base image, you need to figure out the order in which you will composite the other images to the base.

You will need 4 pairs of points to estimate a homography. Begin by randomly sampling sets of 4 pairs and estimating the corresponding homography for each set. Instead of the two warping equations that we used earlier in the semester, it is recommended that you use a 3×3 homography (8 unknowns). You are trying to estimate the homography

⎡⎣⎢adgbehcf1⎤⎦⎥

such that a point (xs,ys) in the source image is tranformed to a point (xt,yt) in the target image as follows

⎡⎣⎢xtyt1⎤⎦⎥=⎡⎣⎢adgbehcf1⎤⎦⎥⎡⎣⎢xsys1⎤⎦⎥

Each pair of points will produce three linear equations in (a subset of) the 8 unknowns. For example, xt=xsa+ysb+c . Four pairs of points (assuming no degeneracies) are sufficient to estimate the homography. A more robust solution relying on more than four pairs can be obtained using least squares on the overconstrained linear system (solving Ax=b , where x is a column vector with the 8 unknowns and you populate rows of A and an entry of b with the linear equations just described). Note that this solution will not always be better.

Because of the homogeneous coordinates, the three equations can be reduced to two equations as follows:

xt=axs+bys+cgxs+hys+1,yt=dxs+eys+fgxs+hys+1

For more details, see the image alignment and stitching slides.

Below you will find a visualization of the RANSAC estimated homographies. Images 1, 2, and 3 have dots that are red, green and blue respectively (sorry the dots are a little small), representing the putative pairs. You can see where the homographies line up very well and in a few places (the middle vertically) they line up slightly less well.

Part D: Creating the Mosaic

Begin with the base image and warp the remaining images (using the estimated homographies) to composite them onto the base image.

For the ongoing campus example, here are the resulting warped images composited.

And, then with a very simple (but not ideal) compositing operation.

Part A: Find Interest Points/Descriptors

import numpy as np

import cv2 as cv

import matplotlib.pyplot as plt

from random import sample

img1 = cv.imread(“Images/width600/campus1_sq600.png”)[:, :, ::-1] # query image

img2 = cv.imread(“Images/width600/campus2_sq600.png”)[:, :, ::-1] # train image 2

img3 = cv.imread(“Images/width600/campus3_sq600.png”)[:, :, ::-1] # train image 3

gray1 = cv.cvtColor(img1, cv.COLOR_BGR2GRAY)

gray2 = cv.cvtColor(img2, cv.COLOR_BGR2GRAY)

gray3 = cv.cvtColor(img3, cv.COLOR_BGR2GRAY)

sift = cv.SIFT_create()

kp1, des1 = sift.detectAndCompute(gray1, None)

kp2, des2 = sift.detectAndCompute(gray2, None)

kp3, des3 = sift.detectAndCompute(gray3, None)

# Show an example output here

plt.imshow(cv.drawKeypoints(img1, kp1, np.zeros((600,600,3)))), plt.show()

(<matplotlib.image.AxesImage at 0x12ae86580>, None)

Part B: Matching Features

bg = cv.BFMatcher(crossCheck=True)

matches1 = bg.knnMatch(des2, des1, k=1)

matches3 = bg.knnMatch(des2, des3, k=1)

img1b = cv.drawMatchesKnn(img1, kp1, img2, kp2, matches1, None, flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

matches1 = list(map(lambda x: x[0], list(filter(lambda x: len(x) == 1, matches1))))

matches3 = list(map(lambda x: x[0], list(filter(lambda x: len(x) == 1, matches3))))

# Show an example output here

plt.imshow(img1b), plt.show()

(<matplotlib.image.AxesImage at 0x12be68070>, None)

Part C: Estimating Pairwise Homographies using RANSAC

def matrix_a_rows(match, kp_base, kp_prime):

pt = kp_base[match.queryIdx].pt

pt_prime = kp_prime[match.trainIdx].pt

row1 = [pt_prime[0], pt_prime[1], 1, 0, 0, 0, -1*pt[0]*pt_prime[0], -1*pt[0]*pt_prime[1]]

row2 = [0, 0, 0, pt_prime[0], pt_prime[1], 1, -1*pt[1]*pt_prime[0], -1*pt[1]*pt_prime[1]]

return row1, row2

def x_y_prime(match, kp_base):

pt_prime = kp_base[match.queryIdx].pt

return pt_prime[0], pt_prime[1]

def get_matrix_c(matches_sample, keypoints):

result = []

for match in matches_sample:

x, y = x_y_prime(match, keypoints)

result.append([x])

result.append([y])

return np.matrix(result)

def x_trans(x, y, a, b, c, g, h, i=1):

return (a*x + b*y + c)/(g*x + h*y + i)

def y_trans(x, y, d, e, f, g, h, i=1):

return (d*x + e*y + f)/(g*x + h*y + i)

def rand_homography(matches, kp_base, kp_prime):

matches_sample = sample(matches, 4)

# four-point algorithm is AB = C => B = inv(A)C

row1, row2 = matrix_a_rows(matches_sample[0], kp_base, kp_prime)

row3, row4 = matrix_a_rows(matches_sample[1], kp_base, kp_prime)

row5, row6 = matrix_a_rows(matches_sample[2], kp_base, kp_prime)

row7, row8 = matrix_a_rows(matches_sample[3], kp_base, kp_prime)

try:

matrix_a = np.matrix([row1, row2, row3, row4, row5, row6, row7, row8])

matrix_a_inv = np.linalg.inv(matrix_a)

matrix_c = get_matrix_c(matches_sample, kp_base)

matrix_b = np.matmul(matrix_a_inv, matrix_c)

a, b, c, d, e, f, g, h = matrix_b[0,0], matrix_b[1,0], matrix_b[2,0], matrix_b[3,0], matrix_b[4,0], matrix_b[5,0], matrix_b[6,0], matrix_b[7,0]

return a, b, c, d, e, f, g, h

except Exception as e:

print(e)

return 0, 0, 0, 0, 0, 0, 0, 0

# k is the number of iterations

# t is our threshold for how off a pixel can be

def best_homography(matches, kp_base, kp_prime, k=1000, t=50):

# best homography so far

a, b, c, d, e, f, g, h = 0,0,0,0,0,0,0,0

# best number of pixels that fit the model (inliers)

best_d = 0

for i in range(k):

temp_a, temp_b, temp_c, temp_d, temp_e, temp_f, temp_g, temp_h = rand_homography(matches, kp_base, kp_prime)

local_d = 0

for match in matches:

pt = kp_prime[match.trainIdx].pt

pt_prime_actual = kp_base[match.queryIdx].pt

tentative_x = x_trans(pt[0], pt[1], temp_a, temp_b, temp_c, temp_g, temp_h)

tentative_y = x_trans(pt[0], pt[1], temp_d, temp_e, temp_f, temp_g, temp_h)

if abs(tentative_x – pt_prime_actual[0]) > t:

continue

if abs(tentative_y – pt_prime_actual[1]) > t:

continue

local_d += 1

if local_d > best_d:

a, b, c, d, e, f, g, h = temp_a, temp_b, temp_c, temp_d, temp_e, temp_f, temp_g, temp_h

best_d = local_d

print(best_d)

return a, b, c, d, e, f, g, h

# Show an example output here

a1, b1, c1, d1, e1, f1, g1, h1 = best_homography(matches1, kp2, kp1)

homography1 = np.matrix([[a1, b1, c1], [d1, e1, f1], [g1, h1, 1]])

76

84

289

290

292

Singular matrix

Singular matrix

a3, b3, c3, d3, e3, f3, g3, h3 = best_homography(matches3, kp2, kp3)

homography2 = np.matrix([[a3, b3, c3], [d3, e3, f3], [g3, h3, 1]])

23

33

52

57

91

101

248

267

269

270

def mapped_coordinates(x, y, homography):

mapped = np.matmul(homography, np.matrix([[x],[y],[1]]))

return mapped[0,0], mapped[1,0]

# Image1

img1_upper_left_x, img1_upper_left_y = mapped_coordinates(0, 0, homography1)

img1_lower_left_x, img1_lower_left_y = mapped_coordinates(0, 599, homography1)

img1_upper_right_x, img1_upper_right_y = mapped_coordinates(599, 0, homography1)

img1_lower_right_x, img1_lower_right_y = mapped_coordinates(599, 599, homography1)

img1_furthest_left_x = min(img1_upper_left_x, img1_lower_left_x)

img1_furthest_right_x = max(img1_upper_right_x, img1_lower_right_x)

img1_furthest_top_y = min(img1_upper_left_y, img1_upper_right_y)

img1_furthest_bot_y = max(img1_lower_left_y, img1_lower_right_y)

img1_mapped_width = abs(img1_furthest_left_x) + abs(img1_furthest_right_x)

# Image3

img3_upper_left_x, img3_upper_left_y = mapped_coordinates(0, 0, homography2)

img3_lower_left_x, img3_lower_left_y = mapped_coordinates(0, 599, homography2)

img3_upper_right_x, img3_upper_right_y = mapped_coordinates(599, 0, homography2)

img3_lower_right_x, img3_lower_right_y = mapped_coordinates(599, 599, homography2)

img3_furthest_left_x = min(img3_upper_left_x, img3_lower_left_x)

img3_furthest_right_x = max(img3_upper_right_x, img3_lower_right_x)

img3_furthest_top_y = min(img3_upper_left_y, img3_upper_right_y)

img3_furthest_bot_y = max(img3_lower_left_y, img3_lower_right_y)

img3_mapped_width = abs(img3_furthest_left) + abs(img3_furthest_right)

Part D: Creating the Mosaic

canvas_width = int(abs(img1_furthest_left_x) + abs(img1_furthest_right_x-img3_furthest_left_x) + img3_furthest_right_x)

total_furthest_top_y = min(img1_furthest_top_y, 0, img3_furthest_top_y)

total_furthest_bot_y = max(img1_furthest_bot_y, 599, img3_furthest_bot_y)

canvas_height = int(abs(total_furthest_top_y) + total_furthest_bot_y)

canvas = np.zeros((canvas_height, canvas_width, 3))

canvas = np.array(canvas, dtype=int)

canvas_x_offset = int(abs(img1_furthest_left_x))

canvas_y_offset = int(abs(total_furthest_top_y))

canvas[canvas_y_offset:canvas_y_offset+600, canvas_x_offset:canvas_x_offset+600] = img2

trans_matrix = np.matrix([[1, 0, canvas_x_offset], [0, 1, canvas_y_offset], [0,0,1]])

img1_canvas = cv.warpPerspective(img1, trans_matrix*homography1, (canvas_width, canvas_height))

img3_canvas = cv.warpPerspective(img3, trans_matrix*homography2, (canvas_width, canvas_height))

imgs_1_3_canvas = cv.addWeighted(img1_canvas, 1, img3_canvas, 1, 0)

imgs_1_3_canvas = np.array(imgs_1_3_canvas, dtype=int)

imgs_1_2_3_canvas = cv.addWeighted(imgs_1_3_canvas, 0.5, canvas, 0.5, 0)

plt.imshow(imgs_1_2_3_canvas), plt.show()

(<matplotlib.image.AxesImage at 0x12bed91c0>, None)

Final Results and Improvements

# Output results for additional images here

# Feel free to add as many cells as you wish

Grading

To get 100% you need to (i) implement RANSAC and additionally (ii) either implement the feature matching yourself (instead of using built-in matching functions such as cv2.BFMatcher()), or incorporate one of the following improvements:

A nice clean compositing/blending approach so that edges/artifacts are not noticeable.

Automatic selection of which image should be the base

Handling more than 3 photos

Another enhancement approved by Dr. Farrell

Points for this assigment will be assigned as follows (100 points total):

[10 pts] Extracting features from both images (interest points and descriptors).

[20 pts] Four-point algorithm to estimate homographies.

[30 pts] RANSAC implemented (partial points given for poor alignments).

[20 pts] Images warped appropriately (aligning on top of each other).

[10 pts] Clean final image (extents of merged image should fit the enscribed rectangle).

[10 pts] Implementing matching or other improvement (see above). Bonus points may be given for additional enhancements.

Write-up:

Provide an explanation for the following items:

In what scenarios was it difficult to get good alignments between images?

If you have any suggestions for how to improve this project in the future, list them here.