Algorithms and Data Structures (ECE 345)

ASSIGNMENT 2

1. Assignments must be submitted by 5:00PM EST on the due date through the Quercus submission system as a single

PDF file.

2. Assignments must be completed individually except the programming exercise for which you can work in group

of up to two students. Report for the programming exercise must be submitted separately at the same time. Only

one report is needed for each group.

3. All pages must be numbered and no more than a single answer for any question.

4. Use LATEX or Microsoft Word for your assignment writeup. You can find a LATEX and a Word template for writing

your assignment on Quercus.

5. Any problem encountered with submission must be reported to the head TA as soon as possible.

6. Unless otherwise stated, you need to justify the correctness and complexity of algorithms you designed for the

problems.

EXERCISE 1 Another me?!, 10 points

Every UofT student has his/her unique UTORid. A UTORid is an eight-character (lowercase) alphanumeric string. We

call two UTORids similar if one can be converted to the other by replacing one single character (but not adding or

deleting). e.g., abcd0123 is similar to abcde123 or abcd0124, but not similar to abcd1023.You are given a set of n

UTORids, and your task is to find every UTORid in the set which has a similar UTORid in the set. Describe a worst-case

O(n) algorithm to complete your task. Justify your answer.

EXERCISE 2 Linear Time Sorting, 10 points

You are given an unsorted array A of b

√

nc integers ranging from −n to n. The absolute value of each element is a square

number. In other words, each of them is either of the form i

2 or −i

2

, where i is an integer. Devise an algorithm to sort A in

O(

√

n) time using only O(

√

n) memory. Partial credit will be awarded if your algorithm requires more time or memory

than what is specified above.

EXERCISE 3 AVL Trees, 10 points

Your friend Quincy and you established a very unreliable communication protocol: He consistently sends messages to

you in the form of (k, v), where k is the timestamp when he sends you the message, and v is the content of the message

represented by an integer value. We call two messages (k1, v1) and (k2, v2) you received out of order if k1 < k2 but

v1 > v2. Assume all messages have distinct k values and distinct v values, describe an O(logn) algorithm to determine

whether a new message is received out of order, where n is the number of messages you have already received.

1

EXERCISE 4 Hashing, 10 points

Given a list of n requests r1,··· ,rn, each request ri contains the ID of the server to which the request was sent. Your task

now is to find the first request in the list which is sent to a unique server, or report that no such request exists. Design an

algorithm that runs in expected O(n) time. Justify your answer.

EXERCISE 5 Hashing in Depth, 20 points

Suppose we have a hash table with m slots, and we have n values to hash into the hash table. Suppose m >> n and the

hashing function we use satisfies the simple uniform hashing assumption.

(a) Give a lower bound for the probability that a specific slot is empty after hashing n values. Your answer should be

given in the form of c0 +c1

n

m

, where c0 and c1 are constants. (Hint: You may want to use Binomial expansion).

(b) Give a lower bound for the probability that a specific slot stores only one value after hashing n values. Your answer

should be given in the form of f1(n)

m +

f2(n)

m2

, where f1 and f2 are polynomial functions of n.

(c) Give an upper bound for the probability that a specific slot stores at least two value after hashing n values. (Hint:

Use part (a) and (b)).

EXERCISE 6 Programming Exercise, 30 points

(In)secure Password Storage/Checking: In this programming assignment, you will experiment with simple and (in)secure

password storage/checking, and you will measure some performance statistics.

1. Outline:

• Assume that you will be given a file containing a list of existing passwords. The name of the file will be

passwords.txt, and the format will be one password per line without comma separators. Assume the file

will reside in the same path as your executables. A sample file will be provided to you on Blackboard. Your

program will need to parse the file and hash the passwords into a hash table. You may assume that your

program will be tested on files with 100−10000 passwords.

• Each password in passwords.txt will be alphanumeric without any complex characters such as !, ?,

#, &, etc. It will only consist of a mix of lower and/or upper case English alphabet letters with digits ∈ {0,…,9}. Each password will be 6 to 12 characters long. For example, ece345LoVe is a valid

password, whereas HATEece345HATE is invalid due to length violation (and probably other reasons…!), and

Pa33word!!?? is invalid due to containing complex characters.

• You are free to decide how to implement your hashing. Use only methods described in lecture. You are not

allowed to use cryptographic libraries such as OpenSSL. Security is not our concern in this exercise.

• Your executable should be named as checkpass, and it’s behavior should be as follows:

– It should be callable from command line as checkpass passwords.txt password, where passwords.txt

is a file containing passwords and password is user provided. For example: checkpass passwords.txt

23CdB9a13

For C/C++ please include a Makefile to build your sources into the executable. For any other compiled

language, please follow the same format as for C/C++.

For Java please create a Makefile that compiles your sources and a checkpass.sh script that runs your

code for the given input, with the correct classpath.

For Python please name your script checkpass.py, have #!/usr/bin/env pythonX (where X is the version of python you are using) as the first line of your file. Additionally please do chmod +x checkpass.py

so that you can run your file as ./checkpass.py passwords.txt password. For any other scripting

language (supported by the UG machines), please follow the same format as for python, except name

your file as per that language (ie for perl use checkpass.pl)

2

– Your program then should print in standard output VALID if and only if the password is valid according

to the constraints described above AND the password does not already exist in passwords.txt AND

the reverse of the password does not exist in passwords.txt. Think how to make the check for reverse

passwords efficient. In any other case your program will print INVALID.

– If the password entered is valid then your program should hash it and store the password (not its hash)

into passwords.txt, and then it should terminate. If the password is not valid then the program should

simply terminate after outputting INVALID.

2. Deliverables:

• Your full source code. Any code that does not build or does not run will receive a mark of 0. All code will be

marked on the UG machines. This should be handed in electronically.

• A written report with readable graphs (with proper lables, grid ticks, and legends as needed). The report must

address the following points:

– Implementation details regarding your hashing approach. What is the size of your hash table and which

hash function are you using? Are you applying chaining or open addressing? If you are using open

addressing state what probing sequence you are implementing.

– A brief justification of your implementation decisions above.

– For a fixed number of entries n = 1000, vary the size of your table to get various load factors. For each

load factor, run your code on five different password.txt files which you will create. Each file should

include 1000 password entries. Plot the average number of collisions vs. load factor. You will need

several values of load factor to see a trend on your plot. Briefly explain what you observe in the plot and

why.

NOTE: To set up this experiment, you can easily create another version of your program where there is

no password input from the command line. You only need to parse the passwords.txt files that you

will create and insert the entries in your hashtable. However, DO NOT submit this version of the code.

• A version of submission.toml file that contains your group information and details about your source files.

You can reuse the one from previous assignments. This should be handed in electronically.

3

Sale!