1. (15 pts) Bellatrix Lestrange is writing a secret message to Voldemort and wants to prevent it from being understood by meddlesome young wizards and Muggles. She decides to use Huﬀman encoding to encode the message. Magically, the symbol frequencies of the message are given by the Pell numbers, a famous sequence of integers known since antiquity and related to the Fibonacci numbers. The nth Pell number is deﬁned as Pn = 2Pn−1 + Pn−2 for n 1 with base cases P0 = 0 and P1 = 1. (a) For an alphabet of Σ = {a,b,c,d,e,f,g,h} with frequencies given by the ﬁrst |Σ| non-zero Pell numbers, give an optimal Huﬀman code and the corresponding encoding tree for Bellatrix to use. (b) Generalize your answer to (1a) and give the structure of an optimal code when the frequencies are the ﬁrst n non-zero Pell numbers.

2. (30 pts) A good hash function h(x) behaves in practice very close to the uniform hashing assumption analyzed in class, but is a deterministic function. That is, h(x) = k each time x is used as an argument to h(). Designing good hash functions is hard, and a bad hash function can cause a hash table to quickly exit the sparse loading regime by overloading some buckets and under loading others. Good hash functions often rely on beautiful and complicated insights from number theory, and have deep connections to pseudorandom number generators and cryptographic functions. In practice, most hash functions are moderate to poor approximations of uniform hashing.

Consider the following hash function. Let U be the universe of strings composed of the characters from the alphabet Σ = [A,…,Z], and let the function f(xi) return the index of a letter xi ∈ Σ, e.g., f(A) = 1 and f(Z) = 26. Finally, for an m-character string x ∈ Σm, deﬁne h(x) = ([Pm i=1 f(xi)] mod `), where ` is the number of buckets in the hash table. That is, our hash function sums up the index values of the characters of a string x and maps that value onto one of the ` buckets.

(a) The following list contains US Census derived last names: http://www2.census.gov/topics/genealogy/1990surnames/dist.all.last Using these names as input strings, ﬁrst choose a uniformly random 50% of these name strings and then hash them using h(x). Produce a histogram showing the corresponding distribution of hash locations when ` = 200. Label the axes of your ﬁgure. Brieﬂy describe what the ﬁgure shows about h(x), and justify your results in terms of the behavior of h(x). Do not forget to append your code. Hint: the raw ﬁle includes information other than name strings, which will need to be removed; and, think about how you can count hash locations without building or using a real hash table.

(b) Enumerate at least 4 reasons why h(x) is a bad hash function relative to the ideal behavior of uniform hashing. (c) Produce a plot showing (i) the length of the longest chain (were we to use chaining for resolving collisions under h(x)) as a function of the number n of these strings that we hash into a table with ` = 200 buckets, (ii) the exact upper bound on the depth of a red-black tree with n items stored, and (iii) the length of the longest chain were we to use a uniform hash instead of h(x). Include a guide of cn Then, comment (i) on how much shorter the longest chain would be under a uniform hash than under h(x), and (ii) on the value of n at which the red-black tree becomes a more eﬃcient data structure than h(x) and separately a uniform hash.

3. (15 pts) Draco Malfoy is struggling with the problem of making change for n cents using the smallest number of coins. Malfoy has coin values of v1 < v2 < ··· < vr for r coins types, where each coin’s value vi is a positive integer. His goal is to obtain a set of counts{di}, one for each coin type, such thatPr i=1 di = k and where k is minimized. (a) A greedy algorithm for making change is the cashier’s algorithm, which all young wizards learn. Malfoy writes the following pseudocode on the whiteboard to illustrate it, where n is the amount of money to make change for and v is a vector of the coin denominations:

wizardChange(n,v,r) : d[1 .. r] = 0 // initial histogram of coin types in solution while n 0 { k = 1 while ( k < r and v[k] n ) { k++ } if k==r { return ’no solution’ } else { n = n – v[k] } } return d

Hermione snorts and says Malfoy’s code has bugs. Identify the bugs and explain why each would cause the algorithm to fail. (b) Sometimes the goblins at Gringotts Wizarding Bank run out of coins,1 and make change using whatever is left on hand. Identify a set of U.S. coin denominations for which the greedy algorithm does not yield an optimal solution. Justify your

1It’s a little known secret, but goblins like to eat the coins. It isn’t pretty for the coins, in the end

answer in terms of optimal substructure and the greedy-choice property. (The set should include a penny so that there is a solution for every value of n.) (c) (8 pts extra credit) On the advice of computer scientists, Gringotts has announced that they will be changing all wizard coin denominations into a new set of coins denominated in powers of c, i.e., denominations of c0,c1,…,c` for some integers c 1 and ` ≥ 1. (This will be done by a spell that will magically transmute old coins into new coins, before your very eyes.) Prove that the cashier’s algorithm will always yield an optimal solution in this case. Hint: ﬁrst consider the special case of c = 2.