CS2040S: Data Structures and Algorithms

Problem Set 2

Overview. This problem set consists of two problems closely connected to binary search.

The first is an easy optimization problem: you are given an array, and your goal is to find the

largest element in the array.

The second is a more challenging load balancing problem: you have a collection of tasks and a

collection of processors, and your goal is to efficiently assign a sequence of tasks to each processor.

Collaboration Policy. You are encouraged to work with other students on solving these problems. However, you must write up your solution by yourself. We may, randomly, ask you

questions about your solution, and if you cannot answer them, we will assume you have cheated. In

addition, when you write up your solution, you must list the names of every collaborator, that is,

every other person that you talked to about the problem (even if you only discussed it briefly). You

can do so in Coursemology by adding a Comment. Any deviation from this policy will be considered

cheating, and will be punished severely, including referral to the NUS Board of Discipline.

1

Problem 1. Optimization

Here’s a simple problem—find the maximum element in an array! Implement the following function:

public static int searchMax(int[] dataArray)

Given an array dataArray where every element is unique, return the value of the largest integer in

the array.

There is one additional piece of information you need: the data in the array increases (or decreases)

from one end until it reaches its maximum (or minimum), and then decreases (or increases) until

it reaches the other end. The change in direction occurs at most once in the array.

More precisely, if the data array has size n, there is some index j such that the subarray dataArray[0..j]

is sorted (either increasing or decreasing) and the subarray dataArray[j..n-1] is sorted (either

increasing or decreasing).

For example, the following are possible input arrays:

[2, 5, 7, 9, 15, 23, 8, 6]

[73, 42, 13, 5, -17, -324]

[100, 42, 17, 3, 8, 92, 234]

Use the most efficient algorithm you can. State the running time of your algorithm.

To think about (Optional):

1. What is the best way to signal an error, e.g., if the data array is empty?

2. What if all the elements in the array are not unique? How efficiently can the problem be

solved then? What is the worst-case example?

3. This array changes direction once. What if it changes direction twice? Is the problem much

harder?

4. What if the array contains noisy data, so some of the entries in the array had +1 or −1 noise

randomly added/subtracted from their initial value (with probability 1/2 in each direction).

Can you solve the problem in that case? What would the expected running time be?

2

Problem 2. Balancing the Load

Your boss, Paul R. Allel, storms into the office, shouting, “The system is too slow! Our users are

sending me hate-tweets! Make it faster!” And then he stormed out again. What is he upset about?

Well, right now, every day, your system churns through calculating dense matrix inverses for secret

government agencies. And recently, there is so much work that one server just can’t keep up. So

you decide to parallelize: by buying more servers, you should be able to perform the computation

faster!

Today, you have m jobs, stored sequentially on disk, in order. Each job has a size: jobSizes[i] is

the size of job i.

You have p processors that can perform these jobs. Each processor can be assigned a consecutive

sequence of jobs, i.e., processor 2 might be assigned jobs [4, 5, 6, 7]. (For efficiency reasons1

, we do

not wants gaps: it would not be legal to assign processor 2 the jobs [4, 5, 7, 8].) Here, for example,

we have 10 tasks assigned to three processors:

0 1 2 3 4 5 6 7 8 9

p1 p2 p3

The total load of a processor is the sum of the job sizes, e.g., in the above case the load of processor

2 is:

jobSizes[4] + jobSizes[5] + jobSizes[6] + jobSizes[7]

Your goal in this problem is to find an assignment of jobs to processors in a manner that minimizes

the maximum load on any processor (that is, consider the processor with the highest load — we

want the load on this processor to be as small as possible). In fact, for today, we will just compute

the minimum possible load.

Problem 2.a. The first step is to figure out, given a specific target load, whether p processors

is sufficient. Design and implement the most efficient algorithm you can think of for the following

function:

static boolean isFeasibleLoad(int jobSizes[], int queryLoad, int p)

Your function should return true if it is possible to schedule the jobs on p processors in such a way

that no processor has more load than queryLoad. State the running time of your algorithm.

1This is obviously an artificial constraint that we have added to make the problem tractable. See part (c), below.

3

Problem 2.b. The second step is to determine the minimum achievable load. Suppose we have

5 processors and jobs with the following sizes:

[1, 3, 5, 7, 9, 11, 10, 8, 6, 4]

To ensure that the maximum load on any one processor is as small as possible, we can assign the

jobs to the 5 processors this way:

[1, 3, 5, 7] [9] [11] [10] [8, 6, 4]

Notice that the maximum load on any processor is 18 — that’s the load on the last processor (8 + 6

+ 4). With only 5 processors, this is the minimum achievable maximum load on any one processor,

we can’t go lower than this!

Design and implement the most efficient algorithm you can think of for the following function:

static int findLoad(int jobSizes[], int p)

The function should return the minimum possible load for the given jobSizes and p. State the

running time of your algorithm.

Problem 2.c. Optional, theory. In fact, the constraint that each processor can only handle

a consecutive set of jobs simplifies the problem, but in many practical systems is not necessary.

Unfortunately, even for p = 2, the problem is NP-hard if you allow an arbitrary assignment of

jobs to processors. However, there is a very simple algorithm that guarantees that no processor is

assigned more than twice the minimum possible load. (That is, it computes a 2-approximation of

optimal.) Give a simple algorithm that guarantees a 2-approximation, and prove that it is correct.

Hint 1: Think about a greedy algorithm that assigns each job to the least loaded processor.

Hint 2: Notice two facts about the “optimal” solution: its maximum load has to be at least as big

as the biggest individual job, and its maximum load has to be at least as big as the sum of the job

sizes divided by p.

4