Design and Analysis of Algorithms: Divide and Conquer

Introduction

What is this?

Introduction to divide-and-conquer
Recurrences

Merge sort recurrence:

T(n) = 2T(n/2) + Θ(n) for n > 1, with T(1) = Θ(1)

We "solve" these by finding a closed-form equation that describes the recurrence but without recursion.
Solution: T(n) = Θ(n lg n)
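
As a reminder of where that recurrence comes from, here is a minimal merge sort sketch in Python (the function names are mine, not taken from the course code linked below): the two recursive calls account for the 2T(n/2) term, and the linear-time merge accounts for the Θ(n) term.

  # Minimal merge sort: the two recursive calls give the 2T(n/2) term,
  # and the linear-time merge gives the Theta(n) term.
  def merge_sort(a):
      if len(a) <= 1:              # base case: constant work
          return a
      mid = len(a) // 2
      return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

  def merge(left, right):
      out, i, j = [], 0, 0
      while i < len(left) and j < len(right):
          if left[i] <= right[j]:
              out.append(left[i]); i += 1
          else:
              out.append(right[j]); j += 1
      return out + left[i:] + right[j:]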

Methods:

  1. The substitution method
  2. The recursion-tree method
  3. The master method

Technicalities
We often omit floors, ceilings, and boundary conditions. For instance, if n is odd, we may say n / 2 anyway.

The maximum-subarray problem



The problem only makes sense for an array containing both negative and positive values: otherwise the answer is either the whole array or the single largest element.

Brute-force solution

Try every combination of two elements!
An n-choose-2 problem, so it is Ω(n²).
n choose 2 is about n²/2, since it equals n(n - 1)/2. So we can establish a lower bound by setting c = 1/3, for instance: n choose 2 is bounded from below by c·n² for all n ≥ 3.
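
In Python, the brute-force idea might look like this (a sketch, with a function name of my own): keeping a running sum makes each of the roughly n-choose-2 (low, high) pairs cost constant extra work, so the whole thing is Θ(n²).

  # Brute force: try every (low, high) pair -- about n-choose-2 of them.
  # Keeping a running sum makes each pair O(1), so this is Theta(n^2).
  def max_subarray_brute(a):
      best_low, best_high, best_sum = 0, 0, a[0]
      for low in range(len(a)):
          running = 0
          for high in range(low, len(a)):
              running += a[high]
              if running > best_sum:
                  best_low, best_high, best_sum = low, high, running
      return best_low, best_high, best_sum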

A transformation

We look at the problem differently: let's find the nonempty, contiguous subarray of our array whose values have the largest sum. We call this the maximum subarray.

A solution using divide-and-conquer

To solve this problem, we divide an array A into three subarrays, and ask what is the maximum subarray in each:

  1. From A[low] to A[midpoint - 1].
  2. Crossing the midpoint.
  3. From A[midpoint + 1] to A[high].

Problems 1 and 3 are simply this same problem on a smaller array! Problem 2 can be solved in linear time by finding the maximum subarray that ends at the midpoint and the maximum subarray that begins just after it, then joining the two.

The recurrence is the same as for merge sort.
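
Here is a sketch of the divide-and-conquer solution in Python (function names are mine, and I split the array as A[low..mid] and A[mid+1..high], so the boundaries differ slightly from the numbering above): the crossing case scans outward from the midpoint in linear time, and the two recursive calls are what give the merge-sort-shaped recurrence.

  # Divide and conquer: the best subarray lies entirely in the left half,
  # entirely in the right half, or crosses the midpoint.
  def max_crossing_subarray(a, low, mid, high):
      left_sum, running, max_left = float('-inf'), 0, mid
      for i in range(mid, low - 1, -1):        # scan left from the midpoint
          running += a[i]
          if running > left_sum:
              left_sum, max_left = running, i
      right_sum, running, max_right = float('-inf'), 0, mid + 1
      for j in range(mid + 1, high + 1):       # scan right from the midpoint
          running += a[j]
          if running > right_sum:
              right_sum, max_right = running, j
      return max_left, max_right, left_sum + right_sum

  def max_subarray(a, low, high):
      if low == high:                          # base case: one element
          return low, high, a[low]
      mid = (low + high) // 2
      left = max_subarray(a, low, mid)
      right = max_subarray(a, mid + 1, high)
      cross = max_crossing_subarray(a, low, mid, high)
      return max(left, right, cross, key=lambda t: t[2])

  # Usage: max_subarray(a, 0, len(a) - 1) returns a (low, high, sum) triple.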

Maximum sub-array video
Strassen's algorithm for matrix multiplication
Recursive Square-Matrix Multiply

We divide each of our initial matrices into four sub-matrices and multiply those -- which we do, in turn, by dividing each of them into four...

In the base case when each matrix has only one member, we just multiply them and return the result.

So what is our recurrence? Each step except the base case performs eight multiplications of n/2 × n/2 matrices, so those contribute 8T(n/2) to the running time. There are also four matrix additions of matrices containing n²/4 entries each -- squared because n specifies an n × n matrix -- so these contribute Θ(n²) time.

So our recurrence is:

T(n) = 8T(n/2) + Θ(n²)

The master method will show us that the solution to this recurrence is:
T(n) = Θ(n³)
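
A sketch of this recursive multiply in Python (the helper names, the list-of-lists representation, and the assumption that n is a power of 2 are all my own choices): the eight recursive calls are the 8T(n/2) term, and the block additions and reassembly are the Θ(n²) term.

  # Recursive square-matrix multiply (n assumed to be a power of 2).
  def mat_add(X, Y):
      return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

  def split(M):                      # the four n/2 x n/2 blocks of M
      n = len(M) // 2
      return ([row[:n] for row in M[:n]], [row[n:] for row in M[:n]],
              [row[:n] for row in M[n:]], [row[n:] for row in M[n:]])

  def mat_mul_recursive(X, Y):
      if len(X) == 1:                # base case: 1 x 1 matrices
          return [[X[0][0] * Y[0][0]]]
      A, B, C, D = split(X)          # eight recursive multiplications ...
      E, F, G, H = split(Y)
      top_left  = mat_add(mat_mul_recursive(A, E), mat_mul_recursive(B, G))
      top_right = mat_add(mat_mul_recursive(A, F), mat_mul_recursive(B, H))
      bot_left  = mat_add(mat_mul_recursive(C, E), mat_mul_recursive(D, G))
      bot_right = mat_add(mat_mul_recursive(C, F), mat_mul_recursive(D, H))
      # ... plus Theta(n^2) work to add and reassemble the blocks
      return ([l + r for l, r in zip(top_left, top_right)] +
              [l + r for l, r in zip(bot_left, bot_right)])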

Strassen's Algorithm



By doing ten extra matrix additions and subtractions, we can get away with seven recursive multiplications instead of eight.

Let's try one!

Here is the method:

For two matrices, written in terms of their n/2 × n/2 blocks:

  | A B |       | E F |
  | C D |  and  | G H |

Define:

P1 = A(F - H)
P2 = (A + B)H
P3 = (C + D)E
P4 = D(G - E)
P5 = (A + D)(E + H)
P6 = (B - D)(G + H)
P7 = (A - C)(E + F)
(The order of the factors matters, since matrix multiplication is not commutative.)

Then the four blocks of the product are:

  | P5 + P4 - P2 + P6     P1 + P2           |
  | P3 + P4               P5 + P1 - P3 - P7 |

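To make the bookkeeping concrete, here is a sketch for plain 2 × 2 numeric matrices (the function name is mine); in the real algorithm, A through H are n/2 × n/2 blocks, + and - are block addition and subtraction, and each product is a recursive call.

  # Strassen's seven products, shown on plain 2 x 2 numeric matrices.
  def strassen_2x2(M1, M2):
      (A, B), (C, D) = M1
      (E, F), (G, H) = M2
      P1 = A * (F - H)
      P2 = (A + B) * H
      P3 = (C + D) * E
      P4 = D * (G - E)
      P5 = (A + D) * (E + H)
      P6 = (B - D) * (G + H)
      P7 = (A - C) * (E + F)
      return [[P5 + P4 - P2 + P6, P1 + P2],
              [P3 + P4,           P5 + P1 - P3 - P7]]

  # Check against the definition of matrix multiplication:
  # strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]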
So let's try this example:



Important Lesson

There are often serious trade-offs between set-up time and asymptotic run-time. One must carefully consider how large one's inputs are likely to be before opting for a complex algorithm like Strassen's. On modern hardware optimized for matrix multiplication, matrix sizes often need to be in the thousands before Strassen's algorithm yields significant gains.

The substitution method for solving recurrences

Note: I am presenting these three methods in my notes in textbook order. But in lectures, I present the substitution method last, because we can best make sense of our "guess" for a solution if we understand the other two methods first. I suggest students tackle recursion-tree, then master method, and then substitution.

Towers of Hanoi



The monks in a temple have the job of moving all of the disks on one peg to another, constrained by these rules:
1) Only one disk can be moved at a time.
2) Each move consists of taking the upper disk from one of the stacks and placing it on top of another stack, i.e., a disk can only be moved if it is the uppermost disk on a stack.
3) No disk may be placed on top of a smaller disk.
(Source)



Recurrence:

T(n) = 2T(n - 1) + 1, with T(0) = 0

Why is this the recurrence? Well, to move a stack of n disks, we first move disks 1 through n - 1 to the spare peg (T(n - 1) moves), then move disk n to the target peg (1 move), then move disks 1 through n - 1 from the spare peg onto the target peg (another T(n - 1) moves).

So we guess the closed-form solution is something like 2^n. Why? Well, we multiply by a factor of 2 each recursion!
Now, let's try writing out a few elements of the sequence:
T(0) = 0
T(1) = 2*0 + 1 = 1
T(2) = 2*1 + 1 = 3
T(3) = 2*3 + 1 = 7
T(4) = 2*7 + 1 = 15
T(5) = 2*15 + 1 = 31
So is the answer 2^n - 1?
Base case: T(0) = 0 = 2^0 - 1.
Yes!
Now induction: we want to show that, if T(n - 1) = 2^(n-1) - 1, then T(n) will equal 2^n - 1.
How proof by induction works: we have proved our base case. Now we try to prove that for any n, if for n - 1 (or sometimes all smaller n) our hypothesis is true, then it is true for n as well. And since we have already proved that for n = 0 it is true, that will mean it will be true for all n whatsoever.
So we substitute in for n - 1:
T(n) = 2(2^(n-1) - 1) + 1
= 2 · 2^(n-1) - 2 + 1
= 2^n - 1
And we are done!
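
A quick numerical check of this result (a throwaway sketch; it only evaluates the recurrence, it does not simulate moving any disks):

  # Evaluate the recurrence T(n) = 2T(n - 1) + 1, T(0) = 0, and compare
  # it with the closed form 2^n - 1.
  def T(n):
      return 0 if n == 0 else 2 * T(n - 1) + 1

  for n in range(11):
      assert T(n) == 2**n - 1
  print("T(n) = 2^n - 1 holds for n = 0..10")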

Another substitution-method problem

Let's look at the recurrence:
T(n) = T(n/3) + T(2n/3) + n
T(0) = 1
T(1) = 1

This does not have the form the master method requires. And if we sketch a recursion tree, not every node is the same on a level, so it is different from what we usually deal with there. But if we do diagram a recursion tree, we will see that the work looks constant at each level, like a master theorem case 2. And since the function part of the equation is f(n) = n, let's "guess":
T(n) <= cn log n

But that is just our hunch: we have to prove it!

Let's look at base cases for our inductive proof. (Use floors for divisions!) Is T(n) ≤ cn log n for small values of n, say n = 2 and n = 3?

How many base cases do we need to examine? We will see!
But we can prove the claim for any given "small" n ≥ 2 by setting:
c ≥ T(n) / (n log n)

Inductive step:
We assume the claim is true for every k with 2 ≤ k < n.
Now we need to show that if the claim holds for sub-problems smaller than n, then it holds for n as well.
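
Here is a sketch of how that step can go, using the inductive hypothesis on the two sub-problems (floors omitted, logs base 2, and the constant at the end is just one choice that makes the final inequality work):

T(n) = T(n/3) + T(2n/3) + n
     ≤ c(n/3) log(n/3) + c(2n/3) log(2n/3) + n
     = cn log n - cn(log 3 - 2/3) + n
     ≤ cn log n,   as long as c ≥ 1 / (log 3 - 2/3) ≈ 1.1

So the guess T(n) = O(n log n) goes through, with the base cases determining how large c must be.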

The recursion-tree method for solving recurrences

There are two ways to use this method:

  1. As a way to generate a guess for the substitution method.
  2. As a way to generate a rigorous answer by itself.

Analyze the tree:



Calculate the work at each level:

Level i has 2^i nodes, and each does (n/2^i)² work, so the work at level i is n²/2^i.

This produces the geometric series:

n²(1 + 1/2 + 1/4 + 1/8 + ...)

If we set a = n² and r = 1/2, then we have the general sum of a converging geometric series:

a / (1 - r) = n² / (1 - 1/2) = 2n²

So the solution here is O(n²). The work is cut in half at each successive level, so the total over all levels is just a constant factor times the work at the root.

Consider these three examples:

  1. T(n) = 4T(n/2) + cn
  2. T(n) = 2T(n/2) + cn
  3. T(n) = 2T(n/2) + cn²
    (This is the recurrence in the diagram above.)

(We assume c > 0.)
Let's break down these cases:

T(n) = 4T(n/2) + cn¹

Level        # Nodes   Work at Node   Work at Level
0            1         n              n
1            4         n/2            2n
2            16        n/4            4n
3            64        n/8            8n
i            4^i       n/2^i          2^i · n
h = log₂ n   4^h       T(1)           4^h · T(1)

The runtime then is:

4^h T(1) + the sum of 2^i · n for i from 0 to h - 1

h = log₂ n,
so the first part equals:
4^(log₂ n) = n^(log₂ 4) = n²
We pull the n out of the sum, and what remains is an increasing geometric series that evaluates to n - 1. So the closed form for the recurrence is:
n² T(1) + n(n - 1)

The very last (bottom) level dominates, as it already has O(n²) complexity on its own.


T(n) = 2T(n/2) + cn¹

Level        # Nodes   Work at Node   Work at Level
0            1         n              cn
1            2         n/2            cn
2            4         n/4            cn
3            8         n/8            cn
i            2^i       n/2^i          cn
h = log₂ n   2^h       T(1)           2^h · T(1)

And so we get:

2^h T(1) + the sum of cn over levels 0 through h - 1

2^h = n, so the first part is n T(1).
The sum happens log n times, so we have cn · log n.
Altogether that is roughly n + n log n, and
n log n dominates.

All levels contributed equally.


T(n) = 2T(n/2) + cn²

Level        # Nodes   Work at Node   Work at Level
0            1         n²             n²
1            2         (n/2)²         n²/2
2            4         (n/4)²         n²/4
3            8         (n/8)²         n²/8
i            2^i       (n/2^i)²       n²/2^i
h = log₂ n   2^h       T(1)           2^h · T(1)

The runtime then is:

2^h T(1) + the sum of n²/2^i for i from 0 to h - 1

We pull the n² out of the sum and get a converging geometric series. Obviously, n² dominates, and it is the same as the work at the top level.
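
As a numerical sanity check on these three trees, here is a small sketch (my own; it fixes c = 1 and T(1) = 1 and uses integer division for n/2) that evaluates each recurrence directly and divides by the predicted growth rate; the ratios should settle near constants.

  # Evaluate the three example recurrences directly (c = 1, T(1) = 1)
  # and compare against the recursion-tree predictions.
  import math

  def T(n, a, f):
      return 1 if n <= 1 else a * T(n // 2, a, f) + f(n)

  for n in (2**6, 2**8, 2**10):
      t1 = T(n, 4, lambda m: m)        # recursion-tree prediction: Theta(n^2)
      t2 = T(n, 2, lambda m: m)        # prediction: Theta(n log n)
      t3 = T(n, 2, lambda m: m * m)    # prediction: Theta(n^2)
      print(n, t1 / n**2, t2 / (n * math.log2(n)), t3 / n**2)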


These all have the form: T(n) = aT(n / b) + f(n).
When we work these out, we see that in our 3 cases:

  1. The recursion is the dominant factor, and the bottom level has the Big-O complexity of the whole recurrence.
  2. The recursion and f(n) contribute equally to the run time, and all levels are equal.
  3. f(n) dominates, and the top level has the Big-O complexity of the whole recurrence.

This observation leads us to:

The master method for solving recurrences

Form: T(n) = aT(n / b) + f(n).
Where a ≥ 1 and b > 1 and f(n) asymptotically positive.

Three cases
Compare n^(log_b a) and f(n):

  1. f(n) = O(n^(log_b a - ε)) for some ε > 0
    Solution: T(n) = Θ(n^(log_b a))
  2. f(n) = Θ(n^(log_b a))
    Solution: T(n) = Θ(n^(log_b a) lg n)
  3. f(n) = Ω(n^(log_b a + ε)) for some ε > 0
    Solution: T(n) = Θ(f(n))
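
For instance, take the merge sort recurrence T(n) = 2T(n/2) + Θ(n): here a = 2 and b = 2, so n^(log_b a) = n^(log₂ 2) = n. Since f(n) = Θ(n) = Θ(n^(log_b a)), we are in case 2, and T(n) = Θ(n lg n), matching the solution given at the start of these notes.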

Restrictions:

  1. a ≥ 1
    We must divide the problem into at least one sub-problem.
    The method will work if a is not an integer, although this doesn't really make sense: how can we have 1.5 sub-problems?
  2. b > 1
    If we don't get smaller sub-problems, we will never finish! (n/b < n)
  3. Regularity condition for case 3: af(n/b) ≤ cf(n) for some constant c < 1 and all sufficiently large n.

    Notice that there is a gap between the cases: f(n) must lie polynomially below or above the "polynomial line" n^(log_b a), i.e., differ from it by a factor of at least n^ε.
    Consider:
    T(n) = 2T(n/2) + n log n
    n log n grows faster than n, but not by a polynomial difference:
    log n is asymptotically < n^ε for any ε > 0.
    A spreadsheet where you can see this for yourself.


    Detailed Example:
    T(n) = 4T(n/2) + (n²/log n)

    The first part gets us n^(log₂ 4) = n².
    And (n²/log n) = O(n²), since

    (n²/log n) / n² = 1 / log n,
    which goes to 0 as n goes to infinity.

    But is n² polynomially larger than (n²/log n), so that we are in case 1? That is, is there an ε we can subtract from 2 so that n^(2-ε) is still an upper bound for (n²/log n)?
    Is f(n) = O(n^(log_b a - ε))?

    Let's look at the ratio:
    (n²/log n) / n^(2-ε)

    That yields:
    n^ε / log n → ∞ as n → ∞.

    So (n²/log n) is not O(n^(2-ε)) for any ε > 0: it is smaller than n², but not polynomially smaller. We are not in case 1 (nor in case 2, since it is not Θ(n²)), so the master method does not apply to this recurrence.


Sample problems:

  1. T(n) = 2T(n / 4) + n²
  2. T(n) = 4T(n / 2) + n²
  3. T(n) = 20T(n / 4) + n
  4. T(n) = 3T(n / 3) + n³
  5. T(n) = 8T(n / 2) + 2n
Source Code

Java
Ruby
Go
Javascript
C++
Python
Clojure

For Further Study
Homework
  1. Implement the Towers of Hanoi in a programming language of your choice. Put in test code to count the number of moves. Print out the number of moves after you have solved the puzzle. Run the code and show that we have solved the recurrence correctly. What you hand in should be your source code plus several runs showing your results.
  2. Implement Strassen's method in a programming language of your choice.
  3. A well-known recurrence is the Fibonacci sequence. Please find the closed form for it, showing how you found it. (You can easily look up the form online, but please try to solve this yourself.)
  4. Find tight asymptotic bounds for the following recurrence:

Credits