What is this?
Merge sort recurrence:
T(n) = 2T(n/2) + Θ(n) for n > 1; T(1) = Θ(1)
We "solve" these by finding a closed-form equation that
describes the recurrence but without recursion.
Solution: T(n) = Θ(n lg n)
Methods: substitution, recursion-tree, and the master method.
Technicalities
We often omit floors, ceilings, and boundary conditions. For
instance, if n is odd, we may say n / 2 anyway.
Only makes sense in an array with both negative and positive
values: otherwise the answer is either the whole array or the
maximum member.
Try every combination of two elements!
An n-choose-2 problem, so it is of order Ω(n^2).
n choose 2 will be about (1/2)n^2, since it equals
n(n - 1)/2. So we can establish a lower bound by setting c =
1/3, for instance: for n ≥ 3, n choose 2 will always be bounded
from below by c·n^2.
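To make this concrete, here is a minimal brute-force sketch in Python (the function name and test array are my own illustration, not from the notes). Keeping a running sum while extending the right endpoint keeps the total work at Θ(n^2) rather than Θ(n^3):

```python
# Brute force: try every (lo, hi) endpoint pair -- the "n choose 2" approach.
def max_subarray_brute(A):
    """Return (best_sum, lo, hi) over all nonempty contiguous subarrays."""
    best_sum, best_lo, best_hi = A[0], 0, 0
    for lo in range(len(A)):
        running = 0
        for hi in range(lo, len(A)):
            running += A[hi]            # sum of A[lo..hi], extended one step
            if running > best_sum:
                best_sum, best_lo, best_hi = running, lo, hi
    return best_sum, best_lo, best_hi

print(max_subarray_brute([1, -4, 3, -1, 2]))   # (4, 2, 4): the subarray [3, -1, 2]
```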
We look at the problem differently: let's find the nonempty, contiguous subarray of our array whose values have the largest sum. We call this the maximum subarray.
To solve this problem, we divide the array A[low..high] at its midpoint mid, and ask for the maximum subarray in each of three places:
1. entirely within A[low..mid];
2. crossing the midpoint mid;
3. entirely within A[mid + 1..high].
Problems 1 and 3 are simply this same problem on a smaller array!
Problem 2 can be solved in linear time by finding the maximum
subarray ending at mid and the maximum subarray starting at
mid + 1, then joining them.
The recurrence is the same as for merge sort.
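A sketch of that divide-and-conquer scheme in Python (CLRS-style low/mid/high indices; the helper names are mine). The crossing helper is the linear-time solution to problem 2:

```python
def find_max_crossing(A, low, mid, high):
    """Best subarray spanning the midpoint: the best suffix of the left
    half joined to the best prefix of the right half."""
    left_sum = right_sum = float('-inf')
    total, max_left, max_right = 0, mid, mid + 1
    for i in range(mid, low - 1, -1):          # best subarray ending at mid
        total += A[i]
        if total > left_sum:
            left_sum, max_left = total, i
    total = 0
    for j in range(mid + 1, high + 1):         # best subarray starting at mid + 1
        total += A[j]
        if total > right_sum:
            right_sum, max_right = total, j
    return max_left, max_right, left_sum + right_sum

def find_max_subarray(A, low, high):
    if low == high:                            # base case: one element
        return low, high, A[low]
    mid = (low + high) // 2
    left = find_max_subarray(A, low, mid)          # problem 1
    right = find_max_subarray(A, mid + 1, high)    # problem 3
    cross = find_max_crossing(A, low, mid, high)   # problem 2
    return max(left, right, cross, key=lambda t: t[2])

print(find_max_subarray([1, -4, 3, -1, 2], 0, 4))  # (2, 4, 4)
```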
We divide each of our initial matrices into four
sub-matrices and multiply those, which we do by dividing
each of them into four...
In the base case when each matrix has only one member, we
just multiply them and return the result.
So what is our recurrence? Each step except the base case
performs eight multiplications of matrices of size n/2, so
these contribute 8T(n/2) to the running time. There are also
four matrix additions of matrices containing n^2/4 entries
each -- squared because n specifies an n x n matrix. So the
additions contribute Θ(n^2) time.
So our recurrence is:
T(n) = 8T(n/2) + Θ(n^2)
The master method will show us that the solution to this
recurrence is:
T(n) = Θ(n^3)
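A minimal sketch of the eight-multiplication recursion, assuming n is a power of two (numpy is used only for convenient quadrant slicing; the names are mine):

```python
import numpy as np

def recursive_matmul(X, Y):
    n = X.shape[0]
    if n == 1:                                  # base case: 1x1 matrices
        return X * Y
    h = n // 2
    A, B, C, D = X[:h, :h], X[:h, h:], X[h:, :h], X[h:, h:]
    E, F, G, H = Y[:h, :h], Y[:h, h:], Y[h:, :h], Y[h:, h:]
    # Eight recursive multiplications give the 8T(n/2) term; the four
    # matrix additions give the Θ(n^2) term.
    top = np.hstack([recursive_matmul(A, E) + recursive_matmul(B, G),
                     recursive_matmul(A, F) + recursive_matmul(B, H)])
    bottom = np.hstack([recursive_matmul(C, E) + recursive_matmul(D, G),
                        recursive_matmul(C, F) + recursive_matmul(D, H)])
    return np.vstack([top, bottom])

X = np.arange(16).reshape(4, 4)
print(np.array_equal(recursive_matmul(X, np.eye(4, dtype=int)), X))   # True
```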
By adding ten additions, we can cut the divide portion of
our algorithm down to seven multiplications instead of
eight.
Let's try one!
Here is the method:
For two matrices X and Y, each partitioned into four sub-matrices:
X = [A B; C D], Y = [E F; G H]
Define:
P1 = A(F - H)
P2 = (A + B)H
P3 = (C + D)E
P4 = D(G - E)
P5 = (A + D)(E + H)
P6 = (B - D)(G + H)
P7 = (A - C)(E + F)
Then the four quadrants of the product XY are:
top-left = P5 + P4 - P2 + P6
top-right = P1 + P2
bottom-left = P3 + P4
bottom-right = P1 + P5 - P3 - P7
So let's try this example:
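Here is a one-level sketch of the scheme in Python (numpy's @ stands in for the recursive calls; the quadrant names follow the definitions above, and the test matrices are my own):

```python
import numpy as np

def strassen_step(X, Y):
    h = X.shape[0] // 2                     # assumes an even dimension
    A, B, C, D = X[:h, :h], X[:h, h:], X[h:, :h], X[h:, h:]
    E, F, G, H = Y[:h, :h], Y[:h, h:], Y[h:, :h], Y[h:, h:]
    # Seven multiplications instead of eight...
    P1 = A @ (F - H)
    P2 = (A + B) @ H
    P3 = (C + D) @ E
    P4 = D @ (G - E)
    P5 = (A + D) @ (E + H)
    P6 = (B - D) @ (G + H)
    P7 = (A - C) @ (E + F)
    # ...then reassemble the four quadrants of the product.
    return np.vstack([np.hstack([P5 + P4 - P2 + P6, P1 + P2]),
                      np.hstack([P3 + P4, P1 + P5 - P3 - P7])])

X = np.random.randint(0, 10, (4, 4))
Y = np.random.randint(0, 10, (4, 4))
print(np.array_equal(strassen_step(X, Y), X @ Y))   # True
```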
Important Lesson
There are often serious trade-offs between set-up time and
asymptotic run-time. One must carefully consider how large
one's inputs are likely to be before opting for a complex
algorithm like Strassen's. On modern hardware optimized for
matrix multiplication, matrix sizes often need to be in the
thousands before Strassen's algorithm yields significant
gains.
Note: I am presenting these three methods in my notes in textbook order. But in lectures, I present the substitution method last, because we can best make sense of our "guess" for a solution if we understand the other two methods first. I suggest students tackle recursion-tree, then master method, and then substitution.
The monks in a temple have the job of moving all of the
disks on one peg to another, constrained by these rules:
1) Only one disk can be moved at a time.
2) Each move consists of taking the upper disk from
one of the stacks and placing it on top of another stack;
i.e., a disk can only be moved if it is the
uppermost disk on a stack.
3) No disk may be placed on top of a smaller disk.
Recurrence:
T(n) = 2T(n - 1) + 1; T(0) = 0
Why is this the recurrence? Well, to move disk n,
we first move disks 1 to n - 1 to the spare peg,
then move n to the target peg, then move disks 1
to n - 1 to the target peg.
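The procedure the recurrence is counting, as a short Python sketch (the names are mine):

```python
def hanoi(n, source, target, spare, moves):
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # T(n - 1): clear the top n - 1 disks
    moves.append((n, source, target))           # + 1: move disk n
    hanoi(n - 1, spare, target, source, moves)  # T(n - 1): stack them back on disk n

moves = []
hanoi(3, 'A', 'C', 'B', moves)
print(len(moves))   # 7 moves, i.e. 2^3 - 1
```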
So we guess the closed-form solution is something like
2^n. Why? Well, we multiply by a factor of 2 each
recursion!
Now, let's try writing out a few elements of the sequence:
T(0) = 0
T(1) = 2*0 + 1 = 1
T(2) = 2*1 + 1 = 3
T(3) = 2*3 + 1 = 7
T(4) = 2*7 + 1 = 15
T(5) = 2*15 + 1 = 31
So is the answer 2^n - 1?
Base case: T(0) = 0 = 2^0 - 1.
Yes!
Now induction: we want to show that, if T(n - 1) =
2^(n - 1) - 1, then T(n) will equal
2^n - 1.
How proof by induction works: we have proved our base case.
Now we try to prove that for any n, if for n - 1 (or
sometimes all smaller n) our
hypothesis is true, then it is true for n as well. And since we
have already proved that for n = 0 it is true, that will
mean it will be true for all n whatsoever.
So we substitute in for T(n - 1):
T(n) = 2(2^(n - 1) - 1) + 1
     = 2·2^(n - 1) - 2 + 1
     = 2^n - 1
And we are done!
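A quick sanity check of the closed form against the recurrence (a spot check, not a proof):

```python
def T(n):
    return 0 if n == 0 else 2 * T(n - 1) + 1

print(all(T(n) == 2**n - 1 for n in range(20)))   # True
```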
Let's look at the recurrence:
T(n) = T(n/3) + T(2n/3) + n
T(0) = 1
T(1) = 1
This does not have the form the master method
requires. And if we sketch a recursion tree, not every
node is the same on a level, so it is different from
what we usually deal with there. But if we do
diagram a recursion tree, we will see that the work
looks constant at each level, like a master theorem
case 2. And since the function part of the equation is
f(n) = n, let's "guess":
T(n) <= cn log n
But that is just our hunch: we have to prove it!
Let's look at base cases for our inductive proof. (Use floors
for divisions!) Is T(1) ≤ c·1·log 1 = 0? No: T(1) = 1, so the
guess fails at n = 1.
How many base cases do we need to examine? We will see!
But we can prove it for any given "small" n ≥ 2 by
setting:
c ≥ T(n) / (n log n)
Recursion step:
We assume that the claim is true for all k where
2 ≤ k < n.
Now, we need to show that if for sub-problems smaller
than n the claim is true, then it is true for n.
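Filling in the algebra (a sketch that ignores floors, per the technicalities note above), we substitute the inductive hypothesis into both recursive terms:

T(n) ≤ c(n/3) lg(n/3) + c(2n/3) lg(2n/3) + n
     = c(n/3)(lg n - lg 3) + c(2n/3)(lg n + 1 - lg 3) + n
     = cn lg n - cn(lg 3 - 2/3) + n
     ≤ cn lg n

The last step holds whenever c ≥ 1/(lg 3 - 2/3) ≈ 1.1, so the guess goes through.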
There are two ways to use this method: to generate a good guess that we then verify with the substitution method, or, done carefully, to produce a direct proof of a bound.
Analyze the tree:
Calculate the work at each level:
This produces the geometric series:
n^2 + n^2/2 + n^2/4 + ... + n^2/2^i + ...
If we set a = n^2 and r = 1/2, then we have the
general sum of a converging geometric series:
a + ar + ar^2 + ... = a/(1 - r) = 2n^2
So the solution here is O(n^2). The amount of work at
each level is reduced by a power of two, and so the total is
just a constant factor times the root.
Consider these three examples:
(We assume c > 0.)
Let's break down these cases:
T(n) = 4T(n/2) + cn^1

| Level | # Nodes | Work at Node | Work at Level |
|---|---|---|---|
| 0 | 1 | n | n |
| 1 | 4 | n/2 | 2n |
| 2 | 16 | n/4 | 4n |
| 3 | 64 | n/8 | 8n |
| i | 4^i | n/2^i | 2^i·n |
| h = log_2 n | 4^h | T(1) | 4^h·T(1) |
The runtime then is:
T(n) = 4^h·T(1) + (2^0 + 2^1 + ... + 2^(h - 1))·n
h = log_2 n
so the first part equals:
4^(log_2 n)·T(1) = n^(log_2 4)·T(1) = n^2·T(1)
We pull the n out of the sum, and what remains is
an increasing geometric series that evaluates to 2^h - 1 = n - 1.
So the closed form for the recurrence is:
n^2·T(1) + n(n - 1)
The very last level dominates, as it already has O(n^2)
complexity.
T(n) = 2T(n/2) + cn^1

| Level | # Nodes | Work at Node | Work at Level |
|---|---|---|---|
| 0 | 1 | n | cn |
| 1 | 2 | n/2 | cn |
| 2 | 4 | n/4 | cn |
| 3 | 8 | n/8 | cn |
| i | 2^i | n/2^i | cn |
| h = log_2 n | 2^h | T(1) | 2^h·T(1) |
And so we get:
2^h = n
so the bottom level contributes n·T(1). The cn of work
happens once per level, log n times, so the levels contribute
cn·log n. In total:
n + cn log n
The n log n term dominates.
All levels contributed equally.
T(n) = 2T(n/2) + cn^2

| Level | # Nodes | Work at Node | Work at Level |
|---|---|---|---|
| 0 | 1 | n^2 | n^2 |
| 1 | 2 | (n/2)^2 | n^2/2 |
| 2 | 4 | (n/4)^2 | n^2/4 |
| 3 | 8 | (n/8)^2 | n^2/8 |
| i | 2^i | (n/2^i)^2 | n^2/2^i |
| h = log_2 n | 2^h | T(1) | 2^h·T(1) |
The runtime then is:
T(n) = 2^h·T(1) + n^2 + n^2/2 + n^2/4 + ... + n^2/2^(h - 1)
We pull the n^2 out of the sum, and we get a converging
geometric series that sums to less than 2. Obviously, n^2
dominates. And it is the same as the top level.
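A rough numeric check of all three cases (an illustration, not a proof; the helper names are mine): compute T(n) exactly from each recurrence and divide by the claimed bound, and the ratios settle near constants as n grows:

```python
from functools import lru_cache
import math

def make_T(a, f):
    """Build an exact evaluator for T(n) = a*T(n/2) + f(n), T(1) = 1."""
    @lru_cache(maxsize=None)
    def T(n):
        return 1 if n <= 1 else a * T(n // 2) + f(n)
    return T

T1 = make_T(4, lambda n: n)       # leaves dominate:  expect ~ n^2
T2 = make_T(2, lambda n: n)       # all levels equal: expect ~ n log n
T3 = make_T(2, lambda n: n * n)   # root dominates:   expect ~ n^2

n = 2**16
print(T1(n) / n**2, T2(n) / (n * math.log2(n)), T3(n) / n**2)
```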
These all have the form: T(n) = aT(n / b) + f(n).
When we work these out, we see that in our 3 cases: the leaves
dominate in the first, every level contributes equally in the
second, and the root dominates in the third.
This observation leads us to:
Form: T(n) = aT(n / b) + f(n).
Where a ≥ 1, b > 1, and f(n) is asymptotically positive.
Three cases
Compare n^(log_b a) and f(n):
1. If f(n) = O(n^(log_b a - ε)) for some ε > 0, then T(n) = Θ(n^(log_b a)).
2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a)·lg n).
3. If f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and if a·f(n/b) ≤ c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
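For example, for merge sort, a = 2, b = 2, f(n) = Θ(n): n^(log_2 2) = n matches f(n), so case 2 gives Θ(n lg n). For the naive recursive matrix multiplication above, a = 8, b = 2, f(n) = Θ(n^2): n^(log_2 8) = n^3 is polynomially larger than f(n), so case 1 gives Θ(n^3).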
Restrictions: f(n) must be polynomially smaller (case 1) or larger (case 3) than n^(log_b a); there are gaps between the cases where the method does not apply.
Sample problems: