Algorithm Analysis

Algorithm Analysis measure the speed of the task algorithm suppose to accomplished. Here the necessary maths back ground to analyse the algorithmic complexity.

CONTENTS



There are two basic rules in the Algorithm analysis:

Rule of sum: if an action is performed making A choices or B choices, then it can be performed \(A+B\) ways.

Rule of product: If an action can be performed by making A choices followed by B choices, then it can be performed \(A\times B\) ways.

Permutation: an ordered list where every object appears exactly once

\[
_{n}P_{r} = \frac {n!} {(n - r)!} \text{ where } !0 = 1
\]

combinations: When order is not important and the repetition is not allowed, the number of ways to choose k from the distinct n is as for \(n \ge k \ge 0\):

\[
{n \choose k }= \frac {n!}{ k!(n-k)! }
\]

in general, choosing k out of n is same as not choose n-k out of n.

\[
{n \choose k} = {n \choose n - k}
\]

According to the Pascal's Triangle:

\[
{n \choose k} = {n -1 \choose k} + {n -1 \choose k-1}
\]

Again from the pascal triangle:

\[
{n \choose 0}+{n \choose 1}+ {n \choose 2}+ ... + {n \choose n} = 2^{n}\\
\sum^n_{k=0}{n \choose k} = 2^{n}
\]

ordered repeating
sequence \(n^k\) yes yes
Permutation \(_{n}P_{r}\) yes no
Multisubset \(\left(\!\!{n\choose k}\!\!\right)\) no yes
combination \(_{n}C_{k}\) or \(n \choose k\) no no

It is sufficent to implement \(n!\).: the programming construct is recursion because \( n !=n(n-1)!\).

12 -fold way

This is the mathematical basis for the Algorithm Analysis.
Here D - distinct and I - identical. In the following table, if \(x \gt b\) then all the values of the at most are zeros. if \(b \gt x\), then all the values of the at least are zeros.

A B any at most at least
D D \(b^{a}\) \(_{b}P_{a}\) \(S(a,b)!b\)
I D \(\binom {a+b-1} {a}\) \(_{b}C_{a}\) \(x - 1 \choose b-1\)
D I \(\sum ^{b}_{k=1} S(a,k)\) 1 if \(a \le b\) else 0 \(S(a,b)\)
I I \(\sum ^{b}_{k=1} p_{k}(a)\) 1 if \(a \le b\) else 0 \(p_{b}(a)\)

In the above table, Stirling numbers of the second kind \(S(a,b)\) count the number of ways to partition a set of a elements into b nonempty subsets. Stirling numbers of the first kind \(s(a,b)\) count permutations of a objects with exactly b cycles.

In the above table, \(p_{b}(a)\) is the number of ways to partition the number a as the sum of b positive numbers since the order don't matter. For \(1 \lt k \lt n\):

\[
p_{k}(n) =p_{k-1}(n-1) + p_{k}(n-k)
\]

Multi-choosing \(\left(\!\!{n\choose k}\!\!\right)\) is the number of ways to choose k objects from a set of n objects where order is not important but repetition is allowed.

\[
\left(\!\!{n\choose k}\!\!\right)={{n+k-1}\choose{k}}
\]

Simplification,

\[
\sum^m_{k=0}{\left(\!\!{n\choose k}\!\!\right)} = {\left(\!\!{n+1\choose m}\!\!\right)}
\]

Multinomial theorem where \(a, b,..,c > 0 \text{ and } a+b+ ... +c = n\)

\[
(x+y+...+z)^n = \sum {n \choose a,b,..c}x^ay^b...z^c
\]

Principle of Inclusion-Exclusion

\[
\left\vert{A\cup B \cup C}\right\vert = \left\vert{A}\right\vert + \left\vert{B}\right\vert + \left\vert{C}\right\vert - \left\vert{A \cap B}\right\vert - \left\vert{A \cap C}\right\vert - \left\vert{B \cap C}\right\vert \\+ \left\vert{A \cap B \cap C}\right\vert
\]

Formula for the sterling number:

\[
S(a,b) = \frac {1}{b!} \sum ^{b}_{j=0} (-1) ^j{y\choose j} (4-j)^a
\]

Java

There are two choices for the intermediate represetiation

  1. portable machine language
  2. graph based

Initially Java was created only with byte code interpreter. But current Java has Just In Time (JIT) compiler.

There are two popular representations:

  1. Stack based - 0 operand
  2. Register based - 3 operand

Java uses Stack based representation.

If you consider the for single loop in Java

for (int i =0; i < N; i++){
  ...
}

The running time cost of the above loop is \(N\) means linear. The running time cost of the double loop is quadratic. Triple loop running time is cubic.The worst running time can be something like \(2^{N}\) exponential.

Order of the Growth

In the Algorithm Analysis, fortunately there are limited models to consider. Here the growth from best running time to worst:

  1. \(\log N\)
  2. \(N\) (Knuth shuffle)
  3. \(N \log N\) (Mergesort, QuickSort)
  4. \(N^{2}\) (selection =\(N^{2}/2\), insertion = 1/4 \(N^{2}\) )
  5. \(N^{3}\)
  6. \(2^{N}\)

Each instance of the java.util.ArrayList has the capacity, when reach to the capacity, the array need to be increased. Assume, each time ArrayList is double when it reach to the capacity then the equation is

\[
2+4+8+16 +...+N\sim 3N
\]

This is because ArrayList is using java array of Object as a implementation. Advantages are that every operation takes constant time of Amortised time( Average running time per operation over the worst case sequence of operations) and less wasted space compared to linked list implementation because linked list is based on object.

Algorithm Analysis of Sorting

There java.lang.Comparable<T> and java.util.Comparator<T> is based on the total order which is a binary relation that satisfy:

  • Antisymmetry: if \(v \leq w\) and \(w \leq v\) , then \(v = w\)
  • Transitivity: if \(v \leq w\) and \(w \leq x\) , then \(v \leq x\)
  • Totality: either \(v \leq w\) or \(w \leq v\) or both

Quicksort is little bit faster than Mergesort because Quicksort doesn't exchange the elements always. However, quick sort wort case running time is quadratic (\(N^{2}\)) and average case is \(1.39 N \log N\). Quciksort random shuffle is the probabilistic guarantee to avoid worst case.

The java.utils.Arrays sort() method is using Quicksort for the primitives and Mergesort for the objects.

Dijkstart 3-way partitioning is the way to compromise with the duplicate keys because lower bound is reduced linearithmic to linear for most of the applications as follows:

\[
\lg\left( \dfrac {N!}{x_{1}!x_{2}!\ldots x_{n}!}\right) \sim \sum ^{n}_{i=1}x_{i}\lg\dfrac {x_{i}}{N}
\]

Algorithm Worst Average Best Remarks
selection \(N^{2}/2\) \(N^{2}/2\) \(N^{2}/2\) N exchanges
insertions \(N^{2}/2\) \(N^{2}/4\) \(N\) small N or partially
shell ? ? N tight code
merge \(N \lg N\) \(N \lg N\) \(N \lg N\) \(N \lg N\) guaranteed
quick \(N^{2}/2\) \(2N \lg N\) \(N \lg N\) \(N \lg N\) probabilistic guarantee
3-way quick \(N^{2}/2\) \(2N \lg N\) \(N\) support duplicate keys
Heapsort \(2N \lg N\) \(2N \lg N\) \(N \lg N\) In-place algo. Inner loop is longer than quicksort. Poor use of cache memory. Not stable.

Algorithm Analysis of Searching

Binary Search Tree (BST) is a binary tree(BT) in symmetric order. BT can be either empty or two disjoint binary trees at left and right(left/right nodes can be null).

In the BST every node has a key:

  • Larger than all keys in its left subtree
  • smaller than all keys in its right subtree

For the N number of distinct values in random order, the number of comparisons are \(2 \ln N\).

Red-Black trees

The java.util.TreeMap is based on the Red-black tree. Here some of the characteristics of the Red-Black tree:

  • Represents 2-3 tree as
  • use internal left-leaning link to glue three nodes which is red link
  • no node has two red links
  • every path from root to null link there are same number of black links.

Appendix

quadratic

\[
1+2+\ldots +N\Rightarrow \sum ^{n}_{i=1}i\sim \int ^{n}_{x=1}xdx\sim \dfrac {1}{2}N^{2}
\]

logarithmic ( \(N \ln N\) is called linearithmic)

\[
1+1/2+1/3+\ldots +1/N\Rightarrow \sum ^{.N}_{i=1}\dfrac {1}{i}\sim \int ^{N}_{x=1}\dfrac {1}{x}dx=\ln N
\]

cubic ( \(N^{2}\) is called quadratic)

\[
\sum ^{N}_{i=1}\sum ^{N}_{j=i}\sum ^{N}_{k=j}1\sim \int ^{N}_{x=1}\int ^{N}_{y=x}\int ^{N}_{z=y}dzdydx\sim
\dfrac {1}{6}N^{3}
\]

Comments

Popular posts from this blog

How To: GitHub projects in Spring Tool Suite

Spring 3 Part 7: Spring with Databases

Parse the namespace based XML using Python