In this post I present a simpler proof to derive expected asymptotic bounds for the

* Randomized Selection * algorithm --

Las Vegas version to be more precise. Given a set $S = \{ a_1\,a_2\,a_3\ldots a_n \}$ of keys and an integer $k \in [1,n]$ the

* selection problem * asks to report the $k^{th}$ largest key in the set $S$. Recall that this problem can be solved optimally using $O(n)$ comparison operations using the BFPRT (Blum, Floyd, Pratt, Rivest and Tarjan 1973) algorithm. However a correct implementation of BFPRT algorithm may take a lot of code. On the other hand randomized algorithms for many deterministic algorithms are often simpler to implement. The algorithm chooses a random pivot for partitioning the input and recursively picks on of the partition until the partition (sub problem) size is just one. Following is an implementation of

**RandSelect** in around 25 lines.

Our goal here is the analyze the

** expected run-time ** and

** expected number of steps ** for the problem size converge to

* unity * -- more precisely the expected value of the variable

**rand_walk_steps** in the following code. I have purposefully kept the analysis very rigorous to make it accessible to everyone.

TL;DR "Expected number of comparisons in

**RandSelect** is $\Theta(n)$ and expected number of

**rand_walk_steps** is $\Theta(\log(n))$".

// Randomized selection algorithm to pick k^th largest element
// in a mutable array.
template < typename T >
T RandSelect(T *array, size_t n, size_t k) {
assert(k > 0 && k <= n);
size_t problem_size = n, select = k;
// What is the expected value of rand_walk_steps ??
size_t rand_walk_steps = 0;
while (problem_size > 1) {
size_t i = 0, j = problem_size - 1;
size_t ridx = RandArrayIndex(problem_size);
const T pivot = array[ridx];
// Partition the sub problem with pivot.
while (i <= j) {
if (array[i] <= pivot) {
i++;
} else if (array[j] > pivot) {
j--;
} else {
std::swap(array[i], array[j]);
}
}
problem_size = (i >= select) ? i : problem_size - i;
array = (i >= select) ? array : &array[i];
select = (i >= select) ? select : select - i;
++rand_walk_steps;
}
return array[0];
}

Let $X_i$ be the random variable corresponding to problem size in $i^{th}$ step ($1 \geq i \geq n-1$) in the algorithm -- each step corresponds to one iteration of the outer most

**while** loop. If the algorithm terminates in $j$ steps then all $X_k$ with $k>j$ are set to $0$.

$$

\begin{array}{lll}

X_i &=& \text{Random variable for problem size in }i^{th}\text{ iteration of the outer most}\textsf{ while}\text{ loop} \\

X_i && \left\{ \begin{array}{lr} \in [1,X_{i-1}-1] & \textsf{If } X_{i-1} > 1 \\

= 0 & \textsf{Otherwise}

\end{array} \right. \\

X_0 &=& n \,\,\,\,\,\,\, \text{Initial problem size and }\,\,\, E[X_0] = n \\

&& \\

\end{array}

$$

** Lemma-1: ** If $X_{i-1} >1$ Then $E[X_i | X_{i-1}] = \frac{X_{i-1}}{2}$

* Proof: * For a given $X_{i-1} > 1$ the random variable $X_{i}$ is uniformly distributed in the range $[1,X_{i-1}-1]$. Hence the expected value is simply the average $\frac{1 + (X_{i-1}-1)}{2} = \frac{X_{i-1}}{2}$ $\Box$.

** Lemma-2: ** $E[X_i] = \frac{n}{2^i}$

* Proof: *

$$

\begin{array}{llll}

&E[X_i|X_{i-1}] &=& \frac{X_{i-1}}{2}\,\,\,\,\,\text{From Lemma-1}\\

&E[\,E[X_i|X_{i-1}]\,] &=& E[\frac{X_{i-1}}{2}] \\

\Rightarrow & E[X_i] &=& E[\frac{X_{i-1}}{2}] = \frac{E[X_{i-1}]}{2}\,\,\,\,\,\text{for any two random variables } X,Y\,\,E[E[X|Y]] = E[X]\\

&&=&\frac{1}{2}\times E[X_{i-1}] = \frac{1}{2^2}\times E[X_{i-2}] \ldots \,\,\,\,\, \text{Apply recursively} \\

&&\ldots& \\

&&=& \frac{1}{2^i}\times E[X_{i-i}] = \frac{E[X_0]}{2^i} = \frac{n}{2^i} \,\, \Box\\

\end{array}

$$

** Theorem-1: ** Expected number of comparisons done by `RandSelect` is $\Theta(n)$.

* Proof: * The number comparisons done by the algorithm in each iteration is $\leq 3\times X_i$. We at most $2$ comparisons in the inner ` if..else` branch and $1$ comparison to test $i <= j$, so a total of $3$ comparisons in the each iteration of the inner most `while` loop. The loop runs at most $X_i$ times in the $i^{th}$ iteration of the outer most ` while` loop. Finally we need $1$ comparison to test the termination of the `while` loop. Let $T$ be the random variable corresponding the total number of comparisons in the algorithm. So we need to find $E[T]$.

$$

\begin{array}{ll}

&T \leq 3(X_0 + X_1 + X_2 + \ldots X_{n-1}) + 1 & \text{Total number of comparisons in the Algorithm. }\\

&&\\

\Rightarrow& \sum_{i=0}^{n-1} X_i \leq T \leq 3\sum_{i=0}^{n-1} X_{i} + 1 & \\

& \sum_{i=0}^{n-1} E[X_i] \leq E[T] \leq 3\sum_{i=0}^{n-1} E[X_{i}] + 1 & \text{Linearity of expectation.} \\

&&\\

& \sum_{i=0}^{n-1} \frac{n}{2^i} \leq E[T] \leq 3\sum_{i=0}^{n-1} \frac{n}{2^i} + n & \text{Apply Lemma-2.} \\

& 2n\left(1-\frac{1}{2^n}\right) \leq E[T] \leq 6n\left(1-\frac{1}{2^n}\right) +n & \text{Use: } \sum_{i=0}^{n-1}\frac{1}{2^i} = \frac{1-1/2^n}{1-1/2}.\\

&&\\

\Rightarrow &n \leq E[T] < 6n + 1 = 6n +1 & \text{Use: for } n\geq 1\,\,\,,\frac{1}{2} \leq \left(1-\frac{1}{2^n}\right) < 1 \\
&&\\
\Rightarrow & E[T] = \Theta(n)\,\,\Box & \\
\end{array}
$$

We now return the question of the expected number of steps for the problem size to reach unity (i.e. expected number of iterations of the outer most

`while` loop). This is also referred as the expected number of

*recursive* calls in Motwani and Raghavan's book. However the following analysis is much simpler than the application of a calculus based lemma based on random walks in the book. Let $Y_{i}$ be a random variable corresponding to the number of iterations starting with a problem size of $X_{i} > 1$.

$$

\begin{array}{lll}

Y_{i} &:=&\text{random variable for the number of steps with problem size }\,\, X_{i} >1 \\

Y_i &=& 1 + Y_{i+1} \,\,\,\,\,\, \text{(Recursive formulation)}\\

Y = Y_0 &=& \underbrace{1 + 1 + 1 \dots + 1}_{k \text{-times}} + Y_{k} \,\,\,\,\,\, \text{(Expand } k \text{ times)}\\

&& \\

\end{array}

$$

** Theorem-2: ** Expected number of steps for the problem size to reach unity, $E[Y] = \Theta(\log(n))$.

* Proof: *

$$

\begin{array}{ll}

&Y_i \leq X_{i}\,\,\,\, \text{ (Number of steps are at most problem size at } i^{th} \text{ step which is } X_i)\\

& \\

\Rightarrow &E[Y_i] \leq E[X_i] \,\,\,\, \text{( Let } Z = X_i-Y_i\,\,\,\,,\,\,\,\, E[Z] = E[X_i]-E[Y_i] \geq 0 \text{)}\\

&E[Y_i] \leq E[X_i] = \frac{n}{2^i} \,\,\,\, \text{ (Apply Lemma-2)} \\

&Y = Y_0 = \underbrace{1 + 1 + 1 \dots + 1}_{t \text{-times}} + Y_{t} \\

\Rightarrow & E[Y] = \underbrace{E[1] + E[1] \dots + E[1]}_{t \text{-times}} + E[Y_{t}] = t + E[Y_{t}] \\

&E[Y] = t + E[Y_{t}] \leq t + E[X_{t}] \leq t + \frac{n}{2^t} \\

&\\

\Rightarrow &t \leq E[Y] \leq t + \frac{n}{2^t} \,\,\,\, \text{After } t \text{ steps }\\

& \\

&\text{When } t = \log_2(n) \text{ the above inequality becomes the following } \\

& \log_2(n) \leq E[Y] \leq \log_2(n) + \frac{n}{2^{log_2(n)}} = \log_2(n) +1 \\

\Rightarrow & E[Y] =\Theta(log(n)) \,\, \Box \\

\end{array}

$$

This completes our expected analysis of the

**RandSelect** algorithm. The analysis just uses elementary probability theory without relying on a Calculus based random walk theorem in the

* "Randomized Algorithms" * book by Motwani and Raghavan -- (

**Theorem-1.3 ** on page-15).