# Jan 16. k-Selection algorithm

k-Selection

Given $A = \{ a_1, a_2, \ldots , a_n \}$ find the $k^{th}$ largest element.

A quicksort like approach:  Pick a pivot $p$ and partition into two arrays $A_1 = \{ a \mid a \le p \}$ and $A_2 = \{ a \mid a > p \}$. If $k \le |A_1|$ then recursively find the $k^{th}$ largest element in $A_1$, otherwise recursively find $k-|A_1|^{th}$ largest element in $A_2$. If $|A_1| = i$ and $|A_2| = n - i-1$. Then the recurrence relation becomes $T(n) = T(\max\{i,n-i-1\}) + O(n)$. We have the following two cases:

1. $p$ is arbitrary: In the worst case $i$ could be as small as $0$. Then we have $T(n) = T(n-1) + O(n)$ and so $T(n) = \Theta(n^2)$.
2. $p$ is picked randomly: Let $ET(n)$ denote the expected time taken on the list of size $n$. Then $ET(n) = ET(\max\{i,n-i-1\}) + c_1 n$ for some constant $c_1 > 0$. We observe that $ET(\max\{i,n-i-1\})$ $= \frac{2}{n} (T(n-1) + T(n-2) + T(n-3) + \cdots + T(\frac{n}{2}))$. To solve the recurrence, we guess that $ET(n) \le cn$ for some constant $c>0$. We claim that choosing $c = 4c_1$ suffices. Plugging this into the recurrence, we get $ET(n) \le$ $4c_1\frac{2}{n}(\frac{n^2}{4}+\frac{n}{4}(\frac{n}{2}-1)) + c_1n$ $\le 4c_1n$. Thus $ET(n) = O(n)$.

A better choice of pivot:

1. Divide the list into groups of $5$ elements
2. Find the median for each group of $5$ elements, let them be $b_1, b_2, \cdots, b_{n/5}$.
3. Find median of $b_1, \cdots, b_{n/5}$ recursively; let it be $m$.
4. Use $m$ as a pivot to partition $a_1, a_2, \cdots, a_n$
5. Recurse on appropriate side

Let $T(n)$ be the time to run k-select on a list of size $n$. Then step $3$ takes $T(n/5)$, while we claim in the worst case the size of the largest partition is at most $\frac{7n}{10}$. This is because the number of elements smaller than $m$ are all the $b_i's$such that $b_i \le m$ as well as all elements in the list smaller than each of these $b_i's$. Thus the number of elements smaller than $m$ is atleast $\frac{3n}{10}$ and so the number of elements larger than $m$ is atmost $\frac{7n}{10}$. The recurrence relation is therefore $T(n) = T(\frac{n}{5}) + T(\frac{7n}{10}) + c_1n$. Let us guess $T(n) = cn$ for some constant $c>0$. Thus setting $c=10c_1$ suffices,  giving us a running time of $T(n) = \Theta(n)$.