Jan 16. k-Selection algorithm


Given A = \{ a_1, a_2, \ldots , a_n \} find the k^{th} largest element.

A quicksort like approach:  Pick a pivot p and partition into two arrays A_1 = \{ a \mid a \le p \} and A_2 = \{ a \mid a > p \}. If k \le |A_1| then recursively find the k^{th} largest element in A_1, otherwise recursively find k-|A_1|^{th} largest element in A_2. If |A_1| = i and |A_2| = n - i-1. Then the recurrence relation becomes T(n) = T(\max\{i,n-i-1\}) + O(n). We have the following two cases:

  1. p is arbitrary: In the worst case i could be as small as 0. Then we have T(n) = T(n-1) + O(n) and so T(n) = \Theta(n^2).
  2. p is picked randomly: Let ET(n) denote the expected time taken on the list of size n. Then ET(n) = ET(\max\{i,n-i-1\}) + c_1 n for some constant c_1 > 0. We observe that ET(\max\{i,n-i-1\}) = \frac{2}{n} (T(n-1) + T(n-2) + T(n-3) + \cdots + T(\frac{n}{2})). To solve the recurrence, we guess that ET(n) \le cn for some constant c>0. We claim that choosing c = 4c_1 suffices. Plugging this into the recurrence, we get ET(n) \le 4c_1\frac{2}{n}(\frac{n^2}{4}+\frac{n}{4}(\frac{n}{2}-1)) + c_1n \le 4c_1n. Thus ET(n) = O(n).

A better choice of pivot:

  1. Divide the list into groups of 5 elements
  2. Find the median for each group of 5 elements, let them be b_1, b_2, \cdots, b_{n/5}.
  3. Find median of b_1, \cdots, b_{n/5} recursively; let it be m.
  4. Use m as a pivot to partition a_1, a_2, \cdots, a_n
  5. Recurse on appropriate side

Let T(n) be the time to run k-select on a list of size n. Then step 3 takes T(n/5), while we claim in the worst case the size of the largest partition is at most \frac{7n}{10}. This is because the number of elements smaller than m are all the b_i'ssuch that b_i \le m as well as all elements in the list smaller than each of these b_i's. Thus the number of elements smaller than m is atleast \frac{3n}{10} and so the number of elements larger than m is atmost \frac{7n}{10}. The recurrence relation is therefore T(n) = T(\frac{n}{5}) + T(\frac{7n}{10}) + c_1n. Let us guess T(n) = cn for some constant c>0. Thus setting c=10c_1 suffices,  giving us a running time of T(n) = \Theta(n).

One Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s