## Lecture 34 - 04/13

### Sorting III

#### Improving Quicksort: Tony Hoare’s In-place Partitioning Scheme

• Partition by having two pointers that walk towards each other and meet at middle:
• Left pointer loves small items, hates large items
• Right pointer loves large items, hates small items
• The pointers walk towards each other, stopping whenever either comes across a hated item (too large/small depending on the pointer)
• When both pointers have stopped, each has found an element it hates
• Swap these two elements, and advance pointers by one
• When pointers cross, we are done walking, and have created our left and right partitions for quicksort
• Result is that things on left are “small” and things on the right are “large”
• This gives us a very fast quicksort!
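The walking-and-swapping scheme above can be sketched in Python (a minimal version; the function names are my own, and the leftmost element is assumed to be the pivot):

```python
def hoare_partition(a, lo, hi):
    """Partition a[lo..hi] around pivot a[lo]; return the split index."""
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        # Left pointer walks right, stopping at an item it hates (>= pivot)
        i += 1
        while a[i] < pivot:
            i += 1
        # Right pointer walks left, stopping at an item it hates (<= pivot)
        j -= 1
        while a[j] > pivot:
            j -= 1
        # Pointers crossed: left and right partitions are formed
        if i >= j:
            return j
        # Both pointers stopped on hated items: swap them and keep walking
        a[i], a[j] = a[j], a[i]

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        split = hoare_partition(a, lo, hi)
        quicksort(a, lo, split)      # left ("small") partition
        quicksort(a, split + 1, hi)  # right ("large") partition
```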

#### Quick Select

• Say we have an array of $N$ items, and we want to find the $k^{th}$ smallest item ($k=\frac{N}{2}$ is equivalent to finding the median)
• How do we do this? There are trivial ways involving $\Theta(N\log N)$ time (e.g., sort the array and index into it), but how do we do it in linear time?
• Solution: Quick Select algorithm (history note: a related deterministic algorithm is Median of Medians)
• Quick select algorithm:
• Goal is to find the median
• We will use the fact that for an array of size $N$, the sorted array has its median at position $\frac{N}{2}$
• We use quicksort-like partitioning: choose a pivot, partition, and compare the pivot's index to index $\frac{N}{2}$
• If the pivot index is less than $\frac{N}{2}$, ignore the left/small subpartition of the partitioning (pruning this whole left piece from the quick select), and repeat on the right/large subpartition
• Vice versa when the pivot index is larger than $\frac{N}{2}$
• We keep repeating this pivot-and-prune cycle, comparing the pivot's location to index $\frac{N}{2}$, until the pivot lands at index $\frac{N}{2}$. That element is our median
• Time complexity:
• Worst case: the input is in sorted order, and our quick select algorithm takes $O(N^2)$ time
• Average case: $\Theta(N)$ time
• Can improve the runtime using same techniques as discussed before for improving quicksort (randomization)
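The pivot-and-prune cycle above can be sketched in Python. This version finds the $k^{th}$ smallest item (0-indexed), uses a simple Lomuto-style partition for clarity rather than Hoare's scheme, and applies the randomized-pivot improvement mentioned above:

```python
import random

def quick_select(a, k):
    """Return the kth smallest element of a (0-indexed). Mutates a."""
    lo, hi = 0, len(a) - 1
    while True:
        # Randomized pivot: same improvement technique as for quicksort,
        # avoiding the sorted-input worst case
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        # Partition: items smaller than the pivot move to the left
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]  # pivot lands at index i
        if i == k:
            return a[i]            # pivot landed at position k: done
        elif i < k:
            lo = i + 1             # prune the left/small subpartition
        else:
            hi = i - 1             # prune the right/large subpartition
```

Calling `quick_select(a, len(a) // 2)` finds the median, per the goal above.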

#### Sorting Stability

• Let's say we have a list of (age, name) pairs, and the list is ordered alphabetically by name
• If we sort the list by age, we will have several instances where multiple people have the same age (multiple equivalent items according to our sorting order rule)
• Stability question: will equivalent-age pairs be in alphabetical order (among the sublist of pairs of the same age, will the ordering of this sublist preserve alphabetical order?)
• Generalized stability question: is the order of equivalent items in a sort preserved?
• Insertion and merge sorts are stable, as equivalent items never move past their equivalent brethren
• Heapsort is not stable
• Quicksort's stability depends on the partitioning algorithm:
• Unstable partitioning algorithms, such as Hoare partitioning, are faster
• Stable partitioning algorithms still give $\Theta(N\log N)$ expected runtime, but with higher constants
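The (age, name) scenario above can be demonstrated with Python's built-in sort, which is a stable TimSort (the names here are invented):

```python
# A list of (age, name) pairs, initially in alphabetical order by name
people = [(25, "Alice"), (30, "Bob"), (25, "Carol"), (30, "Dave")]

# Sort by age only; because the sort is stable, equal-age pairs
# preserve their original (alphabetical) relative order
by_age = sorted(people, key=lambda p: p[0])
# → [(25, 'Alice'), (25, 'Carol'), (30, 'Bob'), (30, 'Dave')]
```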

#### Optimizing Sorts

• When a subproblem reaches size 15 or smaller, switch over to insertion sort (this is fastest for small subproblem size)
• Adaptive sorting: exploit the existing order in an array (insertion sort, SmoothSort, TimSort)
• Note: TimSort, a hybrid of insertion and merge sort, is the sort used in Python and Java
• We can also sort faster by restricting the number of keys (for example, only sorting elements of four possible values)
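One standard way to exploit a restricted key set (not named in the notes, but a common example) is counting sort; a minimal sketch, assuming keys are integers in `range(R)`, running in $\Theta(N + R)$ time:

```python
def counting_sort(a, R):
    """Sort a list of integer keys drawn from range(R)."""
    counts = [0] * R
    for key in a:           # tally how often each key appears
        counts[key] += 1
    out = []
    for key in range(R):    # emit each key as many times as it was seen
        out.extend([key] * counts[key])
    return out
```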
##### Sorting in Java
• In Java, we call Java's own sorting algorithm via Arrays.sort(someArray)
• This sort is a TimSort if someArray consists of Objects, and is a quicksort if it consists of primitives

#### Array Shuffling

• Shuffling arrays is useful for things such as improving quicksort runtime
• Easiest way is to generate $N$ random numbers for our $N$ elements, and assign one to each item. We then sort the items by their attached random numbers
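The sort-by-random-keys shuffle described above, as a Python sketch (the function name is my own):

```python
import random

def shuffle_by_random_keys(items):
    """Shuffle by attaching a random number to each item, then sorting by it."""
    keyed = [(random.random(), x) for x in items]
    keyed.sort(key=lambda pair: pair[0])  # sort items by their random key
    return [x for _, x in keyed]
```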