# Sorting

## Goals

• Discover how sorting algorithms work at an intuitive level.
• Learn the general categories of sorting algorithms.
• Be able to make intelligent decisions when choosing sorting algorithms.
• Understand and use the `Comparable<T>` interface for natural order comparisons in Java.

## Concepts

• bubble sort
• bucket sort
• comparison sort
• counting sort
• heap
• heap sort
• in place
• insertion sort
• merge sort
• natural ordering
• pivot
• recursion
• quicksort
• searching
• selection sort
• sorting
• space complexity
• stability
• stable
• time complexity

## Lesson

In your first encounter with algorithms in this course, you learned that finding things and ordering things are two common uses for algorithms. You've already implemented an algorithm for searching that combines with a special data structure called a hash table. You also looked at another technique called “binary search” which required that the items first be placed in order. Now you will learn more about sorting techniques for placing items in some defined order, so that the data may be used with other algorithms such as binary search or simply because the user prefers results to be shown in some order.

Computer scientists have spent decades researching efficient ways of sorting items with little memory. This was particularly important in the early days of computing, because processors had limited power and computers had little memory. With more and more data being produced, efficiently searching and sorting data remains important. Although most programming languages already contain libraries for sorting simple lists, studying the different categories of sorting algorithms helps you understand the development and evaluation of algorithms in general. Moreover, some sorting techniques are better applied to some types of data than others, and each comes with its own benefits and limitations. As with searching, you'll discover that sometimes certain sorting algorithms require, or at least prefer, the information to be stored in certain data structures to be most effective.

### Evaluating Algorithms

If there has been so much research into sorting algorithms, why can't we just pick the best one and always use it? There is no “perfect” sorting algorithm, because each technique has ways in which it excels, and each one usually has a downside. One algorithm might be extremely fast but only work with certain types of data. Another might use large amounts of memory. And an algorithm that performs better in most cases may perform worse than others if the input data is arranged in a certain way. (Nevertheless some sorting algorithms have become very popular in general-purpose libraries, as explained later in the Quicksort section.)

There are several different aspects that allow us to compare algorithms.

**time complexity**
How much longer does an algorithm take when presented with more data? This is the order of growth you studied in the initial lesson on algorithms. You learned that often what concerns us most is the worst case running time of algorithms, but with sorting algorithms the analysis becomes more nuanced: some sorting algorithms have horrible worst case performance, but perform quite well in the average case.

**space complexity**
How much more space does the algorithm use? Some sorting algorithms make their changes in place, modifying the original data directly. Other algorithms require copying the data, doubling (or more) the memory used.

**stability**
Does the algorithm maintain the original relative order of the items? It might seem odd to ask whether a sorting algorithm maintains an original order, because sorting by definition is meant to change the order. But if the key on which you are sorting has duplicates, the items grouped together may or may not be in their original order. Only a stable sorting algorithm will maintain the same relative order for grouped items. Suppose you had a list of flights in order of landing at some airport. If you then sorted the flights by airline name, the landing times within each airline might be shuffled if the algorithm were not stable.

### O(n²) Sorting Algorithms

#### Bubble Sort

One of the most famous sorting algorithms is also one of the easiest to understand. The algorithm is conceptually two steps:

1. Pass through the items, comparing each adjacent pair and swapping the two if they are out of order.
2. Repeat step #1 until nothing gets swapped during a pass.

If an item is out of order, each pass will slowly move it to the right—it will “bubble up” to the top, which is how the algorithm gets its name: the bubble sort. In fact each pass results in the largest item moving all the way to the right. In the next pass, the next-largest item is moved to the second-to-top position, and so on. Thus the bubble sort performs two different activities: a comparison and a swap.
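The two activities, comparing and swapping, can be sketched in Java. This is a minimal illustration of the technique, not a tuned implementation:

```java
/** A minimal bubble sort sketch: repeated passes of compare-and-swap. */
public final class BubbleSortExample {
    public static void bubbleSort(int[] items) {
        boolean swapped;
        do {
            swapped = false;
            for (int i = 0; i < items.length - 1; i++) {
                if (items[i] > items[i + 1]) { // adjacent pair out of order
                    int temp = items[i];       // swap the two values
                    items[i] = items[i + 1];
                    items[i + 1] = temp;
                    swapped = true;
                }
            }
        } while (swapped); // stop when a full pass makes no swaps
    }
}
```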

In a worst case scenario, in which the original items are in reverse order, the first pass will move the first (largest) item all the way to the right. Put another way, the first pass moves the item basically `n` positions. Then the second pass moves the next item almost `n` positions. In rough terms, we have to move around n positions for each of the `n` items, or `n × n = n²` movements altogether.

In the worst case, therefore, the bubble sort has O(n²) time complexity. As you will remember from the discussion of big O notation in the algorithms lesson, O(n²) time is not very fast, to put it mildly. Fortunately there are more efficient algorithms than bubble sort.

#### Selection Sort

The selection sort is another approach that is simple to understand.

1. Look through the list and find the smallest item.
2. Swap it with the first item.
3. Repeat steps #1 and #2 with the remaining unsorted items, each time starting one position further along, until you reach the end of the array.
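The steps above can be sketched in Java as follows; again this is a minimal illustration:

```java
/** A minimal selection sort sketch: repeatedly select the smallest remaining item. */
public final class SelectionSortExample {
    public static void selectionSort(int[] items) {
        for (int i = 0; i < items.length - 1; i++) {
            int smallest = i;
            for (int j = i + 1; j < items.length; j++) { // scan the unsorted remainder
                if (items[j] < items[smallest]) {
                    smallest = j;
                }
            }
            int temp = items[i]; // swap the smallest item into position i
            items[i] = items[smallest];
            items[smallest] = temp;
        }
    }
}
```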

This doesn't make the algorithm any faster than bubble sort. In fact it makes things worse: because it repeatedly searches for each next-smallest item, it performs roughly n² comparisons even when the list is already sorted! In other words, selection sort has O(n²) running time even in the best case scenario, when nothing needs to be changed.

#### Insertion Sort

The insertion sort at first appears similar to the selection sort, but it is more efficient. Rather than searching for the next smallest element, it simply takes the next element and moves it down to the correct position.

1. Take the next unsorted item (initially the second item in the list).
2. Find the first previous item that is smaller than the current item, and insert the current item after that item. The first time performing these steps there will be no previous items at all.
3. Repeat steps #1 and #2 until you reach the end of the list.

This algorithm moves through the array, taking each item and moving it down (effectively inserting it) to its correct position. At the end of each step, all items to the left of the current position will be sorted.

The benefit of the insertion sort is that it has less work to do if the list is already partially sorted. In fact if the list is already sorted, insertion sort makes no changes, going through the list a single time, resulting in O(n) performance! Nevertheless in general, for random unsorted data, insertion sort has an order of growth of O(n²) like selection sort and bubble sort.
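A minimal sketch of insertion sort in Java shows how each item is shifted down to its correct position:

```java
/** A minimal insertion sort sketch: shift each item down to its correct position. */
public final class InsertionSortExample {
    public static void insertionSort(int[] items) {
        for (int i = 1; i < items.length; i++) {
            int current = items[i]; // the next unsorted item
            int j = i - 1;
            while (j >= 0 && items[j] > current) {
                items[j + 1] = items[j]; // shift larger items up one position
                j--;
            }
            items[j + 1] = current; // insert after the first smaller previous item
        }
    }
}
```

Note that if the array is already sorted, the inner `while` loop never runs, which is why insertion sort achieves O(n) in the best case.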

### O(n log n) Sorting Algorithms

#### Merge Sort

A lot of the overhead of the O(n²) algorithms involves comparing an item over and over with different items, and moving an item back and forth, one step at a time, over long distances. We need some way to cut down on the comparisons.

What if you had two arrays that were already sorted? Would you be able to put them together into a single, sorted array? That wouldn't be too difficult: you'd repeatedly look at the smallest available value in each array and take the smaller of the two until both arrays were exhausted.

1. Create an array big enough to hold all the values.
2. Keep an index into both input arrays and into the output array.
3. Compare the current value in each input array, and copy the smaller of the two to the output array.
4. Continue repeating step #3 until both input arrays have been completely copied. (Once one array is exhausted, simply copy over the remaining values of the other.)

You should easily see that the time complexity of this merge is only O(n), which is pretty fast. The algorithm iterates over both input arrays just once, advancing through them in parallel.
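The merge steps above can be sketched in Java like this; the `<=` comparison is what keeps the merge stable:

```java
/** A minimal sketch of merging two already-sorted arrays in O(n) time. */
public final class MergeExample {
    public static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length]; // big enough for all the values
        int i = 0, j = 0, k = 0; // indexes into both inputs and the output
        while (i < a.length && j < b.length) {
            // copy the smaller current value; <= keeps the merge stable
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++]; // copy any leftover values
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}
```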

Even though it is easy and fast to merge two sorted arrays, you may wonder how to get the two sorted input arrays to begin with. After all, sorting arrays is the problem we started out with! But first think of a couple of possibilities. What if you were given the following array, with a single element (value `38` at index `0`)? Would it be easy to sort?

| index | 0    |
|-------|------|
| value | `38` |

A single value is already sorted; there is nothing to do. You are done before you start. What about two values: `38` and `27`?

| index | 0    | 1    |
|-------|------|------|
| value | `38` | `27` |

That array is easy to sort as well. Simply compare the two values, and if they are out of order, switch the two values as you did in the bubble sort. Then you are finished. What about four values: `38`, `27`, `43`, `3`?

| index | 0    | 1    | 2    | 3   |
|-------|------|------|------|-----|
| value | `38` | `27` | `43` | `3` |

Before you start deciding which sort algorithm to use, remember that this array of four values can be separated into two arrays of two values each:

| index | 0    | 1    |
|-------|------|------|
| value | `38` | `27` |

| index | 0    | 1   |
|-------|------|-----|
| value | `43` | `3` |

*(Figure: graph of O(1), O(log n), O(n), O(n log n), and O(n²) running times.)*

By now you already know how to sort an array of two items, so that is simple enough. Then you will have two sorted arrays. But how do you get them back together? You may remember that at the beginning of this section we already described a method for merging two sorted arrays.

In the lesson on algorithms you learned about a general technique called divide and conquer. By dividing up the problem into pieces, it can be more efficient than working on the whole. The merge sort uses this approach to continually divide up the array into pieces, sort them, and then merge the pieces together. You should recognize that this sort of approach lends itself well to the technique of recursion, in which a method continually calls itself with each smaller divided piece.

1. Divide the array in half.
2. For each piece, divide that piece in half again using step #1.
3. When you reach a piece that is one or two elements long, stop dividing and switch those values if needed to make them sorted.
4. Merge the sorted piece with the other half, which has now been sorted as well using the same approach.
5. Keep merging the sorted pieces using step #4 until you arrive back at the full size of the original array, which is now sorted!

Dividing up the problem into smaller pieces drastically cuts down on the number of comparisons that need to be made. It also lowers the number of times an item needs to be moved into its correct place relative to the others. The binary search algorithm, which also uses divide and conquer, was able to reduce its time to O(log n) because each division effectively discarded half of the remaining data. Unfortunately the merge sort can't discard data (it is sorting, not searching), and each merge step still takes O(n) time. Nevertheless, the merge sort is able to significantly reduce its running time in comparison with the bubble sort, resulting in an order of growth called O(n log n). This means O(n × log n), which is slower than O(log n) but still much faster than O(n²)!
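The divide, sort, and merge steps can be sketched recursively in Java. This minimal sketch copies each half into a temporary array before merging back, illustrating why merge sort is not in place:

```java
import java.util.Arrays;

/** A minimal recursive merge sort sketch: divide, sort each half, then merge. */
public final class MergeSortExample {
    public static void mergeSort(int[] items) {
        if (items.length < 2) return; // a single item is already sorted
        int mid = items.length / 2;
        int[] left = Arrays.copyOfRange(items, 0, mid);       // divide in half
        int[] right = Arrays.copyOfRange(items, mid, items.length);
        mergeSort(left);  // recursively sort each half
        mergeSort(right);
        int i = 0, j = 0, k = 0; // merge the sorted halves back into the original
        while (i < left.length && j < right.length) {
            items[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) items[k++] = left[i++];
        while (j < right.length) items[k++] = right[j++];
    }
}
```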

Merge sort is stable; for any keys that are equal, their original order is maintained. Merge sort's biggest drawback is that it does not sort in place. Rather it requires values to be copied to some other location, sorted, and then merged back into the original array. A well written merge sort implementation can make this work with a single extra array the same size as the original array, but this still requires twice the memory that would be required by an in-place sorting algorithm. For small data sets this will not be a problem, but if the amount of data to be sorted already takes up most of the memory, it might not be feasible to allocate a complete second copy just to use merge sort.

#### Quicksort

The divide-and-conquer approach works well for making sorting more efficient than O(n²). But the merge sort uses a lot of memory and results in a lot of copying. The quicksort algorithm manages to recursively sort an array in place, without making a copy of the array.

1. Choose some item in the array and designate it as the pivot. The algorithm will work using any value as the pivot, although some pivots perform better than others.
2. Swap elements as needed so that all items below the pivot are less than the pivot, and all items above the pivot are greater than the pivot. The technique described here varies slightly from the technique shown in the diagram, but the result is the same.
   1. Starting from the left, find the first item greater than the pivot.
   2. Starting from the right, find the last item smaller than the pivot.
   3. Swap the two items.
   4. Keep searching towards the center of the list until there are no more items to swap.
   5. Swap the pivot with the center item; the pivot is now in its sorted position, but the items on each side still need to be sorted.
3. Recursively repeat steps #1 and #2 with the subarray on each side of the pivot.
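A minimal in-place quicksort can be sketched in Java. This sketch uses Lomuto partitioning with the last element as the pivot, a simpler scheme than the two-sided swapping described above, but the result is the same: the pivot lands in its final position with smaller items on one side and larger on the other:

```java
/** A minimal in-place quicksort sketch using Lomuto partitioning. */
public final class QuicksortExample {
    public static void quicksort(int[] items) {
        quicksort(items, 0, items.length - 1);
    }

    private static void quicksort(int[] items, int low, int high) {
        if (low >= high) return; // zero or one item: already sorted
        int pivot = items[high]; // last element as pivot (simple, not optimal)
        int boundary = low;      // items below boundary are less than the pivot
        for (int i = low; i < high; i++) {
            if (items[i] < pivot) {
                swap(items, i, boundary++);
            }
        }
        swap(items, boundary, high); // pivot lands in its final sorted position
        quicksort(items, low, boundary - 1);  // recursively sort each side
        quicksort(items, boundary + 1, high);
    }

    private static void swap(int[] items, int a, int b) {
        int temp = items[a];
        items[a] = items[b];
        items[b] = temp;
    }
}
```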

A crucial factor in the quicksort algorithm is the choice of the pivot. Ideally, to make the most of “divide and conquer”, one would want to “divide” as much as possible on each recursive pass. That is, you would want to choose a pivot that winds up close to the middle of the array. Knowing ahead of time which item would be in the middle of the array is impossible, because that would require knowing how the values will be sorted! If nothing is known about the data, some quicksort algorithms choose a value at random to be the pivot. You can improve your chances at getting a pivot near the middle if you look at the first, last, and middle value of each segment and choose the median of the three.

The quicksort algorithm is clever and is usually pretty fast. Its average order of growth is O(n log n). The problem comes with its worst case, which degrades to O(n²). Making matters worse, if the pivot is always chosen at the end of the segment, quicksort exhibits worst-case performance for already-sorted data! Compare this to the insertion sort algorithm, which most of the time performs with O(n²) complexity, but for already sorted data provides a speedy O(n) performance.

Thus you can see that the three algorithms we have discussed have trade-offs:

1. Insertion sort will give slow performance of O(n²) most of the time, but if the data is almost sorted already it can give an amazing O(n).
2. Merge sort consistently gives O(n log n) performance, but it takes a lot of memory.
3. Quicksort will perform faster than the other two most of the time, but if the data is mostly sorted already it can deteriorate into a horrible O(n²) performance.

#### Heap Sort

One other O(n log n) sorting algorithm is heap sort. You saw in the lesson on trees that placing items into a binary search tree and then using a depth-first, in-order traversal would iterate over the items in sorted order. A heap sort uses a special type of tree (usually a binary tree) called a heap.

The heap can be efficiently represented as a sequence of array elements instead of actual node objects with references. Sorting is performed in place in the array by manipulating the order of the items based upon the tree structure the array represents. Heap sort is usually not quite as fast as quicksort, but it doesn't degrade to O(n²) performance as quicksort does.
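The array-backed heap approach can be sketched in Java. A node at index `i` has its children at indexes `2i + 1` and `2i + 2`; the sketch first builds a max-heap in place, then repeatedly moves the largest item to the end:

```java
/** A minimal heap sort sketch: build a max-heap in the array, then
    repeatedly move the largest item to the end and re-heapify. */
public final class HeapSortExample {
    public static void heapSort(int[] items) {
        int n = items.length;
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(items, i, n); // build the max-heap bottom-up
        }
        for (int end = n - 1; end > 0; end--) {
            swap(items, 0, end);     // largest item goes to the sorted tail
            siftDown(items, 0, end); // restore the heap over the remainder
        }
    }

    private static void siftDown(int[] heap, int root, int size) {
        while (true) {
            int largest = root;
            int left = 2 * root + 1, right = 2 * root + 2; // children in the array
            if (left < size && heap[left] > heap[largest]) largest = left;
            if (right < size && heap[right] > heap[largest]) largest = right;
            if (largest == root) return; // heap property restored
            swap(heap, root, largest);
            root = largest;
        }
    }

    private static void swap(int[] items, int a, int b) {
        int temp = items[a];
        items[a] = items[b];
        items[b] = temp;
    }
}
```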

### Sorting Algorithms Faster than O(n log n)

The sorting algorithms you have seen so far are examples of comparison sorts, algorithms that work by comparing two values at a time. Because values logically have to be compared with each other a certain number of times, it turns out that O(n log n) is the fastest theoretical time possible for a comparison sort. By luck some input data might sort faster, such as almost-sorted data with insertion sort, but there is no way to improve the order of growth on average with arbitrary data. Some comparison sorts will still be faster than others depending on the type of data, but they will still exhibit O(n log n) time.

#### Counting Sort

It is however possible to sort certain types of data very quickly if you do not need to compare the actual values with each other. Suppose that you know in advance that all the values to be sorted fall in the range `0` to `999`. Remembering that array lookup is a constant time operation, rather than comparing the numbers to each other you can simply use each number as an index into some separate array. The numbers inherently have an order, even without comparing them to each other, and indexes in an array are also inherently in order.

The counting sort uses a separate “counting” array to record the presence of each number as it is encountered. Because it is assumed numbers can appear more than once, rather than recording a `true` mark in the counting array, a total count of each number is kept. After going through the original array once and marking the presence of each number, the counting array is used to rewrite the original array with the numbers in order.

1. Create a separate “counting” array with enough positions to use the highest supported value as an index.
2. Look at each value in the original array, and use it as an index into the counting array to increase the total count for that number.
3. Step through the counting array and rewrite the values into the original array, now in order.

You can see that the counting sort goes through two loops separately. The first loop depends on the number of items which you can refer to as n. The second loop depends on the highest value supported, in this example 999, which you can refer to as k. Thus the running time is O(n + k). If the highest value supported is close to the number of values to sort, the result is close to O(n) time, which is extremely fast for sorting!
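The two loops can be seen clearly in a minimal Java sketch:

```java
/** A minimal counting sort sketch for values in the range 0 to maxValue. */
public final class CountingSortExample {
    public static void countingSort(int[] items, int maxValue) {
        int[] counts = new int[maxValue + 1]; // one counter per possible value
        for (int value : items) {
            counts[value]++; // first loop: O(n) tally of each value
        }
        int k = 0;
        for (int value = 0; value <= maxValue; value++) { // second loop: O(k)
            for (int c = 0; c < counts[value]; c++) {
                items[k++] = value; // rewrite the original array in order
            }
        }
    }
}
```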

#### Bucket Sort

The memory required for counting sort depends on the highest value supported. Supporting very high values would require creating a counting array with the same number of indexes. Supporting values up to 1,000,000 for example would require a counting array of a million indexes, each holding an `int` requiring four bytes, for a total of four megabytes!

You might have noticed that the counting sort technique of using values as indexes into an array resembles the approach used by hash tables. You will recall that hash tables would have the same memory problem with large hash values; hash tables mitigate this problem by grouping hash codes into “buckets”. Because this can result in hash collisions, each bucket contains a list of values which can be used to distinguish values with the same hash code.

The bucket sort uses the same approach. Each value is mapped to some bucket; unlike a hash table's modulus mapping, however, the bucket mapping must preserve order (for example by dividing the range of values evenly among the buckets) so that concatenating the buckets yields a sorted result. Each bucket keeps a list of the values that map to it, and each bucket list itself must be sorted using one of the other sorting techniques, such as insertion sort.

Overall the efficiency of the bucket sort is similar to the O(n + k) of the counting sort, except that less memory is used and higher values can be sorted. Here k is the number of buckets rather than the highest value supported, so with a well chosen number of buckets the running time can be close to O(n) time.
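The approach can be sketched in Java. This sketch maps each value to a bucket by dividing the value range evenly, then sorts each bucket with the library sort (standing in for insertion sort or another technique):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** A minimal bucket sort sketch; the range-based bucket mapping
    ensures that concatenating the buckets preserves order. */
public final class BucketSortExample {
    public static void bucketSort(int[] items, int maxValue, int bucketCount) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < bucketCount; b++) {
            buckets.add(new ArrayList<>());
        }
        for (int value : items) { // map each value to its range-based bucket
            int bucket = (int) ((long) value * bucketCount / (maxValue + 1));
            buckets.get(bucket).add(value);
        }
        int k = 0;
        for (List<Integer> bucket : buckets) {
            Collections.sort(bucket); // each bucket list sorted individually
            for (int value : bucket) {
                items[k++] = value; // concatenate the buckets back in order
            }
        }
    }
}
```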

### `Comparable<T>`

The first two categories of sorting algorithms, those taking O(n²) and O(n log n) time, work by comparing two values being sorted. But more needs to be said about what it means to “compare” two values. The examples so far used numbers, the primitive `int` type, which has a natural ordering—the order you would expect when counting with numbers, for example. The natural order for primitive numbers is built into the JVM, which is why the less than `<` and greater than `>` operators work for them.

But how does one indicate the natural ordering of objects such as `Vehicle` or `Animal`? Java comes with an interface `java.lang.Comparable<T>` to be used with non-primitive types, that is classes, that should have a natural ordering. Its method `Comparable.compareTo​(T o)` compares any instance of the object with another instance of the object. It returns an `int` indicating which object is greater than the other. Interestingly the actual magnitude of the returned value is not important; what is important is the sign. If a negative value is returned, the first object is less than the second. If a positive value is returned, the first object is greater than the second. If zero is returned, the objects are considered equal for purposes of their natural ordering.

Imagine that a class `PlayerScore` encapsulates the score of someone who has just played a video game. You might want to sort the scores to show the top-scoring players on the screen. To have a natural ordering, the class would need to implement `Comparable<PlayerScore>`, and in the implementation of `compareTo(final PlayerScore playerScore)` compare the object's score with the score of the given player.

Now a general-purpose sorting method can sort a sequence of any objects with a natural ordering, not just primitive numbers such as those stored in `int`. The Java standard library, for example, has a `java.util.Arrays` class containing general utilities for working with arrays. The `Arrays.sort(Object[] a)` method will sort any array of objects that have a natural ordering. By now you should realize that this means each object should implement `Comparable<T>`, and they should all be comparable to each other. Rather than using the less than `<` and greater than `>` operators, the `Arrays.sort(Object[] a)` method calls `Comparable.compareTo(T o)`. For example, instead of comparing `object1 < object2`, the method compares `object1.compareTo(object2) < 0`.
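A sketch of the hypothetical `PlayerScore` class shows how this fits together; the field and method names here are assumptions, not part of any library:

```java
import java.util.Arrays;

/** The hypothetical PlayerScore class from the text, with a natural
    ordering by score; the field names here are assumptions. */
public final class PlayerScore implements Comparable<PlayerScore> {
    private final String player;
    private final int score;

    public PlayerScore(final String player, final int score) {
        this.player = player;
        this.score = score;
    }

    public int getScore() {
        return score;
    }

    @Override
    public int compareTo(final PlayerScore playerScore) {
        // Integer.compare() returns the correct sign without risking overflow.
        return Integer.compare(score, playerScore.score);
    }

    public static void main(String[] args) {
        PlayerScore[] scores = {
            new PlayerScore("Alice", 1200), new PlayerScore("Bob", 900)
        };
        Arrays.sort(scores); // calls compareTo() instead of < and >
        System.out.println(scores[0].getScore()); // lowest score first
    }
}
```

Because `PlayerScore` implements `Comparable<PlayerScore>`, `Arrays.sort(Object[] a)` can order any array of scores without knowing anything else about the class.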

## Review

### Summary

| Algorithm | Time Complexity | Stable | In Place |
|-----------|-----------------|--------|----------|
| Bubble Sort | O(n²) | yes | yes |
| Selection Sort | O(n²) | no | yes |
| Insertion Sort | O(n²); best case O(n) for partially sorted data | yes | yes |
| Merge Sort | O(n log n) | yes | no |
| Quicksort | O(n log n); very fast in practice; worst case O(n²) | no | yes |
| Heap Sort | O(n log n) | no | yes |
| Counting Sort | O(n + k) | yes | no |
| Bucket Sort | O(n + k) | yes | no |

### Gotchas

• Not every type has a natural ordering that makes sense. If you want to sort some items that have no inherent ordering, you may not want to implement `Comparable<T>` and instead use a sorting strategy, explained in a future lesson.
• Don't return the difference of two integers inside `Comparable.compareTo(…)`, as this can result in an integer overflow.
• What does it mean for a `Comparable<T>` implementation to be “inconsistent with equals”? What problems can this cause?
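The integer overflow gotcha above can be demonstrated with a short sketch:

```java
/** Demonstrates why compareTo() should not subtract int values. */
public final class CompareOverflowExample {
    // Broken: a - b overflows when the operands have opposite signs.
    public static int compareBySubtraction(int a, int b) {
        return a - b;
    }

    // Safe: Integer.compare() returns only the sign, without overflow.
    public static int compareSafely(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        // MIN_VALUE is less than 1, but the subtraction wraps around to a
        // large positive number, reporting the wrong order.
        System.out.println(compareBySubtraction(Integer.MIN_VALUE, 1) > 0); // true (wrong!)
        System.out.println(compareSafely(Integer.MIN_VALUE, 1) < 0);        // true (correct)
    }
}
```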

### In the Real World

Most of the time the built-in sorting algorithm for your programming language library has been tuned and will work fine for general sorting. Normally this algorithm winds up being a variant of quicksort or merge sort. In certain circumstances you may need to verify whether the algorithm is stable, if that is important to your use case. Otherwise, it is rare to implement sorting algorithms from scratch nowadays. However, understanding the considerations provides important background when approaching larger data sets, especially those that may not all fit in memory and/or require several levels of sorting.

Think back to the supplier addresses in an early lesson regarding furniture and toy suppliers. One `Address` class instance might represent “123 Some Street”, while another `Address` instance might represent “456 Some Street”. You might say that the `Address` class then has a natural ordering, because obviously “123 Some Street” comes before “456 Some Street”.

But what about other streets? Is the address “123 Some Street” considered the same as “123 Other Street”? You may want to have two levels of comparison: first to sort the street names, and then if the street names are equal (i.e. you would otherwise have returned `0` for the comparison), compare the actual address number on the street.
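A sketch of such a two-level comparison for a hypothetical `Address` class might look like this; the field names are assumptions:

```java
/** A sketch of a hypothetical Address class with a two-level natural
    ordering: street name first, then street number. */
public final class Address implements Comparable<Address> {
    private final String street;
    private final int number;

    public Address(final String street, final int number) {
        this.street = street;
        this.number = number;
    }

    @Override
    public int compareTo(final Address address) {
        int result = street.compareTo(address.street); // first level: street name
        if (result != 0) {
            return result; // different streets decide the order
        }
        return Integer.compare(number, address.number); // second level: number
    }
}
```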

### Self Evaluation

• If most programming languages already contain libraries for sorting, why is it useful to study sorting algorithms?
• What is one of the biggest drawbacks of the merge sort? What is one of its benefits, besides its quick running time?
• The performance of bucket sort depends on what factor, and how does this relate to the design of hash tables?
• What is the crucial choice that needs to be made that will determine how quicksort will perform?
• Which general type of sorting algorithm does Java use for sorting numbers in its core libraries? What about objects? Why?

When you created a binary search tree in a previous lesson, you had no way to make it work generally for objects. You defined its comparisons in terms of publication date, naming it `PublicationBinarySearchTree` and putting it in the `booker` project. But now you have a way to make comparisons for objects in general, as long as they implement `Comparable<T>`.

• Improve your publication hierarchy to have a natural order based upon publication date—the same date that you are currently using for comparison in the `PublicationBinarySearchTree`.
• Your publication interface will extend the `Comparable<T>` interface.
• Specify in the publication API documentation that implementations must make comparisons based upon the date. This is part of the publication interface contract.
• Because you are not overriding `equals()`, note that the comparison implementation is inconsistent with equals.
• Refactor your `PublicationBinarySearchTree` to make comparisons using any instance of `Comparable<T>` instead of just `Publication`.
• Remove “Publication” from the name of the interface and its implementation.
• Move the tree interface and implementation to the `datastruct` project to be with your other data structures such as linked lists and hash tables. Your unit tests that test the tree with `Publication` will of course need to stay in the `booker` project.
• Add unit tests that show the binary search tree works with `Integer` and `Long` instances, which also implement `Comparable<T>`.
• Print out your publications in order of publication date using your official user publication display methods. Currently you print out the publications from the application's linked list. Now you will sort the publications before printing them. You are already printing the publications using `toString()` by traversing the binary search tree; you can leave this for now. This will result in publications being printed twice: once in user display format, and the second time using the `toString()` format.
• Sort the publications using `Arrays.sort(Object[] a)`. This works because your publications now implement `Comparable<T>`.