Flashsort

Flashsort is a distribution sorting algorithm with linear computational complexity for uniformly distributed data sets and relatively small additional memory requirements. The original work was published in 1998 by Karl-Dietrich Neubert.[1]

Concept

Flashsort is an efficient in-place implementation of bucket sort. It assigns each of the n input elements to one of m buckets, efficiently rearranges the input to place the buckets in the correct order, then sorts each bucket. The original algorithm sorts an input array A as follows:

  1. Using a first pass over the input or a priori knowledge, find the minimum and maximum sort keys.
  2. Linearly divide the range [Amin, Amax] into m buckets.
  3. Make one pass over the input, counting the number of elements Ai which fall into each bucket. (Neubert calls the buckets "classes" and the mapping of elements to classes "classification".)
  4. Convert the counts of elements in each bucket to a prefix sum, where Lj is the number of elements Ai in bucket j or less. (L0 = 0 and Lm = n.)
  5. Rearrange the input so that all elements of each bucket j are stored in positions Ai where Lj−1 < i ≤ Lj.
  6. Sort each bucket using insertion sort.
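The steps above can be sketched in Python. This is an illustrative sketch, not Neubert's published code; the function name and the default class count of roughly 0.1n are choices made for this example:

```python
def flashsort(a, m=None):
    """Sort list a in place; sketch of the six steps described above."""
    n = len(a)
    if n < 2:
        return a
    if m is None:
        m = max(n // 10, 2)           # number of classes (assumed heuristic)

    # Step 1: find the minimum and maximum sort keys.
    lo, hi = min(a), max(a)
    if lo == hi:                      # all keys equal: nothing to do
        return a

    # Step 2: linearly divide the range [lo, hi] into m classes.
    def klass(x):
        return int((m - 1) * (x - lo) / (hi - lo))

    # Steps 3-4: count elements per class, then prefix-sum the counts so
    # that L[j] is the number of elements in classes 0..j.
    L = [0] * m
    for x in a:
        L[klass(x)] += 1
    for j in range(1, m):
        L[j] += L[j - 1]

    # Step 5: in-place cycle-leader permutation.  L[k] is decremented to
    # point at the next free slot of class k, which fills from the top down.
    moved, i = 0, 0
    while moved < n - 1:
        while i >= L[klass(a[i])]:    # skip elements already in place
            i += 1
        flash = a[i]                  # new cycle leader
        while True:
            k = klass(flash)
            L[k] -= 1
            a[L[k]], flash = flash, a[L[k]]   # place flash, evict occupant
            moved += 1
            if L[k] == i:             # cycle closed at the leader's slot
                break

    # Step 6: the array is now grouped by class; finish with insertion sort.
    for p in range(1, n):
        x, q = a[p], p - 1
        while q >= 0 and a[q] > x:
            a[q + 1] = a[q]
            q -= 1
        a[q + 1] = x
    return a
```

For example, `flashsort([13, 5, 7, 1, 9, 3, 11])` returns `[1, 3, 5, 7, 9, 11, 13]`.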

Steps 1–3 and 6 are common to any bucket sort, and can be improved using techniques generic to bucket sorts. In particular, the goal is for the buckets to be of approximately equal size (n/m elements each),[1] with the ideal being division into m quantiles. If the input distribution is known to be non-uniform, a non-linear division will more closely approximate this ideal. Likewise, the final sort can use any of a number of techniques, including a recursive flashsort.

The significant contribution of flash sort is step 5: an efficient O(n) in-place algorithm for collecting the elements of each bucket together in the correct relative order using only m words of additional memory.

Memory efficient implementation

To realize its low memory requirements, flashsort does not use additional data structures to store the classes. Instead it stores the upper bounds of each class on the input array A in an auxiliary vector L. These upper bounds are obtained by counting the number of elements in each class; the upper bound of a class is the number of elements in that class and every class before it. These bounds serve as pointers into the classes.
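This counting step can be sketched as follows (illustrative Python; the names `class_bounds` and `klass` are choices made for this example):

```python
def class_bounds(a, klass, m):
    """Upper bounds of the m classes of a: L[j] is the number of elements
    whose class (given by the function klass) is j or lower."""
    L = [0] * m
    for x in a:
        L[klass(x)] += 1               # count elements per class
    for j in range(1, m):
        L[j] += L[j - 1]               # prefix sum: cumulative upper bounds
    return L
```

For example, `class_bounds([3, 1, 2, 0, 2, 1], lambda x: x, 4)` yields `[1, 3, 5, 6]`: class 0 ends at position 1, class 1 at position 3, and so on.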

Classification is implemented through a series of cycles, where a cycle leader is taken from the input array and its class is calculated. The pointers in vector L are used to insert the cycle leader into the correct class, and the class's pointer in L is decremented after each insertion. Inserting the cycle leader evicts another element from array A, which is in turn classified and inserted into another location, and so on. The cycle terminates when an element is inserted into the cycle leader's starting location.

An element is a valid cycle leader if it has not yet been classified. As the algorithm iterates over array A, previously classified elements are skipped and unclassified elements are used to initiate new cycles. It is possible to discern whether an element has been classified without using additional tags: an element has been classified if and only if its index is greater than the upper bound of its class in L. Based on this, all unclassified elements can be found in O(n) time total by keeping a single pointer i which initially points to the beginning of A and gradually moves to the right until an unclassified element is found, identified by having an index lower than or equal to the upper bound of its class. This element becomes the new cycle leader, a cycle of displacements is performed, and i is incremented. The process repeats until i reaches the end of A, at which point all elements are classified.[1][2]
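The cycle-leader pass might be sketched as follows (illustrative Python; the name `classify_in_place` and the variable names are inventions of this example, and the bounds vector L is consumed by the routine):

```python
def classify_in_place(a, klass, L):
    """Rearrange a so that classes appear in order.  L[j] must hold the
    number of elements in classes 0..j; it is decremented down to the
    class start offsets as each class fills from its top slot downward."""
    n = len(a)
    moved, i = 0, 0
    while moved < n - 1:
        # a[i] is already classified iff i >= L[klass(a[i])]; scan right
        # for the next unclassified element, the new cycle leader.
        while i >= L[klass(a[i])]:
            i += 1
        flash = a[i]
        while True:
            k = klass(flash)
            L[k] -= 1                          # next free slot of class k
            a[L[k]], flash = flash, a[L[k]]    # insert, evicting occupant
            moved += 1
            if L[k] == i:      # an element landed in the leader's slot
                break
```

With the identity mapping as `klass` and bounds `[1, 3, 5, 6]`, calling `classify_in_place` on `[3, 1, 2, 0, 2, 1]` leaves the list grouped by class as `[0, 1, 1, 2, 2, 3]`.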

Performance

The only extra memory requirements are the auxiliary vector L for storing class bounds and a constant number of other variables.

In the ideal case of a balanced data set, each class will be approximately the same size. If the number of classes m is linear in the input size n, each class has a constant size, so sorting a single class has complexity O(1). The running time of the final insertion sort is therefore O(n). In worst-case scenarios where almost all the elements fall into a few classes or one class, the complexity of the algorithm is limited by the performance of the final-step sorting method. For insertion sort, this is O(n²). Variations of the algorithm improve worst-case performance by using better-performing sorts such as quicksort or recursive flashsort on classes that exceed a certain size limit.[2][3]

Choosing a value for m, the number of classes, trades off time spent classifying elements (large m) and time spent in the final insertion sort step (small m).

Memory-wise, flashsort avoids the overhead needed to store classes in the very similar bucket sort. For m = 0.1n with uniformly distributed random data, flashsort is faster than heapsort for all n and faster than quicksort for n > 80. It becomes about twice as fast as quicksort at n = 10000.[1]

Due to the in situ permutation that flashsort performs in its classification process, flashsort is not stable. If stability is required, it is possible to use a second array so elements can be classified sequentially. However, in this case, the algorithm will require O(n) additional space.
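A stable out-of-place classification along these lines might look like the following (illustrative Python; the name `stable_classify` is assumed):

```python
def stable_classify(a, klass, m):
    """Stable classification into m classes using a second array:
    O(n + m) extra space instead of the in-place version's O(m)."""
    counts = [0] * m
    for x in a:
        counts[klass(x)] += 1           # count elements per class
    starts = [0] * m                    # starts[j] = first slot of class j
    for j in range(1, m):
        starts[j] = starts[j - 1] + counts[j - 1]
    out = [None] * len(a)
    for x in a:                         # left-to-right scan keeps equal
        k = klass(x)                    # keys in their original order
        out[starts[k]] = x
        starts[k] += 1
    return out
```

Equal keys keep their input order, which is exactly the property the in situ cycle permutation sacrifices.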

References

  1. Neubert, Karl-Dietrich (February 1998). "The Flashsort1 Algorithm". Dr. Dobb's Journal. 23 (2): 123–125, 131. Retrieved 2007-11-06.
  2. Neubert, Karl-Dietrich (1998). "The FlashSort Algorithm". Retrieved 2007-11-06.
  3. Xiao, Li; Zhang, Xiaodong; Kubricht, Stefan A. (2000). "Improving Memory Performance of Sorting Algorithms: Cache-Effective Quicksort". ACM Journal of Experimental Algorithmics. 5. CiteSeerX 10.1.1.43.736. doi:10.1145/351827.384245. Archived from the original on 2007-11-02. Retrieved 2007-11-06.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.