What is Flash Sort?

The flash SortNeubert, Karl-Dietrich. 1997. “Flash-Sort: Sorting by in situ permutation.” euroFORTH 97. algorithm was published by Karl-Dietrich Neubert in 1998. It’s an in-place sorting algorithm that promises linear time complexity for evenly distributed data. It’s an efficient way to implement the bucket sort. It creates buckets (the author calls them classes) and rearranges all elements according to buckets. Lastly, it sorts each bucket.

Algorithm steps

The algorithm works through the following steps:

  1. Find minimum and maximum values.

  2. Linearly divide the array into mm buckets.

  3. Count the number of elements LbL_b for each bucket bb.

  4. Convert the counts of each bucket into the accumulative sum.

  5. Rearrange all the elements of each bucket bb in a position AiA_i where Lb1<i<LbL_{b-1} < i < L_b.

  6. Sort each bucket using the insertion sort.

Note: The original algorithm states to divide the range into mm buckets in step 2. In this Answer, we divide the array into mm buckets for some obvious reasons. The range of the array can vary and be higher than the length of the array. Dividing the array into mm buckets will always make buckets less than the array length, which guarantees the space complexity will be lower than O(n)O(n).

Play the following slides to visualize the working of the flash sort algorithm:

canvasAnimation-image
1 of 22

Code example

Let’s implement the code of flash sort in the following playground:

def flash_sort(array):
n = len(array)
# Step 1: Find mininum and maximum values
min_value, max_value = array[0], array[0]
for i in range(1, n):
if array[i] > max_value: max_value = array[i]
if array[i] < min_value: min_value = array[i]
if min_value == max_value: return
# Step 2: Divide array into m buckets
import math
m = max(math.floor(0.45 * n), 1)
# Step 3: Count the number of elements in each class
def get_bucket_id(value):
return math.floor((m * (value-min_value)) / (max_value-min_value+1))
Lb = [0]*m
for value in array:
Lb[get_bucket_id(value)] +=1
# Step 4: Convert the count to prefix sum
for i in range(1, m):
Lb[i] = Lb[i-1]+Lb[i]
# Step 5: Rearrange the elements
def find_swap_index(b_id):
for ind in range(Lb[b_id-1],Lb[b_id]):
if get_bucket_id(array[ind])!=b_id: break
return ind
def arrange_bucket(i1, i2, b):
for i in range(i1, i2):
b_id = get_bucket_id(array[i])
while(b_id != b):
s_ind = find_swap_index(b_id)
array[i], array[s_ind] = array[s_ind], array[i]
b_id = get_bucket_id(array[i])
return
for b in range(0, m-1):
if b == 0: arrange_bucket(b, Lb[b], b)
else: arrange_bucket(Lb[b-1], Lb[b], b)
# Step 6: Sort each bucket
def insertion_sort(array):
for i in range(1, len(array)):
temp = array[i]
j = i - 1
while j >= 0 and temp < array[j]:
array[j + 1] = array[j]
j -= 1
array[j + 1] = temp
return array
for i in range(m):
if i == 0: array[i:Lb[i]] = insertion_sort(array[i:Lb[i]])
else: array[Lb[i-1]:Lb[i]] = insertion_sort(array[Lb[i-1]:Lb[i]])
return
array = [15, 13, 24, 7, 18, 3, 22, 9]
print("Unsorted array:", array)
flash_sort(array)
print("Sorted array:", array)

Code explanation

Let’s discuss the code below:

  • Lines 5–9: We use a for loop to find out the minimum and maximum values of the array.

  • Line 13: We calculate the number of buckets. We can choose the percentage of buckets proportional to the input length. We set 45% of the input length for buckets.

  • Lines 16–20: We define a function get_bucket_id() that returns the bucket ID for a given input. We create an array Lb of zeros to store the count of each bucket. We iterate through each value of the array, and get the bucket ID to increment the count of the bucket.

  • Lines 23–24: We convert the count of buckets Lb into the accumulative sum by adding the previous count to the current count.

  • Lines 27–38: We define two functions to rearrange the array elements according to the buckets.

    • The find_swap_index() function returns the index of a value that doesn’t belong to the given bucket.

    • The arrange_bucket() function arranges the values of a given bucket. It gets the correct bucket ID from the get_bucket_id() function for each bucket element. It swaps the misfit value of the current bucket with the misfit value of the correct bucket. It repeats this process until it finds the element that belongs to the current bucket.

  • Lines 40–43: We call the arrange_bucket() function by passing the bucket ID and its indexes for m-1 buckets because the last bucket doesn’t need any rearrangement of elements. All the elements that belong to the last bucket are already swapped in.

  • Lines 45–53: We define the insertion_sort() function to sort the buckets.

  • Lines 55–57: We call the insertion_sort() function for sorting individual buckets.

  • Lines 60–61: We create an unsorted array and print the array before sorting.

  • Lines 63–64: We call the flash_sort() function and print the array after sorting.

Complexity

The best and average time complexity of the flash sort is O(n)O(n). However, the worst-case time complexity can go up to the worst time complexity of the insertion sort, which is O(n2)O(n^2). This can happen when there’s an outlier in the input data, which can cause nearly all the elements to belong to one bucket.

The space complexity of the flash sort is O(m)O(m), where mm is the number of buckets, which is 0.45×n0.45\times n.

Benefits and limitations

Let’s discuss some benefits and limitations of the flash sort.

Benefits

  1. The flash sort is an in-place sorting algorithm, which means it doesn’t require any extra space to sort the data.

  2. It’s a fast sorting algorithm that has O(n)O(n) average time complexity.

  3. It’s efficient for linearly distributed large arrays in comparison to other linear-time sorting algorithms, i.e., count sort, radix sort, pigeonhole sort, bucket sort, etc.

Limitations

  1. The flash sort is not a stable sorting algorithm, which means it doesn't guarantee that the the order of similar elements in the sorted array will be maintained.

  2. It can be slow for non-linearly distributed arrays or data with outliers, which can be the worst case for the flash sort.

  3. The algorithm of flash sort is quite complex and difficult to implement and debug.

When to use flash sort?

The flash sort is an efficient sorting algorithm that performs linear-time sorting in average case time complexity. It outperforms with linear distributed datasets. It’s a distribution sorting algorithm that doesn’t compare the elements with each other for sorting. It’s the best choice when data has no outliers and is distributed evenly.

Copyright ©2024 Educative, Inc. All rights reserved