Performance profiles

This notebook shows how to use the corsort package to determine the performance profiles of several sorting algorithms.

Here the following sorts will be compared:

Top-down merge
Binary insertion
Binary insertion \(\rho\)
Ford-Johnson \(\rho\)
Quicksort
ASort \(\rho\)
Bottom-up merge \(\rho\)
Multizip \(\rho\)
Corsort \(\rho\)

First we load some packages.

[1]:

import numpy as np
from matplotlib import pyplot as plt
from multiprocess.pool import Pool

from corsort import *

We populate the list of sorts we want to study. As we consider profiles, the scorer used matters (here \(\rho\) is the main external scorer).

[2]:

sort_list = [
    SortMergeTopDown(compute_history=True),
    SortBinaryInsertion(compute_history=True),
    WrapSortScorer(sort=SortBinaryInsertion(), scorer=jit_scorer_rho, compute_history=True),
    WrapSortScorer(sort=SortFordJohnson(), scorer=jit_scorer_rho, compute_history=True),
    SortQuick(compute_history=True),
    WrapSortScorer(sort=SortAsortQuickselect(), scorer=jit_scorer_rho, compute_history=True),
    WrapSortScorer(sort=SortMergeBottomUp(), scorer=jit_scorer_rho, compute_history=True),
    WrapSortScorer(sort=SortMultizip(), scorer=jit_scorer_rho, compute_history=True),
    JitCorsortDeltaMaxRho(compute_history=True)
]
legends = {'mergesort_top_down': 'Top-down merge',
 'binary_insertion_sort': 'Binary insertion',
 'binary_insertion_sort_rho': r'Binary insertion $\rho$',
 'ford_johnson_rho': r'Ford-Johnson $\rho$',
 'quicksort': 'Quicksort',
 'asort_quickselect_rho': r'Asort $\rho$',
 'mergesort_bottom_up_rho': r'Bottom-up merge $\rho$',
 'multizip_sort_rho': r'Multizip $\rho$',
 'corsort_delta_max_rho': r"Corsort-$\Delta$ $\rho$"}

We decide the value of \(n\) to study and the number of trials.

[3]:

n = 1000
nt = 10000

We now run the experiments, with some multi-processing to speed-up things.

[4]:

with Pool() as p:
    convergence = evaluate_convergence(sort_list=sort_list, n=n, nt=nt, pool=p)

Evaluate convergence of mergesort_top_down for n = 1000

100%|██████████| 10000/10000 [28:20<00:00,  5.88it/s]

Evaluate convergence of binary_insertion_sort for n = 1000

100%|██████████| 10000/10000 [29:23<00:00,  5.67it/s]

Evaluate convergence of binary_insertion_sort_rho for n = 1000

100%|██████████| 10000/10000 [58:20<00:00,  2.86it/s]

Evaluate convergence of ford_johnson_rho for n = 1000

100%|██████████| 10000/10000 [1:08:32<00:00,  2.43it/s]

Evaluate convergence of quicksort for n = 1000

100%|██████████| 10000/10000 [40:55<00:00,  4.07it/s]

Evaluate convergence of asort_quickselect_rho for n = 1000

100%|██████████| 10000/10000 [1:35:46<00:00,  1.74it/s]

Evaluate convergence of mergesort_bottom_up_rho for n = 1000

100%|██████████| 10000/10000 [49:53<00:00,  3.34it/s]

Evaluate convergence of multizip_sort_rho for n = 1000

100%|██████████| 10000/10000 [52:46<00:00,  3.16it/s]

Evaluate convergence of corsort_delta_max_rho for n = 1000

100%|██████████| 10000/10000 [2:42:29<00:00,  1.03it/s]

We save the raw results.

[5]:

import dill as pickle
from pathlib import Path

fn = Path(f"profiles_n_{n}_nt_{nt}.pkl")
if fn.exists():
    with open(fn, 'rb') as f:
        convergence = pickle.load(f)
else:
    with open(fn, 'wb') as f:
        pickle.dump(convergence, f)

Then we display the results.

[6]:

m = n*(n-1)/2
fig = plt.figure(figsize=(15, 10))
ax = plt.axes()
decim=40
color_dict = auto_colors(sort_list)
for name in legends:
    ref = convergence[name]
    color = color_dict[name]
    p_m = ref.shape[1]
    x = np.arange(p_m)[::decim]
    ref = ref[:, ::decim]
    q = np.zeros((5, ref.shape[1]))
    for i, per in enumerate([2.5, 50, 97.5]):
        q[i, :] = np.percentile(ref, per, axis=0)
    q = q/m
    ax.plot(x, q[1, :], label=legends[name], color=color)
    ax.fill_between(x, q[0, :], q[2, :], alpha=.2, color=color)
plt.legend()
plt.grid()
ax.tick_params(labelright=True, right=True)
plt.ylabel('Distance to sorted list')
plt.xlabel('Comparisons performed')
plt.ylim([0, None])
plt.xlim([0, 12000])
plt.show()