• Stars
    star
    2,334
  • Rank 19,705 (Top 0.4 %)
  • Language
    C++
  • License
    zlib License
  • Created over 9 years ago
  • Updated 12 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Pattern-defeating quicksort.

pdqsort

Pattern-defeating quicksort (pdqsort) is a novel sorting algorithm that combines the fast average case of randomized quicksort with the fast worst case of heapsort, while achieving linear time on inputs with certain patterns. pdqsort is an extension and improvement of David Mussers introsort. All code is available for free under the zlib license.

Best        Average     Worst       Memory      Stable      Deterministic
n           n log n     n log n     log n       No          Yes

Usage

pdqsort is a drop-in replacement for std::sort. Just replace a call to std::sort with pdqsort to start using pattern-defeating quicksort. If your comparison function is branchless, you can call pdqsort_branchless for a potential big speedup. If you are using C++11, the type you're sorting is arithmetic and your comparison function is not given or is std::less/std::greater, pdqsort automatically delegates to pdqsort_branchless.

Benchmark

A comparison of pdqsort and GCC's std::sort and std::stable_sort with various input distributions:

Performance graph

Compiled with -std=c++11 -O2 -m64 -march=native.

Visualization

A visualization of pattern-defeating quicksort sorting a ~200 element array with some duplicates. Generated using Timo Bingmann's The Sound of Sorting program, a tool that has been invaluable during the development of pdqsort. For the purposes of this visualization the cutoff point for insertion sort was lowered to 8 elements.

Visualization

The best case

pdqsort is designed to run in linear time for a couple of best-case patterns. Linear time is achieved for inputs that are in strictly ascending or descending order, only contain equal elements, or are strictly in ascending order followed by one out-of-place element. There are two separate mechanisms at play to achieve this.

For equal elements a smart partitioning scheme is used that always puts equal elements in the partition containing elements greater than the pivot. When a new pivot is chosen it's compared to the greatest element in the partition before it. If they compare equal we can derive that there are no elements smaller than the chosen pivot. When this happens we switch strategy for this partition, and filter out all elements equal to the pivot.

To get linear time for the other patterns we check after every partition if any swaps were made. If no swaps were made and the partition was decently balanced we will optimistically attempt to use insertion sort. This insertion sort aborts if more than a constant amount of moves are required to sort.

The average case

On average case data where no patterns are detected pdqsort is effectively a quicksort that uses median-of-3 pivot selection, switching to insertion sort if the number of elements to be (recursively) sorted is small. The overhead associated with detecting the patterns for the best case is so small it lies within the error of measurement.

pdqsort gets a great speedup over the traditional way of implementing quicksort when sorting large arrays (1000+ elements). This is due to a new technique described in "BlockQuicksort: How Branch Mispredictions don't affect Quicksort" by Stefan Edelkamp and Armin Weiss. In short, we bypass the branch predictor by using small buffers (entirely in L1 cache) of the indices of elements that need to be swapped. We fill these buffers in a branch-free way that's quite elegant (in pseudocode):

buffer_num = 0; buffer_max_size = 64;
for (int i = 0; i < buffer_max_size; ++i) {
    // With branch:
    if (elements[i] < pivot) { buffer[buffer_num] = i; buffer_num++; }
    // Without:
    buffer[buffer_num] = i; buffer_num += (elements[i] < pivot);
}

This is only a speedup if the comparison function itself is branchless, however. By default pdqsort will detect this if you're using C++11 or higher, the type you're sorting is arithmetic (e.g. int), and you're using either std::less or std::greater. You can explicitly request branchless partitioning by calling pdqsort_branchless instead of pdqsort.

The worst case

Quicksort naturally performs bad on inputs that form patterns, due to it being a partition-based sort. Choosing a bad pivot will result in many comparisons that give little to no progress in the sorting process. If the pattern does not get broken up, this can happen many times in a row. Worse, real world data is filled with these patterns.

Traditionally the solution to this is to randomize the pivot selection of quicksort. While this technically still allows for a quadratic worst case, the chances of it happening are astronomically small. Later, in introsort, pivot selection is kept deterministic, instead switching to the guaranteed O(n log n) heapsort if the recursion depth becomes too big. In pdqsort we adopt a hybrid approach, (deterministically) shuffling some elements to break up patterns when we encounter a "bad" partition. If we encounter too many "bad" partitions we switch to heapsort.

Bad partitions

A bad partition occurs when the position of the pivot after partitioning is under 12.5% (1/8th) percentile or over 87,5% percentile - the partition is highly unbalanced. When this happens we will shuffle four elements at fixed locations for both partitions. This effectively breaks up many patterns. If we encounter more than log(n) bad partitions we will switch to heapsort.

The 1/8th percentile is not chosen arbitrarily. An upper bound of quicksorts worst case runtime can be approximated within a constant factor by the following recurrence:

T(n, p) = n + T(p(n-1), p) + T((1-p)(n-1), p)

Where n is the number of elements, and p is the percentile of the pivot after partitioning. T(n, 1/2) is the best case for quicksort. On modern systems heapsort is profiled to be approximately 1.8 to 2 times as slow as quicksort. Choosing p such that T(n, 1/2) / T(n, p) ~= 1.9 as n gets big will ensure that we will only switch to heapsort if it would speed up the sorting. p = 1/8 is a reasonably close value and is cheap to compute on every platform using a bitshift.

More Repositories

1

glidesort

A Rust implementation of Glidesort, my stable adaptive quicksort/mergesort hybrid sorting algorithm.
Rust
1,571
star
2

slotmap

Slotmap data structure for Rust
Rust
1,104
star
3

ed25519

Portable C implementation of Ed25519, a high-speed high-security public-key signature system.
C
487
star
4

polymur-hash

The PolymurHash universal hash function.
C
316
star
5

dev-on-windows

An opiniated guide to set up a development environment on Windows.
226
star
6

matt-parker-five-letter-clique

Rust
45
star
7

devector

Resizable contiguous sequence container with fast appends on either end.
C++
37
star
8

recursive

Easy recursion in Rust, without stack overflows.
Rust
28
star
9

num-ord

A wrapper type for cross-type numeric comparisons.
Rust
27
star
10

peekread

Rust crate for making Read streams peekable.
Rust
26
star
11

ReducePing

ReducePing is a small utility to tune the "TcpAckFrequency" setting of Windows to get better latency in TCP networked games.
C
20
star
12

aoc2022

My Advent of Code 2022 solutions, in Rust.
Rust
20
star
13

golf-cpu

Reference implementation of the GOLF CPU.
Python
17
star
14

pyglfw

Python bindings for GLFW
C
16
star
15

PyGG2

A Python rewrite of Gang Garrison 2
Python
14
star
16

pyth5

Clean-sheet rewrite of Pyth.
Python
13
star
17

bitwise-binary-search

Accompanying code for https://orlp.net/blog/bitwise-binary-search/.
C++
12
star
18

dotfiles

My dotfiles.
HTML
10
star
19

pygrafix

pygrafix is a Python/Cython hardware-accelerated 2D graphics library.
C
10
star
20

xcharter

XCharter font build.
Python
10
star
21

vim-bunlink

A replacement for :bdelete that decouples the concept of 'deleting a buffer' from 'closing a window'.
Vim Script
9
star
22

libop

My personal C++ library.
Objective-C
8
star
23

aoc2023

My Advent of Code 2023 solutions, in Rust.
Rust
7
star
24

vim-quick-replace

A quick find/replace plugin for Vim.
Vim Script
6
star
25

secudht

A secure design and implementation of the Kademlia DHT.
C
6
star
26

sum-bench

This is the code accompanying https://orlp.net/blog/taming-float-sums/.
Rust
6
star
27

multilive

Multiple poe.trade live searches at once.
JavaScript
5
star
28

iwyu

A small utility that helps you include the right C++ headers.
Python
4
star
29

aoc2021

My Advent of Code 2021 solutions, in Rust.
Rust
4
star
30

commonc

Various common C algorithms and things
C
3
star
31

stable-alloc-shim

A stable Rust shim for the unstable Allocator API.
Rust
3
star
32

synth

A real-time self-hosting MIDI software synth written in Rust from scratch.
Rust
3
star
33

qcon

Quake-style console for windows.
AutoHotkey
3
star
34

ncUI

A lightweight user interface for World of Warcraft - DEAD
Lua
3
star
35

deps

deps is a minimalistic building system for any process which consists of smaller processes that depend on each other.
Python
2
star
36

euler

My solutions for Project Euler.
C++
1
star
37

osrs-fragment-calc

OSRS fragment set calculator
HTML
1
star
38

hades-boons

Python
1
star
39

picture-in-picture

Java
1
star
40

Robotics2020-Final

This repository hosts my final project for the 2020 Robotics course at Leiden university.
JavaScript
1
star
41

boost-win-builds

Some Windows builds for Boost.
Shell
1
star
42

lolpriority

Little tool for Leage of Legends, automatically puts the LoLClient.exe process on low priority when the in-game client is open for extra performance.
C
1
star
43

amazons

Web-based Game of the Amazons
CSS
1
star
44

distris

Distributed socializing.
C++
1
star
45

p

C++
1
star
46

poe-trade-qol

Quality of life for PoE's trade site.
JavaScript
1
star
47

poedps

A Path of Exile weapon DPS calculator with optional crafting.
JavaScript
1
star
48

StrongholdCoords

An application for finding out the exact coordinates of end portal frames in Minecraft seeds.
Java
1
star
49

unite-git-repo

A source for Unite.vim that lists all files from the git repository root.
Vim Script
1
star
50

pyflat

pyflat is a Python hardware-accelerated 2D graphics library.
C
1
star