• Stars
    star
    769
  • Rank 59,078 (Top 2 %)
  • Language
    C++
  • License
    Apache License 2.0
  • Created about 5 years ago
  • Updated 7 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

๐Ÿ…State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

The PGM-index

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes while providing the same worst-case query time guarantees.

Website | Documentation | Paper | Slides | Python wrapper | Aยณ Lab

GitHub Workflow Status License GitHub stars GitHub forks Run on Repl.it

Quickstart

This is a header-only library. It does not need to be installed. Just clone the repo with

git clone https://github.com/gvinciguerra/PGM-index.git
cd PGM-index

and copy the include/pgm directory to your system's or project's include path.

The examples/simple.cpp file shows how to index and query a vector of random integers with the PGM-index:

#include <vector>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include "pgm/pgm_index.hpp"

int main() {
    // Generate some random data
    std::vector<int> data(1000000);
    std::generate(data.begin(), data.end(), std::rand);
    data.push_back(42);
    std::sort(data.begin(), data.end());

    // Construct the PGM-index
    const int epsilon = 128; // space-time trade-off parameter
    pgm::PGMIndex<int, epsilon> index(data);

    // Query the PGM-index
    auto q = 42;
    auto range = index.search(q);
    auto lo = data.begin() + range.lo;
    auto hi = data.begin() + range.hi;
    std::cout << *std::lower_bound(lo, hi, q);

    return 0;
}

Run and edit this and other examples on Repl.it. Or run it locally via:

g++ examples/simple.cpp -std=c++17 -I./include -o simple
./simple

Classes overview

Other than the pgm::PGMIndex class in the example above, this library provides the following classes:

  • pgm::DynamicPGMIndex supports insertions and deletions.
  • pgm::MultidimensionalPGMIndex stores points in k dimensions and supports orthogonal range queries.
  • pgm::MappedPGMIndex stores data on disk and uses a PGMIndex for fast search operations.
  • pgm::CompressedPGMIndex compresses the segments to reduce the space usage of the index.
  • pgm::OneLevelPGMIndex uses a binary search on the segments rather than a recursive structure.
  • pgm::BucketingPGMIndex uses a top-level lookup table to speed up the search on the segments.
  • pgm::EliasFanoPGMIndex uses a top-level succinct structure to speed up the search on the segments.

The full documentation is available here.

Compile the tests and the tuner

After cloning the repository, build the project with

cmake . -DCMAKE_BUILD_TYPE=Release
make -j8

The test runner will be placed in test/. The tuner executable will be placed in tuner/. The benchmark executable will be placed in benchmark/.

License

This project is licensed under the terms of the Apache License 2.0.

If you use the library please put a link to the website and cite the following paper:

Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB, 13(8): 1162-1175, 2020.

@article{Ferragina:2020pgm,
  Author = {Paolo Ferragina and Giorgio Vinciguerra},
  Title = {The {PGM-index}: a fully-dynamic compressed learned index with provable worst-case bounds},
  Year = {2020},
  Volume = {13},
  Number = {8},
  Pages = {1162--1175},
  Doi = {10.14778/3389133.3389135},
  Url = {https://pgm.di.unipi.it},
  Issn = {2150-8097},
  Journal = {{PVLDB}}}

More Repositories

1

PyGM

๐Ÿ Python library implementing sorted containers with state-of-the-art query performance and compressed memory usage
Python
205
star
2

la_vector

๐Ÿ”ถ Compressed bitvector/container supporting efficient random access and rank queries
C++
40
star
3

unlister

๐Ÿ“ฌ Script for Mail on macOS that automatically unsubscribes from promotional emails and newsletters
AppleScript
38
star
4

Learned-indexes-effectiveness

Code for the TCS paper "On the performance of learned data structures" and the ICML paper "Why are learned indexes so effective?"
C++
18
star
5

countmein

๐Ÿพ People counting and surveillance with IoT devices
Python
11
star
6

BlockEpsilonTree

๐ŸŒณ A compressed rank/select dictionary exploiting approximate linearity and repetitiveness.
C++
11
star
7

nnweaver

๐Ÿง  + ๐Ÿ•ธ = Neural Network Weaver. A tiny library to build and train neural networks
Python
8
star
8

PrefixPGM

Proof-of-concept extension of the PGM-index to support fixed-length strings
C++
6
star
9

CSS-tree

Single-header C++11 implementation of the Cache Sensitive Search tree (CSS-tree)
C++
6
star
10

custode

๐Ÿ‘ผ๐Ÿป Personal safety app for Android
Java
5
star
11

LZEpsilon

Compressed rank/select dictionary based on Lempel-Ziv and LA-vector compression
C++
5
star
12

RearCodedArray

Compressed string dictionary based on rear-coding
C++
4
star
13

AE2020-tutorial

Repository for the students of Algorithm Engineering @ UNIPI ๐Ÿ‘ผ
C++
3
star
14

AE2022-tutorial

Repository for the students of Algorithm Engineering @ UNIPI ๐Ÿ‘ผ
C++
3
star
15

divide-and-conquer

An implementation of the divide-and-conquer parallel pattern
C++
1
star
16

TDP-2017

Application of Design Patterns to Conway's Game of Life, and other Design Patterns exercises in Java.
Java
1
star