  • Stars: 129
  • Rank: 279,262 (Top 6 %)
  • Language: Rust
  • License: MIT License
  • Created about 7 years ago
  • Updated 5 months ago

Repository Details

Fast and scalable minimal perfect hashing for massive key sets

Fast and Scalable Minimal Perfect Hash Functions in Rust

A Rust implementation of the algorithm described in "Fast and scalable minimal perfect hashing for massive key sets".

The library generates a minimal perfect hash function (MPHF) for a collection of hashable objects. The resulting MPHFs consume ~3-6 bits/item. Memory consumption during construction is a small multiple (< 2x) of the size of the dataset and the final size of the MPHF. Note that an MPHF only returns a usable hash value for objects in the set used to create it. Hashing any other object returns an arbitrary hash value. If your use case may hash objects outside the construction set, you will need an auxiliary scheme to detect this condition.
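One possible auxiliary scheme is to keep each key in the slot given by its hash and compare on lookup. The sketch below is a minimal, std-only illustration of that idea. `ToyMphf` and `VerifiedIndex` are hypothetical names, not part of this crate: `ToyMphf` only imitates the contract described above (a unique slot in 0..n for members, an arbitrary in-range slot for everything else).

```rust
/// Stand-in for an MPHF (an assumption for illustration, NOT the boomphf API).
/// Keys in the construction set map to unique slots in 0..n; any other key
/// maps to an arbitrary slot in the same range.
struct ToyMphf {
    sorted_keys: Vec<u64>,
}

impl ToyMphf {
    fn new(keys: &[u64]) -> Self {
        let mut sorted_keys = keys.to_vec();
        sorted_keys.sort_unstable();
        sorted_keys.dedup();
        ToyMphf { sorted_keys }
    }

    fn hash(&self, key: &u64) -> u64 {
        if self.sorted_keys.is_empty() {
            return 0;
        }
        // Number of stored keys strictly smaller than `key`.
        let pos = self.sorted_keys.partition_point(|k| k < key);
        (pos % self.sorted_keys.len()) as u64
    }
}

/// Hypothetical wrapper that detects keys outside the construction set.
struct VerifiedIndex {
    mphf: ToyMphf,
    /// slots[h] holds the key whose hash value is h.
    slots: Vec<u64>,
}

impl VerifiedIndex {
    fn new(keys: &[u64]) -> Self {
        let mphf = ToyMphf::new(keys);
        let mut slots = vec![0u64; mphf.sorted_keys.len()];
        for k in &mphf.sorted_keys {
            slots[mphf.hash(k) as usize] = *k;
        }
        VerifiedIndex { mphf, slots }
    }

    /// Returns the hash only if `key` was in the construction set.
    fn lookup(&self, key: &u64) -> Option<u64> {
        let h = self.mphf.hash(key);
        match self.slots.get(h as usize) {
            Some(stored) if stored == key => Some(h),
            _ => None,
        }
    }
}
```

Storing full keys costs far more than the few bits per item used by the MPHF itself. Where a small false-positive rate is acceptable, a short fingerprint of each key could be stored in the slot instead.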

See Docs

Example usage:

use boomphf::*;

// sample set of objects
let possible_objects = vec![1, 10, 1000, 23, 457, 856, 845, 124, 912];
let n = possible_objects.len();

// generate a minimal perfect hash function of these items
let phf = Mphf::new(1.7, possible_objects.clone(), None);

// Get hash value of all objects
let mut hashes = Vec::new();
for v in possible_objects {
    hashes.push(phf.hash(&v));
}
hashes.sort();

// The sorted hashes should be exactly the integers 0..n
let expected_hashes: Vec<u64> = (0 .. n as u64).collect();
assert_eq!(hashes, expected_hashes);

Note: this crate carries its own bit-vector implementation to support rank-select queries and multi-threaded read-write access.
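To illustrate what a rank query does, here is a minimal, single-threaded sketch. `RankBitVec` is a hypothetical name and this is not the crate's implementation. It precomputes the number of set bits before each 64-bit word, so `rank(pos)` turns the position of a set bit into a dense index. A dense numbering of this kind is what lets a sparse bit pattern yield the compact 0..n output range.

```rust
/// Illustrative bit-vector with rank support (NOT the crate's implementation).
struct RankBitVec {
    words: Vec<u64>,
    /// ranks[i] = number of set bits in words[0..i].
    ranks: Vec<u64>,
}

impl RankBitVec {
    /// Builds a bit-vector of `len_bits` bits with the given positions set.
    fn from_positions(len_bits: usize, set_bits: &[usize]) -> Self {
        let mut words = vec![0u64; (len_bits + 63) / 64];
        for &p in set_bits {
            assert!(p < len_bits, "bit position out of range");
            words[p / 64] |= 1u64 << (p % 64);
        }
        // Running popcount: one cumulative count per word.
        let mut ranks = Vec::with_capacity(words.len());
        let mut total = 0u64;
        for w in &words {
            ranks.push(total);
            total += w.count_ones() as u64;
        }
        RankBitVec { words, ranks }
    }

    /// True if the bit at `pos` is set; out-of-range positions are unset.
    fn contains(&self, pos: usize) -> bool {
        self.words
            .get(pos / 64)
            .map_or(false, |&w| (w >> (pos % 64)) & 1 == 1)
    }

    /// Number of set bits strictly before `pos`.
    /// Panics if `pos` lies beyond the last word.
    fn rank(&self, pos: usize) -> u64 {
        let word = pos / 64;
        // Keep only the bits below `pos` inside its word.
        let below = self.words[word] & ((1u64 << (pos % 64)) - 1);
        self.ranks[word] + below.count_ones() as u64
    }
}
```

With bits 3, 64, 70 and 129 set, the ranks of those positions are 0, 1, 2 and 3: each set bit receives a distinct consecutive index.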

More Repositories

1. cellranger (Rust, 357 stars): 10x Genomics Single Cell Analysis
2. vartrix (Rust, 199 stars): Single-Cell Genotyping Tool
3. loupeR (R, 95 stars): Convert Seurat objects to 10x Genomics Loupe files.
4. single-cell-3prime-paper (R, 83 stars)
5. subset-bam (Rust, 66 stars)
6. supernova (C++, 64 stars): 10x Genomics Linked-Read Diploid De Novo Assembler
7. rust-debruijn (Rust, 63 stars): De Bruijn graphs in Rust
8. bamtofastq (Rust, 59 stars): Convert 10x BAM files to the original FASTQs compatible with 10x pipelines
9. scHLAcount (TeX, 58 stars): Count HLA alleles in single-cell RNA-seq data
10. rust-pseudoaligner (Rust, 50 stars): Single-Cell RNA-seq pseudo-aligner
11. rust-shardio (Rust, 47 stars): Out-of-memory sorting of large datasets map / reduce style processing
12. enclone (Rust, 46 stars): VDJ Clonotyping & Analysis Tools
13. rust-toolbox (Rust, 32 stars): Rust utility code
14. HumanColonCancer_VisiumHD (R, 32 stars): Associated code to the manuscript "Characterization of immune cell populations in the tumor microenvironment of colorectal cancer using high definition spatial profiling"
15. longranger (Python, 30 stars): 10x Genomics Linked-Read Alignment, Variant Calling, Phasing, and Structural Variant Calling
16. lariat (Go, 28 stars): Linked-Read Alignment Tool
17. single-cell-3prime-snp-clustering (Python, 22 stars)
18. rust-bwa (Rust, 18 stars): Rust wrapper of the BWA C API
19. cellranger-atac (Python, 16 stars)
20. scan-rs (Rust, 15 stars): Single-cell analysis methods in Rust
21. fastq_set (Rust, 14 stars): Tools for working FASTQ files from sequencers (R1/R2/I1/I2)
22. orbit (C, 14 stars): Rust wrapper for STAR aligner
23. janesick_nature_comms_2023_companion (Jupyter Notebook, 14 stars): Code companion to the publication "High resolution mapping of the breast cancer tumor microenvironment using integrated single cell, spatial and in situ analysis of FFPE tissue"
24. thermite (Rust, 8 stars): Spliced short read aligner implemented in Rust
25. websummary (Rust, 5 stars): Generate an HTML report in Rust
26. analysis_guides (Jupyter Notebook, 5 stars): 10x Genomics analysis guides files
27. enclone_ranger (Rust, 4 stars): Core components for enclone
28. supernova-chili-pepper (C++, 2 stars)
29. rules_conda (Go, 1 star): Bazel rules for creating conda environments
30. louvain (C++, 1 star): Louvain graph clustering
31. cellranger-dna (Go, 1 star): Single Cell DNA Copy Number Profiling