• Stars
    star
    138
  • Rank 264,508 (Top 6 %)
  • Language
    Haskell
  • License
    BSD 2-Clause "Sim...
  • Created over 11 years ago
  • Updated over 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A Haskell implementation of crit-bit trees.

Crit-bit trees for Haskell

This is the first purely functional implementation of crit-bit trees that I'm aware of.

A crit-bit tree is a key/value container that allows efficient lookups and ordered traversal for data that can be represented as a string of bits.

This package exists in part with education in mind:

  • The core data structures are simple.

  • The core algorithms are easy to grasp.

  • I have intentionally structured the source to be easy to follow and extend.

  • Originally, I deliberately left the package incomplete. (It has since been substantially fleshed out.) Ever thought to yourself, "I'd write a bit of Haskell if only I had a project to work on"? Well, here's your chance! I will set aside time to review your code and answer what questions I can.

Education aside, crit-bit trees offer some interesting features compared to other key/value container types in Haskell.

  • For some operations, they are much faster than Data.Map from the containers package, while for others, they are slower.

  • Compared to Data.HashMap, you get about the same lookup performance, but also some features that a hash-based structure can't provide: prefix-based search, efficient neighbour lookup, ordered storage.

Of course crit-bit trees have some downsides, too. For example, building a tree from randomly ordered inputs is somewhat slow, and of course the set of usable key types is small (only types that can be interpreted as bitstrings "for free").

Compared to the most easily findable crit-bit tree code you'll come across that's written in C, the core of this library has a lot less accidental complexity, and so may be easier to understand. It also handles arbitrary binary data that will cause the C library to go wrong.

How to contribute

I've purposely published this package in an incomplete state, and I'd like your help to round it out. In return, you get to learn a little Haskell, have your code reviewed by someone who wants to see you succeed, and contribute to a rather nifty library.

Do you need any prior experience with Haskell to get started? No! All you need is curiosity and the ability to learn from context. Oh, and a github account.

My aim with this library is drop-in API compatibility with the widely used Haskell containers library, which has two happy consequences:

  • There are lots of functions to write!

  • In almost every case, you'll find a pre-existing function in containers that (from a user's perspective) does exactly what its counterparts in this library ought to do.

Getting started

If you want to contribute or play around, please use the most modern version of the Haskell Platform.

Once you have the Platform installed, there are just a few more steps.

Set up your local database of known open source Haskell packages.

cabal update

Both the new cabal command and cabal-dev will install to $HOME/.cabal/bin, so put that directory at the front of your shell's search path before you continue.

Get the critbit source.

git clone git://github.com/bos/critbit

Set up a sandbox.

The first time through, you may need to download and install a ton of dependencies, so hang in there.

cd critbit
cabal sandbox init
cabal install \
--enable-tests \
--enable-benchmarks \
	--only-dependencies \
-j

The -j flag above tells cabal to use all of your CPUs, so even the initial build shouldn't take more than a few minutes.

To actually build:

cabal build

Running the test suite

Once you've built the code, you can run the entire test suite fairly quickly. This takes about 30 seconds on my oldish 8-core Mac laptop:

dist/build/tests/tests +RTS -N

(The +RTS -N above tells GHC's runtime system to use all available cores.)

If you're feeling impatient, run a subset of the test suite:

dist/build/tests/tests -t properties/map/bytestring +RTS -N

And if you want to explore, the tests program accepts a --help option. Try it out.

Running benchmarks

It is just as easy to benchmark stuff as to test it.

First, you need a dictionary. If your system doesn't have a file named /usr/share/dict/words, you can download a dictionary here.

If you've downloaded a dictionary, tell the benchmark suite where to find it by setting the WORDS environment variable.

export WORDS=/my/path/to/linuxwords

You can then run benchmarks and generate a report. For instance, this runs every benchmark that begins with bytestring/lookup.

dist/build/benchmarks/benchmarks -o lookup.html \
    bytestring/lookup

Open the lookup.html file in your browser. Here's an example of what to expect.

As with tests, run the benchmarks program with --help if you want to do some exploring.

What your code should look like

Okay, so you've bought into this idea, and would like to try writing a patch. How to begin?

I've generally tried to write commits with a view to being readable, so there are examples you can follow.

For instance, here's the commit where I added the keys function. This commit follows a simple pattern:

Naturally, you'll follow the prevailing coding and formatting style. If you forget, I'll be sad and offer you only a terse "fix your formatting" review, and then you'll be sad too.

What your commits should look like

Please follow the guidelines below, as they make it easier to review your pull request and deal with your commits afterwards.

  • One logical idea per commit! If you want to add five functions, that's fine, but please spread them across five commits.

  • Do not reorganize or refactor unrelated code in a commit whose purpose is to add new code.

  • When you add a new function, add its tests and benchmarks in the same commit.

  • Do not add trailing whitespace. Follow the same formatting and naming conventions as you already see in the code around you.

  • Keep your maximum line length at 80 columns for everything except lines of example code in documentation.

(If you can't follow the guidelines, there's a good chance I'll ask you to fix your commits and resubmit them.)

More Repositories

1

haskell-language-server

Official haskell ide support via language server (LSP). Successor of ghcide & haskell-ide-engine.
Haskell
2,709
star
2

haskell-ide-engine

The engine for haskell ide-integration. Not an IDE
Haskell
2,383
star
3

cabal

Official upstream development repository for Cabal and cabal-install
Haskell
1,623
star
4

haskell-mode

Emacs mode for Haskell
Emacs Lisp
1,329
star
5

aeson

A fast Haskell JSON library
Haskell
1,254
star
6

stylish-haskell

Haskell code prettifier
Haskell
986
star
7

parsec

A monadic parser combinator library
Haskell
844
star
8

ghcide

A library for building Haskell IDE tooling
Haskell
584
star
9

vscode-haskell

VS Code extension for Haskell, powered by haskell-language-server
TypeScript
561
star
10

attoparsec

A fast Haskell library for parsing ByteStrings
Haskell
514
star
11

criterion

A powerful but simple library for measuring the performance of Haskell code.
Haskell
501
star
12

hackage-server

Hackage-Server: A Haskell Package Repository
Haskell
415
star
13

text

Haskell library for space- and time-efficient operations over Unicode text.
Haskell
406
star
14

haskell-platform

Distribution of Haskell with batteries included
Haskell
381
star
15

wreq

Haskell
379
star
16

lsp

Haskell library for the Microsoft Language Server Protocol
Haskell
366
star
17

mtl

The Monad Transformer Library
Haskell
366
star
18

vector

An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework .
Haskell
363
star
19

haddock

Haskell Documentation Tool
HTML
361
star
20

network

Low-level networking interface
Haskell
326
star
21

containers

Assorted concrete container types
Haskell
315
star
22

statistics

A fast, high quality library for computing with statistics in Haskell.
Haskell
300
star
23

alex

A lexical analyser generator for Haskell
Haskell
298
star
24

bytestring

An efficient compact, immutable byte string type (both strict and lazy) suitable for binary or 8-bit character data.
Haskell
291
star
25

happy

The Happy parser generator for Haskell
Haskell
287
star
26

ghcup-hs

Haskell
284
star
27

ghcup

DEPRECATED IN FAVOR OF haskell/ghcup-hs
Shell
263
star
28

haskeline

A Haskell library for line input in command-line programs.
Haskell
222
star
29

c2hs

c2hs is a pre-processor for Haskell FFI bindings to C libraries
Haskell
198
star
30

fgl

A Functional Graph Library for Haskell
Haskell
184
star
31

hie-bios

Set up a GHC API session for various Haskell Projects
Haskell
180
star
32

HTTP

Haskell HTTP package
Haskell
177
star
33

ThreadScope

A graphical tool for profiling parallel Haskell programs
Haskell
153
star
34

actions

Github actions for Haskell CI
TypeScript
147
star
35

play-haskell

Haskell Playground
Haskell
129
star
36

time

A time library
Haskell
119
star
37

primitive

This package provides various primitive memory-related operations.
Haskell
114
star
38

unix

POSIX functionality
Haskell
107
star
39

binary

Efficient, pure binary serialisation using ByteStrings in Haskell.
Haskell
106
star
40

rfcs

This repo is archived, consider using https://github.com/ghc-proposals/ghc-proposals instead
TeX
98
star
41

win32

Haskell support for the Win32 API
Haskell
98
star
42

stm

Software Transactional Memory
Haskell
97
star
43

core-libraries-committee

95
star
44

haskell-report

Haskell Language Report
TeX
91
star
45

parallel

a library for parallel programming
Haskell
91
star
46

process

Library for dealing with system processes
Haskell
87
star
47

hoopl

Higher-order optimization library
Haskell
73
star
48

error-messages

72
star
49

pretty

Haskell Pretty-printer library
Haskell
69
star
50

filepath

Haskell FilePath core library
Haskell
66
star
51

docker-haskell

Dockerfile
63
star
52

directory

Platform-independent library for basic file system operations
Haskell
58
star
53

hackage-security

Hackage security framework based on TUF (The Update Framework)
Haskell
56
star
54

mwc-random

A very fast Haskell library for generating high quality pseudo-random numbers.
Haskell
55
star
55

random

Random number library
Haskell
53
star
56

ecosystem-proposals

Proposals for the Haskell Ecosystem
51
star
57

text-icu

This package provides the Haskell Data.Text.ICU library, for performing complex manipulation of Unicode text.
Haskell
47
star
58

base64-bytestring

Fast base64 encoding and decoding for Haskell.
Haskell
45
star
59

security-advisories

Haskell
44
star
60

deepseq

Deep evaluation of data structures
Haskell
41
star
61

tar

Reading, writing and manipulating ".tar" archive files.
Haskell
40
star
62

math-functions

Special mathematical functions
Haskell
40
star
63

hsc2hs

Pre-processor for .hsc files
Haskell
38
star
64

pvp

Haskell Package Version Policy (PVP)
CSS
38
star
65

zlib

Compression and decompression in the gzip and zlib formats
C
35
star
66

ghc-events

Library and tool for parsing .eventlog files from GHC
Haskell
33
star
67

text-format

A Haskell text formatting library optimized for ease of use and high performance.
Haskell
32
star
68

ghcup-metadata

GHCup metadata repository
Haskell
32
star
69

cabal-userguide

A handy user guide for the Cabal build tool
Nix
28
star
70

base16-bytestring

Fast base16 (hexadecimal) encoding and decoding for Haskell bytestrings.
Haskell
27
star
71

network-uri

URI manipulation facilities
Haskell
25
star
72

entropy

Easy entropy source for Haskell users.
Haskell
23
star
73

winghci

Simple Windows GUI for GHCi.
C
17
star
74

double-conversion

A fast Haskell library for converting between double precision floating point numbers and text strings. It is implemented as a binding to the V8-derived C++ double-conversion library.
C++
15
star
75

file-io

File IO (read/write/open) for OsPath API
Haskell
11
star
76

terminfo

Haskell bindings to the terminfo API.
Haskell
10
star
77

array

Haskell
10
star
78

xhtml

XHTML combinator library
Haskell
9
star
79

old-time

This package provides the old time library.
Haskell
7
star
80

meta

A place for discussing & documenting the github.com/haskell organization
7
star
81

blog.haskell.org

Repository of the Haskell Blog
JavaScript
7
star
82

ghc-builder

ghc builder bot
Haskell
6
star
83

cabal-website

The http://www.haskell.org/cabal/ website
HTML
4
star
84

network-bsd

POSIX network database (<netdb.h>) API
Haskell
4
star
85

haskell-wiki-configuration

Issue tracking for Haskell Wiki
PHP
4
star
86

clc-stackage

Meta-package to facilitate impact assessment for CLC proposals
Nix
3
star
87

hiw

Haskell Implementors Workshop
TeX
3
star
88

text-test-data

Test data for the Haskell text project
Text
3
star
89

old-locale

This package provides the ability to adapt to locale conventions such as date and time formats.
Haskell
3
star
90

tokenize

Simple tokenizer for English text
Haskell
3
star
91

bzlib

Compression and decompression in the bzip2 format
Haskell
2
star
92

os-string

Haskell
2
star
93

hpc

1
star