• This repository has been archived on 22/Feb/2024
  • Stars: 2,527
  • Rank 18,152 (Top 0.4%)
  • Language: Go
  • License: Apache License 2.0
  • Created over 9 years ago
  • Updated 9 months ago

Repository Details

A crazy fast analytical database, built on bitmaps. Perfect for ML applications. Learn more at: http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase

FeatureBase

Pilosa is now FeatureBase

As of September 7, 2022, the Pilosa project is now FeatureBase. The core of the project remains the same: FeatureBase is the first real-time distributed database built entirely on bitmaps. (More information about updated capabilities and improvements below.)

FeatureBase delivers low-latency query results, regardless of throughput or query volumes, on fresh data with extreme efficiency. It works because bitmaps are faster, simpler, and far more I/O efficient than traditional column-oriented data formats. With FeatureBase, you can ingest data from batch data sources (e.g. S3, CSV, Snowflake, BigQuery) and/or streaming data sources (e.g. Kafka/Confluent, Kinesis, Pulsar).

For more information about FeatureBase, please visit www.featurebase.com.

Getting Started

Build FeatureBase Server from source

  1. Install Go. Ensure that your shell's search path includes the go/bin directory.
  2. Clone the FeatureBase repository (or download as zip).
  3. In the featurebase directory, run make install to compile the FeatureBase server binary. By default, it will be installed in the go/bin directory.
  4. In the idk directory, run make install to compile the ingester binaries. By default, they will be installed in the go/bin directory.
  5. Run featurebase server --handler.allowed-origins=http://localhost:3000 to run FeatureBase server with default settings (learn more about configuring FeatureBase at the link below). The --handler.allowed-origins parameter allows the standalone web UI to talk to the server; this can be omitted if the web UI is not needed.
  6. Run curl localhost:10101/status to verify the server is running and accessible.
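
The check in step 6 can also be scripted. The following is a minimal Go sketch, assuming the server listens on the default localhost:10101 address used above. The helper name checkStatus and the five-second timeout are illustrative choices, not part of FeatureBase.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// checkStatus issues GET <baseURL>/status and returns the response body.
// It returns an error if the server is unreachable or answers with a
// status code other than 200.
func checkStatus(baseURL string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second} // timeout is an arbitrary choice
	resp, err := client.Get(baseURL + "/status")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d: %s", resp.StatusCode, body)
	}
	return string(body), nil
}

func main() {
	body, err := checkStatus("http://localhost:10101")
	if err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	fmt.Println(body)
}
```

If the server is not running, the program reports the connection error instead of printing the status document.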

Ingest Data and Query

  1. Run
molecula-consumer-csv \
    --index repository \
    --header "language__ID_F,project_id__ID_F" \
    --id-field project_id \
    --batch-size 1000 \
    --files example.csv

This will ingest the example.csv file into a FeatureBase table called repository. If the table does not exist, it will be automatically created. Learn more about ingesting data into FeatureBase.
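
If you do not have an example.csv at hand, a small one can be generated. The sketch below is only an illustration: the record type and the sample values are hypothetical, and it assumes that, because the column names are passed with --header, the file holds data rows only, with the language ID first and the project ID second.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"os"
	"strconv"
)

// record is one hypothetical row: a language ID and a project ID,
// in the same column order as the --header value.
type record struct {
	LanguageID uint64
	ProjectID  uint64
}

// encodeCSV renders the records as CSV text without a header row.
func encodeCSV(records []record) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	for _, r := range records {
		row := []string{
			strconv.FormatUint(r.LanguageID, 10),
			strconv.FormatUint(r.ProjectID, 10),
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	// Made-up sample data: projects 1 and 2 use language 5, project 3 uses language 7.
	data, err := encodeCSV([]record{{5, 1}, {5, 2}, {7, 3}})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	if err := os.WriteFile("example.csv", []byte(data), 0o644); err != nil {
		fmt.Println("write failed:", err)
	}
}
```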

  2. Query your data.
curl localhost:10101/index/repository/query \
     -X POST \
     -d 'Row(language=5)'

Learn about the supported SQL and the native Pilosa Query Language (PQL).
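
The same query can be sent from a program. This is a minimal sketch using only the Go standard library and the endpoint shown in the curl command. The helper name newQueryRequest is hypothetical, and the PQL string uses the language field from the ingest step as an example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// newQueryRequest builds the same POST request as the curl command:
// the PQL text is sent as the raw request body.
func newQueryRequest(baseURL, index, pql string) (*http.Request, error) {
	endpoint := fmt.Sprintf("%s/index/%s/query", baseURL, index)
	return http.NewRequest(http.MethodPost, endpoint, strings.NewReader(pql))
}

func main() {
	req, err := newQueryRequest("http://localhost:10101", "repository", "Row(language=5)")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("reading response failed:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```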

Data Model

Because FeatureBase is built on bitmaps, there is a bit of a learning curve to grasp how your data is represented. Learn about Data Modeling.
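
As a rough intuition, each distinct value of a field can be pictured as a bitmap with one bit per record, so that filters become bitwise operations. The toy model below is a conceptual sketch only, not FeatureBase's storage format: the bitmap and field types are hypothetical and limited to record IDs 0 to 63.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitmap is a toy fixed-size bitmap covering record IDs 0..63.
// Bit i is set when record i has the value the bitmap stands for.
type bitmap uint64

// set returns a copy of b with the bit for record id switched on.
func (b bitmap) set(id uint) bitmap { return b | bitmap(1)<<id }

// count returns how many records are present in the bitmap.
func (b bitmap) count() int { return bits.OnesCount64(uint64(b)) }

// intersect keeps only the records present in both bitmaps.
func intersect(a, b bitmap) bitmap { return a & b }

// field maps each distinct value to the bitmap of records holding it.
type field map[string]bitmap

// add records that the given record ID has the given value.
func (f field) add(value string, id uint) { f[value] = f[value].set(id) }

func main() {
	language := field{}
	language.add("go", 1)
	language.add("go", 2)
	language.add("python", 3)

	license := field{}
	license.add("apache", 2)
	license.add("apache", 3)

	// Records that are both Go and Apache-licensed: a single AND.
	both := intersect(language["go"], license["apache"])
	fmt.Println(both.count())
}
```

Counting the records that match both conditions is one AND followed by a population count, which is the kind of operation bitmaps make cheap.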

More Information

Installation

Configuration

Community

You can email us at [email protected] and learn more about contributing.

Chat with us: https://discord.gg/FBn2vEp7Na

What's Changed Since the Pilosa Days?

A lot has changed since the days of Pilosa. This list highlights some new capabilities included in FeatureBase. We have also made significant improvements to the performance, scalability, and stability of the FeatureBase product.

  • Query Languages: FeatureBase supports Pilosa Query Language (PQL), as well as SQL.
  • Stream and Batch Ingest: Combine real-time data streams with batch historical data and act on it within milliseconds.
  • Mutable: Perform inserts, updates, and deletes at scale, in real time and on-the-fly. This is key for meeting data compliance requirements, and for reflecting the constantly-changing nature of high-volume data.
  • Multi-Valued Set Fields: Store multiple comma-delimited values within a single field while increasing query performance of counts, TopKs, etc.
  • Time Quantums: Setting a time quantum on a field creates extra views which allow ranged Row queries down to the time interval specified. For example, if the time quantum is set to YMD, ranged Row queries down to the granularity of a day are supported.
  • RBF storage backend: This is a new compressed bitmap format that improves performance in a number of ways: it provides ACID support on a per-shard basis, prevents issues with the number of open files, reduces memory allocation and lock contention for reads, provides more consistent garbage collection, and allows backups to run concurrently with writes. However, because of this change, Pilosa backup files cannot be restored into FeatureBase.

License

FeatureBase is licensed under the Apache License, Version 2.0.

More Repositories

  1. DoctorGPT (Python, 230 stars): 💻📚💡 DoctorGPT provides advanced LLM prompting for PDFs and webpages.
  2. go-pilosa (Go, 58 stars): Go client library for Pilosa
  3. python-pilosa (Python, 32 stars): Python client library for Pilosa
  4. pdk (Go, 31 stars): Pilosa Dev Kit - implementation tooling and use case examples are here!
  5. tools (Go, 20 stars): Tools for development and ops
  6. getting-started (Python, 20 stars): Code and Data for Getting Started documentation
  7. java-pilosa (Java, 19 stars): Java client library for Pilosa
  8. picap (Go, 7 stars): Network data use case showing PDK and Pilosa.
  9. featurebase-examples (Python, 6 stars): Examples for FeatureBase Community
  10. console (JavaScript, 6 stars)
  11. cosmosa (Go, 6 stars): Index Microsoft's Azure CosmosDB with Pilosa
  12. infrastructure (HCL, 5 stars): Assorted tools for operations and deployment of infrastructure
  13. PythonGPT (Python, 5 stars): PythonGPT writes and indexes code to implement dynamic code execution using generative models. Younger sibling of DoctorGPT.
  14. vis (HTML, 5 stars): Latex source for figures for Pilosa documentation, whitepapers, etc.
  15. python-pilosa-roaring (Python, 4 stars): This library implements Roaring Bitmaps with Pilosa 64 bit extension, compatible with Pilosa Roaring Bitmap imports.
  16. Laminoid (Python, 3 stars): An ML instance manager
  17. slothbot (Python, 3 stars): SlothBot | A generally useful analytical Discord bot that does support and writes SQL.
  18. notebooks (Jupyter Notebook, 3 stars): ipython notebooks for pilosa demonstrations/blog posts
  19. SlothAI (Python, 3 stars): A simple document pipeline manager for AI. Runs Python on AppEngine.
  20. helm (Smarty, 2 stars): Helm Chart for a Pilosa Cluster
  21. lua-pilosa (Lua, 2 stars): Lua Client Library for Pilosa
  22. python-featurebase (Python, 2 stars): Python client for Featurebase SQL endpoint.
  23. java-pilosa-roaring (Java, 2 stars): Roaring Bitmaps with Pilosa 64 bit extension, compatible with Pilosa Roaring Bitmap imports.
  24. demo-taxi (JavaScript, 1 star)
  25. demo-ssb (Go, 1 star)
  26. chem-usecase (Python, 1 star): Chemical Similarity Usecase
  27. mindy (Go, 1 star): An experimental Multi-INDex proxY
  28. www (HTML, 1 star): Pilosa website
  29. general (1 star)
  30. upgrade-utils (Go, 1 star)
  31. unexport (Go, 1 star): Automatically unexport identifiers that are not used externally
  32. sample-ogg-handler (Java, 1 star): Repository for the article: Writing a Custom Handler for Oracle GoldenGate
  33. sql-examples (Python, 1 star): Example SQL queries for FeatureBase.