  • Stars: 1,480
  • Rank 31,765 (Top 0.7 %)
  • Language: C++
  • License: Other
  • Created about 2 years ago
  • Updated about 1 month ago

Repository Details


🌎 ArcticDB Website | 📒 ArcticDB Blog | 📣 Press Release | 📣 Press Release | 👥 Community


ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem. Launched in March 2023, it is the successor to Arctic.

ArcticDB offers an intuitive Python-centric API enabling you to read and write Pandas DataFrames to S3 or LMDB utilising a fast C++ data-processing and compression engine.

ArcticDB offers:

  • Pandas in, Pandas out: Read and write Pandas DataFrames, NumPy arrays and native types to S3 and LMDB without leaving Python.
  • Built for time-series data: Efficiently index and query time-series data across billions of rows.
  • Time travel: Travel back in time to see previous versions of your data and create customisable snapshots of the database.
  • Schemaless database: Append, update and modify data without being constrained by the existing schema.
  • Optimised for streaming data: Built-in support for efficient sparse data storage.
  • Powerful processing: Filter, aggregate and create new columns on the fly with a Pandas-like syntax.
  • C++ efficiency: Accelerate analytics through concurrency in the C++ data-processing engine.
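
To illustrate the time-travel feature, the sketch below writes a symbol, snapshots the library, appends more rows, and then reads the original, snapshotted and latest states. It is a minimal sketch rather than a definitive recipe: `time_travel_demo` and the snapshot name are hypothetical, and `lib` is assumed to be an ArcticDB library handle (such as the one created in the Quickstart) whose `write`, `append`, `snapshot` and `read(..., as_of=...)` methods behave as described in the ArcticDB docs.

```python
def time_travel_demo(lib, symbol, first_batch, second_batch):
    """Write two batches to `symbol` and read back each stored state.

    Assumptions: `lib` is an ArcticDB library handle, and the batches are
    pandas DataFrames where `second_batch` starts after `first_batch` ends.
    """
    # The first write creates the initial version of the symbol.
    first_version = lib.write(symbol, first_batch).version

    # Give the current state of the library a name so it can be recalled later.
    # (Hypothetical naming scheme; snapshot names must not already exist.)
    snapshot_name = f"{symbol}_after_first_write"
    lib.snapshot(snapshot_name)

    # Appending creates a new version; the earlier one stays readable.
    lib.append(symbol, second_batch)

    original = lib.read(symbol, as_of=first_version).data   # by version number
    snapshotted = lib.read(symbol, as_of=snapshot_name).data  # by snapshot name
    latest = lib.read(symbol).data                            # newest version

    return {
        "original_rows": len(original),
        "snapshot_rows": len(snapshotted),
        "latest_rows": len(latest),
    }
```

Called with the Quickstart's `lib` and two consecutive slices of a DataFrame, the first two counts should match the first slice and the last should cover both slices.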

ArcticDB handles data that is big in both row count and column count, so a 20-year history of more than 400,000 unique securities can be stored in a single symbol. Each symbol is maintained as a separate entity with no shared data which means ArcticDB can scale horizontally across symbols, maximising the performance potential of your compute, storage and network.
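
Because symbols share no data, work on different symbols can proceed independently. The sketch below fans writes for several symbols out over a thread pool. It is illustrative only: `write_symbols_in_parallel` is a hypothetical helper, `lib` is assumed to be an ArcticDB library handle that can safely be shared between threads, and the worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def write_symbols_in_parallel(lib, frames, max_workers=4):
    """Write each entry of `frames` as its own symbol.

    `frames` maps symbol names to the data to store. Every symbol is
    written by a separate task, so no two tasks touch the same symbol.
    Returns a mapping of symbol name to the version that was written.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            symbol: pool.submit(lib.write, symbol, data)
            for symbol, data in frames.items()
        }
        # .result() waits for the task and re-raises any worker exception.
        return {symbol: future.result().version for symbol, future in futures.items()}
```

Whether threads help in practice depends on the storage backend and on how much of the work the C++ engine already parallelises, so the worker count is something to measure rather than assume.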

ArcticDB is designed from the outset to be resilient; there is no single point of failure, and persistent data structures in the storage mean that once a version of a symbol has been written, it can never be corrupted by subsequent updates. Pulling compressed data directly from storage to the client means that there is no server to overload, so your data is always available when you need it.

Quickstart

Prebuilt binary availability

Platform | PyPI (Python 3.6 - 3.11) | conda-forge (Python 3.8 - 3.11)
Linux | ✔️ | ✔️
Windows | Beta | ➖
MacOS (Apple Silicon) | ➖ | ✔️

Storage compatibility

Storage | PyPI | conda-forge
S3 | ✔️ | ✔️
LMDB | ✔️ | ✔️
Azure Blob Storage | ✔️ | ➖

Support for Azure Blob Storage in conda-forge is tracked in #519.

Installation

Install ArcticDB:

$ pip install arcticdb

Or, using conda-forge:

$ conda install -c conda-forge arcticdb

Import ArcticDB:

>>> from arcticdb import Arctic

Create an instance on your S3 storage (with or without explicit credentials):

# Leave AWS to derive credential information
>>> ac = Arctic('s3://MY_ENDPOINT:MY_BUCKET?aws_auth=true')

# Manually specify creds
>>> ac = Arctic('s3://MY_ENDPOINT:MY_BUCKET?region=YOUR_REGION&access=ABCD&secret=DCBA')
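
Both S3 connection strings follow the same `s3://ENDPOINT:BUCKET?key=value&key=value` pattern, so they can be assembled from their parts. The sketch below shows one way to do that. `build_s3_uri` is a hypothetical helper, not part of ArcticDB, and the endpoint, bucket and credential values are the same placeholders used above.

```python
def build_s3_uri(endpoint, bucket, **options):
    """Assemble an S3 connection string in the form shown above.

    Hypothetical helper: values are inserted verbatim, with no escaping,
    in the order the keyword arguments are given.
    """
    uri = f"s3://{endpoint}:{bucket}"
    if options:
        uri += "?" + "&".join(f"{key}={value}" for key, value in options.items())
    return uri


# Leave AWS to derive credential information
uri_auto = build_s3_uri("MY_ENDPOINT", "MY_BUCKET", aws_auth="true")

# Manually specify the region and credentials
uri_manual = build_s3_uri(
    "MY_ENDPOINT", "MY_BUCKET", region="YOUR_REGION", access="ABCD", secret="DCBA"
)

print(uri_auto)
print(uri_manual)
```

Either string can then be passed to `Arctic(...)` exactly like the literal strings above.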

Or create an instance on your local disk:

>>> ac = Arctic("lmdb:///<path>")

Create your first library and list the libraries in the instance:

>>> ac.create_library('travel_data')
>>> ac.list_libraries()

Create a test dataframe:

>>> import numpy as np
>>> import pandas as pd
>>> NUM_COLUMNS = 10
>>> NUM_ROWS = 100_000
>>> df = pd.DataFrame(
...     np.random.randint(0, 100, size=(NUM_ROWS, NUM_COLUMNS)),
...     columns=[f"COL_{i}" for i in range(NUM_COLUMNS)],
...     index=pd.date_range('2000', periods=NUM_ROWS, freq='h'),
... )

Get the library, write some data to it, and read it back:

>>> lib = ac['travel_data']
>>> lib.write("my_data", df)
>>> data = lib.read("my_data")
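
Note that `lib.read` returns a versioned item rather than a bare DataFrame; the DataFrame itself is available on the item's `data` attribute. The sketch below summarises a read, optionally restricted to a date range and a subset of columns. `summarise_read` is a hypothetical helper, `lib` is the library handle from the step above, and the `date_range` and `columns` arguments are assumed to behave as described in the ArcticDB docs.

```python
def summarise_read(lib, symbol, date_range=None, columns=None):
    """Read `symbol` and return basic facts about what came back."""
    # Only pass the optional filters that were actually requested.
    kwargs = {}
    if date_range is not None:
        kwargs["date_range"] = date_range  # (start, end) timestamps
    if columns is not None:
        kwargs["columns"] = columns  # list of column names to load

    item = lib.read(symbol, **kwargs)
    df = item.data  # the pandas DataFrame held by the versioned item
    return {
        "version": item.version,
        "rows": len(df),
        "columns": list(df.columns),
    }
```

For example, `summarise_read(lib, "my_data", columns=["COL_0", "COL_1"])` would load only those two columns of the symbol written above and report their row count and the version that was read.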

To find out more about working with data, visit our docs.


Documentation

The source code for the ArcticDB docs is located in the docs folder, and the docs are hosted at docs.arcticdb.io.

License

ArcticDB is released under a Business Source License 1.1 (BSL).

BSL features are free to use and the source code is available, but users may not use ArcticDB in production or for a Database Service without agreement with Man Group Operations Limited.

Use of ArcticDB in production or for a Database Service requires a paid-for license from Man Group Operations Limited and is licensed under the ArcticDB Software License Agreement. For more information please contact [email protected].

The BSL is not certified as an open-source license, but most of the Open Source Initiative (OSI) criteria are met. Please see the version conversion dates in the table below:

ArcticDB Version | License | Converts to Apache 2.0
1.0 | Business Source License 1.1 | Mar 16, 2025
1.2 | Business Source License 1.1 | May 22, 2025
1.3 | Business Source License 1.1 | Jun 9, 2025
1.4 | Business Source License 1.1 | Jun 23, 2025
1.5 | Business Source License 1.1 | Jul 11, 2025
1.6 | Business Source License 1.1 | Jul 25, 2025
2.0 | Business Source License 1.1 | Aug 29, 2025
3.0 | Business Source License 1.1 | Aug 31, 2025

Code of Conduct

This project has adopted a Code of Conduct. If you have any concerns about the Code, or behaviour that you have experienced in the project, please contact us at [email protected].

Contributing/Building From Source

We welcome your contributions to help us improve and extend this project!

Please refer to the Contributing page and feel free to open issues on GitHub.

We are also always looking for feedback from our dedicated community! If you have used ArcticDB, please let us know; we would love to hear about your experience!

Our release process is documented here.

Community

We would love to hear how your ArcticDB journey evolves. Email us at [email protected] or come chat to us on Twitter!

Interested in learning more about ArcticDB? Head over to our blog!

Do you have any questions or issues? Chat to us and other users through our dedicated Slack Workspace - sign up for Slack access on our website.
