• This repository has been archived on 12/Sep/2018
  • Language
    Rust
  • License
    Apache License 2.0
  • Created over 8 years ago
  • Updated over 6 years ago

UNMAINTAINED A persistent, relational store inspired by Datomic and DataScript.

UNMAINTAINED Project Mentat

Project Mentat is no longer being developed or actively maintained by Mozilla. This repository will be marked read-only in the near future. You are, of course, welcome to fork the repository and use the existing code.

Project Mentat is a persistent, embedded knowledge base. It draws heavily on DataScript and Datomic.

Mentat is implemented in Rust.

The first version of Project Mentat, named Datomish, was written in ClojureScript, targeting both Node (on top of promise_sqlite) and Firefox (on top of Sqlite.jsm). It also worked in pure Clojure on the JVM on top of jdbc-sqlite. The name was changed to avoid confusion with Datomic.

The Rust implementation gives us a smaller compiled output, better performance, more type safety, better tooling, and easier deployment into Firefox and mobile platforms.

Documentation


Motivation

Mentat is intended to be a flexible relational (not key-value, not document-oriented) store that makes it easy to describe, grow, and reuse your domain schema.

By abstracting away the storage schema, and by exposing change listeners outside the database (not via triggers), we hope to make domain schemas stable, and allow both the data store itself and embedding applications to use better architectures, meeting performance goals in a way that allows future evolution.

Data storage is hard

We've observed that data storage is a particular area of difficulty for software development teams:

  • It's hard to define storage schemas well. A developer must:

    • Model their domain entities and relationships.
    • Encode that model efficiently and correctly using the features available in the database.
    • Plan for future extensions and performance tuning.

    In a SQL database, the same schema definition defines everything from high-level domain relationships through to numeric field sizes in the same smear of keywords. It's difficult for someone unfamiliar with the domain to determine from such a schema what's a domain fact and what's an implementation concession — are all part numbers always 16 characters long, or are we trying to save space? — or, indeed, whether a missing constraint is deliberate or a bug.

    The developer must think about foreign key constraints, compound uniqueness, and nullability. They must consider indexing, synchronizing, and stable identifiers. Most developers simply don't do enough work in SQL to get all of these things right. Storage thus becomes the specialty of a few individuals.

    Which one of these is correct?

    {:db/id          :person/email
     :db/valueType   :db.type/string
     :db/cardinality :db.cardinality/many     ; People can have multiple email addresses.
     :db/unique      :db.unique/identity      ; For our purposes, each email identifies one person.
     :db/index       true}                    ; We want fast lookups by email.
    {:db/id          :person/friend
     :db/valueType   :db.type/ref
     :db/cardinality :db.cardinality/many}    ; People can have many friends.

    CREATE TABLE people (
      id INTEGER PRIMARY KEY,  -- Bug: because of the primary key, each person can have no more than 1 email.
      email VARCHAR(64),       -- Bug?: no NOT NULL, so a person can have no email.
                               -- Bug: nobody will ever have a long email address, right?
    );
    CREATE TABLE friendships (
      FOREIGN KEY person REFERENCES people(id),  -- Bug?: no indexing, so lookups by friend or person will be slow.
      FOREIGN KEY friend REFERENCES people(id),  -- Bug: no compound uniqueness constraint, so we can have dupe friendships.
    );

    They both have limitations — the Mentat schema allows only for an open world (it's possible to declare friendships with people whose email isn't known), and requires validation code to enforce email string correctness — but we think that even such a tiny SQL example is harder to understand and obscures important domain decisions.

  • Queries are intimately tied to structural storage choices. That not only hides the declarative domain-level meaning of the query — it's hard to tell what a query is trying to do when it's a 100-line mess of subqueries and LEFT OUTER JOINs — but it also means a simple structural schema change requires auditing every query for correctness.

  • Developers often capture less event-shaped data than they perhaps should, simply because their initial requirements don't warrant it. It's quite common to later want to know when a fact was recorded, or in which order two facts were recorded (particularly for migrations), or on which device an event took place… or even that a fact was ever recorded and then deleted.

  • Common queries are hard. Storing values only once, upserts, complicated joins, and group-wise maxima are all difficult for non-expert developers to get right.

  • It's hard to evolve storage schemas. Writing a robust SQL schema migration is hard, particularly if a bad migration has ever escaped into the wild! Teams learn to fear and avoid schema changes, and eventually they ship a table called metadata, with three TEXT columns, so they never have to write a migration again. That decision pushes storage complexity into application code. (Or they start storing unversioned JSON blobs in the database…)

  • It's hard to share storage with another component, let alone share data with another component. Conway's Law applies: your software system will often grow to have one database per team.

  • It's hard to build efficient storage and querying architectures. Materialized views require knowledge of triggers, or the implementation of bottleneck APIs. Ad hoc caches are often wrong, are almost never formally designed (do you want a write-back, write-through, or write-around cache? Do you know the difference?), and often aren't reusable. The average developer, faced with a SQL database, has little choice but to build a simple table that tries to meet every need.

Comparison to DataScript

DataScript asks the question: "What if creating a database were as cheap as creating a Hashmap?"

Mentat is not interested in that. Instead, it's strongly interested in persistence and performance, with very little interest in immutable databases/databases as values or throwaway use.

One might say that Mentat's question is: "What if an SQLite database could store arbitrary relations, for arbitrary consumers, without them having to coordinate an up-front storage-level schema?"

(Note that domain-level schemas are very valuable.)

Another possible question would be: "What if we could bake some of the concepts of CQRS and event sourcing into a persistent relational store, such that the transaction log itself were of value to queries?"

Some thought has been given to how databases as values — long-term references to a snapshot of the store at an instant in time — could work in this model. It's not impossible; it simply has different performance characteristics.

Just like DataScript, Mentat speaks Datalog for querying and takes additions and retractions as input to a transaction.
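
To make "additions and retractions as input to a transaction" concrete, here is a minimal in-memory sketch in Rust. It is not Mentat's transactor. The Datom, Assertion, LogEntry, and Store names, the string-valued attributes, and the BTreeSet storage are all assumptions made for illustration. The sketch also keeps a log of what each transaction changed, in the spirit of the event-sourcing question above.

```rust
use std::collections::BTreeSet;

/// A single fact: entity, attribute, value. (Hypothetical, simplified shape.)
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Datom {
    pub e: i64,
    pub a: String,
    pub v: String,
}

/// One element of a transaction's input.
#[derive(Clone, Debug)]
pub enum Assertion {
    Add(Datom),
    Retract(Datom),
}

/// A record of one change, tagged with the transaction that made it.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct LogEntry {
    pub tx: u64,
    pub added: bool,
    pub datom: Datom,
}

/// Illustrative store: the current set of datoms plus a change log.
#[derive(Default)]
pub struct Store {
    datoms: BTreeSet<Datom>,
    log: Vec<LogEntry>,
    next_tx: u64,
}

impl Store {
    /// Apply all assertions as one transaction and return its id.
    /// Only real changes are logged: adding a datom that is already
    /// present, or retracting one that is absent, is a no-op.
    pub fn transact(&mut self, assertions: Vec<Assertion>) -> u64 {
        let tx = self.next_tx;
        self.next_tx += 1;
        for assertion in assertions {
            let (added, datom) = match assertion {
                Assertion::Add(d) => (true, d),
                Assertion::Retract(d) => (false, d),
            };
            let changed = if added {
                self.datoms.insert(datom.clone())
            } else {
                self.datoms.remove(&datom)
            };
            if changed {
                self.log.push(LogEntry { tx, added, datom });
            }
        }
        tx
    }

    pub fn contains(&self, datom: &Datom) -> bool {
        self.datoms.contains(datom)
    }

    pub fn log(&self) -> &[LogEntry] {
        &self.log
    }
}
```

Because retractions are logged rather than forgotten, a consumer of this sketch can still ask whether a fact was ever recorded and later removed.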

Unlike DataScript, Mentat exposes free-text indexing, thanks to SQLite.

Comparison to Datomic

Datomic is a server-side, enterprise-grade data storage system. Datomic has a beautiful conceptual model. It's intended to be backed by a storage cluster, in which it keeps index chunks forever. Index chunks are replicated to peers, allowing it to run queries at the edges. Writes are serialized through a transactor.

Many of these design decisions are inapplicable to deployed desktop software; indeed, the use of multiple JVM processes makes Datomic prohibitively expensive to run in a small desktop app or on a mobile device.

Mentat was designed for embedding, initially in an experimental Electron app (Tofino). It is less concerned with exposing consistent database states outside transaction boundaries, because that's less important here, and dropping some of these requirements allows us to leverage SQLite itself.

Comparison to SQLite

SQLite is a traditional SQL database in most respects: schemas conflate semantic, structural, and datatype concerns, as described above; the main interface with the database is human-first textual queries; sparse and graph-structured data are 'unnatural', if not always inefficient; experimenting with and evolving data models are error-prone and complicated activities; and so on.

Mentat aims to offer many of the advantages of SQLite — single-file use, embeddability, and good performance — while building a more relaxed, reusable, and expressive data model on top.


Contributing

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

See CONTRIBUTING.md for further notes.

This project is very new, so we'll probably revise these guidelines. Please comment on an issue before putting significant effort in if you'd like to contribute.


Building

You first need to clone the project. We use Cargo to build and test it.

To build all of the crates in the project use:

cargo build

To run tests use:

# Run tests for everything.
cargo test --all

# Run tests for just the query-algebrizer folder (specify the crate, not the folder),
# printing debug output.
cargo test -p mentat_query_algebrizer -- --nocapture

For most cargo commands you can pass the -p argument to run the command just on that package. So, cargo build -p mentat_query_algebrizer will build just the "query-algebrizer" folder.

What are all of these crates?

We use multiple sub-crates for Mentat for four reasons:

  1. To improve incremental build times.
  2. To encourage encapsulation; writing extern crate feels worse than just use mod.
  3. To simplify the creation of targets that don't use certain features: e.g., a build with no syncing, or with no query system.
  4. To allow for reuse (e.g., the EDN parser is essentially a separate library).

So what are they?

Building blocks

edn

Our EDN parser. It uses rust-peg to parse EDN, which is Clojure/Datomic's richer alternative to JSON. edn's dependencies are all either for representing rich values (chrono, uuid, ordered-float) or for parsing (serde, peg).

In addition, this crate turns a stream of EDN values into a representation suitable to be transacted.
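
As a rough illustration of turning EDN values into something transactable, the sketch below converts a vector such as [:db/add 1 :person/email "a@example.com"] into a small struct. It is not the edn crate's API. The Value subset, the Op and Entity types, and the kw and to_entity helpers are hypothetical names, and the values are built by hand rather than parsed from text.

```rust
/// A deliberately small subset of EDN. (The real format has more types.)
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Integer(i64),
    Text(String),
    /// A namespaced keyword such as `:person/email`.
    Keyword { namespace: String, name: String },
    Vector(Vec<Value>),
}

#[derive(Clone, Debug, PartialEq)]
pub enum Op {
    Add,
    Retract,
}

/// A transactable form of `[:db/add e a v]` or `[:db/retract e a v]`.
#[derive(Clone, Debug, PartialEq)]
pub struct Entity {
    pub op: Op,
    pub e: i64,
    /// Attribute as (namespace, name).
    pub a: (String, String),
    pub v: Value,
}

/// Convenience constructor for keywords (hypothetical helper).
pub fn kw(namespace: &str, name: &str) -> Value {
    Value::Keyword { namespace: namespace.to_string(), name: name.to_string() }
}

/// Convert one EDN vector into an `Entity`, or explain why it is malformed.
pub fn to_entity(value: &Value) -> Result<Entity, String> {
    let items = match value {
        Value::Vector(items) if items.len() == 4 => items,
        _ => return Err("expected a vector of four elements".to_string()),
    };
    let op = match &items[0] {
        Value::Keyword { namespace, name } if namespace == "db" && name == "add" => Op::Add,
        Value::Keyword { namespace, name } if namespace == "db" && name == "retract" => Op::Retract,
        other => return Err(format!("unknown operation: {:?}", other)),
    };
    let e = match &items[1] {
        Value::Integer(e) => *e,
        other => return Err(format!("entity must be an integer: {:?}", other)),
    };
    let a = match &items[2] {
        Value::Keyword { namespace, name } => (namespace.clone(), name.clone()),
        other => return Err(format!("attribute must be a keyword: {:?}", other)),
    };
    Ok(Entity { op, e, a, v: items[3].clone() })
}
```

Returning a Result keeps malformed input from reaching the transactor, which is one reason to do this conversion at the parsing layer.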

mentat_core

This is the lowest-level Mentat crate. It collects together the following things:

  • Fundamental domain-specific data structures like ValueType and TypedValue.
  • Fundamental SQL-related linkages like SQLValueType. These encode the mapping between Mentat's types and values and their representation in our SQLite format.
  • Conversion to and from EDN types (e.g., edn::Keyword to TypedValue::Keyword).
  • Common utilities (some in the util module, and others that should be moved there or broken out) like Either, InternSet, and RcCounter.
  • Reusable lazy namespaced keywords (e.g., DB_TYPE_DOUBLE) that are used by mentat_db and EDN serialization of core structs.
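
The relationship between ValueType, TypedValue, and their SQL representation can be sketched as follows. The type names ValueType and TypedValue come from the description above, but the variants shown are a reduced set, and the SqlValue type, the to_sql function, and the numeric tags are assumptions for illustration, not Mentat's actual encoding.

```rust
/// The type of a value, independent of any particular value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ValueType {
    Ref,
    Boolean,
    Long,
    Double,
    String,
    Keyword,
}

/// A value together with its type.
#[derive(Clone, Debug, PartialEq)]
pub enum TypedValue {
    Ref(i64),
    Boolean(bool),
    Long(i64),
    Double(f64),
    String(String),
    Keyword(String),
}

impl TypedValue {
    pub fn value_type(&self) -> ValueType {
        match self {
            TypedValue::Ref(_) => ValueType::Ref,
            TypedValue::Boolean(_) => ValueType::Boolean,
            TypedValue::Long(_) => ValueType::Long,
            TypedValue::Double(_) => ValueType::Double,
            TypedValue::String(_) => ValueType::String,
            TypedValue::Keyword(_) => ValueType::Keyword,
        }
    }
}

/// What would be bound into a SQLite statement (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
pub enum SqlValue {
    Integer(i64),
    Real(f64),
    Text(String),
}

/// Map a typed value to a SQL value plus a small type tag.
/// The tag is needed because several Mentat-level types share one SQL
/// type: a Ref, a Boolean, and a Long are all stored as integers.
/// The tag numbers here are illustrative only.
pub fn to_sql(value: &TypedValue) -> (SqlValue, u8) {
    match value {
        TypedValue::Ref(e) => (SqlValue::Integer(*e), 0),
        TypedValue::Boolean(b) => (SqlValue::Integer(if *b { 1 } else { 0 }), 1),
        TypedValue::Long(n) => (SqlValue::Integer(*n), 2),
        TypedValue::Double(x) => (SqlValue::Real(*x), 3),
        TypedValue::String(s) => (SqlValue::Text(s.clone()), 4),
        TypedValue::Keyword(k) => (SqlValue::Text(k.clone()), 5),
    }
}
```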

Types

mentat_query

This crate defines the structs and enums that are the output of the query parser and used by the translator and algebrizer. SrcVar, NonIntegerConstant, FnArg… these all live here.

mentat_query_sql

Similarly, this crate defines an abstract representation of a SQL query as understood by Mentat. This bridges between Mentat's types (e.g., TypedValue) and SQL concepts (ColumnOrExpression, GroupBy). It's produced by the algebrizer and consumed by the translator.

Query processing

mentat_query_algebrizer

This is the biggest piece of the query engine. It takes a parsed query, which at this point is independent of a database, and combines it with the current state of the schema and data. This involves translating keywords into attributes, abstract values into concrete values with a known type, and producing an AlgebraicQuery, which is a representation of how a query's Datalog semantics can be satisfied as SQL table joins and constraints over Mentat's SQL schema. An algebrized query is tightly coupled with both the disk schema and the vocabulary present in the store when the work is done.
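
A heavily reduced sketch of that idea: given a schema and a single pattern of the form [?e attribute constant], resolve the attribute keyword to a numeric identifier, check the constant's type against the schema, and emit constraints. Every name below (Attribute, Pattern, Constraint, Algebrized, algebrize) is hypothetical, and the real algebrizer handles far more than one pattern shape.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ValueType {
    Long,
    String,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Constant {
    Long(i64),
    String(String),
}

impl Constant {
    fn value_type(&self) -> ValueType {
        match self {
            Constant::Long(_) => ValueType::Long,
            Constant::String(_) => ValueType::String,
        }
    }
}

/// What the schema knows about one attribute.
pub struct Attribute {
    pub entid: i64,
    pub value_type: ValueType,
}

/// A parsed pattern `[?e <attribute> <constant>]`, independent of any store.
pub struct Pattern {
    pub attribute: String,
    pub value: Constant,
}

/// Constraints over a table of (e, a, v) rows.
#[derive(Clone, Debug, PartialEq)]
pub enum Constraint {
    AttributeIs(i64),
    ValueIs(Constant),
}

#[derive(Debug, PartialEq)]
pub enum Algebrized {
    Constraints(Vec<Constraint>),
    /// The pattern cannot match anything in this store; no SQL is needed.
    KnownEmpty(String),
}

/// Combine a store-independent pattern with the current schema.
pub fn algebrize(schema: &HashMap<String, Attribute>, pattern: &Pattern) -> Algebrized {
    let attribute = match schema.get(&pattern.attribute) {
        Some(attribute) => attribute,
        None => return Algebrized::KnownEmpty(format!("unknown attribute {}", pattern.attribute)),
    };
    if attribute.value_type != pattern.value.value_type() {
        return Algebrized::KnownEmpty(format!("type mismatch for {}", pattern.attribute));
    }
    Algebrized::Constraints(vec![
        Constraint::AttributeIs(attribute.entid),
        Constraint::ValueIs(pattern.value.clone()),
    ])
}
```

The result depends on the schema passed in, which is why an algebrized query is only valid for the store it was prepared against.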

mentat_query_projector

A Datalog query projects some of the variables in the query into data structures in the output. This crate takes an algebrized query and a projection list and figures out how to get values out of the running SQL query and into the right format for the consumer.
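
One way to picture projection is as a function from a requested output shape and a set of SQL rows to a result in that shape. The sketch below assumes integer-only rows and three shapes. FindSpec, Output, and project are illustrative names and do not describe the crate's real interface.

```rust
/// The shape of result the consumer asked for (hypothetical, reduced set).
pub enum FindSpec {
    /// A single value: first column of the first row.
    Scalar,
    /// A list of values: first column of every row.
    Coll,
    /// All columns of every row.
    Rel,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Output {
    Scalar(Option<i64>),
    Coll(Vec<i64>),
    Rel(Vec<Vec<i64>>),
}

/// Shape raw rows into the requested output.
pub fn project(spec: &FindSpec, rows: &[Vec<i64>]) -> Output {
    match spec {
        FindSpec::Scalar => Output::Scalar(rows.first().and_then(|row| row.first().copied())),
        FindSpec::Coll => Output::Coll(rows.iter().filter_map(|row| row.first().copied()).collect()),
        FindSpec::Rel => Output::Rel(rows.to_vec()),
    }
}
```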

mentat_query_translator

This crate works with all of the above to turn the output of the algebrizer and projector into the data structures defined in mentat_query_sql.

mentat_sql

This simple crate turns those data structures into SQL text and bindings that can later be executed by rusqlite.
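
The step from an abstract query to SQL text plus bindings might look something like the sketch below. The SelectQuery, Constraint, and Binding types, the to_sql function, and the datoms table with e, a, and v columns used in the usage note are assumptions for illustration. Values are never spliced into the SQL string; they are returned separately so that a driver such as rusqlite could bind them to the numbered placeholders.

```rust
/// A value to bind to a placeholder.
#[derive(Clone, Debug, PartialEq)]
pub enum Binding {
    Integer(i64),
    Text(String),
}

/// An equality constraint on one column.
pub struct Constraint {
    pub column: &'static str,
    pub value: Binding,
}

/// A very small abstract SELECT: columns, one table, ANDed constraints.
pub struct SelectQuery {
    pub table: &'static str,
    pub columns: Vec<&'static str>,
    pub constraints: Vec<Constraint>,
}

/// Render the query as SQL text with numbered placeholders (?1, ?2, ...)
/// and return the values to bind, in placeholder order.
pub fn to_sql(query: &SelectQuery) -> (String, Vec<Binding>) {
    let mut sql = format!("SELECT {} FROM {}", query.columns.join(", "), query.table);
    let mut bindings = Vec::new();
    for (i, constraint) in query.constraints.iter().enumerate() {
        sql.push_str(if i == 0 { " WHERE " } else { " AND " });
        sql.push_str(&format!("{} = ?{}", constraint.column, i + 1));
        bindings.push(constraint.value.clone());
    }
    (sql, bindings)
}
```

For a query over a table named datoms selecting e and v, with constraints on a and v, this sketch produces "SELECT e, v FROM datoms WHERE a = ?1 AND v = ?2" and a two-element binding list.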

The data layer: mentat_db

This is a big one: it implements the core storage logic on top of SQLite. This crate is responsible for bootstrapping new databases, transacting new data, maintaining the attribute cache, and building and updating in-memory representations of the storage schema.

The main crate

The top-level main crate of Mentat assembles these component crates into something useful. It wraps up a connection to a database file and the associated metadata into a Store, and encapsulates an in-progress transaction (InProgress). It provides modules for programmatically writing data (entity_builder.rs) and managing vocabulary (vocabulary.rs).

Syncing

Sync code lives, for referential reasons, in a crate named tolstoy. This code is a work in progress; the current state is a proof-of-concept implementation that largely relies on the internal transactor to make progress in most cases and comes with basic support for timelines. See Tolstoy's documentation for details.

The command-line interface

This is under tools/cli. It's essentially an external consumer of the main mentat crate. This code is ugly, but it mostly works.


SQLite dependencies

Mentat uses partial indices, which are available in SQLite 3.8.0 and higher. It relies on correlation between aggregate and non-aggregate columns in the output, which was added in SQLite 3.7.11.

It also uses FTS4, which is a compile-time option.
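
If you link against a system SQLite, you may want to check its version against the two minimums above. The following is a small sketch of such a check using only the standard library. The function and constant names are hypothetical, and the version string is assumed to come from whatever SQLite binding you use. FTS4 is not covered, since it is a build option rather than a version property.

```rust
/// Partial indices: SQLite 3.8.0 and higher.
pub const PARTIAL_INDICES: (u32, u32, u32) = (3, 8, 0);
/// Aggregate/non-aggregate column correlation: SQLite 3.7.11 and higher.
pub const AGGREGATE_CORRELATION: (u32, u32, u32) = (3, 7, 11);

/// Parse a dotted version such as "3.8.0" into (major, minor, patch).
/// A missing patch component is treated as zero; components after the
/// third are ignored.
pub fn parse_version(text: &str) -> Option<(u32, u32, u32)> {
    let mut parts = text.trim().split('.').map(|part| part.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = match parts.next() {
        Some(patch) => patch?,
        None => 0,
    };
    Some((major, minor, patch))
}

/// True if the given SQLite version satisfies both version requirements.
/// Tuples compare lexicographically, which matches version ordering.
pub fn meets_requirements(version: &str) -> bool {
    match parse_version(version) {
        Some(v) => v >= PARTIAL_INDICES && v >= AGGREGATE_CORRELATION,
        None => false,
    }
}
```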

By default, Mentat specifies the "bundled" feature for rusqlite, which uses a relatively recent version of SQLite. If you want to link against the system version of SQLite, omit "bundled_sqlite3" from Mentat's features.

[dependencies.mentat]
version = "0.6"
# System sqlite is known to be new.
default-features = false

License

Project Mentat is currently licensed under the Apache License v2.0. See the LICENSE file for details.
