• This repository has been archived on 22/Dec/2020
  • Stars
    star
    1,629
  • Rank 28,719 (Top 0.6 %)
  • Language
    Ruby
  • License
    MIT License
  • Created almost 12 years ago
  • Updated over 4 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

MongoDB → PostgreSQL streaming replication

MoSQL: a MongoDB → SQL streaming translator

MoSQL is no longer being actively maintained. If you are interested in helping maintain this repository, please let us know. We would love for it to find a forever home with someone who can give it the love it needs!

At Stripe, we love MongoDB. We love the flexibility it gives us in changing data schemas as we grow and learn, and we love its operational properties. We love replsets. We love the uniform query language that doesn't require generating and parsing strings, tracking placeholder parameters, or any of that nonsense.

The thing is, we also love SQL. We love the ease of doing ad-hoc data analysis over small-to-mid-size datasets in SQL. We love doing JOINs to pull together reports summarizing properties across multiple datasets. We love the fact that virtually every employee we hire already knows SQL and is comfortable using it to ask and answer questions about data.

So, we thought, why can't we have the best of both worlds? Thus: MoSQL.

MoSQL: Put Mo' SQL in your NoSQL

MoSQL

MoSQL imports the contents of your MongoDB database cluster into a PostgreSQL instance, using an oplog tailer to keep the SQL mirror live up-to-date. This lets you run production services against a MongoDB database, and then run offline analytics or reporting using the full power of SQL.

Installation

Install from Rubygems as:

$ gem install mosql

Or build from source by:

$ gem build mosql.gemspec

And then install the built gem.

The Collection Map file

In order to define a SQL schema and import your data, MoSQL needs a collection map file describing the schema of your MongoDB data. (Don't worry -- MoSQL can handle it if your mongo data doesn't always exactly fit the stated schema. More on that later).

The collection map is a YAML file describing the databases and collections in Mongo that you want to import, in terms of their SQL types. An example collection map might be:

mongodb:
  blog_posts:
    :columns:
    - id:
      :source: _id
      :type: TEXT
    - author_name:
      :source: author.name
      :type: TEXT
    - author_bio:
      :source: author.bio
      :type: TEXT
    - title: TEXT
    - created: DOUBLE PRECISION
    :meta:
      :table: blog_posts
      :extra_props: true

Said another way, the collection map is a YAML file containing a hash mapping

<Mongo DB name> -> { <Mongo Collection Name> -> <Collection Definition> }

Where a <Collection Definition> is a hash with :columns and :meta fields.

:columns is a list of hashes mapping SQL column names to an hash describing that column. This hash may contain the following fields:

  • :source: The name of the attribute inside of MongoDB.
  • :type: (Mandatory) The SQL type.

Use of the :source attribute allows for renaming attributes, and extracting elements of a nested hash using MongoDB's dot notation. In the above example, the name and bio fields of the author sub-document will be expanded, and the MongoDB _id field will be mapped to an SQL id column.

At present, MoSQL does not support using the dot notation to access elements inside arrays.

As a shorthand, you can specify a one-element hash of the form name: TYPE, in which case name will be used for both the source attribute and the name of the destination column. You can see this shorthand for the title and created attributes, above.

Every defined collection must include a mapping for the _id attribute.

:meta contains metadata about this collection/table. It is required to include at least :table, naming the SQL table this collection will be mapped to. extra_props determines the handling of unknown fields in MongoDB objects -- more about that later.

By default, mosql looks for a collection map in a file named collections.yml in your current working directory, but you can specify a different one with -c or --collections.

Usage

Once you have a collection map. MoSQL usage is easy. The basic form is:

mosql [-c collections.yml] [--sql postgres://sql-server/sql-db] [--mongo mongodb://mongo-uri]

By default, mosql connects to both PostgreSQL and MongoDB instances running on default ports on localhost without authentication. You can point it at different targets using the --sql and --mongo command-line parameters.

mosql will:

  1. Create the appropriate SQL tables
  2. Import data from the Mongo database
  3. Start tailing the mongo oplog, propagating changes from MongoDB to SQL.

After the first run, mosql will store the status of the optailer in the mongo_sql table in your SQL database, and automatically resume where it left off. mosql uses the replset name to keep track of which mongo database it's tailing, so that you can tail multiple databases into the same SQL database. If you want to tail the same replSet, or multiple replSets with the same name, for some reason, you can use the --service flag to change the name mosql uses to track state.

You likely want to run mosql against a secondary node, at least for the initial import, which will cause large amounts of disk activity on the target node. One option is to specify this in your connect URI:

mosql --mongo mongodb://node1,node2,node3?readPreference=secondary

(Note that this requires you be using at least version 1.8.3 of mongo-ruby-driver)

Advanced usage

For advanced scenarios, you can pass options to control mosql's behavior. If you pass --skip-tail, mosql will do the initial import, but not tail the oplog. This could be used, for example, to do an import off of a backup snapshot, and then start the tailer on the live cluster. This can also be useful for hosted services where you do not have access to the oplog.

If you need to force a fresh reimport, run --reimport, which will cause mosql to drop tables, create them anew, and do another import.

Normaly, MoSQL will scan through a list of the databases on the mongo server you connect to. You avoid this behavior by specifiying a specific mongo db to connect to with the --only-db [dbname] option. This is useful for hosted services which do not let you list all databases (via the listDatabases command).

Schema mismatches and _extra_props

If MoSQL encounters values in the MongoDB database that don't fit within the stated schema (e.g. a floating-point value in a INTEGER field), it will log a warning, ignore the entire object, and continue.

If it encounters a MongoDB object with fields not listed in the collection map, it will discard the extra fields, unless :extra_props is set in the :meta hash. If it is, it will collect any missing fields, JSON-encode them in a hash, and store the resulting text in _extra_props in SQL. You can set :extra_props to use JSON, JSONB, or TEXT.

As of PostgreSQL 9.3, you can declare columns as type "JSON" and use the native JSON support to inspect inside of JSON-encoded types. In earlier versions, you can write code in an extension language, such as plv8.

Non-scalar types

MoSQL supports array types, using the INTEGER ARRAY array type syntax. This will cause MoSQL to create the column as an array type in PostgreSQL, and insert rows appropriately-formatted.

Fields with hash values, or array values that are not in an ARRAY-typed column, will be transformed into JSON TEXT strings before being inserted into PostgreSQL.

Authentication

At present, in order to use MoSQL with a MongoDB instance requiring authentication, you must:

  • Have a user with access to the admin database.
  • Specify the admin database in the --mongo argument
  • Specify the username and password in the --mongo argument

e.g.

mosql --mongo mongodb://$USER:$PASSWORD@$HOST/admin

In order to use MongoDB 2.4's "roles" support (which is different from that in 2.6), you need to create the user in the admin database, give it explicit read access to the databases you want to copy and to the local database, and specify authSource in the URL. eg, connect to mydb/admin with the mongo shell and run:

> db.addUser({user: "replicator", pwd: "PASSWORD", roles: [], otherDBRoles: {local: ["read"], sourceDb: ["read"]}})

(Note that roles: [] ensures that this user has no special access to the admin database.) Now specify:

mosql --mongo mongodb://$USER:$PASSWORD@$HOST/sourceDb?authSource=admin

I have not yet tested using MoSQL with 2.6's rewritten "roles" support. Drop me a note if you figure out anything I should know.

Sharded clusters

MoSQL does not have special support for sharded Mongo clusters at this time. It should be possible to run a separate MoSQL instance against each of the individual backend shard replica sets, streaming into separate PostgreSQL instances, but we have not actually tested this yet.

Development

Patches and contributions are welcome! Please fork the project and open a pull request on github, or just report issues.

MoSQL includes a small but hopefully-growing test suite. It assumes a running PostgreSQL and MongoDB instance on the local host. To run the test suite, first install all of MoSQL's dependencies:

bundle install

Then, run the tests:

rake test

You can also point the suite at a different target via environment variables; See test/functional/_lib.rb for more information.

More Repositories

1

jquery.payment

[DEPRECATED] A general purpose library for building credit card forms, validating inputs and formatting numbers.
CoffeeScript
3,538
star
2

react-stripe-elements

Moved to stripe/react-stripe-js.
JavaScript
3,026
star
3

stripe-payments-demo

Sample store accepting universal payments on the web with Stripe Elements, Payment Request, Apple Pay, Google Pay, Microsoft Pay, and the PaymentIntents API. 💳🌍✨
JavaScript
1,471
star
4

shop

Single-page shop
CSS
1,126
star
5

flow-to-typescript-codemod

Codemod Stripe used to migrate 6.5m+ lines of code from Flow to TypeScript
TypeScript
675
star
6

safesql

Static analysis tool for Golang that protects against SQL injections
Go
563
star
7

PaymentKit

Easily accept payments on iOS
Objective-C
470
star
8

brushfire

Distributed decision tree ensemble learning in Scala
Scala
391
star
9

stripe-webhook-monitor

Stripe Webhook Monitor provides a real-time feed and graph of Stripe events received via webhooks. 📈✨
JavaScript
366
star
10

accept-a-card-payment

Learn how to accept a basic card payment on web, iOS, Android
Java
351
star
11

jquery.mobilePhoneNumber

[DEPRECATED] A general purpose library for validating and formatting mobile phone numbers.
CoffeeScript
331
star
12

nextjs-typescript-react-stripe-js

Full-stack TypeScript example using Next.js, react-stripe-js, and stripe-node.
TypeScript
329
star
13

topmodel

Standard evaluations for binary classifiers so you don't have to
Python
316
star
14

gaps

Easy management of your Google Groups subscriptions.
Ruby
284
star
15

developer-office-hours

A collection of Stripe Developer Office Hours demos 🎬
Ruby
245
star
16

ApplePayStubs

Test your Apple Pay integration without Apple Pay
Objective-C
193
star
17

timberlake

Timberlake is a Job Tracker for Hadoop.
Go
177
star
18

wilde-things

A tutorial integrating Stripe in PHP
PHP
175
star
19

sequins

A key/value store for serving static batch data
Go
174
star
20

checkout-subscription-and-add-on

Uses Stripe Checkout to create a payment page that starts a subscription for a new customer.
CSS
162
star
21

mongoriver

A library for writing MongoDB oplog tailers.
Ruby
153
star
22

stripe-demo-connect-kavholm-marketplace

Demo app for Global Marketplace using Stripe Connect
JavaScript
139
star
23

herringbone

Tools for working with parquet, impala, and hive
Thrift
135
star
24

pd2pg

Import PagerDuty data into Postgres for analysis
Ruby
110
star
25

payment-form-modal

How to implement Stripe Elements within a modal dialog.
JavaScript
106
star
26

datadog-checks

Checks for the Datadog Agent that Stripe finds useful.
Python
99
star
27

set-up-subscriptions

Getting started with Stripe Elements and Stripe Billing to charge a customer for a monthly subscription.
CSS
96
star
28

macgyver

A Chrome extension which duct tapes an SSH agent to the platformKey API
Go
90
star
29

react-elements-card-payment

Learn how to build a checkout form with React
CSS
87
star
30

chalk-log

Chalk::Log adds a logger object to any class, which can be used for unstructured or semi-structured logging.
Ruby
72
star
31

agate

Scoring ONNX models on the JVM in scala
Scala
68
star
32

sbt-bazel

Easily convert SBT projects to Bazel workspaces
Scala
54
star
33

charging-for-multiple-plan-subscriptions

Getting started with Stripe Elements and Stripe Billing to charge a customer for a monthly subscription with multiple items.
JavaScript
54
star
34

checkout-remember-me-with-twilio-verify

Use Stripe Checkout to collect payment details for future payments and Twilio Verify to authenticate the customer via SMS code and charge their stored card.
JavaScript
50
star
35

firebase-mobile-payments

Firebase Cloud Functions to create payments in native Android and iOS applications.
Kotlin
49
star
36

identity-verification

Securely collect and verify identity documents
JavaScript
44
star
37

falconer

High throughout, unsampled tracing span buffer with streaming search
Go
40
star
38

web-elements-sepa-debit-payment

Collect SEPA Debit mandates and payments.
Objective-C
37
star
39

payment-tag

CoffeeScript
34
star
40

stripe-stdlib-demo

Sample store accepting universal payments built with @Stripe and @StdLib.
JavaScript
33
star
41

chalk-config

Maps on-disk config files into a loaded global configatron instance, taking into account your current environment.
Ruby
28
star
42

go-einhorn

Talk to einhorn from your Go worker
Go
25
star
43

sample-terminal-ios-app

Learn how to take in-person payments with a physical reader and Terminal in your iOS app
Swift
19
star
44

adding-sales-tax

Learn how to use PaymentIntents to build a simple checkout flow
CSS
18
star
45

javascript-style

Javascript linter with rules for Stripe projects
JavaScript
16
star
46

scrooge-shapes

Shapeless generic instances for Scrooge types
Scala
14
star
47

datadog-cli-tools

CLI tools we find useful for Datadog
Ruby
13
star
48

submigrate

Combine multiple subscriptions into a single subscription with multiple items
Go
12
star
49

web-elements-fpx-payment

Accept Malaysian online bank transfers with the Stripe FPX Element.
JavaScript
12
star
50

siv-go

A pure Go implementation of the SIV AEAD.
Go
11
star
51

au-becs-debit-payment

Collecting AU BECS Direct Debit mandates and payments.
Java
10
star
52

oxxo-payment

Learn how to accept OXXO and card payments
JavaScript
10
star
53

round-up-and-donate

Build a round up and donate feature with Connect
CSS
10
star
54

random

A collection of random utilities
Shell
9
star
55

web-elements-card-payment

Learn how to accept a basic card payment on the web
JavaScript
7
star
56

grabpay-payment

Accept GrabPay Payments with Stripe, a popular digital wallet in Southeast Asia.
CSS
5
star
57

yard-sorbet

Types are documentation
Ruby
5
star
58

terraform-provider-confidant

A terraform provider for confidant. See https://github.com/terraform-providers
Go
5
star
59

simple-powershell-dsc

Simple Powershell DSC pull server in Go
Go
4
star
60

stripe-magento1-releases

4
star
61

pb

Lint protocol buffers
Go
2
star
62

mobile-elements-card-payment

Learn how to accept a basic card payment on iOS & Android
Java
2
star
63

bazel-bloop-exporter

This proof of concept exports a bazel project to bloop. The motivation is to allow the use of any tooling that already has a bloop integration, such as the metals language server.
Starlark
2
star
64

sentry-restricted-github

Python
2
star
65

time-utils

Ruby
1
star