🚥 A dataset migration tool from MongoDB to Elasticsearch and vice versa.

Mongolastic

Built against mongo.java.driver 3.4.2 and elastic.java.driver 6.2.4. License: MIT.

Mongolastic migrates datasets from a mongod node to an Elasticsearch node and vice versa. Since MongoDB and Elasticsearch servers can run with different characteristics, the tool provides several required and optional settings for connecting them. Mongolastic reads a YAML or JSON configuration file and starts syncing data in the direction that file specifies.

How it works

First, either pull the corresponding image of the app from Docker Hub or download the latest mongolastic.jar file.

Second, create a yaml or json file which must contain the following structure:

misc:
    dindex:
        name: <string>      (1)
        as: <string>        (2)
    ctype:
        name: <string>      (3)
        as: <string>        (4)
    direction: (em | me)    (5)
    batch: <number>         (6)
    dropDataset: <bool>     (7)
mongo:
    host: <ip-address>      (8)
    port: <number>          (9)
    query: "mongo-query"    (10)
    project: "projection"   (11)
    auth:                   (12)
        user: <string>
        pwd: "password"
        source: <db-name>
        mechanism: ( plain | scram-sha-1 | x509 | gssapi | cr )
elastic:
    host: <ip-address>     (13)
    port: <number>         (14)
    dateFormat: "<format>" (15)
    longToString: <bool>   (16)
    clusterName: <string>  (17)
    auth:                  (18)
        user: <string>
        pwd: "password"
  1. the database/index name to connect to.

  2. another database/index name in which documents will be located in the target service (Optional)

  3. the collection/type name to export.

  4. another collection/type name in which indexed/collected documents will reside in the target service (Optional)

  5. direction of the data transfer: em (elasticsearch to mongo) or me (mongo to elasticsearch). The default is me, so you can omit this option when your data moves from mongo to es.

  6. overrides the default batch size of 200. (Optional)

  7. configures whether or not the target database/index should be dropped prior to loading data. Default value is true (Optional)

  8. the name of the host machine where the mongod is running.

  9. the port where the mongod instance is listening.

  10. data will be transferred based on a json mongodb query (Optional)

  11. as of v1.4.1, you can reshape the documents migrated from mongo to es with the $project operator (Optional)

  12. as of v1.3.5, you can connect to a mongod instance that requires authentication by supplying the auth configuration. (Optional)

  13. the name of the host machine where the elastic node is running.

  14. the transport port where the transport module will communicate with the running elastic node. E.g. 9300 for node-to-node communication.

  15. a custom formatter for Date fields rather than the default DateCodec (Optional)

  16. serialize long value as a string for backwards compatibility with other tools (Optional)

  17. connect to a specific elastic cluster (Optional)

  18. as of v1.3.9, you can connect to an elasticsearch node that requires authentication by supplying the auth configuration. (Optional)


Alternatively, the mongolastic configuration can be written as a JSON file with the same structure as the YAML above:

{
	"misc": {
		"dindex": {
			"name": "twitter",
			"as": "media"
		},
		"ctype": {
			"name": "tweets",
			"as": "posts"
		},
		"direction": "me",
		"batch": 400,
		"dropDataset": true
	},
	"mongo": {
		"host": "127.0.0.1",
		"port": 27017,
		"query": "{ lang: 'en' }",
		"project": "{ user:1, name:'$user.name', location: { $substr: [ '$user.location', 10, 15 ] }}",
		"auth": {
			"user": "joe",
			"pwd": "1234",
			"source": "twitter",
			"mechanism": "scram-sha-1"
		}
	},
	"elastic": {
		"host": "127.0.0.1",
		"port": 9300,
		"dateFormat": "yyyy-MM-dd",
		"longToString": true,
		"auth": {
			"user": "joe",
			"pwd": "4321"
		}
	}
}
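
As an illustration only, the following Python sketch assembles the same structure as a dictionary, checks that the keys not marked (Optional) in the list above are present, and serializes the result as JSON. The names `REQUIRED_KEYS`, `missing_keys` and `build_config` are hypothetical and not part of mongolastic; the tool itself may validate its configuration differently.

```python
import json

# Keys that the option list above does not mark as (Optional).
# Assumption: this list is drawn from that description, not from mongolastic's source.
REQUIRED_KEYS = [
    ("misc", "dindex", "name"),
    ("misc", "ctype", "name"),
    ("mongo", "host"),
    ("mongo", "port"),
    ("elastic", "host"),
    ("elastic", "port"),
]


def missing_keys(config):
    """Return the dotted paths of required keys that are absent from config."""
    missing = []
    for path in REQUIRED_KEYS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


def build_config(database, collection, mongo_host="127.0.0.1", mongo_port=27017,
                 elastic_host="127.0.0.1", elastic_port=9300):
    """Build a minimal mongo-to-elasticsearch configuration as a dict."""
    return {
        "misc": {"dindex": {"name": database}, "ctype": {"name": collection}},
        "mongo": {"host": mongo_host, "port": mongo_port},
        "elastic": {"host": elastic_host, "port": elastic_port},
    }


config = build_config("twitter", "tweets")
problems = missing_keys(config)              # empty list when nothing is missing
config_json = json.dumps(config, indent=4)   # text to save as the configuration file
```

Writing `config_json` to a file gives a configuration that can be passed to the tool in place of a hand-written one.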

Example #1

The following files have the same configuration details:

yaml file
misc:
    dindex:
        name: twitter
        as: kodcu
    ctype:
        name: tweets
        as: posts
mongo:
    host: localhost
    port: 27017
    query: "{ 'user.name' : 'kodcu.com'}"
elastic:
    host: localhost
    port: 9300
json file
{
	"misc": {
		"dindex": {
			"name": "twitter",
			"as": "kodcu"
		},
		"ctype": {
			"name": "tweets",
			"as": "posts"
		}
	},
	"mongo": {
		"host": "localhost",
		"port": 27017,
		"query": "{ 'user.name' : 'kodcu.com'}"
	},
	"elastic": {
		"host": "localhost",
		"port": 9300
	}
}

This configuration omits direction, so the transfer runs from MongoDB to Elasticsearch (the default). Mongolastic first queries the tweets collection of the twitter database, on a mongod server running on the default host and port, for documents whose user name is kodcu.com. If it finds matching documents, it copies them into the Elasticsearch node running on the default host and transport port. When the run finishes, you should see a type called "posts" in an index called "kodcu" on that Elasticsearch node. The index and type names differ from the source names because the "dindex.as" and "ctype.as" options are set; they place the transferred data in the posts type of the kodcu index.
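
The naming rule in Example #1 can be sketched in a few lines of Python. This is not mongolastic's code: `resolve_target` is a hypothetical helper, and it assumes that when an "as" option is omitted the source name is reused on the target side.

```python
def resolve_target(misc):
    """Return (database_or_index, collection_or_type) used on the target side.

    Assumption: a missing "as" entry means the source name is kept.
    """
    dindex = misc["dindex"]
    ctype = misc["ctype"]
    return dindex.get("as", dindex["name"]), ctype.get("as", ctype["name"])


# The misc section of Example #1.
example_1 = {
    "dindex": {"name": "twitter", "as": "kodcu"},
    "ctype": {"name": "tweets", "as": "posts"},
}
target = resolve_target(example_1)
```

For Example #1 the helper yields the kodcu index and the posts type, matching the description above.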

After downloading the jar or pulling the image and providing a conf file, you can either run the tool as:

$ java -jar mongolastic.jar -f config.file

or

$ docker run --rm -v "$(pwd)"/config.file:/config.file --net host ozlerhakan/mongolastic:<tag> config.file

Example #2

Using the project field, you are able to manipulate documents when migrating them from mongodb to elasticsearch. For more examples about the $project operator of the aggregation pipeline, take a look at its documentation.

misc:
    dindex:
        name: twitter
    ctype:
        name: tweets
mongo:
    host: 192.168.10.151
    port: 27017
    project: "{ user: 1, name: '$user.name', location: { $substr: [ '$user.location', 10, 15 ] }}" (1)
elastic:
    host: 192.168.10.152
    port: 9300
  1. the migrated documents will include the user field and contain new fields name and location.
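
To make the effect of that projection concrete, here is a rough Python emulation applied to a single made-up document. It is a sketch of what the $project expression does, not how mongolastic or MongoDB implements it. The helpers `get_path` and `project_tweet` and the sample tweet are hypothetical, and the emulation simplifies some edge cases (for instance, it slices by characters, while $substr counts bytes, so the two agree only for ASCII text).

```python
def get_path(document, field_path):
    """Resolve a '$a.b' style field path against a nested dict; None if absent."""
    node = document
    for key in field_path.lstrip("$").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def project_tweet(document):
    """Mimic the projection of Example #2 on one document."""
    location = get_path(document, "$user.location") or ""
    return {
        "_id": document.get("_id"),                # $project keeps _id unless it is excluded
        "user": document.get("user"),              # user: 1 keeps the field as-is
        "name": get_path(document, "$user.name"),  # new field copied from user.name
        "location": location[10:10 + 15],          # $substr: start at index 10, up to 15 characters
    }


# A made-up source document, for illustration only.
tweet = {
    "_id": 1,
    "lang": "en",
    "text": "hello",
    "user": {"name": "kodcu.com", "location": "Location: Istanbul, TR"},
}
projected = project_tweet(tweet)
```

Fields not named in the projection, such as lang and text here, are absent from the projected document.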

Note
Every run of the tool drops the configured db/index in the target environment before loading data unless the dropDataset parameter is set to false.

License

Mongolastic is released under MIT.
