  • Stars: 647
  • Rank 67,174 (top 2%)
  • Language: Shell
  • Created about 9 years ago
  • Updated almost 5 years ago


Repository Details

📦 A curated list of JSON / BSON datasets from the web in order to practice / use in MongoDB

MongoDB JSON Data

A dedicated repository of ready-made collections to practice with or use in MongoDB.

List of small datasets

Name              Size      Data type           How to import
Tweets            610 KB    zip → dump folder   mongorestore
Zips              3.1 MB    JSON                mongoimport
Palbum            731 KB    zip → JSON files    mongoimport
Grades            92 KB     JSON                mongoimport
Students          35 KB     JSON                mongoimport
Profiles          454 KB    JSON                mongoimport
Products          2.8 KB    JSON                mongoimport
Countries small   329 KB    JSON                mongoimport
Countries big     2.3 MB    JSON                mongoimport
Restaurants       666 KB    JSON                mongoimport
Covers            470 KB    JSON                mongoimport
Books             525 KB    JSON                mongoimport
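For the rows marked mongoimport, each JSON file becomes one mongoimport call. This is a minimal sketch, not the repository's import.sh: it assumes the collection is named after the file (a hypothetical `grades.json` would go into a `grades` collection) and that you choose the database name. It prints the command by default and only runs it when `DRY_RUN=0`. The `--db`, `--collection`, `--file`, and `--jsonArray` options are standard mongoimport flags; deciding on `--jsonArray` by looking for a leading `[` is a heuristic assumption, so check each file's layout.

```shell
#!/usr/bin/env bash
# Sketch: build (and optionally run) a mongoimport command for one JSON dataset.
# Assumption: the collection name is the file name without its .json extension.

# Fill the global array IMPORT_ARGS with the mongoimport command line.
build_json_import() {
  local file="$1" db="$2"
  IMPORT_ARGS=(mongoimport --db "$db" --collection "$(basename "$file" .json)" --file "$file")
  # A file holding one top-level JSON array needs --jsonArray;
  # a file with one document per line (mongoexport style) does not.
  if [ "$(tr -d '[:space:]' < "$file" | head -c 1)" = "[" ]; then
    IMPORT_ARGS+=(--jsonArray)
  fi
}

# Print the command by default; set DRY_RUN=0 to actually run it.
import_json() {
  build_json_import "$1" "$2"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "${IMPORT_ARGS[@]}"
  else
    echo "${IMPORT_ARGS[*]}"
  fi
}
```

For a hypothetical line-delimited `grades.json` in the current directory, `import_json grades.json practice` prints `mongoimport --db practice --collection grades --file grades.json`; rerun it with `DRY_RUN=0` once the command looks right.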

List of bigger datasets

Name              Size      Data type                          How to import
People            21 MB     zip → gzipped dump folder          mongorestore --gzip
City inspections  24 MB     JSON                               mongoimport
Companies         75 MB     JSON                               mongoimport
Stocks            85 MB     zip → dump folder                  mongorestore
Trades            232 MB    JSON                               mongoimport
Enron             55 MB     RAR (misnamed .zip) → dump folder  mongorestore
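The archived datasets take two steps: unpack, then run mongorestore on the resulting dump folder, adding `--gzip` when the dump files are gzip-compressed (as for People). Because the Enron archive is a RAR file despite its .zip name, a script cannot trust the extension. A minimal sketch, assuming the real format is read from the leading magic bytes (`PK\x03\x04` for ZIP, `Rar!` for RAR) to choose between unzip and unrar; the function names are hypothetical and this is not how import.sh necessarily works. `restore_dump` prints the command unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: unpack a dataset archive by its real format, then mongorestore the dump.
# Assumption: the magic bytes, not the extension, decide the tool
# (ZIP starts with "PK\x03\x04", RAR with "Rar!").

# Print "zip", "rar" or "unknown" for the given file.
archive_kind() {
  local magic
  magic="$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')"
  case "$magic" in
    504b0304) echo zip ;;
    52617221) echo rar ;;
    *) echo unknown ;;
  esac
}

# Extract an archive into a destination directory with unzip or unrar.
unpack_archive() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"
  case "$(archive_kind "$archive")" in
    zip) unzip -q -o "$archive" -d "$dest" ;;
    rar) unrar x -o+ "$archive" "$dest/" ;;
    *) echo "unsupported archive format: $archive" >&2; return 1 ;;
  esac
}

# Print (default) or run (DRY_RUN=0) mongorestore for a dump folder.
# Pass "gzip" as the second argument when the dump files are gzip-compressed.
restore_dump() {
  local dump_dir="$1" compression="${2:-}"
  local args=(mongorestore)
  if [ "$compression" = "gzip" ]; then
    args+=(--gzip)
  fi
  args+=("$dump_dir")
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "${args[@]}"
  else
    echo "${args[*]}"
  fi
}
```

The folder name inside each archive is not listed here, so inspect the extraction directory before pointing `restore_dump` at it.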

List of other datasets

Name    Size      Data type
Enron   423 MB    Email server tarball (slow download server)

Import in MongoDB

Use the provided import.sh script to insert the "small" and "bigger" datasets. Run import.sh --help to see usage and the available options.

Current features:

  • Docker support: starts a MongoDB automatically in Docker for you.

  • Quick data import with --small, which inserts only the smallest dataset (handy for live demos).

Requirements:

  • Docker, if you use the Docker option.

  • MongoDB database tools (mongoimport, mongorestore)

  • unzip

  • unrar (for the Enron dataset)
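Before importing, it can help to confirm that the required tools are on your PATH. A minimal sketch using the shell builtin `command -v`; the function name `missing_tools` is hypothetical, and it does not check tool versions.

```shell
#!/usr/bin/env bash
# Sketch: print each required command-line tool that is not installed.
# Returns non-zero if at least one tool is missing.

missing_tools() {
  local tool missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
}

# Example, using the tools from the Requirements list
# (docker is only needed for the Docker option, unrar only for Enron):
# missing_tools mongoimport mongorestore unzip unrar docker || echo "install the tools above first"
```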

Contributing

Feel free to open a pull request to add your collection files to the list.

License

CC0 1.0 Universal (public domain dedication): http://creativecommons.org/publicdomain/zero/1.0/
