  • Language: Java
  • License: Apache License 2.0
  • Created over 9 years ago
  • Updated 7 months ago

Embulk: Pluggable Bulk Data Loader.

What's Embulk?

Embulk is a parallel bulk data loader that helps transfer data between various storage systems, databases, NoSQL stores, and cloud services.

Embulk supports plugins that add functionality. You can share plugins to keep your custom scripts readable, maintainable, and reusable.

Slides: Embulk, an open-source plugin-based parallel bulk data loader (on SlideShare)

Documentation

Embulk documentation: https://www.embulk.org/

Using plugins

You can use plugins to load data from and to various systems and file formats. For the publicly released plugins, see the list of plugins by category.

One example is the embulk-output-command plugin, which executes an external command to output the records.
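To make "executes an external command to output the records" more concrete, here is a minimal Java sketch of the idea: already-encoded output bytes are piped into `sh -c <command>`. It is not the plugin's source. The class and method names are hypothetical, and exporting the task index and file sequence number as the `INDEX` and `SEQID` environment variables (so that the shell expands `$INDEX` and `$SEQID`) is an assumption about how the plugin behaves. It needs Java 9 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the embulk-output-command implementation.
class CommandOutputSketch {
    // Runs `sh -c command`, writes `data` to its stdin, and returns what it prints to stdout.
    // INDEX and SEQID are exported as environment variables (assumed behavior).
    static String runCommand(String command, int taskIndex, int seqId, byte[] data)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder("sh", "-c", command);
        builder.environment().put("INDEX", Integer.toString(taskIndex));
        builder.environment().put("SEQID", Integer.toString(seqId));
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Drain stdout on a separate thread so a chatty command cannot block on a full pipe.
        ByteArrayOutputStream stdout = new ByteArrayOutputStream();
        Thread reader = new Thread(() -> {
            try (InputStream in = process.getInputStream()) {
                in.transferTo(stdout);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        reader.start();

        // Closing stdin signals end-of-input, e.g. so that `cat -` can finish.
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(data);
        }
        int exitCode = process.waitFor();
        reader.join();
        if (exitCode != 0) {
            throw new IOException("command exited with status " + exitCode);
        }
        return new String(stdout.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "id,name\n1,alice\n".getBytes(StandardCharsets.UTF_8);
        // Print the two environment variables, then copy stdin to stdout.
        String out = runCommand("printf 'task=%s seq=%s\\n' \"$INDEX\" \"$SEQID\"; cat -", 0, 1, data);
        System.out.print(out);
    }
}
```

With a command such as cat - > task.$INDEX.$SEQID.csv.gz (used in the configuration example below), the shell would expand both variables under this assumption, so each task and file sequence would end up in its own file.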

To install plugins, use the embulk gem install <name> command, and embulk gem list to list the installed plugins:

embulk gem install embulk-output-command
embulk gem list

Embulk bundles some built-in plugins, such as embulk-encoder-gzip and embulk-formatter-csv. You can use those plugins together with the command output plugin in a configuration file such as the following:

in:
  type: file
  path_prefix: "./try1/csv/sample_"
  ...
out:
  type: command
  command: "cat - > task.$INDEX.$SEQID.csv.gz"
  encoders:
    - {type: gzip}
  formatter:
    type: csv
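The formatter and encoder settings in this configuration form a chain: records are first rendered as CSV, and the CSV text is then gzip-compressed before it reaches the command's stdin. The following Java sketch illustrates that chain under stated assumptions; it is not the code of embulk-formatter-csv or embulk-encoder-gzip. The class is hypothetical, and it always uses LF line endings and quotes a field only when it contains a comma, double quote, or line break, which may differ from the built-in formatter's defaults. It needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of "formatter: csv" followed by "encoders: [gzip]".
class CsvGzipSketch {
    // Quote a field only if it contains a comma, double quote, or line break; double embedded quotes.
    static String csvField(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    // Formatter step: rows -> CSV text. Encoder step: CSV text -> gzip bytes.
    static byte[] formatAndEncode(List<List<String>> rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the writer also closes the GZIPOutputStream, which writes the gzip trailer.
        try (Writer out = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (List<String> row : rows) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < row.size(); i++) {
                    if (i > 0) {
                        line.append(',');
                    }
                    line.append(csvField(row.get(i)));
                }
                out.write(line.append('\n').toString());
            }
        }
        return bytes.toByteArray();
    }

    // Reverses the encoder step, which is handy for inspecting the result.
    static String decode(byte[] gzipped) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = formatAndEncode(List.of(
                List.of("id", "name"),
                List.of("1", "Smith, John")));
        System.out.print(decode(encoded));
    }
}
```

Keeping the formatter and the encoder as separate steps is what lets a configuration swap either one (for example, a different encoder) without touching the other.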

Resuming a failed transaction

Embulk supports resuming failed transactions. To enable resuming, start the transaction with the -r PATH option:

embulk run config.yml -r resume-state.yml

If the transaction fails, Embulk stores its state in the YAML file. You can retry the transaction by running exactly the same command:

embulk run config.yml -r resume-state.yml

If you give up on resuming the transaction, use the embulk cleanup subcommand to delete intermediate data:

embulk cleanup config.yml -r resume-state.yml
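The resume workflow above can be modeled with the following minimal Java sketch: it records which tasks have finished in a state file, skips them on the next attempt, and removes the state once every task succeeds. It is only a model of the idea. Embulk's real resume state is YAML written by Embulk itself and holds more than task indices; the one-index-per-line file format, the class name, and deleting the state after success are assumptions made for illustration. It needs Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntConsumer;
import java.util.stream.Collectors;

// Hypothetical model of resumable execution; not Embulk's resume-state format.
class ResumeSketch {
    // Reads the indices of tasks that finished in an earlier attempt (none if no state file).
    static Set<Integer> loadDone(Path state) throws IOException {
        Set<Integer> done = new TreeSet<>();
        if (Files.exists(state)) {
            for (String line : Files.readAllLines(state)) {
                if (!line.isBlank()) {
                    done.add(Integer.parseInt(line.trim()));
                }
            }
        }
        return done;
    }

    // Runs tasks 0..taskCount-1, skipping finished ones; returns the indices run in this attempt.
    static List<Integer> run(Path state, int taskCount, IntConsumer task) throws IOException {
        Set<Integer> done = loadDone(state);
        List<Integer> executed = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            if (done.contains(i)) {
                continue; // completed in a previous attempt
            }
            task.accept(i); // an exception here leaves the state file in place for a retry
            executed.add(i);
            done.add(i);
            // Persist progress after every task so a later failure does not lose it.
            Files.write(state, done.stream().map(String::valueOf).collect(Collectors.toList()));
        }
        Files.deleteIfExists(state); // all tasks done: the state is no longer needed (assumed)
        return executed;
    }

    public static void main(String[] args) throws IOException {
        Path state = Files.createTempDirectory("resume-demo").resolve("state.txt");
        try {
            run(state, 4, i -> {
                if (i == 2) {
                    throw new IllegalStateException("task " + i + " failed");
                }
            });
        } catch (IllegalStateException e) {
            System.out.println("first attempt: " + e.getMessage() + ", done so far " + loadDone(state));
        }
        System.out.println("retry ran tasks " + run(state, 4, i -> { }));
    }
}
```

Running the same call twice with the same state path mirrors re-running the same embulk run command with the same -r file: the second attempt only executes the tasks that did not finish.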

Using plugin bundle

The embulk mkbundle subcommand creates an isolated bundle of plugins. You can install plugins (gems) into the bundle directory instead of the ~/.embulk directory, which makes it easy to manage plugin versions. To use the bundle, add the -b <bundle_dir> option to the guess, preview, or run subcommand. embulk mkbundle also generates some example plugins as <bundle_dir>/embulk/*.rb files.

See the generated <bundle_dir>/Gemfile file for how plugin bundles work.

embulk mkbundle ./embulk_bundle  # please edit ./embulk_bundle/Gemfile to add plugins. Detailed usage is written in the Gemfile
embulk guess -b ./embulk_bundle ...
embulk run   -b ./embulk_bundle ...

Use cases

For use cases and further details, see the Embulk documentation at https://www.embulk.org/.

Upgrading to the latest version

The following command updates Embulk itself to a specific released version:

embulk selfupdate x.y.z

Embulk Development

Build

./gradlew cli  # creates pkg/embulk-VERSION.jar

You can see JaCoCo's test coverage report at ${project}/build/reports/tests/index.html (FIXME: coverage information is not included somehow).
You can see the FindBugs report at ${project}/build/reports/findbug/main.html.

You can run the classpath task so that you can use bundle exec ./bin/embulk during development:

./gradlew -t classpath  # -x test: skip test
./bin/embulk

To deploy artifacts to your local Maven repository at ~/.m2/repository/:

./gradlew install

To compile the source code of the embulk-core project only:

./gradlew :embulk-core:compileJava

The dependencies task shows the dependency tree of the embulk-core project:

./gradlew :embulk-core:dependencies

Update JRuby

To update the JRuby version used by Embulk, modify jrubyVersion in build.gradle.

Release

Prerequisite: Sonatype OSSRH

You need a Sonatype OSSRH account, configured in your ~/.gradle/gradle.properties:

ossrhUsername=(your Sonatype OSSRH username)
ossrhPassword=(your Sonatype OSSRH password)
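Before running a release, it can help to confirm that these credentials are actually present. The following Java sketch loads ~/.gradle/gradle.properties with java.util.Properties and reports which required keys are missing or empty. The class name is hypothetical and this check is not part of Embulk's build; it only looks at the two key names shown above. It needs Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-release check; not part of Embulk's build.
class GradlePropertiesCheck {
    static final List<String> OSSRH_KEYS = List.of("ossrhUsername", "ossrhPassword");

    // Returns the required keys that are absent or have an empty value.
    static List<String> missingKeys(Properties props, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String key : required) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(System.getProperty("user.home"), ".gradle", "gradle.properties");
        Properties props = new Properties();
        if (Files.exists(file)) {
            // Properties.load(InputStream) reads ISO-8859-1, the traditional .properties encoding.
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        List<String> missing = missingKeys(props, OSSRH_KEYS);
        System.out.println(missing.isEmpty() ? "OSSRH credentials are set" : "Missing in " + file + ": " + missing);
    }
}
```

Passing a list that also contains signing.keyId, signing.password, and signing.secretKeyRingFile would extend the same check to the signing settings below.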

Prerequisite: PGP signatures

Artifacts released to Maven Central must be signed, so you need a PGP key, and you need to configure Gradle to sign with it:

signing.keyId=(the last 8 symbols of your keyId)
signing.password=(the passphrase used to protect your private key)
signing.secretKeyRingFile=(the absolute path to the secret key ring file containing your private key)
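For a V4 OpenPGP key, the "last 8 symbols" are the last 8 hexadecimal digits of the key's fingerprint, which gpg -K (gpg --list-secret-keys) can display. As a small worked example, the hypothetical helper below derives that value from a fingerprint string, ignoring spaces and letter case; the fingerprint in main is made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper: derive the value for signing.keyId from a full V4 OpenPGP fingerprint.
class SigningKeyId {
    static String shortKeyId(String fingerprint) {
        // Fingerprints are often printed in groups of four; drop whitespace and normalize case.
        String hex = fingerprint.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
        if (!hex.matches("[0-9A-F]{40}")) {
            throw new IllegalArgumentException("expected a 40-hex-digit V4 fingerprint: " + fingerprint);
        }
        // The short key ID is the last 8 hex digits (the low 32 bits) of the fingerprint.
        return hex.substring(hex.length() - 8);
    }

    public static void main(String[] args) {
        String fingerprint = "0123 4567 89AB CDEF 0123  4567 89ab cdef 0123 4567";
        System.out.println("signing.keyId=" + shortKeyId(fingerprint));
    }
}
```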

Release steps

To bump the Embulk version, modify version in build.gradle on a detached commit:

git checkout --detach master
(Remove "-SNAPSHOT" from "version" in build.gradle.)
git add build.gradle
git commit -m "Release vX.Y.Z"
git tag -a vX.Y.Z
(Write the release note for vX.Y.Z in the tag annotation.)
./gradlew clean && ./gradlew release
git push -u origin vX.Y.Z
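The manual edit in the steps above, removing "-SNAPSHOT" from version in build.gradle, can be sketched in Java as below. It assumes the version is declared on a single line of the form version = "X.Y.Z-SNAPSHOT"; the class, the regular expression, and the sample build.gradle text in main are hypothetical. The sketch only computes the new file contents and the tag name; it does not touch Git or the file system.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the "Remove -SNAPSHOT" step; not part of Embulk's build.
class ReleaseVersion {
    // Assumed declaration format: a line such as  version = "0.11.0-SNAPSHOT"
    private static final Pattern VERSION_LINE =
            Pattern.compile("(?m)^([ \\t]*version[ \\t]*=[ \\t]*)([\"'])([^\"'\\n]+)-SNAPSHOT\\2");

    // Returns the version without "-SNAPSHOT", e.g. "0.11.0".
    static String snapshotBase(String buildGradle) {
        Matcher m = VERSION_LINE.matcher(buildGradle);
        if (!m.find()) {
            throw new IllegalArgumentException("no -SNAPSHOT version declaration found");
        }
        return m.group(3);
    }

    // Returns the build.gradle text with "-SNAPSHOT" removed from the version declaration.
    static String release(String buildGradle) {
        snapshotBase(buildGradle); // fail fast if there is nothing to release
        return VERSION_LINE.matcher(buildGradle).replaceFirst("$1$2$3$2");
    }

    // Tag name for `git tag -a vX.Y.Z`.
    static String tagFor(String version) {
        return "v" + version;
    }

    public static void main(String[] args) {
        String before = "group = \"org.embulk\"\nversion = \"0.11.0-SNAPSHOT\"\n";
        System.out.print(release(before));
        System.out.println("git tag -a " + tagFor(snapshotBase(before)));
    }
}
```

Failing when no -SNAPSHOT declaration is found guards against releasing the same version twice from an already-bumped build.gradle.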

More Repositories

1. embulk-output-bigquery: Embulk output plugin to load/insert data into Google BigQuery (Ruby)
2. embulk-input-jdbc: MySQL, PostgreSQL, Redshift and generic JDBC input plugins for Embulk (Java)
3. embulk-output-jdbc: MySQL, PostgreSQL, Redshift and generic JDBC output plugins for Embulk (Java)
4. embulk-filter-column: A filter plugin for Embulk to filter out columns (Java)
5. embulk-input-s3: S3 file input plugin for Embulk (Java)
6. embulk-output-elasticsearch (Java)
7. embulk-input-mongodb: MongoDB input plugin for Embulk loads records from MongoDB. (Java)
8. embulk-output-s3: Embulk S3 output plugin (Java)
9. embulk-input-gcs: Embulk plugin that loads records from Google Cloud Storage (Java)
10. embulk-filter-calcite (Java)
11. embulk-executor-mapreduce: MapReduce executor plugin for Embulk (Java)
12. embulk-filter-expand_json (Java)
13. embulk-output-gcs: Google Cloud Storage output plugin for Embulk (Java)
14. guice-bootstrap: Guice with JSR 250 Lifecycle annotations (@PostConstruct and @PreDestroy) (Java)
15. embulk-input-script (Java)
16. embulk-input-command: Command-line file input plugin for Embulk (Java)
17. embulk-filter-encrypt: Encrypt filter plugin for Embulk (Java)
18. gradle-embulk-plugins: A Gradle plugin to build and publish Embulk plugins (Java)
19. embulk-base-restclient: Base class library for Embulk plugins to access RESTful services (Java)
20. embulk-output-sftp: Store files on remote server using SFTP (Java)
21. embulk-input-ftp: Embulk FTP input plugin (Java)
22. embulk-input-sftp: Reads files stored on remote server using SFTP (Java)
23. embulk-input-azure_blob_storage: Microsoft Azure Blob Storage file input plugin for Embulk (Java)
24. embulk-output-command: Command file output plugin for Embulk (Java)