
What's Embulk?

Embulk is a plugin-based parallel bulk data loader that helps transfer data between various storage systems, databases, NoSQL stores, and cloud services.

You can release plugins to share your efforts in data cleaning, error handling, transaction control, and retrying. Packaging these efforts into plugins brings OSS-style development to data scripts, which have tended to be one-time, ad hoc scripts.

Embulk, an open-source plugin-based parallel bulk data loader at Slideshare


Document

Quick Start

The single-file package is the simplest way to try Embulk. Download the latest embulk-VERSION.jar from the releases page and run it with Java:

wget https://bintray.com/artifact/download/embulk/maven/embulk-0.3.2.jar -O embulk.jar
java -jar embulk.jar --help

As an example, let's load a CSV file. The embulk example subcommand generates a sample CSV file and a config file for you:

java -jar embulk.jar example ./try1
java -jar embulk.jar guess   ./try1/example.yml -o config.yml
java -jar embulk.jar preview config.yml
java -jar embulk.jar run     config.yml
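The four quick-start steps can be chained in a small script that stops at the first failing step. This is a sketch, not part of Embulk: the embulk and quickstart shell functions and the EMBULK_JAR variable are hypothetical names, while the subcommands and the -o option are exactly the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the single-file package;
# EMBULK_JAR is an assumed variable, defaulting to ./embulk.jar.
embulk() {
  java -jar "${EMBULK_JAR:-embulk.jar}" "$@"
}

# Hypothetical helper: runs example -> guess -> preview -> run in order
# and returns non-zero as soon as one step fails (the && chain stops there).
quickstart() {
  local dir="${1:-./try1}"         # directory for the generated example
  local config="${2:-config.yml}"  # file that guess writes with -o
  embulk example "$dir" &&
    embulk guess "$dir/example.yml" -o "$config" &&
    embulk preview "$config" &&
    embulk run "$config"
}
```

Usage: quickstart ./try1 config.yml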

Using plugins

You can use plugins to load data from and to various systems and file formats. For example, the embulk-plugin-postgres-json plugin outputs data to a PostgreSQL server using the "json" column type.

java -jar embulk.jar gem install embulk-plugin-postgres-json
java -jar embulk.jar gem list

To find plugins, search RubyGems for "embulk-plugin".
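To set up several plugins at once, a loop over gem install is enough. This is a minimal sketch; install_plugins and the embulk wrapper are hypothetical helpers that reuse only the gem install and gem list subcommands shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the single-file package.
embulk() {
  java -jar "${EMBULK_JAR:-embulk.jar}" "$@"
}

# Hypothetical helper: installs each named plugin gem in order, stops at
# the first failure, then lists installed gems so you can check the result.
install_plugins() {
  local plugin
  for plugin in "$@"; do
    embulk gem install "$plugin" || return 1
  done
  embulk gem list
}
```

Usage: install_plugins embulk-plugin-postgres-json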

Using plugin bundle

The embulk bundle subcommand creates a plugin bundle directory (or updates it if it already exists). To use the bundle, pass the -b <bundle_dir> option. embulk bundle also generates some example plugins as <bundle_dir>/embulk/*.rb.

See the generated <bundle_dir>/Gemfile for how plugin bundles work.

java -jar embulk.jar bundle ./embulk_bundle
java -jar embulk.jar guess  -b ./embulk_bundle ...
java -jar embulk.jar run    -b ./embulk_bundle ...
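Because a bundle is described by its Gemfile, adding a plugin to it amounts to declaring one more gem. The sketch below assumes the generated <bundle_dir>/Gemfile uses standard Bundler syntax (gem "name" lines); add_plugin_to_bundle is a hypothetical helper, and how the bundle then installs the declared gem is explained in the generated Gemfile itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: appends gem "<plugin>" to <bundle_dir>/Gemfile
# unless a line declaring that gem is already present.
# Assumes the Gemfile uses standard Bundler syntax.
add_plugin_to_bundle() {
  local bundle_dir="$1" plugin="$2"
  local gemfile="$bundle_dir/Gemfile"
  [ -f "$gemfile" ] || { echo "no Gemfile in $bundle_dir" >&2; return 1; }
  # Match gem 'name' or gem "name", with optional leading whitespace.
  if grep -Eq "^[[:space:]]*gem[[:space:]]+['\"]$plugin['\"]" "$gemfile"; then
    return 0  # already declared; leave the file unchanged
  fi
  printf 'gem "%s"\n' "$plugin" >> "$gemfile"
}
```

Usage: add_plugin_to_bundle ./embulk_bundle embulk-plugin-postgres-json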

Releasing plugins to RubyGems

TODO: documentation

embulk-plugin-xyz

Resuming a failed transaction

Embulk supports resuming failed transactions. To enable resuming, start the transaction with the -r PATH option:

java -jar embulk.jar run config.yml -r resume-state.yml

If the transaction fails, Embulk stores its state in the YAML file. You can retry the transaction by running exactly the same command:

java -jar embulk.jar run config.yml -r resume-state.yml

If you give up on resuming the transaction, use the embulk cleanup subcommand to delete the intermediate data:

java -jar embulk.jar cleanup config.yml -r resume-state.yml
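The resume workflow (run with -r, rerun the identical command on failure, clean up when you give up) can be sketched as a retry loop. run_with_resume, its max-attempts argument, and the RETRY_DELAY pause are hypothetical choices, not Embulk features; the loop only reuses the run and cleanup subcommands with the same -r state file shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the single-file package.
embulk() {
  java -jar "${EMBULK_JAR:-embulk.jar}" "$@"
}

# Hypothetical helper: retries a resumable run up to max_attempts times,
# always passing the same -r state file so each retry resumes the same
# transaction. If every attempt fails, it gives up and runs cleanup.
run_with_resume() {
  local config="$1" state="$2" max_attempts="${3:-3}"
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if embulk run "$config" -r "$state"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    if ((attempt < max_attempts)); then
      sleep "${RETRY_DELAY:-10}"  # assumed pause before resuming
    fi
  done
  # Giving up: delete the intermediate data of the failed transaction.
  embulk cleanup "$config" -r "$state"
  return 1
}
```

Usage: run_with_resume config.yml resume-state.yml 5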

Embulk Development

Build

./gradlew cli  # creates pkg/embulk-VERSION.jar
./gradlew gem  # creates pkg/embulk-VERSION.gem

You can see the JaCoCo test coverage report at ${project}/build/reports/tests/index.html and the FindBugs report at ${project}/build/reports/findbug/main.html. (FIXME: coverage information is somehow not included.)

For development, run the classpath task so that you can use ./bin/embulk:

./gradlew classpath  # add -x test to skip tests
./bin/embulk

To deploy artifacts to your local Maven repository at ~/.m2/repository/:

./gradlew install

To compile the source code of embulk-core project only:

./gradlew :embulk-core:compileJava

The dependencies task shows the dependency tree of the embulk-core project:

./gradlew :embulk-core:dependencies

Release

Add your Bintray account information to ~/.gradle/gradle.properties:

bintray_user=(bintray user name)
bintray_api_key=(bintray api key)
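Before uploading, it can help to confirm that both keys are actually set. check_bintray_props is a hypothetical pre-flight check, not part of the build; it assumes only the key=value lines shown above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeeds only if both Bintray keys are
# present with non-empty values in the given properties file.
check_bintray_props() {
  local file="${1:-$HOME/.gradle/gradle.properties}"
  local key
  for key in bintray_user bintray_api_key; do
    # key=value with optional spaces; the value must not be empty.
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$file" 2>/dev/null; then
      echo "missing ${key} in ${file}" >&2
      return 1
    fi
  done
}
```

Usage: check_bintray_props && ./gradlew bintrayUpload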

Update the following files:

  • ChangeLog: release notes
  • build.gradle: version number
  • lib/embulk/version.rb: version number
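Since the version number lives in two files, a quick consistency check can catch a missed update. This sketch assumes the version appears as a quoted string on a line like version = '0.3.2' in build.gradle and VERSION = '0.3.2' in lib/embulk/version.rb; those line formats and the extract_version and versions_match helpers are assumptions, so adjust the patterns to the actual files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the first quoted string on the first line
# of file $2 that matches the (assumed) regex $1.
extract_version() {
  grep -E "$1" "$2" 2>/dev/null | head -n 1 |
    sed -E "s/^[^'\"]*['\"]([^'\"]+)['\"].*/\1/"
}

# Hypothetical check: prints the version and succeeds only if build.gradle
# and lib/embulk/version.rb (assumed formats: version = '...' and
# VERSION = '...') agree.
versions_match() {
  local root="${1:-.}"
  local gradle_v ruby_v
  gradle_v=$(extract_version "^[[:space:]]*version[[:space:]]*=" "$root/build.gradle")
  ruby_v=$(extract_version "^[[:space:]]*VERSION[[:space:]]*=" "$root/lib/embulk/version.rb")
  if [ -z "$gradle_v" ] || [ "$gradle_v" != "$ruby_v" ]; then
    echo "version mismatch: build.gradle='$gradle_v' version.rb='$ruby_v'" >&2
    return 1
  fi
  echo "$gradle_v"
}
```

Usage: versions_match . && ./gradlew releaseCheck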

Then, build and upload using gradle:

./gradlew releaseCheck
./gradlew cli gem
./gradlew bintrayUpload
gem push pkg/embulk-....gem
open "https://bintray.com/embulk/maven/embulk"  # and upload pkg/embulk-....jar

