
elasticsearch-dump

ElasticSearch BigData importer


Imports raw JSON into Elasticsearch, optionally using multiple threads


The script can run in five modes:

  • Validate data only
  • Import data into Elasticsearch without validation
    • Single-threaded import
    • Multi-threaded import
  • Import data into Elasticsearch after validation
    • Single-threaded import
    • Multi-threaded import

Prerequisites

Install the elasticsearch package with pip:

pip install elasticsearch

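The client's major version should match your Elasticsearch server's major version; read more about versions in the elasticsearch-py documentation.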

Usage

Options

--data          : The data file
--check         : Validate the data file
--bulk          : Elasticsearch endpoint (e.g. http://localhost:9200)
--index         : Index name
--type          : Index type
--import        : Import data into Elasticsearch
--thread        : Number of threads (default: 1)
--help          : Display the help message
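To show how these flags combine into the five modes listed above, here is a minimal sketch using Python's standard argparse module. It is an illustration, not the actual import.py code: the helper names build_parser and select_mode, the attribute name do_import and the mode strings are hypothetical. argparse adds --help automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real import.py)."""
    parser = argparse.ArgumentParser(description="Import raw JSON into Elasticsearch")
    parser.add_argument("--data", required=True, help="The data file")
    parser.add_argument("--check", action="store_true", help="Validate the data file")
    parser.add_argument("--bulk", help="Elasticsearch endpoint, e.g. http://localhost:9200")
    parser.add_argument("--index", help="Index name")
    parser.add_argument("--type", help="Index type")
    # "import" is a Python keyword, so store the flag under another attribute name.
    parser.add_argument("--import", dest="do_import", action="store_true",
                        help="Import data into Elasticsearch")
    parser.add_argument("--thread", type=int, default=1, help="Number of threads")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Map the parsed flags onto one of the five modes (hypothetical labels)."""
    if not args.do_import:
        if not args.check:
            raise ValueError("nothing to do: pass --check and/or --import")
        return "validate only"
    validation = "after validation" if args.check else "without validation"
    threading_mode = "multi-thread" if args.thread > 1 else "single-thread"
    return f"import {validation}, {threading_mode}"
```

For example, `--data test_data.json --import --check --thread 16` would map to "import after validation, multi-thread".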

Validate data

I suggest validating your data before (or during) the import process.

python import.py --data test_data.json --check
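A validation pass can be as simple as parsing every line and recording the ones that fail. The sketch below assumes the data file holds one JSON object per line (newline-delimited JSON), which is an assumption, not something stated above; the function name validate_file is hypothetical.

```python
import json
from typing import List, Tuple


def validate_file(path: str) -> List[Tuple[int, str]]:
    """Return (line_number, error) pairs for lines that are not valid JSON objects.

    Assumes one JSON document per line (newline-delimited JSON).
    """
    errors = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                document = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append((line_number, exc.msg))
                continue
            if not isinstance(document, dict):
                # Elasticsearch documents must be JSON objects, not arrays or scalars.
                errors.append((line_number, "not a JSON object"))
    return errors
```

Reporting line numbers rather than stopping at the first error lets you fix every bad record in one pass.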

Single Thread

Import without validation
python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name
Import after validation
python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name --check
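A single-threaded import boils down to turning each line into a bulk action and streaming the actions to Elasticsearch. This sketch uses the real `elasticsearch.helpers.bulk` helper but is not the script's own code; it assumes newline-delimited JSON input, and the names iter_actions and import_file are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, Optional


def iter_actions(lines: Iterable[str], index: str,
                 doc_type: Optional[str] = None) -> Iterator[Dict]:
    """Turn newline-delimited JSON into actions for elasticsearch.helpers.bulk."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        action = {"_index": index, "_source": json.loads(line)}
        if doc_type:
            # Mapping types are deprecated in Elasticsearch 7 and removed in 8.
            action["_type"] = doc_type
        yield action


def import_file(path: str, endpoint: str, index: str,
                doc_type: Optional[str] = None):
    """Send every document in `path` to Elasticsearch in bulk requests (single thread)."""
    # Imported lazily so iter_actions can be used without the client installed.
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(endpoint)
    with open(path, encoding="utf-8") as handle:
        # helpers.bulk batches actions (500 per request by default), returns
        # (success_count, errors), and by default raises BulkIndexError on failures.
        return helpers.bulk(client, iter_actions(handle, index, doc_type))
```

Because iter_actions is a generator, the file is streamed rather than loaded into memory at once.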

Multi Thread

Import without validation
python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name --thread 16
Import after validation
python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name --check --thread 16
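One way to split the work across `--thread` workers is to divide the file's line numbers into contiguous ranges and hand each range to a thread pool. The sketch below is a hypothetical illustration (split_ranges, parallel_import and send_range are not from the script); send_range stands in for whatever reads a range of lines and bulk-sends it. Threads help here even with Python's GIL because bulk indexing is mostly waiting on network I/O.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(total: int, parts: int) -> List[Tuple[int, int]]:
    """Split line numbers 1..total into `parts` contiguous, inclusive (start, end) ranges."""
    if total <= 0:
        return []
    parts = max(1, min(parts, total))
    size, extra = divmod(total, parts)
    ranges, start = [], 1
    for i in range(parts):
        # The first `extra` ranges get one additional line each.
        end = start + size - 1 + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges


def parallel_import(total_lines: int, threads: int,
                    send_range: Callable[[int, int], int]) -> int:
    """Run send_range(start, end) for each range on a thread pool; return the summed counts."""
    ranges = split_ranges(total_lines, threads)
    with ThreadPoolExecutor(max_workers=max(1, threads)) as pool:
        futures = [pool.submit(send_range, start, end) for start, end in ranges]
        # result() re-raises any exception raised inside a worker thread.
        return sum(future.result() for future in futures)
```

For instance, `split_ranges(10, 3)` yields `[(1, 4), (5, 7), (8, 10)]`. As an alternative design, elasticsearch-py ships `helpers.parallel_bulk`, which runs bulk requests on its own thread pool (see its `thread_count` parameter).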

The multi-threaded mode is much faster, though the actual speedup depends on your computer/server resources. The script uses linecache to load the data into RAM, so you also need enough memory to hold the file.
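The memory requirement comes from how the standard-library linecache module works: the first lookup for a file reads all of its lines into an in-memory cache, and later lookups by line number are served from that cache. The sketch below is illustrative; the helper names warm_cache and read_range are hypothetical.

```python
import linecache
from typing import List


def warm_cache(path: str) -> int:
    """Load the whole file into linecache once and return its line count.

    Memory use is roughly the file size plus Python's per-string overhead.
    """
    return len(linecache.getlines(path))


def read_range(path: str, start: int, end: int) -> List[str]:
    """Return lines start..end (1-based, inclusive) from linecache's cache."""
    lines = []
    for number in range(start, end + 1):
        line = linecache.getline(path, number)
        if not line:  # getline returns "" past the end of the file
            break
        lines.append(line)
    return lines
```

Warming the cache once before starting worker threads gives the total line count up front and avoids several threads loading the same file at the same time.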

My test setup:

  • AMD Ryzen 7 3800X (8 cores / 16 threads)
  • 64 GB RAM (3000 MHz / CL16)
  • Windows 10
  • 10 GB JSON file with ~24 million objects
  • Elasticsearch v7

The whole process took about 30 minutes, and resources were used efficiently.
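For a rough sense of scale, the figures above translate into the following average rates. This is simple arithmetic, assuming 10 GB means $10^{10}$ bytes and that the 30 minutes covers the entire run; instantaneous throughput will vary.

$$
\frac{24 \times 10^{6}\ \text{docs}}{30 \times 60\ \text{s}} \approx 13{,}300\ \text{docs/s},
\qquad
\frac{10^{10}\ \text{B}}{1800\ \text{s}} \approx 5.6\ \text{MB/s},
\qquad
\frac{10^{10}\ \text{B}}{24 \times 10^{6}\ \text{docs}} \approx 417\ \text{B/doc}
$$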


Support

You can support this project on Ko-fi.

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D

Issues

Every project has problems. Help improve this project by reporting them.

Contributors

  • hatamiarash7
