oozie_majorcompaction_example's Introduction

What is this?

This is an example of how one can run a scheduled rolling HBase major compaction on a table with Oozie. The compactions are performed by an hbase shell script which major compacts only one region per region server at a time. Further, by default only regions which have a non-zero cost from a weight function are compacted. Regions are compacted in priority order, highest weight first. For those using major compaction to increase data locality, compaction can be forced for every region.
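
The selection rule described above can be sketched in plain Ruby. This is a minimal illustration, not the repository's rolling_compaction.rb: the Region struct, the next_round method, and the sample weights are all hypothetical, and the real weight function is not shown in this README.

```ruby
# Hypothetical record for one region: the region server hosting it, its
# encoded name, and the weight assigned by some weight function.
Region = Struct.new(:server, :name, :weight)

# Pick at most one region per region server for the next compaction round.
# Zero-weight regions are skipped unless force is true.
# The result is ordered from highest to lowest weight.
def next_round(regions, force: false)
  candidates = force ? regions : regions.select { |r| r.weight > 0 }
  candidates
    .group_by(&:server)
    .map { |_server, hosted| hosted.max_by(&:weight) }
    .sort_by { |r| -r.weight }
end

# Illustrative data only.
regions = [
  Region.new("rs1", "a", 5.0),
  Region.new("rs1", "b", 2.0),
  Region.new("rs2", "c", 0.0),
  Region.new("rs3", "d", 7.5)
]

next_round(regions).each do |r|
  puts "#{r.server}: #{r.name} (weight #{r.weight})"
end
```

With the sample data, region b waits for a later round because rs1 is already busy with region a, and region c is only picked when force is true.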

Steps to deploy a single-shot workflow and recurring coordinator:

One will deploy the workflow, coordinator and supporting files to HDFS. Then, one will submit workflow.properties to Oozie, kicking off compactions.

Configure and stage files:

  1. Modify example_workflows/workflow.properties and example_workflows/coordinator.properties to match your cluster configuration (look for the items in angle brackets).
  2. Modify example_workflows/coordinator.xml to match your desired frequency of compaction.
  3. Upload example_workflows to HDFS (e.g. hdfs dfs -copyFromLocal example_workflows oozie_compaction).

NOTE: workflow.xml has a hardcoded hbase path of /usr/bin/hbase

Submit the one-time workflow and run the job:

One can follow the below steps to deploy the workflow:

  1. Submit the job via oozie job -config example_workflows/workflow.properties -run.
  2. One can confirm that the table was compacted by looking in the action's YARN logs for the string "Done compacting".

Example output from yarn logs -applicationId application_######:

Stdoutput Regions to compact for table clay_test:
Stdoutput myhost1.example.com,60200,1556069347115 has 1 region(s) to compact
Stdoutput Compacting myhost1.example.com,60200,1556069347115 region 1d8d46167cdd550b4ac10363c0982191
Heart beat
Heart beat
Stdoutput myhost1.example.com,60200,1556069347115 region 1d8d46167cdd550b4ac10363c0982191
Stdoutput Done compacting in 68.4029998779297 seconds
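
Checking for the completion line can be scripted. The sketch below assumes the log format matches the example output above; the compaction_seconds method is a hypothetical helper, not part of the repository.

```ruby
# Return the elapsed seconds reported by the "Done compacting" line,
# or nil if the log text contains no such line.
def compaction_seconds(log_text)
  match = log_text.match(/Done compacting in ([\d.]+) seconds/)
  match && match[1].to_f
end

# Excerpt of the example output shown above.
log = <<~LOG
  Stdoutput Compacting myhost1.example.com,60200,1556069347115 region 1d8d46167cdd550b4ac10363c0982191
  Heart beat
  Stdoutput Done compacting in 68.4029998779297 seconds
LOG

seconds = compaction_seconds(log)
puts seconds ? "compaction finished in #{seconds.round(1)} s" : "no completion line found"
```

To check a real run, one could pipe the output of yarn logs into a script that calls compaction_seconds(STDIN.read).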

Submit the scheduled coordinator to regularly run the job:

One can follow the below steps to deploy the coordinator:

  1. Submit the job via oozie job -config example_workflows/coordinator.properties -run, recording the coordinator ID returned.
  2. Verify that only one workflow job is running via oozie job -info <coordinator ID>

Steps to run the rolling_compaction.rb script by hand

One may run the rolling_compaction.rb script manually via:

$ export table_name="<your table>"
$ export force_compaction="true|false"
$ ./rolling_compaction.rb

If one has an hbase binary not at /usr/bin/hbase, one can run:

$ export table_name="<your table>"
$ export force_compaction="true|false"
$ <path to your hbase binary> shell ./rolling_compaction.rb
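
The exports above suggest that the script takes its settings from the environment. A minimal sketch of such parsing follows, assuming the variable names table_name and force_compaction shown above. The read_settings method, its error message, and the default of "false" are illustrative assumptions, not the script's actual code.

```ruby
# Read the two settings exported in the manual-run instructions.
# env defaults to the process environment; a plain Hash also works,
# which keeps the method easy to exercise without touching ENV.
def read_settings(env = ENV)
  table = env.fetch("table_name") { raise ArgumentError, "table_name must be set" }
  # Assumption: anything other than "true" (case-insensitive) means "do not force".
  force = env.fetch("force_compaction", "false").strip.downcase == "true"
  { table: table, force: force }
end

settings = read_settings({ "table_name" => "clay_test", "force_compaction" => "true" })
puts "table=#{settings[:table]} force=#{settings[:force]}"
```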

oozie_majorcompaction_example's People

Contributors

cbaenziger

oozie_majorcompaction_example's Issues

CompactionStates

I noticed some issues with your code below:

You should be checking for the compaction states MAJOR and MAJOR_AND_MINOR. Also, if you convert the state to a string, ".equals()" would be safer than "==" :)

One other thing, the admin.getCompactionState() opens and closes a catalog tracker (which opens and closes a zk connection each time). This might give you problems if you are doing it on a lot of regions and this is happening every 10 seconds.

    Configuration conf = HBaseConfiguration.create();
    conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));

    if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
      conf.set("mapreduce.job.credentials.binary",
               System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
    }

    Connection connection = ConnectionFactory.createConnection(conf);
    Admin admin = connection.getAdmin();

    System.out.println("Compacting table " + argc[0]);
    TableName tableName = TableName.valueOf(argc[0]);
    admin.majorCompact(tableName);
    while (admin.getCompactionState(tableName).toString() == "MAJOR") {
      TimeUnit.SECONDS.sleep(10);
      System.out.println("Compacting table " + argc[0]);
    }
    System.out.println("Done compacting table " + argc[0]);
}
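
The state check suggested in this issue can be sketched in Ruby, the language of the repository's rolling_compaction.rb script. This is an illustration under assumptions, not a patch for the Java above: wait_for_major_compaction and fetch_state are hypothetical names, and the fake state source stands in for a real call such as admin.getCompactionState(tableName). Ruby's include? compares string contents, so the reference-equality pitfall of Java's == does not arise here.

```ruby
# State names treated as "a major compaction is still running". They mirror
# the MAJOR and MAJOR_AND_MINOR values of HBase's CompactionState enum.
ACTIVE_MAJOR_STATES = %w[MAJOR MAJOR_AND_MINOR].freeze

# Poll until no major compaction is reported.
# fetch_state: hypothetical callable returning the current state
#              (anything that responds to to_s).
# interval:    seconds to wait between polls.
# Returns the number of polls that saw an active major compaction.
def wait_for_major_compaction(fetch_state, interval: 10)
  polls = 0
  while ACTIVE_MAJOR_STATES.include?(fetch_state.call.to_s)
    polls += 1
    sleep(interval)
  end
  polls
end

# Stand-in for a cluster: reports two busy states, then NONE.
states = %w[MAJOR MAJOR_AND_MINOR NONE]
fake_fetch = -> { states.shift || "NONE" }
busy_polls = wait_for_major_compaction(fake_fetch, interval: 0)
puts "saw #{busy_polls} busy poll(s) before completion"
```

Keeping the interval a parameter makes it easy to poll less often when each state lookup is expensive, which is the second concern raised above.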
