
hbase-secondary-index's Introduction

#For more details, see the wiki (Chinese user manual): https://github.com/mayanhui/hbase-secondary-index/wiki

###################################################

Methods of building secondary index

###################################################

##0.Environment

  • hadoop: 1.0.4
  • hbase: 0.94.0
  • zookeeper: 3.4.3
  • hive: 0.9.0
  • thrift: 0.9.0

##1.Many ways to build index

###1.1 MapReduce

Use the HBase MapReduce integration to build an index table for the main table. The main steps are:

(1) scan input table by TableMapper<ImmutableBytesWritable, Writable>

(2) get the rowkey and the name and value of each specified column

(3) create an instance of Put for the index table with rowkey = columnName + "_" + columnValue and value = the main-table rowkey

(4) use IdentityTableReducer to put data into index table
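Steps (2) and (3) above can be sketched in plain Java without any HBase classes. This is only an illustration of the key layout described in the steps, not the project's mapper: the class `IndexEntrySketch`, its nested `IndexEntry` type, and the method names are hypothetical, and whether columnName includes the column family prefix (as in "cf1:mid") is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the mapper logic; hypothetical names, no HBase classes. */
class IndexEntrySketch {

    /** One row for the index table: index rowkey plus the main-table rowkey as value. */
    static final class IndexEntry {
        final String indexRowKey;
        final String mainRowKey;

        IndexEntry(String indexRowKey, String mainRowKey) {
            this.indexRowKey = indexRowKey;
            this.mainRowKey = mainRowKey;
        }
    }

    /** Step (3): the index rowkey is columnName + "_" + columnValue. */
    static String indexRowKey(String columnName, String columnValue) {
        return columnName + "_" + columnValue;
    }

    /** Steps (2)-(3) for one scanned row: one entry per indexed column found in the row. */
    static List<IndexEntry> toIndexEntries(String mainRowKey,
                                           Map<String, String> row,
                                           List<String> indexedColumns) {
        List<IndexEntry> entries = new ArrayList<>();
        for (String column : indexedColumns) {
            String value = row.get(column);
            if (value == null) {
                continue; // column absent in this row: nothing to index
            }
            entries.add(new IndexEntry(indexRowKey(column, value), mainRowKey));
        }
        return entries;
    }
}
```

In the real job each entry would become a Put that the reducer in step (4) writes to the index table; here the entries are simply returned so the key layout is easy to inspect.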

Supported index types:

  1. build a single-column index

  2. build several single-column indexes together

  3. build a combined-column index

  4. build a JSON-column index (single-field or combined-field)

  5. build a rowkey-only index
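The difference between several single-column indexes and one combined-column index can be sketched as follows. The single-column layout (columnName + "_" + columnValue) comes from the steps above. The combined layout is NOT documented here, so joining the per-column parts with "_" is purely an assumption for illustration, and `IndexModeSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch contrasting single-column and combined-column index keys. */
class IndexModeSketch {
    // Limit stated for combined indexes in the -si option description.
    static final int MAX_COMBINED_COLUMNS = 3;

    /** Single mode: one index key per column, each columnName_columnValue. */
    static List<String> singleIndexKeys(List<String> columns, List<String> values) {
        if (columns.size() != values.size()) {
            throw new IllegalArgumentException("columns and values must have the same size");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < columns.size(); i++) {
            keys.add(columns.get(i) + "_" + values.get(i));
        }
        return keys;
    }

    /** Combined mode: one index key covering all columns. */
    static String combinedIndexKey(List<String> columns, List<String> values) {
        if (columns.size() > MAX_COMBINED_COLUMNS) {
            throw new IllegalArgumentException("a combined index supports at most 3 columns");
        }
        // ASSUMPTION: the real key layout is not documented; "_" joining is illustrative only.
        return String.join("_", singleIndexKeys(columns, values));
    }
}
```

The point of the sketch is the cardinality: single mode yields one index row per column, while combined mode yields one index row per main-table row.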

Command to build index:

    1. build single column index

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid -s 20130101 -e 20130120 -v 1

    2. build several single-column indexes together

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid,cf1:age,cf2:msg

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid,cf1:age,cf2:msg -s 20130101 -e 20130120 -v 3

    3. build combined-column index

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid,cf1:age,cf2:msg -si false

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:mid,cf1:age,cf2:msg -si false -s 20130101 -e 20130120 -v 1

    4. build JSON column index (single-field or combined-field)

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:msg -j area,type,category

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:msg -j area,type,category -s 20130101 -e 20130120 -v 1

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:msg -j area,type,category -si false

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c cf1:msg -j area,type,category -si false -s 20130101 -e 20130120 -v 1

    5. build rowkey only index

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c rowkey -r uid:1,mid:2,isrowkey:1

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c rowkey:cf1:content -r uid:1,mid:2,isrowkey:1

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c rowkey:cf1:content -r uid:1,mid:2,isrowkey:1 -s 20130101 -e 20130120

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i demo_table -o demo_table_index -c rowkey:cf1:content -r uid:1,mid:2,isrowkey:1 -s 20130101 -e 20130120 -v 1
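The -r value used in the rowkey commands above (uid:1,mid:2,isrowkey:1) names the rowkey fields, gives their positions in a "_"-separated rowkey, and marks which field becomes the new rowkey. A minimal sketch of one way to read such a specification follows. It is not the project's IndexRowkeyMapper: `RowkeySpecSketch` and its methods are hypothetical, and treating positions as 1-based and isrowkey as a position number is my reading of the option description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a spec such as "uid:1,mid:2,isrowkey:1". */
class RowkeySpecSketch {
    static final String MARKER = "isrowkey";
    static final int MAX_FIELDS = 2; // limit stated in the -r option description

    /** Parses the spec into name -> position (the isrowkey marker is included). */
    static Map<String, Integer> parseSpec(String spec) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        for (String part : spec.split(",")) {
            String[] nameAndPosition = part.trim().split(":");
            if (nameAndPosition.length != 2) {
                throw new IllegalArgumentException("bad spec entry: " + part);
            }
            positions.put(nameAndPosition[0], Integer.parseInt(nameAndPosition[1]));
        }
        if (!positions.containsKey(MARKER)) {
            throw new IllegalArgumentException("spec must contain " + MARKER);
        }
        if (positions.size() - 1 > MAX_FIELDS) {
            throw new IllegalArgumentException("at most " + MAX_FIELDS + " rowkey fields are supported");
        }
        return positions;
    }

    /** Returns the field of the main rowkey that the spec selects as the new rowkey. */
    static String newRowKey(String spec, String mainRowKey) {
        int position = parseSpec(spec).get(MARKER); // ASSUMPTION: 1-based position
        String[] fields = mainRowKey.split("_");    // separator in the rowkey is "_"
        if (position < 1 || position > fields.length) {
            throw new IllegalArgumentException("position out of range: " + position);
        }
        return fields[position - 1];
    }
}
```

With a main rowkey shaped like uid_mid_ts, the spec uid:1,mid:2,isrowkey:2 would select the mid field under these assumptions.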

###1.2 ITHBase

Set the following properties in $HBASE_HOME/conf/hbase-site.xml:

  • hbase.hlog.splitter.impl: org.apache.hadoop.hbase.regionserver.transactional.THLogSplitter

  • hbase.regionserver.class: org.apache.hadoop.hbase.ipc.IndexedRegionInterface

  • hbase.regionserver.impl: org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer

  • hbase.hregion.impl: org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegion

###1.3 IHBase

The implementation of this method comes from https://github.com/ykulbak/ihbase. However, that code is unusable because many classes are missing. This method is not recommended because it is invasive.

###1.4 Coprocessor

A demo is implemented. This method has been available since hbase-0.92.0 and is not yet mature. Its characteristics are:

  • Each index requires its own hand-written code, so reusability is poor.
  • The table must be disabled before using alter table, which is unfriendly to online services.
  • Better than the other online index-building methods, which are invasive.

#####################

MapReduce

#####################

##2 MapReduce Usage

###2.1 Build from source code

Download the source code, then use Maven to build the jar. Go into the project directory and run:

mvn install

Note: Maven 2.2.1 or later must be installed.

###2.2 Use the jar

The jar file hbase-secondary-index-0.1.jar is in the root directory of the project and can be used directly.

###2.3 Build index

Follow the example in buildindex.sh in the directory 'src/main/resources', such as:

hadoop jar hbase-secondary-index-0.1.jar net.hbase.secondaryindex.mapred.Main -i user_behavior_attribute_noregistered -o user_behavior_attribute_noregistered_index -c bhvr:vvmid -s 20130101 -e 20130120 -v 3

usage: Build-Secondary-Index -c family:qualifier [-d] [-e ] -i [-j ] -o [-r ] [-s ] [-si ] [-v ]

-c,--column family:qualifier column to store row data into (must exist). Such as: cf1:age,cf2:tag,cf2:msg or rowkey or rowkey,cf1:age. The last two forms are for 'rowkey' index building.

-d,--debug switch on DEBUG log level

-e,--edate the end date of data to build index(default is today), such as: 20130120

-i,--input the directory or file to read from (must exist)

-j,--json json fields to build index. The max number of fields is 3! This kind of data uses IndexJsonMapper.class.

-o,--output table to import into (must exist)

-r,--rowkey rowkey fields to build index. The max number of fields is 2! This kind of data uses IndexRowkeyMapper.class. The format is: uid:1,msgid:2,isrowkey:1. Here uid and msgid are the field names, and 1 and 2 are their positions in the rowkey (like: uid_msgid_ts). isrowkey is the label that defines which field becomes the new rowkey. The separator in the rowkey is _ . You can use a validate column to build an incremental index. In that case, add the column to the -c parameter, so that -c is 'rowkey,cf1:age'.

-s,--sdate the start date of data to build index(default is 19700101), such as: 20130101

-si,--sindex whether to build a single index. true means 'single index', false means 'combined index' (default is true). When building a combined index, the max number of columns is 3.

-v,--versions the versions of each cell to build index(default is Integer.MAX_VALUE)
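The defaults and limits stated in the option descriptions above can be collected in a small helper. This is a sketch under assumptions, not the project's argument handling: `BuildOptionsSketch` and its methods are hypothetical, a null argument stands for "option not given", and dates are parsed as yyyyMMdd calendar dates without any assumption about how they are later applied to the scan.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper mirroring the documented defaults and limits. */
class BuildOptionsSketch {
    static final int MAX_JSON_FIELDS = 3;      // -j: at most 3 fields
    static final int MAX_COMBINED_COLUMNS = 3; // -si false: at most 3 columns

    /** Parses a yyyyMMdd value such as 20130101, or returns the fallback when absent. */
    static LocalDate parseDateOrDefault(String value, LocalDate fallback) {
        return value == null ? fallback : LocalDate.parse(value, DateTimeFormatter.BASIC_ISO_DATE);
    }

    /** -s: default is 19700101. */
    static LocalDate startDate(String sdate) {
        return parseDateOrDefault(sdate, LocalDate.of(1970, 1, 1));
    }

    /** -e: default is today. */
    static LocalDate endDate(String edate) {
        return parseDateOrDefault(edate, LocalDate.now());
    }

    /** -v: default is Integer.MAX_VALUE. */
    static int versions(String versions) {
        return versions == null ? Integer.MAX_VALUE : Integer.parseInt(versions);
    }

    /** -si: default is true (single index). */
    static boolean singleIndex(String sindex) {
        return sindex == null || Boolean.parseBoolean(sindex);
    }

    /** Rejects option combinations that exceed the documented limits. */
    static void checkLimits(String columns, String jsonFields, boolean singleIndex) {
        if (jsonFields != null && jsonFields.split(",").length > MAX_JSON_FIELDS) {
            throw new IllegalArgumentException("at most " + MAX_JSON_FIELDS + " json fields");
        }
        if (!singleIndex && columns.split(",").length > MAX_COMBINED_COLUMNS) {
            throw new IllegalArgumentException("at most " + MAX_COMBINED_COLUMNS + " combined columns");
        }
    }
}
```

Checking the limits before submitting the job would surface a bad -j or -c value immediately instead of after the MapReduce job has started.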

##License

Released under the GPLv3 license. For full details, please see the LICENSE file included in this distribution.

