Fangshi

Generates the date strings needed to import data incrementally when running Oozie workflows.

Example

    <action name="get-date">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.queue.name</name>
                    <value>default</value>
                </property>
            </configuration>
            <main-class>com.scmspain.bigdata.hadoop.Fangshi</main-class>
            <arg>--import-seconds</arg>
            <arg>${secondsToImport}</arg>
            <arg>--date-format</arg>
            <arg>${dateFormat}</arg>
            <capture-output/>
        </java>
        <ok to="end"/>
        <error to="error"/>
    </action>
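
Because the action declares `<capture-output/>`, Oozie expects the Java main class to write its results as a Java properties file at the path given by the `oozie.action.output.properties` system property. The following is a minimal sketch of that mechanism, not the actual Fangshi source: the class name `CaptureOutputSketch` and the helper `writeCapturedOutput` are hypothetical, and the value written is a placeholder.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch; not the actual Fangshi source.
final class CaptureOutputSketch {

    // System property that Oozie sets for actions declaring <capture-output/>.
    static final String OOZIE_OUTPUT_PROPERTY = "oozie.action.output.properties";

    // Writes the values in Java properties format to the given file.
    static void writeCapturedOutput(Properties values, File target) {
        try (OutputStream out = new FileOutputStream(target)) {
            values.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties values = new Properties();
        // Key name taken from the static partitioning section; the value is a placeholder.
        values.setProperty("date_hour_current", "12");

        String path = System.getProperty(OOZIE_OUTPUT_PROPERTY);
        if (path == null) {
            // Not running under Oozie: print the values instead of writing a file.
            values.list(System.out);
            return;
        }
        writeCapturedOutput(values, new File(path));
    }
}
```

A later action in the same workflow could then read a captured value with an EL expression such as `${wf:actionData('get-date')['date_hour_current']}`.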

Arguments

  • s (import-seconds): offset in seconds, relative to the current time, from which data is imported (for example, "-3600" imports the previous hour). Default is "-86400" (the previous day).
  • d (date-format): format in which date strings are generated. Default is "yyyy-MM-dd HH:mm:ss".
  • f (partition-format): format in which partition strings are generated. Default is "yyyyMMdd".
  • t (timezone): timezone used when dates are calculated. Default is "Europe/Madrid".
  • h (static-hours): number of hours for which static partitioning string values are generated. Default is "0" (no static partitioning).
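
To make the interplay of these arguments concrete, here is a minimal sketch of how an offset, a format and a timezone could combine into a date string. It is an illustration under assumptions, not the Fangshi implementation: the class `DateStringSketch` and its `format` method are hypothetical, it uses `java.time` (the real class may use a different date library), and it assumes the offset is applied to the instant before converting to the timezone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch; not the actual Fangshi source.
final class DateStringSketch {

    // Shifts "now" by importSeconds, converts to the timezone, and formats it.
    static String format(Instant now, long importSeconds, String pattern, String timezone) {
        ZonedDateTime shifted = now.plusSeconds(importSeconds).atZone(ZoneId.of(timezone));
        return DateTimeFormatter.ofPattern(pattern).format(shifted);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Documented defaults: -86400 seconds, Europe/Madrid.
        System.out.println(format(now, -86400, "yyyy-MM-dd HH:mm:ss", "Europe/Madrid")); // date-format
        System.out.println(format(now, -86400, "yyyyMMdd", "Europe/Madrid"));            // partition-format
    }
}
```

Passing the current instant as a parameter keeps the calculation deterministic, which makes the offset and timezone handling easy to check in isolation.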

Example for static partitioning

Specifying the h argument (static-hours) generates static partitioning string values under the following names:

  • date_hour_X: contains the ${partitionHour} value for hour X
  • date_range_start_X: contains the partition range start string for hour X
  • date_range_end_X: contains the partition range end string for hour X
  • date_hour_current/date_range_start_current/date_range_end_current: contain the same kind of values as their _X counterparts, but for the current hour.

For example, calling the Java class at 12:23 PM on 2016-02-18 with h = 2 (import the last two hours) generates the following values:

  • 0
    • date_hour_0 -> 10
    • date_range_start_0 -> 2016-02-18 10:00:00
    • date_range_end_0 -> 2016-02-18 10:59:59
  • 1
    • date_hour_1 -> 11
    • date_range_start_1 -> 2016-02-18 11:00:00
    • date_range_end_1 -> 2016-02-18 11:59:59
  • current (h + 1)
    • date_hour_current -> 12
    • date_range_start_current -> 2016-02-18 12:00:00
    • date_range_end_current -> 2016-02-18 12:59:59
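
The values listed above can be reproduced with a short sketch. It is written under assumptions and is not the Fangshi source: the class `StaticHoursSketch` and the method `staticPartitions` are hypothetical, and the two-digit "HH" format for the date_hour values is a guess based on the example.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the actual Fangshi source.
final class StaticHoursSketch {

    // Builds the date_hour_*, date_range_start_* and date_range_end_* values
    // for the previous staticHours hours plus the current hour.
    static Map<String, String> staticPartitions(ZonedDateTime now, int staticHours, String dateFormat) {
        DateTimeFormatter rangeFormat = DateTimeFormatter.ofPattern(dateFormat);
        DateTimeFormatter hourFormat = DateTimeFormatter.ofPattern("HH"); // assumed hour format
        ZonedDateTime currentHour = now.truncatedTo(ChronoUnit.HOURS);

        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i <= staticHours; i++) {
            // Indexes 0..staticHours-1 are past hours; the last entry is the current hour.
            String suffix = (i == staticHours) ? "current" : String.valueOf(i);
            ZonedDateTime start = currentHour.minusHours(staticHours - i);
            ZonedDateTime end = start.plusHours(1).minusSeconds(1);
            values.put("date_hour_" + suffix, hourFormat.format(start));
            values.put("date_range_start_" + suffix, rangeFormat.format(start));
            values.put("date_range_end_" + suffix, rangeFormat.format(end));
        }
        return values;
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.of(2016, 2, 18, 12, 23, 0, 0, ZoneId.of("Europe/Madrid"));
        staticPartitions(now, 2, "yyyy-MM-dd HH:mm:ss")
                .forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```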

How to call the Java class

Using short arguments

    java -jar dates.jar -s -3600 -d "yyyy-MM-dd" -f "MMddyyyy" -t "UTC" -h 2

Using long arguments

    java -jar dates.jar --import-seconds -3600 --date-format "yyyy-MM-dd" --partition-format "MMddyyyy" --timezone "UTC" --static-hours 2

Contributors

vcapelca, javiyt

Issues

Return partition day format for every hour range

The partition day string (e.g. "20160524" for the timestamp 2016-05-24 03:00:00) should be returned for every hour range generated, instead of only once. This avoids inconsistencies when Fangshi generates hour ranges that span both today and yesterday (when running during the midnight hour).
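
The midnight case described in this issue can be illustrated with a small sketch of the requested behaviour. It is hypothetical and does not reflect what Fangshi currently returns: the class `PartitionDaySketch` and the method `partitionDays` are illustrative names, and the sketch assumes one partition day string per generated hour range, ordered from the oldest hour to the current one.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the behaviour requested in the issue.
final class PartitionDaySketch {

    // Returns one partition day string per hour range: the previous
    // staticHours hours first, then the current hour.
    static List<String> partitionDays(ZonedDateTime now, int staticHours, String partitionFormat) {
        DateTimeFormatter format = DateTimeFormatter.ofPattern(partitionFormat);
        ZonedDateTime currentHour = now.truncatedTo(ChronoUnit.HOURS);
        List<String> days = new ArrayList<>();
        for (int i = 0; i <= staticHours; i++) {
            // Each range is formatted from its own start, so ranges on
            // different calendar days get different partition strings.
            days.add(format.format(currentHour.minusHours(staticHours - i)));
        }
        return days;
    }

    public static void main(String[] args) {
        // Running during the midnight hour: two ranges fall on the previous day.
        ZonedDateTime now = ZonedDateTime.of(2016, 5, 24, 0, 23, 0, 0, ZoneId.of("Europe/Madrid"));
        System.out.println(partitionDays(now, 2, "yyyyMMdd"));
    }
}
```

With a single partition day string for the whole run, the 22:00 and 23:00 ranges in this scenario would be labelled with the wrong day; computing the string per range avoids that.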
