
azkaban-plugins's Introduction

Azkaban Plugins

Build Status

Because this plugin repo is difficult to maintain, the Azkaban team is actively moving plugin code into the main azkaban repo. If you cannot find some code here, check out Azkaban Github.

For all Azkaban Plugins documentation, please go to the Azkaban Project Site.


azkaban-plugins's Issues

Job Summary tweaks

  • Add 20px margin to bottommost box
  • 25% width for Job Type key cell
  • Display placeholder when no stats are available

Cannot clone in Windows env

When I clone the repo, it always throws:

 fatal: cannot create directory at 'plugins/jobtype/jobtypes/hive-0.8.1/hive-0.8.1/aux': Invalid argument

It seems aux is not a valid directory name in a Windows environment?

Thanks

Reportal should let user kill his or her running report

Sometimes after firing off a run, I realize there is a bug in my code. I would like to kill the running report to avoid wasting resources. Currently, there is no way for me to kill the report from the Reportal UI. We should add this functionality.

This is especially critical now that Reportal only allows you to have one RUNNING execution of each report at a time. Sometimes jobs have OOM errors and hang, causing the flow to remain in the RUNNING state forever. In such a scenario, I will never be able to run my report again, since I have no way of killing the currently RUNNING execution.

Alternatively, we should roll back this commit, or make a new commit so that concurrent execution can be enabled/disabled by a property in the Azkaban conf file.

Tracked by internal JIRA HADOOP-6976.

ParquetFileViewer only works on world-readable files

As mentioned in #114, the ParquetFileViewer tries to view every file as azkaban, meaning it can only view files owned by azkaban, with the group set to azkaban, or world-readable. This means users will not be able to view their Parquet files through the HDFSViewer unless their Parquet files are world-readable.

Visualizer tweaks

  • Display placeholder if Pig Visualizer has nothing to visualize.
  • Fix Auto Pan Zoom and Reset Pan Zoom buttons

Job Summary plugin job_id parsing does not work for Hadoop 2 job ids and URLs

In log-data.js, the job_id regexes expect job_<12 digits>_<4+ digits>. However, on Hadoop 2, instead of <12 digits> of the form YYYYMMDDHHMM, the job_id appears to contain the milliseconds since epoch, which is variable in length and is currently 13 digits.

Also, in Hadoop 2, the job URL printed out in the logs does not contain job_ (which the current url regex is looking for) but instead looks something like http://<host>:<port>/proxy/application_<milliseconds_since_epoch>_<counter>.

We should fix the regexes so that the job summary plugin will find the job ids and URLs.

Tracked by internal JIRA HADOOP-6977.
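A sketch of patterns loose enough for both formats, written in Java for illustration (the class and pattern names here are mine; the actual regexes live in log-data.js):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative regexes that accept both Hadoop 1 and Hadoop 2 job id forms.
public class JobIdRegexSketch {
    // Hadoop 1 ids embed a 12-digit YYYYMMDDHHMM timestamp; Hadoop 2 uses
    // epoch milliseconds (currently 13 digits), so match any run of digits.
    static final Pattern JOB_ID = Pattern.compile("job_\\d+_\\d+");
    // Hadoop 2 logs print a proxy URL with an application_ prefix instead
    // of job_, so the URL needs its own pattern.
    static final Pattern APP_URL =
        Pattern.compile("https?://\\S+/proxy/application_\\d+_\\d+");

    public static void main(String[] args) {
        Matcher h1 = JOB_ID.matcher("Submitted job: job_201408141030_0042");
        Matcher h2 = APP_URL.matcher(
            "tracking URL: http://rm-host:8088/proxy/application_1408049623456_0007");
        System.out.println(h1.find() ? h1.group() : "no match");
        System.out.println(h2.find() ? h2.group() : "no match");
    }
}
```

The same `\d+` loosening applies directly to the JavaScript regexes, since both engines share this syntax.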

Pig 0.12 job type

It may be useful to create a Pig 0.12.0 job type now that Pig 0.12 is out.

Tracked by internal JIRA ticket: HADOOP-4414

Hive plugin missing Antlr jar

hive-0.8.1/aux/lib should include the ANTLR runtime jar (e.g. antlr-runtime-3.0.1.jar). I used version 3.0.1 and that worked; I don't know what the latest version is.

Hive seems to ignore user property

The Hive plugin works with no user.to.proxy or proxy.user set anywhere. It probably defaults to the user that runs the azkaban executor daemon.

If Pig/hadoopJava did this, Azkaban security would be very easy for small shops (like us) that do not use Hadoop security features.

BinaryJSON HDFS file viewer issues

The BinaryJSON file viewer seems to be unstable. At times, it fails to display a file at all and at other times, it dumps binary junk.

Tracked by internal JIRA: HADOOP-4478

Permission denied when viewing files in user directory

When running Azkaban on a grid with Hadoop security enabled, viewing a file in one's user directory results in a permission denied error.

I have root-caused this to the fact that the Parquet file viewer does not use the FileSystem object passed in from the HdfsBrowserServlet, which is properly set up to doAs the currently logged-in user rather than the azkaban user. As a result, the Parquet file viewer ends up trying to view the file as azkaban, throwing the AccessControlException. Currently, AvroParquetReader does not have an API that lets one pass in a FileSystem object. The fix for now is to remove the catch AccessControlException block from the Parquet file viewer.

Tracked by internal ticket: HADOOP-5350

Cannot find right place for user.to.proxy (also, confusion with proxy.user)

The HadoopSecurityManager_H_1_0 class expects to find a property 'user.to.proxy'. I have placed this in every configuration file and .job file, and nothing has worked.

Which file should this property be in?

Here is the full log section for this attempt at running the java-wc job. HadoopSecurityManager_H_1_0 is clearly trying to pass in a 'user.to.proxy' property which is not there.

2013/08/02 03:08:05.255 +0000 INFO [pig-upload] [Azkaban] Need to proxy. Getting tokens.
2013/08/02 03:08:05.255 +0000 INFO [pig-upload] [Azkaban] Getting hadoop tokens for apxqueue
2013/08/02 03:08:05.255 +0000 INFO [HadoopSecurityManager] [Azkaban] proxy user apxqueue not exist. Creating new proxy user
2013/08/02 03:08:05.258 +0000 INFO [pig-upload] [Azkaban] Getting DFS token from 10.176.235.204:8020hdfs://ip-10-176-235-204.us-west-1.compute.internal:8020
java.lang.NullPointerException
at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:246)
at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:408)
at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:571)
at azkaban.security.HadoopSecurityManager_H_1_0$2.getToken(HadoopSecurityManager_H_1_0.java:268)
at azkaban.security.HadoopSecurityManager_H_1_0$2.run(HadoopSecurityManager_H_1_0.java:258)
at azkaban.security.HadoopSecurityManager_H_1_0$2.run(HadoopSecurityManager_H_1_0.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
at azkaban.security.HadoopSecurityManager_H_1_0.prefetchToken(HadoopSecurityManager_H_1_0.java:253)
at azkaban.jobtype.HadoopPigJob.getHadoopTokens(HadoopPigJob.java:166)
at azkaban.jobtype.HadoopPigJob.run(HadoopPigJob.java:102)
at azkaban.execapp.JobRunner.runJob(JobRunner.java:379)
at azkaban.execapp.JobRunner.run(JobRunner.java:280)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
azkaban.security.commons.HadoopSecurityManagerException: Failed to get hadoop tokens! nullnull
at azkaban.security.HadoopSecurityManager_H_1_0.prefetchToken(HadoopSecurityManager_H_1_0.java:318)
at azkaban.jobtype.HadoopPigJob.getHadoopTokens(HadoopPigJob.java:166)
at azkaban.jobtype.HadoopPigJob.run(HadoopPigJob.java:102)
at azkaban.execapp.JobRunner.runJob(JobRunner.java:379)
at azkaban.execapp.JobRunner.run(JobRunner.java:280)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2013/08/02 03:08:05.261 +0000 ERROR [pig-upload] [Azkaban] Job run failed!
2013/08/02 03:08:05.261 +0000 ERROR [pig-upload] [Azkaban] Failed to get hadoop tokens! nullnullnull
2013/08/02 03:08:05.261 +0000 INFO [pig-upload] [Azkaban] Finishing job pig-upload at 1375412885261
2013/08/02 03:08:05.267 +0000 INFO [wordcount-java] [Azkaban] Job Finished pig-upload with status FAILED
2013/08/02 03:08:05.278 +0000 INFO [wordcount-java] [Azkaban] Killing wordcount-java due to prior errors.
2013/08/02 03:08:05.289 +0000 INFO [wordcount-java] [Azkaban] Finishing up flow. Awaiting Termination
2013/08/02 03:08:05.289 +0000 INFO [wordcount-java] [Azkaban] Setting flow status to Failed.
2013/08/02 03:08:05.289 +0000 INFO [wordcount-java] [Azkaban] Flow is set to FAILED
2013/08/02 03:08:05.289 +0000 INFO [wordcount-java] [Azkaban] Setting end time for flow 8 to 1375412885289
2013/08/02 03:08:05.305 +0000 INFO [FlowRunnerManager] [Azkaban] Flow 8 is finished. Adding it to recently finished flows list.
2013/08/02 03:10:01.532 +0000 INFO [FlowRunnerManager] [Azkaban] Cleaning recently finished
2013/08/02 03:10:01.532 +0000 INFO [FlowRunnerManager] [Azkaban] Cleaning execution 8 from recently finished flows list.
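For reference, user.to.proxy is normally set as a job-level property, i.e. in the .job file itself; a hedged sketch (the job class name and value are placeholders):

```properties
# java-wc.job (sketch; class name and proxy user are placeholders)
type=hadoopJava
job.class=com.example.WordCount
user.to.proxy=apxqueue
```

Note that the log above shows proxying for apxqueue did start ("Getting hadoop tokens for apxqueue"), so the property appears to have been picked up; the NullPointerException occurs later, inside SecurityUtil.setTokenService while fetching the DFS token, and may be a separate configuration issue.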

CamusJob cannot be launched using hadoopJavaJob and Azkaban2

I am filing this issue because I encountered the same issue described at https://groups.google.com/d/msg/azkaban-dev/S9G9Lqmfm1Q/7pV0P7Re820J but there does not seem to be a bug report for it yet.

The problem is that the CamusJob run(String[] args) method signature is not supported by the hadoopJavaJob plugin. This results in the following error:

14-08-2014 14:32:44 PDT consume_kafka ERROR - Caused by: java.lang.IllegalArgumentException: Can not create a Path from a null string
14-08-2014 14:32:44 PDT consume_kafka ERROR -   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:87)
14-08-2014 14:32:44 PDT consume_kafka ERROR -   at org.apache.hadoop.fs.Path.<init>(Path.java:99)
14-08-2014 14:32:44 PDT consume_kafka ERROR -   at com.linkedin.camus.etl.kafka.mapred.EtlMultiOutputFormat.getDestinationPath(EtlMultiOutputFormat.java:113)
14-08-2014 14:32:44 PDT consume_kafka ERROR -   at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:181)

run() is invoked by default, but it never receives the main.args needed to load the Camus-specific properties, so key settings are missing when it runs. The proper behaviour seems to be to call Camus's run(String[] args) method so that Camus can initialize properly.

I am happy to give this a shot if I can get some pointers on how / what to adjust in the hadoopJavaJob plugin.
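The delegation described above can be sketched as follows. This is a self-contained stand-in, not Camus code: the class name and method bodies are illustrative, and the no-arg run() simply forwards the configured args to the run(String[]) overload.

```java
// Hypothetical sketch of the fix: the plugin invokes a no-arg run(),
// which delegates to the run(String[]) signature Camus implements.
public class CamusJobSketch {
    private final String[] mainArgs;

    public CamusJobSketch(String[] mainArgs) {
        this.mainArgs = mainArgs;
    }

    // The entry point hadoopJavaJob invokes today: no arguments.
    public int run() {
        // Forward the configured main.args so the properties path is
        // no longer null when the job initializes.
        return run(mainArgs);
    }

    // Simplified stand-in for the signature Camus actually implements.
    public int run(String[] args) {
        if (args == null || args.length == 0 || args[0] == null) {
            // Mirrors the observed failure mode: a null path argument.
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(new CamusJobSketch(new String[] {"camus.properties"}).run());
    }
}
```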

Ensure Reportal reports can't run concurrently

This is to prevent scheduled repeating reports from hogging the Azkaban server. In one case we saw, there were many running instances of the same report because it took a while to complete and was configured to repeat every minute.
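A minimal sketch of such a guard (class and method names are hypothetical; the real check would live in Reportal's scheduling path rather than a standalone class):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Per-report "single running execution" guard: a report id may only be
// registered once until its current execution finishes.
public class ConcurrencyGuard {
    private final Set<String> running = ConcurrentHashMap.newKeySet();

    // Returns true if the report may start; false if an execution is live.
    public boolean tryStart(String reportId) {
        return running.add(reportId);
    }

    // Must be called when the execution ends, so the report can run again.
    public void finish(String reportId) {
        running.remove(reportId);
    }

    public static void main(String[] args) {
        ConcurrencyGuard guard = new ConcurrencyGuard();
        System.out.println(guard.tryStart("daily-report")); // first start allowed
        System.out.println(guard.tryStart("daily-report")); // second rejected
        guard.finish("daily-report");
        System.out.println(guard.tryStart("daily-report")); // allowed again
    }
}
```

Note that a guard like this interacts with the kill issue above: if finish() is never reached (e.g. a hung RUNNING execution), the report stays blocked forever.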

Import Azkaban 2.6.1 JARs

Due to azkaban/azkaban#255, the package names of some classes have been changed. For example, azkaban.webapp.AzkabanServer is now azkaban.server.WebServer.

The new Azkaban 2.6.1 JARs need to be imported so that the plugins can compile against the JARs with the new packages to be compatible with the new Azkaban core.
