
dagobah's People

Contributors

brainwane, dwoos, michaelmartinez, movermeyer, studer, surbas, thieman


dagobah's Issues

Schedule Times on Jobs Page

Next run times on the jobs page (/jobs) are sometimes not displaying correctly. The screenshot in the readme shows this problem. For context, that shot was taken immediately after adding a bunch of jobs and then using the back button to get back to the jobs page; maybe that will help reproduce.

Allow email without authentication

Hello,

I have an email server that doesn't need authentication. I noticed that if user is None in the YAML config, emailing is disabled.

To allow unauthenticated email, I propose adding a config option specifying whether authentication is required. If it is True and user is None, then issue a warning in the log and disable emailing, as is done now.

I am happy to do this myself and submit a pull request if you agree with the approach.
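
A minimal sketch of the proposed check; the auth_required key name is an assumption:

import logging

def email_enabled(email_conf):
    # proposed rule: only require a user when authentication is required
    auth_required = email_conf.get('auth_required', True)
    if auth_required and email_conf.get('user') is None:
        logging.warning('auth_required is True but user is None; '
                        'emailing of reports will be disabled')
        return False
    return True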

Job Categorization

Related to #62. The ability to categorize jobs could make things a lot more manageable when you have a bunch of jobs in the system. Additionally, categories should serve as the top level of configuration, rather than having global defaults in the config.

Job-level configuration would override category-level configuration if set.

New jobs should be placed in a default category that is configurable just like user-created categories are.

Anything we can move out of the config file and into backend-managed state (and the UI) is a win for usability, too.
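
A minimal sketch of the proposed lookup order, assuming hypothetical per-job, per-category, and global config dicts:

def resolve_setting(key, job_conf, category_conf, global_conf):
    # job-level wins, then category-level, then global defaults
    for conf in (job_conf, category_conf, global_conf):
        if key in conf:
            return conf[key]
    return None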

Add automated task retrying

Some tasks have a tendency to break (I am thinking specifically of data processes reliant on shaky third-party APIs), but it may be difficult to properly account for all of the potential errors ahead of time and make sure your task can survive them. In this situation, it can be helpful for Dagobah to automatically retry your task up to a certain number of times before giving up and declaring the task a failure.

I think we need to do three things:

  1. Add a user-configurable maximum number of times to retry a Task, set at the Task level (see the sketch after this list)
  2. Add a property on the Task object that tracks how many times the Task was retried during its last run. This will end up in the run log.
  3. Patch the existing email templates to show task retries.
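
A rough sketch of items 1 and 2, using a hypothetical synchronous Task interface (the real Task internals may differ):

def run_with_retries(task):
    # task.max_retries is the user-configurable cap from item 1;
    # task.last_run_retries is the counter from item 2 and ends up in the run log
    for attempt in range(task.max_retries + 1):
        task.last_run_retries = attempt
        try:
            task.run()  # hypothetical synchronous run
            return True
        except Exception:
            continue
    return False  # give up and declare the task a failure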

Job Notes

This is another enhancement added by @utkarsh2012's fork, which I happen to find useful. It adds a notes field for each job. This helps with job maintenance: when working with a team, for example, it lets you provide Job-level documentation on what the job does and any nuances in its execution, so that others can maintain and troubleshoot it.

If this sounds like a worthwhile feature, I'll make a branch and submit a pull request for the feature, whereupon we can iterate on it. Right now it looks like this:
[screenshot of the notes field, 2014-05-12]

How to use alembic in dagobah?

I have changed the Dagobah model and added remote_host.
Then I do this:

cd dagobah/backend
(sched)zengr@V2 (master *) ~/dev/python/dagobah/dagobah/backend: ls
__init__.py       alembic.ini       base.pyc          migrations        mongo.pyc         sqlite.pyc        sqlite_models.pyc
__init__.pyc      base.py           dagobah.db        mongo.py          sqlite.py         sqlite_models.py
(sched)zengr@V2 (master *) ~/dev/python/dagobah/dagobah/backend: alembic revision -m "add remote host"
  Path doesn't exist: 'dagobah/backend/migrations'.  Please use the 'init' command
  to create a new scripts folder.

The migrations dir exists, but I still get the error. Am I missing anything?

I am using alembic for the first time and I am referring to this: https://alembic.readthedocs.org/en/latest/tutorial.html

stdout Leakage when calling echo_dagobah_conf

The entry_point echo_dagobah_conf loads the entire package just to print a conf. If you don't have a backend set up, you get import errors. If you do, you see this at the top...

Logging output to d:\dev\dagobah\dagobah\daemon\dagobah.log
INFO  [alembic.migration] Context impl SQLiteImpl.
INFO  [alembic.migration] Will assume non-transactional DDL.
Email.auth_required is True but Email.user is None. Emailing of reports will be disabled.

I think the best way to handle this is to move return_standard_config and print_standard_config into their own module, which would have no dependence on dagobah.
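
A sketch of what that module could look like (function names from above; the config filename and location are assumptions):

import os

def return_standard_config():
    # read the packaged default config directly, with no dagobah imports
    # and therefore no backend or logging side effects
    conf_path = os.path.join(os.path.dirname(__file__), 'dagobahd.yml')
    with open(conf_path) as f:
        return f.read()

def print_standard_config():
    print(return_standard_config())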

Generate secret key on first run

The default config file currently has a scary WARNING flag letting the user know to create their own secret key. This will probably get overlooked in a bunch of cases, leading to uncustomized Dagobah installations all using the same key.

I think a better solution would be to generate a new secret key the first time Dagobah is run, replace the secret key value in the config file, then put an indicator somewhere (possibly in the config file itself) so we can track whether any given run is the first run.
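
A sketch of the idea; the key name, the sentinel value, and the first-run indicator are all assumptions:

import binascii
import os

def ensure_secret_key(config):
    # replace the placeholder key on first run and leave a marker behind
    if config.get('secret_key') in (None, 'change_me'):
        config['secret_key'] = binascii.hexlify(os.urandom(32)).decode('ascii')
        config['first_run_done'] = True
    return config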

Handle optional installs better

Right now, we're directing people to run stuff like pip install sqlalchemy without specifying versions or anything. We should control this installation flow within Dagobah, allowing the user to simply say that they want a certain backend and letting Dagobah handle the rest of the work.
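
One way to handle this within packaging itself is setuptools extras, so the user just names the backend at install time (e.g. pip install dagobah[sqlite]); the dependency lists here are assumptions:

# in setup.py
from setuptools import setup, find_packages

setup(
    name='dagobah',
    packages=find_packages(),
    extras_require={
        'sqlite': ['sqlalchemy', 'alembic'],
        'mongo': ['pymongo'],
    },
)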

Better handling of schema change

If the model is changed, these functions need modification:

In core.py:

  1. def from_backend(self, dagobah_id) (otherwise it will overwrite the data)
  2. Task._serialize

Try to find a better way of managing this.

Also, after a schema change, I am currently deleting the existing dagobah.db and recreating it. Is there a better way of doing this? Schema migrations?

List libxml 2.9+ dependency

Hey, I tried to build dagobah today on CentOS 6.2 and ran into a gigantic pile of compilation errors. Through some strategic googling, I realized that dagobah depends on lxml, which depends on more recent versions of libxml2 and libxslt than my package manager had available (I had to install the dev versions). More info here: http://stackoverflow.com/questions/15759150/src-lxml-etree-defs-h931-fatal-error-libxml-xmlversion-h-no-such-file-or-di

Thought it might be worth noting, since not everyone's distribution will have the proper libraries by default, and the associated error messages are not immediately obvious. My suggestion is to mention it in your README near the pip installation instructions, but feel free to close the issue if you feel it is not in your wheelhouse (I think it is technically an issue with pip/lxml not checking for the proper version of libxml2).

Run Jobs inside Jobs

It would be cool to be able to run a Job inside of another Job (basically like it's a Task). For those of you not familiar with the insane jargon, I'm talking about a DAG that could look like, for a simple example:

Job 1: Task 1 -> Job 2 -> Task 2

Job 2: Task 3 -> Task 4

Job 1 would complete Task 1, start Job 2, monitor its completion status, and if it finishes successfully then proceed on to Task 2.

I think the first step is to make Jobs either scheduled or able to be called from inside other Jobs. There will be a lot of edge cases to argue about along the way.
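
A rough sketch of the Task-like wrapper, with hypothetical Job states and methods:

import time

class JobTask(object):
    """Runs another Job as if it were a Task, blocking until it finishes."""

    def __init__(self, job, poll_seconds=5):
        self.job = job
        self.poll_seconds = poll_seconds

    def run(self):
        self.job.start()
        while self.job.state in ('queued', 'running'):  # hypothetical states
            time.sleep(self.poll_seconds)
        if self.job.state != 'succeeded':
            raise RuntimeError('inner job failed: %s' % self.job.name)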

Add Authentication

This has the makings of a damn fine web app; authentication of some kind is mandatory if you want it to survive for more than one minute "out there"... winter is always coming.

Behavior on Daemon Exit

On daemon restart, known jobs are constructed from their representations in the backend. The backend currently does not store the state of running jobs, which could lead to unexpected behavior if the daemon exits while a job is running. Find a better way.

Add ability to manually bypass failed tasks

Use case: I just had a task fail, so I went into IPython and screwed around in the interpreter until I fixed the bug (which occurred halfway through a processing task) and let the task complete in my interpreter. Now I want Dagobah to continue the rest of the failed job without having to re-run the entire task that I just completed manually.

Email error when log is too big to send.

I log a lot in my processes and got the following in stdout on completion of a job.

Traceback (most recent call last):
  File "c:\dagobah\dagobah\core\components.py", line 31, in emit
    method.__call__(*args, **kwargs)
  File "c:\dagobah\dagobah\daemon\daemon.py", line 151, in job_complete_email
    email_handler.send_job_completed(kwargs['event_params'])
  File "c:\dagobah\dagobah\email\basic.py", line 24, in send_job_completed
    self._construct_and_send('Job Completed: %s' % data.get('name', None))
  File "c:\dagobah\dagobah\email\common.py", line 39, in _construct_and_send
    self._send_message()
  File "c:\dagobah\dagobah\email\common.py", line 72, in _send_message
    self.message.as_string())
  File "C:\Python27\Lib\smtplib.py", line 739, in sendmail
    raise SMTPDataError(code, resp)
SMTPDataError: (552, 'message line is too long')

So should we truncate logs if they are too big? The limit is actually controlled by the SMTP server, so how big were my logs, and what counts as too big?

Should we offer an option to not send the log with the email? (I personally don't need to see my logs in email.)
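
For what it's worth, a 552 'message line is too long' usually means a single line exceeded the server's limit (998 characters per RFC 5321), not that the message as a whole was too big, so wrapping long lines may be the more targeted fix. A sketch:

def wrap_long_lines(log_text, limit=998):
    # split any line longer than the SMTP line limit into chunks
    wrapped = []
    for line in log_text.splitlines():
        while len(line) > limit:
            wrapped.append(line[:limit])
            line = line[limit:]
        wrapped.append(line)
    return '\n'.join(wrapped)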

Add LICENSE

BSD? Something unrestrictive. The goal would be for anyone to be able to do whatever they want with the code, without any restrictions whatsoever. WTFPL might be appealing as well.

Add Multi-User Authentication

Spun off from #5; see the original discussion there.

This issue concerns adding a multi-user auth model to Dagobah. This would add various user-level permissions to the core Dagobah classes, such as user-level ownership and sharing of a specific Job. The exact changes we want to make as a result of this change are open to discussion.

This will almost certainly be built off of the work done to close single-user auth, #11.

alembic.ini missing

See Issue #42 that was closed prematurely:

Seems that alembic.ini is missing from the release on PyPI, resulting in

ConfigParser.NoSectionError: No section: 'alembic'

when invoking dagobahd after running pip install.
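
The usual fix is to declare the file as package data so it ships in the release; a sketch, assuming alembic.ini lives in dagobah/backend:

# in setup.py
from setuptools import setup, find_packages

setup(
    name='dagobah',
    packages=find_packages(),
    package_data={'dagobah.backend': ['alembic.ini',
                                      'migrations/*',
                                      'migrations/versions/*']},
)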

Disallow saving bad DAGs

From a discussion on #60. Currently, it's possible to save a Job when its DAG has cycles. Let's disallow this and update the UI to not be as annoying about it when a cycle is created.
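
A minimal cycle check via Kahn's algorithm; the adjacency format (task name mapped to its downstream task names) is an assumption about how the DAG is stored:

from collections import deque

def has_cycle(graph):
    # count indegrees, then repeatedly peel off zero-indegree nodes;
    # anything left over must sit on a cycle
    indegree = {node: 0 for node in graph}
    for downstream in graph.values():
        for node in downstream:
            indegree[node] = indegree.get(node, 0) + 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    seen = 0
    while queue:
        node = queue.popleft()
        seen += 1
        for child in graph.get(node, ()):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return seen < len(indegree)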

Add Single-User Authentication

Spun off from #5; see the original discussion there.

This issue concerns adding a single-user auth model to protect the Dagobah web client from unauthorized access. This would take the form of denying access to all web client functionality before the user has logged in. Since there would be only one user, this would not add any sort of true user-level permissions on any aspect of the service.

Handle Changes to Config File on Update

0.1.2 is going to have some new config file entries to deal with the single-user auth. There needs to be a way to gracefully inform users of these changes and help them roll them into their existing config files. Either that, or work with defaults even if the located config file doesn't have the necessary keys.
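
A sketch of the fallback approach: merge the user's config over built-in defaults so missing keys stop being fatal (purely illustrative):

def apply_defaults(user_conf, defaults):
    # user values win; nested sections are merged recursively
    merged = dict(defaults)
    for key, value in user_conf.items():
        if isinstance(value, dict) and isinstance(defaults.get(key), dict):
            merged[key] = apply_defaults(value, defaults[key])
        else:
            merged[key] = value
    return merged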

handler_name needs to be restricted.

Currently, if handler_name is anything but None, email, or text, then get_email_handler implicitly returns None, and there is no warning to the user that emailing will not work.
Should we warn if this is the case and keep going, or should we raise an exception?
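
Either way, a sketch of validating the name up front instead of silently returning None (the valid set comes from the issue text):

import logging

VALID_HANDLERS = (None, 'email', 'text')

def check_handler_name(handler_name):
    # warn loudly (or swap this for a raise) rather than disabling email silently
    if handler_name not in VALID_HANDLERS:
        logging.warning('unknown email handler %r; expected one of %s; '
                        'emailing will be disabled', handler_name, VALID_HANDLERS)
        return None
    return handler_name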

Dagobah's features compared to chronos or azkaban?

dagobah is exactly what I was looking for; it's much simpler than chronos or azkaban, but also not all that similar to them.

So some questions/suggestions:

  1. Remote task execution? I believe this could be implemented easily via fabric. What do you think?
  2. Logging all task runs. Right now only the most recent task log can be seen. Adding logs for all task runs would be a useful feature (at least for our use case).
  3. A fairly easy change would be to give an option to disable auth if it's not needed.

I am playing around with Dagobah and trying to work on 1 and 2 now; not sure about the timeline though :)

Error in dagobah log if a deleted job is accessed

I get this error continuously in the log (due to constant AJAX calls), and an internal server error in the browser, if a deleted job is accessed.

ERROR [dagobah.daemon.daemon] Exception on /api/job [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python2.7/dist-packages/Flask_Login-0.2.6-py2.7.egg/flask_login.py", line 663, in decorated_view
    return func(*args, **kwargs)
  File "/home/stack/dagobah/dagobah/daemon/util.py", line 47, in wrapper
    result = fn(*args, **kwargs)
  File "/home/stack/dagobah/dagobah/daemon/api.py", line 32, in get_job
    return job._serialize()
AttributeError: 'NoneType' object has no attribute '_serialize'

Maybe an appropriate error should be returned for nonexistent jobs.
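
A minimal sketch of that fix in a Flask handler shaped like the one in the traceback (the job registry and lookup parameter are stand-ins):

from flask import Flask, abort, jsonify, request

app = Flask(__name__)
JOBS = {}  # stand-in for the daemon's job registry

@app.route('/api/job')
def get_job():
    job = JOBS.get(request.args.get('job_name'))
    if job is None:
        abort(404)  # clean 404 for deleted/unknown jobs instead of an AttributeError
    return jsonify(job._serialize())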

Export/Import jobs

This could be a useful feature for dagobah; it would help with version controlling complex chains of jobs.
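
A minimal export sketch; Job._serialize exists (see the traceback in the previous issue), though using it as the export format is an assumption:

import json

def export_job(job, path):
    # dump the serialized job to a stable, diff-friendly JSON file
    with open(path, 'w') as f:
        json.dump(job._serialize(), f, indent=2, sort_keys=True)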

Crontab format not correct

The 'Cron Schedule String' entry:
00 22 * * *

is evaluating 'Next Scheduled Run' as:
January 31 2014 2:00 PM

but I think this should be:
January 30 2014 11:00 PM
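
For reference, 00 22 * * * means 22:00 (10:00 PM) daily, so checking the expression in isolation can help separate a scheduler bug from a timezone or display issue. A quick check, assuming croniter is available:

from datetime import datetime
from croniter import croniter

base = datetime(2014, 1, 30, 13, 0)  # Jan 30 2014, 1:00 PM
itr = croniter('00 22 * * *', base)
print(itr.get_next(datetime))  # expected: 2014-01-30 22:00:00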

More per-job configuration

Some examples of things that would be useful, mostly around email so far but could extend to other things:

  1. Only email Steve for this job, nobody else cares.
  2. Only email anyone about this job if it fails.
  3. I need more detail on this job than others, use a different email template than the default.

alembic.ini missing from pypi release

Seems that alembic.ini is missing from the release on PyPI, resulting in

ConfigParser.NoSectionError: No section: 'alembic'

when invoking dagobahd after running pip install.

ValueError: could not infer dagobah ID, multiple available in backend

Hey Travis,

We moved from sqlite to mongo and are running dagobah on apache.
It worked fine for 2-3 days (and we had apache restarts in that time).

But I just restarted apache and now I am getting this error:

    ValueError: could not infer dagobah ID, multiple available in backend

I am not sure about the fix. I see about 50 entries in the dagobah collection; I think dagobah expects just one. When I hardcode it to pick just one, it picks a set of old jobs and doesn't pull the most recent jobs, which are stored in the dagobah_job collection.

Any suggestions? This has blocked some jobs for now.

Add Host table to manage remote hosts

I have added an "add remote host" form to the settings page for now.

I have 2 primary endpoints:

  1. add_host (will be called from settings page)
  2. add_host_to_task (will be called from job details)

Once the request comes into api.py's add_host, what function should be invoked to add the host to the dagobah_host table?

I am a little confused about how I should structure my code.
Should I create a new Host(object) class in core.py and write a new function commit_host in base.py (which would end up in sqlite.py and mongo.py)?

If I do all that, I will end up writing a parallel Job object. But then again, remote host info is completely isolated from job info, so does it make sense to keep these in two different classes?

DagobahHost model:

# imports and Base as in sqlite_models.py
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

class DagobahHost(Base):
    __tablename__ = 'dagobah_host'

    id = Column(Integer, primary_key=True)
    task_id = Column(Integer, ForeignKey('dagobah_task.id'), index=True)  # ties each host row to a task
    name = Column(String(1000), nullable=False)
    username = Column(String(1000), nullable=False)
    password = Column(String(1000))
    key = Column(String(1000))

    def __init__(self, name, username):
        self.name = name
        self.username = username

    def __repr__(self):
        return "<SQLite:DagobahHost (%d)>" % self.id

    @property
    def json(self):
        return {'task_id': self.task_id,
                'name': self.name,
                'username': self.username,
                'password': self.password,
                'key': self.key}

    def update_from_dict(self, data):
        for key in ['task_id', 'name', 'username', 'password', 'key']:
            if key in data:
                setattr(self, key, data[key])

And in DagobahTask(Base):

    hosts = relationship('DagobahHost', backref='task')

What do you suggest? How would you structure the code?

Config parsing of None

Make it so that when reading in the config YAML, "None" is converted to Python None rather than the string "None", and then clean up all the places in the code base that test for None.
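
Note that YAML's own null literal (null or ~) already parses to Python None; the problem is the literal string "None". A sketch of normalizing it after yaml.safe_load:

def normalize_none(value):
    # recursively convert the string 'None' into python None
    if isinstance(value, dict):
        return {k: normalize_none(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_none(v) for v in value]
    return None if value == 'None' else value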

Development Version

So I wanted to make some changes to the config and couldn't figure out what was going on for a while.

How do you run this in "development mode" locally?

I looked back through your commits and found some Flask-style app.run() stuff I am used to, but it's completely changed now. Looking for guidance on how to modify things for "production" or an easy_install script-based start vs. a command-line dev-mode start.
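
In case it helps, a guess at a local dev entry point; the app object's import path is an assumption and may have moved between versions:

# dev_run.py (hypothetical)
from dagobah.daemon.daemon import app  # assumption: where the Flask app lives

if __name__ == '__main__':
    app.run(debug=True, use_reloader=True)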

Display task run logs in Task Detail page

I am working on adding run log history for a task and wanted to get your thoughts on it. This is what I am planning to do:

  1. On the job/<job_id>/<task_name> page
  2. After the "Run Logs" section, add a table listing the last X task runs (where X could be, say, 10).
  3. Clicking any one of the timestamps opens a new modal (or a new tab) with the full log for that task run.

What do you think?
