dequindre's People

Contributors

vogt4nick

dequindre's Issues

Allow Dequindre to restart from point of failure

Per Reddit

If something goes wrong with a 5-script pipeline, say with the 3rd script, then after the problem is fixed, does the pipeline run from the very beginning, or can it start where the failure occurred?

I bet a .dequindre.json file would be enough to track which tasks have and have not run. Actually, it would only have to track which ones errored out. We can uniquely identify DAGs and Dequindres if we want. This may be a solution.
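A minimal sketch of that idea, assuming tasks are plain callables keyed by name and that the state file only records the names of tasks that raised. The function names and the JSON layout are hypothetical, not part of dequindre:

```python
import json
from pathlib import Path


def load_failed(state_file: Path) -> set:
    """Return the task names recorded as failed, or an empty set."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text())["failed"])


def save_failed(failed: set, state_file: Path) -> None:
    """Overwrite the state file with the current set of failed tasks."""
    state_file.write_text(json.dumps({"failed": sorted(failed)}))


def run_pipeline(tasks: dict, state_file: Path) -> list:
    """Run every task, or only the tasks that failed on the previous run.

    `tasks` maps a task name to a zero-argument callable (an assumption
    made for this sketch). Returns the names that were attempted.
    """
    previously_failed = load_failed(state_file)
    to_run = [
        name for name in tasks
        if not previously_failed or name in previously_failed
    ]
    failed = set()
    for name in to_run:
        try:
            tasks[name]()
        except Exception:
            failed.add(name)
    save_failed(failed, state_file)
    return to_run
```

This sketch reruns only the failed tasks. It does not decide whether tasks downstream of a failure should also rerun; that choice is left open here.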

Allow users to "add" DAGs together

Large workflows can usually be broken up into parts. It'd be nice to define these parts separately and then add the DAGs together by combining all Tasks and edges. The DAGs don't even have to be connected, since the DAG class allows for multiple graphs.

# define tasks
buy_groceries = DAG(tasks=tasks)
# define tasks
make_breakfast = DAG(tasks=tasks)

schedule = buy_groceries + make_breakfast
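
One way the addition could behave, sketched on a stand-in class rather than dequindre's own DAG. `ToyDAG`, its attributes, and the task names are all hypothetical; tasks are represented as strings to keep the example self-contained:

```python
class ToyDAG:
    """Stand-in for a DAG: a set of tasks plus a mapping of edges."""

    def __init__(self, tasks=None, edges=None):
        self.tasks = set(tasks or ())
        # edges maps an upstream task to the set of tasks that run after it
        self.edges = {k: set(v) for k, v in (edges or {}).items()}

    def __add__(self, other):
        if not isinstance(other, ToyDAG):
            return NotImplemented
        # Copy the left-hand edges so neither operand is mutated.
        merged = {k: set(v) for k, v in self.edges.items()}
        for task, downstream in other.edges.items():
            merged.setdefault(task, set()).update(downstream)
        return ToyDAG(self.tasks | other.tasks, merged)


buy_groceries = ToyDAG({"make_list", "go_shopping"}, {"make_list": {"go_shopping"}})
make_breakfast = ToyDAG({"fry_eggs", "eat"}, {"fry_eggs": {"eat"}})
schedule = buy_groceries + make_breakfast
```

If the two DAGs share tasks, the union of their edges could contain a cycle, so a real implementation would probably need to re-validate the result.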

Support Concurrent Tasks

Following up #75. asyncio or threading will support concurrency.

Maybe at a later date we can use multithreading to support parallel tasks, but I'm still opposed to it. If parallelism is a breaking point, then Airflow is almost always going to be a better tool for the job.
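
As a rough illustration of the threading route, here is a sketch that runs all tasks sharing a priority concurrently and waits for them before moving to the next priority. It assumes tasks are zero-argument callables and that the schedule is a dict of priority to task list; neither is dequindre's actual interface:

```python
from concurrent.futures import ThreadPoolExecutor


def run_level(tasks, max_workers=4):
    """Run independent callables concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def run_schedule(schedule):
    """Run each priority level in ascending order, one level at a time."""
    results = {}
    for priority in sorted(schedule):
        results[priority] = run_level(schedule[priority])
    return results
```

Threads fit here because a task that launches a script spends its time waiting on the child process, not executing Python code.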

Rework schedule methods

Dequindre.get_task_priorities and Dequindre.get_priorities are a little opaque.

Proposal A

We rename get_task_priorities to get_task_schedules, and rename get_priorities to get_schedules. They'll return the same values, but the new names will be more obvious to users.

Proposal B

We keep get_task_priorities and get_priorities as they are. We introduce get_schedules which returns an ordered list of when the tasks will be run at runtime.

Proposal A sounds best right now, but we should stew on it a bit before making a final decision.
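
For comparison, the ordered list in Proposal B could be derived from the existing priority mapping. This sketch assumes the mapping has the shape priority to set of task names, and it breaks ties alphabetically only to make the output deterministic:

```python
def get_schedules(priorities):
    """Flatten {priority: {task, ...}} into a single ordered list of tasks."""
    ordered = []
    for priority in sorted(priorities):
        # Alphabetical order within a level is an assumption of this sketch.
        ordered.extend(sorted(priorities[priority]))
    return ordered
```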

Add option to fail hard or soft

If a task fails, Dequindre will keep going as normal. We should have an option to accept this behavior or abort the whole thing.
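
A sketch of what such an option might look like, assuming tasks are (name, callable) pairs. The `fail_hard` flag name is hypothetical:

```python
def run_tasks(tasks, fail_hard=False):
    """Run tasks in order and return (completed, failed) name lists.

    With fail_hard=False a failing task is recorded and the run continues.
    With fail_hard=True the first failure aborts the whole run.
    """
    completed, failed = [], []
    for name, task in tasks:
        try:
            task()
        except Exception as exc:
            failed.append(name)
            if fail_hard:
                raise RuntimeError(f"Task {name!r} failed; aborting") from exc
        else:
            completed.append(name)
    return completed, failed
```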

Consider merging DAG and Dequindre functionality as Workflow

Suppose we had a different class structure built on Tasks and Workflows. Workflows would basically combine the functionality of DAGs and Dequindres.

from dequindre import Task, Workflow

def my_workflow():
    ## define tasks and environments
    pour_water = Task('pour_water.py')
    prep_infuser = Task('prep_infuser.py')
    boil_water = Task('boil_water.py')
    steep_tea = Task('steep_tea.py')

    ## define runtime dependencies
    make_tea = Workflow(dependencies={
        boil_water: pour_water,
        steep_tea: {boil_water, prep_infuser}
    })

    ## run tasks
    make_tea.get_schedules()
    # defaultdict(<class 'set'>, {
    #     1: {Task(prep_infuser.py), Task(pour_water.py)},  
    #     2: {Task(boil_water.py)},  
    #     3: {Task(steep_tea.py)}})
        
    make_tea.run_tasks()

I think that UI looks better and more intuitive. It still plays nice with schedules too:

import time
from dequindre import schedule

if __name__ == '__main__':
    stoptime = time.time() + 3600 # seconds
    schedule.every(10).minutes.do(my_workflow)
    while time.time() < stoptime:
        schedule.run_pending()
        time.sleep(1)  # avoid busy-waiting between checks

Remove activate_env_cmd parameter

We can make assumptions about the conda infrastructure, notably where to find the Python executable. That means we can just call the environment directly.
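
A sketch of that assumption, using conda's default layout where named environments live under `envs/` in the install prefix. The helper name is hypothetical, and environments created with a custom prefix or a non-default envs directory would not be found this way:

```python
from pathlib import Path


def env_python(conda_prefix, env_name, windows=False):
    """Return the expected interpreter path for a named conda environment."""
    env_dir = Path(conda_prefix) / "envs" / env_name
    # Unix-like installs keep the interpreter in bin/; Windows keeps
    # python.exe at the top of the environment directory.
    return env_dir / "python.exe" if windows else env_dir / "bin" / "python"
```

The returned path can then be passed straight to `subprocess.run` in place of an activation command.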

flush prints

Right now we get output like this:

I am preparing the infuser...
I am pouring water...
I am boiling water...
I am steeping tea...

Running Task(./tea-tasks/prep_infuser.py)
Running Task(./tea-tasks/pour_water.py)
Running Task(./tea-tasks/boil_water.py)
Running Task(./tea-tasks/steep_tea.py)

It should be like

Running Task(./tea-tasks/prep_infuser.py)
I am preparing the infuser...
Running Task(./tea-tasks/pour_water.py)
I am pouring water...
Running Task(./tea-tasks/boil_water.py)
I am boiling water...
Running Task(./tea-tasks/steep_tea.py)
I am steeping tea...
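
One likely cause of the misordering is that the parent's `print` output sits in a buffer while each child process writes directly to the shared stdout. Assuming tasks are launched with `subprocess`, flushing the status line before starting the child should restore the order. `run_task` is a hypothetical helper, not dequindre's runner:

```python
import subprocess
import sys


def run_task(command):
    """Print a status line, flush it, then run the command as a child process."""
    # flush=True pushes the line out before the child starts writing.
    print(f"Running {' '.join(command)}", flush=True)
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    run_task([sys.executable, "-c", "print('I am boiling water...')"])
```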

Whiteboard the dynamic configuration API

What should the API look and feel like?

The API is somewhat limited by the simple structure of Task, DAG, and Dequindre as the node-, edge-, and scheduler-classes respectively.

Above all, dequindre exists to simplify things. It's baby's first scheduler. It's a learning tool. Priorities are:

  1. to make the dag configuration easy to skim,
  2. to make the dag easy to configure, and
  3. to make the dag easy to debug.

Debugging is the last priority because dequindre is so minimalist that there really shouldn't be any puzzling errors.

With that in mind, here are some possible dependency configurations for the API:


current path

tea_dag = DAG(empty=True)
tea_dag.add_dependencies({
    pour_water: {get_mug, boil_water},
    steep_tea: pour_water,
    drink_tea: steep_tea
})

Reference last key as current value?

tea_dag = DAG(empty=True)
tea_dag.add_dependencies(
    pour_water={get_mug, boil_water},
    steep_tea=.,
    drink_tea=.
)

airflow-like

tea_dag = DAG(empty=True)
tea_dag.set_downstream(boil_water, pour_water)
tea_dag.set_downstream(pour_water, steep_tea)
tea_dag.set_downstream(steep_tea, drink_tea)

We could add dags together by overriding the bitshift operators.

make_tea >> go_to_work  # all of make_tea must finish before any of go_to_work starts
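
A sketch of how `>>` could enforce that ordering: link every task with nothing downstream in the left DAG to every task with nothing upstream in the right DAG. `ToyDAG` is a stand-in class with string task names, and the sketch assumes the two DAGs share no tasks:

```python
class ToyDAG:
    """Stand-in DAG whose edges map a task to the tasks that run after it."""

    def __init__(self, edges=None):
        self.edges = {k: set(v) for k, v in (edges or {}).items()}

    @property
    def tasks(self):
        return set(self.edges).union(*self.edges.values())

    def sinks(self):
        """Tasks with nothing downstream."""
        return {t for t in self.tasks if not self.edges.get(t)}

    def sources(self):
        """Tasks with nothing upstream."""
        return self.tasks - set().union(*self.edges.values())

    def __rshift__(self, other):
        if not isinstance(other, ToyDAG):
            return NotImplemented
        merged = {k: set(v) for k, v in self.edges.items()}
        for task, downstream in other.edges.items():
            merged.setdefault(task, set()).update(downstream)
        # Every left-hand sink must finish before any right-hand source starts.
        for sink in self.sinks():
            merged.setdefault(sink, set()).update(other.sources())
        return ToyDAG(merged)


make_tea = ToyDAG({"boil_water": {"steep_tea"}})
go_to_work = ToyDAG({"get_dressed": {"drive"}})
morning = make_tea >> go_to_work
```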

Offer option to config pipeline with yaml?

It might look like this:

env: {"HOMEDIR": path/to/homedir}
docker_image: continuum/miniconda3
task:
  name: my_task
  runfile: /path/to/runfile.py
  stage: 1
  conda_env: python36
  depends: [your_task, their_task]

  • Is this a good alternative to dynamic configuration?
  • How much work would it take?
  • Can we keep dynamic configuration without muddying the standard?

protect DAG.edges from end users

Rename edges to _edges. Users shouldn't be accessing edges unless they know what they're doing. The naming should also reflect that edges are an internal attribute.
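
A sketch of the rename on a stand-in class. The read-only `edges` property is an extra that the issue does not ask for; it is shown only as one way to keep inspection possible without exposing the mutable mapping:

```python
from types import MappingProxyType


class ToyDAG:
    """Stand-in DAG that keeps its edges in a private attribute."""

    def __init__(self):
        self._edges = {}  # internal: upstream task -> set of downstream tasks

    def add_edge(self, upstream, downstream):
        self._edges.setdefault(upstream, set()).add(downstream)

    @property
    def edges(self):
        """Read-only snapshot of the edges (hypothetical convenience)."""
        return MappingProxyType(
            {task: frozenset(after) for task, after in self._edges.items()}
        )
```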
