
deploy-recipe-action's Introduction

pangeo-forge-recipes

PyPI version CI Codecov Documentation Status Code style: black NSF Award 2026932

pangeo-forge is an open-source tool designed to aid the extraction, transformation, and loading of datasets. The goal of pangeo-forge is to make it easy to extract datasets from traditional data repositories and deposit them into cloud object storage in analysis-ready, cloud-optimized format.

pangeo-forge is inspired by conda-forge, a community-led collection of recipes for building Conda packages. We hope that pangeo-forge can play the same role for datasets.

Documentation

More can be learned about pangeo-forge, its progress, and related subprojects in its official documentation.

Contributing

pangeo-forge is still early in development - there are several ways to contribute:

  1. Create a recipe for a dataset you are interested in
  2. Open an issue or pull request here or in any of the related subprojects (pangeo-smithy, staged-recipes)
  3. Check out the project roadmap

Get in touch

Discussions on Pangeo Forge are generally hosted biweekly on Mondays at 2pm ET. Calendar link here. We aim to announce cancellations on this discourse thread.

License

This project is licensed under the Apache License, Version 2.0.

deploy-recipe-action's Issues

Drop `pangeo/forge` image in favor of lighter base image

pangeo-forge/pangeo-forge-runner#90 drops use of the pangeo/forge image dependency in pangeo-forge-runner so we should probably do the same. This will allow us to keep pace with current beam releases, and should also speed up builds, which @jbusecke has reported are (unsurprisingly) slow. This should be pretty simple to experiment with:

  1. Choose a lighter image to update this line with

    FROM quay.io/pangeo/forge:5e51a29

    Maybe read this or similar blog posts. Probably end up just trying one of these?

  2. Replace quay.io/pangeo/forge:5e51a29 on that line with the URI for the chosen image tag

  3. Now we don't have conda, so drop the conda run -n notebook from this line:

    RUN conda run -n notebook pip install git+https://github.com/pangeo-forge/pangeo-forge-runner@main

  4. That might be all it takes?

  5. Now make a PR with these changes, try running the action from @pr-branch and see if it works (and is faster).

  6. We don't actually test the Docker build in our testing here. It would be good to do that before merging; I can help with that when we get to that point.

@jbusecke wanna give it a shot?

Use local checkout for pangeo-forge-runner call

In this action, we have an actions/checkout step, so the action is running in a locally-checked-out copy of the feedstock repo. This makes fetching the repo over https redundant (and unnecessarily costly). So instead of:

# assemble https url for pangeo-forge-runner
repo = urljoin(server_url, repository)

"--repo",
repo,

I think we can save networking time and just do:

"--repo=."

Consistent treatment of labels with leading (trailing) spaces

I just discovered that when you specify recipe labels with leading spaces (run: some_recipe_id), the recipe gets picked up by the action (it tries to run it), but the run fails because the constructed job name still contains spaces (which Dataflow hates).

We should make this behavior consistent by doing one of the below:

  • Error out earlier in the action (with a helpful error message) if the label-based id contains spaces
  • Or remove spaces automatically as part of the job name generation.

I guess if we are implementing one or the other we want to take care of trailing spaces too?
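A minimal sketch of how both options could be combined: strip leading and trailing whitespace automatically, and error out early if whitespace remains inside the id. The function name `normalize_recipe_id` and the `run:` prefix handling are assumptions for illustration, not the action's actual code.

```python
import re


def normalize_recipe_id(label: str, prefix: str = "run:") -> str:
    """Extract a recipe id from a label such as 'run: some_recipe_id'.

    Hypothetical helper: leading/trailing whitespace is stripped, and
    whitespace inside the id raises an error before any job is submitted.
    """
    if not label.startswith(prefix):
        raise ValueError(f"Label {label!r} does not start with {prefix!r}.")
    recipe_id = label[len(prefix):].strip()
    if not recipe_id or re.search(r"\s", recipe_id):
        raise ValueError(
            f"Recipe id {recipe_id!r} (from label {label!r}) must be "
            "non-empty and must not contain whitespace."
        )
    return recipe_id
```

With this approach, `run: some_recipe_id` and `run:some_recipe_id ` would both resolve to the same id, while a label like `run:some recipe` would fail with a readable message instead of a Dataflow error.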

Tests

#4 highlights the need for tests (in particular integration tests). This is a top priority.

Generate clickable dataflow link for each submitted job

I often get confused about which Dataflow jobs are associated with which run of the GitHub deploy workflow. I think we have all the necessary info to create a list of clickable links.

Ideally this would be shown in some sort of expandable field in the 'checks' section under a PR and could link the recipe name to a clickable link to the dataflow console.
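As a rough sketch of the link generation, assuming the action knows the GCP project, the region, and the Dataflow job id for each recipe (all three are assumptions here, as are the function names), the links could be assembled like this. The URL pattern is the one the Dataflow console appears to use for job pages; it is worth verifying against a real job before relying on it.

```python
from urllib.parse import quote


def dataflow_job_url(project: str, region: str, job_id: str) -> str:
    """Build a console URL for one Dataflow job (assumed URL pattern)."""
    return (
        "https://console.cloud.google.com/dataflow/jobs/"
        f"{quote(region)}/{quote(job_id)}?project={quote(project)}"
    )


def job_links_markdown(jobs: dict, project: str, region: str) -> str:
    """Render a markdown list mapping recipe ids to Dataflow job links.

    `jobs` is a hypothetical mapping of recipe id -> Dataflow job id.
    """
    return "\n".join(
        f"- [{recipe_id}]({dataflow_job_url(project, region, job_id)})"
        for recipe_id, job_id in jobs.items()
    )
```

One possible place to surface the result is the workflow's job summary, by appending the markdown to the file named in the `GITHUB_STEP_SUMMARY` environment variable.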

Deployment error on new repo without pull requests

I just noticed a quirk with the deploy action for a brand new repo where the action was entirely triggered by pushes to main.

I was trying to separate a recipe example from a more mature and complex bakery repo and ran into this error.

assert len(pulls) == 1  # pretty sure this is always true, but just making sure

I think this line here basically fails because there is not a single PR in the repo yet (will test this in a minute). But just wanted to document this here.
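If that hypothesis holds, one way to avoid the crash would be to treat "no associated PR" as a valid case instead of asserting. This is only a sketch: `select_pull` is a hypothetical helper, and it assumes `pulls` is a list of PR records, which is not confirmed here.

```python
from typing import Optional


def select_pull(pulls: list) -> Optional[dict]:
    """Return the single associated PR, or None when there is none.

    Hypothetical replacement for `assert len(pulls) == 1`.
    """
    if len(pulls) == 0:
        # Assumed case: a push to main in a repo that has no PRs yet.
        return None
    if len(pulls) > 1:
        raise ValueError(
            f"Expected at most one associated pull request, got {len(pulls)}."
        )
    return pulls[0]
```

The caller would then need to decide what a push without a PR should do, for example skip the label lookup entirely.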

Docs

There is not yet any documentation regarding how to use this action.

Here are some examples used during development:

(These may be out of date, and should only serve as reference.)

Filling out the README here with usage examples/instructions is a top priority.

Print out STDERR when job submission fails

I am getting a lot of errors like this

Submitting job...
recipe_ids = ['METAFLUX_GPP_RECO_monthly']
Submission jobname = 'METAFLUX_GPP_RECO_monthlybc9fe248'
Running PGF runner with extra_cmd = ['--Bake.recipe_id=METAFLUX_GPP_RECO_monthly', '--Bake.job_name=METAFLUX_GPP_RECO_monthlybc9fe248']
  File "/deploy_recipe.py", line 100, in <module>
    deploy_recipe_cmd(cmd + extra_cmd)
  File "/deploy_recipe.py", line 17, in deploy_recipe_cmd
    raise ValueError("Job submission failed.")

but there is no information provided about why it failed. Could we capture the stderr just like here and print it in case of failure?
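A minimal sketch of what that could look like, assuming the submission is a subprocess call. The function name `deploy_recipe_cmd` is taken from the traceback above, but this body is an illustration and not the action's actual implementation.

```python
import subprocess
import sys


def deploy_recipe_cmd(cmd: list) -> str:
    """Run the submission command and surface stderr on failure."""
    # capture_output=True collects both stdout and stderr as text.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Print everything the subprocess reported before raising,
        # so the workflow log shows why the submission failed.
        print(proc.stdout)
        print(proc.stderr, file=sys.stderr)
        raise ValueError(
            f"Job submission failed (exit code {proc.returncode})."
        )
    return proc.stdout
```

Printing before raising keeps the existing `ValueError` behavior while making the underlying error visible in the Actions log.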

Some feedback on the label based triggering

Just wanted to write down some thoughts about the label based triggering of deployments.

I am still finding myself in a lot of pretty messy situations for two reasons:

  • Mixing the running of recipes with code changes. I often want to just rerun a recipe and currently have to open a dummy PR or mix the running of recipes into other PRs. This gets messy over time and is hard to grok after a while (for me or for other contributors).
  • I have to add/remove labels one by one. I often find myself in situations where I want to run 2+ recipes, but each setting of a label 'fires' off a GitHub action. This means if I add run:a, run:b as labels in one go I will fire off two GitHub workflows (one deploying recipe a, the other deploying recipes a and b). This is very hard to keep track of after a while or with many recipes. The only solution I have found so far is to add run:a, then remove run:a, then add run:b and remove run:b to fire off two jobs (the first one runs a, the second one runs b).

I am not quite sure how to improve this behavior but wanted to note this down somewhere for future discussion.

Support comments in requirements.txt

Failed parsing of requirements.txt in leap-stc/data-management@d30240a:

https://github.com/leap-stc/data-management/actions/runs/5685911763/job/15411769228?pr=33#step:6:58

reveals that this logic for parsing requirements.txt

with open("feedstock/requirements.txt") as f:
    to_install = f.read().splitlines()
print(f"Installing extra packages {to_install}...")

is too naive.

We should refine that logic so it can parse requirements.txt files that contain comments (and blank lines).
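One possible refinement, sketched under the assumption that only comments and blank lines need handling (not pip options such as `-r` or line continuations). The helper name `parse_requirements` is hypothetical. A `#` is treated as a comment only at the start of a line or after whitespace, so URL fragments like `#egg=pkg` survive.

```python
import re

# A comment starts with '#' at line start or after whitespace.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def parse_requirements(text: str) -> list:
    """Return requirement specifiers, skipping comments and blank lines."""
    requirements = []
    for line in text.splitlines():
        line = COMMENT_RE.sub("", line).strip()
        if line:
            requirements.append(line)
    return requirements
```

The existing snippet could then call `parse_requirements(f.read())` in place of `f.read().splitlines()`.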
