user-stories's Introduction

user-stories

user-stories's People

Contributors

rabernat

user-stories's Issues

Append-only production runs

User Profile

As a recipe maintainer

User Action

I want to re-run recipes in my feedstock (either manually or on a schedule) to append newly released data to my dataset

User Goal

So that I can keep the dataset built by my feedstock up-to-date with the latest releases from the data provider without needing to re-run the entire recipe

Acceptance Criteria

The ability to trigger append-only production runs (manually or on a schedule) from a feedstock. This might be inferred from the recipe itself, or perhaps specified by a new property in the meta.yaml

Linked Issues

Release automations

User Profile

As a project owner

User Action

I want all images and repos which are affected by releases of pangeo-forge-recipes to be automatically updated with each release of pangeo-forge-recipes

User Goal

So that I do not have to devote manual toil to syncing all parts of the platform following every release of pangeo-forge-recipes

Acceptance Criteria

To start, I thought it would be useful to brain dump a list of everything that we'd want to happen automatically following a pangeo-forge-recipes release, in order of dependency:

Linked Issues

See above

How to prioritize user stories

User Profile

As a project owner

User Action

I want to know how to prioritize user stories

User Goal

So that I can drive growth of key metrics for Pangeo Forge

Acceptance Criteria

A process for linking user stories to key metrics we'd like to achieve for the platform, as described in this tweet:

  1. Is this the group of users whose needs we want to address right now? Why? Are we trying to improve a particular metric for that specific type of user? Are they particularly underserved by the product or important to our business or other goals?

Linked Issues

No response

Security & importing contributed recipes

User Profile

As a project owner

User Action

I want to reach a consensus with other project owners regarding best security practices for importing contributed recipes

User Goal

So that I know what security guardrails to observe while developing new features on Pangeo Forge Cloud

Acceptance Criteria

An internal document and/or mutual understanding regarding best practices for importing nominally "untrusted" recipe modules. More details regarding motivating cases in Linked Issues section below.

Linked Issues

By way of background, there are currently two places in the Registrar where we automatically create recipe runs in response to a push event:

  1. For recipes in a PR commit
  2. For recipes pushed to the default branch of a merged feedstock

In the second case, we can assume some Pangeo Forge maintainer (either a project owner or the maintainer of a feedstock) has looked at the code already. There may be risks here due to inattentiveness, etc. but we can leave those for another day.

What I'd like to discuss here is the first case, wherein the submitted code is truly untrusted, in the sense that literally anyone in the whole world can make a PR to /staged-recipes. If that PR has a properly formatted and complete meta.yaml, then recipe runs will be created for all recipes listed in the meta.yaml. For this reason, I've assumed thus far that we should never actually import the recipe module when automatically creating recipe runs, and that is how the Registrar currently operates.

Certain open User Stories challenge this model, however. Namely:

In both of these cases, without importing the recipe module, we don't have enough information to create recipe runs. Specifically, as #3 is currently conceived, to determine whether or not to re-run a given recipe we would need to call self.sha256() on each of the recipes, in order to compare the resulting hashes to those of the prior run (if any) for the recipe. If the hashes match, we wouldn't create recipe runs at all. And for #10, we wouldn't know the names of the individual recipes within a dict_object without importing the recipe module and introspecting the specified dictionary.

Both of these User Stories have real, already-existing contributors that would like to use them, and from a design perspective would be big improvements to the platform. They would also be specifically useful for the low trust case of creating recipe runs for PRs, so simply saying "we don't support these features on PRs" seems far from ideal.

A few further questions/possibilities to kick off discussion:

  • Is there some importlib equivalent to yaml.safe_load which might be useful in this case?
  • One obvious option is to require a maintainer's approval to create recipe runs (rather than generating them automatically), but this feels (1) very un-ergonomic and tedious; (2) actually not that safe, because maintainers juggling lots of other tasks could potentially be fooled with phishing-style slight typos on import paths or the like.
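On the first question, one partial approach is to read the recipe module statically instead of importing it. The sketch below is not an existing Registrar feature. It uses Python's standard `ast` module to list the string keys of a top-level dict literal without executing any of the submitted code. The function name `dict_literal_keys` and the example variable name `recipes` are hypothetical, and the approach only works when the dictionary is written as a literal; dictionaries built in loops or comprehensions would not be visible this way.

```python
import ast


def dict_literal_keys(source: str, dict_name: str) -> list:
    """Return the string keys of a top-level dict literal named dict_name.

    The source is parsed, never executed, so untrusted code does not run.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if dict_name in targets and isinstance(node.value, ast.Dict):
                return [
                    key.value
                    for key in node.value.keys
                    if isinstance(key, ast.Constant) and isinstance(key.value, str)
                ]
    raise ValueError(f"no top-level dict literal named {dict_name!r} found")


# Hypothetical recipe module text; make_recipe is never called.
example_source = "recipes = {'sst': make_recipe('sst'), 'ssh': make_recipe('ssh')}\n"
print(dict_literal_keys(example_source, "recipes"))
```

This would cover the naming need of #10 for the simple case, but it cannot compute a recipe hash as #3 requires, since hashing depends on the constructed recipe object.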

Link deployed feedstocks to dataset page

User Profile

As a recipe maintainer

User Action

I want to be able to see where the data produced by my deployed recipe has been deposited

User Goal

so that I can perform data-proximate analysis on the data.

Acceptance Criteria

For a particular feedstock repo (e.g. https://github.com/pangeo-forge/WOA_1degree_monthly-feedstock), after the recipe has been run in production mode, the following should be possible

  • User visits the dashboard page for the feedstock (e.g. https://pangeo-forge.org/dashboard/feedstock/6) and sees a clear link on this page pointing to a catalog page for the resulting dataset. The catalog page displays a URL and instructions for opening the dataset
  • User visits the GitHub repo of a feedstock. The deployments link can be followed to find the associated catalog page.

Linked Issues

No response

Accurate metrics

User Profile

As the project owner

User Action

I want to get an accurate estimate of how many production datasets and recipe runs have been executed cumulatively and on a weekly basis

User Goal

so that I can track progress of the project.

Acceptance Criteria

  • Dataset numbers and recipe runs on pangeo-forge.org accurately reflect the latest correct number of production runs
  • We build a query to extract this information into a weekly report
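As a rough illustration of what the weekly report could compute, the sketch below groups run dates by ISO week and adds a cumulative total. It assumes the production run dates have already been fetched as `datetime.date` objects; how they are fetched from the Pangeo Forge database is not shown, and the function name `weekly_run_counts` is hypothetical.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def weekly_run_counts(run_dates):
    """Return [((iso_year, iso_week), runs_that_week, cumulative_runs), ...]."""
    per_week = Counter(d.isocalendar()[:2] for d in run_dates)
    weeks = sorted(per_week)
    totals = accumulate(per_week[w] for w in weeks)
    return [(w, per_week[w], t) for w, t in zip(weeks, totals)]


# Made-up dates, for illustration only.
report = weekly_run_counts([date(2022, 1, 3), date(2022, 1, 4), date(2022, 1, 10)])
for week, count, total in report:
    print(week, count, total)
```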

Linked Issues

No response

Local reproduction of scale-related issues

User Profile

As a Pangeo Forge developer

User Action

I want to know how to reproduce scale-related execution failures with a cluster deployed from a local (or cloud-based) personal notebook and/or Python session

User Goal

So that I can debug scale-related problems outside Pangeo Forge Cloud infrastructure

Acceptance Criteria

  • Documentation which describes how to accomplish this task

Linked Issues

Don't rerun unchanged recipes

User Profile

As a recipe maintainer

User Action

I want to push commits to the default branch of my feedstock repository, and have the resulting production deployment only rerun new recipes or those that have changed, and not rerun unchanged recipes

User Goal

So that I can add or update certain recipes in my feedstock without rerunning all of them.

Acceptance Criteria

A mechanism to check the hash of all recipes at deployment time, and skip re-running if the hash matches the hash for the same recipe in the last production deployment
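The comparison step could look roughly like the sketch below. It is an assumption-laden illustration, not the implementation in the linked PRs: each recipe is stood in for by a plain JSON-serializable dict, and `recipe_hash` and `recipes_to_run` are hypothetical names. In the real design the hash would come from the recipe object itself.

```python
import hashlib
import json


def recipe_hash(recipe_config: dict) -> str:
    """Return a deterministic SHA-256 hex digest of a recipe's configuration."""
    payload = json.dumps(recipe_config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def recipes_to_run(current: dict, previous_hashes: dict) -> list:
    """Return names of recipes that are new or whose hash differs from the last run."""
    return [
        name
        for name, config in current.items()
        if previous_hashes.get(name) != recipe_hash(config)
    ]


# Hashes recorded at the last production deployment (hypothetical data).
last_deployment = {"sst": recipe_hash({"years": [2020, 2021]})}
current_recipes = {
    "sst": {"years": [2020, 2021]},  # unchanged, so skipped
    "ssh": {"years": [2021]},  # new, so run
}
print(recipes_to_run(current_recipes, last_deployment))
```

Sorting keys before hashing keeps the digest stable when the same configuration is written in a different key order.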

Linked Issues

In the order in which they should be merged:

  1. pangeo-forge/pangeo-forge-recipes#349
  2. pangeo-forge/pangeo-forge-orchestrator#63
  3. https://github.com/pangeo-forge/registrar/pull/36

From deployment to `dataset_public_url`

User Profile

As a feedstock contributor

User Action

I want one click between my feedstock's deployment page and the dataset_public_url for my dataset

User Goal

So that, starting from my feedstock's GitHub repo, I can easily find a dataset built by my feedstock

Acceptance Criteria

From the deployments page of a feedstock repo, an average user should be able to find a dataset_public_url for successful production deployments in one click, without needing to read any documentation or have any prior specialized knowledge.

Linked Issues

No response

CLI for preparing / validating recipes

User Profile

As a recipe contributor

User Action

I want to test my recipe on the command line before submitting a pull request

User Goal

so that I can avoid a slow debugging cycle talking to the Pangeo Forge bot on GitHub. (Example: pangeo-forge/staged-recipes#150)

Acceptance Criteria

I run a command like

$ pangeo-forge recipe validate recipe_folder/

and see output like

It looks like your meta.yaml does not conform to the specification.

    1 validation error for MetaYaml
    pangeo_notebook_version
      field required (type=value_error.missing)

or

When I tried to import your recipe module, I encountered this error

    line 17, in <module>
        fs.ls(url_base + str(year), detail=False)
    NameError: name 'fs' is not defined

Please correct your recipe module so that it's importable.
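The importability check in the second output above could be sketched as follows. This is a minimal illustration using only the standard library, not the actual `pangeo-forge` CLI; the function name `check_importable` is hypothetical. Because it executes the module, it is suited to a contributor checking their own recipe locally, not to running untrusted code on shared infrastructure.

```python
import importlib.util
import traceback
from pathlib import Path


def check_importable(recipe_path: str) -> str:
    """Import the module at recipe_path and return a human-readable report.

    Note: this executes the module, so only use it on code you trust.
    """
    path = Path(recipe_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        return f"Could not build an import spec for {path}."
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception:
        return (
            "When I tried to import your recipe module, I encountered this error\n\n"
            f"{traceback.format_exc()}\n"
            "Please correct your recipe module so that it's importable."
        )
    return f"{path.name} imported successfully."
```

Calling `check_importable("recipe_folder/recipe.py")` (a hypothetical path) returns either a success line or the traceback wrapped in the message format shown above.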

Linked Issues

No response

More detailed catalog information

User Profile

As a data library user

User Action

I want to be able to see detailed information (e.g. variables, dimensions, attributes) about the datasets in the catalog

User Goal

So that I can decide whether I want to use a particular dataset.

Acceptance Criteria

Linked Issues
