pangeo-forge / user-stories

User stories to guide PF development

user-stories's Issues

Append-only production runs

User Profile

As a recipe maintainer

User Action

I want to re-run recipes in my feedstock (either manually or on a schedule) to append newly released data to my dataset

User Goal

So that I can keep the dataset built by my feedstock up-to-date with the latest releases from the data provider without needing to re-run the entire recipe

Acceptance Criteria

The ability to trigger append-only production runs (manually or on a schedule) from a feedstock. This might be inferred from the recipe itself, or perhaps specified by a new property in the meta.yaml

Linked Issues

Release automations

User Profile

As a project owner

User Action

I want all images and repos which are affected by releases of pangeo-forge-recipes to be automatically updated with each release of pangeo-forge-recipes

User Goal

So that I do not have to devote manual toil to syncing all parts of the platform following every release of pangeo-forge-recipes

Acceptance Criteria

To start, I thought it would be useful to brain dump a list of everything that we'd want to happen automatically following a pangeo-forge-recipes release, in order of dependency:

pangeo-forge/pangeo-forge-recipes#357
Update Conda Forge feedstock, xref conda-forge/pangeo-forge-recipes-feedstock#4
- We created the Conda Forge feedstock when 0.8.2 was the latest pangeo-forge-recipes release. Since then, a Conda Forge bot noticed that 0.8.3 was available on PyPI, and opened ☝️ that PR. We have not yet merged that PR. I am unclear if a manual merge is required on the pangeo-forge-recipes-feedstock for every release.
Build and push new bakery image, xref pangeo-forge/pangeo-forge-bakery-images#7
- It seems preferable to wait until the new release is available on Conda Forge to build the bakery image, because then we can install with conda like this PR for 0.8.2 rather than having to install from pip like this PR for 0.8.3
Update Registrar BAKERY_IMAGES dict, xref https://github.com/pangeo-forge/registrar/issues/37
- As noted in the linked Issue for this point, we could consider changing this architecture to make this easier to automate.
Update Sandbox template and (maybe) Dockerfile
- pangeo-forge/sandbox#8
- pangeo-forge/sandbox#9

Linked Issues

See above

How to prioritize user stories

User Profile

As a project owner

User Action

I want to know how to prioritize user stories

User Goal

So that I can drive growth of key metrics for Pangeo Forge

Acceptance Criteria

A process for linking user stories to key metrics we'd like to achieve for the platform, as described in this tweet:

Is this the group of users whose needs we want to address right now? Why? Are we trying to improve a particular metric for that specific type of user? Are they particularly underserved by the product or important to our business or other goals?

Linked Issues

No response

Security & importing contributed recipes

User Profile

As a project owner

User Action

I want to reach a consensus with other project owners regarding best security practices for importing contributed recipes

User Goal

So that I know what security guardrails to observe while developing new features on Pangeo Forge Cloud

Acceptance Criteria

An internal document and/or mutual understanding regarding best practices for importing nominally "untrusted" recipe modules. More details regarding motivating cases in Linked Issues section below.

Linked Issues

By way of background, there are currently two places in the Registrar where we automatically create recipe runs in response to a push event:

For recipes in a PR commit
For recipes pushed to the default branch of a merged feedstock

In the second case, we can assume some Pangeo Forge maintainer (either a project owner or the maintainer of a feedstock) has looked at the code already. There may be risks here due to inattentiveness, etc. but we can leave those for another day.

What I'd like to discuss here is the first case, wherein the submitted code is truly untrusted in the sense that literally anyone in the whole world can make a PR to /staged-recipes, and if it has a properly formatted and complete meta.yaml, then recipe runs will be created for all recipes listed in the meta.yaml. For this reason, I've assumed thus far that we should never actually import the recipe module when automatically creating recipe runs, and that is how the Registrar currently operates.

Certain open User Stories challenge this model, however. Namely:

In both of these cases, without importing the recipe module, we don't have enough information to create recipe runs. Specifically, as #3 is currently conceived, to determine whether or not to re-run a given recipe we would need to call self.sha256() on each of the recipes, in order to compare the resulting hashes to those of the prior run (if any) for the recipe. If the hashes match, we wouldn't create recipe runs at all. And for #10, we wouldn't know the names of the individual recipes within a dict_object without importing the recipe module and introspecting the specified dictionary.

Both of these User Stories have real, already-existing contributors that would like to use them, and from a design perspective would be big improvements to the platform. They would also be specifically useful for the low trust case of creating recipe runs for PRs, so simply saying "we don't support these features on PRs" seems far from ideal.

A few further questions/possibilities to kick off discussion:

Is there some importlib equivalent to yaml.safe_load which might be useful in this case?
One obvious option is to require a maintainer's approval to create recipe runs (rather than generating them automatically), but this feels (1) very un-ergonomic and tedious; (2) actually not that safe, because maintainers juggling lots of other tasks could potentially be fooled with phishing-style slight typos on import paths or the like.
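
On the importlib question: one option in the spirit of yaml.safe_load is to parse the recipe module with the standard library's ast module instead of executing it. What follows is only a minimal sketch under strong assumptions: the function name literal_dict_keys and the sample module source are hypothetical, and it only handles a dictionary written as a literal with string keys. It cannot resolve dict comprehensions or computed keys, and it does nothing for the hash comparison needed by #3, so it addresses #10 only in part.

```python
import ast


def literal_dict_keys(source: str, name: str):
    """Return the string keys of a dict literal assigned to `name`, without executing `source`.

    Returns None when no such assignment exists, or when the value is not a
    plain dict literal with constant string keys (e.g. a dict comprehension).
    """
    tree = ast.parse(source)
    for node in tree.body:  # top-level statements only
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if name not in targets or not isinstance(node.value, ast.Dict):
            continue
        keys = []
        for key in node.value.keys:
            if not (isinstance(key, ast.Constant) and isinstance(key.value, str)):
                return None
            keys.append(key.value)
        return keys
    return None


# Hypothetical recipe module source; make_recipe is never called because nothing is executed.
SAMPLE = '''
import os
recipes = {"sst": make_recipe("sst"), "ssh": make_recipe("ssh")}
dynamic = {v: make_recipe(v) for v in ["a", "b"]}
'''

print(literal_dict_keys(SAMPLE, "recipes"))  # ['sst', 'ssh']
print(literal_dict_keys(SAMPLE, "dynamic"))  # None
```

The None result for the comprehension case is the important limitation: anything generated dynamically still requires running the code, so this would reduce the problem rather than remove it.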

Link deployed feedstocks to dataset page

User Profile

As a recipe maintainer

User Action

I want to be able to see where the data produced by my deployed recipe has been deposited

User Goal

so that I can perform data-proximate analysis on the data.

Acceptance Criteria

For a particular feedstock repo (e.g. https://github.com/pangeo-forge/WOA_1degree_monthly-feedstock), after the recipe has been run in production mode, the following should be possible:

User visits the dashboard page for the feedstock (e.g. https://pangeo-forge.org/dashboard/feedstock/6) and sees a clear link on this page pointing to a catalog page for the resulting dataset. The catalog page displays a URL and instructions for opening the dataset
User visits the GitHub repo of a feedstock. The deployments link can be followed to find the associated catalog page.

Linked Issues

No response

Accurate metrics

User Profile

As the project owner

User Action

I want to get an accurate estimate of how many production datasets and recipe runs have been executed cumulatively and on a weekly basis

User Goal

so that I can track progress of the project.

Acceptance Criteria

Dataset numbers and recipe runs on pangeo-forge.org accurately reflect the latest correct number of production runs
We build a query to extract this information into a weekly report
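
As a rough illustration of the weekly report, the aggregation could look something like the sketch below. It assumes the completion dates of production runs have already been fetched from wherever they are stored; the function names and the sample dates are made up for illustration and are not real project metrics.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def weekly_run_counts(run_dates):
    """Map (ISO year, ISO week) -> number of runs that week, in chronological order."""
    counts = Counter(tuple(d.isocalendar())[:2] for d in run_dates)
    return dict(sorted(counts.items()))


def cumulative(weekly):
    """Running total of runs, keyed by the same weeks as `weekly`."""
    return dict(zip(weekly, accumulate(weekly.values())))


# Illustrative dates only.
sample_dates = [date(2022, 5, 2), date(2022, 5, 4), date(2022, 5, 9)]
weekly = weekly_run_counts(sample_dates)
print(weekly)              # {(2022, 18): 2, (2022, 19): 1}
print(cumulative(weekly))  # {(2022, 18): 2, (2022, 19): 3}
```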

Linked Issues

No response

Dict objects in `meta.yaml`

User Profile

As a recipe contributor

User Action

I want to be able to use dict objects as defined in ADR-2

User Goal

So that I can dynamically generate recipe instances in my recipe module using dictionary comprehensions

Acceptance Criteria

Tested feature(s) to support this in the Registrar
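
For context, the kind of recipe module this story has in mind might look roughly like the sketch below. FakeRecipe is a stand-in dataclass rather than a real pangeo-forge-recipes class, and the variable names and URL pattern are hypothetical; the point is only that the recipe ids are produced by a dictionary comprehension and so are not known until the module is evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRecipe:
    """Stand-in for a real recipe class; only here to keep the sketch self-contained."""

    variable: str
    url_pattern: str


VARIABLES = ["sst", "ssh", "sss"]  # hypothetical variable names

# One recipe instance per variable, keyed by a recipe id.
recipes = {
    f"example-{var}": FakeRecipe(var, f"https://data.example.org/{var}/{{year}}.nc")
    for var in VARIABLES
}

print(sorted(recipes))
```

The meta.yaml would then refer to the recipes dictionary as a whole rather than listing each recipe individually, which is why the Registrar needs some way to discover the keys.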

Linked Issues

Local reproduction of scale-related issues

User Profile

As a Pangeo Forge developer

User Action

I want to know how to reproduce scale-related execution failures with a cluster deployed from a local (or cloud-based) personal notebook and/or Python session

User Goal

So that I can debug scale-related problems outside Pangeo Forge Cloud infrastructure

Acceptance Criteria

Documentation which describes how to accomplish this task

Linked Issues

Failure related to distributed locking can be hard to reproduce in a local context; e.g. pangeo-forge/cmip6-feedstock#2 (comment)

Don't rerun unchanged recipes

User Profile

As a recipe maintainer

User Action

I want to push commits to the default branch of my feedstock repository, and have the resulting production deployment only rerun new recipes or those that have changed, and not rerun unchanged recipes

User Goal

So that I can add or update certain recipes in my feedstock without rerunning all of them.

Acceptance Criteria

A mechanism to check the hash of all recipes at deployment time, and skip re-running if the hash matches the hash for the same recipe in the last production deployment
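
The comparison itself could be as simple as the sketch below. It assumes each recipe can produce a stable hash and that the hashes from the last production deployment are stored somewhere retrievable; recipe_hash here is a hypothetical stand-in that hashes a plain config dict, not the recipe's own hashing method, and recipes_to_run is likewise an illustrative name.

```python
import hashlib


def recipe_hash(recipe_config: dict) -> str:
    """Hypothetical stand-in for a recipe's own hash: SHA-256 of its sorted config items."""
    canonical = repr(sorted(recipe_config.items())).encode()
    return hashlib.sha256(canonical).hexdigest()


def recipes_to_run(current: dict, last_deployed_hashes: dict) -> list:
    """Return ids of recipes that are new or whose hash differs from the last production deployment."""
    return [
        recipe_id
        for recipe_id, config in current.items()
        if last_deployed_hashes.get(recipe_id) != recipe_hash(config)
    ]


# Illustrative feedstock state: "a" is unchanged, "b" is modified, "c" is new.
previous = {"a": {"url": "x", "chunks": 10}, "b": {"url": "y", "chunks": 10}}
last_hashes = {rid: recipe_hash(cfg) for rid, cfg in previous.items()}
current = {
    "a": {"url": "x", "chunks": 10},
    "b": {"url": "y", "chunks": 20},
    "c": {"url": "z", "chunks": 10},
}

print(recipes_to_run(current, last_hashes))  # ['b', 'c']
```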

Linked Issues

In the order in which they should be merged:

More than one ConcatDim

User Profile

As a recipe contributor

User Action

I want pangeo-forge-recipes to support more than one ConcatDim

User Goal

So that I can write recipes for datasets which require concatenation along more than one dimension

Acceptance Criteria

A feature in pangeo-forge-recipes to support more than one ConcatDim

Linked Issues

pangeo-forge/pangeo-forge-recipes#140
pangeo-forge/pangeo-forge-recipes#348

From deployment to `dataset_public_url`

User Profile

As a feedstock contributor

User Action

I want one click between my feedstock's deployment page and the dataset_public_url for my dataset

User Goal

So that, starting from my feedstock's GitHub repo, I can easily find a dataset built by my feedstock

Acceptance Criteria

From the deployments page of a feedstock repo, an average user should be able to find a dataset_public_url for successful production deployments in one click, without needing to read any documentation or any prior specialized knowledge.

Linked Issues

No response

CLI for preparing / validating recipes

User Profile

As a recipe contributor

User Action

I want to test my recipe on the command line before submitting a pull request

User Goal

so that I can avoid a slow debugging cycle talking to the Pangeo Forge bot on GitHub. (Example: pangeo-forge/staged-recipes#150)

Acceptance Criteria

I run a command like

$ pangeo-forge recipe validate recipe_folder/

and see output like

It looks like your meta.yaml does not conform to the specification.

    1 validation error for MetaYaml
    pangeo_notebook_version
      field required (type=value_error.missing)

When I tried to import your recipe module, I encountered this error

    line 17, in <module>
        fs.ls(url_base + str(year), detail=False)
    NameError: name 'fs' is not defined

Please correct your recipe module so that it's importable.
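
The two checks behind such a command could be sketched with the standard library alone, as below. This is not an existing pangeo-forge CLI: validate_meta, try_import and the REQUIRED_FIELDS tuple are hypothetical, and only pangeo_notebook_version is taken from the example output above. The sketch also assumes the meta.yaml has already been parsed into a dict. Because try_import really executes the module, it is only appropriate for contributors running it locally on their own code.

```python
import importlib.util
import traceback

# Assumed subset of required fields; only pangeo_notebook_version comes from the example output.
REQUIRED_FIELDS = ("title", "pangeo_notebook_version", "recipes")


def validate_meta(meta: dict) -> list:
    """Return one message per required field missing from an already-parsed meta.yaml."""
    return [f"{field}: field required" for field in REQUIRED_FIELDS if field not in meta]


def try_import(path: str):
    """Execute the module at `path`; return None on success or the formatted traceback on failure."""
    spec = importlib.util.spec_from_file_location("recipe_under_test", path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception:
        return traceback.format_exc()
    return None


print(validate_meta({"title": "Example dataset", "recipes": []}))
# ['pangeo_notebook_version: field required']
```

A command-line wrapper would then print the messages from validate_meta and, if try_import returns a traceback, show it together with the request to make the module importable.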

Linked Issues

No response

More detailed catalog information

User Profile

As a data library user

User Action

I want to be able to see detailed information (e.g. variables, dimensions, attributes) about the datasets in the catalog

User Goal

So that I can decide whether I want to use a particular dataset.

Acceptance Criteria

User browses to https://pangeo-forge.org/catalog and can click on a dataset (related to #1)
Information is shown similar to the Xarray repr, or the legacy Pangeo catalog (e.g. https://catalog.pangeo.io/browse/master/ocean/SOSE/)
Code samples show how to open the data in Python.
