dataflow-status-monitoring's Introduction

pangeo-forge-recipes

NSF Award 2026932

pangeo-forge is an open-source tool designed to aid the extraction, transformation, and loading of datasets. The goal of pangeo-forge is to make it easy to extract datasets from traditional data repositories and deposit them into cloud object storage in analysis-ready, cloud-optimized format.

pangeo-forge is inspired by conda-forge, a community-led collection of recipes for building Conda packages. We hope that pangeo-forge can play the same role for datasets.

Documentation

More can be learned about pangeo-forge, its progress, and related subprojects in its official documentation.

Contributing

pangeo-forge is still early in development, and there are several ways to contribute:

  1. Create a recipe for a dataset you are interested in
  2. Open an issue or pull request here or in any of the related subprojects (pangeo-smithy, staged-recipes)
  3. Check out the project roadmap

Get in touch

Discussions on Pangeo Forge are generally hosted biweekly on Mondays at 2pm ET. Calendar link here. We aim to announce cancellations on this discourse thread.

License

This project is licensed under the Apache License, Version 2.0.

dataflow-status-monitoring's People

Contributors

cisaacstern, sharkinsspatial

dataflow-status-monitoring's Issues

Use dedicated service account for each Cloud Function

In #6, I moved the webhook credential into the Secret Manager API and granted the default runtime service account read access to it:

role = "roles/secretmanager.secretAccessor"
members = [
  "serviceAccount:${var.project}@appspot.gserviceaccount.com",

Here's the Stack Overflow post that made me realize this was necessary, and the place in the GCP docs it references.
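For illustration, here is a minimal sketch of how a function running as that service account might read the credential once the `secretAccessor` role is granted. It assumes the `google-cloud-secret-manager` client library is installed; the project ID, the secret ID, and the helper names `secret_version_name` and `read_secret` are hypothetical and not taken from this repo.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the fully qualified resource name Secret Manager expects."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def read_secret(project_id: str, secret_id: str, client=None, version: str = "latest") -> str:
    """Return the secret payload as text.

    `client` can be injected (for example, a fake in tests). When omitted,
    a real Secret Manager client is created, which authenticates as the
    function's runtime service account.
    """
    if client is None:
        # Imported lazily so the module loads without the library installed.
        from google.cloud import secretmanager

        client = secretmanager.SecretManagerServiceClient()

    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id, version)}
    )
    return response.payload.data.decode("utf-8")
```

If the runtime service account lacks the role shown above, the `access_secret_version` call is expected to fail with a permission error, which is the symptom that motivated the binding.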

For a more fine-grained permission structure down the line, we could also make a dedicated service account for each function.

This seemed to add unnecessary complexity at this early stage of the project, but may be worth keeping in mind as we grow.

cc @sharkinsspatial @rabernat (No action needed now AFAICT, just keeping you both in the loop.)

Inconsistent behavior of job/status metric.

This repo is my initial attempt at a Dataflow status monitoring and reporting approach as outlined in https://github.com/pangeo-forge/registrar/issues/47. There does not seem to be a consistently documented approach for providing external notifications of Dataflow job success or failure (most approaches describe using log sinks to identify and report failed jobs but don't provide a recommended strategy for successful jobs).

While the Cloud Monitoring Alerts approach I've taken here appears to work, I've noticed that the job/status metric occasionally seems not to report successful jobs. This is my first time working with Cloud Monitoring and MQL, so I'm unsure whether this is caused by an error in my table alignment logic or by a misunderstanding of the metric itself.
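As a rough sketch of the reporting leg only (it does not address the missing-metric problem), the code below condenses an alert notification into a small status message and posts it to an external webhook. The payload field names (`incident`, `state`, `policy_name`, `summary`, `resource.labels.job_name`) are assumptions modeled on the general shape of Cloud Monitoring webhook notifications, and `summarize_incident` and `forward` are hypothetical helpers, not code from this repo.

```python
import json
import urllib.request


def summarize_incident(notification: dict) -> dict:
    """Reduce an alert notification to the fields an external app needs.

    Missing fields fall back to placeholders, so a partial or unexpected
    payload still yields a message instead of raising.
    """
    incident = notification.get("incident", {})
    labels = incident.get("resource", {}).get("labels", {})
    return {
        "policy": incident.get("policy_name", "unknown"),
        "state": incident.get("state", "unknown"),
        "summary": incident.get("summary", ""),
        "job_name": labels.get("job_name"),  # assumed label name
    }


def forward(notification: dict, webhook_url: str, opener=urllib.request.urlopen) -> int:
    """POST the summarized status as JSON and return the HTTP status code.

    `opener` is injectable so the function can be exercised without
    making a network call.
    """
    body = json.dumps(summarize_incident(notification)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=10) as response:
        return response.status
```

Keeping the summarizing step separate from the HTTP call makes it easier to log exactly what each alert contained, which may help when checking whether a successful job produced a notification at all.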

@alxmrs If you have a moment, could you provide some feedback on whether my approach is usable, or whether there is a recommended approach for monitoring Dataflow jobs and reporting their status to external applications (via a webhook)? cc @cisaacstern
