composable-logs / composable-logs

Python library to run ML/data pipelines on stateless compute infrastructure (that may be ephemeral or serverless). Please see the documentation site for more details and a demo:

Home Page: https://composable-logs.github.io/composable-logs

License: MIT License

Makefile 3.19% Python 96.81%
ray dag python3 jupytext papermill mlops data-science datascience notebooks jupyter-notebooks

composable-logs's Introduction

Composable Logs

Composable Logs is a Python library to run ML/data workflows on stateless compute infrastructure (that may be ephemeral or serverless).

In particular, using Composable Logs one can do ML experiment tracking without a dedicated tracking server (and database) to record ML metrics, models or artifacts. Instead, these are emitted using the OpenTelemetry standard for logging. This is an open standard in software engineering with growing support.

It can be useful to think of the logs emitted by Composable Logs as similar to the logs emitted by unit test frameworks (for example, the JUnit format).

For example, log events emitted from Composable Logs can be directed to a JSON file, or sent to any log storage that supports OpenTelemetry (span) events. In either case, one does not need a separate tracking service just for ML experiments.
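As a rough illustration of the idea (this is neither the Composable Logs API nor the OpenTelemetry SDK), the sketch below records ML metrics as span-like events and appends them to a JSON Lines file using only the Python standard library. The function names `log_metric_event` and `read_events`, the field names, and the file name are all hypothetical.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def log_metric_event(log_path: Path, task: str, name: str, value: float) -> dict:
    """Append one span-like event (hypothetical schema) to a JSON Lines file."""
    event = {
        # Random 16-hex-character id, shaped like an OpenTelemetry span id.
        "span_id": uuid.uuid4().hex[:16],
        "name": task,
        "timestamp_ns": time.time_ns(),
        "attributes": {"metric.name": name, "metric.value": value},
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event


def read_events(log_path: Path) -> list:
    """Read all events back; no tracking server or database is involved."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical run: two metrics logged by a task named "train-model".
log_file = Path(tempfile.mkdtemp()) / "pipeline-run.jsonl"
log_metric_event(log_file, task="train-model", name="accuracy", value=0.93)
log_metric_event(log_file, task="train-model", name="loss", value=0.21)
events = read_events(log_file)
```

Because the log is a plain file, it can be stored as a build artifact and parsed later by any tool that reads JSON.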

A captured JSON log can be converted into a static website based on MLflow.

Composable Logs uses the Ray framework for parallel task execution.

For more details:

Documentation and architecture

Live demo

  • Using Composable Logs, one can run an ML training pipeline using only a free GitHub account. This uses:

    • GitHub Actions: to trigger the ML pipeline daily and for each PR.
    • Build artifacts: to store OpenTelemetry logs of past runs.
    • GitHub Pages: to host a static website for reporting on past runs.

    The static website is rebuilt after each pipeline run (by extracting relevant data from past OpenTelemetry logs). This uses a fork of MLflow that can be deployed as a static website, https://github.com/composable-logs/mlflow.

  • Code for the pipeline (MIT): https://github.com/composable-logs/mnist-digits-demo-pipeline

Public roadmap and planning

Install via PyPI

Latest release
Snapshot of latest commit to main branch

Any feedback/ideas welcome!

License

(c) Matias Dahl, MIT, see LICENSE.md.

(Note: As of January 2023, this project was renamed from pynb-dag-runner to composable-logs.)

composable-logs's People

Contributors

dependabot[bot], matiasdahl

composable-logs's Issues

`pynb-dag-runner` -> `composable-logs` package rename

Code

Switch output pypi libraries

mlflow

Tasks

Update package names

Documentation

Tasks

Get task context using explicit `get_task_context()` function

Tasks

Set up initial mkDocs documentation site

[devx, pynb-dag-runner + demo repos] Review docker, makefile, tmuxinator setups

mnist-demo repo

pynb-dag-runner repo

[mnist, main repo] Verify local notebook dev-setup

Check that local notebook dev works with both repos (in vscode dev container setup)

mnist-demo-pipeline

pynb-dag-runner package:

docs: Comparison with similar tools (if any?)

Currently the docs have an (incomplete) list of similar projects.

https://pynb-dag-runner.github.io/pynb-dag-runner/home/similar-projects/

From the page it is not clear whether there are alternatives to pynb-dag-runner. That is, are there other ways to run a notebook (or a pipeline of notebooks) serverless and report on the results without infrastructure?

Additional projects potentially related but not listed

open-telemetry/opentelemetry-python-contrib#151

pynb-dag-runner: Move to new span parser (pydantic, assumes one run per task)

Refactor (Python task/notebook) tests on evaluated spans

Fail if trying to execute pipeline with task with more than 1 retry setting

Implement new iterator for parsing spans

Move UI over to use data generated from new parser

Cleanup/minor

front end: Render Mermaid from source in UI (without conversion into png for front end)

Write mermaid files to static website

Render Mermaid files in UI
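A minimal sketch of what writing Mermaid source for a task DAG could look like, assuming the DAG is available as a list of (upstream, downstream) task-name pairs. The function name `dag_to_mermaid`, the task names, and the output file name `dag.mmd` are hypothetical and not taken from the project; rendering the source in the browser would be left to the Mermaid JavaScript library.

```python
import tempfile
from pathlib import Path


def dag_to_mermaid(edges):
    """Return Mermaid flowchart source for a list of (upstream, downstream) pairs."""
    lines = ["graph LR"]  # left-to-right flowchart
    for upstream, downstream in edges:
        lines.append(f"    {upstream} --> {downstream}")
    return "\n".join(lines)


# Hypothetical three-task pipeline.
mermaid_source = dag_to_mermaid([("ingest", "train"), ("train", "evaluate")])

# Write the source next to the other static assets (a temp directory here).
mermaid_file = Path(tempfile.mkdtemp()) / "dag.mmd"
mermaid_file.write_text(mermaid_source, encoding="utf-8")
```

Keeping the diagram as text avoids a png conversion step when the site is built, since the UI can render the source directly.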

new workflow API: support for pure Python tasks

Convert OpenTelemetry logs directly into static assets for web ui


Currently converting OpenTelemetry logs into static assets for the web UI is done in two phases. This could be simplified into a one-step process without an intermediate step.

Old approach

Step 1:

  • Each OpenTelemetry JSON is expanded into a directory structure of individual files.
  • For each run, this directory structure is stored as a build artifact

Step 2:

  • To generate static assets for the web UI, data is read from the expanded directory structures.
  • This requires a separate parser.

This approach was implemented before moving to graph-based OpenTelemetry parsing (since parsing was then really slow).
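The one-step alternative could be sketched roughly as below: spans are read once, grouped by run, and written directly as a single JSON asset for the web UI, with no expanded directory structure in between. This is only an illustration under stated assumptions. The span fields (`name`, `status`, `attributes.run_id`), the function name `spans_to_static_assets`, and the output file name `runs.json` are hypothetical and do not reflect the project's actual parser or data model.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def spans_to_static_assets(spans, out_dir: Path) -> Path:
    """Group spans by run and write one summary JSON file for a static UI."""
    runs = defaultdict(list)
    for span in spans:
        runs[span["attributes"]["run_id"]].append(span)

    summary = [
        {
            "run_id": run_id,
            "tasks": sorted({s["name"] for s in run_spans}),
            "all_ok": all(s["status"] == "OK" for s in run_spans),
        }
        for run_id, run_spans in sorted(runs.items())
    ]
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "runs.json"
    out_file.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return out_file


# Hypothetical spans from two pipeline runs.
example_spans = [
    {"name": "ingest", "status": "OK", "attributes": {"run_id": "run-1"}},
    {"name": "train", "status": "OK", "attributes": {"run_id": "run-1"}},
    {"name": "ingest", "status": "ERROR", "attributes": {"run_id": "run-2"}},
]
asset_path = spans_to_static_assets(example_spans, Path(tempfile.mkdtemp()) / "site")
summary = json.loads(asset_path.read_text(encoding="utf-8"))
```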


Start refactorings of cli-tools:

Revise no-link version of Mermaid DAG diagram

Create new cli generate_static_data

Switch to use latest version of pynb-dag-runner in mnist demo pipeline

Reorganize Github project issue tracker

Tasks

Create a pip-package with pynb-dag-runner core

Add README content to package PyPI description pages

Package pynb-dag-runner[-snapshot]

Works:


Package pynb-dag-runner-webui

Description and long descriptions updated:

Update READMEs in project git repositories

Modified MLflow repo

mnist-dag-runner repo

pynb-dag-runner main repo

Support logging with the MLflow client

Tasks

Support pure Python tasks

subtasks
