daflow's People

Contributors

abhioncbr, dependabot[bot]

daflow's Issues

Update README.md of the project.

Update the README.md files of the project's modules, and add a detailed README.md that covers building, deploying, and running the project.

Configure etl feed metrics stats publishing

Currently, etl_framework supports two mechanisms for publishing stats for an ETL feed:

  1. Entries in a Hive table for each feed run.
  2. Publishing stats to Prometheus.

Right now, stats publishing is tightly coupled to the feed code. It needs to be separated from the feed code so that stats are published according to job_static_param.
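
One possible shape for that separation is sketched below in Python. This is an illustration under stated assumptions, not DaFlow's actual code: the class names, the `stats_publishers` key inside job_static_param, and the in-memory stand-ins for Hive and Prometheus are all hypothetical.

```python
from abc import ABC, abstractmethod


class StatsPublisher(ABC):
    """Publishes the stats of one feed run to some backend."""

    @abstractmethod
    def publish(self, feed_name: str, stats: dict) -> None:
        ...


class HiveStatsPublisher(StatsPublisher):
    """Stand-in for writing one row per feed run to a Hive table."""

    def __init__(self):
        self.rows = []  # a real implementation would insert into Hive

    def publish(self, feed_name, stats):
        self.rows.append({"feed": feed_name, **stats})


class PrometheusStatsPublisher(StatsPublisher):
    """Stand-in for pushing feed metrics to Prometheus."""

    def __init__(self):
        self.metrics = {}  # a real implementation would push to Prometheus

    def publish(self, feed_name, stats):
        for key, value in stats.items():
            self.metrics[f"{feed_name}_{key}"] = value


# Hypothetical mapping from a config value to a publisher class.
PUBLISHERS = {"hive": HiveStatsPublisher, "prometheus": PrometheusStatsPublisher}


def build_publishers(job_static_param: dict) -> list:
    """Instantiate only the publishers named in the job's static parameters."""
    names = job_static_param.get("stats_publishers", [])  # assumed key name
    return [PUBLISHERS[name]() for name in names]


def run_feed(feed_name: str, publishers: list) -> dict:
    """The feed code only sees the StatsPublisher interface."""
    stats = {"rows_read": 100, "rows_written": 98}  # placeholder values
    for publisher in publishers:
        publisher.publish(feed_name, stats)
    return stats
```

With this layout the feed code never names a backend; a job that lists no publishers in its parameters simply publishes nothing.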

Add support for multiple feeds in ETL job.

Currently, multiple feeds are supported in extraction, but passing multiple feeds through the transformation stage and then loading them is not supported. Support for multiple feeds is required in every stage. A strategy is also needed for atomicity when a job handles multiple feeds.
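
As one candidate strategy for atomicity (an assumption for discussion, not a decision recorded in this issue), every feed could be transformed into a staging area first, and the target touched only after all feeds succeed. The Python sketch below uses plain dictionaries as hypothetical stand-ins for the feeds and the load target.

```python
def run_multi_feed_job(feeds: dict, transform, target: dict) -> list:
    """Transform every feed, then load all of them or none.

    feeds:     feed name -> list of extracted records
    transform: callable (feed_name, record) -> transformed record
    target:    dict standing in for the load destination
    """
    staged = {}

    # Phase 1: transform each feed into the staging area. Any exception
    # propagates from here, before the target has been modified.
    for name, records in feeds.items():
        staged[name] = [transform(name, record) for record in records]

    # Phase 2: publish all staged feeds together.
    target.update(staged)
    return sorted(staged)
```

In a real sink the second phase is itself several writes, so it would still need a mechanism such as a rename or partition swap to be all-or-nothing; the sketch only shows the staging idea.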

Support GraphQL in Schema Registry along with gRPC and Thrift.

Spline Support

Is your feature request related to a problem? Please describe.
Spline (https://absaoss.github.io/spline/) is a data lineage tracking and visualization tool for Apache Spark™. It would be good to analyze metrics together with the output of Spline.

Describe the solution you'd like
Integrate Spline with the DaFlow job flow.

Explore and support Apache Iceberg table format

Is your feature request related to a problem? Please describe.
Apache Iceberg is a new table format for large, slow-moving tabular data. The load stage of the ETL framework needs to support Iceberg.

Describe the solution you'd like
Explore the format and implement the code required to support it in the framework.

Support YAML-based DaFlow job configurations.

Is your feature request related to a problem? Please describe.
Currently, DaFlow job definitions are XML only. DaFlow should also accept job definitions in other formats; YAML is one of the popular ones.

Describe the solution you'd like
Build parser classes that read a DaFlow job definition from a YAML file.
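
A minimal Python sketch of how format-specific parsers could sit behind one entry point follows. Everything in it is illustrative: the `job` and `feed` element names, the `name` and `feeds` keys, and the function names are hypothetical and do not reflect DaFlow's real job schema. The YAML branch assumes the third-party PyYAML package is installed.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_xml_job(text: str) -> dict:
    """Parse a (hypothetical) XML job definition into a plain dict."""
    root = ET.fromstring(text)
    return {
        "name": root.get("name"),
        "feeds": [feed.get("name") for feed in root.iter("feed")],
    }


def parse_yaml_job(text: str) -> dict:
    """Parse the same (hypothetical) job shape from YAML."""
    import yaml  # PyYAML, third-party; imported lazily so XML-only use needs no extra dependency

    data = yaml.safe_load(text)
    return {
        "name": data["name"],
        "feeds": [feed["name"] for feed in data.get("feeds", [])],
    }


# File extension -> parser; both parsers return the same dict shape.
PARSERS = {".xml": parse_xml_job, ".yaml": parse_yaml_job, ".yml": parse_yaml_job}


def parse_job(filename: str, text: str) -> dict:
    """Pick a parser from the file extension and parse the job definition."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"unsupported job definition format: {suffix!r}")
    return PARSERS[suffix](text)
```

Because both parsers produce the same structure, the rest of the job flow would not need to know which format a definition was written in.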

Move project build from SBT to Maven

Is your feature request related to a problem? Please describe.
Currently, elt-framework is built with the SBT build tool. However, for managing a multi-module project, Maven is easier to use and more extensible. Moving the build from SBT to Maven is also much needed for refactoring the code.

Build easy demo for DaFlow usage.

Is your feature request related to a problem? Please describe.
DaFlow is a complex project with several modules built on several technologies, so it needs a simple, easy-to-follow usage showcase.

Describe the solution you'd like
A Docker container-based demo would be an easily achievable way to showcase DaFlow usage.

Add support for schema registry module with support for multiple versions of schema

A module is required for validating and transforming a feed schema. Maintaining the versions of a schema is one of the basic requirements. The schema should also be easily accessible from various endpoints using different access methods.

In the future, the schema registry framework could be extended to store data-type mappings for different vendors.
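
The versioning requirement can be illustrated with a small in-memory registry. This Python sketch is hypothetical throughout: the `SchemaRegistry` class, its method names, and the representation of a schema as a mapping from field name to Python type are assumptions made for the example, not the module's design.

```python
class SchemaRegistry:
    """Keeps every registered version of each feed's schema."""

    def __init__(self):
        # feed name -> list of schemas; list index + 1 is the version number
        self._schemas = {}

    def register(self, feed: str, schema: dict) -> int:
        """Store a schema and return its version; an unchanged schema keeps its version."""
        versions = self._schemas.setdefault(feed, [])
        if versions and versions[-1] == schema:
            return len(versions)
        versions.append(dict(schema))
        return len(versions)

    def get(self, feed: str, version=None) -> dict:
        """Return the requested version, or the latest when version is None."""
        versions = self._schemas[feed]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{feed} has no schema version {version}")
        return versions[version - 1]

    def validate(self, feed: str, record: dict, version=None) -> bool:
        """Check that a record has exactly the schema's fields with matching types."""
        schema = self.get(feed, version)
        if set(record) != set(schema):
            return False
        return all(isinstance(record[name], kind) for name, kind in schema.items())
```

For example, after registering `{"id": int}` and then `{"id": int, "note": str}` for the same feed, a record `{"id": 7}` validates against version 1 but not against the latest version.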
