
opendatadiscovery-specification's Introduction


Open Data Discovery Specification (ODD Spec): A Universal Standard for Metadata Collection

Specification

  • specification.md is a versioned description of the current Open Data Discovery Standard.
  • A reference implementation of the Open Data Discovery Specification is available: Open Data Platform (ODD Platform).

Overview

ODD Spec is an open-source, industry-wide standard for collecting metadata. It provides a set of technologies to gather and export metadata from cloud-native applications, infrastructure, and other data sources so that the data can be discovered. The standard defines a schema for metadata collection and integrates with data tools through endpoints that receive metadata from them.

Data catalogs built on ODD Spec gain capabilities such as data federation, real end-to-end lineage, data quality assurance, company-wide observability, and discoverable ML assets.


Contributing

Contributions to ODD Spec are very welcome. For basic contributions, all you need is to be comfortable with GitHub and Git. The best ways to contribute are:

  • Work on new adapters
  • Work on documentation

To ensure equal and positive communication, we adhere to our Code of Conduct. Before interacting with this repository, please read it and make sure to follow it.

Before contributing, please check out our Contributing Guide and the issues labeled "good first issue".


License

ODD Spec uses the Apache 2.0 License.

opendatadiscovery-specification's People

Contributors

andreynenashev, damirabdul, dementevnikita, dliubimov, evanto, germanosin, ramandamayeu, rm-os, ryapparov, vixtir, vladysl, yaroslavbeshta

opendatadiscovery-specification's Issues

Is there any idea for RPC Data?

For a large existing system, it is difficult to apply data discovery at the database level. We would like to do the same thing at the RPC level, treating Response.Field the way ODD treats Table.Column. Has ODD considered this kind of data? Is there any experience with it?
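
To make the question concrete, here is a minimal sketch of the mapping being asked about. It is not part of ODD Spec: `FieldDescriptor`, `flatten_response`, and the sample `GetUserResponse` schema are hypothetical names used only to show how nested response fields could be flattened into column-like entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FieldDescriptor:
    """Hypothetical column-like record for one leaf field of an RPC response."""
    path: str       # dotted path, e.g. "GetUserResponse.address.city"
    type_name: str  # declared type of the field


def flatten_response(name: str, schema: dict) -> List[FieldDescriptor]:
    """Walk a nested {field: type-or-dict} schema and emit one entry per leaf."""
    fields: List[FieldDescriptor] = []
    for field_name, field_type in schema.items():
        path = f"{name}.{field_name}"
        if isinstance(field_type, dict):
            # Nested message: recurse and keep the parent path as a prefix.
            fields.extend(flatten_response(path, field_type))
        else:
            fields.append(FieldDescriptor(path, field_type))
    return fields


# Illustrative response schema (an assumption, not taken from any real service).
response_schema = {"id": "int64", "address": {"city": "string", "zip": "string"}}
columns = flatten_response("GetUserResponse", response_schema)
```

Each resulting entry plays the role that a column plays for a table, which is the analogy the question draws.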

DataQualityTest Categorization

Goal

The goal is to categorize tests into two distinct categories:

  1. Assertion Tests: These are tests that run at specific points in time and primarily validate specific conditions or behaviors.
  2. Anomaly Detection Tests: These tests are designed to identify anomalies or deviations in data, and their outcomes are influenced by the temporal aspects and lifetime of the data.

Decisions

  1. Because the property type is already used to specify the expectation type by name, we decided to introduce a new property, category.

Option 1: Categorizing Anomaly Detection Subtypes

In this option, we establish and classify common subtypes for anomaly detection.

Pros:

  • Simplifies the process of grouping tests by their subtypes because all possible values are predefined.

Cons:

  • Requires specification changes to incorporate new subtypes, which may involve additional administrative effort.

Specification:

...
DataQualityTestExpectationCategory:
    type: string
    enum:
      - ASSERTION
      - VOLUME_ANOMALY
      - FRESHNESS_ANOMALY
      - COLUMN_VALUES_ANOMALY
      - SCHEMA_CHANGE

DataQualityTestExpectation:
    type: object
    properties:
      type:
        type: string
        example: "expect_table_row_count_to_be_between"
      category:
        $ref: '#/components/schemas/DataQualityTestExpectationCategory'
    additionalProperties:
      type: string
...

Code Example:

# An anomaly detection test tagged with its predefined subtype
test_anomaly = DataQualityTestExpectation(
    type="volume_anomalies",
    category=DataQualityTestExpectationCategory.VOLUME_ANOMALY,
)

# A point-in-time assertion test
test_assertion = DataQualityTestExpectation(
    type="expect_table_row_count_to_be_between",
    category=DataQualityTestExpectationCategory.ASSERTION,
)
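
The grouping benefit listed under Pros can be sketched as follows. This is a minimal, self-contained illustration: the enum and dataclass below are local stand-ins that mirror the proposed schema, not classes from any published ODD package, and `group_by_category` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class DataQualityTestExpectationCategory(str, Enum):
    # Values copied from the proposed Option 1 enum.
    ASSERTION = "ASSERTION"
    VOLUME_ANOMALY = "VOLUME_ANOMALY"
    FRESHNESS_ANOMALY = "FRESHNESS_ANOMALY"
    COLUMN_VALUES_ANOMALY = "COLUMN_VALUES_ANOMALY"
    SCHEMA_CHANGE = "SCHEMA_CHANGE"


@dataclass
class DataQualityTestExpectation:
    # Local stand-in for the schema object of the same name.
    type: str
    category: DataQualityTestExpectationCategory


def group_by_category(tests):
    """Collect test type names under their predefined category."""
    grouped = defaultdict(list)
    for test in tests:
        grouped[test.category].append(test.type)
    return dict(grouped)


tests = [
    DataQualityTestExpectation(
        "volume_anomalies", DataQualityTestExpectationCategory.VOLUME_ANOMALY
    ),
    DataQualityTestExpectation(
        "expect_table_row_count_to_be_between",
        DataQualityTestExpectationCategory.ASSERTION,
    ),
]
groups = group_by_category(tests)
```

Because every subtype is an enum member, a consumer can group tests without inspecting the free-form type string.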

Option 2: Simplifying the Categorization

In this approach, we define only the main categories, and the specific type of a test, whether it's an ASSERTION or an ANOMALY_DETECTION, is determined by the DataQualityTestExpectation.type property.

Pros:

  • Offers flexibility as any value can be assigned to the DataQualityTestExpectation.type property, allowing for custom categorization.
  • Streamlines the process and avoids the need to create new subtypes for anomaly detection.

Cons:

  • May make it challenging to group anomaly tests by their subtypes since the categorization is solely dependent on the DataQualityTestExpectation.type property.

Specification:

...
DataQualityTestExpectationCategory:
    type: string
    enum:
      - ASSERTION
      - ANOMALY_DETECTION

DataQualityTestExpectation:
    type: object
    properties:
      type:
        type: string
        example: "expect_table_row_count_to_be_between"
      category:
        $ref: '#/components/schemas/DataQualityTestExpectationCategory'
    additionalProperties:
      type: string
...

Code Example:

# An anomaly detection test; its subtype is carried only by the type string
test_anomaly = DataQualityTestExpectation(
    type="volume_anomalies",
    category=DataQualityTestExpectationCategory.ANOMALY_DETECTION,
)

# A point-in-time assertion test
test_assertion = DataQualityTestExpectation(
    type="expect_table_row_count_to_be_between",
    category=DataQualityTestExpectationCategory.ASSERTION,
)
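
The trade-off listed under Cons can be illustrated with a short sketch. As before, the enum and dataclass are local stand-ins for the proposed schema rather than a published API, and "freshness_anomalies" is a made-up type name used only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class DataQualityTestExpectationCategory(str, Enum):
    # Values copied from the proposed Option 2 enum.
    ASSERTION = "ASSERTION"
    ANOMALY_DETECTION = "ANOMALY_DETECTION"


@dataclass
class DataQualityTestExpectation:
    # Local stand-in for the schema object of the same name.
    type: str
    category: DataQualityTestExpectationCategory


ANOMALY = DataQualityTestExpectationCategory.ANOMALY_DETECTION
ASSERTION = DataQualityTestExpectationCategory.ASSERTION

tests = [
    DataQualityTestExpectation("volume_anomalies", ANOMALY),
    DataQualityTestExpectation("freshness_anomalies", ANOMALY),  # hypothetical name
    DataQualityTestExpectation("expect_table_row_count_to_be_between", ASSERTION),
]

# Grouping by the coarse category is straightforward.
by_category = Counter(test.category for test in tests)

# Grouping anomaly tests by subtype has to fall back on the free-form type string.
anomaly_subtypes = sorted({test.type for test in tests if test.category is ANOMALY})
```

Since type accepts any string, two producers may name the same kind of anomaly differently, and the consumer has no predefined list to reconcile them against.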

Add a part to ingest relationships between data entities

We need to extend the spec so that relationships between data entities can be ingested:

  1. We assume that a relationship can exist only between 2 data entities (self-references included).
  2. For now we define only 2 types of relationship: between relations (tables, files, views, etc.) and between graph nodes.
  3. We assume the following attributes for each type:
    3.1 Between relations: 1) cardinality (One-to-Zero-One-or-More, One-to-One-or-More, One-to-Zero-or-One, One-to-Exactly-1); 2) identifying/non-identifying; 3) ODDRNs of the data entities at the beginning and end; 4) ODDRNs of the dataset fields at the beginning and end (composite foreign keys are possible);
    3.2 Between graph nodes: 1) relationship type (name); 2) direction; 3) list of attributes (key-value); 4) ODDRNs of the data entities at the beginning and end.
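
The attribute lists above could be modelled roughly as follows. This is a sketch under stated assumptions, not a proposed schema: all class and field names are hypothetical, representing direction as a boolean is one possible reading of "direction", and the ODDRN strings are illustrative only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Cardinality(str, Enum):
    # Values taken verbatim from item 3.1.
    ONE_TO_ZERO_ONE_OR_MORE = "One-to-Zero-One-or-More"
    ONE_TO_ONE_OR_MORE = "One-to-One-or-More"
    ONE_TO_ZERO_OR_ONE = "One-to-Zero-or-One"
    ONE_TO_EXACTLY_ONE = "One-to-Exactly-1"


@dataclass
class RelationRelationship:
    """Relationship between two relations (tables, files, views, etc.)."""
    source_oddrn: str
    target_oddrn: str
    cardinality: Cardinality
    is_identifying: bool
    # Lists rather than single values, so composite foreign keys fit.
    source_field_oddrns: List[str]
    target_field_oddrns: List[str]


@dataclass
class GraphRelationship:
    """Relationship between two graph nodes."""
    source_oddrn: str
    target_oddrn: str
    relationship_type: str
    is_directed: bool  # assumption: direction reduced to directed/undirected
    attributes: Dict[str, str] = field(default_factory=dict)


# Self-reference example: employees.manager_id points back at employees.id.
employees = "//postgresql/host/example/databases/hr/tables/employees"
self_reference = RelationRelationship(
    source_oddrn=employees,
    target_oddrn=employees,
    cardinality=Cardinality.ONE_TO_ZERO_OR_ONE,
    is_identifying=False,
    source_field_oddrns=[employees + "/columns/manager_id"],
    target_field_oddrns=[employees + "/columns/id"],
)
```

Both types carry exactly two entity ODDRNs, which matches the assumption in item 1 that a relationship connects only 2 data entities.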
