open-data-contract-standard's Introduction

Open Data Contract Standard (ODCS)
Home of Open Data Contract Standard (ODCS) documentation.

Welcome!

Thanks for your interest and for taking the time to come here! ❤️

Executive summary

This standard describes a structure for a data contract. Its current version is v2.2.2. It is available under the Apache 2.0 license. Contributions are welcome!

Discover the open standard

A reader-friendly version of the standard can be found on its dedicated site.

Discover the Open Data Contract Standard. This file contains some explanations and several examples. More examples can be found here.

What is a Data Contract?

The basics of a data contract

A data contract defines the agreement between a data producer and consumers. A data contract contains several sections:

Data contract schema

Figure 1: illustration of a data contract, its principal contributors, sections, and usage.

JSON Schema

JSON Schema for ODCS can be found here. You can import this schema into your IDE for validation of your YAML files. Links below show how you can import the schema:
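
Outside an IDE, the same kind of check can be scripted. The following is a minimal sketch, not a JSON Schema validator: it only understands the top-level `required` keyword and per-property `type` keywords. `TOY_SCHEMA` and `check_contract` are hypothetical names, and the toy schema is a stand-in for illustration, not the published ODCS schema. It assumes the YAML contract has already been parsed into a Python dict.

```python
"""Minimal sketch: check a parsed contract against a *subset* of JSON Schema.

Only the top-level "required" and "properties.<name>.type" keywords are
handled. TOY_SCHEMA is a hypothetical stand-in, not the published ODCS schema.
"""

# Map JSON Schema type names to Python types (subset only).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "array": list,
    "object": dict,
    "boolean": bool,
}

TOY_SCHEMA = {
    "type": "object",
    "required": ["kind", "apiVersion", "dataset"],
    "properties": {
        "kind": {"type": "string"},
        "apiVersion": {"type": "string"},
        "dataset": {"type": "array"},
    },
}


def check_contract(contract: dict, schema: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # First pass: every required key must be present.
    for key in schema.get("required", []):
        if key not in contract:
            problems.append(f"missing required field: {key}")
    # Second pass: present keys must have the declared type.
    for key, rule in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(rule.get("type"))
        if key in contract and expected and not isinstance(contract[key], expected):
            problems.append(f"field {key} should be of type {rule['type']}")
    return problems


# A contract as it might look after parsing the YAML file into a dict.
contract = {"kind": "DataContract", "dataset": "tbl"}
for problem in check_contract(contract, TOY_SCHEMA):
    print(problem)
```

For real validation, a full JSON Schema library used with the published schema file would cover the keywords this sketch ignores.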

Contributing to the project

Check out the CONTRIBUTING file.

Articles and Other Resources

If you spot an article about the Open Data Contract Standard, make a pull request!

More

History

Formerly known as the data contract template, this standard is used to implement Data Mesh at PayPal. Starting with v2.2.0, it is maintained by a 501(c)(6) non-profit organization called AIDA User Group (Artificial Intelligence, Data, and Analytics User Group). On November 30th, 2023, AIDA User Group and the Linux Foundation AI & Data joined forces to create Bitol. Bitol encompasses ODCS and future standards and tools.

How does PayPal use Data Contracts?

PayPal uses data contracts in many ways, but this article from the PayPal Technology blog gives a good introduction.

open-data-contract-standard's People

Contributors

bart-vee, caladogan, chrfoyer, destouma, fabiocarvalho777, guowj800, iliev01, jaterson, jeremyjiang126, jgperrin, johannesboyne, laveenakewlani, pflooky

open-data-contract-standard's Issues

Documentation Request: Example of a full contract (file or in readme)

Great work to the team and thanks for this effort to open source and maintain an open standard. I had a question/request on how to best tie the sections of contracts in the readme and example folders into a unitary document. Essentially in the contracts docs and examples section there are different component pieces of a data contract, but I wasn't clear on whether these are meant to be represented together as a unit or always within their own files. Would it be possible to get an end to end sample contract for a unified sample dataset people are familiar with, even if this is only in the readme?

I see sections are separated by # ... in the readme but perhaps a single unitary contract file or a section tying all the sections together could help drive it home.

Perhaps I've completely missed the plot and each section is meant to be its own file, but figured it was worth a shot.

Thank you!

[question] How to transform dbt yaml to open-data-contract

Not sure if this is the right place to ask, sorry for the inconvenience.

Since learning of the Open Data Contract Standard, I have been wondering whether there is a tool that can convert between dbt YAML files and the standard. Sure, there will be missing parts, but there is a lot of overlap.

Additionally, it would help to find more examples of the standard usage.

P.S. Bitol website is missing a link to the github organization repo.

v.3.0.0 Changes Proposal

3.0.0 PROPOSED VERSION NOTES

This version of the contract contains the following major and minor proposed changes (the 3.0.0 increment as of December 29th, 2023). I am using the full-contract.yaml file to make the changes evident, but have not yet propagated them to the other modules in my local branch.

MAJOR/BREAKING PROPOSALS (Breaking changes are called out in #NOTE comments)

  • Proposing to consolidate metadata into a _metadata yaml map object
  • Proposing to nest SLA information alongside the columns to which it applies.
  • Proposing to create new sections for the following areas (will be possible to consolidate into _metadata if that is desired)
    • access
    • sourceDetails
    • support

MINOR PROPOSALS

  • Proposing that the contract kind and apiVersion should be the first section
  • Proposing to consolidate all non-nested metadata fields at the root level to the top of the contract
  • Examine the possibility that alphabetical order be used to order sections (only relevant if _metadata section is adopted and consolidated)
# BASEMODEL:
# ODCS Metadata
kind: DataContract
apiVersion: 3.0.0 # Standard version (follows semantic versioning, previously known as templateVersion)

# NOTE: Possible breaking change proposal:
#   Subsequent sections would fall under this _metadata section. This section would inherently be sorted to the top, and could have sub sections within it relating to metadata
# Ex:
_metadata:
  datasetDomain: seller # Domain
  quantumName: my business data product name # Data product name
  userConsumptionMode: Analytical
  version: 1.1.0 # Version (follows semantic versioning)
  status: current
  uuid: 53581432-6c55-4ba2-a65f-72344a91553a
  type: tables
  tenant: ClimatentrsInc

# NOTE: Begin non _metadata section changes:
# Data Contract Metadata: What's this data contract about and whom/what domain does it apply to?
# Consolidated these fields from v.2.2.0 to reside together
datasetDomain: seller # Domain
quantumName: my business data product name # Data product name
userConsumptionMode: Analytical
version: 1.1.0 # Version (follows semantic versioning)
status: current
uuid: 53581432-6c55-4ba2-a65f-72344a91553a
type: tables
tenant: ClimatentrsInc
systemInstance: instance.ClimateQuantum.org
contractCreatedTs: 2022-11-15 02:59:43

# Access
access: # NOTE: New standalone section in v.3.0.0
  database: pypl-edw.pp_access_views
  password: "${env.password}"
  drivers:
    - driver: jdbc
      driverVersion: x.x.x
      driverUrl: urlToDriverVersion
  schedulerAppName: name_coming_from_scheduler # NEW 2.1.0 Required if you want to schedule stuff, comes from DataALM.
  server: null
  username: "${env.username}"

# Description: High level details on the data product/quantum
description:
  limitations: null
  purpose: Views built on top of the seller tables.
  notes: null # New optional field v.3.0.0
  usage: null

# Pricing (if any):
price:
  priceAmount: 9.95
  priceCurrency: USD
  priceUnit: megabyte

# Roles
roles:
  - role: microstrategy_user_opr
    access: read
    firstLevelApprovers: Reporting Manager
    secondLevelApprovers: "mandolorian"
  - role: bq_queryman_user_opr
    access: read
    firstLevelApprovers: Reporting Manager
    secondLevelApprovers: na
  - role: risk_data_access_opr
    access: read
    firstLevelApprovers: Reporting Manager
    secondLevelApprovers: "dathvador"
  - role: bq_unica_user_opr
    access: write
    firstLevelApprovers: Reporting Manager
    secondLevelApprovers: "mickey"

# Source: Details about each source, best practice is to add a source and not reuse source numbers.
sourceDetails: # NOTE: New standalone section in v.3.0.0
  - source: sourcename1
    contractPath: path/to/contract/if/local
    contractUrl: urlTosSourceContract
    datasetProject: edw
    datasetName: access_views
    sourcePlatform: googleCloudPlatform
    sourceNotes: Any extra source notes needed # New in v.3.0.0 This field is useful for adding color about the data source.
    sourceSystem: bigQuery
  - source: sourcename2
    contractPath: path/to/contract/if/local
    contractUrl: urlTosSourceContract
    datasetProject: mySnowflakeProject
    datasetName: my_snowflake_views
    sourcePlatform: azure
    sourceNotes: null
    sourceSystem: snowflake
  - source: sourcename3
    contractPath: null # Sometimes you may not have all the source details.
    contractUrl: null # Work with source teams to build these out.
    datasetProject: myS3LakeProject
    datasetName: my_raw_lake_files
    sourcePlatform: aws
    sourceNotes: Contract needed for this data source, must work with the source team
    sourceSystem: s3

# Stakeholders
stakeholders:
  - username: ceastwood
    role: Data Scientist
    dateIn: 2022-08-02
    dateOut: 2022-10-01
    replacedByUsername: mhopper
  - username: mhopper
    role: Data Scientist
    dateIn: 2022-10-01
    dateOut: null
    replacedByUsername: null
  - username: daustin
    role: Owner
    comment: Keeper of the grail
    dateIn: 2022-10-01
    dateOut: null
    replacedByUsername: null

# Support: How and where to get support for this data product.
support: # NOTE: New standalone section in v.3.0.0
  productDl: [email protected]
  productFeedbackUrl: null
  productSlackChannel: "#product-help"

# Tags
tags:
  - transactions

# Dataset, schema and quality, usually should come as the last section
dataset:
  - table: tbl
    physicalName: tbl_1 # NEW in v2.1.0, Optional, default value is table name + version separated by underscores, as table_1_2_0
    priorTableName: null # if needed
    description: Provides core payment metrics
    authoritativeDefinitions: # NEW in v2.2.0, inspired by the column-level authoritative links
      - url: https://catalog.data.gov/dataset/air-quality
        type: businessDefinition
      - url: https://youtu.be/jbY1BKFj9ec
        type: videoTutorial
    tags: null
    dataGranularity: Aggregation on columns txn_ref_dt, pmt_txn_id
    columns:
      - column: txn_ref_dt
        classification: null
        clusterKeyPosition: -1
        clusterStatus: false
        criticalDataElementStatus: false
        businessName: transaction reference date
        description: null
        encryptedColumnName: null
        isPrimary: false # NEW in v2.1.0, Optional, default value is false, indicates whether the column is primary key in the table.
        isNullable: false
        logicalType: date
        partitionKeyPosition: 1
        partitionStatus: true
        physicalType: date
        primaryKeyPosition: -1
        sampleValues:
          - 2022-10-03
          - 2020-01-28
        tags: null
        transformDescription: defines the logic in business terms; logic for dummies
        transformLogic: sel t1.txn_dt as txn_ref_dt from table_name_1 as t1, table_name_2 as t2, table_name_3 as t3 where t1.txn_dt=date-3
        transformSourceTables:
          - table_name_1
          - table_name_2
          - table_name_3
        # NOTE: Possible breaking change proposal:
        # slaProperties are now nested under the columns which they apply to.
        slaProperties: # consolidated to a column level attribute in v3.0.0
          - property: latency # Property, see list of values in DP QoS
            value: 4
            unit: d # d, day, days for days; y, yr, years for years
            column: txn_ref_dt # This would not be needed as it is the same table.column as the default one
          - property: generalAvailability
            value: 2022-05-12T09:30:10-08:00
          - property: endOfSupport
            value: 2032-05-12T09:30:10-08:00
          - property: endOfLife
            value: 2042-05-12T09:30:10-08:00
          - property: retention
            value: 3
            unit: y
            column: txn_ref_dt
          - property: frequency
            value: 1
            valueExt: 1
            unit: d
            column: txn_ref_dt
          - property: timeOfAvailability
            value: 09:00-08:00
            column: txn_ref_dt
            driver: regulatory # Describes the importance of the SLA: [regulatory|analytics|operational|...]
          - property: timeOfAvailability
            value: 08:00-08:00
            column: tab1.txn_ref_dt
            driver: analytics
      - column: rcvr_id
        isPrimary: true # NEW in v2.1.0, Optional, default value is false, indicates whether the column is primary key in the table.
        primaryKeyPosition: 1
        businessName: receiver id
        logicalType: string
        physicalType: varchar(18)
        isNullable: false
        description: A description for column rcvr_id.
        partitionStatus: false
        partitionKeyPosition: -1
        clusterStatus: true
        clusterKeyPosition: 1
        criticalDataElementStatus: false
        tags: null
        classification: null
        encryptedColumnName: null
        slaProperties: null
      - column: rcvr_cntry_code
        isPrimary: false # NEW in v2.1.0, Optional, default value is false, indicates whether the column is primary key in the table.
        primaryKeyPosition: -1
        businessName: receiver country code
        logicalType: string
        physicalType: varchar(2)
        isNullable: false
        description: null
        partitionStatus: false
        partitionKeyPosition: -1
        clusterStatus: false
        clusterKeyPosition: -1
        criticalDataElementStatus: false
        tags: null
        classification: null
        authoritativeDefinitions:
          - url: https://collibra.com/asset/742b358f-71a5-4ab1-bda4-dcdba9418c25
            type: businessDefinition
          - url: https://github.com/myorg/myrepo
            type: transformationImplementation
          - url: jdbc:postgresql://localhost:5432/adventureworks/tbl_1/rcvr_cntry_code
            type: implementation
        encryptedColumnName: rcvr_cntry_code_encrypted
        quality:
          - code: nullCheck
            customProperties:
              - property: FIELD_NAME
                value:
              - property: COMPARE_TO
                value:
              - property: COMPARISON_TYPE
                value: Greater than
            description: column should not contain null values
            dimension: completeness # dropdown 7 values
            templateName: NullCheck
            toolName: Elevate
            toolRuleName: DQ.rw.tab1_2_0_0.rcvr_cntry_code.NullCheck
            type: dataQuality
            severity: error
            businessImpact: operational
            scheduleCronExpression: 0 20 * * *
        slaProperties: null
    quality:
      - code: countCheck # Required, name of the rule
        businessImpact: operational # Optional NEW in v2.1.0
        description: Ensure row count is within expected volume range # Optional
        dimension: completeness # Optional
        scheduleCronExpression: 0 20 * * * # Optional NEW in v2.1.0 default schedule - every day at 20:00 UTC
        severity: error # Optional NEW in v2.1.0, default value is error
        templateName: CountCheck # NEW in v2.1.0 Required
        toolName: Elevate # Required
        toolRuleName: DQ.rw.tab1.CountCheck # NEW in v2.1.0 Optional (Available only to the users who can change in source code edition)
        type: reconciliation # Optional NEW in v2.1.0 default value for column level check - dataQuality and for table level reconciliation
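
To make the two breaking proposals concrete, here is a minimal Python sketch that reshapes a v2-style contract (already parsed into a dict) into the proposed layout. Several things are assumptions rather than part of the proposal: that the v2-style input keeps `slaProperties` as a root-level list whose entries may name a column as `table.column` or `column`, the exact contents of `METADATA_KEYS`, and the helper name `to_proposed_layout`. Key ordering is left to whatever serializes the result.

```python
"""Sketch: reshape a v2-style contract dict into the layout proposed above.

Assumptions (not confirmed by the proposal): the v2-style input keeps
slaProperties as a root-level list whose entries may name a column as
"table.column" or "column", and METADATA_KEYS lists the root fields to
move under _metadata.
"""
import copy

METADATA_KEYS = (
    "datasetDomain", "quantumName", "userConsumptionMode", "version",
    "status", "uuid", "type", "tenant",
)


def to_proposed_layout(contract: dict) -> dict:
    """Return a reshaped deep copy; the input dict is left untouched."""
    result = copy.deepcopy(contract)

    # Proposal 1: consolidate root-level metadata into a _metadata map.
    metadata = {k: result.pop(k) for k in METADATA_KEYS if k in result}
    if metadata:
        result["_metadata"] = metadata

    # Index columns by name. Simplification: column names are assumed to be
    # unique across tables; a fuller version would key on (table, column).
    columns = {}
    for table in result.get("dataset") or []:
        for col in table.get("columns") or []:
            columns[col["column"]] = col

    # Proposal 2: nest each SLA property under the column it applies to.
    unmatched = []
    for sla in result.pop("slaProperties", None) or []:
        name = str(sla.get("column", "")).split(".")[-1]
        if name in columns:
            target = columns[name]
            target["slaProperties"] = (target.get("slaProperties") or []) + [sla]
        else:
            unmatched.append(sla)
    if unmatched:
        result["slaProperties"] = unmatched  # keep what could not be placed
    return result


v2 = {
    "kind": "DataContract",
    "datasetDomain": "seller",
    "version": "1.1.0",
    "dataset": [{"table": "tbl", "columns": [{"column": "txn_ref_dt"}]}],
    "slaProperties": [
        {"property": "latency", "value": 4, "unit": "d", "column": "tbl.txn_ref_dt"},
        {"property": "generalAvailability", "value": "2022-05-12T09:30:10-08:00"},
    ],
}
v3 = to_proposed_layout(v2)
print(v3["_metadata"])
```

SLA properties without a column, such as `generalAvailability` here, stay at the root in this sketch; where they should live under the proposal is an open question.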

Contribution update request: Refresh link to Code of conduct and issues

Hi Team,

I want to contribute to the Open Data Contract Standard project. While reading Contributing.md, I noticed that some of the links point to old file locations and return a 404 error. It would be great, and would make contributing easier, if you could update the Contributing.md file with refreshed links.

  1. Code of conduct link: should be updated to point to the Code of Conduct (I couldn't find the link to suggest).
  2. Issues link: should be updated to "https://github.com/bitol-io/open-data-contract-standard/issues".

Best wishes,
Cem
