field-level-mapping-template's Introduction

Field-level mapping template

The OCDS Field-Level Mapping Template is a tool that helps users to map data from their source systems to OCDS.

This repository contains Python code that generates CSV files, which are used to create some of the sheets in the Field-Level Mapping Template.
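The repository's generation code is not reproduced here. As a rough illustration of the general idea, the sketch below flattens a small, made-up schema fragment into one CSV row per field path, using only the standard library. The schema fragment, the `flatten` and `to_csv` helpers and the column names are all assumptions for illustration, not the repository's actual code or output format.

```python
import csv
import io

# A tiny, hypothetical schema fragment; the real OCDS schema is far larger.
SCHEMA = {
    "properties": {
        "ocid": {"type": "string"},
        "date": {"type": "string"},
        "tender": {
            "type": "object",
            "properties": {
                "id": {"type": ["string", "integer"]},
                "value": {
                    "type": "object",
                    "properties": {"amount": {"type": ["number", "null"]}},
                },
            },
        },
    }
}


def flatten(schema, prefix=""):
    """Yield (path, type) pairs for every field, depth-first."""
    for name, prop in schema.get("properties", {}).items():
        path = f"{prefix}/{name}" if prefix else name
        types = prop.get("type", [])
        if isinstance(types, str):
            types = [types]
        # Drop "null" so the column shows only the meaningful type(s).
        yield path, ", ".join(t for t in types if t != "null")
        if "properties" in prop:
            yield from flatten(prop, path)


def to_csv(schema):
    """Render the flattened fields as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["path", "type"])
    writer.writerows(flatten(schema))
    return buffer.getvalue()


print(to_csv(SCHEMA))
```

Each nested object contributes its own row and then its children's rows, so the CSV keeps the schema's hierarchy visible through the slash-separated paths.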

Issues for improving and generating the Field-Level Mapping Template are also logged here.

field-level-mapping-template's People

Contributors

dependabot[bot], jpmckinney, romifz, yolile

Forkers

romifz

field-level-mapping-template's Issues

Replace main script with manage.py CLI

The CLI can accept parameters that currently need to be edited in the code:

  • Language
  • OCDS version
  • Extensions (and versions)

The CLI can be consistent with ocdsextensionregistry (for example, in how extension versions are specified).
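A minimal sketch of such a CLI using the standard library's `argparse`. The option names (`--language`, `--ocds-version`, `--extension`) and the default values are hypothetical, and writing extension versions as `ID==VERSION` is an assumption about what "consistent with ocdsextensionregistry" would look like, not a confirmed interface.

```python
import argparse


def build_parser():
    # Hypothetical option names and defaults; the real manage.py may differ.
    parser = argparse.ArgumentParser(
        prog="manage.py", description="Generate the mapping template CSV files."
    )
    parser.add_argument("--language", default="en", help="language code, e.g. en or es")
    parser.add_argument("--ocds-version", default="1.1.5", help="OCDS version")
    parser.add_argument(
        "--extension",
        action="append",
        default=[],
        metavar="ID[==VERSION]",
        help="extension to include; repeat the option for several extensions",
    )
    return parser


def parse_extensions(values):
    """Split 'id==version' strings into a dict; a bare id maps to None (default version)."""
    extensions = {}
    for value in values:
        extension_id, _, version = value.partition("==")
        extensions[extension_id] = version or None
    return extensions


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.language, args.ocds_version, parse_extensions(args.extension))
```

For example, `python manage.py --language es --extension lots==v1.1.4 --extension bids` would select Spanish, the default OCDS version, a pinned version of one extension and the default version of another (the extension identifiers and version here are only placeholders).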

Difference between mapping template in Spanish and English

Hi!
I was looking at the latest mapping template and found some differences between the Spanish and English versions. For example, the sheet "(OCDS) 1. General (todas las etapas)" of [ES] OCDS - Field Level Mapping Template for OCDS 1.1 (Template Version 0.9) #public does not mention the field contracts/implementation/transactions/payer, whereas OCDS - Field Level Mapping Template for OCDS 1.1 (Template Version 0.9) #public does.

What should we do in this case?

Indicate when data types are misaligned between source field and OCDS field

Reported in CRM-5463 by @mrshll1001

During the LIFT Learning Circle, someone asked whether the template could flag misaligned data types for given fields. For example, if (for some reason) a publisher's database stored a date as a string or a number (a UNIX epoch timestamp) and it was mapped to an OCDS field that requires a date, could the template flag this visually?

This would help highlight whether the publisher or implementer needs to do additional work. In the example given, they would need to parse the value before entering it into the template, which is an extra step.
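In the spreadsheet itself this would most likely be a conditional formatting rule; the sketch below only shows the comparison logic such a rule could encode. The compatibility table and the type names in it are assumptions for illustration, not the template's actual data type list.

```python
# Hypothetical compatibility table: which source data types can populate an
# OCDS field of a given type without conversion. Type names are illustrative.
COMPATIBLE = {
    "string": {"string"},
    "date-time": {"date-time"},
    "number": {"number", "integer"},
    "integer": {"integer"},
    "boolean": {"boolean"},
}


def needs_conversion(source_type, ocds_type):
    """Return True if the source type is not directly compatible with the OCDS type."""
    # Unknown OCDS types are compatible only with an identically named source type.
    return source_type not in COMPATIBLE.get(ocds_type, {ocds_type})


# A date stored as a UNIX epoch number, mapped to an OCDS date-time field.
print(needs_conversion("integer", "date-time"))  # True
print(needs_conversion("date-time", "date-time"))  # False
```

A row for which `needs_conversion` is true is the kind of row that could be highlighted, signalling that the value has to be parsed or converted before it is published.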

Auto-generation

Originally reported in #5537 by @duncandewhurst:

Adapt the automated process under development for the OC4IDS 0.9.2 mapping template (CRM-4378), so that it can be used to automate generation of the OCDS 1.1.5, 1.2 and OCDS for PPPs (CRM-5230) mapping templates.

Document testing approach

Since the output is expected to change regularly, we can document an approach of:

  1. Generate the original output
  2. Copy to a new location
  3. Make code changes
  4. Generate the new output
  5. Diff
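The diff step can be done with any directory diff tool. As one possible sketch, the function below compares two directories of generated CSV files with the standard library's `difflib`; the assumption that the output is a flat directory of `*.csv` files is illustrative, not a description of the repository's layout.

```python
import difflib
from pathlib import Path


def read_lines(path):
    """Read a file as a list of lines; a missing file is treated as empty."""
    return path.read_text(encoding="utf-8").splitlines(keepends=True) if path.exists() else []


def diff_outputs(old_dir, new_dir, pattern="*.csv"):
    """Return a unified diff of every generated file that differs between two directories."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    names = sorted(
        {p.name for p in old_dir.glob(pattern)} | {p.name for p in new_dir.glob(pattern)}
    )
    chunks = []
    for name in names:
        old_file, new_file = old_dir / name, new_dir / name
        # A file present on only one side shows up as fully added or removed.
        chunks.extend(
            difflib.unified_diff(
                read_lines(old_file),
                read_lines(new_file),
                fromfile=str(old_file),
                tofile=str(new_file),
            )
        )
    return "".join(chunks)
```

Calling `diff_outputs` on the copy from step 2 and the regenerated output from step 4 returns an empty string when nothing changed, so the result can be printed for review or checked in a script.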

Consider adding a column on coverage

During an ongoing OCDS implementation, significant issues have been encountered with missing values in mapped fields. This has affected the initial usability check, likely making it less informative after data publication.

To address this, a new "coverage" column could be added for each mapped field, showing the percentage of records in which the field is populated. This would be a useful indication when performing usability checks before OCDS implementation. The coverage column would be optional and should be filled in only when feasible.
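As a minimal sketch of how a coverage value could be computed from a sample of source records, the function below counts a field as populated when its value is neither missing, `None`, nor an empty string. That definition of "populated" and the field names in the example are assumptions for illustration.

```python
def coverage(records, field):
    """Percentage of records in which `field` has a non-empty value."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(field) not in (None, ""))
    return 100 * filled / len(records)


# Hypothetical extract of a source table.
records = [
    {"tender_id": "T-1", "award_date": "2023-01-10"},
    {"tender_id": "T-2", "award_date": ""},
    {"tender_id": "T-3"},
    {"tender_id": "T-4", "award_date": "2023-03-02"},
]
print(coverage(records, "tender_id"))  # 100.0
print(coverage(records, "award_date"))  # 50.0
```

A publisher could run a check like this on a database extract and copy the resulting percentages into the optional column.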

Add usage documentation

At present I don’t know what to do with this repository’s output.

(Also, when switching fully to ocdsextensionregistry in a recent commit, some extension rows were re-ordered in the output. I don’t know if this matters, since I don’t know how the output is used.)

Add a column or a sheet to support mapping from Source to OCDS

The present template comes from the perspective of starting with the OCDS schema, and identifying, for each of its fields, which source system field populates it.

We want to additionally support the perspective of starting with a source system, and identifying, for each of its fields, which OCDS field it populates.

We can do this by adding a new sheet, or by adding extra columns to the Source Fields sheet, for the source-to-OCDS mapping.

Ideally, if you enter a mapping in one sheet, it will appear on the other sheet in a new column for the effective mapping. (If a mapping is entered on both sheets and the mapping conflicts, then that new column can say as much.)

Based on discussion in CRM-5938 opened by @duncandewhurst.
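The "effective mapping" column described above amounts to merging the two mapping directions and reporting conflicts. The sketch below shows that logic in Python under a simplifying assumption: each OCDS path is populated by at most one source field on each sheet. The function name, the conflict wording and the example field names are hypothetical.

```python
def effective_mapping(ocds_to_source, source_to_ocds):
    """Combine both mapping directions into one effective mapping per OCDS path.

    ocds_to_source: {ocds_path: source_field}, from the OCDS-oriented sheet.
    source_to_ocds: {source_field: ocds_path}, from the source-oriented sheet.
    Returns {ocds_path: source_field}, or a conflict message if the sheets disagree.
    """
    # Invert the source-to-OCDS mapping so both are keyed by OCDS path.
    # Assumes at most one source field per OCDS path on this sheet.
    from_source_sheet = {path: source for source, path in source_to_ocds.items()}
    result = {}
    for path in sorted(ocds_to_source.keys() | from_source_sheet.keys()):
        a, b = ocds_to_source.get(path), from_source_sheet.get(path)
        if a and b and a != b:
            result[path] = f"CONFLICT: {a} vs {b}"
        else:
            result[path] = a or b
    return result


print(effective_mapping(
    {"tender/id": "TENDER_NO", "buyer/name": "AGENCY"},
    {"TENDER_NO": "tender/id", "DEPT_NAME": "buyer/name", "AWARD_DT": "awards/date"},
))
```

A mapping entered on only one sheet passes through unchanged, matching entries agree, and disagreeing entries are flagged rather than silently resolved.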

Update schema tab with the latest version

The schema tab of the mapping template contains deprecated fields that are not marked as deprecated, and some of its reference links are invalid.

I've found that these fields are deprecated:

buyer/additionalIdentifiers
contracts/implementation/transactions/payee/additionalIdentifiers
contracts/implementation/transactions/payer/additionalIdentifiers
tender/procuringEntity/additionalIdentifiers
tender/tenderers/additionalIdentifiers
contracts/implementation/milestones/documents
contracts/milestones/documents
planning/milestones/documents
tender/milestones/documents
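One way to keep the schema tab current would be to derive deprecation flags from the schema's own metadata instead of maintaining them by hand. OCDS schemas mark deprecated properties with a `deprecated` object; the sketch below walks a much-reduced, made-up fragment in that style, following local `$ref`s and array `items`, and lists the deprecated paths. The fragment and helper names are assumptions, and the walker assumes the schema has no circular references.

```python
# A hypothetical, much-reduced fragment in the style of the OCDS release schema.
SCHEMA = {
    "properties": {
        "buyer": {"$ref": "#/definitions/OrganizationReference"},
        "tender": {
            "type": "object",
            "properties": {
                "milestones": {"type": "array", "items": {"$ref": "#/definitions/Milestone"}},
            },
        },
    },
    "definitions": {
        "OrganizationReference": {
            "properties": {
                "name": {"type": "string"},
                "additionalIdentifiers": {"type": "array", "deprecated": {"deprecatedVersion": "1.1"}},
            }
        },
        "Milestone": {
            "properties": {
                "title": {"type": "string"},
                "documents": {"type": "array", "deprecated": {"deprecatedVersion": "1.1"}},
            }
        },
    },
}


def resolve(schema, node):
    """Follow local "$ref" pointers such as "#/definitions/Milestone"."""
    while "$ref" in node:
        target = schema
        for part in node["$ref"].lstrip("#/").split("/"):
            target = target[part]
        node = target
    return node


def deprecated_paths(schema, node=None, prefix=""):
    """Yield the path of every deprecated field (assumes no circular $refs)."""
    node = resolve(schema, schema if node is None else node)
    for name, prop in node.get("properties", {}).items():
        path = f"{prefix}/{name}" if prefix else name
        if "deprecated" in prop:
            yield path
            continue  # Children of a deprecated field are not listed separately.
        prop = resolve(schema, prop)
        if "items" in prop:
            prop = resolve(schema, prop["items"])
        yield from deprecated_paths(schema, prop, path)


print(list(deprecated_paths(SCHEMA)))
```

With the full release schema loaded via `json.load`, the same walk would produce the list of paths to mark as deprecated whenever the template is regenerated.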

Some examples of invalid links

cc @jpmckinney @yolile

Add a 'publish' field to the source fields sheet

According to the mapping guidance:

The preferred approach is to eventually list all the data elements within your data sources in your Field-Level Mapping, decide whether to publish each, and then map each.

We should add a field to the source fields sheet so that publishers can indicate whether they plan to publish each field. This will help analysts avoid spending time advising on how to map fields that are not planned for publication.
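As a sketch of how such a column could be used downstream, the function below reads a hypothetical CSV export of the Source Fields sheet and keeps only the fields marked for publication. The column names and the allowed values ("Yes", "No", "Undecided") are assumptions, not the template's actual design.

```python
import csv
import io

# Hypothetical export of the Source Fields sheet with a new "publish" column.
SOURCE_FIELDS = """field,description,publish
TENDER_NO,Tender number,Yes
AGENCY,Procuring agency,Yes
OFFICER_PHONE,Internal contact phone,No
BUDGET_LINE,Budget line,Undecided
"""


def fields_to_map(text):
    """Return source fields whose publish value is 'Yes' (case-insensitive)."""
    rows = csv.DictReader(io.StringIO(text))
    return [row["field"] for row in rows if row["publish"].strip().lower() == "yes"]


print(fields_to_map(SOURCE_FIELDS))  # ['TENDER_NO', 'AGENCY']
```

Treating anything other than an explicit "Yes" as not yet planned keeps undecided fields out of the analyst's queue until the publisher makes a decision.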

Add "assess ability to calculate indicators" functionality to mapping templates

We have a set of procurement indicators mapped to OCDS fields.

We could add functionality to the mapping spreadsheet so that for a completed spreadsheet, publishers could generate a list of indicators that they can calculate with current fields, and a list of additional fields they need to publish in order to calculate the remaining indicators.

The latter could help feed into prioritisation of sourcing and publishing any currently-missing fields.
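The assessment described above is essentially a set difference between the fields each indicator needs and the fields the publisher has mapped. The sketch below shows that logic; the indicator names and their required fields are made up for illustration and are not taken from the actual indicator mapping.

```python
# Hypothetical indicator definitions: each indicator lists the OCDS fields it needs.
INDICATORS = {
    "Share of single-bid tenders": {"tender/numberOfTenderers", "tender/procurementMethod"},
    "Average award value": {"awards/value/amount", "awards/value/currency"},
    "Time to award": {"tender/tenderPeriod/endDate", "awards/date"},
}


def assess(indicators, mapped_fields):
    """Split indicators into calculable ones and, for the rest, the fields still missing."""
    calculable, missing = [], {}
    for name, required in indicators.items():
        gap = required - mapped_fields
        if gap:
            missing[name] = sorted(gap)
        else:
            calculable.append(name)
    return calculable, missing


mapped = {
    "tender/numberOfTenderers",
    "tender/procurementMethod",
    "awards/date",
    "awards/value/amount",
}
calculable, missing = assess(INDICATORS, mapped)
print(calculable)
print(missing)
# The union of missing fields is the list that could feed into prioritisation.
print(sorted(set().union(*missing.values())))
```

Keeping the per-indicator gaps, rather than only the overall union, shows which single field would unlock the most indicators.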

Data Type: Add boolean and time, remove long and double

time is relevant, as some data sources store date and time as separate elements.

Also, I'm not sure why we have long and double, since they are just wider-range integer and real number types. (Databases have many more numeric types that aren't listed: smallint, bigint, serial, etc.)

Decimal (also "numeric") is also a distinct type. It's more precise than float (which can have floating-point errors). Not sure if we care about retaining that distinction.

Note that different contexts use different words. Not sure which context makes the most sense. Current choice in bold.

  • character (ANSI SQL); char, varchar, text (non-standard SQL); string (JSON)
  • enum (non-standard SQL); codelist (OCDS)
  • float, real (ANSI SQL); number (JSON)
  • timestamp (ANSI SQL); datetime (non-standard SQL); date-time (JSON Schema)

These don't have such issues:

  • boolean
  • date
  • time
  • integer
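If the template settles on one name per group, a synonym table like the one sketched below could normalize whatever type names publishers enter. The canonical name chosen for each group here is an assumption (the current choices are not recoverable from this text), and folding long and double into integer and number reflects the proposal above rather than a decision.

```python
# Hypothetical canonical names (keys) and the synonyms each absorbs (values).
CANONICAL_TYPES = {
    "string": {"character", "char", "varchar", "text", "string"},
    "codelist": {"enum", "codelist"},
    "number": {"float", "real", "number", "double"},
    "datetime": {"timestamp", "datetime", "date-time"},
    "integer": {"integer", "int", "smallint", "bigint", "long", "serial"},
    "boolean": {"boolean"},
    "date": {"date"},
    "time": {"time"},
}

# Invert the table so each synonym points to its canonical name.
SYNONYMS = {alias: canonical for canonical, aliases in CANONICAL_TYPES.items() for alias in aliases}


def normalize_type(name):
    """Return the canonical data type for a type name, or None if it is not recognized."""
    return SYNONYMS.get(name.strip().lower())


print(normalize_type("VARCHAR"))  # string
print(normalize_type("double"))  # number
```

Unrecognized names such as decimal return `None`, which leaves the open question about keeping decimal distinct from float to be decided explicitly.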

Mapping statistics hidden and broken formulae

The mapping statistics section in the public version of the template was hidden and the formulae were broken.

As a temporary measure, I've fixed this in the public version of the template, but when we next generate the template we should check this section works before publishing it.

Edit: I've also applied the fix to the Spanish version of the template.
