open-contracting / field-level-mapping-template

Collects issues for the improvement and generation of the Field-Level Mapping Template.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

field-level-mapping-template's Introduction

Field-level mapping template

The OCDS Field-Level Mapping Template is a tool that helps users to map data from their source systems to OCDS.

This repository contains Python code used to generate CSV files, which are used to create some of the sheets in the Field Level Mapping Template.

Issues for the improvement and generation of the Field Level Mapping Template are also logged here.

field-level-mapping-template's Issues

Align language with OCDS implementation guidance and add link to guidance

Update the template to replace 'Source systems' with 'Data sources' and 'Source fields' with 'Data elements'.

Add a link to the implementation guidance on the Field Level Mapping Overview sheet.

Update lookup formulae for field titles and descriptions in (OCDS) sheets

The formulae evaluate to a #REF! error when the template is opened in MS Excel.

Replace main script with manage.py CLI

The CLI can accept parameters that currently need to be edited in the code:

Language
OCDS version
Extensions (and versions)

The CLI interface can be consistent with ocdsextensionregistry (for example, how extension versions are specified).
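A minimal sketch of what such a CLI could look like, using only the standard library. The option names (`--language`, `--ocds-version`) and the `id==version` form for extensions are assumptions made for illustration; the issue only lists the parameters and does not define an interface.

```python
import argparse


def build_parser():
    # Hypothetical option names: the issue only says that language, OCDS version
    # and extensions should become parameters instead of edits to the code.
    parser = argparse.ArgumentParser(description="Generate CSV files for the mapping template")
    parser.add_argument("--language", default="en", help="language code for titles and descriptions")
    parser.add_argument("--ocds-version", default="1.1", help="OCDS version to generate for")
    parser.add_argument(
        "extensions",
        nargs="*",
        help="extensions to include, written as 'id' or 'id==version' (assumed convention)",
    )
    return parser


def parse_extension(spec):
    """Split 'id==version' into (id, version); version is None when omitted."""
    identifier, _, version = spec.partition("==")
    return identifier, version or None
```

With this sketch, `build_parser().parse_args(["--language", "es", "lots==v1.1.4"])` would carry the language and one extension specifier, which `parse_extension` splits into an identifier and a version.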

Difference between mapping template in Spanish and English

Hi!
I was looking at the latest mapping template and found some differences between the Spanish and English versions. For example, the sheet "(OCDS) 1. General (todas las etapas)" of [ES] OCDS - Field Level Mapping Template for OCDS 1.1 (Template Version 0.9) #public does not mention the field contracts/implementation/transactions/payer, whereas the OCDS - Field Level Mapping Template for OCDS 1.1 (Template Version 0.9) #public does.

What should we do in this case?

Provide glossary and/or clarify that description fields are generated from the schema

Feedback from a publisher suggested adding a glossary to avoid misinterpretation of terminology by non-English speakers.

This suggests it's not clear that the "description" fields are drawn from the schema itself. We should clarify this in the "Field Level Mapping Overview" tab and in the Mapping Template Guidance.

Visual improvements

Format the schema and codelists sheets so that they are more user friendly (use word-wrap etc.)

First bullet originally reported in CRM-5537 by @duncandewhurst

Remove the QUERY function from the (Source) 2. Fields sheet

This function breaks loading the template in MS Excel.

Indicate when data types are misaligned between source field and OCDS field

Reported in CRM-5463 by @mrshll1001

During the LIFT Learning Circle, someone asked whether the template could flag misaligned data types for given fields. For example, if (for some reason) a publisher's database stored a date as a String or Number (UNIX Epoch) and then mapped it to an OCDS field that required "Date" as a data type, could there be a way of visually flagging it?

This would be useful to highlight whether the publisher or implementer needs to do any additional work. In the example given, they'd need to parse the value before putting it into a template, which is an extra step.
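The check being asked for can be sketched outside the spreadsheet as a simple comparison of the two data type columns. The row structure and key names below (`source_type`, `ocds_type`) are hypothetical and not taken from the template.

```python
def flag_type_mismatches(rows):
    """Return the mapping rows whose source data type differs from the OCDS data type."""
    return [
        row for row in rows
        # Compare case-insensitively so that "Date" and "date" count as aligned.
        if row["source_type"].strip().lower() != row["ocds_type"].strip().lower()
    ]


# Hypothetical mapping rows, for illustration only.
rows = [
    {"source_field": "proc.start", "source_type": "String", "ocds_path": "tender/tenderPeriod/startDate", "ocds_type": "Date"},
    {"source_field": "proc.end", "source_type": "date", "ocds_path": "tender/tenderPeriod/endDate", "ocds_type": "Date"},
]
mismatches = flag_type_mismatches(rows)
```

In a spreadsheet, the same comparison could presumably drive conditional formatting on the mapped row.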

Auto-generation

Originally reported in #5537 by @duncandewhurst:

Adapt the automated process under development for the OC4IDS 0.9.2 mapping template (CRM-4378), so that it can be used to automate generation of the OCDS 1.1.5, 1.2 and OCDS for PPPs (CRM-5230) mapping templates.

Consider how to incorporate usability checks

From 2020-02 Bogota OCDS retreat.

Document testing approach

Since the output is expected to change regularly, we can document an approach of:

Generate the original output
Copy to a new location
Make code changes
Generate the new output
Diff
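The final "Diff" step could be sketched as follows, assuming the original and new outputs are CSV files written to two separate directories. The function name and the directory layout are illustrative assumptions, not part of the repository.

```python
import difflib
from pathlib import Path


def diff_outputs(old_dir, new_dir):
    """Return unified diff lines for every CSV file that differs between two directories."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    # Consider files present in either directory, so added and removed files show up too.
    names = sorted({p.name for p in old_dir.glob("*.csv")} | {p.name for p in new_dir.glob("*.csv")})
    lines = []
    for name in names:
        old, new = old_dir / name, new_dir / name
        old_lines = old.read_text(encoding="utf-8").splitlines(keepends=True) if old.exists() else []
        new_lines = new.read_text(encoding="utf-8").splitlines(keepends=True) if new.exists() else []
        # unified_diff yields nothing when the two files are identical.
        lines.extend(difflib.unified_diff(old_lines, new_lines, fromfile=str(old), tofile=str(new)))
    return lines
```

An empty result means the code changes did not alter the output; otherwise the returned lines can be printed and reviewed like any other diff.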

Consider adding a column on coverage

During an ongoing OCDS implementation, significant issues have been encountered with missing values in mapped fields. This has affected the initial usability check, likely making it less informative after data publication.

To address this, a new column on "coverage" could be added for each mapped field. This column would reflect the coverage percentage for each field, which would be a useful indication when producing usability checks before OCDS implementation. The coverage column would be optional and should be filled out only when feasible.
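As a rough illustration of the value such a column might hold, coverage can be computed as the share of records with a non-empty value for a field. Treating `None` and the empty string as missing is an assumption; a publisher's data source may use other placeholders.

```python
def coverage_percentage(values):
    """Percentage of records that have a non-empty value for one source field."""
    if not values:
        return 0.0
    # Assumption: None and "" are the only representations of a missing value.
    present = sum(1 for value in values if value not in (None, ""))
    return round(100 * present / len(values), 1)
```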

Add usage documentation

At present I don’t know what to do with this repository’s output.

(Also, when switching fully to ocdsextensionregistry in a recent commit, some extension rows were re-ordered in the output. I don’t know if this matters, since I don’t know how the output is used.)

Remove the "Mapped" column from the "Source Fields" sheet

Originally reported in CRM-5537 by @duncandewhurst

The "Mapping details" column makes it redundant.

Use OCDS Kit as library instead of as CLI

Add a column or a sheet to support mapping from Source to OCDS

The present template comes from the perspective of starting with the OCDS schema, and identifying, for each of its fields, which source system field populates it.

We want to additionally support the perspective of starting with a source system, and identifying, for each of its fields, which OCDS field it populates.

We can do this by adding a new sheet, or by adding extra columns to the Source Fields sheet, for the source-to-OCDS mapping.

Ideally, if you enter a mapping in one sheet, it will appear on the other sheet in a new column for the effective mapping. (If a mapping is entered on both sheets and the mapping conflicts, then that new column can say as much.)

Based on discussion in CRM-5938 opened by @duncandewhurst.
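The "effective mapping" column could work along these lines. This is a sketch under one assumed reading of "conflict": the two sheets name different source fields for the same OCDS field. The dictionary shapes and field names are hypothetical.

```python
def effective_mappings(ocds_sheet, source_sheet):
    """
    Merge mappings entered from both perspectives.

    ocds_sheet: {ocds_path: source_field}, as entered on the OCDS sheets.
    source_sheet: {source_field: ocds_path}, as entered on the Source Fields sheet.
    Returns {ocds_path: source_field}, or "CONFLICT" where the sheets disagree.
    """
    effective = dict(ocds_sheet)
    for source_field, ocds_path in source_sheet.items():
        existing = effective.get(ocds_path)
        if existing is None:
            # Only entered on the Source Fields sheet: carry it over.
            effective[ocds_path] = source_field
        elif existing != source_field:
            # Both sheets map this OCDS field, but to different source fields.
            effective[ocds_path] = "CONFLICT"
    return effective
```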

Update schema tab with the latest version

The mapping template's schema tab contains deprecated fields that are left unchecked, and the reference links are invalid.

I've found that these fields are deprecated:

buyer/additionalIdentifiers
contracts/implementation/transactions/payee/additionalIdentifiers
contracts/implementation/transactions/payer/additionalIdentifiers
tender/procuringEntity/additionalIdentifiers
tender/tenderers/additionalIdentifiers
contracts/implementation/milestones/documents
contracts/milestones/documents
planning/milestones/documents
tender/milestones/documents

Some examples of invalid links

cc @jpmckinney @yolile
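A list like the one above could be produced by walking the schema rather than by inspection. The sketch below assumes a dereferenced schema (all `$ref` already resolved) in which deprecated properties carry a `deprecated` key; the tiny schema used here is made up for illustration.

```python
def deprecated_paths(schema, prefix=""):
    """Yield slash-separated paths of properties that carry a "deprecated" key."""
    for name, subschema in schema.get("properties", {}).items():
        path = f"{prefix}/{name}" if prefix else name
        if "deprecated" in subschema:
            yield path
        # Descend into nested objects, and into arrays of objects via "items".
        yield from deprecated_paths(subschema, path)
        yield from deprecated_paths(subschema.get("items", {}), path)


# Hypothetical, heavily trimmed schema for illustration only.
example_schema = {
    "properties": {
        "buyer": {
            "properties": {
                "name": {"type": "string"},
                "additionalIdentifiers": {"type": "array", "deprecated": {"deprecatedVersion": "1.1"}},
            }
        },
        "tender": {
            "properties": {
                "milestones": {
                    "type": "array",
                    "items": {"properties": {"documents": {"deprecated": {"deprecatedVersion": "1.1"}}}},
                }
            }
        },
    }
}
found = list(deprecated_paths(example_schema))
# found == ["buyer/additionalIdentifiers", "tender/milestones/documents"]
```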

Add dropdown for System column on Fields sheet (sourced from Systems sheet)

Originally reported in CRM-5463 by @pindec:

Another feature request I noted in Kenya last week:
"(Source) 2. Fields" sheet's "System" column should auto populate systems in a drop-down so users don't have to re-type system names that they have added to the "(Source) 1. Systems" sheet

Add a 'publish' field to the source fields sheet

According to the mapping guidance:

The preferred approach is to eventually list all the data elements within your data sources in your Field-Level Mapping, decide whether to publish each, and then map each.

We should add a field to the source fields sheet so publishers can indicate whether they plan to publish each field. This will help analysts avoid spending time advising on mapping fields that are not planned for publication.

Use gettext instead of a dict

Babel can extract messages from Python code.
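A minimal sketch of the switch, using the standard library's `gettext` module. The `locale` directory, the `messages` domain and the header strings are assumptions for illustration; the repository's actual strings and layout may differ.

```python
import gettext


def get_translator(language, localedir="locale", domain="messages"):
    """Return a gettext function for the given language."""
    # fallback=True returns the original message when no catalog is found,
    # so the code still runs before any translations exist.
    translation = gettext.translation(domain, localedir=localedir, languages=[language], fallback=True)
    return translation.gettext


_ = get_translator("es")
# Strings wrapped in _() are what a message extractor such as Babel looks for.
headers = [_("Path"), _("Title"), _("Description")]
```

Compared with a hand-maintained dict, this keeps the source strings in the code and moves the translations into catalog files.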

Add "assess ability to calculate indicators" functionality to mapping templates

We have a set of procurement indicators mapped to OCDS fields.

We could add functionality to the mapping spreadsheet so that for a completed spreadsheet, publishers could generate a list of indicators that they can calculate with current fields, and a list of additional fields they need to publish in order to calculate the remaining indicators.

The latter could help feed into prioritisation of sourcing and publishing any currently-missing fields.
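The two lists described above could be derived as follows, assuming each indicator is associated with the set of OCDS fields it requires. The indicator names and field paths in the usage note are placeholders, not the actual indicator mapping.

```python
def assess_indicators(indicator_fields, mapped_fields):
    """
    indicator_fields: {indicator name: iterable of required OCDS paths}.
    mapped_fields: OCDS paths that the completed template maps.
    Returns (calculable indicators, {indicator: missing OCDS paths}).
    """
    mapped = set(mapped_fields)
    calculable = []
    missing = {}
    for indicator, required in indicator_fields.items():
        gaps = sorted(set(required) - mapped)
        if gaps:
            # Fields the publisher would still need to publish for this indicator.
            missing[indicator] = gaps
        else:
            calculable.append(indicator)
    return calculable, missing
```

For example, with a placeholder indicator that needs `tender/id` and `tender/value/amount`, and only `tender/id` mapped, the indicator would be reported as missing `tender/value/amount`.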

Use formulae to look-up field titles and descriptions from the schema sheet to support localization

The current version of the mapping template has field titles and descriptions pasted-as-values, but the draft guidance on localizing OCDS recommends localizing titles and descriptions in the schema sheet.

Data Type: Add boolean and time, remove long and double

time is relevant, as some data sources split date and time into different elements.

Also, I'm not sure why we have long and double, since they are just integers and real numbers of different lengths. (Databases have many more numeric types that aren't listed: smallint, bigint, serial, etc.)

Decimal (also "numeric") is also a distinct type. It's more precise than float (which can have floating-point errors). Not sure if we care about retaining that distinction.

Note that different contexts use different words. Not sure which context makes the most sense. Current choice in bold.

character (ANSI SQL); char, varchar, text (non-standard SQL); string (JSON)
enum (non-standard SQL); codelist (OCDS)
float, real (ANSI SQL); number (JSON)
timestamp (ANSI SQL); datetime (non-standard SQL); date-time (JSON Schema)

These don't have such issues:

boolean
date
time
integer

Mapping statistics hidden and broken formulae

The mapping statistics section in the public version of the template was hidden and the formulae were broken.

As a temporary measure, I've fixed this in the public version of the template, but when we next generate the template we should check this section works before publishing it.

Edit: I've also applied the fix to the Spanish version of the template.