
cove-ocds's People

Contributors

andylolz, bibianac, bjwebb, caprenter, dependabot[bot], duncandewhurst, edugomez, idlemoor, jpmckinney, kindly, odscjames, pre-commit-ci[bot], requires, rhiaro, robredpath, yolile

cove-ocds's Issues

Reduce test duplication / find a way to share tests when we look for the same thing in both lib-cove-ocds and cove-ocds

I've been making changes to Key Field Information, and this requires the same tests to be carried out in lib-cove-ocds and cove-ocds.

The tests are necessary in both places:

  • in lib-cove-ocds, we're checking that the calculations are carried out correctly
  • in cove-ocds, we're checking that the relevant data is being supplied, and that it's been correctly calculated

It would save a lot of lines of code if we were able to share the tests in some way, while still carrying them out in both places.

Big files: the Web result page is huge

Hello!

As we are about to publish French award data, I had to validate it (4 days ago): https://standard.open-contracting.org/review/data/55390859-63fd-453a-989b-21c612d69687

If you clicked the above link and the validation has not expired:

  1. it's going to be a little while before you see something
  2. then your browser may be struggling a bit to display the page

That's not surprising: it's trying to display 120,000+ releases.

I don't think that displaying a release table with so many rows is useful, especially when it costs so much on both the client and the server side.

Would it make sense to disable the display of the release table above a certain number of releases?

More generally, should the reviewing process be optimized for big files? That could mean changes to cove-ocds, but also the release of a command-line tool that would be run locally.
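One way the "disable the table above a certain number of releases" idea could look, as a minimal sketch: a configurable cap, checked in the view before the table is built. `RELEASE_TABLE_MAX_ROWS` and `should_render_release_table` are hypothetical names, not existing cove-ocds settings.

```python
# Hypothetical sketch: gate the per-release table behind a configurable row
# limit, so huge files skip rendering it on both server and client.
RELEASE_TABLE_MAX_ROWS = 1000  # assumed setting name and default

def should_render_release_table(release_count, limit=RELEASE_TABLE_MAX_ROWS):
    """Return True if the results page should include the per-release table."""
    return release_count <= limit

# The view would pass a flag into the template context instead of the table:
context = {"show_release_table": should_render_release_table(120_000)}
```

The template would then wrap the table in a conditional on that flag, and could show a short "table suppressed for large files" notice instead.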

KFI: Include count of unique item IDs

Feature request from OpenDataServices/cove#263

Original text:

When a file containing multiple releases is validated, the total count of items and documents is shown. For clarity, could we also include the unique count of:

  • items based on id

By showing that we found 19 mentions of a document and 4 unique document ids, publishers can check that the release has been validated correctly.
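The counting itself is straightforward; a sketch, assuming award items as the field layout (the same approach would apply to documents):

```python
# Sketch: total mentions vs. unique ids for items across all releases.
# The awards/items path is one possible location; documents would be similar.
def item_id_counts(releases):
    ids = [
        item.get("id")
        for release in releases
        for award in release.get("awards", [])
        for item in award.get("items", [])
    ]
    return len(ids), len(set(ids))

releases = [
    {"awards": [{"items": [{"id": "1"}, {"id": "2"}]}]},
    {"awards": [{"items": [{"id": "1"}]}]},
]
total, unique = item_id_counts(releases)  # 3 mentions, 2 unique ids
```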

Flattening record packages

Some quick notes from a discussion with @kindly on this today:

The primary use case we have in mind is publishers sharing draft data with the helpdesk in record package format and helpdesk analysts wanting to flatten this to help give feedback (e.g. it's easier to check all the values of a given field by looking at a spreadsheet than by reviewing a JSON file).

Records have the following components:

  • Compiled release - easiest to flatten, using existing flatten-tool functionality. Could be included in a minimal version of record package flattening.
  • Releases list, which can contain either linked releases or embedded releases. The key issue is that the field is oneOf, so it isn't possible to tell definitively which has been provided. The list could also be mixed. Would require more work.
    • Linked releases - assume we won't fetch full releases for flattening. Q: would flattening the list of linked releases (url, date and tag) be useful?
    • Embedded releases - would need to combine the release lists from all records into one list - not sure where this functionality would sit between CoVE and flatten-tool. Could require lots of work.
  • Versioned release - we don't have an approach to flattening this and it is very rarely used by publishers, so we would leave it out of the functionality for now.
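The minimal "compiled release only" option could be as simple as pulling each record's compiledRelease out of the record package and wrapping the results in a release package, which existing flatten-tool functionality already handles. A sketch; the record/compiledRelease field names follow the OCDS record package schema, while the wrapper logic is an assumption:

```python
# Sketch: extract compiled releases from a record package into a release
# package, so existing release-package flattening can be reused as-is.
def compiled_releases_to_release_package(record_package):
    releases = [
        record["compiledRelease"]
        for record in record_package.get("records", [])
        if "compiledRelease" in record
    ]
    return {
        "uri": record_package.get("uri"),
        "publishedDate": record_package.get("publishedDate"),
        "publisher": record_package.get("publisher"),
        "version": record_package.get("version"),
        "releases": releases,
    }
```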

We also discussed whether compiled and embedded releases should be flattened into the same spreadsheet.

  • Flattening to separate spreadsheets would be easier to implement and it would be easier to use for users who just wanted to work with one or the other (no filtering out would be required).
  • Flattening to one spreadsheet would be harder to implement but it would make it easier for users who wanted to do analysis across compiled and individual releases (we don't have a specific use case for that in mind).

We also discussed whether this would sit better in CoVE/flatten-tool or OCDSKit/-web (possibly via the tabulate command supporting a spreadsheet output).

Seeking feedback from @yolile @romifz @jpmckinney @mrshll1001 @pindec and others on:

  • How important is this for the helpdesk
  • Other use cases for flattening record packages
  • Views on which elements of records are important to flatten
  • Flattening records into one or multiple spreadsheets
  • Where this functionality belongs

Edits to grouping of validation errors

See OpenDataServices/cove#1117 for how we did this for 360Giving.

Right now, all of the validation errors are presented in one table. Grouping these by type (e.g. missing-but-required, format errors, 'other') helps people understand what kinds of things they need to do to make their data conform to the standard.

The task here is to:

  • Review all possible errors, see if the same groupings as we used for 360 are appropriate (they probably are)
  • Implement the splitting up of the errors
  • Write simple, appropriate text to frame the errors
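The splitting-up step could key off the JSON Schema keyword that triggered each error, the way jsonschema's `ValidationError.validator` exposes it. A sketch, assuming each error dict carries that keyword; the group names are placeholders pending the review of the 360Giving groupings:

```python
# Sketch: bucket validation errors by the schema keyword that produced them.
from collections import defaultdict

GROUPS = {  # placeholder mapping, to be confirmed against the 360 groupings
    "required": "missing",
    "format": "format",
    "pattern": "format",
}

def group_errors(errors):
    grouped = defaultdict(list)
    for error in errors:
        grouped[GROUPS.get(error["validator"], "other")].append(error)
    return grouped

grouped = group_errors([
    {"validator": "required", "message": "'ocid' is missing"},
    {"validator": "format", "message": "'date' is not a 'date-time'"},
    {"validator": "type", "message": "'amount' is not a number"},
])
```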

"Array has non-unique elements" error is missing examples

Example data

The data has duplicate values in tender/participationFees/0/methodOfPayment which is an array of strings, but no examples are shown in the first 3 examples column:

[screenshot: validation results with an empty examples column]

For arrays of strings, I think the examples should just show the whole field so that users can see that there are duplicates, e.g.

First 3 errors
DD - Demand Draft;FDR - Fixed Deposit;DD - Demand Draft;FDR - Fixed Deposit
DD - Demand Draft;FDR - Fixed Deposit;DD - Demand Draft;FDR - Fixed Deposit
DD - Demand Draft;BC - Bankers Cheque;SS - Small Savings Instrument;FDR - Fixed Deposit;DD - Demand Draft;BC - Bankers Cheque;SS - Small Savings Instrument;FDR - Fixed Deposit
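Producing that example value could be as simple as joining the whole array, rather than looking for a scalar at the error path. A sketch; the separator choice is an assumption:

```python
# Sketch: for a uniqueItems failure on an array of strings, render the whole
# array (joined on ";") as the example value, so duplicates are visible.
def array_example(values, sep=";"):
    return sep.join(values)

example = array_example(
    ["DD - Demand Draft", "FDR - Fixed Deposit", "DD - Demand Draft"]
)
# "DD - Demand Draft;FDR - Fixed Deposit;DD - Demand Draft"
```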

KFI: Improve documentation of results

Copied from OpenDataServices/cove#263 (comment)

KFI is an important part of how the Helpdesk supports publishers to understand their data, but for an unsupported user who is still learning about the standard, the terse presentation of the stats isn't helpful.

Inline docs could be useful here.

Original text:

I have a great deal of difficulty understanding how each of the numbers is calculated (using the linked file as input). For example, 8 releases have an (identical) planning field, and the number is 1.

However, I think we first need to determine how frequently this section is referenced, before investing time in improving its clarity.

Signpost the command line tool from the validator landing page

From discussion with @ColinMaudry in Georgia:

We don't signpost the command line tool anywhere from the validator landing page.

At the moment this is a beta tool but it would be good to let users know it exists.

Could we add a link?

In the future we might want to impose a limit on upload size for the web tool and direct users to the command line tool for large files (rather than returning a server error).

Truncate check results

Follow-up to #31, which yielded #35.

Problem

Large, invalid files yield very large response sizes (performance), and more information than is useful (UX), e.g. a list of 42,000 invalid entries for one error type.

File | Valid? | File size | Response size before PR | Response size after releases table PR #35
repeated_errors_repeated.json (not public data) | Invalid | 359 MB | 65.8 MB | 27.32 MB
badfile_repeated.json (script to generate) | Invalid | 341 MB | 174.18 MB | 150 MB

Test files are now here: https://github.com/open-contracting/sample-data-private/tree/master/data

Solution

I think we can have a configurable setting to limit the number of results returned.

To address performance issues, we can set a high limit that still exceeds usefulness, like 1000.

To address usability issues, we can have a smaller number like 100. We'll want to randomize the results returned, so that we're not simply reporting e.g. the first 100 errors all caused by old data and none of the errors caused by newer data (publishers who are only making improvements to new data are likely to ignore the results if they only seem to pertain to old data).
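The randomized truncation described above can be sketched in a few lines; the function name and the optional seed (useful for reproducible reports) are assumptions:

```python
# Sketch: keep at most `limit` results, drawn uniformly from the full list so
# errors in old and new data are both represented in what we report.
import random

def truncate_results(results, limit=100, seed=None):
    if len(results) <= limit:
        return results
    rng = random.Random(seed)
    return rng.sample(results, limit)

sample = truncate_results(list(range(42_000)), limit=100, seed=0)
```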

Review and test language and copy for different types of user

CoVE is used by a number of different types of users, each attempting to achieve different things.

Through our research last year, and Georg's research into personas for the OCP web presence, we can have a reasonable go at documenting a handful of these, walking through their paths to interact with the software, and improve things for each of them as we go.

Internal - see https://docs.google.com/document/d/1KQ-j4q0rC5lkhIHGk9bCC_TA50PGP5KSvTZaKUuuaaQ/edit#bookmark=id.55uduidgj3bu

DRT reports packages array has non-unique elements when the elements differ

Original data example.

Packages (schema: "A list of URIs of all the release packages that were used to create this record package") look like this:

"packages": [
       "https://budeshi.ng/api/releases/1288/planning",
       "https://budeshi.ng/api/releases/1288/tender",
       "https://budeshi.ng/api/releases/1288/planning",
       "https://budeshi.ng/api/releases/1288/contract"
   ]

DRT report shows a structural error: Array has non-unique elements
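In the quoted example the "planning" URL does appear twice, so uniqueItems fails. A sketch of surfacing which values actually repeat, which would make the error message easier to act on:

```python
# Sketch: list the values that occur more than once in the offending array,
# rather than only reporting "Array has non-unique elements".
from collections import Counter

packages = [
    "https://budeshi.ng/api/releases/1288/planning",
    "https://budeshi.ng/api/releases/1288/tender",
    "https://budeshi.ng/api/releases/1288/planning",
    "https://budeshi.ng/api/releases/1288/contract",
]
duplicates = [value for value, count in Counter(packages).items() if count > 1]
```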

Improved text on landing page

From open-contracting/cove-oc4ids#19

The form prompts can be improved so that a first-time user knows what to do without scrolling below the fold to read the instructions, for example: "Paste an OCDS release package or record package as JSON" instead of simply "Paste". I can suggest more improvements, but I recommend that either @mrshll1001 or @pindec do a quick review to pick up these types of usability issues for first-time users.


Similarly, the order and flow of content below the form doesn't put the most important or relevant information first.

For example, I would at minimum swap the positions of the "Check and Review" and "About OCDS" blocks. Ideally, "About OCDS" would come after both other blocks, as it's the least relevant content; anyone arriving at the OCDS Data Review Tool is very likely to at least know what OCDS is.

The text in each box can also be clearer, simpler, and more straightforward for first-time users. As in the issue description, I think others should do a first pass.


Noting some quick observations from Hotjar recordings. In general, drawing conclusions from recordings over a short period will be biased, as the recordings will tend to be of the same users working to implement OCDS.

  • Many users don't get past the first page. The common behavior is to scroll past the form, loiter around the content area, scroll further down, then back up to the content area.
    • Similar to my heuristic observations above, my interpretation is that people don't know what the tool does yet, so they don't know what the form is about and skip it. They get to the content area and scan it, but the content isn't ordered by priority (and there's a fair amount of it), so they scroll further down to scan the content there, realize that content is even less relevant, so scroll back up to read the content more closely.
  • Users don't seem to know what to do with the error about not identifying a package structure.
    • We can perhaps display a JSON snippet of the skeleton of a package, to show what it should look like. We might also have a truncated (e.g. first 1000 characters) copy of their indented JSON shown side-by-side.
  • One user abandoned the form.
    • Instead of just "Loading" and a spinner, we should display a message that for large inputs, the form will take some time to submit, and to please be patient.
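For the "can't identify a package structure" error, the skeleton snippet could look something like the following. This is a sketch of one possible snippet (field list subject to review against the release package schema), rendered here via json.dumps:

```python
# Sketch: minimal release package skeleton to display alongside the error,
# so users can see the expected top-level shape at a glance.
import json

skeleton = {
    "uri": "...",
    "version": "1.1",
    "publishedDate": "...",
    "publisher": {"name": "..."},
    "releases": [
        {"ocid": "...", "id": "...", "date": "...", "tag": ["..."]}
    ],
}
snippet = json.dumps(skeleton, indent=2)
```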

Moved from OpenDataServices/cove#1197

explore_release and explore_record are not in-sync

In the explore_additional_content block, explore_record has "Is structurally correct?", "Number of records", etc. which are in the headlines block in explore_release. The record template also has the Schema and Convert boxes alongside the Headlines box, whereas the release template has them beneath.

As much as possible, the two templates should be the same. We should perhaps have each inherit from (or reuse blocks from) a common template.

coverage line in CI does nothing?

 Run coverage run --source cove_ocds,cove_project manage.py test
/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/cove/settings.py:31: UserWarning: SECRET_KEY should be added to Environment Variables. Random key will be used instead.
  warnings.warn('SECRET_KEY should be added to Environment Variables. Random key will be used instead.')

System check identified no issues (0 silenced).
----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK

When migrating from Travis to GitHub Actions, I left it in as I wasn't certain it does nothing, and it felt like we should check that. If it really does nothing, we can remove it. I suspect the coveralls data comes from the next test line anyway.

Include context of errors (e.g. reporting the OCID)

From a helpdesk issue

Currently, when reporting structural errors, the review tool reports the location of the error relative to the package, e.g. releases/38/tender/tenderers. Since some packages are generated on the fly, and there may be complex mapping behind them, it may be helpful if it also extracted the OCID of the release with the issue, rather than just its position in the array. This would help the publisher find the offending release elsewhere to make the change.
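A sketch of the lookup: parse the release index out of the error path and read that release's ocid from the package. The function name is hypothetical.

```python
# Sketch: given an error path like "releases/38/tender/tenderers", pull out
# the release index and look up its ocid to include in the error message.
def ocid_for_error(package, error_path):
    parts = error_path.split("/")
    if parts[0] == "releases" and len(parts) > 1 and parts[1].isdigit():
        release = package["releases"][int(parts[1])]
        return release.get("ocid")
    return None  # error is not inside a release (e.g. package metadata)

package = {"releases": [{"ocid": "ocds-abc-0"}, {"ocid": "ocds-abc-1"}]}
ocid = ocid_for_error(package, "releases/1/tender/tenderers")  # "ocds-abc-1"
```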

Add a web API

Copied across from OpenDataServices/cove#320

The original issue has some discussion, but pertinent bits:

It's certainly true that we don't intend CoVE / lib-cove-ocds to be used as part of the backend for a web site - it's definitely not fast enough. If there was a demand for this, we could probably concoct something - but I'd want to know more about exactly what was required. As it stands, I'm only aware of @patxiworks wanting to use CoVE in this way, and unfortunately we weren't able to support what he wanted to do.

However, a web API to CoVE also has a role in publication systems, and potentially in data consumption applications, as part of automated processing, where response time isn't as important. Packaging up something that can be run as a 'black box' and handle requests for feedback on data that goes beyond simple validation can be useful.

and

I guess the use case for a web API is an implementer who is unable or unwilling to use lib-cove-ocds as a library. (Even if their system is implemented in another language, they can either bridge to Python, or shell out to the libcoveocds command.) Right now, the demand for that seems low, but the issue can certainly remain open.

Include package metadata in 'fields that are empty or contain only whitespaces' check

It looks like this check currently only looks at the releases, not the package metadata; at least, a blank string in publisher/scheme is not reported by the check.

The check should be carried out on both the package metadata and the contents of the package.

I'm not sure how this is implemented, so worth checking if this issue is specific to this check or whether it applies to the way additional checks work in general.
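Whatever the current implementation does, the check conceptually just needs to walk the whole document from the root, metadata included. A sketch (function name and path format are assumptions):

```python
# Sketch: recursively collect JSON-pointer-style paths to strings that are
# empty or whitespace-only, starting from the package root so that package
# metadata is covered as well as the releases.
def blank_string_paths(value, path=""):
    paths = []
    if isinstance(value, str):
        if value.strip() == "":
            paths.append(path or "/")
    elif isinstance(value, dict):
        for key, child in value.items():
            paths.extend(blank_string_paths(child, f"{path}/{key}"))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            paths.extend(blank_string_paths(child, f"{path}/{i}"))
    return paths

package = {"publisher": {"scheme": " "}, "releases": [{"ocid": "ocds-1"}]}
blanks = blank_string_paths(package)  # ["/publisher/scheme"]
```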

Remove duplication of validation messages from within oneOf

Introduced as part of OpenDataServices/cove#895 when we replaced the monolithic oneOf validation messages with the individual messages for each subschema.

Errors about date and tag being required can come from either subschema, and are repeated for each.

This is only a problem for files that have a mix of assumed embedded and linked releases for different records.

Should we be using "tx pull -af"?

In https://ocds-data-review-tool.readthedocs.io/en/latest/translations.html

See https://docs.transifex.com/client/pull

-f or --force: Force the download of the translations files regardless of whether timestamps on the local computer are newer than those on the server.

I just had an issue where, because of git branch switching (I assume), it was skipping ES and I wasn't getting the latest translations. I had to add -f to make it download them.

Are there any situations where this is bad? If not, should we just add that to the docs?

Additional Fields: Collapse button overrides other panel titles

When viewing validation results, if you try to collapse the Additional Fields panel, its header and description override the titles and descriptions on all of the other panels. This doesn't happen when collapsing/restoring the other panels, only the Additional Fields panel.

Also, when there are no additional fields in use, it might be good to add a row to that panel saying that no fields were identified, rather than leaving it blank, as it looks like something might be missing.

Group validation errors by oneOf subschema

In OpenDataServices/cove#895 we now assume whether a record has linked or embedded releases, in order to use the correct subschema within the oneOf block. Text about this assumption is added to every relevant validation message.

This text is repeated for each validation error message. Instead we should group the messages by subschema used, and state the assumption only once.

List fields that are present/missing compared to the schema

Copied from OpenDataServices/cove#118

Although we provide a list of fields that are present but aren't in the schema (i.e. "additional fields"), we don't provide a list of fields that are in the schema but aren't in the file, or any other way for people to discover fields that may be relevant to them from the results page. The risk, of course, is that excessive noise is generated, but some way of allowing people to see their mapping template alongside their existing coverage could help alignment.

Write overview documentation

In this first iteration, this documentation should cover information that is unique to the DRT (i.e. we should assume that the developer knows Python, Django, JSON Schema, OCDS, etc. which are not unique to the DRT). It should cover the information that developers newly working on the DRT would be given, with respect to the responsibilities of each component (lib-cove-ocds, lib-cove, cove-ocds, cove), for example.

Related reading:

Possible contents:

  • Internal architecture of cove-ocds (where to find code for what)
  • Connection with external libs
  • How to add an additional check
  • Translations
  • The CLI
  • Template structure
  • Something about extensions

Account for #skiprows and #headerrows in spreadsheet validation error messages

The row number reported in spreadsheet validation error messages is incorrect when the #skiprows or #headerrows configuration properties are set.

For example, in the data which generated the following error message (a spreadsheet with #skiprows set to 2 and #headerrows set to 5) the error is actually on row 8, not row 2:

[screenshot: validation error message reporting row 2]

The row reported in the error message should account for the configuration properties set in the source spreadsheet.
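The correction is simple arithmetic, assuming the reported row is relative to the first data row: the actual spreadsheet row is offset by the skipped rows plus the extra header rows. With the example above (#skiprows 2, #headerrows 5), reported row 2 becomes row 8. A sketch; the function name is hypothetical:

```python
# Sketch: translate a reported row number back to the actual spreadsheet row,
# accounting for #skiprows and #headerrows. Assumes the reported row counts
# from the first data row, with one header row as the default.
def actual_row(reported_row, skip_rows=0, header_rows=1):
    return reported_row + skip_rows + (header_rows - 1)

row = actual_row(2, skip_rows=2, header_rows=5)  # 8, matching the example
```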
