cardinal-rs's People

Contributors: dependabot[bot], jpmckinney, yolile

cardinal-rs's Issues

Allow bid status to be absent

Allow a user to indicate that the dataset uses a single bid status (i.e. 'valid'). We can then calculate indicators as if the status were set. (I don't yet know how common this is.)

Add support for lots

Most publishers do not implement lots, so for now the indicator calculations are simpler / naive.

  • R024 The winning offer is just below the next lower offer: If lots are present, we can relax the requirement for there to be a single award, since we'll be able to determine which bids are competing for which lot (i.e. whom each awardee was competing with).

TBD whether to include lot IDs in the output (probably), re: #25.

Use parties/identifier instead of organization reference ID

Some publishers have good organization reference IDs, e.g. DR sets it to "{parties/identifier/scheme}-{parties/identifier/id}".

Other publishers don't, and we'll need to construct IDs from parties/identifier as above.
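A minimal sketch of constructing such an ID (the function shape and names are illustrative, not Cardinal's internals):

// Construct an organization ID in the "{scheme}-{id}" form described above.
fn organization_id(scheme: Option<&str>, id: Option<&str>) -> Option<String> {
    match (scheme, id) {
        (Some(scheme), Some(id)) => Some(format!("{scheme}-{id}")),
        // Without a usable identifier, the caller decides whether to fall
        // back to (i.e. trust) the organization reference ID.
        _ => None,
    }
}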


Once this is implemented, we can add a configuration option to opt in to falling back to (i.e. trusting) the organization reference ID (faster, and useful if the user knows that IDs are well constructed in cases where parties/identifier is not populated).

Edit: This configuration is useful, because cross-referencing parties is annoying.

prepare: Add option for user to quiet an issue without fixing it

For example, DR's current dataset has at least one bid without a status in 1% of its procedures, but this occurs both when there are awards and when there are none, and when there are other bids with statuses. So, the intended status is not really knowable.

Since this issue produces about 3000 lines of text, it would be nice to be able to quiet it, e.g.

[quiet]
missing_bid_status

Potential indicator methodology changes (R003, R024, R025)

R003 Short submission period

OCDS 1.2 adds tender/expressionOfInterestDeadline. How should this field be used, if present?

“Short submission period” in An Objective Corruption Risk Index Using Public Procurement Data suggests taking weekends (and holidays) into account:

Abuse of weekends is possible as legally required time periods are defined in calendar days so the effective time companies would have for bid preparation can further be decreased by including weekends and national holidays in the submission period.
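For illustration, a minimal sketch of counting only weekdays in a submission period, using the chrono crate (holiday handling and the exact OCDS fields used are assumptions, not the current R003 logic):

use chrono::{Datelike, NaiveDate, Weekday};

// Count the weekdays (Mon-Fri) after `start` up to and including `end`.
// A per-jurisdiction holiday calendar would be needed on top of this.
fn weekdays_between(start: NaiveDate, end: NaiveDate) -> u32 {
    let mut date = start;
    let mut count = 0;
    while date < end {
        date = date.succ_opt().expect("date out of range");
        if !matches!(date.weekday(), Weekday::Sat | Weekday::Sun) {
            count += 1;
        }
    }
    count
}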

R024 The winning offer is just below the next lower offer

Presently, we require there to be a single supplier and single tenderer.

In the case of consortia – for example – it's possible for the two fields to contain the same IDs. It's perhaps useful to extend the methodology to cover this case, e.g. one consortium colludes with others.

However, we've also observed cases where they are different, e.g. the buyer awards items to multiple bidders who submitted individually. (I'm not sure that this case is rescuable – there's probably no option except to skip such cases).

R025 The ratio of winning bids to submitted bids for a top tenderer is a low outlier

James: I don't know what to do for multiple winning bids, bids with multiple tenderers, or awards made to multiple suppliers.

Camila: For these cases, I would suggest counting each bid and award separately. For instance, if a bid has 2 tenderers and both win, each tenderer would have 1 bid and 1 award.
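A sketch of that counting rule, with simplified inputs (each bid as a list of tenderer IDs and each award as a list of supplier IDs; the types are illustrative):

use std::collections::HashMap;

#[derive(Default)]
struct Tally {
    bids: u32,
    awards: u32,
}

// A bid with 2 tenderers that wins contributes 1 bid and 1 award to each of them.
fn tally_per_tenderer(bids: &[Vec<String>], awards: &[Vec<String>]) -> HashMap<String, Tally> {
    let mut tallies: HashMap<String, Tally> = HashMap::new();
    for tenderers in bids {
        for id in tenderers {
            tallies.entry(id.clone()).or_default().bids += 1;
        }
    }
    for suppliers in awards {
        for id in suppliers {
            tallies.entry(id.clone()).or_default().awards += 1;
        }
    }
    tallies
}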

New command: statistics

It can be useful to report some order statistics and distributions that are relevant to indicators. For example:

  • Distribution of procurementMethod codes: so that the user can evaluate if the distribution of open, selective, limited, direct conforms to their knowledge of the procurement market
    • From user research: "A methodology should also come with clear risk warnings for instance the use of certain fields. Are there some fields that we know are problematic when it comes to bias in the data? (e.g. Could there be a bias toward using 'selective' instead of 'limited' in procurementMethod?)"

We might also consider reporting:

  • Some priority quality issues (e.g. incoherent dates).
  • Outliers. If there is demand, we can also change the indicators command to ignore outliers.
  • Order statistics (possibly per procurement method) to assist the user in setting threshold values

Ideas for new flags, while implementing other flags

While working on R035:

  • A bid has not been evaluated, but all awards are finalized
  • Bids are withdrawn if not submitted by the single tenderer of the winning bid (i.e. other bidders only submit to simulate competition)

For methodology changes to existing flags, see #17

Improve documentation

Once the API stabilizes, after more features are added:

  • Generate a ReadTheDocs website
  • Add fictional, narrative examples in blockquotes (to ease understanding of the indicator)
  • Add more command invocation examples (and run doc-tests)
    • init command
    • prepare command
    • indicators command

Reference: Rust libraries

Having gone through the top 250 at https://lib.rs/std, some libraries not already in use:

Testing

Probably not relevant:

Also:

CLI

Also:

Errors

flatterer uses https://docs.rs/snafu/latest/snafu/

Calculations

Other

Performance

Consider support for string "amount" values

We expect this to be a very rare occurrence.

The OCDS implementation in Belarus had a case where a number in the source system was badly formatted, and it was therefore published as a string.

However, in that case, we won't have much luck converting to a float: making assumptions about whether a comma or a period is used as the decimal separator, etc. might just lead to incorrect results.

Leaving this issue open for now, but I think it might be wontfix.

CLI option/command to add metadata relevant to BI reporting

Review the frontend mockups and requirements to check for other fields by which results are filtered/aggregated.

Always included:

  • flag ID
  • result metadata
  • primary ID (OCID if indicator can be calculated per contracting process, buyer/supplier ID if indicator must be calculated for that buyer/supplier across the entire dataset)

Opt-in (BI) metadata:

  • flag category
  • date(s) (which?)
  • secondary IDs (e.g. the buyer and suppliers involved in the flagged contracting process, or vice versa)
  • ...

Requested by DR:

  • process stage (whether awards is set), aka awarded or unawarded
  • tender/procurementMethodDetails (9-value codelist)
  • tender/startDate year

After #14, we probably want to add the lot ID.

Add release workflow


Docs automatically generated at docs.rs

Things we probably won't do:

No longer up-to-date or otherwise irrelevant:

Report the indicator's coverage (application, pass, fail counts and total)

For example, Pelican reports "pass", "fail" and "not applicable" for quality checks.

Cardinal presently only reports "fail" for red flags.

It might be useful to be able to review the N/A results.

This would involve, at minimum, storing:

  • the result (pass, fail, N/A), as an Option<bool>
    • Or, just fail and N/A – storing something for every "pass" is probably just bloat
  • the reason

Pelican also stores other metadata, like application_count and pass_count for checks that operate on arrays, and then easily accessible metadata to understand why the check failed (e.g. the paths to the fields that caused the failure).
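A sketch of what could be stored per result (the field names are illustrative):

// One stored row per (indicator, contracting process or organization).
struct IndicatorResult {
    // Some(true) = pass, Some(false) = fail, None = not applicable.
    // (Or store only fail and N/A, as suggested above.)
    result: Option<bool>,
    // Why the check failed or was not applicable, e.g. paths to the
    // offending fields.
    reason: Option<String>,
}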

Reference: Compiled release sizes

In my Rust testing, I'm setting an initial capacity of 1 MiB for the vector of characters of a compiled release.

The table below is for jobs in the data registry (some jobs are for the same collection). The maximum is 147 MiB (!), the second highest is 24 MiB.

Starting with a capacity of 1 MiB, there would be at most 8 reallocations for the one above 128 MiB, 5 reallocations for those few above 16 MiB, and fewer for the rest. This seems fine.

Note that if we didn't set an initial capacity, the capacity would be set when first pushing onto the vector. For job 696, the shortest line is 933 bytes, which would take 10-11 reallocations to get to 1 MiB (its longest line is a little over 2 MiB). For that job, we would have a total of 2 reallocations by starting with 1 MiB, instead of 12 in the worst case (i.e. if the shortest line were the first line). I haven't taken the time to compare to a median-length first line.

Longest line:

find . -name 'full.jsonl.gz' -exec sh -c 'echo {}; gunzip -c {} | awk "{ if (length > L){L=length} }END{print L}"' \;

Shortest line:

find . -name 'full.jsonl.gz' -exec sh -c 'echo {}; gunzip -c {} | awk "{ if (L == \"\" || length < L){L=length} }END{print L}"' \;
bytes job id
2051 711
2051 742
2051 766
2071 468
2071 556
2071 592
2071 625
2071 679
2628 532
2628 581
2628 617
2628 664
2628 699
2628 735
2628 757
2709 422
3072 340
3072 384
4457 347
4604 470
5260 323
5304 426
5326 415
5398 789
5475 353
5475 362
5475 445
5475 549
5475 589
5475 622
5475 673
5475 740
5475 764
5716 576
5741 521
5990 552
5990 591
5990 624
5990 675
5990 709
5990 741
5990 765
6010 467
7362 670
7362 738
7362 762
7367 324
7367 379
7367 483
7367 493
7367 563
7367 597
7367 658
8895 471
10054 487
11201 473
11201 554
11201 593
11201 626
11201 680
11201 712
11201 743
11201 767
12689 610
12737 474
13279 387
13279 728
14098 392
14098 537
14223 386
14231 727
14488 320
14488 378
14488 481
14488 494
14488 564
14488 598
14488 683
14488 717
14488 746
16585 660
16585 697
16585 791
19856 503
19856 568
19856 603
19856 686
19857 718
20474 370
20474 447
20474 550
20474 588
20474 623
20474 674
20474 739
20474 763
21073 402
21471 491
22713 385
22850 726
22923 770
22930 748
23466 357
23466 363
23466 520
23466 574
23466 608
23466 676
24460 583
24460 618
24460 665
24460 700
31868 497
31868 566
32185 373
32189 448
37163 600
37163 685
41248 551
41248 590
41273 388
41273 389
50147 395
54573 790
55566 691
56411 522
56411 575
56411 609
56411 692
56411 724
56411 753
59588 546
59588 586
59588 621
59588 668
59588 703
59588 734
59588 758
59590 428
59749 570
59759 514
59763 367
64710 716
64710 793
67060 329
67060 381
67390 562
67729 409
75089 350
75089 361
75089 444
89443 534
91096 405
93306 424
109972 672
117903 427
121715 337
121715 430
121715 547
124777 393
124777 536
124777 580
124777 616
133466 687
133726 504
133726 573
133726 607
133726 723
133726 749
133726 773
134096 719
134096 752
134256 602
137126 771
148127 750
155727 326
155727 380
155727 484
155727 490
155727 560
155727 596
155727 628
155727 682
155727 714
155727 745
155727 774
164035 472
177309 778
181159 528
184990 359
184990 515
185160 567
196407 327
205987 539
206986 441
214911 425
214911 543
214911 585
214911 620
214911 667
214911 702
214911 736
214911 756
242554 524
256279 341
256515 418
259557 343
265329 414
272611 780
278653 511
278653 606
278653 690
305333 578
308637 615
308637 662
308637 698
308637 733
308637 761
324811 330
344030 390
344030 730
346808 399
408222 737
408222 775
441544 407
449754 725
481791 342
484040 345
532518 512
539538 419
566587 529
601246 349
601246 787
622821 413
662541 526
662683 577
662683 611
706831 694
706831 754
760184 410
880634 542
927369 421
1475448 406
1478839 339
1478839 383
1478839 498
1478839 569
1478839 604
1478839 688
1478839 720
1478839 747
1478839 772
1526742 671
1605718 525
1642379 322
1642379 377
1642379 486
1697247 559
1697247 594
1697247 627
1697247 681
1697247 713
1697247 744
1697247 768
2152041 612
2152041 656
2152041 696
2152041 732
2152041 779
2550939 684
2553305 715
2553305 792
2553739 496
2553739 565
2553739 599
2790994 404
4357161 401
4617653 659
4737232 396
4775975 572
5528742 695
5528742 731
5528742 776
7547759 519
7547759 571
7547759 605
7547759 689
10082242 403
10930049 348
10930049 364
13145632 777
13742917 408
18428801 530
20622679 666
20622815 584
20622815 619
20623099 701
20623145 794
22990409 523
25435453 398
154296486 346

Allow values to be absent

Allow a user to indicate that the dataset uses a single value for a given field. We can then calculate indicators as if the field were set. For example:

  • .../items/classification/scheme = 'UNSPSC' (observed for DR)
  • Value/currency (I'm not aware that any publisher omits currency entirely)
  • awards/status = 'active' (not sure how frequently this field is unset)
  • bids/details/status = 'active' (not sure how frequently this field is unset)
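If implemented like the quiet option above, the configuration might look something like the following (the section and key names are hypothetical, not Cardinal's current settings; "USD" is only a placeholder value):

[defaults]
item_classification_scheme = "UNSPSC"
currency = "USD"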

prepare: Fill in fields based on parties/roles

For example, if buyer and procuringEntity are not set, but parties/roles contains 'buyer' or 'procuringEntity', we can fill in these fields before calculating indicators (see the sketch after this list). Similarly:

  • If there is only one active award, we can set awards/suppliers from the parties with the 'supplier' role
  • If there is only one active award and no lots, we can set tender/tenderers from the parties with the 'tenderer' role
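A minimal sketch of the buyer fill-in, treating a compiled release as a serde_json::Value (a simplification, not Cardinal's actual data structures):

use serde_json::{json, Value};

// If release.buyer is unset, copy the id of the first party whose roles
// include "buyer". Assumes the release is a JSON object.
fn fill_in_buyer(release: &mut Value) {
    if release.get("buyer").is_some() {
        return;
    }
    let buyer_id = release
        .get("parties")
        .and_then(Value::as_array)
        .and_then(|parties| {
            parties.iter().find(|party| {
                party
                    .get("roles")
                    .and_then(Value::as_array)
                    .map_or(false, |roles| roles.iter().any(|role| role.as_str() == Some("buyer")))
            })
        })
        .and_then(|party| party.get("id").cloned());
    if let Some(id) = buyer_id {
        release["buyer"] = json!({ "id": id });
    }
}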

Performance improvements

serde should be fine, as our own code is the bottleneck. That said, simdjson is the fastest JSON parser. https://github.com/SunDoge/simdjson-rust tracks version 2.2.2 of simdjson, but is not available on crates.io. https://github.com/simd-lite/simd-json tracks 0.2.x (issue).

I also tried the following on the coverage code:

We can try them again once the code is more complex.

crossbeam is a better channel implementation; std::sync::mpsc's implementation was replaced with one based on crossbeam-channel in Rust 1.67 (January 2023).


In case it's relevant in the future, here is a fast way to read a file line-by-line:

use std::fs::File;
use std::io::{BufRead, BufReader};

use serde_json::Value;

// Compiled releases of multiple MiBs have been observed, but most are less than 1 MiB.
const CAPACITY: usize = 1024 * 1024;

let file = File::open(config.path)?;
// The buffer is allocated once and reused for every line.
let mut line = Vec::with_capacity(CAPACITY);
let mut reader = BufReader::new(file);

while reader.read_until(b'\n', &mut line).unwrap_or(0) > 0 {
    let value: Value = serde_json::from_slice(&line)?;
    // ...
    line.clear();
}

It is faster because it only allocates memory for one line. It can't be used if each line is passed to a thread for parsing; in that case, memory needs to be allocated for each line (i.e. using for line in reader.lines()).
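For the threaded case, a minimal sketch using reader.lines() and the crossbeam-channel crate mentioned above (the worker count, channel capacity and function shape are placeholders, not the actual implementation):

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::thread;

use serde_json::Value;

fn parse_in_threads(path: &str) -> std::io::Result<()> {
    let (sender, receiver) = crossbeam_channel::bounded::<String>(1_000);

    // Each worker owns a clone of the receiver (crossbeam channels are
    // multi-consumer) and parses the lines it receives.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let receiver = receiver.clone();
            thread::spawn(move || {
                for line in receiver {
                    let _value: Value = serde_json::from_str(&line).expect("invalid JSON");
                    // ...
                }
            })
        })
        .collect();

    // reader.lines() allocates a new String for each line, which can be moved
    // to a worker, unlike the reused buffer in the read_until example above.
    let reader = BufReader::new(File::open(path)?);
    for line in reader.lines() {
        sender.send(line?).expect("all workers exited");
    }
    drop(sender); // close the channel so the workers' loops end

    for worker in workers {
        worker.join().expect("worker panicked");
    }
    Ok(())
}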

Have another look through statistics crates

Using statrs currently (most popular). There might be some newer crates that meet our needs better.

medians has medinfof64. It's a single-author library (along with rstats) and is less widely used.

qsv-stats

qsv-stats performs a sort - O(n log n) - to calculate quartiles. statrs uses a selection algorithm – O(n).

For DR bid ratios, numpy calculates 0.25580327. qsv-stats got 0.2560257847899094 (0.00022 diff). statrs got 0.2559516146277174 (0.00014 diff). In other words, no major difference.


Also looking at https://docs.rs/watermill/latest/watermill/ for online statistics.

ADR: watermill's quartile calculation is non-deterministic. I think that means we should not use that feature, as I expect it will be confusing to users to get different results (more or fewer flags) on different runs.

prepare: Consider changing stdout and stderr to mandatory options

e.g. --output (-o) and --error (-e).

Presently, if a user doesn't use redirection (e.g. > prepared.json 2> issues.csv), then they get a mix of both in the console output.

Also, I thought Windows users might have some challenges around redirection, but it looks okay, actually.

For implementation, we'll probably want an intermediate buffer that is then written to the output file at the end of each thread. Otherwise, output from different threads could be interleaved.
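A sketch of that buffering approach (the file name, thread count and written content are placeholders; in practice the file would be the one named by --output):

use std::fs::File;
use std::io::Write;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() -> std::io::Result<()> {
    // Shared handle to the output file.
    let output = Arc::new(Mutex::new(File::create("prepared.json")?));

    let handles: Vec<_> = (0..4)
        .map(|worker| {
            let output = Arc::clone(&output);
            thread::spawn(move || {
                // Accumulate this thread's output in a private buffer...
                let mut buffer = Vec::new();
                writeln!(buffer, "{{\"worker\": {worker}}}").unwrap();
                // ...and write it in one call at the end of the thread, so
                // output from different threads is never interleaved mid-line.
                output.lock().unwrap().write_all(&buffer).unwrap();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    Ok(())
}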

Fix Homebrew release

Add the following to the bottom of release.yml (after addressing the TODO):

  bottle:
    needs: release
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        # macos-13 is not available. https://github.com/actions/runner-images/issues/6426
        include:
          - name: arm64_monterey
            os: macos-12
            target: aarch64-apple-darwin
          - name: arm64_big_sur
            os: macos-11
            target: aarch64-apple-darwin
          - name: monterey
            os: macos-12
            target: x86_64-apple-darwin
          - name: big_sur
            os: macos-11
            target: x86_64-apple-darwin
          - name: catalina
            os: macos-10.15
            target: x86_64-apple-darwin
          - name: x86_64_linux
            os: ubuntu-latest
            target: x86_64-unknown-linux-gnu
    steps:
      - id: setup-homebrew
        uses: Homebrew/actions/setup-homebrew@master
      # TODO need to update the url and sha256, otherwise brew install complains
      # try with checkout and git commands
      # https://lannonbr.com/blog/2019-12-09-git-commit-in-actions/
      - run: brew tap open-contracting/tap
      - env:
          CARGO_TARGET: ${{ matrix.target }}
        run: brew install --build-bottle --verbose ocdscardinal
      - run: brew bottle --no-rebuild --verbose ocdscardinal
      - env:
          GH_TOKEN: ${{ github.token }}
        run: gh release upload ${{ github.ref_name }} ocdscardinal--${{ github.ref_name }}.${{ matrix.name }}.bottle.tar.gz

brew install ... errors with:

==> Verifying checksum for '7c4169757f272594850bc4fae4e03439505521618da28851cbe1ae7226e5dd96--cardinal-rs-0.1.0.tar.gz'
Error: ocdscardinal: SHA256 mismatch
Expected: 8408aea9b1f47369e07697c4bd2411179e18fa5c1e9fe5b79b9f2ff1dd712323
  Actual: 7881c01b85fe3088643faf757d90d2267b52aaafddbe5a45bab464b6d42430fd
    File: /home/runner/.cache/Homebrew/downloads/7c4169757f272594850bc4fae4e03439505521618da28851cbe1ae7226e5dd96--cardinal-rs-0.1.0.tar.gz
To retry an incomplete download, remove the file above.

See also https://github.com/marcprux/update-homebrew-formula-action

When I last looked at this, I was also looking through org:fair-ground HOMEBREW_FAIRTOOL_ARCH on GitHub Search.

Need to also fix GitHub Actions on https://github.com/open-contracting/homebrew-tap/actions

prepare: Add currency conversion

Convert amounts if there are multiple currencies.

Use same approach as in pelican-backend: https://pelican-backend.readthedocs.io/en/latest/api/util/currency_converter.html

In Pelican, we convert all values to USD. This means there will be very many conversions, even if 99% of the dataset is in another currency. However, I think supporting conversion to any currency requires more API calls to fixer.io (assuming it has rates between all currency pairs – I think conversion to USD has the best coverage).

Access to conversion rates is not free, in general. This feature would need to be opt-in, with the user supplying a fixer.io API token via the configuration file. (We can consider other sources, but I think fixer is pretty good.)

Amounts are compared in the fold step, so we already need to know by that point whether conversion is required. As such, the tool will need to be instructed (via configuration) to perform conversion from the start.

The default behavior can be to warn about multiple currencies, and otherwise ignore other currencies.
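A sketch of the conversion step itself, assuming a pre-fetched table of rates expressed as units of each currency per 1 USD (fetching rates from fixer.io is not shown, and the function shape is illustrative):

use std::collections::HashMap;

// Returns None if the currency is unknown, so the caller can warn instead.
fn to_usd(amount: f64, currency: &str, rates_per_usd: &HashMap<String, f64>) -> Option<f64> {
    if currency == "USD" {
        return Some(amount);
    }
    rates_per_usd.get(currency).map(|rate| amount / rate)
}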

Use configuration file to opt-in to each indicator

The default (and template) configuration file can contain all indicators, along with in-line documentation about their options.

This will also make it easier to isolate tests (presently, testing one indicator might return results for another).

Apple code signing

R036: Add logic for awardCriteria

Right now, the indicator entirely ignores the awardCriteria.

We could add an option to the prepare command to set awardCriteria to 'priceOnly' if the lowest valid bid is awarded.
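A sketch of that inference, simplified to amounts in a single currency with exact equality (both simplifications are assumptions):

// If the awarded amount equals the lowest valid bid amount, prepare could
// set tender.awardCriteria to "priceOnly".
fn looks_price_only(valid_bid_amounts: &[f64], awarded_amount: f64) -> bool {
    let lowest = valid_bid_amounts.iter().copied().fold(f64::INFINITY, f64::min);
    lowest.is_finite() && lowest == awarded_amount
}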

Add to test.rs for indicators and prepare commands

prepare

  • stringify IDs
  • defaults
  • redactions: amount
  • redactions: organization_id
  • codelists: BidStatus
  • codelists: AwardStatus
  • invalid JSON
  • non-object JSON
  • errors CSV file output

indicators

  • no_price_comparison_procurement_methods / price_comparison_procurement_methods (R028, R036, R024/R058) 6f57bd2 73245c5
  • maps
  • mixed currencies warning (R024/R058)
  • global exclusions (is_cancelled_contracting_process)

prepare: Options for datasets without bids (e.g. R025)

For R025 (Excessive unsuccessful bids), we can consider filling in bids/details according to tender/tenderers, assuming that the status is 'valid' and that each tenderer submitted a separate bid.

From discussion:

James: I'm not sure how this indicator should be modified if only tender/tenderers is available

Camila: I agree that the best option is to have bids information, where the bids are not disqualified or withdrawn. We should prioritize this and recommend the use of this field. However, with tenderers/id, you could still calculate the success rate, with the limitation that you could be counting disqualified or withdrawn bids. We could highlight that, or maybe just calculate it with the bids fields, and in the methodology we could mention this alternative to users.

R044: More robust address matching

For example, dedupe (as I remember) applies address normalization (for at least US addresses). If we follow the same approach, we'd need to implement appropriate normalization for different jurisdictions. This strategy uses equality tests, but allows for some address components to be missing (e.g. "Main" vs "Main St"). I know Roberto Rocha recently evaluated a few different strategies when merging Canadian political donation datasets.

I think naive fuzzy matching will yield too many false positives (e.g. 1 Main St, Podunk, New York, USA 12345 and 100 Main St, ... are very close typographically, but are not at all the same address).

The first implementation could just do simple equality.

The metadata for this indicator should include a measure of similarity (percentage or otherwise).
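A sketch of the simple-equality baseline with light normalization (the normalization rules are illustrative and not jurisdiction-aware):

// Normalize an address for exact comparison: lowercase, drop ASCII
// punctuation, and collapse whitespace, so "1 Main St." equals "1  main st"
// but "1 Main St" and "100 Main St" remain different.
fn normalize_address(address: &str) -> String {
    address
        .to_lowercase()
        .chars()
        .filter(|c| !c.is_ascii_punctuation())
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

fn same_address(a: &str, b: &str) -> bool {
    normalize_address(a) == normalize_address(b)
}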

Contributor documentation

Principles

  • Results should be stable. It's okay for an update to a contracting process to cause a red flag to be newly raised. However, it is disfavored for an update to cause a red flag to be lowered. For example, while awards are pending, calculating some red flags can cause false positives; on the other hand, it's okay (and normal) for a flag to be raised after an update.

  • Keep data preparation separate from indicator calculation. (see comment in #23)


  • From comments on the user research report: "It should be possible for developers to read documentation on how to implement new red flags, with as little new code as possible."

New command: prepare

Following the principle of "Keep data preparation separate from indicator calculation" (#29), we can add a command to do:

and maybe these as optional (opt-in) pre-processing steps:

  • #32 change id references to identifier/id (perhaps if consistently available)

Pretty much all issues labeled 'robustness' could be resolved via this command.

Also:

  • Lowercase codes. For now, we'll require users to manually map such codes. If it's a common issue, we can add an option that lowercases all codelist fields used in indicators.
