
Unicode & CLDR Data Driven Test

This repository provides tools and procedures for verifying that an implementation is working correctly according to the data-based specifications. The tests are implemented on several platforms including NodeJS (JavaScript), ICU4X (RUST), ICU4C, etc. Additional programming platforms may be added to use the test driver framework.

The goal of this work is to provide an easy-to-use framework for verifying that an implementation of ICU functions agrees with the required behavior. When a DDT test passes, it is a strong indication that output is consistent across platforms.

Data Driven Test (DDT) focuses on functions that accept data input such as numbers, date/time data, and other basic information. The specifications indicate the expected output from implementations when given the data and argument settings for each of the many individual data items.

Note that these tests are only part of the testing required for ICU-compliant libraries. Many additional tests are implemented in the

  • !!! TODO: reference to data specifications

Components of Data Driven Test

ICU versions for data and testing

Each ICU test program is built with a specific version of ICU & CLDR data. These versions are updated periodically. For each ICU version, there is a specific CLDR version, e.g., ICU73 uses data from CLDR 43, although multiple ICU releases may depend on the same CLDR data.

For this reason, a particular ICU version can be specified for the test data, the test executor, or both.

Each part of Data Driven Testing is designed to handle a specific ICU version.

  • Data generation uses specifications starting with ICU versions 70, 71, etc. For each ICU release, these data should be updated.

  • Test execution allows setting the data version explicitly with a command line argument --icuversion that points to the indicated test data. The ICU version of each test executor platform is requested at the start of the test driver. Output directories are created under each platform for the results of running against a particular ICU version, e.g., testOutput/node/icu73.

  • Test verification uses ICU version information in the test output files to match them with the corresponding expected results. Verification output appears in the testResults subdirectory for each executor, e.g., testOutput/rust/icu71.
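
Illustrative only: a tiny sketch of how the per-platform, per-ICU-version directories described above fit together. The real test driver and verifier build these paths themselves; the names below simply follow the examples in the text.

from pathlib import Path

def result_dir(root: str, platform: str, icu_version: str) -> Path:
    """e.g. result_dir("testOutput", "node", "icu73") -> testOutput/node/icu73"""
    return Path(root) / platform / icu_version

print(result_dir("testOutput", "node", "icu73"))    # testOutput/node/icu73
print(result_dir("testResults", "rust", "icu71"))   # testResults/rust/icu71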

Architectural Overview

Conceptually, there are three main functional units of the DDT implementation:

Conceptual model of Data Driven Testing

Data generation

Utilizes Unicode (UTS-35) specifications, CLDR data, and existing ICU test data. Existing ICU test data has the advantage of already being structured for data driven testing; in many cases it is formatted in a way that simplifies adding new tests, and it contains edge and error cases.

Data generation creates two files:

  • Test data instance: a JSON file containing the type of test and additional information on the environment and version of data.

The test type is indicated with the "Test scenario" field.

Individual data tests are stored as an array of items, each with a label and parameters to be set for computing a result.

Example line for collation_short:

{
  "description": "UCA conformance test. Compare the first data\n   string with the second and with strength = identical level\n   (using S3.10). If the second string is greater than the first\n   string, then stop with an error.",
  "Test scenario": "collation_short",
  "tests": [
    {
      "label": "0000000",
      "string1": "\u0009!",
      "string2": "\u0009?"
    },
  • A required test result file (JSON) containing the expected results from each of the inputs. This could be called the “golden data”.

    Sample verify data:

    {"Test scenario": "collation_short",
    "verifications": [
      {
        "label": "0000000",
        "verify": "True"
      },
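
A minimal sketch (not the project's actual verifier code) of how a test data file and its "golden" verify file could be joined by label, using the field names shown in the samples above:

import json

def load_expected(verify_path):
    """Map each test label to its expected result from the verify file."""
    with open(verify_path, encoding="utf-8") as f:
        verify = json.load(f)
    return {v["label"]: v["verify"] for v in verify["verifications"]}

def iter_cases(test_data_path):
    """Yield (label, test case) pairs from the test data instance file."""
    with open(test_data_path, encoding="utf-8") as f:
        data = json.load(f)
    for case in data["tests"]:
        yield case["label"], case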
    

Test Execution

Test execution consists of a Test Driver script and implementation-specific executables. The test driver executes each of the configured test implementation executables, specifying the input test data and the location for storing results. STDIN and STDOUT are the defaults.

Test executors

Each test executor platform contains a main routine that accepts a test request from the test driver, calling the tests based on the request data.

Each executor parses the data line sent by the test driver, extracting elements to set up the function call for the particular test.

For each test, the needed functions and other objects are created and the test is executed. Results are saved to a JSON output file.

See executors/README for more details.
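
The sketch below is a simplified, hypothetical executor loop in Python. It assumes one JSON test case per STDIN line and one JSON result per STDOUT line, mirroring the protocol described above; it is not one of the real executors (NodeJS, Rust, C++, Dart), and the placeholder comparison stands in for the actual library call.

import json
import sys

def run_one_test(case):
    """Hypothetical dispatch: run the library call named by the test type."""
    if case.get("test_type") == "coll_shift_short":
        # Placeholder comparison; a real executor would call the platform's collator.
        result = case["string1"] <= case["string2"]
        return {"label": case["label"], "result": str(result)}
    return {"label": case.get("label"), "error": "unknown test type"}

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        case = json.loads(line)
        print(json.dumps(run_one_test(case)))
    except Exception as exc:  # report the failure, keep processing other lines
        print(json.dumps({"error": str(exc), "input": line}))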

Verification

Each test is matched with the corresponding data from the required test results. A report of the test results is generated. Several kinds of status values are possible for each test item:

  • Success: the actual result agrees with the expected result.
  • Failure: a result is generated, but it does not match the expected value.
  • No test run: the test was not executed by the test implementation for the data item.
  • Error: the test resulted in an exception or other behavior not anticipated for the test case.
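
A small sketch of how a single test item might be classified into these status values, given the actual result from the executor and the expected value from the verify data. Field names follow the samples earlier in this document; the real verifier's logic may differ.

def classify(actual, expected):
    """Return one of the four status values for a single labeled test item."""
    if actual is None:
        return "No test run"   # the executor produced no result for this label
    if "error" in actual:
        return "Error"         # exception or other unanticipated behavior
    if actual.get("result") == expected:
        return "Success"
    return "Failure"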

Open questions for the verifier

  • What should be done if the test driver fails to complete? How can this be determined?

    • Proposal: each test execution shall output a completion message, indicating that the test driver finished its execution normally, i.e., did not crash. A sketch of such a check is shown below.
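
A hypothetical illustration of that proposal: the executor ends its output with a completion marker, and the driver treats a missing marker as an abnormal exit. The marker format here is assumed for illustration and is not part of the current protocol.

COMPLETION_MARKER = '{"completed": true}'  # assumed format, not the real protocol

def executor_finished(output_lines):
    """True only if the executor's final output line is the completion marker."""
    return bool(output_lines) and output_lines[-1].strip() == COMPLETION_MARKER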

How to use DDT

In its first implementation, Data Driven Test uses data files formatted with JSON structures describing tests and parameters. The data directory structure is set up as follows:

A directory testData containing

  • Test data files for each type of test, e.g., collation, numberformat, displaynames, etc. Each file contains tests with a label, input, and parameters.
  • Verify files for each test type. Each contains a list of test labels and expected results from the corresponding tests.

Directory testOutput

This contains a subdirectory for each executor. The output file from each test is stored in the appropriate subdirectory. Each test result contains the label of the test and the result of the test. This may be a boolean or a formatted string.

The results file contains information identifying the test environment as well as the result from each test. As an example, collation test results from the testOutput/node directory are shown here:

{
  "platform": {
    "platform": "NodeJS",
    "platformVersion": "v18.7.0",
    "icuVersion": "71.1"
  },
  "test_environment": {
    "test_language": "nodejs",
    "executor": "/usr/bin/nodejs ../executors/nodejs/executor.js",
    "test_type": "collation_short",
    "datetime": "10/07/2022, 16:19:00",
    "timestamp": "1665184740.2130146",
    "inputfile": "/usr/local/google/home/ccornelius/DDT_DATA/testData/icu73/collation_testt.json",
    "resultfile": "/usr/local/google/home/ccornelius/DDT_DATA/testOutputs/node/icu73/collation_test.json",
    "icu_version": "ICUVersion.ICU71",
    "cldr_version": "CLDRVersion.CLDR41",
    "test_count": "192707"
  },
  "tests": [
    {
      "label": "0000000",
      "result": "True"
    },
    {
      "label": "0000001",
      "result": "True"
    },
    ...
  ]
}

Directory testReports

This directory stores summary results from verifying the tests performed by each executor. Included in the testReports directory are:

  • index.html: shows all tests run and verified for all executors and versions. Requires a webserver to display this properly.

  • exec_summary.json: contains summarized results for each pair (executor, icu version) in a graphical form. Contains links to details for each test pair.

  • subdirectory for each executor, each containing verification of the tested icu versions, e.g., node/, rust/, etc.

Under each executor, one or more ICU version subdirectories are created, each containing:

  • verifier_test_report.html - for showing results to a user via a web server

  • verifier_test_report.json - containing verifier output for programmatic use

  • failing_tests.json - a list of all failing tests with input values

  • pass.json - list of test cases that match their expected results

  • test_errors.json - list of test cases where the executor reported an error

  • unsupported.json - list of test cases that are not expected to be supported in this version

The verifier_test_report.json file contains information on tests run and comparison with the expected results. At a minimum, each report contains:

  • The executor and test type
  • Date and time of the test
  • Execution information, from the testResults directory
  • Total number of tests executed
  • Total number of tests failing
  • Total number of tests succeeding
  • Number of exceptions identified in the test execution. This may include information on tests that could not be executed, along with the reasons for the problems.
  • Analysis of test failures, if available. This may include summaries of string differences such as missing or extra characters or substitutions found in output data.
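
As a rough sketch, the summary counts listed above could be derived from the per-test verification statuses as follows. The output keys here are illustrative and are not the report's actual field names.

from collections import Counter

def summarize(statuses):
    """Aggregate per-test statuses into overall totals."""
    counts = Counter(statuses)  # e.g. Counter({"Success": 190000, "Failure": 2500, ...})
    return {
        "test_count": sum(counts.values()),
        "passing_tests": counts["Success"],
        "failing_tests": counts["Failure"],
        "test_errors": counts["Error"],
        "not_run": counts["No test run"],
    }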

Contributor setup

Requirements to run Data Driven Testing code locally:

  • Install the Python package jsonschema
    • In a standard Python environment, you can run
      pip install jsonschema
      
    • Some operating systems (ex: Debian) might prefer that you install the OS package that encapsulates the Python package
      sudo apt-get install python-jsonschema
      
  • Install the minimum Rust version supported by ICU4X
    • The minimum supported Rust version ("MSRV") can be found in the rust-toolchain.toml file
    • To view your current default Rust version (and other locally installed Rust versions):
      rustup show
      
    • To update to the latest Rust version:
      rustup update
      
  • Install logrotate
      sudo apt-get install logrotate
      

History

Data Driven Test was initiated in 2022 at Google. The first release of the package was delivered in October 2022.

Copyright & Licenses

Copyright © 2022-2024 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.

The project is released under the terms of the LICENSE file.

A CLA is required to contribute to this project - please refer to the CONTRIBUTING.md file (or start a Pull Request) for more information.


conformance's Issues

More flexible source data download in testdata generator

testdata_gen.py hardcodes the source of data using a Github URL for a file from a specific version of ICU: https://github.com/unicode-org/conformance/blob/main/testgen/testdata_gen.py#L334

Instead, we should:

  • Add options to de-flake the download process
    • Separate the download step from the data generation step
    • Add an option to download a file vs. using a local copy
    • Show the user download progress
  • Handle versioning of data (allow different versions of input)

Number format tests include incorrect units

In many of the test failures for number format, the reason is that "furlong" is not a recognized unit. I think that the test data is incorrect, however. Perhaps the unit is not correctly set for many of the test cases.

test input issues for NumberFormatter / ICU4J

Some of these issues are a part of the test framework (ex: schema definition), some might be related to the ICU4J executor, some might be for the ICU4J NumberFormatter APIs.

ICU4X Collation failures

ICU4X in conformance testing shows more than 20% of the tests failing, seen here:
ICU4X/icu73

The actual collator options are seen in the test failure detail, with a few examples here. The inputs are s1 and s2, and the actual options used are given:

  • {"label":"0010001","s1":"𑜿!","s2":"𑜿?","line":8661,"ignorePunctuation":true} CollatorOptions { strength: Some(Tertiary), alternate_handling: Some(Shifted), case_first: None, max_variable: None, case_level: None, numeric: None, backward_second_level: None }
  • {"label":"0243300","s1":"𑛁b","s2":"𑜱b","line":47434} CollatorOptions { strength: Some(Tertiary), alternate_handling: None, case_first: None, max_variable: None, case_level: None, numeric: None, backward_second_level: None }
  • {"label":"0373766","s1":"龜a","s2":"龜a","line":177900} CollatorOptions { strength: Some(Tertiary), alternate_handling: None, case_first: None, max_variable: None, case_level: None, numeric: None, backward_second_level: None }

We need some debugging help with this!

Integrate schema validation into executables

For the executables that we run (test data generator, test executor), we should validate the inputs to the executable against the schema within the executable, right before we use them.

So if step A generates output a that goes into step B that generates b, ..., then we want step B to validate the values in a right before it processes them.

That protects us against data inconsistency from the stale-data problem.
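
A minimal sketch of the idea, using the jsonschema package mentioned in the contributor setup: validate each incoming test case against its schema right before it is used. The schema file path is made up for illustration; the repository's actual schema files and loading code will differ.

import json
from jsonschema import ValidationError, validate

with open("schema/collation_short.schema.json", encoding="utf-8") as f:  # hypothetical path
    SCHEMA = json.load(f)

def validated(case):
    """Validate one incoming test case right before it is processed."""
    try:
        validate(instance=case, schema=SCHEMA)
    except ValidationError as err:
        raise ValueError(f"Rejecting test case {case.get('label')}: {err.message}") from err
    return case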

Add flexible pagination in test reports

For test reports, add pagination to speed review of test failures / errors / unimplemented options. This could use JSON data loaded directly rather than creating tables in the Python code.

Set locale field for collation tests

Also, the existing collation tests implicitly default to the root locale, which is und. Updating these tests to have a specified locale means that we set the locale to und.

end-to-end not exiting on fatal Rust executor errors

The Rust executor is getting an error when trying to execute sendOneLine, and it does so for every batch of 10,000 tests that it sends.

Ex:

Testing ../executors/rust/target/release/executor / coll_shift_short. 190,000 of 192,707
Testing ../executors/rust/target/release/executor / coll_shift_short. 191,000 of 192,707
Testing ../executors/rust/target/release/executor / coll_shift_short. 192,000 of 192,707
!!! sendOneLine fails: input => {"label": "0190000", "string1": "\u2eb6!", "string2": "\u2eb6?", "test_type": "coll_shift_short"}
{"label": "0190001", "string1": "\u2eb6?", "string2": "\u2eb7!", "test_type": "coll_shift_short"}
...
#EXIT<. Err = [Errno 2] No such file or directory: '../executors/rust/target/release/executor'
!!!!!! processBatchOfTests: "platform error": "None"

Issues:

  • The Python script running everything logs the entire batch of test cases upon this error. We shouldn't print those 10,000 lines.
  • In cases where the Python script can't get the executors to do basic things properly, the Python script should exit with a non-zero exit code

Bonus points: in the future, we can use a logging library so that we can more easily control the behavior differently on our local machines vs. on CI

Using logging instead of print

For the test driver and test data generator in Python, we should use logging instead of just printing to the console.

At the least, it's equivalent. But the potential benefits are:

  • logging methods (ex: logging.debug(), logging.error()) allow us to indicate the severity of a statement
  • we can control what level we view logs at for testing mode, debugging mode, and production mode
  • we can configure the format of the messages if needed (add timestamps, etc or not)
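
A sketch of the kind of change being proposed; the level and format shown are examples, not settled choices for the project.

import logging

logging.basicConfig(
    level=logging.INFO,  # e.g. DEBUG locally, WARNING on CI
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("testdriver")

log.debug("Sent batch of %d tests to %s", 10000, "rust")  # hidden unless DEBUG is enabled
log.error("Executor failed to start: %s", "../executors/rust/target/release/executor")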

Define schema of test case data JSON

Some options for defining a schema:

  • JSON Schema
  • Protobuf

JSON Schema is a natural first choice. Also, it would take more effort to deal with Protobuf (perhaps too prohibitive in statically typed languages, even if possible in dynamic ones).

We only need a single tool that uses JSON Schema, since the purpose is to validate the JSON test data cases once they are generated by the test generation tool.

Fix handling of non-matching surrogates in collation data.

The current test generator doesn't create tests for collation data when either of the test strings contains an incomplete surrogate. These are recorded in the logging files but they are not stored in any data or mentioned in any dashboards.

Leave input line untransformed in the error handling

Revisit #145 (comment), where an executor encounters an error in processing a test case. Instead of returning the test case input line as is in the error response, the error handling code is transforming the input line before including it in the error response. This transformation seems unintended, unless there is a good reason.

@sven-oly

verifier crashes

From a fresh checkout of main, when running sh generateDataAndRun.sh, I get the following:

#EXIT<. Err = [Errno 2] No such file or directory: '../executors/rust/target/release/executor'
!!!!!! processBatchOfTests: "platform error": "None"

Traceback (most recent call last):
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testdriver.py", line 111, in <module>
    main(sys.argv)
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testdriver.py", line 101, in main
    driver.runPlans()
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testdriver.py", line 91, in runPlans
    plan.runPlan()
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testplan.py", line 86, in runPlan
    self.runOneTestMode()
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testplan.py", line 219, in runOneTestMode
    numErrors = self.runAllSingleTests(per_execution)
  File "/usr/local/google/home/elango/oss/conformance/testdriver/testplan.py", line 279, in runAllSingleTests
    allTestResults.extend(self.processBatchOfTests(testLines))
TypeError: 'NoneType' object is not iterable
1
Verifier starting on 9 verify cases
  Verifying test coll_shift_short on rust executor
Cannot load ../TEMP_DATA/testResults/rust/coll_test_shift.json result data: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 500, in <module>
    main(sys.argv)
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 491, in main
    verifier.verifyDataResults()
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 189, in verifyDataResults
    self.compareTestToExpected()
  File "/usr/local/google/home/elango/oss/conformance/verifier/verifier.py", line 267, in compareTestToExpected
    self.report.platform_info = self.resultData['platform']
AttributeError: 'Verifier' object has no attribute 'resultData'. Did you mean: 'result_path'?
1

Speed up end-to-end CI

We can speed up our end-to-end CI in different ways:

  • Cache Rust Cargo build artifacts
  • Split up executor work per-platform (or per-{platform, version})

Validate test case input and output at runtime

Now that we have schemas for test input and output, we should enable runtime validation of those test inputs & outputs across the board.

Doing so will enable the realization of a large chunk of the value proposition for having the schemas. It would ensure that all test cases passed to executors, and all data received from executors, adhere to the contracts defined by the schemas.

Executor for dart_native needs environment setup to execute

Testdriver with dart_native gives this in Linux environment. This needs to be fixed to run dart_native tests.

----> STDOUT= ><

!!!!!! !!!! ERROR IN EXECUTION: 255. STDERR = Unhandled exception:
UnimplementedError: Insert diplomat bindings here
#0 Collation4X.compareImpl (package:intl4x/src/collation/collation_4x.dart:16)
#1 Collation.compare (package:intl4x/src/collation/collation.dart:28)
#2 testCollator (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:74)
#3 main. (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:49)
#4 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#7 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#8 _StreamController._add (dart:async/stream_controller.dart:650)
#9 _StreamController.add (dart:async/stream_controller.dart:598)
#10 _Socket._onData (dart:io-patch/socket_patch.dart:2381)
#11 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#12 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#13 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#14 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#15 _StreamController._add (dart:async/stream_controller.dart:650)
#16 _StreamController.add (dart:async/stream_controller.dart:598)
#17 new _RawSocket. (dart:io-patch/socket_patch.dart:1899)
#18 _NativeSocket.issueReadEvent.issue (dart:io-patch/socket_patch.dart:1356)
#19 _microtaskLoop (dart:async/schedule_microtask.dart:40)
#20 _startMicrotaskLoop (dart:async/schedule_microtask.dart:49)
#21 _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:123)
#22 _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:190)
WARNING:root:!!!!!! process_batch_of_tests: "platform error": "!!!! ERROR IN EXECUTION: 255. STDERR = Unhandled exception:
UnimplementedError: Insert diplomat bindings here
#0 Collation4X.compareImpl (package:intl4x/src/collation/collation_4x.dart:16)
#1 Collation.compare (package:intl4x/src/collation/collation.dart:28)
#2 testCollator (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:74)
#3 main. (file:///usr/local/google/home/ccornelius/ICU_conformance/conformance/executors/dart_native/bin/executor.dart:49)
#4 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#7 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#8 _StreamController._add (dart:async/stream_controller.dart:650)
#9 _StreamController.add (dart:async/stream_controller.dart:598)
#10 _Socket._onData (dart:io-patch/socket_patch.dart:2381)
#11 _RootZone.runUnaryGuarded (dart:async/zone.dart:1594)
#12 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:339)
#13 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:271)
#14 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:776)
#15 _StreamController._add (dart:async/stream_controller.dart:650)
#16 _StreamController.add (dart:async/stream_controller.dart:598)
#17 new _RawSocket. (dart:io-patch/socket_patch.dart:1899)
#18 _NativeSocket.issueReadEvent.issue (dart:io-patch/socket_patch.dart:1356)
#19 _microtaskLoop (dart:async/schedule_microtask.dart:40)
#20 _startMicrotaskLoop (dart:async/schedule_microtask.dart:49)
#21 _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:123)
#22 _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:190)
"

Create simple clustering of test failure/error results

When there are many test failures or errors, there are too many instances to report each one individually. Many of the test cases might look the same, but they are reported without any subgrouping.

It might be helpful to implement some simple unsupervised clustering of the input values (say, taking the top 10 most frequent values per input struct key) and report the top 10 counts.
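
A very simple version of this (frequency counting rather than true clustering) could look like the following, where each failing case is a dict of input fields:

from collections import Counter, defaultdict

def top_values_per_field(failing_cases, top_n=10):
    """For each input field, count its values and keep the top_n most frequent."""
    counters = defaultdict(Counter)
    for case in failing_cases:
        for field, value in case.items():
            counters[field][str(value)] += 1
    return {field: counter.most_common(top_n) for field, counter in counters.items()}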

Fix version labeling to use ICU4X version, not Rust

In the summary page and also in the detail page, the platform version is shown but not the ICU4X version, e.g.,
"platform: {'cldrVersion': '43.1.0', 'icuVersion': 'icu4x/2023-05-02/73.x', 'platform': 'rust', 'platformVersion': '1.73.0'}"

This should show the ICU4X version, e.g., 1.3 or 1.4, not "1.73".

Must deal with missing or incorrect icu testdata version

The testdriver code assumes that the --icu_version parameter for the test driver is defined and that it refers to existing data. However, the value may be missing or may not be one of the defined test sets.

Proposed solution: check all defined testdata directories. If icu_version is not defined or a bad value is given, use the highest-numbered ICU version, e.g., a value of "xyz" will look at subdirectory names and pick the one that sorts highest.

For example, if the directories are [icu73, icu72, and icu71], a missing or incorrect value for icu_version will select icu73 data for testing.
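
A sketch of the proposed fallback, assuming the test data root contains one subdirectory per ICU version as in the example above; the directory-scanning details are illustrative, not the test driver's actual code.

from pathlib import Path

def resolve_icu_version(test_data_root, requested=None):
    """Use the requested version if it exists; otherwise pick the highest-sorting icu* directory."""
    versions = sorted(p.name for p in Path(test_data_root).iterdir()
                      if p.is_dir() and p.name.startswith("icu"))
    if requested in versions:
        return requested
    return versions[-1]  # e.g. ["icu71", "icu72", "icu73"] -> "icu73", even for requested="xyz"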

Configure logging

Configure logging to have a single global settings file/config.

Also, make the logging level in CI high enough not to show test execution progress.

Use HTML files to do HTML templating

Created from comment at #67 (comment)

+1 from me on this. Doing so should be win-win for everyone. It will probably feel like using jQuery.

It seems like the best way to do this in Python is using the Beautiful Soup library (docs). I've used JSoup in Java before, and that was really nice (powerful and easy). Beautiful Soup and JSoup seem to be comparable.

Using a regular HTML file as the input for HTML templating, rather than some special syntax that requires some special engine to interpret, is a simpler way to go. (Examples of special syntax HTML templating that are all-too-common still: ex1, ex2). The simplicity is that you keep code in Python along with the caller to the library, you keep markup in HTML, and you don't mix the two. Not having to deal with yet another syntax is a follow-on benefit.
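
A sketch of the plain-HTML templating idea with Beautiful Soup; the template file name and element ids below are made up for illustration.

from bs4 import BeautifulSoup

with open("report_template.html", encoding="utf-8") as f:  # hypothetical template file
    soup = BeautifulSoup(f, "html.parser")

# Fill in values by element id; markup stays in HTML, logic stays in Python.
soup.find(id="test_type").string = "collation_short"
soup.find(id="failing_count").string = "2513"

with open("verifier_test_report.html", "w", encoding="utf-8") as f:
    f.write(str(soup))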

Remove `DDT_DATA` dir and scripts referencing it

The DDT_DATA directory is obsolete at this point, and it seems to be just a copy of a portion of the TEMP_DATA directory that gets created locally to store intermediate files.

We should remove the DDT_DATA directory. At this point, all scripts referencing that directory are obsolete, too.

Do not remove any Python code references to ddt_data. The Python identifier is the alias used for datasets.py when importing that Python file/module.

Rename 'rust' to 'icu4x' in testdriver, executor code

The code has been using "Rust" instead of "ICU4X". We should rename accordingly.

Since the thing under test is an i18n library, we should rename our code according to the library name under test. The version number of the language runtime needed for the library version is a separate thing, and may not correspond 1:1 anyways (ex: ICU4X 1.0 and ICU4X 1.1 were developed against Rust 1.61, ICU4X 1.2 was developed against Rust 1.68.2).
