enketo-validate's Introduction

Enketo Validate

Validate ODK XForms using Enketo's form engine

This app can be used:

  1. via the command-line
  2. as a Node.js module to be used in your own JavaScript application

Live demo web application (meant for testing purposes only) that uses Enketo Validate (and ODK Validate) as a module: validate.enketo.org (source code)

Technical Documentation

Prerequisites

  1. install Node 18 or 20 and Yarn 1 ("classic")
  2. (if necessary) install build tools for native modules with apt-get install build-essential
  3. (if necessary) install puppeteer (headless Chrome) prerequisites as mentioned here, e.g. for Ubuntu/Debian do apt-get install ca-certificates fonts-liberation libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils

Via Command-line

Command-line Install

Clone the repo and run yarn install --production. This will make the ./validate command available from within the clone folder. Running yarn link makes the enketo-validate command available from any folder on your machine.

Command-line Use

$ enketo-validate path/to/form.xml

Errors are returned to stderr and warnings to stdout. If there is no stderr output the form is valid.
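When the command-line tool is called from a script, the rule above (no stderr output means the form is valid) can be applied directly. The following is a minimal sketch, assuming enketo-validate is available on the PATH. The runValidator helper and its command and extraArgs parameters are hypothetical names used for illustration and are not part of Enketo Validate.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: runs a validator command on a form file and applies
// the rule that the form is valid when nothing was written to stderr.
function runValidator(formPath, command = 'enketo-validate', extraArgs = []) {
    const proc = spawnSync(command, [...extraArgs, formPath], { encoding: 'utf8' });
    if (proc.error) {
        // For example, the command could not be found or started.
        throw proc.error;
    }
    const errors = proc.stderr.trim();
    const warnings = proc.stdout.trim();
    return { valid: errors.length === 0, errors, warnings };
}

// Usage (requires the CLI to be installed):
// const { valid, errors, warnings } = runValidator('path/to/form.xml');
```

The sketch deliberately ignores the exit code, because the text above only defines validity in terms of stderr output.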

Command-line Help

$ enketo-validate --help

As NodeJS module

Module installation

Add the following yarn resolutions to package.json:

"resolutions": {
    "nan": "^2.17.0",
    "libxslt/nan": "^2.17.0",
    "node1-libxmljsmt-myh/nan": "^2.17.0"
},

Then install the module:

$ yarn add enketo-validate

Module Use

const validator = require('enketo-validate');

// Options:
// debug: [boolean] outputs unadulterated errors instead of cleaned ones
// openclinica: [boolean] runs the validator in a special OpenClinica mode
const options = {};

// Read the xform as string
const result = validator.validate( xformStr, options );

// The result has the following format:
// {
//      warnings: [ 'a warning', 'another warning'],
//      errors: ['an error', 'another error'],
//      version: "0.0.0"
// }
// if errors.length is 0, the form passed validation
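
The result object can be turned into a pass/fail report in a few lines. This is a minimal sketch that relies only on the result format described above. The summarize helper is a hypothetical name and is not part of the module.

```javascript
// Hypothetical helper: formats a result of the shape
// { warnings: string[], errors: string[], version: string }.
function summarize(result) {
    const lines = [];
    result.errors.forEach((error) => lines.push(`ERROR: ${error}`));
    result.warnings.forEach((warning) => lines.push(`WARNING: ${warning}`));
    // The form passed validation when there are no errors; warnings do not fail it.
    return { passed: result.errors.length === 0, report: lines.join('\n') };
}

const example = summarize({
    warnings: ['a warning'],
    errors: [],
    version: '0.0.0',
});
console.log(example.passed); // true
console.log(example.report); // WARNING: a warning
```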

Develop

  1. Clone repo and install prerequisites.
  2. Run yarn install. If there is an error, first run rm -R node_modules and retry, especially after changing Node versions or after earlier crashes during installation.
  3. Run via command line, e.g. ./validate test/xform/xpath-fails.xml or ./validate --help.

How it works

In its current iteration, the validator does the following:

  • It checks whether the XForm is a valid XML document.
  • It performs some elementary ODK XForm structure checks.
  • It checks if each bind nodeset exists in the primary instance.
  • It checks if appearance values are supported or deprecated for that type of question.
  • It checks for each <bind> whether the relevant, constraint, calculate, and required expressions are supported and valid* XPath.
  • It checks whether required <label> elements exist.
  • It checks for duplicate question or group names.
  • It checks for nested repeats.
  • It checks for form controls that have a calculation but are not set as readonly.

* Note that /path/to/a/nonexisting/node is perfectly valid XPath.
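
Because a path to a non-existing node is still valid XPath, the bind nodeset check has to be a separate lookup against the primary instance. The sketch below illustrates the idea on a simplified, hand-written model of an instance (nested objects keyed by node name). It is not how the validator itself is implemented, and all names in it are hypothetical.

```javascript
// Simplified model of a primary instance: nested objects keyed by node name.
// Illustration only; a real check would work on the parsed XML document.
const primaryInstance = { data: { name: {}, age: {}, meta: { instanceID: {} } } };

// Returns true if an absolute path such as /data/meta/instanceID exists.
function nodesetExists(instance, nodeset) {
    const steps = nodeset.split('/').filter(Boolean);
    let current = instance;
    for (const step of steps) {
        if (!Object.prototype.hasOwnProperty.call(current, step)) {
            return false;
        }
        current = current[step];
    }
    return steps.length > 0;
}

// Returns the bind nodesets that do not exist in the instance.
function findMissingNodesets(instance, nodesets) {
    return nodesets.filter((nodeset) => !nodesetExists(instance, nodeset));
}

console.log(findMissingNodesets(primaryInstance, ['/data/name', '/data/nmae']));
// → [ '/data/nmae' ]
```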

Funding

The development of this application was funded by OpenClinica.

License

See the license document for this application's license.

Change log

See change log.

enketo-validate's People

Contributors

dependabot[bot], gushil, lindsay-stevens, magicznyleszek, martijnr

enketo-validate's Issues

add to readme.md: differences with ODK Validate

Differences:

  • Enketo will keep going and show all XPath errors in the form. ODK Validate exits after the first error.

No false errors/warnings for:

  • src attribute on instance element
  • unknown bind attributes such as [length,requiredMsg]
  • /path/to/non-existing/node (? check if this is actually an issue)
  • any valid XPath 1.0 syntax including e.g. fancy axes

use enketo-core instead of enketo-xpathjs

There is a bunch of magic that enketo-core does before sending to the XPath evaluator.

  • fix dependency issues (e.g. enketo-config)
  • change how jquery is added to formModel to avoid jQuery error with window and document or somehow run formModel inside jsdom context
  • remove require('./plugins') from Form-model.js. It is not used and causes an error in validator.
  • make sure the 'native' XPath evaluator is removed/disabled
  • use enketo-core's evaluate function

add bunch of basic checks

These could later be replaced by XML schema:

  • root is h:html
  • root has 1 h:head child
  • root has 1 h:body child
  • h:head has 1 model child
  • model contains at least 1 instance child
  • first instance has 1 child (only)
  • first instance child has id attribute (does this actually matter?)
  • contains meta or orx:meta and instanceID or orx:instanceID (although this check is also in Enketo Core)
  • a binding without calculate should have a corresponding body element (this finds logical groups that don't have a ref attribute). This could be a warning, or an error if the binding also has a relevant.
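
The first few checks in this list could be sketched as follows. This is a minimal illustration that assumes a hand-written element model of the form { name, children } and matches on the literal prefixed names used in the list. A real implementation would work on a parsed XML document and compare namespaces rather than prefixes. The function names are hypothetical.

```javascript
// Minimal element model for illustration: { name, children }.
const childrenNamed = (el, name) => (el.children || []).filter((c) => c.name === name);

// Returns a list of error messages; an empty list means the checks passed.
function checkStructure(root) {
    const errors = [];
    if (root.name !== 'h:html') {
        errors.push('Root element should be h:html.');
        return errors;
    }
    const heads = childrenNamed(root, 'h:head');
    const bodies = childrenNamed(root, 'h:body');
    if (heads.length !== 1) errors.push('Root should have exactly 1 h:head child.');
    if (bodies.length !== 1) errors.push('Root should have exactly 1 h:body child.');
    if (heads.length !== 1) return errors;

    const models = childrenNamed(heads[0], 'model');
    if (models.length !== 1) {
        errors.push('h:head should have exactly 1 model child.');
        return errors;
    }
    const instances = childrenNamed(models[0], 'instance');
    if (instances.length === 0) {
        errors.push('model should contain at least 1 instance child.');
        return errors;
    }
    if ((instances[0].children || []).length !== 1) {
        errors.push('First instance should have exactly 1 child.');
    }
    return errors;
}

const form = {
    name: 'h:html',
    children: [
        { name: 'h:head', children: [{ name: 'model', children: [{ name: 'instance', children: [{ name: 'data' }] }] }] },
        { name: 'h:body', children: [] },
    ],
};
console.log(checkStructure(form).length); // → 0
```

Returning early after a failed check avoids reporting follow-on errors that are only a consequence of the first one.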

Detect cyclic dependencies for calculations

E.g.:

  • node a has calculation b + 1
  • node b has calculation a + 1

Or:

  • node a has calculation b + 1
  • node b has calculation c + 1
  • node c has calculation a + 1

Suggested implementation:

  1. iterate through all calculation expressions to build a map of all calculation nodes with the nodes they reference (using xform._extractNodeReferences).
  2. then, for each calculation, do a recursive check of all dependent nodes (and their dependencies etc)
  3. check if any of the dependencies in 2 is the node itself
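
The three steps above can be sketched as follows. This is only an illustration of the suggested approach: the dependency map is written out by hand instead of being built with xform._extractNodeReferences, the traversal uses an explicit stack rather than recursion, and findCyclicNodes is a hypothetical name.

```javascript
// dependencies: Map from a calculated node to the nodes its expression references.
// Returns every node that (directly or indirectly) depends on itself.
function findCyclicNodes(dependencies) {
    const cyclic = [];
    for (const start of dependencies.keys()) {
        const visited = new Set();
        const stack = [...(dependencies.get(start) || [])];
        let found = false;
        while (stack.length > 0) {
            const node = stack.pop();
            if (node === start) {
                // One of the (transitive) dependencies is the node itself.
                found = true;
                break;
            }
            if (visited.has(node)) continue;
            visited.add(node);
            stack.push(...(dependencies.get(node) || []));
        }
        if (found) cyclic.push(start);
    }
    return cyclic;
}

// a -> b -> c -> a is a cycle; d depends on the cycle but is not part of it.
const deps = new Map([
    ['a', ['b']],
    ['b', ['c']],
    ['c', ['a']],
    ['d', ['a']],
]);
console.log(findCyclicNodes(deps)); // → [ 'a', 'b', 'c' ]
```

The visited set keeps the traversal from looping forever when a node depends on a cycle that it is not itself part of, as with d in the example.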

main approach

Rationale:

  • Commcare/Enketo-style external data
  • Access to native XPath functions already supported in enketo: starts-with(), ends-with(), contains(), substring-after(), substring-before(), translate(), normalize-space(), floor(), ceiling(), last(), etc
  • Access to all native XPath 1.0 axes that greatly increase the power of XForms, e.g. we can prevent duplicates in repeats using the preceding-sibling axis.
  • Ability to add agreed new OpenRosa functions, such as randomize(), without dependencies.
  • Ability to add custom features to client-specific enketo-validate ports.

Some ideas to do this:

first version:

  • check if it's valid XML

  • check it conforms to the XForm format (an XML Schema would be wonderful - maybe Dimagi has one or ODK Validate uses one already?) - required sections, required attributes on each node type, doesn't have to be hugely complex to be very useful

  • run each relevant/calculate/constraint XPath expression inside a try/catch statement and see if causes an error, using Enketo's XPath evaluator. If needed, add tolerance for references to external instances (that are empty because they're external).
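
The try/catch idea in the last item could look roughly like this. It is a sketch under stated assumptions: evaluate stands in for Enketo's XPath evaluator and is a hypothetical function that throws on failure, binds is a hand-written list of bind attributes, and toyEvaluate is a stand-in used only to make the example runnable.

```javascript
// Runs every relevant/calculate/constraint expression and collects all
// failures instead of stopping at the first one.
function collectExpressionErrors(binds, evaluate) {
    const errors = [];
    const attributes = ['relevant', 'calculate', 'constraint'];
    for (const bind of binds) {
        for (const attribute of attributes) {
            const expression = bind[attribute];
            if (!expression) continue;
            try {
                evaluate(expression, bind.nodeset);
            } catch (e) {
                errors.push(`${attribute} for ${bind.nodeset}: ${e.message}`);
            }
        }
    }
    return errors;
}

// Toy evaluator for demonstration only: rejects unbalanced parentheses.
function toyEvaluate(expression) {
    let depth = 0;
    for (const ch of expression) {
        if (ch === '(') depth += 1;
        if (ch === ')') depth -= 1;
        if (depth < 0) break;
    }
    if (depth !== 0) throw new Error(`unbalanced parentheses in "${expression}"`);
    return true;
}

const errors = collectExpressionErrors(
    [
        { nodeset: '/data/a', relevant: "selected(/data/b, 'yes')" },
        { nodeset: '/data/c', calculate: 'floor(/data/a' },
    ],
    toyEvaluate
);
console.log(errors.length); // → 1
```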

later:

  • for more descriptive errors we could do a separate check for validity of nodeNames perhaps
  • check if all bind nodesets exist in primary instance (apparently, ODK Validate doesn't do this ref. formhub/uuid behaviour where an incorrect binding nodeset path doesn't trigger errors and somehow gets 'corrected' in ODK Collect)
  • check if all instance elements referred to anywhere in instance(ID)/path/to/node exist in model
  • check if all itext elements referred to anywhere in exist in model
  • (maybe) perform XSLT transformation which could fail or output error messages that are built into the XSL sheet
  • similarly, attempt to parse and initialize the whole form.

These would all be simple to do, but providing useful feedback messages and making validation as complete as (though better than) ODK Validate would probably take quite a lot of additional work.

improve missing end tag errors

Currently the error message is utterly useless (in ODK Validate it is much better):

Errors:
Unexpected close tag Line: 94, Column: 38, Char: >

e.g. for the following <instance> (PROCEDURE is not closed):

    <instance>
        <Last_group_does_not_display id="PSS" version="1">
          <PROCEDURE>
            <DOP/>
            <SURG/>
            <PROC_NOTE/>
            <PROC_GRID jr:template="">
              <PROC/>
              <LATER/>
              <TMLOC_BLANK/>
              <PMLOC/>
              <SLNDLOC/>
              <CALC_LOC/>
            </PROC_GRID>
            <CALC_SPEC_TTL1/>
            <CALC_SHOW1/>
            <CALC_SHOW2/>
            <CALC_SPEC_TTL2/>
          <SPEC>
            <GPT>
              <TUMTYP/>
              <PM_TUMLENGTH/>
              <PM_TUMWIDTH/>
              <PM_TUMUNITS/>
              <PM_LGMLENGTH/>
              <PM_LGMWIDTH/>
              <PM_LGMUNITS/>
              <PM_MGMLENGTH/>
              <PM_MGMWIDTH/>
              <PM_MGMUNITS/>
              <PM_LENGTH/>
              <PM_WIDTH/>
              <PM_UNITS/>
              <ENTRIES>
                <PM jr:template="">
                  <PM_SPECIM/>
                  <PM_SPECIMHIST/>
                  <PM_GSS/>
                  <PM_MEDIALAT/>
                  <PM_SLICEMETH/>
                  <PM_ANTDEEP/>
                  <PM_SLICEMETH2/>
                  <PM_SUPINF/>
                  <PM_SLICEMETH3/>
                  <PM_SPECSLICE/>
                  <PM_TOTSLICE/>
                  <PM_TOTSLICEMETH/>
                  <PM_SLICENOTE/>
                  <PM_SLICEPL/>
                  <PM_SLICENUM/>
                  <PM_BLOCKNUM/>
                  <PM_SLICEPL2/>
                  <PM_SLICENUM2/>
                  <PM_BLOCKNUM2/>
                  <PM_PERFLAB/>
                  <PM_LYMNTISS/>
                  <PM_NOTE/>
                </PM>
                <TM jr:template="">
                  <TM_SPECIM/>
                  <TM_SPECIMHIST/>
                  <TM_GSS/>
                  <TM_MEDIALAT/>
                  <TM_SLICEMETH/>
                  <TM_ANTDEEP/>
                  <TM_SLICEMETH2/>
                  <TM_SUPINF/>
                  <TM_SLICEMETH3/>
                  <TM_TOTSLICE/>
                  <TM_TOTSLICEMETH/>
                  <TM_SLICENOTE/>
                  <TM_SLICEPL/>
                  <TM_SLICENUM/>
                  <TM_BLOCKNUM/>
                  <TM_SLICEPL2/>
                  <TM_SLICENUM2/>
                  <TM_BLOCKNUM2/>
                  <TM_PERFLAB/>
                  <TM_LYMNTISS/>
                  <TM_NOTE/>
                </TM>
              </ENTRIES>
            </GPT>
          </SPEC>
          <LYMPHNODES>
            <LYMNDISS/>
          </LYMPHNODES>
          <meta>
            <instanceID/>
          </meta>
        </Last_group_does_not_display>
      </instance>

Add a warning for nested repeats?

Not sure. Ideally we should keep fixing bugs in nested repeats. However, it is quite a pain, very expensive. Data analysis is also not going to be much fun with nested repeats for users, so a warning might be helpful?

change build script away from 'install'

Installing enketo-validate on CentOS v7, with the latest change:

[root@formservice enketo-validate]# npm install
npm WARN lifecycle [email protected]~install: cannot run in wd %s %s (wd=%s) [email protected] browserify src/FormModel.js > build/FormModel-bundle.js && browserify -g aliasify src/FormModel.js > build/FormModel-bundle-oc.js /root/applications/enketo-validate

Had to do: sudo npm install --unsafe-perm

let --oc flag do all custom OC stuff

No longer create a separate binary for OC, but let the --oc flag do both:

  1. load the OC XPath evaluator
  2. validate additional custom XForm rules

The binary in Mac OS increases from 54.7 MB to 55.3 MB so this increase is negligible.

If loaded cleverly, it should not slow down the tests either.

Remove built files from repo

The prepublish build script should actually be run as an install script.

For some reason this install (rollup) script fails when using npm install enketo-validate in an app.

Therefore, I've temporarily published the built files to /build. I don't like this, so this should be solved properly some day.

tests

  • invalid nodename, e.g. 1a and a b => parse error?
  • missing closing tag
  • missing namespace declaration for used namespace prefix
  • function call with too few arguments
  • function call with too many arguments
  • bloody jr:choicename call => just pass whatever
  • instance('data') for nonexisting instance => error

Question: Difference between ODK and Enketo validate

@MartijnR

We can see that the latest version of pyxform offers command-line options to choose how the XForm XML is validated:

usage: xls2xform [-h] [--json] [--skip_validate] [--odk_validate] [--enketo_validate] [--no_pretty_print]

It appears that ODK validation will not pass some of the functions described in https://docs.opendatakit.org/form-operators-functions, such as count-non-empty().

We are hoping to better understand why there are two validators and which we should be using. Then, if required we would change the Survey123 conversion API to use the most appropriate one.

Any information would be appreciated. Thanks
