pisa-engine / standard-benchmark
Standard speed regression test for PISA
Currently, results produced with different scorers would overwrite one another.
We should change the repo/project name to something more generic. The major objective is to run a big query benchmark, but it's becoming more than that.
For instance, it could be used by anyone just to build collections, compress them, and such, and then work on that for, say, feature extraction. This is more or less my use case in the query routing project. I will want to do everything from parsing through indexing and building WAND data, but then I'll do some custom stuff. I will definitely reuse it, and having something like that would help with the reproducibility of any paper we publish in the future that is based on the PISA codebase.
So, any suggestions?
For example, there are a lot of logs printed by PISA itself, such as during collection parsing. However, it would be good to have the ability to save only the [EXEC] logs, to see what has been executed throughout the run.
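As a rough illustration only (not the actual implementation; the function name, the [EXEC] prefix handling, and the command names in the sample log are all just assumptions), the filtering could be as simple as keeping the prefixed lines:

/// Hypothetical helper: keep only the lines tagged with the `[EXEC]` prefix,
/// dropping everything PISA itself prints (parsing progress, etc.).
fn exec_log(full_log: &str) -> String {
    full_log
        .lines()
        .filter(|line| line.starts_with("[EXEC]"))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let log = "[EXEC] parse_collection -i input -o fwd\nparsed 1000 documents\n[EXEC] invert -i fwd -o inv";
    // Prints only the two [EXEC] lines.
    println!("{}", exec_log(log));
}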
trec_eval is currently treated as a system dependency, but it is small and fast to compile, so it would be nice to make it part of the project. Not sure how yet; I'll think of a good approach, but this is not a priority, so I'm just creating this issue.
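One possible direction, strictly a sketch and not a decision (the only assumptions are trec_eval's public repository URL and its plain-Makefile build): vendor and build it from a Cargo build script.

// build.rs (sketch only): fetch and build trec_eval at build time.
use std::env;
use std::path::Path;
use std::process::Command;

fn main() {
    let out_dir = env::var("OUT_DIR").expect("OUT_DIR not set");
    let src = Path::new(&out_dir).join("trec_eval");
    if !src.exists() {
        // Upstream repository: https://github.com/usnistgov/trec_eval
        Command::new("git")
            .args(&["clone", "https://github.com/usnistgov/trec_eval.git"])
            .arg(&src)
            .status()
            .expect("failed to clone trec_eval");
    }
    // trec_eval is a small Makefile project, so a plain `make` is enough;
    // the resulting binary ends up in the source directory.
    Command::new("make")
        .current_dir(&src)
        .status()
        .expect("failed to build trec_eval");
}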
Instead of only writing the trec_eval output, also write down the actual rankings so they can be compared.
Like so:
runs:
  - collection: wapo
    type: evaluate
    topics:
      - /data/collections/WashingtonPost.v2/topics.core18.txt
      - /data/collections/WashingtonPost.v2/topics.core18.txt
    qrels:
      - /data/collections/WashingtonPost.v2/qrels.core18.txt
      - /data/collections/WashingtonPost.v2/qrels.core18.txt
    output: wapo.trec
---- run::tests::test_evaluate_simple_topics stdout ----
eval_results =
eval_results =
thread 'run::tests::test_evaluate_simple_topics' panicked at 'assertion failed: `(left == right)`
left: `EchoOutput([])`,
right: `EchoOutput(["/tmp/build.J86mqR353kVz/trec_eval -q -a qrels /tmp/build.J86mqR353kVz/output.trec.wand.results"])`', src/run.rs:175:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
To format the path of a result file, we use the following:
format!("{}.{}.{}.trec_eval", compare_with.display(), algorithm, tid);
We use the algorithm and the index of the topics file, but not the encoding.
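A possible direction, only as a sketch (it assumes a variable holding the encoding name is in scope at that point, and the field order is arbitrary):

// Hypothetical variant that also includes the index encoding in the file name,
// so results produced with different encodings do not overwrite each other.
format!(
    "{}.{}.{}.{}.trec_eval",
    compare_with.display(),
    algorithm,
    encoding,
    tid
);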
In the run config, for both bench and eval, there should be a way of pointing to a gold standard. If the results differ, the run should fail. This would be useful in CI, e.g., Jenkins or Travis.
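Purely as a sketch of what this could look like, extending the config format shown earlier (the compare_with key and the gold-standard path are hypothetical, borrowed from the variable name in the code above, not an existing option):

runs:
  - collection: wapo
    type: evaluate
    topics:
      - /data/collections/WashingtonPost.v2/topics.core18.txt
    qrels:
      - /data/collections/WashingtonPost.v2/qrels.core18.txt
    output: wapo.trec
    # Hypothetical: path to a gold-standard file; if the produced results differ,
    # the run fails with a non-zero exit code so CI can catch it.
    compare_with: /data/gold/wapo.core18.trec_eval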
I want to provide an override for any given program run from the executor.
This would be an "unstable" feature in the sense that nothing would be validated: whatever is given is passed to the program as is. It is not ideal, but on the other hand it is very flexible. It could be used, for example, while waiting for a feature to be implemented.
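To make the proposal concrete (this syntax is entirely hypothetical; neither the overrides key nor its semantics exist, and the program name and flag are only illustrative), the config could carry a verbatim argument string per program:

runs:
  - collection: wapo
    type: evaluate
    # Hypothetical "unstable" override: passed to the named program exactly as written,
    # with no validation by the executor.
    overrides:
      evaluate_queries: "--some-flag-not-yet-supported value"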
I think in general this should be discouraged. If we wanted something this unstable, we could have done it in Python or Bash. I really want these things to fail early whenever they can, preferably when loading the config.
I noticed that the commands issued by the executors are not printed out.
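Roughly what I have in mind, as a self-contained sketch (the helper name and the wiring are made up; only the echo-before-run behavior matters, and the trec_eval invocation is just an example):

use std::process::{Command, ExitStatus};

// Hypothetical helper: echo the full command line before running it,
// so it ends up in the run log alongside the program's own output.
fn run_and_echo(program: &str, args: &[&str]) -> std::io::Result<ExitStatus> {
    println!("[EXEC] {} {}", program, args.join(" "));
    Command::new(program).args(args).status()
}

fn main() -> std::io::Result<()> {
    // Illustrative invocation; the real executor would build this from the config.
    run_and_echo("trec_eval", &["-q", "qrels.txt", "results.trec"])?;
    Ok(())
}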
Maybe add the ability to define multiple sources and switch between them from the command line?