pisa-engine / standard-benchmark
Standard speed regression test for PISA
Currently, results produced with different scorers would overwrite one another.
We should change the repo/project name to something more generic. The major objective is to run a big query benchmark, but it's becoming more than that.
For instance, it could be used by anyone just to build collections, compress them, and such, and then work on that for, say, feature extraction. This is more or less my use case in the query routing project. I will want to do everything from parsing through indexing and building WAND data, but then I'll do some custom stuff. I will definitely reuse it, and having something like that would help with the reproducibility of any paper we publish in the future that is based on the PISA codebase.
So, any suggestions?
For example, there are a lot of logs printed by PISA itself, such as during collection parsing. However, it would be good to have the ability to save only the [EXEC] logs, to see what has been executed throughout the run.
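As a rough illustration only (not the actual implementation; the function name, the [EXEC] prefix handling, and the command names in the sample log are all just assumptions), the filtering could be as simple as keeping the prefixed lines:

/// Hypothetical helper: keep only the lines tagged with the `[EXEC]` prefix,
/// dropping everything PISA itself prints (parsing progress, etc.).
fn exec_log(full_log: &str) -> String {
    full_log
        .lines()
        .filter(|line| line.starts_with("[EXEC]"))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let log = "[EXEC] parse_collection -i input -o fwd\nparsed 1000 documents\n[EXEC] invert -i fwd -o inv";
    // Prints only the two [EXEC] lines.
    println!("{}", exec_log(log));
}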
trec_eval is currently treated as a system dependency, but it is small and fast to compile, so it would be nice to make it part of the project. Not sure how yet; I'll think of a good approach, but this is not a priority, so I'm just creating this issue.
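One possible direction, strictly a sketch and not a decision (the only assumptions are trec_eval's public repository URL and its plain-Makefile build): vendor and build it from a Cargo build script.

// build.rs (sketch only): fetch and build trec_eval at build time.
use std::env;
use std::path::Path;
use std::process::Command;

fn main() {
    let out_dir = env::var("OUT_DIR").expect("OUT_DIR not set");
    let src = Path::new(&out_dir).join("trec_eval");
    if !src.exists() {
        // Upstream repository: https://github.com/usnistgov/trec_eval
        Command::new("git")
            .args(&["clone", "https://github.com/usnistgov/trec_eval.git"])
            .arg(&src)
            .status()
            .expect("failed to clone trec_eval");
    }
    // trec_eval is a small Makefile project, so a plain `make` is enough;
    // the resulting binary ends up in the source directory.
    Command::new("make")
        .current_dir(&src)
        .status()
        .expect("failed to build trec_eval");
}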
Instead of only writing the trec_eval output, also write down the actual rankings so they can be compared.
Like so:
runs:
  - collection: wapo
    type: evaluate
    topics:
      - /data/collections/WashingtonPost.v2/topics.core18.txt
      - /data/collections/WashingtonPost.v2/topics.core18.txt
    qrels:
      - /data/collections/WashingtonPost.v2/qrels.core18.txt
      - /data/collections/WashingtonPost.v2/qrels.core18.txt
    output: wapo.trec
---- run::tests::test_evaluate_simple_topics stdout ----
eval_results =
eval_results =
thread 'run::tests::test_evaluate_simple_topics' panicked at 'assertion failed: `(left == right)`
left: `EchoOutput([])`,
right: `EchoOutput(["/tmp/build.J86mqR353kVz/trec_eval -q -a qrels /tmp/build.J86mqR353kVz/output.trec.wand.results"])`', src/run.rs:175:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
To format the path of a result file, we use the following:
format!("{}.{}.{}.trec_eval", compare_with.display(), algorithm, tid);
We use the algorithm and the index of the topics file, but not the encoding.
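A possible direction, only as a sketch (it assumes a variable holding the encoding name is in scope at that point, and the field order is arbitrary):

// Hypothetical variant that also includes the index encoding in the file name,
// so results produced with different encodings do not overwrite each other.
format!(
    "{}.{}.{}.{}.trec_eval",
    compare_with.display(),
    algorithm,
    encoding,
    tid
);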
In the run config, for both bench and eval, there should be a way of pointing to a gold standard. If the results differ, the run should fail. This would be useful in CI, e.g., Jenkins or Travis.
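Purely as a sketch of what this could look like, extending the config format shown earlier (the compare_with key and the gold-standard path are hypothetical, borrowed from the variable name in the code above, not an existing option):

runs:
  - collection: wapo
    type: evaluate
    topics:
      - /data/collections/WashingtonPost.v2/topics.core18.txt
    qrels:
      - /data/collections/WashingtonPost.v2/qrels.core18.txt
    output: wapo.trec
    # Hypothetical: path to a gold-standard file; if the produced results differ,
    # the run fails with a non-zero exit code so CI can catch it.
    compare_with: /data/gold/wapo.core18.trec_eval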
I want to provide an override for any given program run from the executor.
This would be an "unstable" feature in the sense that nothing would be validated: whatever is given is passed to the program as is. It is not ideal, but on the other hand it is very flexible. It could be used, for example, while waiting for a feature to be implemented.
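To make the proposal concrete (this syntax is entirely hypothetical; neither the overrides key nor its semantics exist, and the program name and flag are only illustrative), the config could carry a verbatim argument string per program:

runs:
  - collection: wapo
    type: evaluate
    # Hypothetical "unstable" override: passed to the named program exactly as written,
    # with no validation by the executor.
    overrides:
      evaluate_queries: "--some-flag-not-yet-supported value"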
I think in general this should be discouraged. If we wanted something this unstable, we could have done it in Python or Bash. I really want these things to fail early whenever they can, preferably when loading the config.
I noticed that the commands issued by the executors are not printed out.
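Roughly what I have in mind, as a self-contained sketch (the helper name and the wiring are made up; only the echo-before-run behavior matters, and the trec_eval invocation is just an example):

use std::process::{Command, ExitStatus};

// Hypothetical helper: echo the full command line before running it,
// so it ends up in the run log alongside the program's own output.
fn run_and_echo(program: &str, args: &[&str]) -> std::io::Result<ExitStatus> {
    println!("[EXEC] {} {}", program, args.join(" "));
    Command::new(program).args(args).status()
}

fn main() -> std::io::Result<()> {
    // Illustrative invocation; the real executor would build this from the config.
    run_and_echo("trec_eval", &["-q", "qrels.txt", "results.trec"])?;
    Ok(())
}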
Maybe add the ability to define multiple sources and switch between them from the command line?