capreolus-ir / capreolus

A toolkit for end-to-end neural ad hoc retrieval

Home Page: https://capreolus.ai

License: Apache License 2.0

Python 99.97% Shell 0.03%

deep-learning information-retrieval

capreolus's Issues

Replace Magnitude with something faster

We use Magnitude to download and read embeddings of various sizes for word2vec, glove, and fasttext. This library is much slower than reading the raw embedding files though, because it stores the embeddings as sqlite (to add missing embedding functionality that we don't use). We should find a faster library to replace it or just parse the files ourselves.
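As a rough illustration of the "parse the files ourselves" option, here is a minimal sketch that reads the plain-text embedding format (one token followed by its vector per line). The function name `parse_text_embeddings` is hypothetical and not part of Capreolus; the sketch assumes space-separated values and an optional word2vec-style header line.

```python
from typing import Dict, Iterable, Tuple


def parse_text_embeddings(lines: Iterable[str]) -> Dict[str, Tuple[float, ...]]:
    """Parse lines of the form '<token> <v1> <v2> ...' into a token -> vector dict."""
    embeddings: Dict[str, Tuple[float, ...]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # word2vec-style text files begin with a '<vocab size> <dimensions>' header; skip it.
        if not embeddings and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        # Skip blank or malformed lines that carry no vector.
        if len(parts) < 2:
            continue
        embeddings[parts[0]] = tuple(float(x) for x in parts[1:])
    return embeddings


# Tiny in-memory example; a real file would be passed as an open file handle.
sample = ["2 3", "cat 0.1 0.2 0.3", "dog -0.4 0.5 0.6"]
vectors = parse_text_embeddings(sample)
```

Because the function accepts any iterable of lines, it can be fed an open file object directly, which avoids loading the whole file into memory before parsing.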

Long queries truncated in bertpassage extractors

PR #112 fixed an issue where a long query can result in input longer than maxseqlen. This was accomplished by truncating both the query and the document depending on which is longer.

However, truncating the query is unusual and can be surprising behavior, especially when done silently. Is there a more intuitive way to handle this? For example, does it make sense to add a maxqlen option (without padding) to these extractors, so that the user can be warned if maxqlen is shorter than the longest query encountered?
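One way the proposed maxqlen behavior could look is sketched below. This is not the extractors' actual code: the function name, the `num_special` parameter (three special tokens, as in a BERT-style `[CLS] query [SEP] doc [SEP]` input), and the warning text are all assumptions for illustration.

```python
import warnings
from typing import List, Tuple


def truncate_query_and_doc(
    query: List[str], doc: List[str], maxseqlen: int, maxqlen: int, num_special: int = 3
) -> Tuple[List[str], List[str]]:
    """Truncate the query to maxqlen (with a warning), then fit the document into the rest."""
    if len(query) > maxqlen:
        # Make the truncation visible instead of silent.
        warnings.warn(f"query has {len(query)} tokens; truncating to maxqlen={maxqlen}")
        query = query[:maxqlen]
    doc_budget = maxseqlen - num_special - len(query)
    if doc_budget < 0:
        raise ValueError("maxqlen plus special tokens exceeds maxseqlen")
    return query, doc[:doc_budget]
```

With this shape, only the document is shortened to fit maxseqlen, and the query is touched only when it exceeds an explicit, user-visible limit.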

Cannot reproduce codes in documentation

I'm pretty new to neural IR, so I might be doing it wrong, but when I try to follow the code in https://capreolus.ai/en/latest/quick.html#command-line-interface, I get an ambiguous error. I believe something is off with Anserini.

>>>  index.create_index()

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-27-82a60b1884b6> in <module>
----> 1 index.create_index()

~/miniconda3/lib/python3.8/site-packages/capreolus/index/__init__.py in create_index(self)
     28             return
     29 
---> 30         self._create_index()
     31         donefn = self.get_index_path() / "done"
     32         with open(donefn, "wt") as donef:

~/miniconda3/lib/python3.8/site-packages/capreolus/index/anserini.py in _create_index(self)
     69         app.wait()
     70         if app.returncode != 0:
---> 71             raise RuntimeError("command failed")
     72 
     73     def get_docs(self, doc_ids):

RuntimeError: command failed

I tried the code on three different systems (Google Colab, Ubuntu's Python, and Anaconda) and installed Capreolus with pip.

Add command to describe benchmarks (folds and defaults)

unsafe memory access error

I'm running the KNRM reranker on msmarcopsg. Can someone help with this error?

Exception in thread "pool-2-thread-2" java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code

Thank you.

No usable temporary directory found

I am getting this weird error on traineval. Previously, it gave the error "no space left on device".

FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/user1/capreolus']
I cleaned /tmp and /var/tmp using this command:
sudo find /tmp -type f -atime +10 -delete
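For context, Python's `tempfile` module consults the `TMPDIR` environment variable before falling back to the directories listed in the error, so pointing it at a location with free space is one possible workaround. The sketch below assumes a hypothetical `scratch_tmp` directory; in practice the path should be on a disk that is not full.

```python
import os
import tempfile


def use_scratch_dir(path: str) -> str:
    """Point Python's tempfile module at `path` and return the directory it now uses."""
    os.makedirs(path, exist_ok=True)
    os.environ["TMPDIR"] = path
    # tempfile caches its choice; clearing it forces TMPDIR to be re-read.
    tempfile.tempdir = None
    return tempfile.gettempdir()


# Hypothetical location; choose a directory on a volume with free space.
scratch = os.path.abspath("scratch_tmp")
chosen = use_scratch_dir(scratch)
```

Setting `TMPDIR` in the shell before launching the command has the same effect without any code changes.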

NFCorpus collection

Add support for NFCorpus, which is small enough to work well for tutorials/notebooks.

Type safety

Thanks for this well-written framework!
I'm currently discovering the aspects of the Capreolus framework and plan to write a custom Reranker.
However, I find that a bit difficult because IDE support is limited. What do you think about making as much of the Capreolus codebase as possible type safe?
For example, when a method uses Qrels, why don't we specify the dict structure as well?
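To make the suggestion concrete, here is a small sketch of what a typed Qrels alias might look like. It assumes the usual TREC-style nesting of query id to document id to integer relevance label; the alias names and the `count_relevant` helper are hypothetical and not taken from the Capreolus code.

```python
from typing import Dict

# Assumed structure: {query id: {document id: relevance label}}
QueryId = str
DocId = str
Qrels = Dict[QueryId, Dict[DocId, int]]


def count_relevant(qrels: Qrels, relevance_level: int = 1) -> Dict[QueryId, int]:
    """Count, per query, the documents whose label meets the relevance threshold."""
    return {
        qid: sum(1 for label in docs.values() if label >= relevance_level)
        for qid, docs in qrels.items()
    }


example: Qrels = {"q1": {"d1": 2, "d2": 0}, "q2": {"d3": 1}}
counts = count_relevant(example)
```

With an alias like this in a signature, an IDE or a checker such as mypy can flag a call that passes, say, a flat list of judgments instead of the nested mapping.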

Remove redundant entries from AutoAPI documentation

Currently Sphinx AutoAPI is too verbose and shows imported members under each module. For example, if a module contains from foo import bar, bar shows up in the module's documentation. We should find a way to remove these redundant entries.

DeepTileBars failing on GPU with CUDA 10.1

The tests for DTB are now failing when a GPU with a recent CUDA version is used. Setting CUDA_VISIBLE_DEVICES='' causes the tests to pass.

[Proposal] Adding 'metrics' config to reranker task

Hi there,
I noticed that when defining a rank task, the user may define a list of metrics to be evaluated and reported, but there is no such config for rerank tasks. I assume this feature is not missing on purpose and that there is no harm in adding it. If so, please consider adding this config to rerankers.

Getting an error when using a predefined reranker with a custom collection

Hi there,
I have created a dataset with TREC format (topics, collection, qrels and folds) and defined a collection and a benchmark on them:

fold_path = '/home/aliabedzadeh/capreolus/datasets_output/sajad_folds.json'
qrel_path = '/home/aliabedzadeh/capreolus/datasets_output/sajad_qrels_p_pn.txt'
topic_path = '/home/aliabedzadeh/capreolus/datasets_output/sajad_topics.txt'
collection_path = '/home/aliabedzadeh/capreolus/datasets_output/documents'

@Collection.register
class SajadCollection(Collection):
    module_name = "sajad"
    collection_type = "TrecCollection"
    generator_type = "DefaultLuceneDocumentGenerator"
    _path = collection_path
    is_large_collection = False

@Benchmark.register
class SajadBenchmark(Benchmark):
    module_name = "sajad"
    dependencies = [Dependency(key="collection", module="collection", name="sajad")]
    qrel_file = qrel_path
    topic_file = topic_path
    fold_file = fold_path
    query_type = "title"
    relevance_level = 1

With the ranking task (pyserini) everything works. But when I try to use these two with different rerankers, I get an error saying dev.best cannot be found:

FileNotFoundError: [Errno 2] No such file or directory: '/home/aliabedzadeh/.capreolus/results/collection-sajad/benchmark-sajad/collection-sajad/benchmark-sajad/collection-sajad/index-anserini_indexstops-False_stemmer-porter/searcher-BM25_b-0.8_fields-title_hits-1000_k1-0.9/task-rank_filter-False/collection-sajad/index-anserini_indexstops-True_stemmer-None/tokenizer-anserini_keepstops-True_stemmer-None/extractor-slowembedtext_calcidf-True_embeddings-glove6b_maxdoclen-800_maxqlen-4_seed-42_usecache-False_zerounk-False/trainer-pytorch_amp-None_batch-32_decay-0.0_decayiters-3_decaytype-None_evalbatch-0_fastforward-False_gradacc-1_itersize-512_lr-0.001_multithread-False_niters-1_seed-42_softmaxloss-False_validatefreq-5_warmupiters-0/reranker-TK_alpha-0.5_ffdim-100_finetune-False_gradkernels-True_numattheads-10_numlayers-2_projdim-32_scoretanh-False_singlefc-True_usemask-False_usemixer-False/sampler-triplet_seed-42/task-rerank_fold-s1_optimize-map_seed-42_testthreshold-1000_threshold-100/dev.best'

I have tried three rerankers (TK, DUET, and KNRM) and they all raised almost the same error as the one above. I suspect this is the case for all rerankers. The error is raised by load_weights in reranker/__init__.py at line 43:

capreolus/capreolus/reranker/__init__.py

Line 43 in 7b7dc1d

with open(weights_fn, "rb") as f:

It appears that, for some reason, dev.best is not created while the rest of the files are present. For instance, in the case of TK, all of the directories leading to dev.best do exist. Even listing the parent directory of dev.best, "task-rerank_fold-s1_optimize-map_seed-42_testthreshold-1000_threshold-100", yields some results:

!ls -1 /home/aliabedzadeh/.capreolus/results/collection-sajad/benchmark-sajad/collection-sajad/benchmark-sajad/collection-sajad/index-anserini_indexstops-False_stemmer-porter/searcher-BM25_b-0.8_fields-title_hits-1000_k1-0.9/task-rank_filter-False/collection-sajad/index-anserini_indexstops-False_stemmer-porter/tokenizer-anserini_keepstops-True_stemmer-None/extractor-embedtext_calcidf-True_embeddings-glove6b_maxdoclen-800_maxqlen-4_seed-42/trainer-pytorch_amp-None_batch-32_decay-0.0_decayiters-3_decaytype-None_evalbatch-0_fastforward-False_gradacc-1_itersize-512_lr-0.001_multithread-False_niters-1_seed-42_softmaxloss-False_validatefreq-5_warmupiters-0/reranker-KNRM_finetune-False_gradkernels-True_scoretanh-False_singlefc-True/sampler-triplet_seed-42/task-rerank_fold-s1_optimize-map_seed-42_testthreshold-1000_threshold-100/

info
pred
weights

and here is my pipeline definition:

tk_config = {
    "reranker": {
        "name": 'TK',
        "trainer": {
            "niters": 1,
            "validatefreq": 5,
        },
    },
    "rank": {
        "searcher": {
            'name': 'BM25',
            'index': {'stemmer': 'porter'},
            'b': '0.8'}
    },
    "benchmark": {"name": "sajad"},
}

re_rank_task = RerankTask(tk_config)
predictions = re_rank_task.train()  # will raise the error

I would really appreciate any comments on this error.
With best regards,
Ali.

Hide DEBUG messages from Anserini

Make -l correctly set CLI log level

Avoid multiple pyjnius loads

Currently we carefully avoid loading pyjnius more than once, since this causes an error. We can simplify this by checking and setting a classpath entry in constants.

Show more Anserini log messages

When DEBUG logging is enabled, we shouldn't filter Anserini's output. Filtering makes it harder to debug problems such as a topics file not being parsed properly.

PARADE replication result

I replicated this on a Colab GPU with numpassage=8.

Here is my result.

INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 dev metrics: P_1=0.688 P_10=0.500 P_20=0.427 P_5=0.554 judged_10=1.000 judged_20=0.992 judged_200=0.947 map=0.254 ndcg_cut_10=0.516 ndcg_cut_20=0.491 ndcg_cut_5=0.546 recall_100=0.453 recall_1000=0.453 recip_rank=0.802
2020-10-20 18:21:08,984 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 test metrics: P_1=0.574 P_10=0.494 P_20=0.417 P_5=0.562 judged_10=0.989 judged_20=0.982 judged_200=0.931 map=0.282 ndcg_cut_10=0.497 ndcg_cut_20=0.484 ndcg_cut_5=0.523 recall_100=0.490 recall_1000=0.490 recip_rank=0.721
INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 dev metrics: P_1=0.653 P_10=0.459 P_20=0.386 P_5=0.551 judged_10=0.988 judged_20=0.974 judged_200=0.935 map=0.218 ndcg_cut_10=0.511 ndcg_cut_20=0.473 ndcg_cut_5=0.559 recall_100=0.391 recall_1000=0.391 recip_rank=0.752
2020-10-20 19:37:53,818 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 test metrics: P_1=0.646 P_10=0.496 P_20=0.426 P_5=0.562 judged_10=0.996 judged_20=0.993 judged_200=0.947 map=0.259 ndcg_cut_10=0.516 ndcg_cut_20=0.490 ndcg_cut_5=0.549 recall_100=0.453 recall_1000=0.453 recip_rank=0.780
INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 dev metrics: P_1=0.583 P_10=0.510 P_20=0.447 P_5=0.563 judged_10=1.000 judged_20=0.995 judged_200=0.963 map=0.275 ndcg_cut_10=0.488 ndcg_cut_20=0.476 ndcg_cut_5=0.502 recall_100=0.557 recall_1000=0.557 recip_rank=0.712
2020-10-20 21:52:05,053 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 test metrics: P_1=0.633 P_10=0.437 P_20=0.367 P_5=0.535 judged_10=0.980 judged_20=0.978 judged_200=0.935 map=0.209 ndcg_cut_10=0.488 ndcg_cut_20=0.453 ndcg_cut_5=0.544 recall_100=0.391 recall_1000=0.391 recip_rank=0.724
INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 dev metrics: P_1=0.796 P_10=0.543 P_20=0.463 P_5=0.633 judged_10=0.996 judged_20=0.994 judged_200=0.949 map=0.279 ndcg_cut_10=0.577 ndcg_cut_20=0.533 ndcg_cut_5=0.638 recall_100=0.493 recall_1000=0.493 recip_rank=0.856
2020-10-20 23:02:43,400 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 test metrics: P_1=0.667 P_10=0.554 P_20=0.470 P_5=0.596 judged_10=0.998 judged_20=0.995 judged_200=0.963 map=0.299 ndcg_cut_10=0.537 ndcg_cut_20=0.515 ndcg_cut_5=0.546 recall_100=0.557 recall_1000=0.557 recip_rank=0.763

2020-10-21 00:11:49,032 - INFO - capreolus.task.rerank.evaluate -                       P_1: 0.6473
2020-10-21 00:11:49,032 - INFO - capreolus.task.rerank.evaluate -                      P_10: 0.4971
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4222
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                       P_5: 0.5685
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                 judged_10: 0.9921
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                 judged_20: 0.9880
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                judged_200: 0.9450
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                       map: 0.2622
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_10: 0.5139
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.4869
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -                ndcg_cut_5: 0.5497
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -                recall_100: 0.4765
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.4765
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -                recip_rank: 0.7590
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.35000000000000003, 0.4, 0.4, 0.55, 0.7000000000000001]
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -              P_1 [interp]: 0.6386
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -             P_10 [interp]: 0.5060
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -             P_20 [interp]: 0.4273
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -              P_5 [interp]: 0.5679
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -        judged_10 [interp]: 0.9896
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -        judged_20 [interp]: 0.9847
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -       judged_200 [interp]: 0.8509
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -              map [interp]: 0.3227
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_10 [interp]: 0.5150
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_20 [interp]: 0.4920
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -       ndcg_cut_5 [interp]: 0.5412
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -       recall_100 [interp]: 0.4612
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -      recall_1000 [interp]: 0.7761
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -       recip_rank [interp]: 0.7293

Extractors that generate transformers.tokenization_utils_base.BatchEncoding will cause error before training

Hi there.
I had an Extractor that was more or less a copy of passagebert and textbert, and I thought that instead of unpacking the tokenizer's result and embedding it into a new dictionary like:

{
'positive_ids': tensor([1,2,3,...]),
'positive_mask': tensor([1,1,1,...]),
'positive_segments': tensor([1,1,1,...]),
}

it would be much better to pass the tokenizer's results along without any unpacking or reshaping. The BERT tokenizer yields a transformers.tokenization_utils_base.BatchEncoding object, which is a dictionary-like structure that can be passed to the model as bert_model(**tokens).
I assumed that I could just pass this object and the code would run without problems. Something like this:

{
'positive_ids_and_mask': self.my_tokenizer('This is a test sentence'),
}

But that was not the case. Line 93 of the PyTorch trainer raises an error:

capreolus/capreolus/trainer/pytorch.py

Line 93 in 0121f6e

 batch = {k: v.to(self.device) if not isinstance(v, list) else v for k, v in batch.items()} 

AttributeError: 'dict' object has no attribute 'to'

Here v has become a dict and is no longer a transformers.tokenization_utils_base.BatchEncoding, so it has no to attribute.
I investigated a little and I'm pretty sure the problem is caused by this line:

capreolus/capreolus/trainer/pytorch.py

Line 223 in 0121f6e

train_dataloader = torch.utils.data.DataLoader(

PyTorch's DataLoader accepts a transformers.tokenization_utils_base.BatchEncoding but yields a plain dictionary. Here is a demonstration:

>>> data = transformers.tokenization_utils_base.BatchEncoding({"test": [1,2,3]})
>>> type(data)
transformers.tokenization_utils_base.BatchEncoding
>>> for x in torch.utils.data.DataLoader([data]):
...     print(x)
...     print(type(x))
{'test': [tensor([1]), tensor([2]), tensor([3])]}
<class 'dict'>

I manually changed the PyTorch trainer code so that it converts the dict back to a transformers.tokenization_utils_base.BatchEncoding, but this only solves my own task and will cause problems for other non-BERT models.
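A model-agnostic alternative might be to move values to the device recursively rather than rebuilding a BatchEncoding. The sketch below is only an illustration under that assumption: `move_to_device` is a hypothetical helper, and `FakeTensor` is a stand-in for a tensor so the example runs without PyTorch installed.

```python
from collections.abc import Mapping


def move_to_device(value, device):
    """Recursively call .to(device) on anything that supports it, inside dicts and lists."""
    if hasattr(value, "to"):
        # Tensors (and BatchEncoding objects) expose .to(device).
        return value.to(device)
    if isinstance(value, Mapping):
        return {k: move_to_device(v, device) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [move_to_device(v, device) for v in value]
    # Plain Python values such as query id strings are left untouched.
    return value


class FakeTensor:
    """Stand-in for torch.Tensor, used only to demonstrate the helper."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


batch = {"posdoc": {"input_ids": FakeTensor(), "mask": [FakeTensor()]}, "qid": ["q1"]}
moved = move_to_device(batch, "cuda:0")
```

Because the helper only relies on the presence of a `to` method, nested dictionaries produced by the DataLoader's collation would be handled the same way as flat tensors.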

How to properly output reranker scores?

Hi there.
Using Capreolus, I was able to train a BERT-based reranker in a "pair-wise manner" and it worked well. Now I want to do the same, but this time "point-wise". I am having trouble doing this and I don't know what I am doing wrong. I believe my problem lies in how my reranker outputs its score, but I am not certain.
In order to create this reranker:

I set "softmaxloss": True in the trainer config section, since the comment in the code says: "True to use softmax loss (over pairs) or False to use hinge loss"
In the extractor, added a sanity check:

def id2vec(self, qid, posid, negid=None, label=None):
    if negid is not None:
        raise ValueError(f'got negid: {negid}')

In the reranker, score and test return the same output:

    def score(self, d):
        return self.test(d)

    def test(self, d):
        score = self.model(d['paragraph_ids'], d['paragraph_mask'], d['query_ids'], d["query_mask"])
        return score

And here is how things are done in the model itself:

    def forward(self, paragraph_ids, paragraph_mask, question_ids, question_mask):
        question_embeddings = self.question_bert(question_ids, attention_mask=question_mask)[0][:,0,:]
        paragraph_embeddings = self.paragraph_bert(paragraph_ids, attention_mask=paragraph_mask)[0][:,0,:]
        
        similarity = []
        for x, y in zip(question_embeddings, paragraph_embeddings):
            sim = torch.dot(x, y).view(1, -1)
            similarity.append(sim)
        return similarity

Please note the .view(1, -1) part. Right now, this reranker outputs a list of (1, 1) tensors, one per query-paragraph pair, because torch.dot returns a scalar. If I omit .view(1, -1), the loss function (pair_softmax_loss in reranker/common.py) raises an error: IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1).
This is the line that causes the problem if I omit that part:

capreolus/capreolus/reranker/common.py

Line 56 in 8695a47

scores = torch.stack(pos_neg_scores, dim=1)

With the view part kept, training finishes without errors, but at prediction time there is an error:
AttributeError: 'list' object has no attribute 'view'
from this line:

capreolus/capreolus/trainer/pytorch.py

Line 341 in 8695a47

scores = scores.view(-1).cpu().numpy()

If I redefine score and test like this:

    def score(self, d):
        score = self.model(d['paragraph_ids'], d['paragraph_mask'], d['query_ids'], d["query_mask"])
        return score

    def test(self, d):
        score = self.model(d['paragraph_ids'], d['paragraph_mask'], d['query_ids'], d["query_mask"])
        score = torch.stack(score)
        return score

everything works.
Since I had to use these tricks, I have a feeling that something is wrong with what I am doing here, so I would be grateful for your help. What is the proper way to pass scores?

Custom training data for re-rankers

Hey guys,
Great work with the library! I just had a quick question regarding the rerankers module. Would it be possible to train a re-ranker on our own custom data? Currently I'm not sure where the training data for the re-rankers comes from or where it is stored. Is this something that can be overridden with a command-line argument or similar?

TIA!

Error when running the colab notebook

I get the following error when running the second line in the Colab notebook:

!capreolus rank.searcheval with benchmark.name=nf searcher.name=BM25 searcher.index.stemmer=porter searcher.b=0.8
ImportError: cannot import name '_ColumnEntity' from 'sqlalchemy.orm.query' (/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/query.py)

Please help. This is after Capreolus installs successfully.

Port PARADE model to Pytorch

The current PARADE implementation only supports the Tensorflow trainer. It should be straightforward to port to Pytorch, however, since the transformers library already supports both. This is mainly a matter of replacing the TF aggregation layers with their Pytorch versions.

Fix results and cache locations when using capreolus package

Locations are currently relative to the install dir, not the current working dir as before:

2020-01-30 17:58:33,806 - INFO - capreolus.collection.download_if_missing - downloading missing collection robust04 to temporary file /usr/local/lib/python3.6/dist-packages/capreolus/utils/../..//cache/robust04/downloaded/tmp/archive_file

However, let's define some default folders (~/.capreolus/cache) to use instead, and also update the docs accordingly.
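A possible shape for those defaults is sketched below: each location resolves to a folder under `~/.capreolus` unless an environment variable overrides it. The helper name `resolve_dir` and the variable names `CAPREOLUS_CACHE` and `CAPREOLUS_RESULTS` are assumptions made for this illustration.

```python
import os
from pathlib import Path


def resolve_dir(env_var: str, default_subdir: str) -> Path:
    """Return the directory named by env_var, or ~/.capreolus/<default_subdir> if unset."""
    override = os.environ.get(env_var)
    if override:
        return Path(override).expanduser()
    return Path.home() / ".capreolus" / default_subdir


# Assumed variable names; both fall back to folders under the user's home directory.
cache_dir = resolve_dir("CAPREOLUS_CACHE", "cache")
results_dir = resolve_dir("CAPREOLUS_RESULTS", "results")
```

Resolving against the home directory rather than the package's install location keeps the paths stable regardless of how or where the package was installed.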

When to set is_large_collection to True?

Hi there,
I am trying to create a new collection from Persian Wikipedia. It contains about 1.4 million paragraphs, which I would like to index and use. I did not find any documentation on when to set is_large_collection to True or what it does.

I would appreciate any comments.

Add sanity checks for custom benchmarks/collections

e.g., no duplicate qids or docids; topics file can be parsed by both us and Anserini (rewrite?)
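The duplicate-id part of such a check could be as simple as the sketch below. The function names are hypothetical, and the ids are assumed to have already been read from the topics file or collection as strings.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicates(ids: Iterable[str]) -> List[str]:
    """Return the ids that occur more than once, in sorted order."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


def check_unique(ids: Iterable[str], kind: str) -> None:
    """Raise a ValueError naming every duplicated id of the given kind (e.g. 'qids')."""
    dupes = find_duplicates(ids)
    if dupes:
        raise ValueError(f"duplicate {kind}: {', '.join(dupes)}")
```

For example, `check_unique(["q1", "q2", "q1"], "qids")` raises an error that names `q1`, while a list of distinct ids passes silently.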

[Question] How to use preprocess method in Extractors?

Hi there,
I hope I'm not bothering you with my novice questions. I am trying to create a BERT-based re-ranking model and I cannot understand how the preprocess method in the Extractor module works. The documentation says that id2vec needs to be provided for an extractor. I investigated the textbert and bertpassage extractors, and id2vec depends on dictionaries such as self.docid2toks that are created by methods like preprocess and _build_vocab.

This part is a bit magical to me, since I cannot tell which class or module calls preprocess and exactly which arguments the caller provides. I did a bit of testing with my extractor, and preprocess was not invoked.

Also, self.index.get_doc and topics are used to create those dictionaries. I understand that self.index.get_doc can be provided via dependencies, but I don't understand who provides topics. I tested self.benchmark.topics['title'].get() instead and it works just fine, but using just topics is really neat.
I would appreciate any comments, thanks.

Impact of PRNG with multiple modules

We currently seed the PRNG in the pipeline initialization (i.e., outside any specific module), which ensures that modules can use numpy/pytorch/python random libraries without specifying a seed. However, this means that the number of draws performed by each module can affect draws performed by a module later in the pipeline.

For example, say we have two modules that use the PRNG, X and Y, with X running before Y. If the number of draws X performs is constant, Y will always receive a PRNG in the same state. If the number of X's draws changes, however, the state of the PRNG when Y uses it will have changed.

This should not cause issues as long as (1) the number of draws X performs is a function of its config, so that the same config always creates the same output, and (2) Y operates on X's output. Capreolus experiments are functional in that the config fully describes them, so the first condition is satisfied.

The second condition is satisfied in the current pipeline, but we should keep this issue in mind if the pipeline changes. The danger is that X and Y may be independent modules (i.e., Y does not use X's output), but changing X's config effectively changes Y's seed by altering the state of the PRNG before Y uses it. This is unintuitive because we expect Y's behavior to rely on only its config, the pipeline config, and its inputs (which are specified by the configs of its input modules, and do not include X in this case).
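The effect can be reproduced with Python's standard `random` module. In this sketch, `module_x` and `module_y` are stand-ins for two independent pipeline modules, and the per-module seeding scheme in `isolated_pipeline` is just one possible mitigation, not something Capreolus currently does.

```python
import random


def module_x(rng: random.Random, ndraws: int) -> list:
    """Stand-in for module X: consumes a config-dependent number of draws."""
    return [rng.random() for _ in range(ndraws)]


def module_y(rng: random.Random) -> float:
    """Stand-in for module Y: independent of X's output, but uses the PRNG."""
    return rng.random()


def shared_pipeline(seed: int, x_draws: int) -> float:
    # One PRNG seeded at pipeline initialization and shared by both modules.
    rng = random.Random(seed)
    module_x(rng, x_draws)
    return module_y(rng)


def isolated_pipeline(seed: int, x_draws: int) -> float:
    # Each module derives its own PRNG from the pipeline seed and its name.
    module_x(random.Random(f"{seed}-x"), x_draws)
    return module_y(random.Random(f"{seed}-y"))
```

With the shared PRNG, changing `x_draws` changes the value Y sees even though Y never uses X's output; with per-module PRNGs, Y's draw depends only on the pipeline seed.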

Add SWA support to Pytorch trainer

Pytorch 1.6 supports stochastic weight averaging (SWA), which could improve model effectiveness. The Pytorch trainer should support a boolean swa config option that controls whether it is used.

As described in the docs, using SWA with batchnorm requires additional steps. Can we automatically detect whether a model contains any batchnorm layers and, if so, either raise an exception (easier) or perform the necessary additional steps?

Validate that niters >= validatefreq
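A minimal sketch of such a check follows, assuming the trainer config is a plain dict with integer `niters` and `validatefreq` entries. The function name and error messages are hypothetical.

```python
def validate_trainer_config(config: dict) -> None:
    """Reject configs in which validation would never run during training."""
    niters = config["niters"]
    validatefreq = config["validatefreq"]
    if validatefreq < 1:
        raise ValueError(f"validatefreq must be at least 1, got {validatefreq}")
    if niters < validatefreq:
        raise ValueError(
            f"niters ({niters}) must be >= validatefreq ({validatefreq}); "
            "otherwise no validation pass happens"
        )
```

A configuration such as niters=1 with validatefreq=5, as in the custom-collection report above, would be rejected by this check before training starts.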

Fix batching issue with TF on dev dataset

From TFKNRM on GPU:

File "/home/ayates/.pyenv/versions/miniconda3-4.3.30/lib/python3.7/site-packages/tensorflow_core/python/keras/callbacks.py", line 302, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/ayates/capreolus-profane/capreolus/trainer/__init__.py", line 397, in on_epoch_end
    trec_preds = self.get_preds_in_trec_format(predictions, self.dev_data)
  File "/home/ayates/capreolus-profane/capreolus/trainer/__init__.py", line 417, in get_preds_in_trec_format
    pred_dict[qid][docid] = predictions[i][0].astype(np.float16).item()
IndexError: index 49984 is out of bounds for axis 0 with size 49984

This happens because of drop_remainder=True in load_tf_records_from_file(). Implementing an equivalent of PytorchTrainer's fill_incomplete_batch() should fix it.
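The general idea can be sketched without TensorFlow: pad the dev set up to a multiple of the batch size so that nothing is dropped, and remember how many entries are real so the padding can be discarded afterwards. The helper below is hypothetical and is not the existing fill_incomplete_batch() implementation.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pad_to_full_batches(items: Sequence[T], batch_size: int) -> Tuple[List[T], int]:
    """Repeat the last item until len(items) is a multiple of batch_size.

    Returns the padded list and the number of real (unpadded) items.
    """
    padded = list(items)
    n_real = len(padded)
    remainder = n_real % batch_size
    if remainder:
        padded.extend([padded[-1]] * (batch_size - remainder))
    return padded, n_real


padded, n_real = pad_to_full_batches(list(range(10)), batch_size=4)
```

Predictions would then be computed on `padded` and sliced back to the first `n_real` entries, so the number of predictions matches the number of dev examples.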

Prevent `batch_size` from influencing the number of training samples on TensorflowTrainer

PyTorch does batches_per_epoch = self.config["itersize"] // self.config["batch"]
TensorflowTrainer follows a different convention: each "epoch" consists of "itersize" iterations where each iteration has "batch_size" batches

It doesn't make much sense to tie batch_size to the number of examples in an iteration; neither needs to depend on the other. As it stands, if you move to a machine with less GPU memory, both the number of steps and the batch size have to be adjusted to achieve a similar training setup. (The problem is similar to how the LR becomes dependent on the batch size if you sum the loss rather than averaging it.)
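The difference is easiest to see with numbers. The sketch below takes the two conventions as described above, reading the TensorFlow one as "itersize" steps of "batch_size" examples each, which is an interpretation of the description rather than something checked against the trainer code.

```python
def pytorch_samples_per_iteration(itersize: int, batch: int) -> int:
    """PyTorch convention: itersize counts examples, so the step count shrinks as batch grows."""
    batches_per_epoch = itersize // batch
    return batches_per_epoch * batch


def tensorflow_samples_per_iteration(itersize: int, batch: int) -> int:
    """Assumed TF convention: itersize counts steps, each consuming `batch` examples."""
    return itersize * batch


# itersize=512 with batch sizes 32 and 8
pt = [pytorch_samples_per_iteration(512, b) for b in (32, 8)]
tf = [tensorflow_samples_per_iteration(512, b) for b in (32, 8)]
```

Under the first convention the number of training examples per iteration stays at 512 for both batch sizes, whereas under the second it drops from 16384 to 4096 when the batch size is reduced from 32 to 8.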

Implementing a new neural reranker module

I am trying to implement a simple neural reranker module following the example.
Which Dependency and config options are available to choose from?
Also, if I want to use one of the tokenizers that ship with Capreolus, how can I include it in the new reranker module?
Could you share some more examples of implementing a neural reranker with different options?
Thank you.

Tensorflow version conflict with Pipenv

When installing via Pipenv, locking fails because of a version conflict for tensorflow.

Sample Pipfile:

[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[packages]
capreolus = "*"

[dev-packages]

[requires]
python_version = "3"

Output of pipenv lock:

Locking [dev-packages] dependencies…
Locking [packages] dependencies…

Warning: Your dependencies could not be resolved. You likely have a mismatch in your sub-dependencies.
  You can use $ pipenv install --skip-lock to bypass this mechanism, then run $ pipenv graph to inspect the situation.
Could not find a version that matches tensorflow<3,==2.3.1,>=2.4.0,>=2.4.1
Tried: 0.12.0rc0, 0.12.0rc0, 0.12.0rc0, 0.12.0rc0, 0.12.0rc0, 0.12.0rc0, 0.12.0rc1, 0.12.0rc1, 0.12.0rc1, 0.12.0rc1, 0.12.0rc1, 0.12.0rc1, 0.12.0, 0.12.0, 0.12.0, 0.12.0, 0.12.0, 0.12.0, 0.12.1, 0.12.1, 0.12.1, 0.12.1, 0.12.1, 0.12.1, 0.12.1, 0.12.1, 0.12.1, 1.0.0, 1.0.0, 1.0.0, 1.0.0, 1.0.0, 1.0.0, 1.0.0, 1.0.0, 1.0.0, 1.0.1, 1.0.1, 1.0.1, 1.0.1, 1.0.1, 1.0.1, 1.0.1, 1.0.1, 1.0.1, 1.0.1, 1.1.0rc0, 1.1.0rc0, 1.1.0rc0, 1.1.0rc0, 1.1.0rc0, 1.1.0rc0, 1.1.0rc0, 1.1.0rc0, 1.1.0rc0, 1.1.0rc0, 1.1.0rc1, 1.1.0rc1, 1.1.0rc1, 1.1.0rc1, 1.1.0rc1, 1.1.0rc1, 1.1.0rc1, 1.1.0rc1, 1.1.0rc1, 1.1.0rc1, 1.1.0rc1, 1.1.0rc2, 1.1.0rc2, 1.1.0rc2, 1.1.0rc2, 1.1.0rc2, 1.1.0rc2, 1.1.0rc2, 1.1.0rc2, 1.1.0rc2, 1.1.0rc2, 1.1.0rc2, 1.1.0, 1.1.0, 1.1.0, 1.1.0, 1.1.0, 1.1.0, 1.1.0, 1.1.0, 1.1.0, 1.1.0, 1.1.0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc0, 1.2.0rc1, 1.2.0rc1, 1.2.0rc1, 1.2.0rc1, 1.2.0rc1, 1.2.0rc1, 1.2.0rc1, 1.2.0rc1, 1.2.0rc1, 1.2.0rc1, 1.2.0rc1, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0rc2, 1.2.0, 1.2.0, 1.2.0, 1.2.0, 1.2.0, 1.2.0, 1.2.0, 1.2.0, 1.2.0, 1.2.0, 1.2.0, 1.2.0, 1.2.1, 1.2.1, 1.2.1, 1.2.1, 1.2.1, 1.2.1, 1.2.1, 1.2.1, 1.2.1, 1.2.1, 1.2.1, 1.2.1, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc0, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc1, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0rc2, 1.3.0, 1.3.0, 1.3.0, 1.3.0, 1.3.0, 1.3.0, 1.3.0, 1.3.0, 1.3.0, 1.3.0, 1.3.0, 1.3.0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc0, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0rc1, 1.4.0, 1.4.0, 
1.4.0, 1.4.0, 1.4.0, 1.4.0, 1.4.0, 1.4.0, 1.4.0, 1.4.0, 1.4.0, 1.4.0, 1.4.1, 1.4.1, 1.4.1, 1.4.1, 1.4.1, 1.4.1, 1.4.1, 1.4.1, 1.4.1, 1.4.1, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc0, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0rc1, 1.5.0, 1.5.0, 1.5.0, 1.5.0, 1.5.0, 1.5.0, 1.5.0, 1.5.0, 1.5.0, 1.5.0, 1.5.0, 1.5.0, 1.5.1, 1.5.1, 1.5.1, 1.5.1, 1.5.1, 1.5.1, 1.5.1, 1.5.1, 1.5.1, 1.5.1, 1.5.1, 1.5.1, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc0, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0rc1, 1.6.0, 1.6.0, 1.6.0, 1.6.0, 1.6.0, 1.6.0, 1.6.0, 1.6.0, 1.6.0, 1.6.0, 1.6.0, 1.6.0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc0, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0rc1, 1.7.0, 1.7.0, 1.7.0, 1.7.0, 1.7.0, 1.7.0, 1.7.0, 1.7.0, 1.7.0, 1.7.0, 1.7.0, 1.7.0, 1.7.1, 1.7.1, 1.7.1, 1.7.1, 1.7.1, 1.7.1, 1.7.1, 1.7.1, 1.7.1, 1.7.1, 1.7.1, 1.7.1, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc0, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0rc1, 1.8.0, 1.8.0, 1.8.0, 1.8.0, 1.8.0, 1.8.0, 1.8.0, 1.8.0, 1.8.0, 1.8.0, 1.8.0, 1.8.0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc0, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc1, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0rc2, 1.9.0, 1.9.0, 1.9.0, 1.9.0, 1.9.0, 
1.9.0, 1.9.0, 1.9.0, 1.9.0, 1.9.0, 1.9.0, 1.9.0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc0, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0rc1, 1.10.0, 1.10.0, 1.10.0, 1.10.0, 1.10.0, 1.10.0, 1.10.0, 1.10.0, 1.10.0, 1.10.0, 1.10.0, 1.10.0, 1.10.1, 1.10.1, 1.10.1, 1.10.1, 1.10.1, 1.10.1, 1.10.1, 1.10.1, 1.10.1, 1.10.1, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc0, 1.11.0rc1, 1.11.0rc1, 1.11.0rc1, 1.11.0rc1, 1.11.0rc1, 1.11.0rc1, 1.11.0rc1, 1.11.0rc1, 1.11.0rc1, 1.11.0rc1, 1.11.0rc1, 1.11.0rc2, 1.11.0rc2, 1.11.0rc2, 1.11.0rc2, 1.11.0rc2, 1.11.0rc2, 1.11.0rc2, 1.11.0rc2, 1.11.0rc2, 1.11.0rc2, 1.11.0rc2, 1.11.0, 1.11.0, 1.11.0, 1.11.0, 1.11.0, 1.11.0, 1.11.0, 1.11.0, 1.11.0, 1.11.0, 1.11.0, 1.11.0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc0, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc1, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0rc2, 1.12.0, 1.12.0, 1.12.0, 1.12.0, 1.12.0, 1.12.0, 1.12.0, 1.12.0, 1.12.0, 1.12.0, 1.12.0, 1.12.0, 1.12.2, 1.12.2, 1.12.2, 1.12.2, 1.12.2, 1.12.2, 1.12.2, 1.12.2, 1.12.2, 1.12.2, 1.12.2, 1.12.2, 1.12.3, 1.12.3, 1.12.3, 1.12.3, 1.12.3, 1.12.3, 1.12.3, 1.12.3, 1.12.3, 1.12.3, 1.12.3, 1.12.3, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc0, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc1, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 
1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.0rc2, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.1, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.13.2, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc0, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0rc1, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.14.0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc0, 1.15.0rc1, 1.15.0rc1, 1.15.0rc1, 1.15.0rc1, 1.15.0rc1, 1.15.0rc1, 1.15.0rc1, 1.15.0rc1, 1.15.0rc1, 1.15.0rc1, 1.15.0rc1, 1.15.0rc2, 1.15.0rc2, 1.15.0rc2, 1.15.0rc2, 1.15.0rc2, 1.15.0rc2, 1.15.0rc2, 1.15.0rc2, 1.15.0rc2, 1.15.0rc2, 1.15.0rc2, 1.15.0rc3, 1.15.0rc3, 1.15.0rc3, 1.15.0rc3, 1.15.0rc3, 1.15.0rc3, 1.15.0rc3, 1.15.0rc3, 1.15.0rc3, 1.15.0rc3, 1.15.0rc3, 1.15.0, 1.15.0, 1.15.0, 1.15.0, 1.15.0, 1.15.0, 1.15.0, 1.15.0, 1.15.0, 1.15.0, 1.15.0, 1.15.2, 1.15.2, 1.15.2, 1.15.2, 1.15.2, 1.15.2, 1.15.2, 1.15.2, 1.15.2, 1.15.3, 1.15.3, 1.15.3, 1.15.3, 1.15.3, 1.15.3, 1.15.3, 1.15.3, 1.15.3, 1.15.4, 1.15.4, 1.15.4, 1.15.4, 1.15.4, 1.15.4, 1.15.4, 1.15.4, 1.15.4, 1.15.5, 1.15.5, 1.15.5, 1.15.5, 1.15.5, 1.15.5, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0a0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b0, 2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0b1, 
2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0b1, 2.0.0rc0, 2.0.0rc0, 2.0.0rc0, 2.0.0rc0, 2.0.0rc0, 2.0.0rc0, 2.0.0rc0, 2.0.0rc0, 2.0.0rc0, 2.0.0rc0, 2.0.0rc0, 2.0.0rc1, 2.0.0rc1, 2.0.0rc1, 2.0.0rc1, 2.0.0rc1, 2.0.0rc1, 2.0.0rc1, 2.0.0rc1, 2.0.0rc1, 2.0.0rc1, 2.0.0rc1, 2.0.0rc2, 2.0.0rc2, 2.0.0rc2, 2.0.0rc2, 2.0.0rc2, 2.0.0rc2, 2.0.0rc2, 2.0.0rc2, 2.0.0rc2, 2.0.0rc2, 2.0.0rc2, 2.0.0, 2.0.0, 2.0.0, 2.0.0, 2.0.0, 2.0.0, 2.0.0, 2.0.0, 2.0.0, 2.0.0, 2.0.0, 2.0.1, 2.0.1, 2.0.1, 2.0.1, 2.0.1, 2.0.1, 2.0.1, 2.0.1, 2.0.1, 2.0.2, 2.0.2, 2.0.2, 2.0.2, 2.0.2, 2.0.2, 2.0.2, 2.0.2, 2.0.2, 2.0.3, 2.0.3, 2.0.3, 2.0.3, 2.0.3, 2.0.3, 2.0.3, 2.0.3, 2.0.3, 2.0.4, 2.0.4, 2.0.4, 2.0.4, 2.0.4, 2.0.4, 2.1.0rc0, 2.1.0rc0, 2.1.0rc0, 2.1.0rc0, 2.1.0rc0, 2.1.0rc0, 2.1.0rc0, 2.1.0rc0, 2.1.0rc0, 2.1.0rc0, 2.1.0rc0, 2.1.0rc1, 2.1.0rc1, 2.1.0rc1, 2.1.0rc1, 2.1.0rc1, 2.1.0rc1, 2.1.0rc1, 2.1.0rc1, 2.1.0rc1, 2.1.0rc1, 2.1.0rc1, 2.1.0rc2, 2.1.0rc2, 2.1.0rc2, 2.1.0rc2, 2.1.0rc2, 2.1.0rc2, 2.1.0rc2, 2.1.0rc2, 2.1.0rc2, 2.1.0rc2, 2.1.0rc2, 2.1.0, 2.1.0, 2.1.0, 2.1.0, 2.1.0, 2.1.0, 2.1.0, 2.1.0, 2.1.0, 2.1.0, 2.1.0, 2.1.1, 2.1.1, 2.1.1, 2.1.1, 2.1.1, 2.1.1, 2.1.1, 2.1.1, 2.1.1, 2.1.2, 2.1.2, 2.1.2, 2.1.2, 2.1.2, 2.1.2, 2.1.2, 2.1.2, 2.1.2, 2.1.3, 2.1.3, 2.1.3, 2.1.3, 2.1.3, 2.1.3, 2.2.0rc0, 2.2.0rc0, 2.2.0rc0, 2.2.0rc0, 2.2.0rc0, 2.2.0rc0, 2.2.0rc0, 2.2.0rc0, 2.2.0rc0, 2.2.0rc1, 2.2.0rc1, 2.2.0rc1, 2.2.0rc1, 2.2.0rc1, 2.2.0rc1, 2.2.0rc1, 2.2.0rc1, 2.2.0rc1, 2.2.0rc1, 2.2.0rc1, 2.2.0rc2, 2.2.0rc2, 2.2.0rc2, 2.2.0rc2, 2.2.0rc2, 2.2.0rc2, 2.2.0rc2, 2.2.0rc2, 2.2.0rc2, 2.2.0rc2, 2.2.0rc2, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc3, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0rc4, 2.2.0, 2.2.0, 2.2.0, 2.2.0, 2.2.0, 2.2.0, 2.2.0, 2.2.0, 2.2.0, 2.2.0, 2.2.0, 2.2.0, 2.2.1, 2.2.1, 2.2.1, 2.2.1, 2.2.1, 2.2.1, 2.2.1, 2.2.1, 2.2.1, 2.2.1, 2.2.1, 2.2.1, 2.2.2, 2.2.2, 2.2.2, 
2.2.2, 2.2.2, 2.2.2, 2.2.2, 2.2.2, 2.2.2, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc1, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0rc2, 2.3.0, 2.3.0, 2.3.0, 2.3.0, 2.3.0, 2.3.0, 2.3.0, 2.3.0, 2.3.0, 2.3.0, 2.3.0, 2.3.0, 2.3.1, 2.3.1, 2.3.1, 2.3.1, 2.3.1, 2.3.1, 2.3.1, 2.3.1, 2.3.1, 2.3.1, 2.3.1, 2.3.1, 2.3.2, 2.3.2, 2.3.2, 2.3.2, 2.3.2, 2.3.2, 2.3.2, 2.3.2, 2.3.2, 2.4.0rc0, 2.4.0rc0, 2.4.0rc0, 2.4.0rc0, 2.4.0rc0, 2.4.0rc0, 2.4.0rc0, 2.4.0rc0, 2.4.0rc0, 2.4.0rc1, 2.4.0rc1, 2.4.0rc1, 2.4.0rc1, 2.4.0rc1, 2.4.0rc1, 2.4.0rc1, 2.4.0rc1, 2.4.0rc1, 2.4.0rc2, 2.4.0rc2, 2.4.0rc2, 2.4.0rc2, 2.4.0rc2, 2.4.0rc2, 2.4.0rc2, 2.4.0rc2, 2.4.0rc2, 2.4.0rc3, 2.4.0rc3, 2.4.0rc3, 2.4.0rc3, 2.4.0rc3, 2.4.0rc3, 2.4.0rc3, 2.4.0rc3, 2.4.0rc3, 2.4.0rc4, 2.4.0rc4, 2.4.0rc4, 2.4.0rc4, 2.4.0rc4, 2.4.0rc4, 2.4.0rc4, 2.4.0rc4, 2.4.0rc4, 2.4.0, 2.4.0, 2.4.0, 2.4.0, 2.4.0, 2.4.0, 2.4.0, 2.4.0, 2.4.0, 2.4.1, 2.4.1, 2.4.1, 2.4.1, 2.4.1, 2.4.1, 2.4.1, 2.4.1, 2.4.1

Output of pipenv graph after pipenv install --skip-lock:

CacheControl==0.12.6
capreolus==0.2.5
  - beautifulsoup4 [required: Any, installed: 4.9.3]
    - soupsieve [required: >1.2, installed: 2.1]
  - colorlog [required: ==4.0.2, installed: 4.0.2]
  - cython [required: Any, installed: 0.29.21]
  - google-api-python-client [required: Any, installed: 1.12.8]
    - google-api-core [required: <2dev,>=1.21.0, installed: 1.25.1]
      - google-auth [required: <2.0dev,>=1.21.1, installed: 1.24.0]
        - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
        - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
          - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
        - rsa [required: >=3.1.4,<5, installed: 4.7]
          - pyasn1 [required: >=0.1.3, installed: 0.4.8]
        - setuptools [required: >=40.3.0, installed: 44.0.0]
        - six [required: >=1.9.0, installed: 1.14.0]
      - googleapis-common-protos [required: >=1.6.0,<2.0dev, installed: 1.52.0]
        - protobuf [required: >=3.6.0, installed: 3.14.0]
          - six [required: >=1.9, installed: 1.14.0]
      - protobuf [required: >=3.12.0, installed: 3.14.0]
        - six [required: >=1.9, installed: 1.14.0]
      - pytz [required: Any, installed: 2020.5]
      - requests [required: <3.0.0dev,>=2.18.0, installed: 2.22.0]
      - setuptools [required: >=40.3.0, installed: 44.0.0]
      - six [required: >=1.13.0, installed: 1.14.0]
    - google-auth [required: >=1.16.0, installed: 1.24.0]
      - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
      - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
        - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
      - rsa [required: >=3.1.4,<5, installed: 4.7]
        - pyasn1 [required: >=0.1.3, installed: 0.4.8]
      - setuptools [required: >=40.3.0, installed: 44.0.0]
      - six [required: >=1.9.0, installed: 1.14.0]
    - google-auth-httplib2 [required: >=0.0.3, installed: 0.0.4]
      - google-auth [required: Any, installed: 1.24.0]
        - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
        - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
          - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
        - rsa [required: >=3.1.4,<5, installed: 4.7]
          - pyasn1 [required: >=0.1.3, installed: 0.4.8]
        - setuptools [required: >=40.3.0, installed: 44.0.0]
        - six [required: >=1.9.0, installed: 1.14.0]
      - httplib2 [required: >=0.9.1, installed: 0.18.1]
      - six [required: Any, installed: 1.14.0]
    - httplib2 [required: >=0.15.0,<1dev, installed: 0.18.1]
    - six [required: <2dev,>=1.13.0, installed: 1.14.0]
    - uritemplate [required: >=3.0.0,<4dev, installed: 3.0.1]
  - h5py [required: Any, installed: 3.1.0]
    - numpy [required: >=1.17.5, installed: 1.19.5]
  - lxml [required: Any, installed: 4.6.2]
  - matplotlib [required: Any, installed: 3.3.3]
    - cycler [required: >=0.10, installed: 0.10.0]
      - six [required: Any, installed: 1.14.0]
    - kiwisolver [required: >=1.0.1, installed: 1.3.1]
    - numpy [required: >=1.15, installed: 1.19.5]
    - pillow [required: >=6.2.0, installed: 8.1.0]
    - pyparsing [required: !=2.1.2,!=2.1.6,>=2.0.3,!=2.0.4, installed: 2.4.6]
    - python-dateutil [required: >=2.1, installed: 2.8.1]
      - six [required: >=1.5, installed: 1.14.0]
  - mock [required: Any, installed: 4.0.3]
  - nltk [required: ==3.4.5, installed: 3.4.5]
    - six [required: Any, installed: 1.14.0]
  - numpy [required: Any, installed: 1.19.5]
  - oauth2client [required: Any, installed: 4.1.3]
    - httplib2 [required: >=0.9.1, installed: 0.18.1]
    - pyasn1 [required: >=0.1.7, installed: 0.4.8]
    - pyasn1-modules [required: >=0.0.5, installed: 0.2.8]
      - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
    - rsa [required: >=3.1.4, installed: 4.7]
      - pyasn1 [required: >=0.1.3, installed: 0.4.8]
    - six [required: >=1.6.1, installed: 1.14.0]
  - pandas [required: Any, installed: 1.2.1]
    - numpy [required: >=1.16.5, installed: 1.19.5]
    - python-dateutil [required: >=2.7.3, installed: 2.8.1]
      - six [required: >=1.5, installed: 1.14.0]
    - pytz [required: >=2017.3, installed: 2020.5]
  - Pillow [required: Any, installed: 8.1.0]
  - pre-commit [required: Any, installed: 2.9.3]
    - cfgv [required: >=2.0.0, installed: 3.2.0]
    - identify [required: >=1.0.0, installed: 1.5.13]
    - nodeenv [required: >=0.11.1, installed: 1.5.0]
    - pyyaml [required: >=5.1, installed: 5.4.1]
    - toml [required: Any, installed: 0.10.2]
    - virtualenv [required: >=20.0.8, installed: 20.4.0]
      - appdirs [required: <2,>=1.4.3, installed: 1.4.3]
      - distlib [required: <1,>=0.3.1, installed: 0.3.1]
      - filelock [required: >=3.0.0,<4, installed: 3.0.12]
      - six [required: <2,>=1.9.0, installed: 1.14.0]
  - profane [required: >=0.2.0, installed: 0.2.2]
    - colorama [required: Any, installed: 0.4.3]
    - docopt [required: Any, installed: 0.6.2]
    - numpy [required: >=1.17, installed: 1.19.5]
    - sqlalchemy [required: Any, installed: 1.3.22]
    - sqlalchemy-utils [required: Any, installed: 0.36.8]
      - six [required: Any, installed: 1.14.0]
      - SQLAlchemy [required: >=1.0, installed: 1.3.22]
  - psycopg2-binary [required: Any, installed: 2.8.6]
  - pyjnius [required: ==1.2.1, installed: 1.2.1]
    - cython [required: Any, installed: 0.29.21]
    - six [required: >=1.7.0, installed: 1.14.0]
  - pymagnitude [required: ==0.1.143, installed: 0.1.143]
  - pyserini [required: ==0.9.3.0, installed: 0.9.3.0]
    - Cython [required: Any, installed: 0.29.21]
    - numpy [required: Any, installed: 1.19.5]
    - pyjnius [required: Any, installed: 1.2.1]
      - cython [required: Any, installed: 0.29.21]
      - six [required: >=1.7.0, installed: 1.14.0]
    - scikit-learn [required: Any, installed: 0.24.1]
      - joblib [required: >=0.11, installed: 1.0.0]
      - numpy [required: >=1.13.3, installed: 1.19.5]
      - scipy [required: >=0.19.1, installed: 1.6.0]
        - numpy [required: >=1.16.5, installed: 1.19.5]
      - threadpoolctl [required: >=2.0.0, installed: 2.1.0]
    - scipy [required: Any, installed: 1.6.0]
      - numpy [required: >=1.16.5, installed: 1.19.5]
  - pytest [required: Any, installed: 6.2.2]
    - attrs [required: >=19.2.0, installed: 20.3.0]
    - iniconfig [required: Any, installed: 1.1.1]
    - packaging [required: Any, installed: 20.3]
    - pluggy [required: <1.0.0a1,>=0.12, installed: 0.13.1]
    - py [required: >=1.8.2, installed: 1.10.0]
    - toml [required: Any, installed: 0.10.2]
  - pytest-mock [required: Any, installed: 3.5.1]
    - pytest [required: >=5.0, installed: 6.2.2]
      - attrs [required: >=19.2.0, installed: 20.3.0]
      - iniconfig [required: Any, installed: 1.1.1]
      - packaging [required: Any, installed: 20.3]
      - pluggy [required: <1.0.0a1,>=0.12, installed: 0.13.1]
      - py [required: >=1.8.2, installed: 1.10.0]
      - toml [required: Any, installed: 0.10.2]
  - pytrec-eval [required: >=0.5, installed: 0.5]
  - scipy [required: Any, installed: 1.6.0]
    - numpy [required: >=1.16.5, installed: 1.19.5]
  - scispacy [required: Any, installed: 0.3.0]
    - joblib [required: Any, installed: 1.0.0]
    - nmslib [required: >=1.7.3.6, installed: 2.0.11]
      - numpy [required: >=1.10.0, installed: 1.19.5]
      - psutil [required: Any, installed: 5.8.0]
      - pybind11 [required: >=2.2.3, installed: 2.6.1]
    - numpy [required: Any, installed: 1.19.5]
    - pysbd [required: Any, installed: 0.3.3]
    - requests [required: <3.0.0conllu,>=2.0.0, installed: 2.22.0]
    - scikit-learn [required: >=0.20.3, installed: 0.24.1]
      - joblib [required: >=0.11, installed: 1.0.0]
      - numpy [required: >=1.13.3, installed: 1.19.5]
      - scipy [required: >=0.19.1, installed: 1.6.0]
        - numpy [required: >=1.16.5, installed: 1.19.5]
      - threadpoolctl [required: >=2.0.0, installed: 2.1.0]
    - spacy [required: <3.0.0,>=2.3.0, installed: 2.3.5]
      - blis [required: >=0.4.0,<0.8.0, installed: 0.7.4]
        - numpy [required: >=1.15.0, installed: 1.19.5]
      - catalogue [required: <1.1.0,>=0.0.7, installed: 1.0.0]
      - cymem [required: >=2.0.2,<2.1.0, installed: 2.0.5]
      - murmurhash [required: >=0.28.0,<1.1.0, installed: 1.0.5]
      - numpy [required: >=1.15.0, installed: 1.19.5]
      - plac [required: >=0.9.6,<1.2.0, installed: 1.1.3]
      - preshed [required: <3.1.0,>=3.0.2, installed: 3.0.5]
        - cymem [required: >=2.0.2,<2.1.0, installed: 2.0.5]
        - murmurhash [required: >=0.28.0,<1.1.0, installed: 1.0.5]
      - requests [required: <3.0.0,>=2.13.0, installed: 2.22.0]
      - setuptools [required: Any, installed: 44.0.0]
      - srsly [required: <1.1.0,>=1.0.2, installed: 1.0.5]
      - thinc [required: <7.5.0,>=7.4.1, installed: 7.4.5]
        - blis [required: >=0.4.0,<0.8.0, installed: 0.7.4]
          - numpy [required: >=1.15.0, installed: 1.19.5]
        - catalogue [required: <1.1.0,>=0.0.7, installed: 1.0.0]
        - cymem [required: >=2.0.2,<2.1.0, installed: 2.0.5]
        - murmurhash [required: >=0.28.0,<1.1.0, installed: 1.0.5]
        - numpy [required: >=1.15.0, installed: 1.19.5]
        - plac [required: >=0.9.6,<1.2.0, installed: 1.1.3]
        - preshed [required: <3.1.0,>=1.0.1, installed: 3.0.5]
          - cymem [required: >=2.0.2,<2.1.0, installed: 2.0.5]
          - murmurhash [required: >=0.28.0,<1.1.0, installed: 1.0.5]
        - srsly [required: <1.1.0,>=0.0.6, installed: 1.0.5]
        - tqdm [required: >=4.10.0,<5.0.0, installed: 4.56.0]
        - wasabi [required: <1.1.0,>=0.0.9, installed: 0.8.1]
      - tqdm [required: >=4.38.0,<5.0.0, installed: 4.56.0]
      - wasabi [required: >=0.4.0,<1.1.0, installed: 0.8.1]
  - spacy [required: Any, installed: 2.3.5]
    - blis [required: >=0.4.0,<0.8.0, installed: 0.7.4]
      - numpy [required: >=1.15.0, installed: 1.19.5]
    - catalogue [required: <1.1.0,>=0.0.7, installed: 1.0.0]
    - cymem [required: >=2.0.2,<2.1.0, installed: 2.0.5]
    - murmurhash [required: >=0.28.0,<1.1.0, installed: 1.0.5]
    - numpy [required: >=1.15.0, installed: 1.19.5]
    - plac [required: >=0.9.6,<1.2.0, installed: 1.1.3]
    - preshed [required: <3.1.0,>=3.0.2, installed: 3.0.5]
      - cymem [required: >=2.0.2,<2.1.0, installed: 2.0.5]
      - murmurhash [required: >=0.28.0,<1.1.0, installed: 1.0.5]
    - requests [required: <3.0.0,>=2.13.0, installed: 2.22.0]
    - setuptools [required: Any, installed: 44.0.0]
    - srsly [required: <1.1.0,>=1.0.2, installed: 1.0.5]
    - thinc [required: <7.5.0,>=7.4.1, installed: 7.4.5]
      - blis [required: >=0.4.0,<0.8.0, installed: 0.7.4]
        - numpy [required: >=1.15.0, installed: 1.19.5]
      - catalogue [required: <1.1.0,>=0.0.7, installed: 1.0.0]
      - cymem [required: >=2.0.2,<2.1.0, installed: 2.0.5]
      - murmurhash [required: >=0.28.0,<1.1.0, installed: 1.0.5]
      - numpy [required: >=1.15.0, installed: 1.19.5]
      - plac [required: >=0.9.6,<1.2.0, installed: 1.1.3]
      - preshed [required: <3.1.0,>=1.0.1, installed: 3.0.5]
        - cymem [required: >=2.0.2,<2.1.0, installed: 2.0.5]
        - murmurhash [required: >=0.28.0,<1.1.0, installed: 1.0.5]
      - srsly [required: <1.1.0,>=0.0.6, installed: 1.0.5]
      - tqdm [required: >=4.10.0,<5.0.0, installed: 4.56.0]
      - wasabi [required: <1.1.0,>=0.0.9, installed: 0.8.1]
    - tqdm [required: >=4.38.0,<5.0.0, installed: 4.56.0]
    - wasabi [required: >=0.4.0,<1.1.0, installed: 0.8.1]
  - SQLAlchemy [required: Any, installed: 1.3.22]
  - sqlalchemy-utils [required: Any, installed: 0.36.8]
    - six [required: Any, installed: 1.14.0]
    - SQLAlchemy [required: >=1.0, installed: 1.3.22]
  - tensorflow [required: ==2.3.1, installed: 2.3.1]
    - absl-py [required: >=0.7.0, installed: 0.11.0]
      - six [required: Any, installed: 1.14.0]
    - astunparse [required: ==1.6.3, installed: 1.6.3]
      - six [required: >=1.6.1,<2.0, installed: 1.14.0]
      - wheel [required: >=0.23.0,<1.0, installed: 0.34.2]
    - gast [required: ==0.3.3, installed: 0.3.3]
    - google-pasta [required: >=0.1.8, installed: 0.2.0]
      - six [required: Any, installed: 1.14.0]
    - grpcio [required: >=1.8.6, installed: 1.35.0]
      - six [required: >=1.5.2, installed: 1.14.0]
    - h5py [required: <2.11.0,>=2.10.0, installed: 3.1.0]
      - numpy [required: >=1.17.5, installed: 1.19.5]
    - keras-preprocessing [required: <1.2,>=1.1.1, installed: 1.1.2]
      - numpy [required: >=1.9.1, installed: 1.19.5]
      - six [required: >=1.9.0, installed: 1.14.0]
    - numpy [required: <1.19.0,>=1.16.0, installed: 1.19.5]
    - opt-einsum [required: >=2.3.2, installed: 3.3.0]
      - numpy [required: >=1.7, installed: 1.19.5]
    - protobuf [required: >=3.9.2, installed: 3.14.0]
      - six [required: >=1.9, installed: 1.14.0]
    - six [required: >=1.12.0, installed: 1.14.0]
    - tensorboard [required: <3,>=2.3.0, installed: 2.4.1]
      - absl-py [required: >=0.4, installed: 0.11.0]
        - six [required: Any, installed: 1.14.0]
      - google-auth [required: <2,>=1.6.3, installed: 1.24.0]
        - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
        - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
          - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
        - rsa [required: >=3.1.4,<5, installed: 4.7]
          - pyasn1 [required: >=0.1.3, installed: 0.4.8]
        - setuptools [required: >=40.3.0, installed: 44.0.0]
        - six [required: >=1.9.0, installed: 1.14.0]
      - google-auth-oauthlib [required: <0.5,>=0.4.1, installed: 0.4.2]
        - google-auth [required: Any, installed: 1.24.0]
          - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
          - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
            - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
          - rsa [required: >=3.1.4,<5, installed: 4.7]
            - pyasn1 [required: >=0.1.3, installed: 0.4.8]
          - setuptools [required: >=40.3.0, installed: 44.0.0]
          - six [required: >=1.9.0, installed: 1.14.0]
        - requests-oauthlib [required: >=0.7.0, installed: 1.3.0]
          - oauthlib [required: >=3.0.0, installed: 3.1.0]
          - requests [required: >=2.0.0, installed: 2.22.0]
      - grpcio [required: >=1.24.3, installed: 1.35.0]
        - six [required: >=1.5.2, installed: 1.14.0]
      - markdown [required: >=2.6.8, installed: 3.3.3]
      - numpy [required: >=1.12.0, installed: 1.19.5]
      - protobuf [required: >=3.6.0, installed: 3.14.0]
        - six [required: >=1.9, installed: 1.14.0]
      - requests [required: >=2.21.0,<3, installed: 2.22.0]
      - setuptools [required: >=41.0.0, installed: 44.0.0]
      - six [required: >=1.10.0, installed: 1.14.0]
      - tensorboard-plugin-wit [required: >=1.6.0, installed: 1.8.0]
      - werkzeug [required: >=0.11.15, installed: 1.0.1]
      - wheel [required: >=0.26, installed: 0.34.2]
    - tensorflow-estimator [required: >=2.3.0,<2.4.0, installed: 2.3.0]
    - termcolor [required: >=1.1.0, installed: 1.1.0]
    - wheel [required: >=0.26, installed: 0.34.2]
    - wrapt [required: >=1.11.1, installed: 1.12.1]
  - tensorflow-ranking [required: Any, installed: 0.3.2]
    - absl-py [required: >=0.1.6, installed: 0.11.0]
      - six [required: Any, installed: 1.14.0]
    - numpy [required: >=1.13.3, installed: 1.19.5]
    - six [required: >=1.10.0, installed: 1.14.0]
    - tensorflow-serving-api [required: >=2.0.0,<3.0.0, installed: 2.4.1]
      - grpcio [required: >=1.0<2, installed: 1.35.0]
        - six [required: >=1.5.2, installed: 1.14.0]
      - protobuf [required: >=3.6.0, installed: 3.14.0]
        - six [required: >=1.9, installed: 1.14.0]
      - tensorflow [required: >=2.4.1,<3, installed: 2.3.1]
        - absl-py [required: >=0.7.0, installed: 0.11.0]
          - six [required: Any, installed: 1.14.0]
        - astunparse [required: ==1.6.3, installed: 1.6.3]
          - six [required: >=1.6.1,<2.0, installed: 1.14.0]
          - wheel [required: >=0.23.0,<1.0, installed: 0.34.2]
        - gast [required: ==0.3.3, installed: 0.3.3]
        - google-pasta [required: >=0.1.8, installed: 0.2.0]
          - six [required: Any, installed: 1.14.0]
        - grpcio [required: >=1.8.6, installed: 1.35.0]
          - six [required: >=1.5.2, installed: 1.14.0]
        - h5py [required: <2.11.0,>=2.10.0, installed: 3.1.0]
          - numpy [required: >=1.17.5, installed: 1.19.5]
        - keras-preprocessing [required: <1.2,>=1.1.1, installed: 1.1.2]
          - numpy [required: >=1.9.1, installed: 1.19.5]
          - six [required: >=1.9.0, installed: 1.14.0]
        - numpy [required: <1.19.0,>=1.16.0, installed: 1.19.5]
        - opt-einsum [required: >=2.3.2, installed: 3.3.0]
          - numpy [required: >=1.7, installed: 1.19.5]
        - protobuf [required: >=3.9.2, installed: 3.14.0]
          - six [required: >=1.9, installed: 1.14.0]
        - six [required: >=1.12.0, installed: 1.14.0]
        - tensorboard [required: <3,>=2.3.0, installed: 2.4.1]
          - absl-py [required: >=0.4, installed: 0.11.0]
            - six [required: Any, installed: 1.14.0]
          - google-auth [required: <2,>=1.6.3, installed: 1.24.0]
            - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
            - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
              - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
            - rsa [required: >=3.1.4,<5, installed: 4.7]
              - pyasn1 [required: >=0.1.3, installed: 0.4.8]
            - setuptools [required: >=40.3.0, installed: 44.0.0]
            - six [required: >=1.9.0, installed: 1.14.0]
          - google-auth-oauthlib [required: <0.5,>=0.4.1, installed: 0.4.2]
            - google-auth [required: Any, installed: 1.24.0]
              - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
              - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
                - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
              - rsa [required: >=3.1.4,<5, installed: 4.7]
                - pyasn1 [required: >=0.1.3, installed: 0.4.8]
              - setuptools [required: >=40.3.0, installed: 44.0.0]
              - six [required: >=1.9.0, installed: 1.14.0]
            - requests-oauthlib [required: >=0.7.0, installed: 1.3.0]
              - oauthlib [required: >=3.0.0, installed: 3.1.0]
              - requests [required: >=2.0.0, installed: 2.22.0]
          - grpcio [required: >=1.24.3, installed: 1.35.0]
            - six [required: >=1.5.2, installed: 1.14.0]
          - markdown [required: >=2.6.8, installed: 3.3.3]
          - numpy [required: >=1.12.0, installed: 1.19.5]
          - protobuf [required: >=3.6.0, installed: 3.14.0]
            - six [required: >=1.9, installed: 1.14.0]
          - requests [required: >=2.21.0,<3, installed: 2.22.0]
          - setuptools [required: >=41.0.0, installed: 44.0.0]
          - six [required: >=1.10.0, installed: 1.14.0]
          - tensorboard-plugin-wit [required: >=1.6.0, installed: 1.8.0]
          - werkzeug [required: >=0.11.15, installed: 1.0.1]
          - wheel [required: >=0.26, installed: 0.34.2]
        - tensorflow-estimator [required: >=2.3.0,<2.4.0, installed: 2.3.0]
        - termcolor [required: >=1.1.0, installed: 1.1.0]
        - wheel [required: >=0.26, installed: 0.34.2]
        - wrapt [required: >=1.11.1, installed: 1.12.1]
    - tf-models-official [required: >=2.3.0, installed: 2.4.0]
      - Cython [required: Any, installed: 0.29.21]
      - dataclasses [required: Any, installed: 0.6]
      - gin-config [required: Any, installed: 0.4.0]
      - google-api-python-client [required: >=1.6.7, installed: 1.12.8]
        - google-api-core [required: <2dev,>=1.21.0, installed: 1.25.1]
          - google-auth [required: <2.0dev,>=1.21.1, installed: 1.24.0]
            - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
            - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
              - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
            - rsa [required: >=3.1.4,<5, installed: 4.7]
              - pyasn1 [required: >=0.1.3, installed: 0.4.8]
            - setuptools [required: >=40.3.0, installed: 44.0.0]
            - six [required: >=1.9.0, installed: 1.14.0]
          - googleapis-common-protos [required: >=1.6.0,<2.0dev, installed: 1.52.0]
            - protobuf [required: >=3.6.0, installed: 3.14.0]
              - six [required: >=1.9, installed: 1.14.0]
          - protobuf [required: >=3.12.0, installed: 3.14.0]
            - six [required: >=1.9, installed: 1.14.0]
          - pytz [required: Any, installed: 2020.5]
          - requests [required: <3.0.0dev,>=2.18.0, installed: 2.22.0]
          - setuptools [required: >=40.3.0, installed: 44.0.0]
          - six [required: >=1.13.0, installed: 1.14.0]
        - google-auth [required: >=1.16.0, installed: 1.24.0]
          - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
          - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
            - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
          - rsa [required: >=3.1.4,<5, installed: 4.7]
            - pyasn1 [required: >=0.1.3, installed: 0.4.8]
          - setuptools [required: >=40.3.0, installed: 44.0.0]
          - six [required: >=1.9.0, installed: 1.14.0]
        - google-auth-httplib2 [required: >=0.0.3, installed: 0.0.4]
          - google-auth [required: Any, installed: 1.24.0]
            - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
            - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
              - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
            - rsa [required: >=3.1.4,<5, installed: 4.7]
              - pyasn1 [required: >=0.1.3, installed: 0.4.8]
            - setuptools [required: >=40.3.0, installed: 44.0.0]
            - six [required: >=1.9.0, installed: 1.14.0]
          - httplib2 [required: >=0.9.1, installed: 0.18.1]
          - six [required: Any, installed: 1.14.0]
        - httplib2 [required: >=0.15.0,<1dev, installed: 0.18.1]
        - six [required: <2dev,>=1.13.0, installed: 1.14.0]
        - uritemplate [required: >=3.0.0,<4dev, installed: 3.0.1]
      - google-cloud-bigquery [required: >=0.31.0, installed: 2.6.2]
        - google-api-core [required: >=1.23.0,<2.0.0dev, installed: 1.25.1]
          - google-auth [required: <2.0dev,>=1.21.1, installed: 1.24.0]
            - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
            - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
              - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
            - rsa [required: >=3.1.4,<5, installed: 4.7]
              - pyasn1 [required: >=0.1.3, installed: 0.4.8]
            - setuptools [required: >=40.3.0, installed: 44.0.0]
            - six [required: >=1.9.0, installed: 1.14.0]
          - googleapis-common-protos [required: >=1.6.0,<2.0dev, installed: 1.52.0]
            - protobuf [required: >=3.6.0, installed: 3.14.0]
              - six [required: >=1.9, installed: 1.14.0]
          - protobuf [required: >=3.12.0, installed: 3.14.0]
            - six [required: >=1.9, installed: 1.14.0]
          - pytz [required: Any, installed: 2020.5]
          - requests [required: <3.0.0dev,>=2.18.0, installed: 2.22.0]
          - setuptools [required: >=40.3.0, installed: 44.0.0]
          - six [required: >=1.13.0, installed: 1.14.0]
        - google-cloud-core [required: <2.0dev,>=1.4.1, installed: 1.5.0]
          - google-api-core [required: <2.0.0dev,>=1.21.0, installed: 1.25.1]
            - google-auth [required: <2.0dev,>=1.21.1, installed: 1.24.0]
              - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
              - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
                - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
              - rsa [required: >=3.1.4,<5, installed: 4.7]
                - pyasn1 [required: >=0.1.3, installed: 0.4.8]
              - setuptools [required: >=40.3.0, installed: 44.0.0]
              - six [required: >=1.9.0, installed: 1.14.0]
            - googleapis-common-protos [required: >=1.6.0,<2.0dev, installed: 1.52.0]
              - protobuf [required: >=3.6.0, installed: 3.14.0]
                - six [required: >=1.9, installed: 1.14.0]
            - protobuf [required: >=3.12.0, installed: 3.14.0]
              - six [required: >=1.9, installed: 1.14.0]
            - pytz [required: Any, installed: 2020.5]
            - requests [required: <3.0.0dev,>=2.18.0, installed: 2.22.0]
            - setuptools [required: >=40.3.0, installed: 44.0.0]
            - six [required: >=1.13.0, installed: 1.14.0]
          - six [required: >=1.12.0, installed: 1.14.0]
        - google-resumable-media [required: >=0.6.0,<2.0dev, installed: 1.2.0]
          - google-crc32c [required: <2.0dev,>=1.0, installed: 1.1.2]
            - cffi [required: >=1.0.0, installed: 1.14.4]
              - pycparser [required: Any, installed: 2.20]
          - six [required: Any, installed: 1.14.0]
        - proto-plus [required: >=1.10.0, installed: 1.13.0]
          - protobuf [required: >=3.12.0, installed: 3.14.0]
            - six [required: >=1.9, installed: 1.14.0]
        - protobuf [required: >=3.12.0, installed: 3.14.0]
          - six [required: >=1.9, installed: 1.14.0]
      - kaggle [required: >=1.3.9, installed: 1.5.10]
        - certifi [required: Any, installed: 2019.11.28]
        - python-dateutil [required: Any, installed: 2.8.1]
          - six [required: >=1.5, installed: 1.14.0]
        - python-slugify [required: Any, installed: 4.0.1]
          - text-unidecode [required: >=1.3, installed: 1.3]
        - requests [required: Any, installed: 2.22.0]
        - six [required: >=1.10, installed: 1.14.0]
        - tqdm [required: Any, installed: 4.56.0]
        - urllib3 [required: Any, installed: 1.25.8]
      - matplotlib [required: Any, installed: 3.3.3]
        - cycler [required: >=0.10, installed: 0.10.0]
          - six [required: Any, installed: 1.14.0]
        - kiwisolver [required: >=1.0.1, installed: 1.3.1]
        - numpy [required: >=1.15, installed: 1.19.5]
        - pillow [required: >=6.2.0, installed: 8.1.0]
        - pyparsing [required: !=2.1.2,!=2.1.6,>=2.0.3,!=2.0.4, installed: 2.4.6]
        - python-dateutil [required: >=2.1, installed: 2.8.1]
          - six [required: >=1.5, installed: 1.14.0]
      - numpy [required: >=1.15.4, installed: 1.19.5]
      - oauth2client [required: Any, installed: 4.1.3]
        - httplib2 [required: >=0.9.1, installed: 0.18.1]
        - pyasn1 [required: >=0.1.7, installed: 0.4.8]
        - pyasn1-modules [required: >=0.0.5, installed: 0.2.8]
          - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
        - rsa [required: >=3.1.4, installed: 4.7]
          - pyasn1 [required: >=0.1.3, installed: 0.4.8]
        - six [required: >=1.6.1, installed: 1.14.0]
      - opencv-python-headless [required: Any, installed: 4.5.1.48]
        - numpy [required: >=1.17.3, installed: 1.19.5]
      - pandas [required: >=0.22.0, installed: 1.2.1]
        - numpy [required: >=1.16.5, installed: 1.19.5]
        - python-dateutil [required: >=2.7.3, installed: 2.8.1]
          - six [required: >=1.5, installed: 1.14.0]
        - pytz [required: >=2017.3, installed: 2020.5]
      - Pillow [required: Any, installed: 8.1.0]
      - psutil [required: >=5.4.3, installed: 5.8.0]
      - py-cpuinfo [required: >=3.3.0, installed: 7.0.0]
      - pycocotools [required: Any, installed: 2.0.2]
        - cython [required: >=0.27.3, installed: 0.29.21]
        - matplotlib [required: >=2.1.0, installed: 3.3.3]
          - cycler [required: >=0.10, installed: 0.10.0]
            - six [required: Any, installed: 1.14.0]
          - kiwisolver [required: >=1.0.1, installed: 1.3.1]
          - numpy [required: >=1.15, installed: 1.19.5]
          - pillow [required: >=6.2.0, installed: 8.1.0]
          - pyparsing [required: !=2.1.2,!=2.1.6,>=2.0.3,!=2.0.4, installed: 2.4.6]
          - python-dateutil [required: >=2.1, installed: 2.8.1]
            - six [required: >=1.5, installed: 1.14.0]
        - setuptools [required: >=18.0, installed: 44.0.0]
      - pyyaml [required: >=5.1, installed: 5.4.1]
      - scipy [required: >=0.19.1, installed: 1.6.0]
        - numpy [required: >=1.16.5, installed: 1.19.5]
      - sentencepiece [required: Any, installed: 0.1.95]
      - seqeval [required: Any, installed: 1.2.2]
        - numpy [required: >=1.14.0, installed: 1.19.5]
        - scikit-learn [required: >=0.21.3, installed: 0.24.1]
          - joblib [required: >=0.11, installed: 1.0.0]
          - numpy [required: >=1.13.3, installed: 1.19.5]
          - scipy [required: >=0.19.1, installed: 1.6.0]
            - numpy [required: >=1.16.5, installed: 1.19.5]
          - threadpoolctl [required: >=2.0.0, installed: 2.1.0]
      - six [required: Any, installed: 1.14.0]
      - tensorflow [required: >=2.4.0, installed: 2.3.1]
        - absl-py [required: >=0.7.0, installed: 0.11.0]
          - six [required: Any, installed: 1.14.0]
        - astunparse [required: ==1.6.3, installed: 1.6.3]
          - six [required: >=1.6.1,<2.0, installed: 1.14.0]
          - wheel [required: >=0.23.0,<1.0, installed: 0.34.2]
        - gast [required: ==0.3.3, installed: 0.3.3]
        - google-pasta [required: >=0.1.8, installed: 0.2.0]
          - six [required: Any, installed: 1.14.0]
        - grpcio [required: >=1.8.6, installed: 1.35.0]
          - six [required: >=1.5.2, installed: 1.14.0]
        - h5py [required: <2.11.0,>=2.10.0, installed: 3.1.0]
          - numpy [required: >=1.17.5, installed: 1.19.5]
        - keras-preprocessing [required: <1.2,>=1.1.1, installed: 1.1.2]
          - numpy [required: >=1.9.1, installed: 1.19.5]
          - six [required: >=1.9.0, installed: 1.14.0]
        - numpy [required: <1.19.0,>=1.16.0, installed: 1.19.5]
        - opt-einsum [required: >=2.3.2, installed: 3.3.0]
          - numpy [required: >=1.7, installed: 1.19.5]
        - protobuf [required: >=3.9.2, installed: 3.14.0]
          - six [required: >=1.9, installed: 1.14.0]
        - six [required: >=1.12.0, installed: 1.14.0]
        - tensorboard [required: <3,>=2.3.0, installed: 2.4.1]
          - absl-py [required: >=0.4, installed: 0.11.0]
            - six [required: Any, installed: 1.14.0]
          - google-auth [required: <2,>=1.6.3, installed: 1.24.0]
            - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
            - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
              - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
            - rsa [required: >=3.1.4,<5, installed: 4.7]
              - pyasn1 [required: >=0.1.3, installed: 0.4.8]
            - setuptools [required: >=40.3.0, installed: 44.0.0]
            - six [required: >=1.9.0, installed: 1.14.0]
          - google-auth-oauthlib [required: <0.5,>=0.4.1, installed: 0.4.2]
            - google-auth [required: Any, installed: 1.24.0]
              - cachetools [required: >=2.0.0,<5.0, installed: 4.2.1]
              - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
                - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
              - rsa [required: >=3.1.4,<5, installed: 4.7]
                - pyasn1 [required: >=0.1.3, installed: 0.4.8]
              - setuptools [required: >=40.3.0, installed: 44.0.0]
              - six [required: >=1.9.0, installed: 1.14.0]
            - requests-oauthlib [required: >=0.7.0, installed: 1.3.0]
              - oauthlib [required: >=3.0.0, installed: 3.1.0]
              - requests [required: >=2.0.0, installed: 2.22.0]
          - grpcio [required: >=1.24.3, installed: 1.35.0]
            - six [required: >=1.5.2, installed: 1.14.0]
          - markdown [required: >=2.6.8, installed: 3.3.3]
          - numpy [required: >=1.12.0, installed: 1.19.5]
          - protobuf [required: >=3.6.0, installed: 3.14.0]
            - six [required: >=1.9, installed: 1.14.0]
          - requests [required: >=2.21.0,<3, installed: 2.22.0]
          - setuptools [required: >=41.0.0, installed: 44.0.0]
          - six [required: >=1.10.0, installed: 1.14.0]
          - tensorboard-plugin-wit [required: >=1.6.0, installed: 1.8.0]
          - werkzeug [required: >=0.11.15, installed: 1.0.1]
          - wheel [required: >=0.26, installed: 0.34.2]
        - tensorflow-estimator [required: >=2.3.0,<2.4.0, installed: 2.3.0]
        - termcolor [required: >=1.1.0, installed: 1.1.0]
        - wheel [required: >=0.26, installed: 0.34.2]
        - wrapt [required: >=1.11.1, installed: 1.12.1]
      - tensorflow-addons [required: Any, installed: 0.12.0]
        - typeguard [required: >=2.7, installed: 2.10.0]
      - tensorflow-datasets [required: Any, installed: 4.2.0]
        - absl-py [required: Any, installed: 0.11.0]
          - six [required: Any, installed: 1.14.0]
        - attrs [required: >=18.1.0, installed: 20.3.0]
        - dill [required: Any, installed: 0.3.3]
        - future [required: Any, installed: 0.18.2]
        - importlib-resources [required: Any, installed: 5.1.0]
        - numpy [required: Any, installed: 1.19.5]
        - promise [required: Any, installed: 2.3]
          - six [required: Any, installed: 1.14.0]
        - protobuf [required: >=3.12.2, installed: 3.14.0]
          - six [required: >=1.9, installed: 1.14.0]
        - requests [required: >=2.19.0, installed: 2.22.0]
        - six [required: Any, installed: 1.14.0]
        - tensorflow-metadata [required: Any, installed: 0.27.0]
          - absl-py [required: >=0.9,<0.11, installed: 0.11.0]
            - six [required: Any, installed: 1.14.0]
          - googleapis-common-protos [required: <2,>=1.52.0, installed: 1.52.0]
            - protobuf [required: >=3.6.0, installed: 3.14.0]
              - six [required: >=1.9, installed: 1.14.0]
          - protobuf [required: <4,>=3.7, installed: 3.14.0]
            - six [required: >=1.9, installed: 1.14.0]
        - termcolor [required: Any, installed: 1.1.0]
        - tqdm [required: Any, installed: 4.56.0]
      - tensorflow-hub [required: >=0.6.0, installed: 0.11.0]
        - numpy [required: >=1.12.0, installed: 1.19.5]
        - protobuf [required: >=3.8.0, installed: 3.14.0]
          - six [required: >=1.9, installed: 1.14.0]
      - tensorflow-model-optimization [required: >=0.4.1, installed: 0.5.0]
        - dm-tree [required: ~=0.1.1, installed: 0.1.5]
          - six [required: >=1.12.0, installed: 1.14.0]
        - numpy [required: ~=1.14, installed: 1.19.5]
        - six [required: ~=1.10, installed: 1.14.0]
      - tf-slim [required: >=1.1.0, installed: 1.1.0]
        - absl-py [required: >=0.2.2, installed: 0.11.0]
          - six [required: Any, installed: 1.14.0]
  - torch [required: ==1.6, installed: 1.6.0]
    - future [required: Any, installed: 0.18.2]
    - numpy [required: Any, installed: 1.19.5]
  - torchvision [required: ==0.7, installed: 0.7.0]
    - numpy [required: Any, installed: 1.19.5]
    - pillow [required: >=4.1.1, installed: 8.1.0]
    - torch [required: ==1.6.0, installed: 1.6.0]
      - future [required: Any, installed: 0.18.2]
      - numpy [required: Any, installed: 1.19.5]
  - transformers [required: ==3.1.0, installed: 3.1.0]
    - filelock [required: Any, installed: 3.0.12]
    - numpy [required: Any, installed: 1.19.5]
    - packaging [required: Any, installed: 20.3]
    - regex [required: !=2019.12.17, installed: 2020.11.13]
    - requests [required: Any, installed: 2.22.0]
    - sacremoses [required: Any, installed: 0.0.43]
      - click [required: Any, installed: 7.1.2]
      - joblib [required: Any, installed: 1.0.0]
      - regex [required: Any, installed: 2020.11.13]
      - six [required: Any, installed: 1.14.0]
      - tqdm [required: Any, installed: 4.56.0]
    - sentencepiece [required: !=0.1.92, installed: 0.1.95]
    - tokenizers [required: ==0.8.1.rc2, installed: 0.8.1rc2]
    - tqdm [required: >=4.27, installed: 4.56.0]
chardet==3.0.4
contextlib2==0.6.0
distro==1.4.0
html5lib==1.0.1
idna==2.8
ipaddr==2.2.0
lockfile==0.12.2
msgpack==0.6.2
pep517==0.8.2
pkg-resources==0.0.0
progress==1.5
pytoml==0.1.21
retrying==1.3.3
webencodings==0.5.1

error in feature/msmarco_psg

I am getting the following error when trying to use msmarcopsg, in benchmarks/msmarco.py at line 60:
IndexError: list index out of range

PARADE Replication Results

Notes

The results are somewhat different because the replication was done on a machine without CUDA.

I had some trouble installing all the requirements at first, but Crystina lent me a machine where I was eventually able to get everything working.

Results

2020-10-21 09:15:01,147 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 dev metrics: P_1=0.688 P_10=0.521 P_20=0.447 P_5=0.563 judged_10=0.994 judged_20=0.993 judged_200=0.947 map=0.264 ndcg_cut_10=0.538 ndcg_cut_20=0.511 ndcg_cut_5=0.560 recall_100=0.453 recall_1000=0.453 recip_rank=0.784
2020-10-21 09:15:01,270 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 test metrics: P_1=0.660 P_10=0.483 P_20=0.422 P_5=0.553 judged_10=0.991 judged_20=0.982 judged_200=0.931 map=0.281 ndcg_cut_10=0.494 ndcg_cut_20=0.489 ndcg_cut_5=0.532 recall_100=0.490 recall_1000=0.490 recip_rank=0.766
2020-10-21 09:15:01,279 - INFO - capreolus.task.rerank.evaluate - rerank: skipping cross-validated metrics because results exist for only 1/5 folds

2020-10-21 20:19:43,820 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 dev metrics: P_1=0.694 P_10=0.500 P_20=0.424 P_5=0.567 judged_10=0.990 judged_20=0.981 judged_200=0.935 map=0.218 ndcg_cut_10=0.545 ndcg_cut_20=0.509 ndcg_cut_5=0.585 recall_100=0.391 recall_1000=0.391 recip_rank=0.789
2020-10-21 20:19:43,939 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 test metrics: P_1=0.688 P_10=0.500 P_20=0.451 P_5=0.529 judged_10=0.994 judged_20=0.989 judged_200=0.947 map=0.262 ndcg_cut_10=0.518 ndcg_cut_20=0.506 ndcg_cut_5=0.531 recall_100=0.453 recall_1000=0.453 recip_rank=0.767
2020-10-21 20:19:43,946 - INFO - capreolus.task.rerank.evaluate - rerank: skipping cross-validated metrics because results exist for only 2/5 folds

2020-10-21 23:44:39,319 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 dev metrics: P_1=0.708 P_10=0.573 P_20=0.456 P_5=0.642 judged_10=0.994 judged_20=0.994 judged_200=0.963 map=0.317 ndcg_cut_10=0.563 ndcg_cut_20=0.521 ndcg_cut_5=0.593 recall_100=0.557 recall_1000=0.557 recip_rank=0.792
2020-10-21 23:44:39,422 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 test metrics: P_1=0.612 P_10=0.494 P_20=0.400 P_5=0.563 judged_10=0.990 judged_20=0.982 judged_200=0.935 map=0.207 ndcg_cut_10=0.526 ndcg_cut_20=0.480 ndcg_cut_5=0.560 recall_100=0.391 recall_1000=0.391 recip_rank=0.751
2020-10-21 23:44:39,427 - INFO - capreolus.task.rerank.evaluate - rerank: skipping cross-validated metrics because results exist for only 3/5 folds

2020-10-22 03:09:47,202 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 dev metrics: P_1=0.653 P_10=0.484 P_20=0.423 P_5=0.555 judged_10=0.996 judged_20=0.992 judged_200=0.949 map=0.262 ndcg_cut_10=0.505 ndcg_cut_20=0.481 ndcg_cut_5=0.551 recall_100=0.493 recall_1000=0.493 recip_rank=0.762
2020-10-22 03:09:47,336 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 test metrics: P_1=0.583 P_10=0.550 P_20=0.446 P_5=0.613 judged_10=0.996 judged_20=0.995 judged_200=0.963 map=0.309 ndcg_cut_10=0.529 ndcg_cut_20=0.495 ndcg_cut_5=0.550 recall_100=0.557 recall_1000=0.557 recip_rank=0.720
2020-10-22 03:09:47,346 - INFO - capreolus.task.rerank.evaluate - rerank: skipping cross-validated metrics because results exist for only 4/5 folds

2020-10-22 07:11:42,980 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s5 dev metrics: P_1=0.660 P_10=0.496 P_20=0.423 P_5=0.553 judged_10=0.991 judged_20=0.981 judged_200=0.931 map=0.274 ndcg_cut_10=0.498 ndcg_cut_20=0.487 ndcg_cut_5=0.520 recall_100=0.490 recall_1000=0.490 recip_rank=0.771
2020-10-22 07:11:43,095 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s5 test metrics: P_1=0.653 P_10=0.482 P_20=0.431 P_5=0.531 judged_10=1.000 judged_20=0.996 judged_200=0.949 map=0.262 ndcg_cut_10=0.502 ndcg_cut_20=0.483 ndcg_cut_5=0.537 recall_100=0.493 recall_1000=0.493 recip_rank=0.760
2020-10-22 07:11:43,102 - INFO - capreolus.task.rerank.evaluate - rerank: average cross-validated metrics when choosing iteration based on 'ndcg_cut_20':
2020-10-22 07:11:43,451 - ERROR - capreolus.evaluator.judged - 672 in run files cannot be found in qrels (Line was repeated a number of times)
2020-10-22 07:12:13,660 - INFO - capreolus.task.rerank.evaluate -                       P_1: 0.6390
2020-10-22 07:12:13,665 - INFO - capreolus.task.rerank.evaluate -                      P_10: 0.5017
2020-10-22 07:12:13,670 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4299
2020-10-22 07:12:13,670 - INFO - capreolus.task.rerank.evaluate -                       P_5: 0.5577
2020-10-22 07:12:13,670 - INFO - capreolus.task.rerank.evaluate -                 judged_10: 0.9942
2020-10-22 07:12:13,670 - INFO - capreolus.task.rerank.evaluate -                 judged_20: 0.9886
2020-10-22 07:12:13,670 - INFO - capreolus.task.rerank.evaluate -                judged_200: 0.9450
2020-10-22 07:12:13,670 - INFO - capreolus.task.rerank.evaluate -                       map: 0.2637
2020-10-22 07:12:13,670 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_10: 0.5137
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.4906
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -                ndcg_cut_5: 0.5420
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -                recall_100: 0.4765
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.4765
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -                recip_rank: 0.7527
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.25, 0.30000000000000004, 0.30000000000000004, 0.30000000000000004, 0.5]
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -              P_1 [interp]: 0.6586
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -             P_10 [interp]: 0.5056
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -             P_20 [interp]: 0.4353
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -              P_5 [interp]: 0.5614
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -        judged_10 [interp]: 0.9912
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -        judged_20 [interp]: 0.9859
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -       judged_200 [interp]: 0.8509
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -              map [interp]: 0.3225
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_10 [interp]: 0.5197
2020-10-22 07:12:13,671 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_20 [interp]: 0.4997
2020-10-22 07:12:13,672 - INFO - capreolus.task.rerank.evaluate -       ndcg_cut_5 [interp]: 0.5486
2020-10-22 07:12:13,672 - INFO - capreolus.task.rerank.evaluate -       recall_100 [interp]: 0.4612
2020-10-22 07:12:13,672 - INFO - capreolus.task.rerank.evaluate -      recall_1000 [interp]: 0.7761
2020-10-22 07:12:13,672 - INFO - capreolus.task.rerank.evaluate -       recip_rank [interp]: 0.7494

Add GH action to test pip lock

Improve error message when Anserini cannot parse topics file

/usr/local/lib/python3.6/dist-packages/capreolus/searcher/__init__.py in load_trec_run(fn)
     32         run = OrderedDefaultDict()
     33 
---> 34         with open(fn, "rt") as f:
     35             for line in f:
     36                 line = line.strip()

TypeError: expected str, bytes or os.PathLike object, not NoneType
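The traceback above shows load_trec_run receiving None as its filename, so the failure surfaces as a TypeError from open() rather than as a message about the topics file. One possible shape for a clearer error is sketched below. The helper name check_run_path and the wording of the messages are hypothetical, and the assumption that a None path means the searcher failed upstream is taken from the issue title rather than verified against the code.

```python
import os


def check_run_path(fn):
    """Hypothetical guard: fail early with a readable message instead of a TypeError from open()."""
    if fn is None:
        # Assumption: a missing path means the searcher produced no run file,
        # for example because the topics file could not be parsed.
        raise ValueError(
            "No run file was produced. The searcher may have failed to parse "
            "the topics file; check the Anserini output for details."
        )
    if not os.path.isfile(fn):
        raise FileNotFoundError(f"Run file does not exist: {fn}")
    return fn
```

A caller would run check_run_path(fn) before opening the file, so that the user sees the hint about the topics file first.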

Incompatible pytorch and torchvision versions

This happens when I install it from pip:

ERROR: torchvision 0.5.0 has requirement torch==1.4.0, but you'll have torch 1.2.0 which is incompatible.

Make Extractor and (Tensorflow) Trainer realize the benchmark it's using

Currently, two different benchmarks share exactly the same extractor and trainer cache paths, where qid2toks, docid2passage, the training/dev/test data, and everything else are stored.

Related to the refactor mentioned in #105.

unknown error

When I run the following command, I get an error that I do not understand (pasted below). I am on a Linux machine with Java 11.0.10. What could be the problem?

capreolus rerank.traineval with benchmark.name=msmarcopsg rank.searcher.index.stemmer=porter rank.searcher.name=BM25 rank.optimize=recall_1000 reranker.name=KNRM reranker.trainer.niters=2 optimize=map

Error:

2021-04-18 11:25:55.977718: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/toobasalahuddin/capreolus/capreolus/run.py", line 96, in <module>
    task, task_entry_function = prepare_task(arguments["COMMAND"], config)
  File "/home/toobasalahuddin/capreolus/capreolus/run.py", line 34, in prepare_task
    task = Task.create(taskstr, config)
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 222, in create
    module_obj = module_cls(config, provide, share_dependency_objects=share_objects)
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 274, in __init__
    self._instantiate_dependencies(self.config, provide, share_dependency_objects)
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 314, in _instantiate_dependencies
    dependencies[dependency.key] = dependency_cls.create(
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 222, in create
    module_obj = module_cls(config, provide, share_dependency_objects=share_objects)
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 274, in __init__
    self._instantiate_dependencies(self.config, provide, share_dependency_objects)
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 314, in _instantiate_dependencies
    dependencies[dependency.key] = dependency_cls.create(
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 222, in create
    module_obj = module_cls(config, provide, share_dependency_objects=share_objects)
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 274, in __init__
    self._instantiate_dependencies(self.config, provide, share_dependency_objects)
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 314, in _instantiate_dependencies
    dependencies[dependency.key] = dependency_cls.create(
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 222, in create
    module_obj = module_cls(config, provide, share_dependency_objects=share_objects)
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/profane/base.py", line 279, in __init__
    self.build()
  File "/home/toobasalahuddin/capreolus/capreolus/tokenizer/anserini.py", line 15, in build
    self._tokenize = self._get_tokenize_fn()
  File "/home/toobasalahuddin/capreolus/capreolus/tokenizer/anserini.py", line 18, in _get_tokenize_fn
    from jnius import autoclass
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/jnius/__init__.py", line 42, in <module>
    from .reflect import *  # noqa
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/jnius/reflect.py", line 17, in <module>
    class Class(with_metaclass(MetaJavaClass, JavaClass)):
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/site-packages/six.py", line 856, in __new__
    return meta(name, resolved_bases, d)
  File "jnius/jnius_export_class.pxi", line 114, in jnius.MetaJavaClass.__new__
  File "jnius/jnius_export_class.pxi", line 164, in jnius.MetaJavaClass.resolve_class
  File "jnius/jnius_env.pxi", line 11, in jnius.get_jnienv
  File "jnius/jnius_jvm_dlopen.pxi", line 118, in jnius.get_platform_jnienv
  File "jnius/jnius_jvm_dlopen.pxi", line 70, in jnius.create_jnienv
  File "jnius/jnius_jvm_dlopen.pxi", line 47, in jnius.find_java_home
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/toobasalahuddin/anaconda3/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['which', 'javac']' returned non-zero exit status 1.
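
The last frames of the traceback show pyjnius running `which javac` and getting a non-zero exit status, which suggests that no Java compiler is on the PATH (for example, when only a JRE is installed rather than a full JDK). A minimal check using only the Python standard library is sketched below; the helper name find_jdk_tools is made up for this sketch.

```python
import os
import shutil


def find_jdk_tools():
    """Report where (if anywhere) the Java tools can be found on this machine."""
    return {
        "java": shutil.which("java"),    # the runtime
        "javac": shutil.which("javac"),  # the compiler, which the traceback shows being looked up
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
    }


if __name__ == "__main__":
    for name, location in find_jdk_tools().items():
        print(f"{name}: {location or 'not found'}")
```

If javac is reported as not found, installing a JDK and making sure its bin directory is on the PATH is a likely fix. This is a guess based on the traceback, not a confirmed diagnosis.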

Document Anaconda3 default environment incompatibility with pytorch dataloader

TPU config options

I am trying to run this code on a TPU in Colab, using the Python code from the example notebook.
The docs say to set the config options tpuname, tpuzone, and storage to run on the TPU. Can you please give an example of where to set these config params?

Thank you.

Make print_config group options by module

Make EmbedText faster

Right now, when creating an embedding matrix, random embeddings are used for OOV words. zerounk would instead set OOV words to zero. Both are extremes.

Consequences:

Slow embedding creation
Noisy signals due to the cosine similarity between two randomly initialized OOV word embeddings

Solution: for OOV words,

Set similarity to 1 if there's an exact match (unlike zerounk, which always sets it to 0)
Set similarity to 0 if it's not an exact match
Avoid building stoi: the pymagnitude embedding already has a vocabulary. Simply add OOV terms to this vocab
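
The proposed OOV rule can be sketched in plain Python. This is only an illustration of the idea, not the EmbedText code: the function names and the toy embedding dictionary are made up, and a real implementation would work on the embedding matrix rather than on per-term lookups.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_similarity(q_term, d_term, embeddings):
    """Similarity between a query term and a document term under the proposed OOV rule."""
    if q_term in embeddings and d_term in embeddings:
        return cosine(embeddings[q_term], embeddings[d_term])
    # At least one term is OOV: fall back to exact string matching
    return 1.0 if q_term == d_term else 0.0


# Toy vocabulary for illustration only
toy = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
```

With this rule, term_similarity("zzz", "zzz", toy) is 1.0 even though "zzz" has no embedding, while term_similarity("zzz", "cat", toy) is 0.0, so no random vectors are needed for OOV terms.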

Add AMP support to Pytorch trainer

Pytorch 1.6 added native AMP support. This should be supported by the Pytorch trainer and configured by a new boolean amp config option.

Remove progress bar when running in a notebook

We normally use multiple tqdm progress bars at different positions, which are not rendered correctly in Colab. We should replace these with sparse logging messages when running in a notebook. Does it also make sense to do this when running in a normal shell?
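
A rough sketch of what sparse logging could look like, using only the standard library. The names in_notebook and sparse_progress are hypothetical, and the notebook check is a heuristic that assumes notebook kernels have already imported ipykernel or google.colab.

```python
import logging
import sys

logger = logging.getLogger("progress_sketch")


def in_notebook():
    """Heuristic: Jupyter and Colab kernels import these modules at startup."""
    return "ipykernel" in sys.modules or "google.colab" in sys.modules


def sparse_progress(iterable, total, desc="progress", every=0.25):
    """Yield items unchanged, logging roughly once per `every` fraction of `total`."""
    step = max(1, int(total * every))
    for i, item in enumerate(iterable, start=1):
        if i % step == 0 or i == total:
            logger.info("%s: %d/%d", desc, i, total)
        yield item
```

A caller could choose between tqdm and sparse_progress based on in_notebook(), which keeps the existing progress bars in a normal shell.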

Move prob in bertpassage to the shared method rather than TF-specific method

Defer tensorflow import

Importing tensorflow takes a non-trivial amount of time; about 10s in my environment. This is inconvenient when running commands like rerank.print_config that would otherwise return (nearly) instantly.

To speed things up, we can use LazyLoader to avoid importing TF until it's actually used, such as when running a TF reranker.
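
A minimal sketch of the lazy-loading idea follows. The LazyModule class is written for this example and is not the LazyLoader the issue refers to. It uses the standard-library string module as a stand-in so the snippet runs without TensorFlow installed.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Placeholder module that imports the real module on first attribute access."""

    def __init__(self, name):
        super().__init__(name)
        self._real = None  # filled in on first use

    def _load(self):
        if self._real is None:
            self._real = importlib.import_module(self.__name__)
        return self._real

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. for attributes of the real module
        return getattr(self._load(), attr)


# Stand-in for something like `tf = LazyModule("tensorflow")`
lazy_string = LazyModule("string")
```

Nothing is imported when lazy_string is created. The real import happens the first time an attribute such as lazy_string.ascii_lowercase is read, so a command like rerank.print_config would not pay the import cost if it never touches the module.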

Import Error when setting up Capreolus on Compute Canada

I tried to follow this doc https://github.com/capreolus-ir/capreolus/blob/feature/msmarco_psg/docs/setup/setup-cc.md to set up Capreolus on the cedar server. I have installed anaconda in advance, but I got some import errors like this:

Are there any installation steps I need to complete in advance to set up Capreolus?

Long queries handled incorrectly by bertpassage and pooled_bertpassage

The input sequence can become longer than maxseqlen when len(querytoks) is large. This causes us to pass inputs to BERT that are too long:

https://github.com/capreolus-ir/capreolus/blob/master/capreolus/extractor/bertpassage.py#L354
https://github.com/capreolus-ir/capreolus/blob/master/capreolus/extractor/pooled_bertpassage.py#L145

@crystina-z can we fix this as part of feature/autotokenizer? I think the most straightforward solution is to rely on hgf's code to prepare the input (e.g., tokenizer(segA, segB)) rather than constructing it ourselves.
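
To make the length constraint concrete, here is a plain-Python sketch of trimming a query/passage pair so that the final input never exceeds maxseqlen. It is neither the code in bertpassage.py nor the tokenizer's own logic: the function names are made up, and it assumes three special tokens ([CLS] and two [SEP]) and a policy of always trimming whichever sequence is currently longer.

```python
def truncate_pair(query_toks, doc_toks, maxseqlen, num_special=3):
    """Trim the longer sequence one token at a time until the pair fits."""
    budget = maxseqlen - num_special  # room left after [CLS] and two [SEP] tokens
    if budget < 0:
        raise ValueError("maxseqlen is too small to hold the special tokens")
    query_toks, doc_toks = list(query_toks), list(doc_toks)
    while len(query_toks) + len(doc_toks) > budget:
        if len(doc_toks) >= len(query_toks):
            doc_toks.pop()
        else:
            query_toks.pop()
    return query_toks, doc_toks


def build_input(query_toks, doc_toks, maxseqlen):
    """Assemble [CLS] query [SEP] passage [SEP] within the length limit."""
    q, d = truncate_pair(query_toks, doc_toks, maxseqlen)
    return ["[CLS]"] + q + ["[SEP]"] + d + ["[SEP]"]
```

Whatever the lengths of the two inputs, len(build_input(q, d, maxseqlen)) is at most maxseqlen, which is the invariant the issue asks for.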
