The nlpboost online documentation says:
The task name for QA is qa, so the correct configuration is DatasetConfig(..., task="qa"). The default format for this task is the SQUAD format (check squad dataset in Huggingface’s Datasets). If your QA dataset is not in that format, you can either preprocess it before using AutoTrainer with it, or use a pre_func in DatasetConfig to achieve the same.
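For the second option, a minimal `pre_func` sketch could look like the following. The input field names (`passage`, `query`, `answer_text`, `answer_start`) are assumptions for illustration only; the output fields (`id`, `title`, `context`, `question`, `answers`) follow the SQuAD schema:

```python
def to_squad_format(example):
    """Map one example from a hypothetical custom QA schema to the SQuAD
    schema expected by the qa task. Input field names are illustrative."""
    return {
        "id": str(example["id"]),
        "title": example.get("title", ""),
        "context": example["passage"],
        "question": example["query"],
        # SQuAD stores answers as parallel lists of texts and character offsets
        "answers": {
            "text": [example["answer_text"]],
            "answer_start": [example["answer_start"]],
        },
    }
```

Passing a function like this as `pre_func` in `DatasetConfig` would let `dataset.map(...)` rewrite each row before training.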
I tried to launch a SQuAD model with this dataset configuration:
squad_config = default_args_dataset.copy()
squad_config.update(
    {
        "dataset_name": "squad",
        "alias": "squad",
        "task": "qa",
        "text_field": "context",
        "label_col": "question",
        "hf_load_kwargs": {"path": "squad"},
    }
)
But when I launch the training script, it fails with KeyError: 'test' (which makes sense, because the squad dataset has train and validation splits, but no test split).
It's possible to change line 94 of nlpboost/hfdatasets_manager.py, replacing the "test" split with "validation", and training then runs, but would the final test still work correctly?
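Rather than hard-coding "validation" in place of "test", a more defensive patch could fall back to whichever split exists. This is a sketch, not nlpboost's actual code, and `pick_eval_split` is a hypothetical helper; since `datasets.DatasetDict` is a `dict` subclass, the same logic works on any mapping of split names to splits:

```python
def pick_eval_split(dataset):
    """Return the split to use for final evaluation.

    Prefer the 'test' split; fall back to 'validation' when the dataset
    (like squad) ships only train/validation. `dataset` is any dict-like
    mapping of split names to splits, e.g. a datasets.DatasetDict.
    """
    for split_name in ("test", "validation"):
        if split_name in dataset:
            return dataset[split_name]
    raise KeyError("dataset has neither a 'test' nor a 'validation' split")
```

With squad, the model would then be evaluated on the validation split, which is common practice for SQuAD given that its official test set is not publicly distributed.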
93 if self.dataset_config.task == "qa":
94 test_dataset = dataset["test"]
The full error output is:
/data/afernandez/nlpboost/src/nlpboost/hfdatasets_manager.py:62 in get_dataset_and_tag2id │
│ │
│ 59 │ │ │ Dictionary with tags (labels) and their indexes. │
│ 60 │ │ """ │
│ 61 │ │ if self.dataset_config.pretokenized_dataset is None: │
│ ❱ 62 │ │ │ dataset, tag2id = self._generic_load_dataset(tokenizer) │
│ 63 │ │ else: │
│ 64 │ │ │ dataset = self.dataset_config.pretokenized_dataset │
│ 65 │ │ │ tag2id = {} │
│ │
│ /data/afernandez/nlpboost/src/nlpboost/hfdatasets_manager.py:94 in _generic_load_dataset │
│ │
│ 91 │ │ if self.dataset_config.pre_func is not None: │
│ 92 │ │ │ dataset = dataset.map(self.dataset_config.pre_func, remove_columns=dataset[" │
│ 93 │ │ if self.dataset_config.task == "qa": │
│ ❱ 94 │ │ │ test_dataset = dataset["test"] │
│ 95 │ │ tags = get_tags(dataset, self.dataset_config) │
│ 96 │ │ tag2id = {t: i for i, t in enumerate(sorted(tags))} │
│ 97 │ │ dataset = self._general_label_mapper(tag2id, dataset) │
│ │
│ /data/afernandez/odesia/lib/python3.10/site-packages/datasets/dataset_dict.py:58 in __getitem__ │
│ │
│ 55 │ │
│ 56 │ def __getitem__(self, k) -> Dataset: │
│ 57 │ │ if isinstance(k, (str, NamedSplit)) or len(self) == 0: │
│ ❱ 58 │ │ │ return super().__getitem__(k) │
│ 59 │ │ else: │
│ 60 │ │ │ available_suggested_splits = [ │
│ 61 │ │ │ │ split for split in (Split.TRAIN, Split.TEST, Split.VALIDATION) if split │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'test'