Comments (3)
As a follow-up to this: it looks like the actual GLUE task name is supplied as the `name` argument. Is there a way to check which names/sub-datasets are available under a grouping like GLUE? That information doesn't seem to be readily available in the info from `nlp.list_datasets()`.
Edit: I found the info under `Glue.BUILDER_CONFIGS`
from datasets.
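Recent versions of the `datasets` library also expose `datasets.get_dataset_config_names("glue")`, which returns the config names without touching the builder class directly (it needs access to the Hub). The self-contained sketch below only illustrates the `BUILDER_CONFIGS` pattern from the edit above. `ToyBuilderConfig`, `ToyGlue` and `config_names` are hypothetical names, and the three configs are a hand-picked subset of the GLUE configs listed later in the thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToyBuilderConfig:
    """Hypothetical stand-in for a builder config; only `name` matters here."""
    name: str
    description: str = ""


class ToyGlue:
    """Hypothetical builder that, like GLUE, groups several sub-datasets as configs."""
    BUILDER_CONFIGS = [
        ToyBuilderConfig("cola", "Corpus of Linguistic Acceptability"),
        ToyBuilderConfig("sst2", "Stanford Sentiment Treebank"),
        ToyBuilderConfig("mrpc", "Microsoft Research Paraphrase Corpus"),
    ]


def config_names(builder_cls) -> List[str]:
    """Return the sub-dataset (config) names declared on a builder class."""
    return [config.name for config in builder_cls.BUILDER_CONFIGS]


print(config_names(ToyGlue))  # ['cola', 'sst2', 'mrpc']
```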
Yes, the first config is loaded by default when no `name` is supplied, but for GLUE this should indeed probably throw an error.
We can probably just add an `__init__` at the top of `class Glue(nlp.GeneratorBasedBuilder)` in the `glue.py` script which does this check:
```python
import nlp

class Glue(nlp.GeneratorBasedBuilder):
    def __init__(self, *args, **kwargs):
        # Refuse to build GLUE without an explicit task (config) name.
        assert kwargs.get("name") is not None, "Glue has to be called with a configuration name"
        super().__init__(*args, **kwargs)
```
An error is raised if the sub-dataset is not specified :)
ValueError: Config name is missing.
Please pick one among the available configs: ['cola', 'sst2', 'mrpc', 'qqp', 'stsb', 'mnli', 'mnli_mismatched', 'mnli_matched', 'qnli', 'rte', 'wnli', 'ax']
Example of usage:
`load_dataset('glue', 'cola')`
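The behavior discussed in this thread (fall back to the first config unless the dataset insists on an explicit name, in which case raise an error listing the choices) can be sketched without the library. `resolve_config_name` and its parameters are hypothetical; the message only mirrors the error shown above, and this is not how `datasets` implements the check internally.

```python
from typing import List, Optional


def resolve_config_name(
    dataset: str,
    available: List[str],
    name: Optional[str] = None,
    require_name: bool = False,
) -> str:
    """Pick which config to build (hypothetical helper, not the datasets API).

    With no name, fall back to the first config (the old default) unless
    require_name is set and there is more than one choice, as for GLUE.
    """
    if name is None:
        if require_name and len(available) > 1:
            raise ValueError(
                "Config name is missing.\n"
                f"Please pick one among the available configs: {available}\n"
                "Example of usage:\n"
                f"`load_dataset('{dataset}', '{available[0]}')`"
            )
        return available[0]
    if name not in available:
        raise ValueError(f"Unknown config name {name!r}; expected one of {available}")
    return name


glue_configs = ["cola", "sst2", "mrpc"]
print(resolve_config_name("glue", glue_configs, "mrpc", require_name=True))  # mrpc
try:
    resolve_config_name("glue", glue_configs, require_name=True)
except ValueError as err:
    print(err)  # multi-line message starting with "Config name is missing."
```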