Comments (5)
Hi @guotong1988,
You can find the full list of dataset files in download_dataset.py, which builds each file name and requests it with the following code:
import requests

# Each dataset name is combined with each split to form a file name.
for ds in [
    'webtext',
    'small-117M', 'small-117M-k40',
    'medium-345M', 'medium-345M-k40',
    'large-762M', 'large-762M-k40',
    'xl-1542M', 'xl-1542M-k40',
]:
    for split in ['train', 'valid', 'test']:
        filename = ds + "." + split + '.jsonl'
        # stream=True avoids loading the whole file into memory at once
        r = requests.get("https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/" + filename, stream=True)
Following this pattern, you can download any single file directly, for example:
https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/small-117M.train.jsonl
Alternatively, run download_dataset.py to download all of the dataset files.
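If you only need one file, something like the following might work. It is a minimal sketch, not part of download_dataset.py: the helper names dataset_url and download_one and the default "data" output directory are made up for illustration, and it uses the standard library's urllib instead of requests so it has no extra dependencies.

```python
import shutil
import urllib.request
from pathlib import Path

# Base URL taken from the download_dataset.py snippet above.
BASE_URL = "https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/"


def dataset_url(ds: str, split: str) -> str:
    """Build the URL for one file using the <dataset>.<split>.jsonl pattern."""
    return f"{BASE_URL}{ds}.{split}.jsonl"


def download_one(ds: str, split: str, out_dir: str = "data") -> Path:
    """Stream one dataset file into out_dir (hypothetical default) and return its path."""
    out_path = Path(out_dir) / f"{ds}.{split}.jsonl"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj copies in chunks, so the file is never held in memory whole.
    with urllib.request.urlopen(dataset_url(ds, split)) as resp, open(out_path, "wb") as f:
        shutil.copyfileobj(resp, f)
    return out_path


# Example call (performs a network download, so it is left commented out):
# download_one("small-117M", "valid")
```

Any dataset name and split from the list above should be usable as arguments; the 'valid' and 'test' splits are presumably smaller than 'train', so they may be a better first try.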
I hope this helps.
from gpt-2-output-dataset.
thank you
Thanks
Thanks
Hi, how are you?
I have some questions.
How can I contact you?
Thanks.