
adversarial_text's People

Contributors

aonotas

adversarial_text's Issues

ValueError: could not broadcast input array from shape (86935,256) into shape (87008,256)

Hi, when I try to run download.sh, I get the following error:

Prepare for IMDB
Prepare script is running...
Traceback (most recent call last):
  File "preprocess.py", line 79, in <module>
    prepare_imdb()
  File "preprocess.py", line 55, in prepare_imdb
    imdb_validation_pos_start_id)
  File "preprocess.py", line 24, in load_file
    words = read_text(filename.strip())
  File "preprocess.py", line 11, in read_text
    for line in f:
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 399: ordinal not in range(128)

Then I added encoding='utf-8' to every with open() call in preprocess.py.
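For reference, a minimal sketch of that change (the function name comes from the traceback above; the exact body of read_text in preprocess.py may differ):

import codecs  # not required; plain open() with an encoding argument is enough on Python 3

def read_text(filename):
    # Open the review file as UTF-8 instead of the default ASCII codec, so
    # non-ASCII bytes such as 0xc2 decode instead of raising UnicodeDecodeError.
    words = []
    with open(filename, encoding='utf-8') as f:
        for line in f:
            words.extend(line.strip().split())
    return words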

After that, I get the following error:

Namespace(adaptive_softmax=1, add_labeld_to_unlabel=1, alpha=0.001, alpha_decay=0.9998, batchsize=32, batchsize_semi=96, clip=5.0, dataset='imdb', debug_mode=0, dropout=0.5, emb_dim=256, eval=0, freeze_word_emb=0, gpu=0, hidden_cls_dim=30, hidden_dim=1024, ignore_unk=1, load_trained_lstm='', lower=0, min_count=1, n_class=2, n_epoch=30, n_layers=1, nl_factor=1.0, norm_sentence_level=1, pretrained_model='imdb_pretrained_lm.model', random_seed=1234, save_name='imdb_model_vat', use_adv=0, use_exp_decay=1, use_rational=0, use_semi_data=1, use_unlabled_to_vocab=1, word_only=0, xi_var=5.0, xi_var_first=1.0)
train_set:71246
avg word number:242.8615501221121
vocab:87008
avg word number (train_x): 242.43914148545608
avg word number (dev_x):239.861747469366
avg word number (test_x):235.59372
lm_words_num:17297560
train_vocab_size: 66825
vocab_inv: 87008
Traceback (most recent call last):
  File "train.py", line 354, in <module>
    main()
  File "train.py", line 164, in main
    serializers.load_npz(args.pretrained_model, pretrain_model)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 190, in load_npz
    d.load(obj)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializer.py", line 83, in load
    obj.serialize(self)
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 997, in serialize
    d[name].serialize(serializer[name])
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 651, in serialize
    data = serializer(name, param.data)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 150, in __call__
    numpy.copyto(value, dataset)
ValueError: could not broadcast input array from shape (86935,256) into shape (87008,256)

I guess it is my change to the decoding method that drops some lines from the file?
Could you suggest a workaround for this issue?
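One way to confirm what is going on is to compare the vocabulary built from the local data (87008 words in the log above) with the embedding matrix stored inside the pretrained model (86935 rows in the error). The .model file is a NumPy .npz archive, so a quick diagnostic sketch could look like the following (the parameter names inside the archive are not documented here, so the loop simply prints every 2-D array with 256 columns):

import numpy as np

# Diagnostic sketch: list the 2-D parameters stored in the pretrained Chainer
# model and print their shapes; the word-embedding matrix should show up with
# 256 columns and 86935 rows, i.e. fewer rows than the locally built vocabulary.
with np.load('imdb_pretrained_lm.model') as archive:
    for key in archive.files:
        arr = archive[key]
        if arr.ndim == 2 and arr.shape[1] == 256:
            print(key, arr.shape)

If the row counts differ, the vocabulary produced after the encoding change no longer matches the one the language model was pretrained with, which would explain the broadcast error.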

Pretrained model returns 404 Not Found

I got 404 Not Found when I tried to download the pretrained model from:
http://sato-motoki.com/research/vat/imdb_pretrained_lm.model

This is my output:

wget http://sato-motoki.com/research/vat/imdb_pretrained_lm.model
--2020-09-01 16:52:54--  http://sato-motoki.com/research/vat/imdb_pretrained_lm.model
Resolving sato-motoki.com (sato-motoki.com)... 185.199.109.153, 185.199.111.153, 185.199.108.153, ...
Connecting to sato-motoki.com (sato-motoki.com)|185.199.109.153|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-09-01 16:52:58 ERROR 404: Not Found.

New dataset

Hi, thanks for your work. This is a question, not an issue, so feel free to close it if you want.

I have a dataset of unlabelled call transcriptions, and I want to train a classifier on them. I'm wondering whether I could use adversarial training for this (once part of the dataset is labelled manually).

I'm looking forward to your suggestions, thanks again!
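In case it helps frame the question: the semi-supervised virtual adversarial training objective this repository appears to implement combines a supervised cross-entropy term, which needs labels, with a consistency (LDS) term that only compares the model's own predictions before and after a small input perturbation, so the unlabelled transcriptions can still contribute to training. Below is a toy, self-contained numpy sketch of that idea, using a hypothetical linear classifier and a random perturbation instead of the power-iteration adversarial direction the actual method uses:

import numpy as np

# Toy sketch, not this repository's code: a hypothetical linear classifier on
# 10-dimensional features. The supervised loss needs labels; the consistency
# term only needs the model's own predictions, so it also covers unlabelled data.
rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = rng.normal(size=(10, 2))                      # toy classifier weights

x_lab = rng.normal(size=(8, 10))                  # small labelled subset
y_lab = rng.integers(0, 2, size=8)
x_unl = rng.normal(size=(64, 10))                 # large unlabelled remainder

# Supervised cross-entropy on the labelled part.
p_lab = softmax(x_lab @ W)
loss_sup = -np.log(p_lab[np.arange(len(y_lab)), y_lab]).mean()

# Consistency term on the unlabelled part: KL(p(x) || p(x + r)) for a small
# perturbation r. The real method picks the worst-case r by power iteration;
# a random unit vector is used here only to keep the sketch short.
eps = 1e-2
r = rng.normal(size=x_unl.shape)
r = eps * r / np.linalg.norm(r, axis=1, keepdims=True)
p = softmax(x_unl @ W)
p_adv = softmax((x_unl + r) @ W)
loss_lds = (p * (np.log(p) - np.log(p_adv))).sum(axis=1).mean()

print('supervised:', loss_sup, 'consistency:', loss_lds, 'total:', loss_sup + loss_lds)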
