adversarial_text's Issues

New dataset

Hi, thanks for your work. This is a question, not an issue, so feel free to close it if you want.

I have a dataset of unlabelled call transcriptions, and I want to train a classifier for them. I'm wondering whether I could use adversarial training for this, once part of the dataset has been labelled manually.

I look forward to your suggestions. Thanks again!

Pretrained model download returns 404 Not Found

I got a 404 Not Found when I tried to download the pretrained model from:
http://sato-motoki.com/research/vat/imdb_pretrained_lm.model

This is my output:

wget http://sato-motoki.com/research/vat/imdb_pretrained_lm.model
--2020-09-01 16:52:54--  http://sato-motoki.com/research/vat/imdb_pretrained_lm.model
Resolving sato-motoki.com (sato-motoki.com)... 185.199.109.153, 185.199.111.153, 185.199.108.153, ...
Connecting to sato-motoki.com (sato-motoki.com)|185.199.109.153|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-09-01 16:52:58 ERROR 404: Not Found.

ValueError: could not broadcast input array from shape (86935,256) into shape (87008,256)

Hi, when I try to run download.sh, I get the following error:

Prepare for IMDB
Prepare script is running...
Traceback (most recent call last):
  File "preprocess.py", line 79, in <module>
    prepare_imdb()
  File "preprocess.py", line 55, in prepare_imdb
    imdb_validation_pos_start_id)
  File "preprocess.py", line 24, in load_file
    words = read_text(filename.strip())
  File "preprocess.py", line 11, in read_text
    for line in f:
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 399: ordinal not in range(128)

Then I added encoding='utf-8' to every with open() call in preprocess.py.
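
A minimal sketch of that kind of change, for illustration only. The function name read_text_utf8 and the whitespace tokenization are assumptions, not the repository's actual read_text; the point is just that an explicit encoding stops Python from falling back to the locale default (ASCII in the traceback above).

```python
import io


def read_text_utf8(path, encoding="utf-8"):
    """Hypothetical helper: read a file line by line with an explicit
    encoding and return its whitespace-separated tokens."""
    words = []
    # Passing encoding explicitly avoids depending on the system locale,
    # which is what triggers the 'ascii' codec error on non-ASCII bytes.
    with io.open(path, encoding=encoding) as f:
        for line in f:
            words.extend(line.strip().split())
    return words
```

Note that decoding the text differently can change which tokens are produced, and therefore the size of the vocabulary built from them.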

After that, I get the following error:

Namespace(adaptive_softmax=1, add_labeld_to_unlabel=1, alpha=0.001, alpha_decay=0.9998, batchsize=32, batchsize_semi=96, clip=5.0, dataset='imdb', debug_mode=0, dropout=0.5, emb_dim=256, eval=0, freeze_word_emb=0, gpu=0, hidden_cls_dim=30, hidden_dim=1024, ignore_unk=1, load_trained_lstm='', lower=0, min_count=1, n_class=2, n_epoch=30, n_layers=1, nl_factor=1.0, norm_sentence_level=1, pretrained_model='imdb_pretrained_lm.model', random_seed=1234, save_name='imdb_model_vat', use_adv=0, use_exp_decay=1, use_rational=0, use_semi_data=1, use_unlabled_to_vocab=1, word_only=0, xi_var=5.0, xi_var_first=1.0)
train_set:71246
avg word number:242.8615501221121
vocab:87008
avg word number (train_x): 242.43914148545608
avg word number (dev_x):239.861747469366
avg word number (test_x):235.59372
lm_words_num:17297560
train_vocab_size: 66825
vocab_inv: 87008
Traceback (most recent call last):
  File "train.py", line 354, in <module>
    main()
  File "train.py", line 164, in main
    serializers.load_npz(args.pretrained_model, pretrain_model)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 190, in load_npz
    d.load(obj)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializer.py", line 83, in load
    obj.serialize(self)
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 997, in serialize
    d[name].serialize(serializer[name])
  File "/usr/local/lib/python3.6/dist-packages/chainer/link.py", line 651, in serialize
    data = serializer(name, param.data)
  File "/usr/local/lib/python3.6/dist-packages/chainer/serializers/npz.py", line 150, in __call__
    numpy.copyto(value, dataset)
ValueError: could not broadcast input array from shape (86935,256) into shape (87008,256)

My guess is that changing the decoding method caused some lines in the files to be dropped. Is that the cause?
Could you give me a workaround for this issue?
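
One way to narrow this down is to list the array shapes stored in the pretrained file and compare them with the vocabulary size printed above (vocab:87008). This is a rough sketch, assuming the .model file is a standard NumPy .npz archive (the traceback shows it being read with serializers.load_npz); the function name npz_shapes is hypothetical.

```python
import numpy as np


def npz_shapes(path):
    """Hypothetical helper: return {array_name: shape} for every array
    stored in an .npz archive, without loading it into a model."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}
```

For example, `npz_shapes('imdb_pretrained_lm.model')` should, if the traceback is accurate, contain an entry of shape (86935, 256), which would confirm that the pretrained embeddings were built from a smaller vocabulary than the one produced by the modified preprocessing.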
