Comments (10)
Yeah, I will finish the code after this week, because I am busy this week. :)
from wgan-gp.
Thanks @caogang. Will the change just involve swapping the discriminator's LinearBlock for a ResBlock? If that's all it takes, I could submit a simple PR.
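For reference, here is a minimal sketch of what such a swap might look like: a 1D residual block in PyTorch, roughly in the style of the WGAN-GP language-model discriminator. The names (`ResBlock`, `DIM`), the kernel size, and the 0.3 residual scaling are assumptions for illustration, not necessarily the repo's exact code.

```python
import torch
import torch.nn as nn

DIM = 512  # hidden width; illustrative value, not the repo's setting


class ResBlock(nn.Module):
    """1D residual block: two ReLU+Conv1d layers with a skip connection."""

    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # skip connection with a scaled residual branch;
        # padding=2 keeps the sequence length unchanged
        return x + 0.3 * self.block(x)
```

Note that `nn.Conv1d` expects a 3-D `(batch, channels, length)` input, whereas a fully connected LinearBlock works on 2-D `(batch, features)` tensors, so a naive swap also requires reshaping the discriminator's input.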
Yes, that is one urgent modification that needs to be done. It would be very nice if you could submit such a PR.
Hello @caogang, making the change that simply leads to an error (I think it was about a 3D tensor being expected where a 2D one was given). I'll need to understand the code a bit better before I'm able to make it work.
If I manage to get it fixed I'll submit a PR.
Hi @thvasilo, I have finished the gan_language code, but I have only tested it on CPU. I will test it on GPU as well and push the new results soon.
Thank you @caogang, I'll try to test this today and will close the issue if everything works.
Hello @caogang, I've run 200 iterations of the model and can verify that it doesn't crash on GPU; however, I'm not sure about the quality of the model. Using the default settings and token-level parsing (instead of character-level), all of the output is just the unk token. Do you have examples where you got the model to produce meaningful output? If you can recommend parameter settings, I can try them out.
Another unusual thing I've noticed is that the cost figures (train_gen, train_disc, wasserstein_distance) are only reported for the first few iterations (7 the last time I tried), and I couldn't come up with a good reason why this would happen. Upon reviewing the training log, the above happens because after iteration 7 the metrics go to nan.
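A simple way to make this failure mode visible instead of silently dropping log lines is to validate the metrics before plotting them. This is a generic sketch, not code from the repo; `check_finite` is a hypothetical helper.

```python
import math


def check_finite(metrics, iteration):
    """Raise as soon as any tracked cost becomes nan/inf, so the
    offending iteration is reported instead of logging just stopping."""
    for name, value in metrics.items():
        if not math.isfinite(value):
            raise RuntimeError(
                f"{name} became {value} at iteration {iteration}"
            )


# example with the per-iteration costs being tracked in the thread
check_finite(
    {"train_disc_cost": -66.5, "wasserstein_distance": 115.9}, 7
)
```

Calling this right after each discriminator/generator update pinpoints the exact iteration where the costs diverge.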
The nan problem may be because of the PyTorch version (see #12). I am running gan_language.py now, but it is very slow :(.
The command output is this:
iter 1599 tmp/lang/js4 0.649939207431 tmp/lang/js1 0.0690228257595 tmp/lang/js2 0.181709832974 tmp/lang/wasserstein distance 115.948425293 tmp/lang/train disc cost -66.5119781494 tmp/lang/train gen cost -18.3391399384 tmp/lang/time 9.30638914585 tmp/lang/js3 0.390863805779
iter 1699 tmp/lang/js4 0.654350427735 tmp/lang/js1 0.082869597317 tmp/lang/js2 0.207291999431 tmp/lang/wasserstein distance 119.094825745 tmp/lang/train disc cost -67.5338134766 tmp/lang/train gen cost -0.702518522739 tmp/lang/time 9.30622985125 tmp/lang/js3 0.407354697129
iter 1799 tmp/lang/js4 0.63200969616 tmp/lang/js1 0.062857846673 tmp/lang/js2 0.165385939136 tmp/lang/wasserstein distance 122.879089355 tmp/lang/train disc cost -69.4046707153 tmp/lang/train gen cost 6.02346229553 tmp/lang/time 9.30193270206 tmp/lang/js3 0.367523874878
iter 1899 tmp/lang/js4 0.618222129172 tmp/lang/js1 0.0649802950323 tmp/lang/js2 0.172988959998 tmp/lang/wasserstein distance 124.128738403 tmp/lang/train disc cost -70.9303894043 tmp/lang/train gen cost 4.98324775696 tmp/lang/time 9.30568416595 tmp/lang/js3 0.366494706406
iter 1999 tmp/lang/js4 0.591232030789 tmp/lang/js1 0.056358720747 tmp/lang/js2 0.14490814063 tmp/lang/wasserstein distance 125.916267395 tmp/lang/train disc cost -71.5480270386 tmp/lang/train gen cost 7.57119989395 tmp/lang/time 9.28405963659 tmp/lang/js3 0.33542050967
iter 2099 tmp/lang/js4 0.566582456808 tmp/lang/js1 0.0537158501776 tmp/lang/js2 0.142843420001 tmp/lang/wasserstein distance 129.192977905 tmp/lang/train disc cost -73.951171875 tmp/lang/train gen cost 11.3406629562 tmp/lang/time 9.32169566154 tmp/lang/js3 0.315354519341
iter 2199 tmp/lang/js4 0.583774236572 tmp/lang/js1 0.0563736255341 tmp/lang/js2 0.14336678617 tmp/lang/wasserstein distance 130.59588623 tmp/lang/train disc cost -73.9788131714 tmp/lang/train gen cost 14.0651540756 tmp/lang/time 9.33803822517 tmp/lang/js3 0.325970952853
iter 2299 tmp/lang/js4 0.560677620088 tmp/lang/js1 0.0559929911158 tmp/lang/js2 0.139462115734 tmp/lang/wasserstein distance 132.936828613 tmp/lang/train disc cost -75.1215515137 tmp/lang/train gen cost 15.6881551743 tmp/lang/time 9.35610331297 tmp/lang/js3 0.309859271646
iter 2399 tmp/lang/js4 0.600723672393 tmp/lang/js1 0.0626923963729 tmp/lang/js2 0.167899202686 tmp/lang/wasserstein distance 133.518753052 tmp/lang/train disc cost -75.1536636353 tmp/lang/train gen cost 17.0819015503 tmp/lang/time 9.29543369532 tmp/lang/js3 0.351785832441
iter 2499 tmp/lang/js4 0.559481759817 tmp/lang/js1 0.0532214248546 tmp/lang/js2 0.133561413303 tmp/lang/wasserstein distance 136.571792603 tmp/lang/train disc cost -77.7102508545 tmp/lang/train gen cost 17.6871318817 tmp/lang/time 11.9401491189 tmp/lang/js3 0.305097104994
iter 2599 tmp/lang/js4 0.556855971949 tmp/lang/js1 0.0528149367548 tmp/lang/js2 0.132663441013 tmp/lang/wasserstein distance 137.70753479 tmp/lang/train disc cost -77.2345962524 tmp/lang/train gen cost 20.4136238098 tmp/lang/time 12.6488033676 tmp/lang/js3 0.305337972041
iter 2699 tmp/lang/js4 0.545001725004 tmp/lang/js1 0.0511753393725 tmp/lang/js2 0.128217736777 tmp/lang/wasserstein distance 139.771087646 tmp/lang/train disc cost -78.144317627 tmp/lang/train gen cost 19.9227981567 tmp/lang/time 11.9526500607 tmp/lang/js3 0.293619363332
The sample output after iteration 7099 is:
Tchin oat and , dreave ain atbon
Ampors ives rlad bats anilg the
" Itthis pest tore budt lical by
In have Carmees manfseon frem Su
Bose in actation Peen garger the
Thenaan har ucic awalh chinds ,
Hhw sain reauld thandid lagery w
Butt the Coict offage , fhom for
Soch on ease whost timertewed ,n
Hut , arged fort rapreauad will
Thire ofterding on ouksand came
The revensrand st porgd coucerv
But of coupe thoed incent ahe co
In thken rhan the came on parts
Thie is runcomledts ut har thata
He would Cpllation onday c hista
Tupartovesai cyrain Calledsacan
" Becumed che hendts wastrefpors
Pellows on ther phiadint on comm
Chings to vicast inrecpliits oft
Hy Moded thear oues icall comple
St he cavs the confovedts not go
Thilelace is found con inden ind
"rum ofged fangy is mave nor to
Horger sre gect and , the alse t
ome comn leln in atoll aroose or
The said fortrwith came stowacie
Shmes aver that ouvicy puitts la
The kut gest ill but bracived in
Hougd , New Moved cared that ofd
H Indangut gingi gan iceits P ow
Itsin sappation pirsealuasion th
Hey Mading onds eich is parment
It tenders cat son in his has in
In the Mp snd hery laced is she
Phvice Cuppeet goass , cormovt a
The condirent for charddents wab
Hough is coungedsty gan bies , t
A Cowraloss ion in the Funlivts
Some mont collint in the mas itt
Pupeclick , drnal impordts cat a
Nost vitergs is is sheworpaced `
And ofpendation ,ighe sad apocie
Hmm, I'll try pulling the latest master and running again then. I assume the code you are running uses the parameters as defined in gan_language.py without any changes? I agree that training is very slow; on the GPU I'm using it's about 5x slower than yours. I'll take a look at Fisher-GAN as an alternative in the meantime :/
I changed MAX_N_EXAMPLES = 10000000 to perform a full training run. All other parameters are the same as in gan_language.py on the master branch. When training on the GPU, it uses about 8981 MiB of memory.