Comments (2)
Hi,
I've seen cost values similar to yours in my experiments, although I don't recall running into NaNs.
Since I don't know your experiment details (e.g. the dataset, the application, etc.), I can't really say for sure why you're running into NaNs.
But I think simple gradient clipping would suffice in this situation.
I can't guarantee when, but I will add an option to turn on gradient clipping in the future.
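For anyone reading along: gradient clipping here just means bounding the gradients before the parameter update so one bad batch can't blow up the weights. A framework-agnostic NumPy sketch of global-norm clipping (the function name and threshold are illustrative, not part of med2vec):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm
    does not exceed max_norm; leave them unchanged otherwise."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]  # global norm = 13
clipped = clip_by_global_norm(grads, 1.0)              # rescaled to norm 1
```

In a Theano model like this one, the same rescaling would be applied to the outputs of T.grad before they are fed into the update rule.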
from med2vec.
That sounds fair to me; I appreciate the help and understand that you have other stuff going on!
For all practical purposes, my dataset is similar to the one from the paper: it is a list of ICD, CPT, and NDC codes from each person's visits to the doctor, with the transformations from the provided README applied (lists of indexed ints, with different patients separated by [-1]).
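To make that format concrete, here is a tiny sketch of what such an input list might look like (the code indices are made up; only the structure, including the [-1] patient separator from the README, is the point):

```python
import pickle

# Two hypothetical patients; each inner list holds one visit's code
# indices, and [-1] marks the boundary between patients.
seqs = [
    [12, 507, 3],      # patient 1, visit 1
    [44, 12],          # patient 1, visit 2
    [-1],              # separator between patients
    [8, 950, 21, 4],   # patient 2, visit 1
]

# med2vec reads this list from a pickle file (e.g. visit.pkl);
# serializing it round-trips losslessly.
blob = pickle.dumps(seqs)
```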
I have tried implementing gradient clipping by applying grad_clip to total_cost in the build_model method, but even with thresholds of -0.5 and 0.5 I am still eventually getting NaNs (probably because that is not the right way to do it).
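For what it's worth, clipping is usually applied to the gradients themselves, just before the parameter update, rather than to the scalar cost (bounding the cost value does nothing to the gradient magnitudes). A minimal NumPy sketch of one SGD step with element-wise gradient clipping (function and parameter names are illustrative, not med2vec's API):

```python
import numpy as np

def sgd_step_clipped(param, grad, lr=0.01, clip=0.5):
    """One SGD update with element-wise gradient clipping.
    Clipping the gradient bounds the update size; clipping the
    scalar cost would not."""
    grad = np.clip(grad, -clip, clip)
    return param - lr * grad

p = np.array([1.0, 1.0])
g = np.array([100.0, -0.1])   # one exploding gradient component
p_new = sgd_step_clipped(p, g)  # the 100.0 is bounded to 0.5 first
```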
Here is the lengthy output of NanGuardMode if it helps:
Med2Vec$ CUDA_VISIBLE_DEVICES=0 THEANO_FLAGS=mode=NanGuardMode python med2vec.py /data/trosenfl/visit.pkl 32228 output.pkl --batch_size 10 --cr_size 500 --vr_size 1000 --window_size 3 --verbose --n_epoch 20
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
Using gpu device 0: Tesla P100-SXM2-16GB (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5110)
initializing parameters
building models
loading data
object
training start
epoch:0, iteration:0/1475771, cost:374.362030
epoch:0, iteration:10/1475771, cost:292.984375
epoch:0, iteration:20/1475771, cost:333.820068
epoch:0, iteration:30/1475771, cost:496.689789
Traceback (most recent call last):
File "med2vec.py", line 321, in <module>
train_med2vec(seqFile=args.seq_file, demoFile=args.demo_file, labelFile=args.label_file, outFile=args.out_file, numXcodes=args.n_input_codes, numYcodes=args.n_output_codes, embDimSize=args.cr_size, hiddenDimSize=args.vr_size, batchSize=args.batch_size, maxEpochs=args.n_epoch, L2_reg=args.L2_reg, demoSize=args.demo_size, windowSize=args.window_size, logEps=args.log_eps, verbose=args.verbose)
File "med2vec.py", line 289, in train_med2vec
cost = f_grad_shared(x, mask, iVector, jVector)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 884, in __call__
self.fn() if output_subset is None else\
File "/usr/local/lib/python2.7/dist-packages/theano/gof/vm.py", line 513, in __call__
storage_map=storage_map)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/vm.py", line 482, in __call__
_, dt = self.run_thunk_of_node(current_apply)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/vm.py", line 402, in run_thunk_of_node
compute_map=self.compute_map,
File "/usr/local/lib/python2.7/dist-packages/theano/compile/nanguardmode.py", line 344, in nan_check
do_check_on(storage_map[var][0], node)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/nanguardmode.py", line 332, in do_check_on
raise AssertionError(msg)
AssertionError: NaN detected
NanGuardMode found an error in the output of a node in this variable:
GpuElemwise{Composite{(((((((-i0) + (-i1)) / i2) + (((-i3) + (-i4)) / i5)) + (((-i6) + (-i7)) / i8)) + ((-i9) / i10)) + (i11 * i12))}}[(0, 0)] [id A] ''
|GpuCAReduce{add}{1,1} [id B] ''
| |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id C] ''
| |GpuSubtensor{int64::} [id D] ''
| | |GpuFromHost [id E] ''
| | | |x [id F]
| | |Constant{1} [id G]
| |GpuElemwise{add,no_inplace} [id H] ''
| | |CudaNdarrayConstant{[[ 9.99999994e-09]]} [id I]
| | |GpuElemwise{mul,no_inplace} [id J] ''
| | |GpuSubtensor{:int64:} [id K] ''
| | | |GpuSoftmaxWithBias [id L] ''
| | | | |GpuDot22 [id M] ''
| | | | | |GpuElemwise{maximum,no_inplace} [id N] ''
| | | | | | |GpuElemwise{Add}[(0, 0)] [id O] ''
| | | | | | | |GpuDot22 [id P] ''
| | | | | | | | |GpuElemwise{maximum,no_inplace} [id Q] ''
| | | | | | | | | |GpuElemwise{Add}[(0, 0)] [id R] ''
| | | | | | | | | | |GpuDot22 [id S] ''
| | | | | | | | | | | |GpuFromHost [id E] ''
| | | | | | | | | | | |W_emb [id T]
| | | | | | | | | | |GpuDimShuffle{x,0} [id U] ''
| | | | | | | | | | |b_emb [id V]
| | | | | | | | | |CudaNdarrayConstant{[[ 0.]]} [id W]
| | | | | | | | |W_hidden [id X]
| | | | | | | |GpuDimShuffle{x,0} [id Y] ''
| | | | | | | |b_hidden [id Z]
| | | | | | |CudaNdarrayConstant{[[ 0.]]} [id W]
| | | | | |W_output [id BA]
| | | | |b_output [id BB]
| | | |Constant{-1} [id BC]
| | |GpuElemwise{mul,no_inplace} [id BD] ''
| | |GpuDimShuffle{0,x} [id BE] ''
| | | |GpuSubtensor{:int64:} [id BF] ''
| | | |GpuFromHost [id BG] ''
| | | | |mask [id BH]
| | | |Constant{-1} [id BC]
| | |GpuDimShuffle{0,x} [id BI] ''
| | |GpuSubtensor{int64::} [id BJ] ''
| | |GpuFromHost [id BG] ''
| | |Constant{1} [id G]
| |GpuElemwise{sub,no_inplace} [id BK] ''
| | |CudaNdarrayConstant{[[ 1.]]} [id BL]
| | |GpuSubtensor{int64::} [id D] ''
| |GpuElemwise{mul,no_inplace} [id J] ''
|GpuCAReduce{add}{1,1} [id BM] ''
| |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id BN] ''
| |GpuSubtensor{:int64:} [id BO] ''
| | |GpuFromHost [id E] ''
| | |Constant{-1} [id BC]
| |GpuElemwise{add,no_inplace} [id BP] ''
| | |CudaNdarrayConstant{[[ 9.99999994e-09]]} [id I]
| | |GpuElemwise{mul,no_inplace} [id BQ] ''
| | |GpuSubtensor{int64::} [id BR] ''
| | | |GpuSoftmaxWithBias [id L] ''
| | | |Constant{1} [id G]
| | |GpuElemwise{mul,no_inplace} [id BD] ''
| |GpuElemwise{sub,no_inplace} [id BS] ''
| | |CudaNdarrayConstant{[[ 1.]]} [id BL]
| | |GpuSubtensor{:int64:} [id BO] ''
| |GpuElemwise{mul,no_inplace} [id BQ] ''
|GpuElemwise{Add}[(0, 1)] [id BT] ''
| |CudaNdarrayConstant{9.99999993923e-09} [id BU]
| |GpuCAReduce{add}{1,1} [id BV] ''
| |GpuElemwise{mul,no_inplace} [id BD] ''
|GpuCAReduce{add}{1,1} [id BW] ''
| |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id BX] ''
| |GpuSubtensor{int64::} [id BY] ''
| | |GpuFromHost [id E] ''
| | |Constant{2} [id BZ]
| |GpuElemwise{add,no_inplace} [id CA] ''
| | |CudaNdarrayConstant{[[ 9.99999994e-09]]} [id I]
| | |GpuElemwise{mul,no_inplace} [id CB] ''
| | |GpuSubtensor{:int64:} [id CC] ''
| | | |GpuSoftmaxWithBias [id L] ''
| | | |Constant{-2} [id CD]
| | |GpuElemwise{Composite{((i0 * i1) * i2)},no_inplace} [id CE] ''
| | |GpuDimShuffle{0,x} [id CF] ''
| | | |GpuSubtensor{:int64:} [id CG] ''
| | | |GpuFromHost [id BG] ''
| | | |Constant{-2} [id CD]
| | |GpuDimShuffle{0,x} [id CH] ''
| | | |GpuSubtensor{int64:int64:} [id CI] ''
| | | |GpuFromHost [id BG] ''
| | | |Constant{1} [id G]
| | | |Constant{-1} [id BC]
| | |GpuDimShuffle{0,x} [id CJ] ''
| | |GpuSubtensor{int64::} [id CK] ''
| | |GpuFromHost [id BG] ''
| | |Constant{2} [id BZ]
| |GpuElemwise{sub,no_inplace} [id CL] ''
| | |GpuSubtensor{int64::} [id BY] ''
| |GpuElemwise{mul,no_inplace} [id CB] ''
|GpuCAReduce{add}{1,1} [id CM] ''
| |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id CN] ''
| |GpuSubtensor{:int64:} [id CO] ''
| | |GpuFromHost [id E] ''
| | |Constant{-2} [id CD]
| |GpuElemwise{add,no_inplace} [id CP] ''
| | |CudaNdarrayConstant{[[ 9.99999994e-09]]} [id I]
| | |GpuElemwise{mul,no_inplace} [id CQ] ''
| | |GpuSubtensor{int64::} [id CR] ''
| | | |GpuSoftmaxWithBias [id L] ''
| | | |Constant{2} [id BZ]
| | |GpuElemwise{Composite{((i0 * i1) * i2)},no_inplace} [id CE] ''
| |GpuElemwise{sub,no_inplace} [id CS] ''
| | |CudaNdarrayConstant{[[ 1.]]} [id BL]
| | |GpuSubtensor{:int64:} [id CO] ''
| |GpuElemwise{mul,no_inplace} [id CQ] ''
|GpuElemwise{Add}[(0, 1)] [id CT] ''
| |CudaNdarrayConstant{9.99999993923e-09} [id BU]
| |GpuCAReduce{add}{1,1} [id CU] ''
| |GpuElemwise{Composite{((i0 * i1) * i2)},no_inplace} [id CE] ''
|GpuCAReduce{add}{1,1} [id CV] ''
| |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id CW] ''
| |GpuSubtensor{int64::} [id CX] ''
| | |GpuFromHost [id E] ''
| | |Constant{3} [id CY]
| |GpuElemwise{add,no_inplace} [id CZ] ''
| | |CudaNdarrayConstant{[[ 9.99999994e-09]]} [id I]
| | |GpuElemwise{mul,no_inplace} [id DA] ''
| | |GpuSubtensor{:int64:} [id DB] ''
| | | |GpuSoftmaxWithBias [id L] ''
| | | |Constant{-3} [id DC]
| | |GpuElemwise{Composite{(((i0 * i1) * i2) * i3)},no_inplace} [id DD] ''
| | |GpuDimShuffle{0,x} [id DE] ''
| | | |GpuSubtensor{:int64:} [id DF] ''
| | | |GpuFromHost [id BG] ''
| | | |Constant{-3} [id DC]
| | |GpuDimShuffle{0,x} [id DG] ''
| | | |GpuSubtensor{int64:int64:} [id DH] ''
| | | |GpuFromHost [id BG] ''
| | | |Constant{1} [id G]
| | | |Constant{-2} [id CD]
| | |GpuDimShuffle{0,x} [id DI] ''
| | | |GpuSubtensor{int64:int64:} [id DJ] ''
| | | |GpuFromHost [id BG] ''
| | | |Constant{2} [id BZ]
| | | |Constant{-1} [id BC]
| | |GpuDimShuffle{0,x} [id DK] ''
| | |GpuSubtensor{int64::} [id DL] ''
| | |GpuFromHost [id BG] ''
| | |Constant{3} [id CY]
| |GpuElemwise{sub,no_inplace} [id DM] ''
| | |CudaNdarrayConstant{[[ 1.]]} [id BL]
| | |GpuSubtensor{int64::} [id CX] ''
| |GpuElemwise{mul,no_inplace} [id DA] ''
|GpuCAReduce{add}{1,1} [id DN] ''
| |GpuElemwise{Composite{((i0 * log(i1)) + (i2 * log1p((-i3))))},no_inplace} [id DO] ''
| |GpuSubtensor{:int64:} [id DP] ''
| | |GpuFromHost [id E] ''
| | |Constant{-3} [id DC]
| |GpuElemwise{add,no_inplace} [id DQ] ''
| | |CudaNdarrayConstant{[[ 9.99999994e-09]]} [id I]
| | |GpuElemwise{mul,no_inplace} [id DR] ''
| | |GpuSubtensor{int64::} [id DS] ''
| | | |GpuSoftmaxWithBias [id L] ''
| | | |Constant{3} [id CY]
| | |GpuElemwise{Composite{(((i0 * i1) * i2) * i3)},no_inplace} [id DD] ''
| |GpuElemwise{sub,no_inplace} [id DT] ''
| | |CudaNdarrayConstant{[[ 1.]]} [id BL]
| | |GpuSubtensor{:int64:} [id DP] ''
| |GpuElemwise{mul,no_inplace} [id DR] ''
|GpuElemwise{Add}[(0, 1)] [id DU] ''
| |CudaNdarrayConstant{9.99999993923e-09} [id BU]
| |GpuCAReduce{add}{1,1} [id DV] ''
| |GpuElemwise{Composite{(((i0 * i1) * i2) * i3)},no_inplace} [id DD] ''
|GpuCAReduce{add}{1} [id DW] ''
| |GpuElemwise{log,no_inplace} [id DX] ''
| |GpuElemwise{Composite{(i0 + (i1 / i2))},no_inplace} [id DY] ''
| |CudaNdarrayConstant{[ 9.99999994e-09]} [id DZ]
| |GpuElemwise{Exp}[(0, 0)] [id EA] ''
| | |GpuCAReduce{add}{0,1} [id EB] ''
| | |GpuElemwise{mul,no_inplace} [id EC] ''
| | |GpuAdvancedSubtensor1 [id ED] ''
| | | |GpuElemwise{maximum,no_inplace} [id EE] ''
| | | | |W_emb [id T]
| | | | |CudaNdarrayConstant{[[ 0.]]} [id W]
| | | |Elemwise{Cast{int64}} [id EF] ''
| | | |iVector [id EG]
| | |GpuAdvancedSubtensor1 [id EH] ''
| | |GpuElemwise{maximum,no_inplace} [id EE] ''
| | |Elemwise{Cast{int64}} [id EI] ''
| | |jVector [id EJ]
| |GpuAdvancedSubtensor1 [id EK] ''
| |GpuCAReduce{add}{0,1} [id EL] ''
| | |GpuElemwise{Exp}[(0, 0)] [id EM] ''
| | |GpuDot22 [id EN] ''
| | |GpuElemwise{maximum,no_inplace} [id EE] ''
| | |GpuDimShuffle{1,0} [id EO] ''
| | |GpuElemwise{maximum,no_inplace} [id EE] ''
| |Elemwise{Cast{int64}} [id EF] ''
|GpuFromHost [id EP] ''
| |Elemwise{Cast{float32}} [id EQ] ''
| |Shape_i{0} [id ER] ''
| |iVector [id EG]
|CudaNdarrayConstant{0.0010000000475} [id ES]
|GpuCAReduce{pre=sqr,red=add}{1,1} [id ET] ''
|W_emb [id T]
Apply node that caused the error: GpuElemwise{Composite{(((((((-i0) + (-i1)) / i2) + (((-i3) + (-i4)) / i5)) + (((-i6) + (-i7)) / i8)) + ((-i9) / i10)) + (i11 * i12))}}[(0, 0)](GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0, GpuElemwise{Add}[(0, 1)].0, GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0, GpuElemwise{Add}[(0, 1)].0, GpuCAReduce{add}{1,1}.0, GpuCAReduce{add}{1,1}.0, GpuElemwise{Add}[(0, 1)].0, GpuCAReduce{add}{1}.0, GpuFromHost.0, CudaNdarrayConstant{0.0010000000475}, GpuCAReduce{pre=sqr,red=add}{1,1}.0)
Toposort index: 138
Inputs types: [CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar), CudaNdarrayType(float32, scalar)]
Inputs shapes: [(), (), (), (), (), (), (), (), (), (), (), (), ()]
Inputs strides: [(), (), (), (), (), (), (), (), (), (), (), (), ()]
Inputs values: [CudaNdarray(nan), CudaNdarray(-75.6651153564), CudaNdarray(9.0), CudaNdarray(-68.2716598511), CudaNdarray(-68.2715835571), CudaNdarray(8.0), CudaNdarray(-60.6214866638), CudaNdarray(-60.6214637756), CudaNdarray(7.0), CudaNdarray(0.0), CudaNdarray(0.0), CudaNdarray(0.0010000000475), CudaNdarray(462.694641113)]
Outputs clients: [[HostFromGpu(GpuElemwise{Composite{(((((((-i0) + (-i1)) / i2) + (((-i3) + (-i4)) / i5)) + (((-i6) + (-i7)) / i8)) + ((-i9) / i10)) + (i11 * i12))}}[(0, 0)].0)]]
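One speculative reading of the dump above: the cost terms have the form i0 * log(i1) + i2 * log1p(-i3), where i1 and i3 are eps + (masked softmax output) with eps = 1e-8. In float32, a softmax probability saturated at 1.0 makes log1p(-(eps + p)) hit log(0) = -inf (eps is below float32 resolution at 1.0), and a subsequent 0 * -inf produces exactly the NaN that NanGuardMode flags. A small NumPy illustration of that failure mode, under that assumption:

```python
import numpy as np

eps = np.float32(1e-8)
p = np.float32(1.0)          # softmax output saturated at 1.0 in float32
q = eps + p                  # still 1.0: eps is lost to float32 rounding
with np.errstate(divide='ignore', invalid='ignore'):
    term = np.log1p(-q)              # log(1 - 1) = -inf
    cost = np.float32(0.0) * term    # 0 * -inf -> nan
```

If this is indeed the source, clamping the softmax output away from 0 and 1 (or clipping gradients, as suggested above) would avoid it.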