Hello Ed, in Med2Vec, after creating the model file, you have created…

Scatter plot from learned code representations about med2vec HOT 16 OPEN

mp2893 commented on September 26, 2024

Scatter plot from learned code representations

Comments (16)

mp2893 commented on September 26, 2024

I showed the scatterplot of ICD9 diagnosis codes, which can be grouped by the ICD9 taxonomy (http://www.icd9data.com/2015/Volume1/default.htm).
For other codes (medication, procedure codes), you just need to find the right grouper.
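One way to build such a grouper for ICD-9 diagnosis codes is sketched below. It assumes raw, zero-padded ICD-9 strings such as "401.9" or "4019" (no prefixes) and uses the ICD-9-CM Volume 1 chapter ranges. The helpers `icd9_chapter` and `project_2d` are hypothetical names. PCA is a swapped-in 2-D projection chosen only to avoid extra dependencies, and it is not necessarily what was used for the original figure.

```python
import numpy as np

# ICD-9-CM chapter boundaries for numeric codes: (last 3-digit category, label).
_ICD9_CHAPTERS = [
    (139, "Infectious and parasitic diseases"),
    (239, "Neoplasms"),
    (279, "Endocrine, nutritional, metabolic, immunity"),
    (289, "Blood and blood-forming organs"),
    (319, "Mental disorders"),
    (389, "Nervous system and sense organs"),
    (459, "Circulatory system"),
    (519, "Respiratory system"),
    (579, "Digestive system"),
    (629, "Genitourinary system"),
    (679, "Pregnancy, childbirth, puerperium"),
    (709, "Skin and subcutaneous tissue"),
    (739, "Musculoskeletal system"),
    (759, "Congenital anomalies"),
    (779, "Perinatal conditions"),
    (799, "Symptoms, signs, ill-defined conditions"),
    (999, "Injury and poisoning"),
]


def icd9_chapter(code: str) -> str:
    """Map an ICD-9 diagnosis code such as '401.9' or '4019' to its chapter label."""
    code = code.strip().upper()
    if code.startswith("V"):
        return "Supplementary (V codes)"
    if code.startswith("E"):
        return "External causes (E codes)"
    # Assumes zero-padded codes: '0389' -> 38, '4019' -> 401.
    category = int(code.replace(".", "")[:3])
    for upper, label in _ICD9_CHAPTERS:
        if category <= upper:
            return label
    raise ValueError(f"not an ICD-9 diagnosis code: {code!r}")


def project_2d(vectors: np.ndarray) -> np.ndarray:
    """Project code vectors (one row per code) to 2-D with PCA via SVD."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # coordinates along the top two principal axes
```

To draw the plot, pass the two columns returned by `project_2d` to a scatter plot (for example matplotlib's `plt.scatter`) and color each point by the `icd9_chapter` of its code. t-SNE is a common alternative projection if you prefer it.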

sathickibrahims18 commented on September 26, 2024

Thanks Ed!

sathickibrahims18 commented on September 26, 2024

Hello Ed,

My problem statement is to predict similar diagnosis codes using med2vec.

For example, suppose I have 140 medical codes, an embedding dimension of 200, and a hidden dimension of 2000.

I ran your code, got the .npz file, and found six numpy arrays inside it: W_emb, b_output, b_hidden, b_emb, W_output, and W_hidden.

Which one should be used to find similar codes, W_emb or W_output?

With my settings, W_emb is 140×200 and W_output is 2000×140. I'm asking because in issue https://github.com/mp2893/med2vec/issues/16#issue-403688598 you mentioned that W_output is used to predict neighboring visits.

Also, how do we know which diagnosis code each generated embedding corresponds to? Where does the mapping between embeddings and medical codes happen?

Please help me clear up these doubts.
Thank you

mp2893 commented on September 26, 2024

If you want to find similar diagnosis codes, you should use W_emb and b_emb.
Each row of W_emb corresponds to a specific medical code (e.g., a diagnosis code, a medication code, etc.).
Given diagnosis code A, its vector representation is relu(W_emb[row that corresponds to A] + b_emb). Compute the cosine similarity between A's vector representation and the vector representations of all other medical codes; the codes with the highest cosine similarity are the ones most similar to A.
Hope this helps.
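The recipe above can be sketched in NumPy as follows. It assumes the .npz parameter file described earlier in this thread, with arrays named W_emb and b_emb, and that row i of W_emb belongs to the code your preprocessing mapped to integer i. The names `code_vectors`, `most_similar_codes`, and `load_code_vectors` are hypothetical and used only for illustration.

```python
import numpy as np


def code_vectors(W_emb: np.ndarray, b_emb: np.ndarray) -> np.ndarray:
    """Row i is the representation of code i: relu(W_emb[i] + b_emb)."""
    return np.maximum(W_emb + b_emb, 0.0)  # b_emb broadcasts across all rows


def most_similar_codes(vectors: np.ndarray, query: int, top_k: int = 5):
    """Return (code index, cosine similarity) pairs closest to code `query`."""
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # all-zero ReLU rows would otherwise divide by zero
    unit = vectors / norms[:, None]
    sims = unit @ unit[query]  # cosine similarity of every code with the query
    sims[query] = -np.inf  # exclude the query code itself
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]


def load_code_vectors(npz_path: str) -> np.ndarray:
    """Load W_emb and b_emb from a med2vec .npz parameter file (assumed layout)."""
    params = np.load(npz_path)
    return code_vectors(params["W_emb"], params["b_emb"])
```

For example, `most_similar_codes(load_code_vectors("model.npz"), query=42)` would list the five codes whose vectors are closest to code 42; the file name and index are placeholders. Translating the returned indices back to code strings requires the code-to-integer mapping from your own preprocessing step.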

sathickibrahims18 commented on September 26, 2024

Thanks Ed,

When I run the Theano code on CPU, it works fine but is very slow (16 hours per epoch).

When I run it on GPU, it throws a segmentation fault at this line: cost = f_grad_shared(x, batchD, y, mask, iVector, jVector)

Could you please help me solve this issue?

mp2893 commented on September 26, 2024

Unfortunately, that error seems to be caused by system-related issues rather than by the algorithm itself (unless it's a NaN error).
There is not much I can do unless we sit together side by side.
I'd suggest checking whether you are using compatible versions of CUDA, the Nvidia driver, and Theano. (I know these issues are difficult, but I can't think of any other reason why the code would run fine on CPU but not on GPU.)

sathickibrahims18 commented on September 26, 2024

Could you please tell me which versions of CUDA, the Nvidia driver, and Theano you used?

mp2893 commented on September 26, 2024

It says in the README that I used Theano 0.7.
As for CUDA, IIRC, I used either 6.0 or 7.0.
As for the Nvidia driver, I really have no idea.
If you understand the med2vec code, I suggest you just implement a TensorFlow version, since med2vec is not that complicated. Most of the work goes into preprocessing the data, and the neural net itself is quite straightforward.
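As a starting point for such a port, here is a minimal NumPy sketch of the visit-level forward pass. It assumes the parameter names from the .npz file discussed above and the layer order described in the med2vec paper: a code-level ReLU layer, a visit-level ReLU layer, and a softmax output. It leaves out demographic inputs and both training objectives, and the function names are hypothetical. With the settings mentioned earlier (140 codes, embedding size 200, hidden size 2000), x would be batch×140, W_emb 140×200, W_hidden 200×2000 (assuming no demographics), and W_output 2000×140.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def visit_forward(x: np.ndarray, params: dict) -> np.ndarray:
    """Forward pass for multi-hot visit vectors x (batch x num_codes).

    Demographics are omitted and no training loss is computed.
    """
    code_level = relu(x @ params["W_emb"] + params["b_emb"])  # batch x emb_dim
    visit_level = relu(code_level @ params["W_hidden"] + params["b_hidden"])  # batch x hidden_dim
    return softmax(visit_level @ params["W_output"] + params["b_output"])  # batch x num_codes
```

Each matrix product here maps directly onto `tf.matmul` in a TensorFlow version; the training losses would then be built on top of these outputs.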

sathickibrahims18 commented on September 26, 2024

Thanks Ed,

But I have also tried a TensorFlow version, and it also takes 16 hours per epoch because of the huge volume of data.

Could you please help me reduce the time per epoch?

mp2893 commented on September 26, 2024

That's weird. How can the job take the same amount of time (16 hours) on both CPU and GPU?

sathickibrahims18 commented on September 26, 2024

Yes Ed, I have run into some weird issues.

cost = f_grad_shared(x, batchD, mask, iVector, jVector)

This particular line is what takes so long, Ed. After removing it, the model works fine.
So could you please help me understand whether the above line is really needed by the model?
Without that line, the model works well.

mp2893 commented on September 26, 2024

If you are using demographic information, and are using grouped codes for the softmax output label, then you are going to need that line. Otherwise, the model won't be trained at all.
Please check whether you are using the correct set of option arguments.

sathickibrahims18 commented on September 26, 2024

OK Ed, but I didn't use demographic information, and I didn't use grouped codes either.

cost = f_grad_shared(x, batchD, mask, iVector, jVector)

I have removed this line and generated the embeddings.

The generated embeddings are quite good; I validated them using cosine similarity.
For example, if we search for Type 2 diabetes, it returns chronic kidney disease as a comorbidity.

Could you please confirm whether this approach is correct or not?

mp2893 commented on September 26, 2024

My mistake. The line you deleted is used only when you are using demographic information, but not grouped codes (see line 274 of the source code).
It's still weird, because even though you are not using demographic info, deleting that line somehow impacts the experiment.
I don't have a good answer, but I guess it's fine as long as your experiment runs without any issues.

victorconan commented on September 26, 2024

Regarding your earlier answer that the vector representation of diagnosis code A is relu(W_emb[row that corresponds to A] + b_emb): is the diagnosis code representation ReLU(W_emb + b_emb) or ReLU(W_emb)? The paper says:

The code representations to be learned is denoted as a matrix Wc' = ReLU(Wc)

mp2893 commented on September 26, 2024

Technically, it should be ReLU(W_emb[row that corresponds to A] + b_emb).
I guess in the paper I omitted the bias term.
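Written out with the bias term, using the implementation's convention where W_emb has one row per medical code (|C| rows and m columns, e.g. 140 × 200 above), the representation of code A and the cosine similarity used to compare two codes are:

```latex
\mathbf{u}_A = \mathrm{ReLU}\big(\mathbf{W}_{\text{emb}}[A,:] + \mathbf{b}_{\text{emb}}\big) \in \mathbb{R}^{m},
\qquad
\operatorname{sim}(A,B) = \frac{\mathbf{u}_A^{\top}\mathbf{u}_B}{\lVert\mathbf{u}_A\rVert\,\lVert\mathbf{u}_B\rVert}
```

Setting b_emb to zero recovers the paper's W_c' = ReLU(W_c). The paper's W_c multiplies the visit vector x_t, so it has one column per code and appears transposed relative to W_emb.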
