
Comments (5)

mp2893 commented on June 17, 2024

To be fair, Med2Vec is a co-occurrence-based algorithm, so it will perform well in applications where co-occurrence information between codes plays an important role. But Med2Vec probably won't help you find a novel cure for cancer. For fraud detection, I think it will be helpful, since fraud detection can be seen as anomaly detection.

As for your questions:

  1. Your question is valid. I actually tried both softmax and sigmoid in other papers when doing visit-level prediction, but softmax almost always outperformed sigmoid. We think this is because softmax is a strong regularizer due to its normalizing denominator. Also, there aren't many codes per visit (typically fewer than 10 in most datasets), so using softmax instead of sigmoid doesn't have too drastic an impact.

  2. If you use the input sequence as the label sequence, training will take a very long time, because you are training a softmax with tens of thousands of possible outcomes. In my paper, I grouped the codes with existing groupers (such as the CCS diagnosis grouper) to reduce the output space. I suggest you do the same, as it significantly increases training speed and has minimal impact on overall performance (although that depends on what application you have in mind).
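A minimal sketch of the grouping idea in point 2, assuming a simple dictionary lookup. The mapping, the `G_` group names, and the helper functions are all hypothetical placeholders; a real mapping would be loaded from a grouper file such as CCS, and this is not the preprocessing code shipped with med2vec.

```python
# Hypothetical fine-grained code -> group mapping, for illustration only.
# A real mapping would be read from a grouper file (e.g. CCS).
CODE_TO_GROUP = {
    "D_250.00": "G_diabetes",
    "D_250.01": "G_diabetes",
    "D_401.1": "G_hypertension",
    "D_401.9": "G_hypertension",
}


def build_label_visits(visits, code_to_group, unknown="G_other"):
    """Map each visit's codes to group labels, de-duplicated within a visit."""
    return [
        sorted({code_to_group.get(code, unknown) for code in visit})
        for visit in visits
    ]


def index_vocab(visits):
    """Assign a contiguous integer id to every distinct token."""
    vocab = sorted({token for visit in visits for token in visit})
    return {token: i for i, token in enumerate(vocab)}


# Two toy visits; "D_999.9" is deliberately missing from the mapping.
visits = [["D_250.00", "D_401.9"], ["D_250.01", "D_401.1", "D_999.9"]]
label_visits = build_label_visits(visits, CODE_TO_GROUP)

input_vocab = index_vocab(visits)        # fine-grained codes fed to the model
label_vocab = index_vocab(label_visits)  # grouped codes used as softmax targets

print(len(input_vocab), len(label_vocab))
```

The inputs keep their full resolution while the softmax only has to cover the grouped vocabulary, which is where the training-speed gain would come from.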

Thanks,
Ed

from med2vec.

mp2893 commented on June 17, 2024

Hi Kirk,

It's wonderful to meet another person with the same interest.
It would be great to have a med2vec with distributed training enabled for people with large data.
I can't guarantee I'll review the code promptly, but it would be nice to have a pull request.
(Or we could have a separate script that trains in a distributed fashion.)


mp2893 commented on June 17, 2024

Hi Xianlong,

Thanks for your interest in our work.

To answer your questions:

  1. To be fair, CHOA and MIMIC-III are very different datasets: the former consists of outpatient records of 550K patients, and the latter of ICU records of only 7K patients. There are also more codes per visit in MIMIC-III than in CHOA, so the performance cannot be compared straightforwardly. I haven't tested Med2Vec on MIMIC-III, but it is a public dataset, so you could do the evaluation yourself. It would be great if you could share the results as well.

  2. That's a valid question. I think it depends on what you want to achieve with concept embedding. I was interested in finding the underlying relationships between different types of codes. For example, if you embed diagnosis codes and medication codes into the same latent space, you can easily find out which drugs are closely related to which diagnoses. Moreover, in Med2Vec, if you embed diagnosis/medication/procedure codes into the same latent space, you can study that latent space and find out how each dimension is related to various diagnosis/medication/procedure codes (see Table 5 in the paper).
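As a rough illustration of point 2, here is a sketch of looking up the closest medication for a diagnosis once both code types share one embedding space. The vectors, the code names, and the `DX_`/`RX_` prefixes are made up for this example, and cosine similarity is an assumed choice of metric; none of this comes from a trained Med2Vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, embeddings, prefix, k=1):
    """Return the k codes with the given prefix that are closest to `query`."""
    q = embeddings[query]
    scored = [
        (cosine(q, vec), code)
        for code, vec in embeddings.items()
        if code != query and code.startswith(prefix)
    ]
    return [code for _, code in sorted(scored, reverse=True)[:k]]


# Toy 3-dimensional embeddings (invented numbers, not learned values).
EMB = {
    "DX_diabetes": [0.9, 0.1, 0.0],
    "DX_hypertension": [0.1, 0.9, 0.1],
    "RX_metformin": [0.8, 0.2, 0.1],
    "RX_lisinopril": [0.0, 0.8, 0.2],
}

print(nearest("DX_diabetes", EMB, "RX_"))
print(nearest("DX_hypertension", EMB, "RX_"))
```

With a real model, `EMB` would be replaced by the learned code embedding matrix plus a mapping from code strings to row indices.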

Thanks,
Ed


2g-XzenG commented on June 17, 2024

Hi Ed, thanks for your quick response.

I am working on a medical project (predicting "fraud" billing, "defining" patient status, etc.), so finding a good representation of medical concepts would be a great help for me. This paper seems to have achieved state-of-the-art performance (right? ^_^), so I would like to bother you with some detailed questions, if you don't mind.

  1. For visit-level prediction, you used softmax to predict the neighboring visit, but each visit contains multiple codes, so it looks like a multi-label classification problem. Would it be better to use sigmoid instead of softmax?

  2. I ran the code on the MIMIC-III dataset, and the training result seems very good (I evaluated it by looking at the neighbors of some ICD codes), but the training loss is still very high even after 100 epochs (it begins at 400 and reaches 360 at 100 epochs). I think this is because of the softmax I mentioned above. Is this a problem? It seems that, this way, the loss for the code-level part doesn't matter very much (the loss for that part will be small).
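One way to see why a softmax loss can stay well above zero on multi-code visits is the small sketch below. It assumes the visit-level term is a plain cross-entropy between a multi-hot target and a softmax output; the actual med2vec objective may include other terms and is accumulated differently, so this does not reproduce the 400/360 figures. Under that assumption, a visit with k codes cannot score below k·ln(k), because the k target probabilities must share a total mass of 1.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hot_softmax_loss(logits, positives):
    """Cross-entropy of a softmax against a multi-hot target (assumed form)."""
    probs = softmax(logits)
    return -sum(math.log(probs[i]) for i in positives)


def best_case_logits(n_codes, positives, margin=50.0):
    """Logits that put (almost) all probability mass evenly on the targets."""
    pos = set(positives)
    return [margin if i in pos else 0.0 for i in range(n_codes)]


n_codes = 1000                       # hypothetical vocabulary size
positives = [3, 17, 256, 512, 999]   # hypothetical codes in one visit

loss = multi_hot_softmax_loss(best_case_logits(n_codes, positives), positives)
floor = len(positives) * math.log(len(positives))  # k * ln(k)

print(round(loss, 4), round(floor, 4))
```

Even this best-case prediction lands on the k·ln(k) floor rather than zero, so a large absolute loss value is not by itself a sign that training failed.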

Thanks
Xianlong


KirkHadley commented on June 17, 2024

Gentlemen,
It's always great stumbling upon strangers on the internet discussing exactly the problem you work on... until you also realize that what you were so certain was a novel idea is already a thing.

Xianlong, Ed is absolutely right about the boon you'll get out of a representation strategy that's at least a nudge towards semantic "understanding."

Last thing: I'm currently running this in an admittedly much uglier fashion than y'all, but I do have distributed training configured/implemented. Is that something for which you'd appreciate a pull request, or would it really just be another thing you'd have to maintain?
