umls-similarity's Introduction

UMLS-Similarity

Estimate the similarity of medical concepts based on Unified Medical Language System (UMLS) and WordNet

Installation

First, install a Perl environment (Strawberry Perl).

For UMLS use:

Install MySQL and MySQL Workbench. The MySQL home folder must not have a space in its path;
Download the UMLS and extract the subset;
Go to the UMLS META and NET folders and load the UMLS data into the MySQL database with the load scripts;
Install the necessary libraries with the `cpanm` command and the --force flag, as below:
```
cpanm UMLS::Interface --force

cpanm UMLS::Similarity --force
```
Errors may occur during the steps above; they can be ignored.
Check whether DBI and DBD::mysql are installed, and install them if not;
- Issue: mysql.xs.dll not found. See the link for more details.
- Solution: Copy C:\strawberry\c\bin\libmysql.dll_ to c:\strawberry\perl\vendor\lib\auto\mysql
Finished!
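
To confirm that the steps above worked, a small check script can help. The following is a minimal sketch, not part of the package: the helper names `missing_tools` and `perl_module_installed` are hypothetical, and it assumes `perl` and `cpanm` are reachable on your PATH. It relies only on the standard Perl idiom `perl -MModule -e 1`, which exits with a non-zero status when the module cannot be loaded.

```python
import shutil
import subprocess


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def perl_module_installed(module, perl="perl"):
    """Return True if `perl -M<module> -e 1` exits with status 0."""
    if shutil.which(perl) is None:
        return False  # no Perl interpreter, so the module cannot be loaded
    result = subprocess.run(
        [perl, f"-M{module}", "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


print("Missing tools:", missing_tools(["perl", "cpanm"]))
for module in ["DBI", "DBD::mysql", "UMLS::Interface", "UMLS::Similarity"]:
    status = "ok" if perl_module_installed(module) else "MISSING"
    print(f"{module}: {status}")
```

A module reported as MISSING here is worth reinstalling with `cpanm` before trying the Python wrapper.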

For WordNet use (skip this part if you do not need WordNet similarity)

Download WordNet 2.1
Set the WNHome environment variable
Install WordNet::QueryData with the cpanm command
Install WordNet::Similarity with the cpanm command
Finished!
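
A quick way to check the environment variable is sketched below. This is an illustration under assumptions, not package code: the helper `wordnet_home_status` is hypothetical, the variable is looked up as `WNHOME` (Windows treats environment variable names case-insensitively, other systems do not), and the check assumes the WordNet data files live in a `dict` folder under that path.

```python
import os


def wordnet_home_status(env=None):
    """Describe whether WNHOME points at a folder that contains 'dict'."""
    env = os.environ if env is None else env
    home = env.get("WNHOME")
    if not home:
        return "WNHOME is not set"
    if not os.path.isdir(os.path.join(home, "dict")):
        return f"WNHOME is set to {home!r}, but no 'dict' folder was found there"
    return "ok"


print(wordnet_home_status())
```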

Finally, install our Python package `umls-similarity` via pip

pip install umls-similarity

Available similarity measures

Leacock and Chodorow (1998) referred to as lch
Wu and Palmer (1994) referred to as wup
Zhong, et al. (2002) referred to as zhong
The basic path measure referred to as path
The undirected path measure referred to as upath
Rada, et al. (1989) referred to as cdist
Nguyen and Al-Mubaid (2006) referred to as nam
Resnik (1996) referred to as res
Lin (1998) referred to as lin
Jiang and Conrath (1997) referred to as jcn
The vector measure referred to as vector
Pekar and Staab (2002) referred to as pks
Pirro and Euzenat (2010) referred to as faith
Maedche and Staab (2001) referred to as cmatch
Batet, et al. (2011) referred to as batet
Sanchez, et al. (2012) referred to as sanchez
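
Since the short names above are what gets passed as the `measure` argument, it can be useful to validate a name before starting a long run. The snippet below is a minimal sketch: `MEASURES` and `check_measure` are hypothetical names, and the set is simply copied from the list above. At runtime, `get_all_measures()` on a `UMLSSimilarity` instance (shown in Example Code 1) is the better source for the list.

```python
# Short names copied from the list of available measures above.
MEASURES = {
    "lch", "wup", "zhong", "path", "upath", "cdist", "nam", "res",
    "lin", "jcn", "vector", "pks", "faith", "cmatch", "batet", "sanchez",
}


def check_measure(name):
    """Return the normalized measure name, or raise ValueError if unknown."""
    key = name.strip().lower()
    if key not in MEASURES:
        raise ValueError(
            f"unknown measure {name!r}; expected one of {sorted(MEASURES)}"
        )
    return key


print(check_measure("lch"))
```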

Let Codes Speak

Example Code 1: Estimate similarity between two medical concepts using UMLS

from umls_similarity.umls import UMLSSimilarity
import os

if __name__ == "__main__":
    # define MySQL information that stores UMLS data in your computer
    mysql_info = {}
    mysql_info["database"] = "umls"
    mysql_info["username"] = "root"
    mysql_info["password"] = "{I am not gonna tell you}"
    mysql_info["hostname"] = "localhost"

    # Perl bin's path which will be automatically detected by the lib, but you can also manually specify in its constructor
    # perl_bin_path = r"C:\Strawberry\perl\bin\perl"

    # create an instance
    umls_sim = UMLSSimilarity(mysql_info=mysql_info,
                              # perl_bin_path=''
                              )
    
    # show the names of all available measures so you can pass them into the following `measure` parameter
    measures=umls_sim.get_all_measures()
    print(measures)

    # Directly pass two CUIs into the function below:
    sims = umls_sim.similarity(cui1="C0017601", cui2="C0232197", measure="lch")
    print(sims[0])  # only one pair with two concepts
    
    # Or batch process many CUI pairs from a text file where each line is formatted like 'C0006949<>C0031507'
    current_path = os.path.dirname(os.path.realpath(__file__))
    sims = umls_sim.similarity_from_file(os.path.join(current_path, "cuis_umls_sim.txt"), measure="lch")
    for sim in sims:
        print(sim)
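
The batch input file used above has one `CUI1<>CUI2` pair per line. A minimal sketch for producing such a file from Python follows. The helpers `format_cui_pairs` and `write_cui_pairs` are hypothetical and not part of the package; the only assumption about CUIs is the standard UMLS format of the letter C followed by seven digits.

```python
import re

# A UMLS CUI is the letter "C" followed by seven digits, e.g. C0006949.
CUI_PATTERN = re.compile(r"C\d{7}")


def format_cui_pairs(pairs):
    """Return file content with one 'CUI1<>CUI2' pair per line."""
    lines = []
    for cui1, cui2 in pairs:
        for cui in (cui1, cui2):
            if not CUI_PATTERN.fullmatch(cui):
                raise ValueError(f"not a valid CUI: {cui!r}")
        lines.append(f"{cui1}<>{cui2}\n")
    return "".join(lines)


def write_cui_pairs(path, pairs):
    """Write the pairs to `path` in the format shown above."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(format_cui_pairs(pairs))


pairs = [("C0006949", "C0031507"), ("C0017601", "C0232197")]
print(format_cui_pairs(pairs), end="")
```

Calling `write_cui_pairs("cuis_umls_sim.txt", pairs)` would create a file that can then be passed to `similarity_from_file`.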

Example Code 2: Estimate similarity between two concepts using WordNet 2.1

from umls_similarity.wordnet import WNSimilarity

if __name__ == "__main__":

    wn_root_path = r"C:\Program Files (x86)\WordNet\2.1"
    # perl_bin_path=r"C:\Strawberry\perl\bin\perl"

    var1 = "dog#n#1"
    var2 = "orange#n#1"

    wn_sim = WNSimilarity(wn_root_path=wn_root_path)

    sims = wn_sim.similarity(var1, var2)
    print(sims)

    # print each key in the result together with its value
    for idx, key in enumerate(sims):
        print(idx, '\t', key, '\t', sims[key])
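
The inputs `dog#n#1` and `orange#n#1` follow the `word#pos#sense` notation used by WordNet::Similarity, where the part of speech is one of n, v, a, r and the sense number starts at 1. The sketch below shows one way to validate such strings before passing them on. `parse_sense` is a hypothetical helper, not part of the package.

```python
VALID_POS = {"n", "v", "a", "r"}  # noun, verb, adjective, adverb


def parse_sense(sense):
    """Split 'word#pos#sense' into (word, pos, sense_number)."""
    parts = sense.rsplit("#", 2)
    if len(parts) != 3:
        raise ValueError(f"expected word#pos#sense, got {sense!r}")
    word, pos, number = parts
    if not word or pos not in VALID_POS or not number.isdigit() or int(number) < 1:
        raise ValueError(f"expected word#pos#sense, got {sense!r}")
    return word, pos, int(number)


print(parse_sense("dog#n#1"))
```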

Credits

This project is a wrapper around the Perl libraries UMLS::Similarity and UMLS::Interface.

Note: Many unexpected errors may occur while installing the UMLS::Similarity Perl library, possibly because I am not an expert in Perl and its libraries.

License

The umls-similarity Python package is provided by Donghua Chen.

umls-similarity's Issues

Indexing Issues with UMLS?

Hey there,

I'm having an issue that's causing the indexing process that's done in the umls-similarity.pl script to take an incredibly long time.

I've looked into the Perl source code and I understand that the UMLS::Similarity and UMLS::Interface modules need to first create a set of indexes in order to speed up subsequent path-dependent semantic similarity measures. I've checked their mailing list, which has unfortunately been inactive since 2019, and I know that this process is expected to take several hours and maybe even days. However, the first time I ran it on my machine, I left it running over the weekend, and after 48+ hours only about 1000 CUIs had been indexed, which means that at that rate it would take over 17 YEARS to index all of the 3.31 million concepts within the UMLS... The machine I'm running it on should definitely not be having these issues, at least from a hardware perspective (96 cores, 488 GB RAM, 2TB SSD).
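
The projection above can be reproduced with a few lines of arithmetic. This is only a sketch of the extrapolation, using the figures quoted in this post and assuming the indexing rate stays constant. `projected_years` is a hypothetical helper.

```python
def projected_years(indexed, elapsed_hours, total):
    """Extrapolate total indexing time in years from the observed rate."""
    rate_per_hour = indexed / elapsed_hours
    total_hours = total / rate_per_hour
    return total_hours / (24 * 365)


# About 1000 CUIs indexed in 48 hours, 3.31 million concepts in total.
years = projected_years(indexed=1000, elapsed_hours=48, total=3_310_000)
print(f"{years:.1f} years")
```

With these inputs the estimate comes out at roughly 18 years, in line with the "over 17 years" figure.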

I've been experimenting with the MySQL my.ini config file and was able to reduce that down to 3 years. I used the parameters delineated here as a starting point: https://www.nlm.nih.gov/research/umls/implementation_resources/scripts/README_RRF_MySQL_Output_Stream.html. A good improvement but it is still unreasonably long... I'm aware of the --realtime flag which I've turned on, however, that can take a VERY long time to process some concept pairs and I'd rather go through the long process of setting up the indexes once in order to significantly speed up any subsequent calculations.

The routine that's causing these delays seems to be the subroutine "_initializeDepthFirstSearch" that can be found here: https://github.com/bmcinnes/UMLS-Interface/blob/master/lib/UMLS/Interface/PathFinder.pm

Do you have any recommendations/suggestions? Could you share with me what is the exact configuration you used for your UMLS build as well as your MySQL parameters?

Here's some system info in case it helps:
OS: Windows Server 2019 Datacenter (64-bit OS)
MySQL version : 8.1.0
UMLS Version: 2023AA
Perl Version: Strawberry Perl 5.38.0.1-64bit
UMLS Interface Version: 1.51

I'm including some screenshots of what I'm seeing in case it helps:

Here you can see that the routine successfully connected to MySQL database:

Here you can see that after a successful connection to the umls database in MySQL, there is a new schema called "umlsinterfaceindex"

Here are the contents of the file containing example CUI pairs that I'm looking to get a semantic similarity measure for:

Here you can see some of the outputs from running the routine with VERBOSE mode turned on:

Here you can see the contents of a table file that gets created alongside verbose mode that tracks the progress of the indexing process:

Please let me know if you have any thoughts on this.

Thanks!

Question about CuiFinder.pm

Hello

Thank you for publishing this library.
I am not familiar with Perl.
So far I have downloaded the UMLS and loaded the data into MySQL with the scripts.
After running this command (cpanm UMLS::Interface --force) I get the error:

I guess this error is related to the MySQL username and password.
But I cannot find the CuiFinder.pm file to add a fixed username and password at line 721.
Could you please help me with this issue?

Thank you very much
