dmis-lab / bern
A neural named entity recognition and multi-type normalization tool for biomedical text mining
Home Page: https://bern.korea.ac.kr
License: BSD 2-Clause "Simplified" License
See the warning message below, dated 28 Nov 2021, 10:31 am HK time:
SSLError: HTTPSConnectionPool(host='bern.korea.ac.kr', port=443): Max retries exceeded with url: /plain (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)')))
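Until the certificate is renewed, a temporary client-side workaround is to skip verification. This is only a standard-library sketch (the sample_text form field matches the /plain endpoint used elsewhere in these issues); disabling verification removes man-in-the-middle protection, so treat it strictly as a stopgap:

```python
import json
import ssl
from urllib import parse, request

# Build an SSL context that accepts the expired certificate.
# WARNING: this disables all certificate checks; temporary workaround only.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

def query_plain(text, url="https://bern.korea.ac.kr/plain", timeout=60):
    """POST raw text to BERN's /plain endpoint, ignoring TLS errors."""
    data = parse.urlencode({"sample_text": text}).encode()
    with request.urlopen(request.Request(url, data=data),
                         context=ctx, timeout=timeout) as resp:
        return json.loads(resp.read().decode())
```

With requests, the equivalent stopgap is passing verify=False to requests.post.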
Hi,
very nice work indeed!
Can you provide your tensorflow-gpu version? I encounter some problems when I try to use BERN.
I created a new environment named bern and ran pip install -r requirements.txt.
When I run
python3 -u server.py --port 8888 --gnormplus_home ~/bern/GNormPlusJava --gnormplus_port 18895 --tmvar2_home ~/bern/tmVarJava --tmvar2_port 18896
it raises an exception:
~/bern » python3 -u server.py --port 8888 --gnormplus_home ~/bern/GNormPlusJava --gnormplus_port 18895 --tmvar2_home ~/bern/tmVarJava --tmvar2_port 18896
2022-09-05 16:32:04.232546: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-09-05 16:32:04.237711: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-05 16:32:04.237742: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "server.py", line 8, in <module>
from biobert_ner.run_ner import BioBERT, FLAGS
File "/data/wenyuhao/bern/biobert_ner/run_ner.py", line 24, in <module>
flags = tf.flags
AttributeError: module 'tensorflow' has no attribute 'flags'
I think this is because the tensorflow-gpu version is mismatched; I installed the latest version:
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow-estimator==2.9.0
tensorflow-gpu==2.9.1
tensorflow-io-gcs-filesystem==0.26.0
~/bern » pip install tensorflow-gpu==1.13
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.13 (from versions: 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3)
ERROR: No matching distribution found for tensorflow-gpu==1.13
When issuing the command:
nohup java -Xmx16G -Xms16G -jar GNormPlusServer.jar 18895 >> ~/bern/logs/nohup_gnormplus.out 2>&1 &
I get the error:
-bash: /home/naren/bern/logs/nohup_gnormplus.out: No such file or directory
I have managed to install BERN on CentOS 7, under Python 3.7. When I send requests, some return normal results, but others report an error.
The output in the log file looks like the following:
Exception happened during processing of request from ('123.150.213.177', 41796)
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.7/socketserver.py", line 650, in process_request_thread
    self.finish_request(request, client_address)
  File "/root/anaconda3/lib/python3.7/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/root/anaconda3/lib/python3.7/socketserver.py", line 720, in __init__
    self.handle()
  File "/root/anaconda3/lib/python3.7/http/server.py", line 426, in handle
    self.handle_one_request()
  File "/root/anaconda3/lib/python3.7/http/server.py", line 414, in handle_one_request
    method()
  File "server.py", line 320, in do_POST
    text, cur_thread_name, is_raw_text=True, reuse=False)
  File "server.py", line 460, in tag_entities
    self.biobert_recognize(dict_list, is_raw_text, cur_thread_name)
  File "server.py", line 501, in biobert_recognize
    thread_id=cur_thread_name)
  File "/bern/biobert_ner/utils.py", line 15, in with_profiling
    ret = fn(*args, **kwargs)
  File "/bern/biobert_ner/run_ner.py", line 488, in recognize
    with open(token_path, 'r') as reader:
FileNotFoundError: [Errno 2] No such file or directory: 'biobert_ner/tmp/token_test_Thread-47.txt'
Then I checked input_gnormplus and output_gnormplus. The inputs are normal, but some outputs have no text written, like this:
[root@instance-1 output]# cat 64f84c92f898abb1b9e8c596a2719024e4b60782bb6c39c0075894d2.PubTator
64f84c92f898abb1b9e8c596a2719024e4b60782bb6c39c0075894d2|t|
64f84c92f898abb1b9e8c596a2719024e4b60782bb6c39c0075894d2|a|- No text -
Would you be able to let me know what the issue might be?
Thank you!
In the annotated PubMed data that you have shared, what do the 'start' and 'end' tags in 'entities' represent? Initially, I thought they were character positions, but now I am not so sure. Can you please confirm?
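If they are 0-based character offsets with an exclusive end (an assumption; only the authors can confirm), a quick self-check with a hypothetical annotation would look like this:

```python
# Hypothetical BERN-style annotation; 'start'/'end' are assumed to be
# 0-based character offsets into the text, with 'end' exclusive.
doc = {
    "text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations.",
    "entities": [{"start": 53, "end": 59, "type": "gene"}],
}

for ent in doc["entities"]:
    mention = doc["text"][ent["start"]:ent["end"]]
    print(mention)  # -> PIK3CA
```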
Can BERN recognize cell line entities?
Like PubTator does: https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/pubtator?pmids=25416956&concepts=cellline
PubTator's NER accuracy for cell lines is not high, so I wonder whether BioBERT-based cell line recognition might be more accurate.
Hello,
I just wanted to inform you that unfortunately the server seems to be down.
java -Xmx8G -Xms8G -jar tmVar2Server.jar 18896
Starting tmVar 2.0 Service at 172.17.0.7:18896
Reading POS tagger model from lib/taggers/english-left3words-distsim.tagger ... done [0.6 sec].
Exception in thread "main" java.io.FileNotFoundException: lib/PAM140-6.txt (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at kr.ac.korea.dmis.tmVar2.<init>(tmVar2.java:66)
	at kr.ac.korea.dmis.tmVar2Server.<init>(tmVar2Server.java:24)
	at kr.ac.korea.dmis.tmVar2Server.main(tmVar2Server.java:72)
Hi, I'm trying to push my local branch "fix_post", in order to make a Pull Request, but I cannot do it due to permission errors. Would it be possible that you give me permission to push my branch? Then you can decide later if you want to accept the pull request or not.
HTTPConnectionPool(host='164.52.196.65', port=8888): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc1089f04a8>: Failed to establish a new connection: [Errno 111] Connection refused',))
This issue occurs randomly. Can you guide me in solving it?
On analysing stop_normalizers.sh, a line is specified as
pid=`ps auxww | grep GNormPlus_180921.jar | grep -v grep | awk '{print $2}' | sort -r`
Here, GNormPlus_180921.jar refers to GNormPlusServer.jar.
About the sources of the other jars:
GNormPlusServer.jar is also an open-source tool; where can its source code be found?
tmVar2Server is an open-source tool; if so, where can its source code be found?
What about gene_normalizer_19.jar and disease_normalizer_19.jar?
Can you please clarify this?
Hello
This is actually a question, not an issue.
I have used the code you provided and edited it minimally so it will run in a Colab notebook. It seems to be working, but the problem is that I don't know how to use the model in this case. Can the server address be obtained via this command:
!curl ipecho.net/plain
After I ran the previous command I got the server address, and then I ran this script in a different notebook:
import requests
import json
body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and the syndrome."})}
response = requests.post('http://<YOUR_SERVER_ADDRESS>:8888', data=body_data)
result_dict = response.json()
print(result_dict)
Unfortunately, it generates a timeout error every time.
I would really appreciate your help or any advice. Thanks.
here is the link to the notebook.
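In case it helps with debugging: an explicit timeout and error handling make a hung or unreachable server fail fast instead of blocking the notebook. A standard-library sketch (query_bern and its parameters are hypothetical; substitute the address printed by the curl command):

```python
import json
from urllib import parse, request

def build_body(text):
    """Form-encode the same 'param' field the notebook script sends."""
    return parse.urlencode({"param": json.dumps({"text": text})}).encode()

def query_bern(text, host, port=8888, timeout=30):
    """POST text to a BERN server; raises URLError on timeout or refusal."""
    req = request.Request(f"http://{host}:{port}", data=build_body(text))
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode())
```

If this also times out, note that the IP printed by ipecho.net is Colab's egress address; Colab VMs generally do not accept inbound connections, so a tunnel (e.g. ngrok) is usually needed to reach a server running inside the notebook.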
The server.py does not allow multiple text inputs to be sent. Will this capability be introduced? Is the underlying batch capability of the models being utilised during inference?
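Until then, a client can approximate batching by chunking its documents and sending one request per chunk; a minimal sketch (the chunk size and the posting function are hypothetical):

```python
def batches(items, size):
    """Yield successive fixed-size chunks from a list of documents."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

docs = ["text one", "text two", "text three", "text four", "text five"]
for chunk in batches(docs, 2):
    # post(chunk)  # placeholder: send each chunk to the server here
    print(chunk)
```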
Can I perform relation extraction using BERN? If so, how?
Hello, thank you for this great and very useful project.
I am a machine learning student, but I am not used to working with the first version of TensorFlow.
I am trying to use your pre-trained model for NER (I want to recognize diseases, drugs, and genes) instead of using your code to set up the API server.
So I downloaded your pre-trained model into the pretrainedBERT folder, but I don't know how to load the model.ckpt file for each type of named entity, and especially how to use it to predict on a text example.
Can you please enlighten me on this point?
Many thanks in advance for your help!
I used curl "http://0.0.0.0:8888/?pmid=25226362&format=json&indent=true" > output.txt
to get the annotations for a sample article through my own server. This results in the BERN server getting "killed":
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running infer on CPU
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running infer on CPU
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running infer on CPU
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running infer on CPU
WARNING:tensorflow:From /sbksvol/gaurav/bern/biobert_ner/modeling.py:648: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2020-02-08 05:26:09.639623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-02-08 05:26:09.639685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-08 05:26:09.639698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-02-08 05:26:09.639708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-02-08 05:26:09.639804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10798 MB memory) -> physical GPU (device: 0, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5)
WARNING:tensorflow:From /sbksvol/gaurav/tfenv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from ./biobert_ner/pretrainedBERT/disease/model.ckpt-45000
INFO:tensorflow:Graph was finalized.
2020-02-08 05:26:09.700057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-02-08 05:26:09.700112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-08 05:26:09.700125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-02-08 05:26:09.700135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-02-08 05:26:09.700233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10798 MB memory) -> physical GPU (device: 0, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5)
INFO:tensorflow:Restoring parameters from ./biobert_ner/pretrainedBERT/drug/model.ckpt-28020
INFO:tensorflow:Graph was finalized.
2020-02-08 05:26:09.872090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-02-08 05:26:09.872149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-08 05:26:09.872163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-02-08 05:26:09.872173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-02-08 05:26:09.872288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10798 MB memory) -> physical GPU (device: 0, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5)
INFO:tensorflow:Restoring parameters from ./biobert_ner/pretrainedBERT/gene/model.ckpt-6678
INFO:tensorflow:Graph was finalized.
2020-02-08 05:26:09.886157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-02-08 05:26:09.886214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-08 05:26:09.886226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-02-08 05:26:09.886236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-02-08 05:26:09.886353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10798 MB memory) -> physical GPU (device: 0, name: Tesla K40c, pci bus id: 0000:82:00.0, compute capability: 3.5)
INFO:tensorflow:Restoring parameters from ./biobert_ner/pretrainedBERT/species/model.ckpt-90000
Killed
Would appreciate your help!
The compressed dump available on the website, which contains the annotations of 18.4+ million PubMed articles, has multiple duplicated entries (e.g., PMID 29422500) and incomplete abstracts (e.g., PMID 29413363).
On an unrelated note, thanks a lot for making this project publicly available. Great work 😄
Hi, it seems that BERN can only detect entities in the abstracts of PubMed articles. Can it not tag the full article?
A JSONDecodeError is raised when accessing BERN through the API. This error is raised particularly when the input string is long.
Attaching the error raised:
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
<ipython-input-37-d5e92984db1a> in <module>
----> 1 output = query_raw(str_new)
<ipython-input-31-190dc9c1a306> in query_raw(text, url)
1 def query_raw(text, url="https://bern.korea.ac.kr/plain"):
----> 2 return requests.post(url, data={'sample_text': text}).json()
~\Anaconda3\lib\site-packages\requests\models.py in json(self, **kwargs)
895 # used.
896 pass
--> 897 return complexjson.loads(self.text, **kwargs)
898
899 @property
~\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
346 parse_int is None and parse_float is None and
347 parse_constant is None and object_pairs_hook is None and not kw):
--> 348 return _default_decoder.decode(s)
349 if cls is None:
350 cls = JSONDecoder
~\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):
~\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
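A defensive wrapper on the client side at least surfaces what the server actually returned (for long inputs it appears to send a non-JSON error body) instead of crashing inside .json(); a sketch:

```python
import json

def safe_json(raw_text):
    """Decode a response body, or return None when it is not valid JSON
    (e.g. the server replied with an HTML error page)."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        return None

print(safe_json('{"text": "ok"}'))      # -> {'text': 'ok'}
print(safe_json('<html>error</html>'))  # -> None
```

Splitting long inputs into smaller requests may avoid triggering the server-side failure in the first place.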
I am trying to dockerize your project, but I am running into the following issue when trying one of your test URLs.
see here: https://github.com/amalic/bern-docker
When I call: http://localhost/?pmid=30429607&format=pubtator
I get the following error:
Exception happened during processing of request from ('172.17.0.1', 40600)
Traceback (most recent call last):
File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
self.handle()
File "/usr/lib/python3.5/http/server.py", line 422, in handle
self.handle_one_request()
File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
method()
File "server.py", line 196, in do_GET
self.biobert_recognize(dict_list, is_raw_text, cur_thread_name)
File "server.py", line 490, in biobert_recognize
thread_id=cur_thread_name)
File "/app/biobert_ner/utils.py", line 15, in with_profiling
ret = fn(*args, **kwargs)
File "/app/biobert_ner/run_ner.py", line 474, in recognize
example, self.FLAGS.max_seq_length, req_id, "test")
File "/app/biobert_ner/run_ner.py", line 846, in convert_single_example
self.write_tokens(ntokens, mode, req_id)
File "/app/biobert_ner/run_ner.py", line 854, in write_tokens
with open(path, 'a') as wf:
FileNotFoundError: [Errno 2] No such file or directory: 'biobert_ner/tmp/token_test_Thread-1.txt'
Starting the Docker container results in the following console output:
docker run -it --gpus all -p 80:8888 -v $PWD/externalData/GNormPlusJava/Dictionary/:/app/GNormPlusJava/Dictionary/ -v $PWD/externalData/tmVarJava/Database:/app/tmVarJava/Database -v $PWD/externalData/biobert_ner_models/:/app/bern/biobert_ner/ -v $PWD/externalData/data/:/app/normalization/data/ -v $PWD/externalData/resources/:/app/normalization/resources/ bern-docker
nohup: appending output to 'nohup.out'
nohup: appending output to 'nohup.out'
root 6 0.0 0.0 37412 3432 pts/0 R+ 03:14 0:00 java -Xmx16G -Xms16G -jar GNormPlusServer.jar 18895
root 7 0.0 0.0 37412 3524 pts/0 R+ 03:14 0:00 java -Xmx8G -Xms8G -jar tmVar2Server.jar 18896
root 9 0.0 0.0 24628 4472 pts/0 R+ 03:14 0:00 python3 normalizers/chemical_normalizer.py
root 10 0.0 0.0 24628 4712 pts/0 R+ 03:14 0:00 python3 normalizers/species_normalizer.py
root 11 0.0 0.0 24628 4400 pts/0 R+ 03:14 0:00 python3 normalizers/mutation_normalizer.py
root 12 0.0 0.0 37412 3440 pts/0 R+ 03:14 0:00 java -Xmx16G -jar resources/normalizers/disease/disease_normalizer_19.jar
root 13 0.0 0.0 37412 2376 pts/0 R+ 03:14 0:00 java -Xmx20G -jar gnormplus-normalization_19.jar
[23/Apr/2020 03:14:17.105468] Starting..
2020-04-23 03:14:17.118139: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-23 03:14:17.265485: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-23 03:14:17.268081: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3d8ae20 executing computations on platform CUDA. Devices:
2020-04-23 03:14:17.268144: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-04-23 03:14:17.288584: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3393280000 Hz
2020-04-23 03:14:17.290987: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3f2c490 executing computations on platform Host. Devices:
2020-04-23 03:14:17.291048: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2020-04-23 03:14:17.291345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:43:00.0
totalMemory: 10.91GiB freeMemory: 7.44GiB
2020-04-23 03:14:17.291440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-04-23 03:14:17.292519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-23 03:14:17.292554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-04-23 03:14:17.292616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-04-23 03:14:17.292743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 7239 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:43:00.0, compute capability: 6.1)
A GPU is available
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb16ec46840>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_eval_distribute': None, '_num_worker_replicas': 1, '_keep_checkpoint_max': 5, '_model_dir': './biobert_ner/pretrainedBERT/gene', '_master': '', '_log_step_count_steps': None, '_cluster': None, '_train_distribute': None, '_num_ps_replicas': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb16e05b9e8>, '_experimental_distribute': None, '_is_chief': True, '_protocol': None, '_evaluation_master': '', '_task_type': 'worker', '_global_id_in_cluster': 0, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_device_fn': None, '_tf_random_seed': None, '_save_checkpoints_secs': None, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': gpu_options {
allow_growth: true
}
, '_save_checkpoints_steps': 1000, '_save_summary_steps': 100}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb04042a378>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_eval_distribute': None, '_num_worker_replicas': 1, '_keep_checkpoint_max': 5, '_model_dir': './biobert_ner/pretrainedBERT/disease', '_master': '', '_log_step_count_steps': None, '_cluster': None, '_train_distribute': None, '_num_ps_replicas': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb04043b9b0>, '_experimental_distribute': None, '_is_chief': True, '_protocol': None, '_evaluation_master': '', '_task_type': 'worker', '_global_id_in_cluster': 0, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_device_fn': None, '_tf_random_seed': None, '_save_checkpoints_secs': None, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': gpu_options {
allow_growth: true
}
, '_save_checkpoints_steps': 1000, '_save_summary_steps': 100}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb0403c0488>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_eval_distribute': None, '_num_worker_replicas': 1, '_keep_checkpoint_max': 5, '_model_dir': './biobert_ner/pretrainedBERT/drug', '_master': '', '_log_step_count_steps': None, '_cluster': None, '_train_distribute': None, '_num_ps_replicas': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb04043bac8>, '_experimental_distribute': None, '_is_chief': True, '_protocol': None, '_evaluation_master': '', '_task_type': 'worker', '_global_id_in_cluster': 0, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_device_fn': None, '_tf_random_seed': None, '_save_checkpoints_secs': None, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': gpu_options {
allow_growth: true
}
, '_save_checkpoints_steps': 1000, '_save_summary_steps': 100}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb0403c0598>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_eval_distribute': None, '_num_worker_replicas': 1, '_keep_checkpoint_max': 5, '_model_dir': './biobert_ner/pretrainedBERT/species', '_master': '', '_log_step_count_steps': None, '_cluster': None, '_train_distribute': None, '_num_ps_replicas': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb04043bbe0>, '_experimental_distribute': None, '_is_chief': True, '_protocol': None, '_evaluation_master': '', '_task_type': 'worker', '_global_id_in_cluster': 0, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_device_fn': None, '_tf_random_seed': None, '_save_checkpoints_secs': None, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': gpu_options {
allow_growth: true
}
, '_save_checkpoints_steps': 1000, '_save_summary_steps': 100}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
BioBERT init_t 0.592 sec.
[23/Apr/2020 03:14:17.886905] Starting server at http://0.0.0.0:8888
gid2oid loaded 59849
goid2goid loaded 3468
gene meta #ids 42916, #ext_ids 42916
disease meta #ids 12122, #ext_ids 15040
chem meta #ids 179063, #ext_ids 179463
code2mirs size 9447
mirbase_id2mirna_id size 14945
mirna_id2accession size 6308
# of pathway regex 514
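One plausible cause of the FileNotFoundError above (an assumption, not verified): the biobert_ner/tmp/ scratch directory the server writes token files into does not exist inside the container, e.g. because a volume mount hides the copy shipped in the repository. A hedged pre-flight fix before starting server.py:

```python
import os

# Recreate the scratch directory BioBERT writes its token files into;
# a volume mount over biobert_ner/ can hide the one shipped in the repo.
tmp_dir = os.path.join("biobert_ner", "tmp")
os.makedirs(tmp_dir, exist_ok=True)
print(os.path.isdir(tmp_dir))  # -> True
```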
While following the instructions
sed -i 's/= All/= 9606/g' setup.txt; echo "FocusSpecies: from All to 9606 (Human)"
sh Installation.sh
where is the "setup.txt" file?
The BERN server https://bern.korea.ac.kr/ has been offline for the last couple of days. Any update on when it will be online again?
Hi, I am facing an error after completing the installation of GNormPlusJava and tmVarJava. (PS: all servers start normally.)
In the tmVar log, the error is:
Starting tmVar 2.0 Service at 192.168.0.38:18896
Reading POS tagger model from lib/taggers/english-left3words-distsim.tagger ... done [0.7 sec].
Loading tmVar : Processing Time:0.779sec
Ready
input/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7-Thread-3.PubTator - (PubTator format) : Processing ...
Exception in thread "main" java.lang.IllegalArgumentException: Empty command
	at java.base/java.lang.Runtime.exec(Runtime.java:408)
	at java.base/java.lang.Runtime.exec(Runtime.java:311)
	at tmVarlib.PostProcessing.toPostMEoutput(PostProcessing.java:1686)
	at kr.ac.korea.dmis.tmVar2.tag(tmVar2.java:177)
	at kr.ac.korea.dmis.tmVar2Server.run(tmVar2Server.java:42)
	at kr.ac.korea.dmis.tmVar2Server.<init>(tmVar2Server.java:30)
	at kr.ac.korea.dmis.tmVar2Server.main(tmVar2Server.java:72)
It seems that there is an error in tmVar2Server: it fails to transfer the text from the input folder to the output folder. After this error, the tmVar server goes down.
Could you please give me a hint for solving this problem?
Thanks.
(env) ubuntu@ip-172-31-40-20:~/bern$ tail -F logs/nohup_BERN.out
Traceback (most recent call last):
File "server.py", line 8, in <module>
from biobert_ner.run_ner import BioBERT, FLAGS
File "/home/ubuntu/bern/biobert_ner/run_ner.py", line 21, in <module>
from convert import preprocess
File "/home/ubuntu/bern/convert.py", line 6, in <module>
from download import query_pubtator2biocxml
File "/home/ubuntu/bern/download.py", line 11, in <module>
import xmltodict
ModuleNotFoundError: No module named 'xmltodict'
For example, these PMIDs: ['29787038', '30844201', '31643199', '31643392', '31643562', '31855378', '31869126'].
BERN returned HTML text for this kind of PMID, such as [{"project":"BERN","sourcedb":"PubMed","sourceid":"31869126","text":"error: tmtool:
In the requirements.txt, no version is specified for tensorflow (or for any other library, which is very confusing), so your Python scripts using TensorFlow don't work. Could you please tell me which package versions you used for setting up the server?
In the attached image, Vibrio cholerae is classified in all possible ways:
the first input is classified as a species, the second is classified as nothing, and the third is classified as a disease.
@jhyuklee @donghyeonk @wonjininfo @seanswyi Can you please explain why that is? This is not only for this term; there are many similar examples.
Hi.
I use BERN to find entities in PubMed abstracts.
I have two GPUs, so I run two BERN servers on one system.
The chemical, species, and mutation normalizers are written in Python, so I could edit their port numbers.
But the disease and gene normalizer servers are written in Java and shipped compiled, so I can't edit their port numbers.
How can I use other ports for the disease and gene normalizers?
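Since the Java normalizers' ports are compiled in, two workarounds come to mind (both assumptions, not project-endorsed): share a single normalizer instance between both BERN servers, or run the second set of Java servers inside a container and map the fixed internal port to a different host port. If a different local port must be exposed without containers, a minimal stdlib TCP relay can forward it to the fixed one:

```python
import socket
import threading

def _pump(src, dst):
    """Copy bytes from src to dst until src closes."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def relay(listen_port, target_host, target_port):
    """Listen on listen_port and forward each connection to the target.
    Returns the listening socket (use getsockname() if listen_port was 0)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", listen_port))
    srv.listen(5)

    def accept_loop():
        while True:
            client, _ = srv.accept()
            upstream = socket.create_connection((target_host, target_port))
            threading.Thread(target=_pump, args=(client, upstream), daemon=True).start()
            threading.Thread(target=_pump, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv
```

For example, relay(28895, "127.0.0.1", 18895) would let a second BERN configuration point at port 28895 while a single disease normalizer keeps listening on its hardcoded port.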
Is it possible to make this dockerized so that we could think about scalable k8s deployments?
I have managed to install BERN on my Linux 18 machine, under Python 3.6, and everything seems fine upon starting the server. The output in the log file looks like the following:
nohup: ignoring input
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ubuntu/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
[05/Nov/2019 16:35:28.802904] Starting..
2019-11-05 16:35:28.835150: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
A GPU is NOT available
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fbd33f8c8c8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': './biobert_ner/pretrainedBERT/gene', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
allow_growth: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbd2a69f358>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fbd2a92d6a8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': './biobert_ner/pretrainedBERT/disease', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
allow_growth: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbd2a69f4e0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fbd2a69c7b8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': './biobert_ner/pretrainedBERT/drug', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
allow_growth: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbd2a69f668>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fbd2a69c8c8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': './biobert_ner/pretrainedBERT/species', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
allow_growth: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbd2a69f7f0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
BioBERT init_t 3.838 sec.
[05/Nov/2019 16:35:32.679049] Starting server at http://0.0.0.0:8888
gid2oid loaded 59849
gene meta #ids 42916, #ext_ids 42916
disease meta #ids 12122, #ext_ids 15040
chem meta #ids 178395, #ext_ids 178795
Then, when I proceed to test the example script which is mentioned in the README file:
import requests
import json
body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}
response = requests.post('http://127.0.0.1:8888', data=body_data)
result_dict = response.json()
print(result_dict)
It complains about the missing PubTator file in the output folder:
127.0.0.1 - - [05/Nov/2019 16:41:05] "POST / HTTP/1.1" 200 -
[05/Nov/2019 16:41:05.504282] [Thread-1] text_hash: 3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7
[05/Nov/2019 16:41:06.330533] [Thread-1] GNormPlus 0.826 sec
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 51812)
Traceback (most recent call last):
File "/usr/lib/python3.6/shutil.py", line 550, in move
os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator' -> '/home/ubuntu/bern/tmVarJava/input/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/socketserver.py", line 654, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib/python3.6/socketserver.py", line 364, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.6/socketserver.py", line 724, in __init__
self.handle()
File "/usr/lib/python3.6/http/server.py", line 418, in handle
self.handle_one_request()
File "/usr/lib/python3.6/http/server.py", line 406, in handle_one_request
method()
File "server.py", line 317, in do_POST
text, cur_thread_name, is_raw_text=True, reuse=False)
File "server.py", line 420, in tag_entities
shutil.move(output_gnormplus, input_tmvar2)
File "/usr/lib/python3.6/shutil.py", line 564, in move
copy_function(src, real_dst)
File "/usr/lib/python3.6/shutil.py", line 263, in copy2
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'
Would you be able to let me know what the issue might be?
Thank you!
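One workaround for this class of failure (not a fix for the underlying GNormPlus problem) is to poll for the output file before moving it, so a slow GNormPlus run does not crash the pipeline. The sketch below is hypothetical; the helper name and timeout are mine, not part of BERN's server.py:

```python
import os
import time

def wait_for_file(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists and is non-empty, or the timeout elapses.

    Returns True if the file appeared, False otherwise. This only papers
    over timing issues: if GNormPlus itself crashed, the file never
    appears and its own logs under GNormPlusJava should be checked.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path) and os.path.getsize(path) > 0:
            return True
        time.sleep(interval)
    return False
```

If wait_for_file(output_gnormplus) returns False, the real problem is usually that GNormPlus failed to process the input at all (missing CRF++ binaries or Java errors are common causes), not a race with shutil.move.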
This error comes up most of the time when I hit the BERN API:
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
Hello, and thank you for this great project. I am also working on my own NER for biomedical text, and my main issue right now is overlapping predictions. Could you please describe how you solve this problem?
Hi. I was able to set up the repo successfully and also resolved the PubTator File not found error from #4 . Now I am getting the below error.
If I scroll up a little in the terminal, I see this.
Any help would be highly appreciated.
Is there an easy way to turn off or not install at all some of the modules used in NER in BERN? For example, I'm only interested in drug/chemical discovery and I want to skip using GNormPlus and tmVar. Thanks!
Sample program, based on your README.MD
import requests
import json
body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}
response = requests.post('http://localhost/', data=body_data)
print(response)
print("content: ", response.content)
result_dict = response.json()
print(result_dict)
Output
<Response [200]>
content: b''
Traceback (most recent call last):
File "test.py", line 7, in <module>
result_dict = response.json()
File "/home/alex/.local/lib/python3.6/site-packages/requests/models.py", line 898, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
A curl example would be highly appreciated.
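Since the server here returned a 200 with an empty body, one way to fail with a clearer message is to check the body before decoding it. This is a defensive sketch; the helper name parse_bern_response is mine, not part of BERN:

```python
import json

def parse_bern_response(content):
    """Return the decoded JSON payload, or None if the server sent an
    empty body (which BERN appears to do when tagging fails server-side).
    """
    if not content:
        return None
    return json.loads(content.decode('utf-8'))
```

An empty body with a 200 status usually means the server-side pipeline threw an exception; the server's own terminal output (or the nohup log) is the place to look for the actual error.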
I'm getting the error {'status': 'fail', 'message': 'JSON PARSE ERROR'} when I run query_raw("Input_Text").
Dear authors!
Your work with BERN is amazing. I am still quite new to the domain of pre-trained/fine-tuned models. Is there an approach, guide, or manual for fine-tuning the pre-trained model on an (unstructured) dataset of our own? Thanks in advance for your efforts!
Hello,
Firstly, thank you so much for this project. I am really looking forward to using it.
I have been trying to run the sample code in the README on Windows, but I get the following connection error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=8888): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000020CE329CE88>: Failed to establish a new connection: [WinError 10049] The requested address is not valid in its context'))
The server appears to start running successfully when I use the following command:
nohup python -u server.py --port 8888 --gnormplus_home GNormPlusJava/GNormPlusJava --gnormplus_port 18895 --tmvar2_home tmVarJava --tmvar2_port 188956 >> logs/nohup_BERN.out 2>&1 &
Log output:
tail -F logs/nohup_BERN.out
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
BioBERT init_t 1.040 sec.
[19/Apr/2020 22:26:03.602298] Starting server at http://0.0.0.0:8888
gid2oid loaded 59849
goid2goid loaded 3468
gene meta #ids 42916, #ext_ids 42916
disease meta #ids 12122, #ext_ids 15040
chem meta #ids 179063, #ext_ids 179463
code2mirs size 9447
mirbase_id2mirna_id size 14945
mirna_id2accession size 6308
# of pathway regex 514
Would you have any suggestions to get this running? I followed all of the steps in the README but for Windows. Thank you!
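One likely cause of WinError 10049 here: 0.0.0.0 is a valid bind address for the server ("listen on all interfaces") but, on Windows, not a valid destination for a client, so the sample code should connect to 127.0.0.1 instead. A small hypothetical helper (the name as_client_url is mine) makes the rewrite explicit:

```python
def as_client_url(url):
    """Rewrite a server bind address into a client-connectable one.

    0.0.0.0 means "listen on all interfaces"; a client on the same
    machine should connect to the loopback address instead.
    """
    return url.replace('0.0.0.0', '127.0.0.1')

# e.g. requests.post(as_client_url('http://0.0.0.0:8888'), data=body_data)
```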
Can we reuse crfpp-0.58.tar.gz from GNormPlus in tmVar? I mean that after make install, it will already be available. Is there any specific reason to build CRF++ again?
Hi, this is an amazing tool. The normalisation function is critical and very useful for bio NER. I set up BERN on a local server; while I am able to submit GET requests using a PMID, a POST request with raw text succeeds but does not return any annotated entities. Could you provide examples of submitting a POST request?
This is the code I used:
body_data = {'param': {"text":'CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome.'}}
response = requests.post( 'http://0.0.0.0:8888', data = body_data)
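Comparing this with the README example quoted earlier in this thread, the difference appears to be that the param field must be a JSON string, not a nested dict: requests form-encodes a nested dict as the literal text of its keys, so the server never receives the abstract. A sketch of the fix (the helper name build_payload is mine):

```python
import json

def build_payload(text):
    """Encode raw text the way BERN's POST endpoint expects:
    a form field 'param' whose value is a JSON string {"text": ...}.
    """
    return {'param': json.dumps({'text': text})}

# response = requests.post('http://127.0.0.1:8888', data=build_payload(abstract))
```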
@jhyuklee @donghyeonk @wonjininfo @seanswyi
Can you help me understand why Cholera is not classified as a disease in the first case, whereas in the second case it is?
On scanning through the code I can see that you don't seem to be giving the sieve-based normalizer the full abstract as input, only the keyword (see here).
In that case how does it do abbreviation detection? Or is that being skipped?
Abbreviation detection is responsible for >5% of the disease normalizer's accuracy, so it would be great if you could clarify 😄
On another note, thanks a lot for this repo, has been very useful.
I would like to connect BERN IDs to other ontologies.
@donghyeonk Bern link https://bern.korea.ac.kr/ is dead.
The website seems to have been down for a couple of days.
I have been getting the following error:
HTTPSConnectionPool(host='bern.korea.ac.kr', port=443): Max retries exceeded with url: /plain (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001BE255B4550>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
Is it going to be down for a while? If so, is there another easy way to use BERN for NER without having to clone the whole repo?
@jhyuklee @donghyeonk @wonjininfo @seanswyi
The starting index is correct, but the ending index is incorrect for the species category in the API output.
After getting the start and end indices, if I map them back to words, the output comes out as below.
sample output: ["woma", "infan", "patien", "patien", "patien"]
The problem occurs only in the species category; the other categories work correctly.
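For debugging offsets like these, it helps to slice the submitted text with the returned indices and compare against the expected mentions. The sketch below assumes BERN's output shape (a denotations list with span.begin/span.end fields), which may differ in your version:

```python
def slice_mentions(text, denotations):
    """Return the substrings of `text` covered by each annotation span.

    If the end index is off by one (as reported above for species),
    the last character of each mention will be missing.
    """
    return [text[d['span']['begin']:d['span']['end']] for d in denotations]
```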
@jhyuklee @donghyeonk @wonjininfo @seanswyi
I have implemented BERN on my local system and often get this issue:
[03/Dec/2020 16:14:14.692631] [Thread-424] [{'error': 'NER crash'}]
Can you help me with this error? What are the possible things that could be triggering it?
The issue occurs on this line:
return requests.post(url, data={'sample_text': text}).json()
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
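Since these failures are intermittent, a retry loop around the request can help the client survive occasional NER crashes. The sketch takes the request as a callable so it can be exercised without a live server; the names are mine, not part of BERN:

```python
import time

def post_with_retry(do_post, retries=3, backoff=2.0):
    """Call `do_post()` (expected to return parsed JSON) up to `retries`
    times, sleeping between attempts; re-raise the last error.

    json.JSONDecodeError subclasses ValueError, so catching ValueError
    covers the "Expecting value: line 1 column 1" failure shown above.
    """
    last_err = None
    for attempt in range(retries):
        try:
            return do_post()
        except (ValueError, ConnectionError) as err:
            last_err = err
            time.sleep(backoff * (attempt + 1))
    raise last_err

# e.g. post_with_retry(lambda: requests.post(url, data={'sample_text': text}).json())
```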