Comments (3)
Hey, we assume that mention detection was already done and SapBERT only handles the entity normalisation part. If you'd like to input a full sentence then first another entity detection is needed to identify the entities.
from sapbert.
Thanks for the fast and concise answer! I already thought it could be like that.
from sapbert.
Just a follow up question: What mention-detection tool would you recommend?
I found that sciSpacy reports F1 as follows:
Model | Mentions (F1) |
---|---|
en_core_sci_sm |
68.00 |
en_core_sci_md |
68.95 |
en_core_sci_lg |
68.67 |
en_core_sci_scibert |
67.90 |
from this reference. |
I was surprised these scores are so low, so I tested two sentences:
- "The Calvin–Benson–Bassham (CBB) cycle is presumably evolved for optimal synthesis of C3 sugars, but not for the production of C2 metabolite acetyl-CoA. The carbon loss in producing acetyl-CoA from decarboxylation of C3 sugar limits the maximum carbon yield of photosynthesis."
- it missed 1: "maximum carbon yield"
- The bigger problem is that "Calvin-Benson-Bassham (CBB)" pathway was recognized as three different entities
- "Here we design a synthetic malyl-CoA-glycerate (MCG) pathway to augment the CBB cycle for efficient acetyl-CoA synthesis."
- Missed "pathway", here is the output: "(synthetic malyl-CoA-glycerate, MCG, CBB cycle, acetyl-CoA synthesis)"
Llama
Next, I considered "meta-llama/Llama-2-7b-hf", put it always produced text unrelated to the instructions.
Next I considered "meta-llama/Llama-2-7b-chat-hf" , which gave me:
- Calvin
- Benson
- Bassham (CBB)
- carbon
- CoA
- decarboxylation
- C3
- sugar
- photosynthesis
Conclusion
So I'm still interested in the question of the first sentence: Do you have a recommenation for mention-detection ?
from sapbert.
Related Issues (10)
- Entity Span --> CUI HOT 2
- 1. ko_1k_test_query_with_context.txt, 2. finetuning, and 3. inference api
- The evaluation data download link doesn't work
- Tokenizer HOT 1
- MedMentions Dictionary file created HOT 1
- Details on fine-tuning data HOT 4
- dictionary for custom dataset HOT 1
- Statistical significance test HOT 1
- requirements.txt doesn't help resolving requirements. HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sapbert.