tuh8888 / dep2rel
Relation extraction using word embeddings and dependency paths
License: GNU General Public License v3.0
There are a variety of word embedding methods in addition to Word2Vec.
Needed Scripts:
Inputs:
Needed Scripts:
On a few test sets of known relations, try different combinations of parameters, and evaluate the performance. Determine the best range of parameter values.
Inputs:
Action:
Output:
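The parameter sweep described above could be sketched roughly as follows. This is a minimal illustration, not the project's code: `toy_extract`, the parameter names `match_threshold` and `max_results`, and the candidate scores are all hypothetical stand-ins for the real extractor and its settings.

```python
from itertools import product


def f1_score(predicted, gold):
    """F1 of a predicted relation set measured against a gold relation set."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def grid_search(extract, gold, grid):
    """Run `extract` for every parameter combination; return (params, f1), best first."""
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, f1_score(extract(**params), gold)))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


def toy_extract(match_threshold, max_results):
    """Hypothetical extractor: keeps scored candidate pairs above a threshold."""
    candidates = [(("X", "A"), 0.9), (("Y", "B"), 0.7), (("X", "B"), 0.4)]
    kept = [pair for pair, score in candidates if score >= match_threshold]
    return kept[:max_results]


if __name__ == "__main__":
    gold = {("X", "A"), ("Y", "B")}  # made-up test set of known relations
    grid = {"match_threshold": [0.3, 0.6, 0.8], "max_results": [1, 2, 3]}
    for params, score in grid_search(toy_extract, gold, grid):
        print(params, round(score, 3))
```

Sorting the full result list, rather than keeping only the single best combination, makes it easier to see a range of parameter values that perform well.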
BioCreative 2017 had a subtask (Track 5) on extracting chemical-protein interactions from text.
Needed Scripts: Write a script that does the following:
Inputs:
Action:
Output: what is the desired output?
Wiki Page:
Suggestions:
Originally posted by @tuh8888 in #14 (comment)
Describe:
Needed Scripts:
Determines good seeds.
Possible approaches:
Inputs:
Task:
Invite initial collaborators.
Make sure the Abra-Collaboratory framework is properly implemented.
Wiki Page:
Suggestions:
Originally posted by @tuh8888 in #24 (comment)
Currently, I am using the SyntaxNet architecture trained on CRAFT to parse dependency trees. There are some alternative automatic approaches I could try.
Originally posted by @tuh8888 in #14 (comment)
Pub or Presentation: Relation extraction
Submission Site: https://sites.google.com/view/icbo2019
Description:
Due Date: May 1, 2019; April 15, 2019
Pub or Presentation: Presentation for NLM Informatics Training Conference 2019
Description:
Due Date: June 24
Inputs:
Action:
Parses sentences to get relations
Using naive method (find occurrences of seed entity pairs)
Using less naive method (find occurrences of sentences containing seed entity pairs that are similar to seed sentences)
Using bootstrap method
Evaluates performance using relations annotated by Mike Bada
Demonstrates usage of key functions
Output: what is the desired output?
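As a rough illustration of the naive method listed above, the sketch below finds sentences that mention both entities of a seed pair. It assumes entity recognition has already been done, so each sentence is represented only by its list of entity names; the function name and the example data are hypothetical.

```python
def find_seed_occurrences(sentences, seed_pairs):
    """Return (sentence_index, seed_pair) for each sentence mentioning both entities.

    `sentences` is a list of entity-name lists, one list per sentence.
    `seed_pairs` is a list of (entity_1, entity_2) tuples.
    """
    hits = []
    for index, entities in enumerate(sentences):
        present = set(entities)
        for entity_1, entity_2 in seed_pairs:
            if entity_1 in present and entity_2 in present:
                hits.append((index, (entity_1, entity_2)))
    return hits


if __name__ == "__main__":
    # Made-up sentences, already reduced to the entities they mention.
    sentences = [
        ["chemical X", "protein A"],
        ["chemical X", "chemical Y"],
        ["protein A", "chemical X", "protein B"],
    ]
    seeds = [("chemical X", "protein A")]
    print(find_seed_occurrences(sentences, seeds))
```

Co-occurrence alone says nothing about whether the sentence actually expresses the relation, which is what the less naive and bootstrap methods try to address.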
Check out these other two language models suggested by Negacy
Originally posted by @tuh8888 in #14 (comment)
Needed Scripts:
Create visualizations of the clustering to help determine
Inputs:
Action:
Output:
Meeting Date: June 10, 2019
Topic: NLM Training Conference presentation
Attendees: @LEHunter
Problem
When one of the similarity thresholds is relaxed, the bootstrapping process will cascade and grab every possible sentence. This seems to happen when long dependency chains are matched.
These long dependency chains are usually matched because they contain repetitive phrases,
e.g. "Chemical X inhibits protein A and chemical Y inhibits protein B" -> "Chemical X inhibits protein B" (incorrect).
According to Mike Bada, usually relations are only implied between entities that are within one or two steps along a dependency path. So it might make sense to enforce a limit on the distance we search along the dependency path.
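A distance limit of this kind could be sketched as a breadth-first search over the dependency tree. The code below is only an illustration under several assumptions: the tree is treated as undirected, a "step" is counted as one dependency edge, the default limit of 2 is a guess at the "one or two steps" above, and the example parse is hand-built and simplified rather than real parser output.

```python
from collections import deque


def path_length(edges, start, goal):
    """Number of edges on the shortest path between two tokens, or None if unconnected."""
    neighbors = {}
    for head, dependent in edges:
        neighbors.setdefault(head, set()).add(dependent)
        neighbors.setdefault(dependent, set()).add(head)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None


def within_limit(edges, entity_1, entity_2, max_steps=2):
    """True if the two entities are at most `max_steps` edges apart."""
    distance = path_length(edges, entity_1, entity_2)
    return distance is not None and distance <= max_steps


if __name__ == "__main__":
    # Hand-built (head, dependent) edges for:
    # "Chemical X inhibits protein A and chemical Y inhibits protein B"
    edges = [
        ("inhibits_1", "X"),
        ("inhibits_1", "A"),
        ("inhibits_1", "inhibits_2"),
        ("inhibits_2", "Y"),
        ("inhibits_2", "B"),
    ]
    print(within_limit(edges, "X", "A"))
    print(within_limit(edges, "X", "B"))
```

In this toy parse, X and A are two edges apart while X and B are three edges apart, so a limit of two steps would keep the first pair and reject the second.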
Possible Solutions
Pub or Presentation: Relation Extraction Work for Dep2Rel with CRAFT and BioCreative results.
Submission Site:
Due Date:
Meeting Date: May 9, 2019
Topic: Update on generating BioCreative Results
Attendees: @LEHunter
Proposed Agenda:
Wiki Page:
Meeting Date: June 12, 2019
Topic: Show results
Attendees: @LEHunter
Currently the algorithm has two filtering steps for the sentences to become seeds.
I think the first step needs to be removed because it violates one of the goals of the algorithm, which is to develop context patterns and use them to refine themselves.
I am proposing to either remove this step, or replace it with a step to filter by concept type of the entities in the current pattern.
I noticed that I was able to get an F1 score of 0.2 over BioCreative VI.4 with only 5 seeds. Because each seed got its own cluster, the matching stage only involved finding similarity to one pattern that had no "dilution" due to being combined with other patterns.
This makes me think that the summing of the context vectors that make up a pattern is somehow diluting the underlying sentences too much so they don't match as well to the next round of samples.
Maybe I should use the context vector for each sentence in the patterns on its own while looking for matches. The original BREDS did this and just checked if the good-bad ratio was greater than 1.
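The suspected dilution can be illustrated with a toy comparison between matching against the summed pattern vector and matching against each sentence vector separately. This is a sketch under stated assumptions, not the project's or BREDS's implementation: the two-dimensional vectors are made up, cosine similarity is assumed as the similarity measure, and the good-bad ratio is not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def summed_similarity(pattern_vectors, candidate):
    """Similarity of the candidate to the element-wise sum of the pattern's vectors."""
    summed = [sum(column) for column in zip(*pattern_vectors)]
    return cosine(summed, candidate)


def best_member_similarity(pattern_vectors, candidate):
    """Similarity of the candidate to its closest individual sentence vector."""
    return max(cosine(vector, candidate) for vector in pattern_vectors)


if __name__ == "__main__":
    # Two dissimilar sentence vectors grouped into one pattern (made-up values).
    pattern = [[1.0, 0.0], [0.0, 1.0]]
    candidate = [1.0, 0.0]  # identical to the first sentence vector
    print(summed_similarity(pattern, candidate))
    print(best_member_similarity(pattern, candidate))
```

Here the candidate is identical to one member of the pattern, yet its similarity to the summed vector is lower than its similarity to that member, so a fixed threshold could reject it in the summed case and accept it in the per-sentence case.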