Giter VIP home page Giter VIP logo

shashank140195 / biomedical_qa_model Goto Github PK

View Code? Open in Web Editor NEW
1.0 1.0 0.0 4.46 MB

built a retriever which can efficiently retrieve the relevant Pubmed abstracts for a given biomedical domain question. performed MIPS to retrieve the top 10 passages and measured the accuracy of those 10 retrieved passages. We observed the top 10 accuracy is 0.69 based on our test dataset.

Python 0.31% Jupyter Notebook 99.69%

biomedical_qa_model's Introduction

Biomedical_QA_Model (BioBert, BERT, FacebookBERT)

Link to the paper: https://arxiv.org/abs/2308.04028

Question answering (QA) is a task that answers factoid (what, when, where etc.) questions using a large collection of documents. It aims to provide precise answers in response to the user’s questions in natural language.

On the web, there is no single article that could provide all the possible answers available on the internet to the question of the problem asked by the user. Most search engines provide top results for an open-domain question but may also provide some biased articles which may or may not be directly relevant to the question asked by the user. These biased articles may be based on various criteria like advertised articles, paid articles to come on top results, the number of times a particular site is visited, and many more.

TRAINING It is essentially a metric learning issue to train the encoders such that the dot-product similarity becomes a decent ranking function for retrieval. By learning a greater embedding function, the purpose is to develop a vector space wherein relevant pairings of questions and passages have a shorter distance (i.e., higher similarity) versus irrelevant ones.

Hard Negative During inference, the retriever needs to identify positive (or relevant) passages for each question from a large collection containing millions of candidates. However, during training, the model is learned to estimate the probabilities of positive passages in a small candidate set for each question, due to the limited memory of a single GPU (or other device). To reduce such a discrepancy, we tried to design specific mechanisms for selecting a few hard negatives from the top-k retrieved candidates. To find the hard negatives, we sub sampled the 210K Pubmed articles randomly and ran the BM25 model to find the hard negatives on them. BM25 gives the score how relevant a passage is given a query. Top passage which do not contain the answer was taken as hard negative for the given query.

compared different types of pre-trained language models that are able to encode questions and contexts to high-dimensional vectors. Due to time limit, we only indexed 4421 Pubmed abstracts. We performed MIPS to retrieve the top 10 passages and measured the accuracy of those 10 retrieved passages. We observed the top 10 accuracy is 0.69 based on our test dataset.

biomedical_qa_model's People

Contributors

shashank140195 avatar

Stargazers

Kunlun Zhu avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.