trec-doc-2-doc-relevance's People

Contributors

ljgarcia, rohitharavinder, talhamohsin, timfe, two-kay

trec-doc-2-doc-relevance's Issues

Double check doc-2-doc assessment dataset

Please take a look at the description of the dataset for doc-2-doc assessment. For instance, for topic 100 I was expecting only 7 reference documents (see https://github.com/zbmed-semtec/TREC-doc-2-doc-relevance/blob/main/code/process.md), but there are 22 (i.e., all of the topic's relevant documents). Also, each reference document should have 20 documents to be assessed, but that is not always the case. Thanks.

Subtasks in this issue

  • Choose the right number of reference documents per topic
  • Each reference article should have exactly 20 documents to be assessed: randomly select 6 to 9 from the topic's relevant documents, 6 to 9 from its partially relevant documents, and the rest (to complete 20) from its non-relevant documents
  • Recreate the database used by the app

Dataset creation

Build Dataset for assessment backend

  1. Build topic_reference_pmid.tsv
    - Two-column TSV with Topic No. and PMID
    - Includes randomly selected reference candidates; the number per topic is given in the TREC Topics List
  2. Build topic_reference_and_documents.tsv
    - For each reference article in topic_reference_pmid.tsv, 20 articles for assessment are randomly selected (6-9 definitely relevant, 6-9 partially relevant, the rest irrelevant)
    - Order by the first column, then the second, and then the third.
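The selection rule in step 2 can be sketched as follows. This is only an illustration, not the project's actual script: the function name `build_rows`, the three-column row layout (topic, reference PMID, candidate PMID), and the decision to raise an error when a topic has too few non-relevant documents are all assumptions. The reference article is excluded from every pool so it cannot appear among its own candidates.

```python
import random


def build_rows(topic, reference_pmid, relevant, partial, non_relevant,
               total=20, rng=None):
    """Pick `total` candidate PMIDs for one reference article.

    Hypothetical helper: 6-9 relevant, 6-9 partially relevant, and the
    remainder non-relevant, never including the reference itself.
    """
    rng = rng or random.Random()

    # Drop the reference article from every pool before sampling.
    rel = [p for p in relevant if p != reference_pmid]
    par = [p for p in partial if p != reference_pmid]
    non = [p for p in non_relevant if p != reference_pmid]

    # Cap the random counts by what the topic actually offers (assumption).
    n_rel = min(rng.randint(6, 9), len(rel))
    n_par = min(rng.randint(6, 9), len(par))
    n_non = total - n_rel - n_par
    if n_non > len(non):
        raise ValueError(f"topic {topic}: not enough non-relevant documents")

    chosen = rng.sample(rel, n_rel) + rng.sample(par, n_par) + rng.sample(non, n_non)

    # Tuples sort by first, then second, then third column.
    return sorted((topic, reference_pmid, pmid) for pmid in chosen)
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which helps when the set for a single topic has to be regenerated.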

Make sure database is accessible outside the container

Once the app moves to a container, how can we get the data from the database corresponding to all users?
Options:

  • Keep the database outside the container and start the container in such a way that it accesses the 'external' database
  • Document a process for accessing the running instance to copy the database. This might be the simplest option, but it might require stopping the app so that the database is not locked, even though a copy is a read-only action.
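For the second option, a copy could be taken without stopping the app if the database turns out to be SQLite. The issue does not name the database engine, so that is an assumption, as are the function name `snapshot_database` and the file paths. The sketch uses the standard library's `sqlite3.Connection.backup`, which copies a consistent snapshot while other connections remain open.

```python
import sqlite3


def snapshot_database(source_path, target_path):
    """Copy a SQLite database to `target_path` using the online backup API.

    Assumes the app stores its data in a single SQLite file.
    """
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(target_path)
    try:
        # Copies all pages of the main database into the target file.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

If the database file lives on a volume mounted from the host, as in the first option, the same function can be run from outside the container.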

Reference articles and to-be-assessed articles duplicate error

  1. Fix the algorithm and regenerate candidates only for those topics that Olga has not touched
  2. For those topics that Olga has already processed (even if incomplete), please manually remove the candidate article that is the same as the reference article. Choose the replacement randomly within the topic.
  3. Update the database with the manual changes from step 2 and the new sets from step 1
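The replacement of a candidate that duplicates its reference article could be checked or automated along these lines. This is a sketch under assumptions: rows are taken to be (topic, reference PMID, candidate PMID) tuples, `topic_pool` is a hypothetical mapping from topic to all PMIDs available in that topic, and the replacement is drawn from the whole topic without regard to relevance class, since the issue only says "randomly within the topic".

```python
import random


def replace_self_references(rows, topic_pool, rng=None):
    """Swap any candidate equal to its reference for a random unused PMID."""
    rng = rng or random.Random()

    # Track which candidates each (topic, reference) pair already uses.
    used = {}
    for topic, ref, cand in rows:
        used.setdefault((topic, ref), set()).add(cand)

    fixed = []
    for topic, ref, cand in rows:
        if cand == ref:
            in_use = used[(topic, ref)]
            options = [p for p in topic_pool[topic]
                       if p != ref and p not in in_use]
            if not options:
                raise ValueError(f"topic {topic}: no replacement available")
            cand = rng.choice(options)
            in_use.add(cand)  # avoid picking the same replacement twice
        fixed.append((topic, ref, cand))
    return fixed
```

Rows that are already valid pass through unchanged, so assessments recorded against them are not affected.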

Display Progress Status

  • Display arrows for next/prev articles
  • Progress bar at assessment article
  • Display progress for reference article (needs database)
  • Display progress for overall topic (needs database upgrade)
