
set-you-free's Introduction

Automated search across multiple databases and preprint servers to save your time during structured literature search and review.

Features

  • Coherent structure across multiple databases and preprint servers
  • Cross-reference search based on the references of your findings
  • Export to advanced structured literature engines such as Rayyan or CADIMA
  • Excludes most duplicates across databases
  • Manual selection based on publication details (see PRISMA)

Demo

example image

Requirements

  • python > 3.8
  • poetry

How to start

Navigate to the repo folder and start poetry shell

poetry shell

Install the dependencies

poetry install

Start the application via streamlit

streamlit run src/home.py

Authors

Christian Gerloff, Leon Lotter, Kashyap Maheshwari

How to cite

If you use Set You Free please cite (see Zenodo):

Gerloff C., Lotter L., & Maheshwari K. (2020). Set You Free: Automated Structured Literature Search.


set-you-free's Issues

Refine PRISMA and add wordcloud and statistics

The PRISMA chart seems to deviate from the usual graphs. We aim to adjust the graph and provide an SVG download option.
Further, the download buttons should be replaced (see #37).
A wordcloud should be added.

Update previous search

A crucial requirement for scientific reviews is to be able to update a previous literature search. This feature allows users to load previous searches, refine search parameters, and integrate new information, ensuring the literature search remains up-to-date and consistent.

The "Update search" option in the literature search system is designed to allow users to refresh or modify an existing search with new parameters or filters. Here's a detailed description of this feature:

Update Search Option

Functionality:

  • Loading Previous Searches: Users can load a previously saved search by uploading the search file. The system supports files saved in a specific format (likely a pickle format in this context).

Modification of Search Parameters:

  • Retaining Initial Search Settings: Upon loading a previous search, the system retains initial settings like the query string, selected databases, and publication date range.

  • Adjustable Search Criteria: Users can modify the search criteria. This includes changing the date range, selecting different publication types, updating API keys for databases, and altering the maximum number of papers to be fetched per database.

  • Keyword Filtering: The system allows users to update or add new keywords for filtering the search results.

  • Citation Count Filter: Users can adjust the filter for selecting top-cited papers, changing the number to include more or fewer papers based on citation counts.

Additional Settings:

  • Duplication Sensitivity: The option to set the sensitivity for considering papers as duplicates. This setting helps in managing similar papers in the search results.

Search Execution:

  • API Key Checks: The system checks for the availability of API keys for selected databases. If a key is missing, the respective database is excluded from the search.

  • Search Processing: The updated search parameters are processed, and a new search is executed. This includes fetching papers from the selected databases within the specified date range and applying all set filters.

Result Management:

  • Updating Initial Search Results: The system updates the initial search results with new findings. This includes adding new papers and updating the details of previously found papers (e.g., updated citation counts).

  • Handling Missing Papers From Initial Search: If there are papers in the initial search that are not found in the updated search, the system alerts the user and provides options to manage these discrepancies.

  • Integration with Original Ratings: If the initial search had papers marked as selected or rejected based on certain criteria, these ratings are carried over to the updated search.
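The result-management rules above can be sketched as follows. This is a minimal illustration only: it assumes papers are plain dictionaries keyed by DOI and that a manual rating lives in a hypothetical `selected` field. The data model used in the app may differ.

```python
from typing import Any, Dict, List, Tuple

Paper = Dict[str, Any]


def merge_updated_search(
    initial: Dict[str, Paper], updated: Dict[str, Paper]
) -> Tuple[Dict[str, Paper], List[str]]:
    """Merge an updated search into the initial results.

    Both arguments map a DOI to a paper record (hypothetical layout).
    Returns the merged results and the DOIs missing from the update.
    """
    merged: Dict[str, Paper] = {}
    for doi, paper in updated.items():
        record = dict(paper)  # fresh details, e.g. a new citation count
        previous = initial.get(doi)
        if previous is not None and "selected" in previous:
            # Carry the earlier manual rating over to the refreshed record.
            record["selected"] = previous["selected"]
        merged[doi] = record

    # Papers found initially but absent from the updated search.
    missing = [doi for doi in initial if doi not in updated]
    return merged, missing


initial = {
    "10.1/a": {"title": "A", "citations": 3, "selected": True},
    "10.1/b": {"title": "B", "citations": 1, "selected": False},
}
updated = {
    "10.1/a": {"title": "A", "citations": 5},
    "10.1/c": {"title": "C", "citations": 0},
}
merged, missing = merge_updated_search(initial, updated)
```

The `missing` list is what the user would be alerted about; whether such papers are kept or dropped is left to the caller.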

development packages

We plan to use the following packages & conventions:

conventions should be added to the readme

conventions

  • pep8
  • flake8
  • google doc string format

packages

  • poetry as package manager

small frontend adjustments for first integration

  • add default start date
  • bug: search string was not updated
  • start with simple table to display results (Rayyan)
  • replace download component
  • reset limit
  • consider keys
  • provide Search obj as json download
  • provide option to insert plain search string directly, e.g. [brain-to-brain] AND [hyperscanning]
  • avoid index of csv @kashyapm94
  • provide an option to remove last added chars of string @kashyapm94
  • provide an option to clean the search string @kashyapm94
  • currently no keys are used: results_as_df (set default to true) has to be created earlier or within the search part @kashyapm94
  • discuss: remove search class
  • reduce deps
  • use multi-page structure

remove search string builder

The usability of the search string creator is quite limited. We would like to provide sample strings instead.

integrate refine process

Findpapers provides some functionalities to refine the search results. We could aim to integrate this for streamlit too?

integrate findpapers in this repo

To avoid using findpapers as a dependency for this project, it has been decided to migrate & refactor the relevant code into this repo.

remove last query

For composed search strings, the last query can only be removed if the operator is not None.

Is related to #11

enable manual screening step

Add a manual selection procedure after the automated search. The user should be able to manually select the found publications according to their selection criteria.
The results should be exportable to json, ris, rayyan.

  • ensure session data
  • remove papers
  • enable export
  • adjust PRISMA
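A possible shape for the JSON and RIS export is sketched below. It assumes each selected paper is a dictionary with hypothetical `title`, `authors`, `year` and `doi` fields, and it tags every record as a journal article (`TY  - JOUR`), which will not be right for all publication types.

```python
import json
from typing import Any, Dict, List


def papers_to_json(papers: List[Dict[str, Any]]) -> str:
    """Serialise the selected papers as a JSON array."""
    return json.dumps(papers, indent=2, ensure_ascii=False)


def papers_to_ris(papers: List[Dict[str, Any]]) -> str:
    """Serialise the selected papers as RIS records."""
    lines: List[str] = []
    for paper in papers:
        lines.append("TY  - JOUR")  # assumption: every entry is a journal article
        for author in paper.get("authors", []):
            lines.append(f"AU  - {author}")
        lines.append(f"TI  - {paper['title']}")
        if paper.get("year"):
            lines.append(f"PY  - {paper['year']}")
        if paper.get("doi"):
            lines.append(f"DO  - {paper['doi']}")
        lines.append("ER  - ")  # end-of-record tag
    return "\n".join(lines) + "\n"


selected = [
    {
        "title": "An example paper",
        "authors": ["Claus, Santa N."],
        "year": 2021,
        "doi": "10.1234/example",
    }
]
ris_text = papers_to_ris(selected)
json_text = papers_to_json(selected)
```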

enrich, cross-ref., similarity and publication types options

We have recently added additional options for the search procedure. These should be integrated into the frontend. The similarity threshold should be specified in inverse form to allow interpretation as duplicate sensitivity.

  • enrich papers
  • cross-reference search
  • similarity threshold
  • publication types
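The inverse relation between similarity threshold and duplicate sensitivity could look like the sketch below. It assumes both values lie in [0, 1] and uses `difflib.SequenceMatcher` on titles purely as a stand-in similarity measure; the measure used by the search backend may be different.

```python
from difflib import SequenceMatcher


def sensitivity_to_threshold(duplicate_sensitivity: float) -> float:
    """Convert a user-facing sensitivity into a similarity threshold."""
    if not 0.0 <= duplicate_sensitivity <= 1.0:
        raise ValueError("duplicate_sensitivity must be within [0, 1]")
    # Higher sensitivity means a lower threshold, so more pairs count as duplicates.
    return 1.0 - duplicate_sensitivity


def is_duplicate(title_a: str, title_b: str, duplicate_sensitivity: float) -> bool:
    """Flag two titles as duplicates when their similarity reaches the threshold."""
    threshold = sensitivity_to_threshold(duplicate_sensitivity)
    similarity = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
    return similarity >= threshold
```

With a sensitivity of 0 only identical titles match; with a sensitivity of 1 every pair is treated as a duplicate.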

test suites

We aim to add support for:

  • codecov
  • pytest

TypeError: search() got an unexpected keyword argument 'cross_reference_search'

The search.py file raises an error because search() does not recognize the argument 'cross_reference_search'. Any idea how to fix this?

TypeError: search() got an unexpected keyword argument 'cross_reference_search'
Traceback:
File "C:\Users\Anirudh128\anaconda3\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 542, in _run_script
exec(code, module.__dict__)
File "C:\Users\Anirudh128\Desktop\GNi Internship\March to July 2024\Literature Review\set-you-free\src\pages\1_1️⃣_search.py", line 107, in
search = fp.search(None,


Suggestion to replace poetry with PDM

I would like to suggest the idea of replacing poetry with PDM.

While both are almost identical in the way they manage the dependencies, PDM provides the added benefit of using scripts, which is lacking in poetry.

Word and citation based filter

Keyword-based filtering can exclude non-relevant papers up front, before screening, while citation-based sorting draws attention to the most impactful papers in a specific field.

1. Filtering Options Using Keywords

  • Sidebar Integration: Keyword filtering integrated into the Streamlit sidebar for user convenience.
  • Tag-Based Keyword Input: Users can input keywords using a tag-based system, adding and removing tags easily.
  • Flexible Entry: Multiple keywords can be added by typing and pressing enter, allowing for a comprehensive keyword list.
  • Filtering Logic: The system filters each paper by checking if keywords are present in the title, abstract, or keywords.
  • Exclusion Criteria: Papers containing specified keywords are excluded, ensuring search results are relevant and refined.
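The exclusion logic described above can be sketched as follows. The paper layout (a dictionary with `title`, `abstract` and `keywords` fields) is an assumption, and matching is a simple case-insensitive substring check, so "rat" would also match "rate".

```python
from typing import Any, Dict, Iterable, List


def exclude_by_keywords(
    papers: List[Dict[str, Any]], excluded_keywords: Iterable[str]
) -> List[Dict[str, Any]]:
    """Drop papers whose title, abstract or keywords contain an excluded keyword."""
    needles = [k.strip().lower() for k in excluded_keywords if k.strip()]
    kept = []
    for paper in papers:
        # Join all searchable fields into one lowercase string.
        haystack = " ".join(
            [
                str(paper.get("title") or ""),
                str(paper.get("abstract") or ""),
                " ".join(paper.get("keywords") or []),
            ]
        ).lower()
        if not any(needle in haystack for needle in needles):
            kept.append(paper)
    return kept


papers = [
    {"title": "Hyperscanning in dyads", "abstract": "EEG study", "keywords": ["fNIRS"]},
    {"title": "Mouse models", "abstract": "Rodent work", "keywords": ["animal"]},
]
kept = exclude_by_keywords(papers, ["rodent"])
```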

2. Selecting Top Cited Papers

  • Citation-Based Sorting: Papers are sorted based on citation counts, with the most cited papers listed first.
  • User-Defined Citation Threshold: Users set a threshold (Top 'N' cited papers) to filter papers based on citation impact.
  • Filter Application: Only the top 'N' cited papers are considered in the final results, focusing on influential studies.
  • Adjustable Limit: The citation threshold is user-adjustable, allowing for flexible inclusion of papers based on citations.
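A minimal sketch of the top 'N' selection, assuming each paper is a dictionary with a hypothetical `citations` field and that papers without a citation count are treated as having zero citations:

```python
from typing import Any, Dict, List


def top_cited(papers: List[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Return the n most cited papers, most cited first."""
    if n <= 0:
        return []
    # sorted() is stable, so papers with equal counts keep their original order.
    ranked = sorted(papers, key=lambda p: p.get("citations") or 0, reverse=True)
    return ranked[:n]


candidates = [
    {"title": "A", "citations": 12},
    {"title": "B", "citations": None},
    {"title": "C", "citations": 40},
    {"title": "D", "citations": 7},
]
top_two = top_cited(candidates, 2)
```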

logs and states in streamlit

Aim 1. findpapers logs at INFO level which paper (e.g., 3/45) is currently being fetched or enriched with information from other sources. We could show this information to the user, because the waiting time can be quite long and the user has no idea how long it might take.

stqdm

Aim 2. Furthermore, logged exceptions (query issues etc.) that are currently only shown in the console could be displayed to the user.

st.error
st.info
st.warning
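One way to route log records to the Streamlit status elements is a small `logging.Handler` that dispatches on the record level. The sketch below uses only the standard library so it runs on its own; in the app the callbacks would be `st.info`, `st.warning` and `st.error`. The logger name is a placeholder, not necessarily the one findpapers uses.

```python
import logging
from typing import Callable, Dict, List


class CallbackHandler(logging.Handler):
    """Forward each log message to the callback registered for its level."""

    def __init__(self, callbacks: Dict[int, Callable[[str], object]]) -> None:
        super().__init__()
        self.callbacks = callbacks

    def emit(self, record: logging.LogRecord) -> None:
        # Use the callback of the highest registered level not above the record's.
        eligible = [level for level in self.callbacks if level <= record.levelno]
        if eligible:
            self.callbacks[max(eligible)](self.format(record))


# Stand-ins for st.info / st.warning / st.error so the example runs anywhere.
messages: Dict[str, List[str]] = {"info": [], "warning": [], "error": []}
handler = CallbackHandler(
    {
        logging.INFO: messages["info"].append,
        logging.WARNING: messages["warning"].append,
        logging.ERROR: messages["error"].append,
    }
)

logger = logging.getLogger("search_demo")  # placeholder logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("Fetching paper 3/45")
logger.error("Query could not be parsed")
```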

Provide abstract for screening directly

It would be convenient if the selection page automatically presented the next paper for review. Additionally, the buttons should be replaced with radio buttons.

Hosting the app

We plan on hosting the app on Heroku. Add the relevant files & connect the repo to the heroku account.

download fulltext papers

--> will be part of find papers

  • new step after selection based on abstract (see navigation)
  • adj in find papers --> own method of class paper?

use ieee & scopus as databases based on api key availability

If the user has chosen all the databases by default but has no API key for IEEE & Scopus, can we remove these databases automatically before the search begins?

Currently, if all the databases are used, the search function gets the following parameter as input:

databases = ['ACM', 'arXiv', 'bioRxiv', 'IEEE', 'medRxiv', 'PubMed', 'Scopus']

By automatically removing the above-mentioned databases if their API keys are None, the parameter becomes:

databases = ['ACM', 'arXiv', 'bioRxiv', 'medRxiv', 'PubMed']
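The automatic removal could be sketched as below. The function name and the shape of the `api_keys` mapping are assumptions for illustration; only IEEE and Scopus are treated as requiring a key, as described above.

```python
from typing import Dict, List, Optional

# Databases that can only be queried with an API key.
KEYED_DATABASES = ("IEEE", "Scopus")


def filter_databases(
    databases: List[str], api_keys: Dict[str, Optional[str]]
) -> List[str]:
    """Drop key-protected databases for which no API key was provided."""
    return [
        db
        for db in databases
        if db not in KEYED_DATABASES or api_keys.get(db)
    ]


databases = ["ACM", "arXiv", "bioRxiv", "IEEE", "medRxiv", "PubMed", "Scopus"]
without_keys = filter_databases(databases, {"IEEE": None, "Scopus": None})
with_ieee_key = filter_databases(databases, {"IEEE": "my-key", "Scopus": ""})
```

An empty string is treated like a missing key, which matches what an empty text input would return.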

Store and load search results only

We aim to add the ability to save and load results using pickle. This feature allows for easy retrieval of previous work. For now, rating information is not included. Additionally, we aim to replace the download button with HTML code to prevent unnecessary page refreshes.
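A minimal round trip with pickle might look like this. The result object here is a plain dictionary used as a placeholder for the real search object. Note that unpickling can execute arbitrary code, so only files the user created themselves should be loaded.

```python
import pickle
from typing import Any


def dump_results(results: Any) -> bytes:
    """Serialise search results to bytes suitable for a file download."""
    return pickle.dumps(results, protocol=pickle.HIGHEST_PROTOCOL)


def load_results(payload: bytes) -> Any:
    """Restore search results from the bytes of an uploaded file."""
    return pickle.loads(payload)


# Placeholder for the real search object.
results = {"query": "[brain-to-brain] AND [hyperscanning]", "papers": ["10.1/a"]}
restored = load_results(dump_results(results))
```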

update README

Update the README to include:

  • About the app
  • Getting started
  • Contribute (Before making the repo public)

PsyArxiv

The aim is to add PsyArxiv as an additional database.
This will probably go into the findpapers fork.

author name order

In our literature search (conducted 12.12.2021), references from most databases had author names in the wrong order:
Example: Santa N. Claus
Correct version: Claus, Santa N.

Imported in Zotero:

  • Correct:
    • Scopus: Claus, Santa N.
  • Incorrect:
    • medRxiv: Santa N. Claus
    • bioRxiv: Santa N. Claus
    • PubMed: Santa N. Claus
    • OpenCitations: Santa N. Claus

Imported in EndNote:

  • Correct:
    • Scopus: Claus, Santa N.
  • Incorrect:
    • medRxiv: Santa, N. Claus
    • bioRxiv: Santa, N. Claus
    • PubMed: Santa, N. Claus
    • OpenCitations: Santa, N. Claus
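A naive normalisation to "Family, Given" order could look like the sketch below. It assumes the family name is the last whitespace-separated token, which fails for multi-word family names (for example "van der Berg"), and it leaves names that already contain a comma untouched.

```python
def to_last_first(name: str) -> str:
    """Convert 'Santa N. Claus' to 'Claus, Santa N.' (naive heuristic)."""
    name = " ".join(name.split())  # collapse repeated whitespace
    if "," in name or " " not in name:
        # Already in 'Family, Given' form, or a single-token name.
        return name
    given, family = name.rsplit(" ", 1)
    return f"{family}, {given}"


fixed = to_last_first("Santa N. Claus")
```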

VIZ: Network visualization

It would be nice to be able to visualize the citations between the papers found. Such a search network could be realized via NetworkX but requires further API requests to capture the references of the individual publications. A list of DOIs as a reference list would be sufficient.

Furthermore, it would be great to add a "network without search" that basically describes the n1 neighbours of each paper. Consequently, this network would contain papers that are not in the search and should be highlighted.
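The data preparation for such a network can be sketched without any dependency. The example assumes a hypothetical mapping from each found paper's DOI to the DOIs it references; the resulting edge list could then be handed to `networkx.DiGraph(edges)` for layout and drawing.

```python
from typing import Dict, List, Tuple


def build_citation_edges(
    references: Dict[str, List[str]]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Build directed citation edges and list cited papers outside the search.

    `references` maps the DOI of each found paper to the DOIs it cites.
    """
    searched = set(references)
    edges = [(source, target) for source, refs in references.items() for target in refs]
    # Neighbours that were cited but are not part of the search results;
    # these are the nodes to highlight in the plot.
    outside = sorted({target for _, target in edges} - searched)
    return edges, outside


references = {
    "10.1/a": ["10.1/b", "10.9/x"],
    "10.1/b": ["10.9/x"],
    "10.1/c": [],
}
edges, outside = build_citation_edges(references)
```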

Example

https://www.connectedpapers.com

streamlit scaffolding

scaffolding to provide basic structure including

  • main page

  • about

main page

add fields for

  • DB checkboxes

  • date picker (start & end)

  • star textbox for API keys (IEEE, scopus)

  • search string textbox

  • join button for search string, including:

    • AND
    • OR
    • (AND
    • (OR
    • )
    • None - to add no additional operator
  • Search button

  • slider for number of total results

  • format downloading (csv, bib)

  • viewer for pandas data table (results view)

integrating findpapers

The forked version of findpapers is to be integrated with the repo as a submodule. The reason: the original repo has a few bugs, which will be improved in the forked version.

refactor & integrate searchers from findpapers

the following searchers are to be refactored & integrated in this repo

  • acm searcher
  • arxiv searcher
  • biorxiv searcher
  • cross ref searcher
  • ieee searcher
  • medrxiv searcher
  • opencitations searcher
  • pubmed searcher
  • rxiv searcher
  • scopus searcher
