christiangerloff / set-you-free

Automated search across multiple databases and preprint servers to save your time during structured literature search and review.

Python 100.00%

literature-review literature-search streamlit systematic-review

Features

Coherent structure across multiple databases and preprint servers
Cross-reference search based on the references of your findings
Export to advanced structured literature engines such as Rayyan or CADIMA
Excludes most duplicates across databases
Manual selection based on publication details (see PRISMA)

Requirements

python > 3.8
poetry

How to start

Navigate to the repo folder and start poetry shell

poetry shell

Install the dependencies

poetry install

Start the application via streamlit

streamlit run src/home.py

Authors

Christian Gerloff, Leon Lotter, Kashyap Maheshwari

How to cite

If you use Set You Free please cite (see Zenodo):

Gerloff C., Lotter L., & Maheshwari K. (2020). Set You Free: Automated Structured Literature Search.

set-you-free's Issues

Refine PRISMA and add word cloud and statistics

The PRISMA chart seems to deviate from the usual graphs. We aim to adjust the graph and provide an SVG download option.
Further, the download buttons should be replaced (see #37)
A wordcloud should be added

Update previous search

A crucial requirement for scientific reviews is to be able to update a previous literature search. This feature allows users to load previous searches, refine search parameters, and integrate new information, ensuring the literature search remains up-to-date and consistent.

The "Update search" option in the literature search system is designed to allow users to refresh or modify an existing search with new parameters or filters. Here's a detailed description of this feature:

Update Search Option

Functionality:

Loading Previous Searches: Users can load a previously saved search by uploading the search file. The system supports files saved in a specific format (likely a pickle format in this context).

Modification of Search Parameters:

Retaining Initial Search Settings: Upon loading a previous search, the system retains initial settings like the query string, selected databases, and publication date range.
Adjustable Search Criteria: Users can modify the search criteria. This includes changing the date range, selecting different publication types, updating API keys for databases, and altering the maximum number of papers to be fetched per database.
Keyword Filtering: The system allows users to update or add new keywords for filtering the search results.
Citation Count Filter: Users can adjust the filter for selecting top-cited papers, changing the number to include more or fewer papers based on citation counts.

Additional Settings:

Duplication Sensitivity: The option to set the sensitivity for considering papers as duplicates. This setting helps in managing similar papers in the search results.

Search Execution:

API Key Checks: The system checks for the availability of API keys for selected databases. If a key is missing, the respective database is excluded from the search.
Search Processing: The updated search parameters are processed, and a new search is executed. This includes fetching papers from the selected databases within the specified date range and applying all set filters.

Result Management:

Updating Initial Search Results: The system updates the initial search results with new findings. This includes adding new papers and updating the details of previously found papers (e.g., updated citation counts).
Handling Missing Papers From Initial Search: If there are papers in the initial search that are not found in the updated search, the system alerts the user and provides options to manage these discrepancies.
Integration with Original Ratings: If the initial search had papers marked as selected or rejected based on certain criteria, these ratings are carried over to the updated search.
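
The result-management rules above can be sketched as a merge keyed on a paper identifier. This is a minimal illustration and not the app's implementation: the dictionary layout, the use of `doi` as key, and the `selected` field are assumptions made for the example.

```python
def merge_searches(initial, updated):
    """Merge an updated search into the initial one.

    Both arguments are lists of dicts with a hypothetical layout:
    {"doi": str, "title": str, "citations": int, "selected": bool or None}.
    Returns (merged, missing): the merged papers, and the papers that were
    in the initial search but were not found again.
    """
    initial_by_doi = {paper["doi"]: paper for paper in initial}
    updated_dois = {paper["doi"] for paper in updated}

    merged = []
    for paper in updated:
        previous = initial_by_doi.get(paper["doi"])
        entry = dict(paper)  # refreshed details, e.g. citation counts
        # Carry over the earlier rating; new papers stay unrated (None).
        entry["selected"] = previous.get("selected") if previous else None
        merged.append(entry)

    # Papers the updated search no longer returns; the user decides what to do.
    missing = [paper for paper in initial if paper["doi"] not in updated_dois]
    return merged, missing
```

Returning the missing papers separately, rather than dropping or keeping them silently, leaves the decision to the user, as described above.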

development packages

We plan to use the following packages & conventions:

conventions should be added to the readme

conventions

pep8
flake8
google doc string format

packages

poetry as package manager

small frontend adjustments for first integration

remove search string builder

The usability of the search string creator is quite limited. We would like to provide sample strings instead.

integrate refine process

Findpapers provides some functionalities to refine the search results. We could aim to integrate this for streamlit too?

integrate topic modelling

refactor & integrate the publication model from findpapers

integrate findpapers in this repo

To avoid using findpapers as a dependency for this project, it has been decided to migrate & refactor the relevant code into this repo.

refactor & integrate the search model from findpapers

add ris download

We aim to add the RIS export (ChristianGerloff/findpapers#10) with the file extension .ris to enable import into cadima
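
As a rough sketch of what a RIS record looks like, the function below renders one paper using standard RIS tags (`TY`, `AU`, `TI`, `PY`, `DO`, `ER`). The input dictionary layout is hypothetical, and treating every record as a journal article (`JOUR`) is a simplifying assumption; this is not the export code from the findpapers fork.

```python
def paper_to_ris(paper):
    """Render one paper as a RIS record.

    `paper` is a hypothetical dict with the keys
    title (str), authors (list of str), year (int or None), doi (str or None).
    """
    lines = ["TY  - JOUR"]  # assumption: every record is a journal article
    lines += [f"AU  - {author}" for author in paper.get("authors", [])]
    lines.append(f"TI  - {paper['title']}")
    if paper.get("year"):
        lines.append(f"PY  - {paper['year']}")
    if paper.get("doi"):
        lines.append(f"DO  - {paper['doi']}")
    lines.append("ER  - ")  # end-of-record tag
    return "\n".join(lines)


def papers_to_ris(papers):
    """Join several records into the text of one .ris file."""
    return "\n".join(paper_to_ris(paper) for paper in papers) + "\n"
```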

remove last query

For composed search strings, the last query can only be removed if the operator is not None.

Is related to #11

enable manual screening step

Add a manual selection procedure after the automated search. The user should be able to manually select the found publications according to their selection criteria.
The results should be exportable to JSON, RIS, and Rayyan.

ensure session data
remove papers
enable export
adjust PRISMA

enrich, cross-ref., similarity and publication types options

We have recently added additional options for the search procedure. These should be integrated into the frontend. The similarity threshold should be specified in inverse form to allow interpretation as duplicate sensitivity.

enrich papers
cross-reference search
similarity threshold
publication types

refactor & integrate the paper model from findpapers

test suites

We aim to add support for:

codecov
pytest

TypeError: search() got an unexpected keyword argument 'cross_reference_search'

The search.py file raises an error because search() does not recognize the argument 'cross_reference_search'. Any idea how to fix this?

TypeError: search() got an unexpected keyword argument 'cross_reference_search'
Traceback:
File "C:\Users\Anirudh128\anaconda3\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 542, in _run_script
exec(code, module.__dict__)
File "C:\Users\Anirudh128\Desktop\GNi Internship\March to July 2024\Literature Review\set-you-free\src\pages\1_1️⃣_search.py", line 107, in
search = fp.search(None,

basic stats and PRISMA visualization

empty results cause Rayyan_df to be empty and app to fail

When a search query in the app returns no results, the Rayyan_df becomes empty and the app may fail.

To prevent this issue, we could implement a check for empty results before attempting to create or copy the data frame.
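
A minimal sketch of such a check is shown below. It uses plain lists of dictionaries instead of a data frame, and the column names are assumptions; the same `if not papers` guard could be placed before the data frame is created or copied.

```python
def rayyan_rows(papers):
    """Return export rows for Rayyan, or None when there is nothing to export.

    `papers` is a hypothetical list of dicts; the column names are illustrative.
    """
    if not papers:  # covers both None and an empty list
        return None
    return [
        {"title": paper.get("title", ""), "abstract": paper.get("abstract", "")}
        for paper in papers
    ]
```

A caller can then show a message (for example with st.warning) when the function returns None instead of continuing with an empty table.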

Suggestion to replace poetry with PDM

I would like to suggest the idea of replacing poetry with PDM.

While both are almost identical in the way they manage the dependencies, PDM provides the added benefit of using scripts, which is lacking in poetry.

add basic search methods

add basic search methods via findpapers fork

default DBs
API key

Word and citation based filter

Keyword-based filtering may help exclude non-relevant papers before screening, while citation-based sorting brings attention to the most impactful papers in a specific field.

1. Filtering Options Using Keywords

Sidebar Integration: Keyword filtering integrated into the Streamlit sidebar for user convenience.
Tag-Based Keyword Input: Users can input keywords using a tag-based system, adding and removing tags easily.
Flexible Entry: Multiple keywords can be added by typing and pressing enter, allowing for a comprehensive keyword list.
Filtering Logic: The system filters each paper by checking if keywords are present in the title, abstract, or keywords.
Exclusion Criteria: Papers containing specified keywords are excluded, ensuring search results are relevant and refined.
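
The filtering logic above can be sketched as follows. The paper layout (a dict with `title`, `abstract`, and `keywords`) is an assumption for illustration, and the case-insensitive substring match is one possible matching rule, not necessarily the one used in the app.

```python
def exclude_by_keywords(papers, keywords):
    """Drop every paper whose title, abstract, or keywords contain a given term.

    `papers` is a hypothetical list of dicts with the keys
    title (str), abstract (str), keywords (list of str).
    """
    terms = [k.strip().lower() for k in keywords if k.strip()]
    kept = []
    for paper in papers:
        searchable = " ".join([
            paper.get("title") or "",
            paper.get("abstract") or "",
            " ".join(paper.get("keywords") or []),
        ]).lower()
        # Keep the paper only if none of the exclusion terms occurs in it.
        if not any(term in searchable for term in terms):
            kept.append(paper)
    return kept
```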

2. Selecting Top Cited Papers

Citation-Based Sorting: Papers are sorted based on citation counts, with the most cited papers listed first.
User-Defined Citation Threshold: Users set a threshold (Top 'N' cited papers) to filter papers based on citation impact.
Filter Application: Only the top 'N' cited papers are considered in the final results, focusing on influential studies.
Adjustable Limit: The citation threshold is user-adjustable, allowing for flexible inclusion of papers based on citations.
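
A short sketch of the top 'N' selection, assuming each paper is a dict with a `citations` count (a hypothetical field name); papers without a count are treated as having zero citations.

```python
def top_cited(papers, n):
    """Return the n most cited papers, most cited first."""
    ranked = sorted(papers, key=lambda p: p.get("citations") or 0, reverse=True)
    return ranked[:n]
```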

logs and states in streamlit

Aim 1. findpapers logs at INFO level which paper (e.g., 3/45) is currently being fetched or enriched with information from other sources. We could show this info to the user, because the waiting time can be quite long and the user has no idea how long it might take.

stqdm

Aim 2. Furthermore, logged exceptions (query issues etc.) which are shown in the console could be displayed

st.error
st.info
st.warning

refactor & integrate cli from findpapers

provide numeric input to set similarity threshold

A threshold-based duplicate check is provided (see ChristianGerloff/findpapers#8 ). The user should be able to adjust the threshold [0,1] in streamlit.
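
To illustrate how a threshold in [0,1] affects a duplicate check, the sketch below compares titles with `difflib.SequenceMatcher` from the standard library. This is a stand-in technique chosen for the example and is not claimed to be the check implemented in findpapers; the default threshold is arbitrary. The second helper shows the inverse form, where the user sets a duplicate sensitivity instead of the threshold.

```python
from difflib import SequenceMatcher


def is_duplicate(title_a, title_b, threshold=0.95):
    """Treat two titles as duplicates if their similarity ratio reaches the threshold."""
    ratio = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
    return ratio >= threshold


def threshold_from_sensitivity(sensitivity):
    """Convert a duplicate sensitivity in [0, 1] into a similarity threshold.

    Higher sensitivity means a lower threshold, so more pairs count as duplicates.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be within [0, 1]")
    return 1.0 - sensitivity
```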

Provide abstract for screening directly

It would be convenient if the option to provide automatically the next paper for review is available on the selection page. Additionally, it would be better to replace the buttons with radio buttons.

Highlighting functionality for abstract screening

This feature helps users quickly spot keywords from the search string in the abstract of a paper, especially when screening longer abstracts. Use CSS to highlight the keywords.
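
One possible way to do this is to wrap each keyword occurrence in a styled `span`. The sketch below is an assumption about how it could be done, not the app's code; the function name and the default colour are made up for the example.

```python
import html
import re


def highlight_keywords(abstract, keywords, color="#ffe066"):
    """Return the abstract as HTML with each keyword wrapped in a styled span."""
    escaped = html.escape(abstract)  # keep abstract text from being read as HTML
    # Longest terms first, so a longer keyword wins over a shorter one it contains.
    terms = sorted({k.strip() for k in keywords if k.strip()}, key=len, reverse=True)
    if not terms:
        return escaped
    pattern = re.compile(
        "|".join(re.escape(html.escape(term)) for term in terms),
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: f'<span style="background-color: {color};">{m.group(0)}</span>',
        escaped,
    )
```

In Streamlit, the returned string could be rendered with `st.markdown(..., unsafe_allow_html=True)`.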

Hosting the app

We plan on hosting the app on Heroku. Add the relevant files & connect the repo to the heroku account.

Create an account on Heroku & connect it to this repo @ChristianGerloff
Adjust the README with the URL @ChristianGerloff
Add the relevant files needed for hosting the app on Heroku @kashyapm94

download fulltext papers

--> will be part of find papers

new step after selection based on abstract (see navigation)
adj in find papers --> own method of class paper?

use ieee & scopus as databases based on api key availability

If the user has chosen all the databases by default but has no API key for IEEE & Scopus, can we remove these databases automatically before the search begins?

Currently, if all the databases are used, the search function gets the following parameter as input:

databases = ['ACM', 'arXiv', 'bioRxiv', 'IEEE', 'medRxiv', 'PubMed', 'Scopus']

By automatically removing the above-mentioned databases if their API keys are None, the parameter becomes:

databases = ['ACM', 'arXiv', 'bioRxiv', 'medRxiv', 'PubMed']
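
A minimal sketch of this filtering step. The function name and the `api_keys` mapping are hypothetical; only the database names are taken from the lists above.

```python
def available_databases(databases, api_keys, key_required=("IEEE", "Scopus")):
    """Drop databases that need an API key when no key was provided.

    `api_keys` is a hypothetical mapping such as {"IEEE": None, "Scopus": "abc"}.
    """
    return [
        db for db in databases
        if db not in key_required or api_keys.get(db)
    ]


databases = ['ACM', 'arXiv', 'bioRxiv', 'IEEE', 'medRxiv', 'PubMed', 'Scopus']
filtered = available_databases(databases, {"IEEE": None, "Scopus": None})
```

With both keys set to None, `filtered` contains the five databases from the second list above.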

Store and load search results only

We aim to add the ability to save and load results using pickle, which allows easy retrieval of previous work. For now, rating information is not included. Additionally, we aim to replace the download button with HTML code to prevent unnecessary page refresh.
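
A sketch of the save and load round trip with the standard `pickle` module. The helper names are made up for the example, and the search object is represented by a plain dictionary here.

```python
import pickle


def dump_search(search):
    """Serialize a search result object to bytes that can be offered for download."""
    return pickle.dumps(search)


def load_search(data):
    """Restore a search result object from uploaded bytes."""
    return pickle.loads(data)


# Round trip with a stand-in for the real search object.
example = {"query": "[sleep] AND [memory]", "papers": [{"title": "A study"}]}
restored = load_search(dump_search(example))
```

Unpickling can execute arbitrary code, so only files from trusted sources should be loaded this way.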

update README

Update the README to include:

About the app
Getting started
Contribute (Before making the repo public)

PsyArxiv

The aim is to add PsyArxiv as an additional database.
probably into findpapers fork

author name order

In our literature search (conducted 12.12.2021), references from most databases had author names in the wrong order:
Example: Santa N. Claus
Correct version: Claus, Santa N.

Imported in Zotero:

Correct:
- Scopus: Claus, Santa N.
Incorrect:
- medRxiv: Santa N. Claus
- bioRxiv: Santa N. Claus
- PubMed: Santa N. Claus
- OpenCitations: Santa N. Claus

Imported in EndNote:

Correct:
- Scopus: Claus, Santa N.
Incorrect:
- medRxiv: Santa, N. Claus
- bioRxiv: Santa, N. Claus
- PubMed: Santa, N. Claus
- OpenCitations: Santa, N. Claus
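
A possible normalization step before export is sketched below. It is a heuristic that treats the last whitespace-separated token as the family name, so it will mis-handle multi-word surnames; the function name is hypothetical.

```python
def to_last_first(name):
    """Convert "Santa N. Claus" to "Claus, Santa N.".

    Names that already contain a comma are assumed to be in
    "Last, First" order and are returned unchanged.
    """
    name = name.strip()
    if "," in name:
        return name
    parts = name.split()
    if len(parts) < 2:
        return name  # single token, nothing to reorder
    return f"{parts[-1]}, {' '.join(parts[:-1])}"
```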

citation network

at least for scopus

VIZ: Network visualization

It would be nice to be able to visualize the citations between the papers found. Such a search network could be realized via NetworkX but requires further API requests to capture the references of the individual publications. A list of DOIs as a reference list would be sufficient.

Furthermore, it would be great to add a "network without search" that basically describes the n1 neighbours of each paper. Consequently, this network would contain papers that are not in the search and should be highlighted.
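
The two networks described above can be derived from the reference lists alone. The sketch below assumes a hypothetical mapping from each found paper's DOI to the DOIs it cites, and splits the citation edges into those inside the search and those pointing to papers outside it.

```python
def citation_edges(references):
    """Split citation edges into within-search and outside-search edges.

    `references` maps the DOI of each found paper to the list of DOIs it cites
    (a hypothetical input format).
    Returns (within, outside) as lists of (citing_doi, cited_doi) tuples.
    """
    found = set(references)
    within, outside = [], []
    for source, cited in references.items():
        for target in cited:
            if target in found:
                within.append((source, target))
            else:
                # Direct neighbour that is not part of the search results.
                outside.append((source, target))
    return within, outside
```

Both edge lists could then be handed to NetworkX, for example via `DiGraph.add_edges_from`, with the outside targets drawn in a different colour.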

Example

https://www.connectedpapers.com

streamlit scaffolding

scaffolding to provide basic structure including

main page
about

main page

add fields for

DB checkboxes
date picker (start & end)
star textbox for API keys (IEEE, scopus)
search string textbox
join button for search string, including:
- AND
- OR
- (AND
- (OR
- )
- None (adds no additional operator)
Search button
slider for number of total results
format downloading (csv, bib)
viewer for pandas data table (results view)