pipeline-resources's Introduction

Bioinformatics Pipelines and Visualization Working Group Resources

Overview

This repository hosts PHA4GE-developed guidance documents and resources that address common challenges regarding the integration of bioinformatics solutions for the global public health community.

Rationale

As public health bioinformatic workflows become increasingly complicated, efforts are needed to promote sensible standardization, portability and reproducibility of assays and workflows across a range of environments, contexts and resource conditions.

SARS-CoV-2 Resources

The PHA4GE Pipelines and Visualization Working Group has created this document to highlight critical open-source and open-access resources to aid in the understanding and further analysis of the Omicron variant.

SARS-CoV-2 recombinants have garnered the attention of the public health community largely due to their unknown clinical and epidemiological implications. This uncertainty emphasizes the need to detect and characterize recombinant SARS-CoV-2 genomes, but the ability to do so rapidly and systematically is not without challenges. Often, recombinant genomes receive an “Unassigned” Pango lineage, a non-recombinant Pango lineage, or an incorrect recombinant lineage assignment. Additionally, determining the site of recombination within the genome can be difficult for those without extensive SARS-CoV-2 bioinformatics experience.

The PHA4GE Pipelines and Visualization Working Group has created this document as an attempt to highlight critical sources of information and open-source/access resources to aid in the analysis and surveillance of potential recombinant specimens.

In an attempt to assist this integration process, the bioinformatics pipeline and visualization working group of the Public Health Alliance for Genomic Epidemiology (PHA4GE) has drafted this living document to help define the major bioinformatics challenges for SC2 genomic analysis and suggest various open-source and freely available bioinformatics resources to address them.

The US Centers for Disease Control and Prevention's Technical Outreach and Assistance for States Team (TOAST) developed benchmark datasets for SARS-CoV-2 sequencing, which are designed to help users at varying stages of building sequencing capacity. Rather than duplicating these efforts, the PHA4GE bioinformatics pipeline and visualization working group will be working alongside TOAST members to maintain and improve upon the currently available validation datasets.

In an attempt to assist with quality control (QC) measures, the bioinformatics pipeline and visualization working group of the Public Health Alliance for Genomic Epidemiology (PHA4GE) has drafted this living document to help define the QC challenges for SC2 genomic analysis and suggest QC system solutions to address them.

Informing Public Health Action

{In development}

Mpox Resources

In an attempt to assist this integration process, the bioinformatics pipeline and visualization working group of the Public Health Alliance for Genomic Epidemiology (PHA4GE) has drafted this living document to help define the major bioinformatics challenges for Mpox genomic analysis and suggest various open-source and freely available bioinformatics resources to address them.

HIV Resources

Understanding the HIV genome, its evolutionary dynamics, and its subtypes is essential for designing bioinformatic processes. Here, we present a set of resources to help springboard researchers into the world of HIV bioinformatics.

Bioinformatics Development

In an attempt to assist software developers, the Bioinformatics Pipelines and Visualization Working Group of the Public Health Alliance for Genomic Epidemiology (PHA4GE) has proposed a set of best practices, tailored specifically for public health bioinformatics pipelines. These best practices aim to provide a guidance framework for the development, testing, and maintenance of bioinformatics software. By adhering closely to these best practices, developers can enhance the quality, reliability and sustainability of their software, facilitating impact in public health research.

Contributing

Contributions to the documents are more than welcome. To propose a change, edit the source files and open a pull request with the proposed changes.

If you're interested in participating in further discussions, please feel free to join the Working Group.

pipeline-resources's People

Contributors

bede, bwlang, cinnetcrash, dmaccannell, dpark01, emily-smith1, fmaguire, frankambrosio3, jamietyger, jkspinler, kapsakcj, kevinlibuit, marcniebel, pvanheus, rmcolq, rpetit3, svarona, tralynca

pipeline-resources's Issues

TB Reference Compilation

In the TB Guidance document, references are currently listed in comments. We would like to compile them into a ref section and add ref indicators in the body of the doc.

Definition of how to meet and verify standards

The pipeline standards descriptions are fantastic, but could be improved by explicitly stating exactly how the standard can be met and how it can be verified that the standard has in fact been met.

Here I propose including two additional statements for each standard:
"To meet this standard..."
"To verify this standard..."

This will clear up ambiguities for both developers and reviewers. The goal is to define what pieces of information must be available to reviewers to receive verification of standards being met.

Include MicrobeTrace as a bioinformatics solution for Mpox

Please include MicrobeTrace (https://microbetrace.cdc.gov/MicrobeTrace/) as a bioinformatics solution for Mpox virus (MPXV) genomic analysis. Our MicrobeTrace team at CDC recommends that users load a distance matrix or distance-based edge list in CSV format. MicrobeTrace performs better with edge lists than with large genome sequence alignments in FASTA format. We're also implementing code to generate distance-based edge lists from the Nextstrain tree output.

Contact our team for a consultation or technical support at [email protected].
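
As a rough illustration of the recommendation above, the following Python sketch converts a small symmetric distance matrix into a distance-based edge list and serializes it as CSV. The sample names, the distances, the optional `threshold` parameter, and the column headers `source`, `target` and `distance` are all assumptions made for this example; check the MicrobeTrace documentation for the headers it actually expects.

```python
import csv
import io
from itertools import combinations


def matrix_to_edge_list(names, matrix, threshold=None):
    """Return (source, target, distance) tuples, one per unordered sample pair.

    Pairs with a distance above `threshold` are dropped when a threshold is given.
    """
    edges = []
    for i, j in combinations(range(len(names)), 2):
        distance = matrix[i][j]
        if threshold is None or distance <= threshold:
            edges.append((names[i], names[j], distance))
    return edges


def edges_to_csv(edges):
    """Serialize the edge list as CSV text with a header row (assumed column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "distance"])
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical samples and pairwise distances, for illustration only.
names = ["sample_A", "sample_B", "sample_C"]
matrix = [
    [0, 2, 7],
    [2, 0, 5],
    [7, 5, 0],
]

edges = matrix_to_edge_list(names, matrix, threshold=5)
print(edges_to_csv(edges))
```

An edge list only stores the pairs that pass the threshold, so it is usually far smaller than a full alignment or a full matrix for large datasets.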

HIV Guidance Document - Workflows for HIV analysis

Many tools exist for HIV analysis. Recently, the advent of workflow managers has led to a large number of analysis workflows composed of numerous tools. A description of these workflows and their internal components would be helpful for newer HIV bioinformatics scientists.

HIV Guidance Doc - Markdown formatting of Case Studies Section

📌 Explain the Request

Reformat the case studies section of the HIV guidance document to be more reader-friendly. Add section breaks and titles to each case study.

📚 Context

Currently the case studies appear to all be part of the same section and they are not easily distinguishable by the reader.

📈 Desired Contribution

A fork of the main PHA4GE repo including Markdown changes that reformat this section into a more readable version.

ℹ️ Additional Information

No new sections need to be added, just reformatting.

Issue and Pull Request Submission Guide

Add a guide to submit issues and pull requests as part of the document maintenance framework.

Issues can be submitted in order to request changes or additions to the documents and should be tagged with the documents they are referring to.
Pull Requests will allow for change tracking and collaborative review of any changes implemented into any document. The PRs should also be tagged with the issue they are "fixing" and they should reference the document under modification in the PR title.

This guide should be added to the README of the pipeline-solutions main page.

Issue and PR templates would be a nice addition.

Observations about Proposed Standards for Public Health Bioinformatics Software

My group and I have been reviewing the Proposed Standards for Public Health Bioinformatics Software document from the perspective of a team dedicated to developing analysis pipelines. We have some observations, offered in our humble opinion and based on our experience, that we hope will help in its development.

We believe it needs to be clearly defined whether these are minimum requirements, best practices, or guidelines, which we think is already under discussion in the meetings. We also think it should be clarified whether these are standards for pipelines or for software, as some points may not apply to pipelines, and vice versa.

We also believe that, in addition to indicating how this is going to be evaluated, as Frank is working on in his PR (#32), it could be useful to provide another section per point with resources, such as links or documents, that can assist developers with each of the standards.

Next, I will describe our observations on some of the points:

  • Version Control: Links like https://semver.org/spec/v2.0.0.html and https://keepachangelog.com/en/1.0.0/ could be added regarding the CHANGELOG.
  • Commitment to Maintain: Perhaps it could be replaced with a section like "Maintenance Capability," as even if the commitment to maintain exists, it may not be fulfilled due to external factors or bad faith. Perhaps a description or demonstration of how the pipeline will be maintained is sufficient, stated in the README and taking into account the pipeline's background (research project, master project, community...).
  • Documentation for Local Installation and/or Remote Access (e.g. Web Server or Galaxy/Terra Workflow): Some recommendations like pip, conda, etc., could be included for installation.
  • Software Performance: We do not believe that documenting this should be a minimum for a pipeline. Besides, it is relatively difficult for a small group of developers working alone, and it seems to depend more on obtaining feedback from other groups or users.
  • Common File Formats: Is this going to be reviewed with a list of formats? A list would need to be provided in the standard's definition but also kept up to date over time, which seems complicated but also relevant.
  • Software Security and Vulnerabilities: We do not see this as necessary for a pipeline. Perhaps for a website or database, but even then, in many cases, it is carried out by the security department of the institution externally to the code itself.
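
To make the Version Control point more concrete, here is a minimal Python sketch of how a reviewer might check that a release tag follows the MAJOR.MINOR.PATCH shape described at semver.org and compare two releases. The function name `parse_version` and the example tags are hypothetical, and the regular expression is a simplification: it accepts optional pre-release and build suffixes but does not enforce every rule of the specification, and it ignores pre-release precedence when comparing.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH without leading zeros, followed by
# optional "-prerelease" and "+build" suffixes that are only loosely validated.
SEMVER_PATTERN = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)


def parse_version(text):
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = SEMVER_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a semantic version: {text!r}")
    return tuple(int(match.group(i)) for i in (1, 2, 3))


# Hypothetical release tags, for illustration only.
old_release = parse_version("1.9.3")
new_release = parse_version("1.10.0")

# Tuples compare numerically, so 1.10.0 sorts after 1.9.3
# (a plain string comparison would get this wrong).
print(new_release > old_release)

# A major-version change signals a breaking release under semantic versioning.
print(new_release[0] != old_release[0])
```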

Here is just a proposed reorganization to reduce the list to 10, which I believe was one of the next objectives:

  1. Publicly-Accessible Repository
  2. Version Control
  3. Pipeline Documentation
    • Open-Source License
    • Contribution, Authorship, and Verified Point of Contact
    • Maintenance Capability
    • Conflict of Interest Statement
  4. Pipeline Guidelines
    • Documentation for Local Installation and/or Remote Access
    • Software Functionality
    • Statement of Need with Respect to Public Health Pathogen Genomics
    • Example Usage
    • Container/Packaged Software
  5. Software Testing
  6. Community Guidelines for Contribution and Support
  7. Benchmark/Validation Datasets
  8. Common File Formats
  9. Reference Data Requirements

Discuss masking and handling of low quality data

Primer erosion with Omicron is showing itself to be fairly important. In some cases, masking problematic amplicons may help improve the interpretability of the data. Is it worth bringing that out here? It is hard to do without a fairly detailed tour through the various amplicon strategies and how they're faring.
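
For readers unfamiliar with the idea, the sketch below shows one simple way to mask regions of a consensus sequence by replacing the affected bases with `N`. It is only an illustration: the toy sequence and the region coordinates are made up, the function name `mask_regions` is hypothetical, and coordinates are assumed to be 1-based and inclusive. Real amplicon coordinates would come from the primer scheme in use; note that BED files use 0-based, half-open coordinates, so they would need converting first.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Replace bases inside each 1-based inclusive (start, end) region with mask_char."""
    bases = list(sequence)
    for start, end in regions:
        if start < 1 or end > len(bases) or start > end:
            raise ValueError(f"region ({start}, {end}) does not fit the sequence")
        # Slice assignment with an equal-length string keeps the sequence length unchanged.
        bases[start - 1:end] = mask_char * (end - start + 1)
    return "".join(bases)


# Toy consensus and hypothetical problem regions, for illustration only.
consensus = "ACGTACGTACGTACGTACGT"
problem_amplicons = [(5, 8), (15, 16)]

masked = mask_regions(consensus, problem_amplicons)
print(masked)
```

Masking keeps the genome length and coordinates intact, so downstream tools still see every position but treat the masked ones as missing data rather than as confident calls.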

HIV Guidance Document - Add references for Bioinformatics Tools

In the HIV guidance document, a number of bioinformatics tools are described in terms of their application to analyzing HIV genomes. Each of these tools should have a reference, whether it be a publication or a GitHub page.

In this issue we request that each tool be given a reference with link.

HIV Guidance Doc

📌 Explain the Request

There exists a need to encourage burgeoning bioinformaticians to pursue the challenges presented by HIV genomics. Here, we present a guidance document designed to help public health scientists interested in performing genomic characterization, antiviral resistance detection, phylogenetic tree construction, and genomic epidemiological investigation involving HIV Next Generation Sequencing data.

📚 Context

There exists a need to encourage burgeoning bioinformaticians to pursue the challenges presented by HIV genomics.

📈 Desired Contribution

We request a guidance document designed to help public health scientists interested in performing genomic characterization, antiviral resistance detection, phylogenetic tree construction, and genomic epidemiological investigation involving HIV Next Generation Sequencing data.

ℹ️ Additional Information

Please include descriptions of common open-source bioinformatics software as well as case study examples.

HIV Guidance Document - Add a Table of Contents

📌 Explain the Request

We would like a table of contents added to the HIV Guidance document with links to all sections.

📚 Context

Currently the document is a bit difficult to navigate for a newer GitHub user, so a table of contents would enhance accessibility.

📈 Desired Contribution

A fork of the main PHA4GE repo including Markdown code adding a table of contents to the HIV Guidance Document.

ℹ️ Additional Information

https://github.blog/changelog/2021-04-13-table-of-contents-support-in-markdown-files/

Influenza Guidance Document - Request for feedback

📌 Explain the Request
Reformat the doc where it is needed. The doc is a living doc so don't hesitate to propose changes.

Additional Information
If you think that some sections need review and editing please do so.
