BioInformatics Learning Lab - BILL

Variant calling pipeline designed for teaching

The Bioinformatics Learning Lab (BILL) is a teaching unit of the Master of Bioinformatics of the University of Montpellier. Students take part in a research project analysing structural variants (SVs) and small nucleotide variants (SNVs).

They perform DNA extraction, sequencing, data analysis and interpretation of the results.

The pipeline first filters the read files, discarding reads shorter than 1,000 bp. It then aligns the remaining reads against the viral reference genome, removes unaligned reads from the alignment, and converts the result to a sorted binary format. Next, it calls variants and filters them. Finally, it merges all variant files into a single VCF file. Statistics commands run throughout the pipeline to check the quality of the data and results.
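The read-length filter described above can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not the pipeline's own implementation (seqkit is listed among the dependencies and presumably handles this step). The function names `read_fastq` and `filter_fastq_by_length` are hypothetical, and the parser assumes simple four-line FASTQ records without wrapped sequences.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator, quality string.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ input (assumption: no wrapped lines)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").rstrip("\n")
        sep = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        yield header, seq, sep, qual


def filter_fastq_by_length(
    lines: Iterable[str], min_length: int = 1000
) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence is at least min_length bases long."""
    for record in read_fastq(lines):
        if len(record[1]) >= min_length:
            yield record


if __name__ == "__main__":
    # Two made-up reads: one of 4 bp (dropped) and one of 1,000 bp (kept).
    demo = [
        "@short_read", "ACGT", "+", "IIII",
        "@long_read", "A" * 1000, "+", "I" * 1000,
    ]
    for header, seq, _, _ in filter_fastq_by_length(demo):
        print(header, len(seq))
```

The threshold is inclusive here, so a read of exactly 1,000 bp is kept, which matches "removing reads smaller than 1,000 bp".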

Pipeline

Getting Started

Dependencies

  • snakemake v7.21.0
  • seqkit v2.3.0
  • minimap2 v2.24-r1122
  • samtools v1.16.1
  • bamCoverage v3.5.1
  • plotCoverage v3.5.1
  • sniffles2 v2.2
  • tabix v1.16
  • bgzip v1.16
  • bcftools v1.16
  • medaka v1.11.3
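Before running the pipeline, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, assuming each dependency is installed as an executable on your PATH. The executable names are assumptions derived from the list (in particular, sniffles2 is assumed to install its command as `sniffles`), and `missing_tools` is a hypothetical helper name. The sketch checks presence only, not the versions.

```python
import shutil
from typing import Iterable, List

# Assumed executable names for the dependencies listed above.
# sniffles2 is assumed to provide a command called `sniffles`.
REQUIRED_TOOLS = [
    "snakemake",
    "seqkit",
    "minimap2",
    "samtools",
    "bamCoverage",
    "plotCoverage",
    "sniffles",
    "tabix",
    "bgzip",
    "bcftools",
    "medaka",
]


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```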

How to install it

Clone the repository anywhere on your local machine:

git clone git@gitlab.com:asfistonlavie/bill.git # by SSH
git clone https://gitlab.com/asfistonlavie/bill.git # by HTTPS

Copy or move all your input data to the resources/ folder: reads go in resources/inputs/ and genomic references go in resources/references/.

cp </path/to/your/reads/*.fastq> bill/resources/inputs/
cp </path/to/your/reads/*.fastq.gz> bill/resources/inputs/
cp </path/to/your/references/*.fasta> bill/resources/references/

How to use it

There are three ways to use this pipeline: (1) by file name, (2) by file type, or (3) as a whole. The main command to run the whole pipeline is snakemake --cores <nb_core_max>.

If you just want a specific file, run:

snakemake --cores <nb_core_max> <file_name>

It will automatically find the correct rule to run based on the file name. File names are constrained by the snakefile (see the wiki for correct file name format).

If you want all files of a given type, run:

snakemake --cores <nb_core_max> <file_type_name>

You can override any option in the configuration file by adding the parameter --config <option_name>=<option_value> to the snakemake command.
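The three invocation styles and the --config override can be combined when launching the pipeline from a script. The sketch below only assembles the argument list from the pieces shown above (--cores, an optional target, and --config pairs). The helper name `build_snakemake_command`, the target `example_target`, and the option `example_option` are hypothetical placeholders, not names taken from the pipeline.

```python
import shlex
from typing import Dict, List, Optional


def build_snakemake_command(
    cores: int,
    target: Optional[str] = None,
    config: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Assemble a snakemake command line as a list of arguments."""
    cmd = ["snakemake", "--cores", str(cores)]
    if target:
        # A file name or a file type name, as described above.
        cmd.append(target)
    if config:
        # --config goes last so that the target is not read as a config value.
        cmd.append("--config")
        cmd.extend(f"{key}={value}" for key, value in config.items())
    return cmd


if __name__ == "__main__":
    # Whole pipeline on 4 cores.
    print(shlex.join(build_snakemake_command(4)))
    # One target with one overridden option (both names are placeholders).
    cmd = build_snakemake_command(4, "example_target", {"example_option": 1})
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if a file name contains spaces.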

How to cite the pipeline

A Soulier, A Arnoux, C Breton, E Cherif, AS Fiston-Lavier. Variant calling pipeline designed for the Bioinformatics Learning Lab (BILL: release_2023.1). Zenodo. https://doi.org/10.5281/zenodo.10020027

Authors

Acknowledgement

We would like to thank the people involved in the BILL project, Jean-Christophe Avarre, Anne-Sophie Gosselin-Grenet and Marie-Ka Tilak, as well as all the students who took part in the project.
