Bio-Courses

Introduction

This page compiles links to tutorials, written by numerous authors, covering many of the steps involved in whole genome sequence (WGS) analysis of prokaryotic organisms. Some of these steps involve concepts that apply to whole genome sequencing of other organisms (e.g. read QC), although in many cases the recommended software would be different. The first step for an aspiring bioinformatician of any level is to build familiarity with the Linux command line, which provides access to powerful and flexible tools and applications.

Disclaimer

The links and tutorials listed below were not written, and are not owned, by the author of this page unless explicitly noted. We take no responsibility for their maintenance or accuracy.

Content

  1. Linux command line
  2. Programming
    1. Python
    2. Perl
    3. R
  3. Core Concepts in WGS
    1. Whole Genome Sequencing (WGS)
    2. Library Preparation
    3. Sequencing Technology
    4. Coverage
  4. Sequencing Reads
    1. Short Reads
    2. Long Reads
    3. Read QC
  5. Mapping and Variant Calling
  6. Assembly
  7. Assembly QC
  8. Annotation
  9. Phylogenomics
  10. Pangenomics
  11. K-mer and related
  12. Databases
    1. NCBI
    2. ENA
    3. BIGSdb
    4. Enterobase
  13. Servers
    1. EDGE

Command-line tutorials

Familiarity with the Linux command line is usually the first step for budding informaticians. Many tools are designed or distributed only for Linux-based systems. In addition, many powerful command-line operations, such as iterating through batches of files, can dramatically simplify workflows and reduce the manual effort they require.

Programming

Learning a programming language lets an informatician approach analysis workflows more flexibly. Scripts can automate many complex tasks in a more bespoke way than loops on the command line. There are excellent tutorials online for many languages. Python is widely regarded as one of the most popular and versatile languages for bioinformatics, with Perl a (debatably) close second. R is often used to perform advanced statistical analyses and to produce publication-worthy figures.
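
As an illustration of automating a batch task with a script, the following minimal Python sketch counts the reads in every FASTQ file in a directory. The function names, the "*.fastq" pattern and the assumption that files are uncompressed with exactly four lines per record are choices made for this example only; they do not come from any of the linked tutorials.

```python
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in an uncompressed FASTQ file, assuming 4 lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def summarise_directory(directory, pattern="*.fastq"):
    """Return {file name: read count} for every file in directory matching pattern."""
    return {
        path.name: count_fastq_reads(path)
        for path in sorted(Path(directory).glob(pattern))
    }


if __name__ == "__main__":
    # Print one tab-separated line per FASTQ file in the current directory.
    for name, n_reads in summarise_directory(".").items():
        print(f"{name}\t{n_reads}")
```

Gzipped files (the more common case in practice) would need to be opened with the standard library's gzip module instead of open.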

Perl

Python

R

  • R for beginners – basic introduction to R and statistical analysis.
  • ggplot2 tutorial – an incredibly flexible and powerful family of packages for creating figures using the grammar of graphics.

Core Concepts in WGS

Whole genome sequencing

Library preparation

Sequencing technology

Coverage

Sequence coverage, or depth of coverage, is the number of times a base in the target genome is covered by a read. For example, 30x coverage means that, on average, each base in your sample is covered by 30 reads.
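
Expected mean coverage is commonly estimated as (number of reads × read length) / genome size. The short Python sketch below applies that estimate; the function names and the example figures (a 5 Mb genome and 150 bp reads) are illustrative assumptions, and the calculation ignores unmapped reads, trimming and uneven coverage.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Estimate mean depth of coverage as total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Estimate how many reads are needed to reach a target mean coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative numbers only: 1,000,000 reads of 150 bp over a 5 Mb genome.
print(mean_coverage(1_000_000, 150, 5_000_000))  # 30.0
print(reads_needed(30, 150, 5_000_000))          # 1000000
```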

Types of Reads

Short Reads

Long Reads

Read QC

  • FastQC – an introduction to FastQC, a tool for assessing multiple read quality metrics.
  • Trimmomatic manual – a tool for trimming reads and removing adapter sequences.
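
To give a sense of what a read quality metric is, the sketch below decodes a FASTQ quality string into Phred scores and averages them. It assumes the Phred+33 encoding used by current Illumina output; the function names are hypothetical, and this is not how FastQC or Trimmomatic are implemented.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def mean_read_quality(quality_string, offset=33):
    """Return the arithmetic mean Phred score of one read."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        raise ValueError("quality string is empty")
    return sum(scores) / len(scores)


def error_probability(phred_score):
    """Convert a Phred score Q to a base-call error probability, 10^(-Q/10)."""
    return 10 ** (-phred_score / 10)


# 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
print(phred_scores("I!"))         # [40, 0]
print(mean_read_quality("II!!"))  # 20.0
```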

Mapping and Variant Calling

  • Snippy – a tool for read mapping (using BWA) and variant calling.

Assembly

Assembly QC

Annotation

Phylogenomics

Pangenomics

K-mer and related

Databases

NCBI

ENA

BIGSdb

Enterobase

Servers

EDGE

Contributors

sionbayliss