fei0810 / triti-map

A Snakemake-based pipeline for gene mapping in Triticeae.

License: MIT License

Python 71.22% Shell 22.49% R 6.18% Dockerfile 0.11%

bioinformatics snakemake epigenetics variant-analysis genomics

triti-map's Introduction

Triti-Map is a Snakemake-based pipeline for gene mapping in Triticeae. It comprises a suite of user-friendly computational packages and a web interface that integrates multi-omics data from Triticeae species, including genomic, epigenomic, evolutionary, and homologous information.

Triti-Map can efficiently identify trait-related genes or functional elements that are absent from the reference genome, reducing the time and labor required for gene mapping in species with large genomes.

More thorough information and explanations are provided in the Triti-Map Wiki.

Figure: Triti-Map workflow overview

Figure: Triti-Map optimization steps that address specific challenges of Triticeae gene mapping

Getting Started with Triti-Map

Installation

Installing from Bioconda

To install Triti-Map, you need a UNIX environment with Bioconda configured. See how to install Conda.

# create new environment and install Triti-Map
conda create -c conda-forge -c bioconda -n tritimap tritimap
# activate Triti-Map environment
conda activate tritimap
# test Triti-Map
tritimap --help

Installing from Docker

You can also use Triti-Map via Docker. See how to install Docker, then download and run this image using the following commands:

# docker pull command
docker pull fei0810/tritimap:v0.9.7
# run docker
docker run -i -t fei0810/tritimap:v0.9.7 /bin/bash

Installing from GitHub

# download Triti-Map
git clone https://github.com/fei0810/Triti-Map.git
cd Triti-Map
# install Triti-Map
python setup.py install

When installing from source, you must install Triti-Map's other dependencies yourself. The required software is listed in tritimap_env.yaml.

Preparing relevant files

Genome and annotation files

Download the genome and annotation files you need. Here are some links for downloading the genomes of Triticeae species.

Building GATK and samtools index files

# for example
# GATK index
gatk CreateSequenceDictionary -R /genome/path/genome.fasta
# samtools index
samtools faidx /genome/path/genome.fasta
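
Before starting a run, it can help to confirm that both index files exist. The sketch below is a minimal check in Python. It assumes the default output names of the two commands above: genome.fasta.fai from samtools faidx, and genome.dict from gatk CreateSequenceDictionary when no output path is given. The function names are illustrative and not part of Triti-Map.

```python
from pathlib import Path


def expected_index_files(genome_fasta):
    """Return the index paths the two commands above write by default (assumed names)."""
    fasta = Path(genome_fasta)
    return [
        Path(str(fasta) + ".fai"),   # samtools faidx appends .fai to the FASTA name
        fasta.with_suffix(".dict"),  # CreateSequenceDictionary swaps the extension for .dict
    ]


def missing_index_files(genome_fasta):
    """Return the expected index files that are not present on disk."""
    return [p for p in expected_index_files(genome_fasta) if not p.is_file()]


if __name__ == "__main__":
    # Placeholder path taken from the commands above; replace with your own.
    for path in missing_index_files("/genome/path/genome.fasta"):
        print(f"missing index file: {path}")
```

An empty list from missing_index_files means both files were found next to the FASTA file.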

For DNA-seq data, build a bwa-mem2 index.

# for example
bwa-mem2 index /genome/path/genome.fasta

For RNA-seq data, build a STAR index.

# for example
STAR --runThreadN 30 --runMode genomeGenerate \
--genomeDir /star/index/path/genome_star \
--genomeFastaFiles /genome/path/genome.fasta \
--sjdbOverhang 100 \
--sjdbGTFfile /annotation/path/genome.gtf \
--genomeChrBinNbits 18 \
--limitGenomeGenerateRAM 50805727274
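
Two of the values above depend on your data and machine. The STAR manual describes the ideal --sjdbOverhang as the read length minus 1 (and notes that the default of 100 works well in most cases), and --limitGenomeGenerateRAM is given in bytes. The helper names below are hypothetical; this is only a small sketch of the arithmetic in Python.

```python
def sjdb_overhang(max_read_length):
    """Ideal --sjdbOverhang according to the STAR manual: read length minus 1."""
    if max_read_length < 2:
        raise ValueError("read length must be at least 2")
    return max_read_length - 1


def bytes_to_gib(n_bytes):
    """Convert a byte count (as passed to --limitGenomeGenerateRAM) to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    print(sjdb_overhang(101))                   # 100, the value used above
    print(round(bytes_to_gib(50805727274), 1))  # 47.3 (GiB requested above)
```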

Configuration files

# generate configuration files in the current directory
tritimap init

# or generate configuration files in a specified working directory
tritimap init -d /your/work/path

After tritimap init runs successfully, the working directory contains three configuration files that you need to modify.

config.yaml: Triti-Map configuration file
sample.csv: Sample information file
region.csv: Chromosome region file used to filter the raw results (required only when running Assembly Module alone)

The Triti-Map wiki contains detailed information about the meaning and usage of each parameter in the configuration file.

Running Triti-Map

conda activate tritimap
# running directory
cd /your/work/path

# three types of analysis method

# running both Interval Mapping Module and Assembly Module
tritimap run -j 30 all
# only running Interval Mapping Module
tritimap run -j 30 only_mapping
# only running Assembly Module
tritimap run -j 30 only_assembly

Triti-Map supports three analysis modes.

tritimap run -j 30 only_mapping: run only the Interval Mapping Module if you only need to identify trait-associated intervals and mutations.
tritimap run -j 30 only_assembly: run only the Assembly Module if you only need to identify new genes associated with the trait.
tritimap run -j 30 all: run the Interval Mapping Module and the Assembly Module together.
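
If you launch Triti-Map from a script, the three modes above can be wrapped as follows. This is a minimal sketch that only uses the command form shown in this README (tritimap run -j N MODULE); the function names are hypothetical and not part of Triti-Map.

```python
import subprocess

# The three module names documented above.
MODULES = ("all", "only_mapping", "only_assembly")


def build_command(module, jobs=30):
    """Build the tritimap command line for one of the documented modules."""
    if module not in MODULES:
        raise ValueError(f"unknown module {module!r}; expected one of {MODULES}")
    if jobs < 1:
        raise ValueError("jobs must be a positive integer")
    return ["tritimap", "run", "-j", str(jobs), module]


def run_tritimap(module, workdir, jobs=30):
    """Run Triti-Map inside workdir; raises CalledProcessError if it fails."""
    subprocess.run(build_command(module, jobs), cwd=workdir, check=True)
```

For example, run_tritimap("only_mapping", "/your/work/path") corresponds to the second command above. Validating the module name first avoids starting a long run with a mistyped target.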

Note: The Triti-Map pipeline may take a long time to run (1 to 2 days). The screen command is useful for keeping such a long-running process alive after you disconnect. Learn more about GNU Screen.

Exploring Triti-Map's results

The complete directory layout of Triti-Map results is shown below.

├── results
│   ├── 01_cleandata
│   ├── 02_mergedata
│   ├── 03_mappingout
│   ├── 04_GATKout
│   ├── 05_vcfout
│   ├── 06_regionout
│   ├── 07_assembleout
│   ├── logs

You can learn about the results generated by Triti-Map in the Triti-Map wiki.
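
To see quickly which of the result directories listed above a run has produced, a small Python sketch like the following can be used. The directory names come from the listing above; the function name is hypothetical, and the idea that some directories may be absent depending on the module you ran is an assumption.

```python
from pathlib import Path

# Sub-directories of results/ as listed above.
RESULT_DIRS = (
    "01_cleandata",
    "02_mergedata",
    "03_mappingout",
    "04_GATKout",
    "05_vcfout",
    "06_regionout",
    "07_assembleout",
    "logs",
)


def result_status(workdir):
    """Map each expected results sub-directory to True if it exists."""
    results = Path(workdir) / "results"
    return {name: (results / name).is_dir() for name in RESULT_DIRS}


if __name__ == "__main__":
    # Placeholder path from this README; replace with your working directory.
    for name, present in result_status("/your/work/path").items():
        print(f"{name}: {'present' if present else 'missing'}")
```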

Triti-Map Annotation Platform

Triti-Map Annotation Platform is the online analysis module of Triti-Map. To help locate causal variants, candidate genes, and regulatory elements, it integrates multi-omics data and other information from Triticeae species to provide a functional and evolutionary characterization of SNPs, genes, genomic regions, and new sequences related to the target trait.

The platform can perform various analyses, including SNP annotation and visual display, homologous gene analysis, collinearity analysis, and new sequence function annotation, providing richer reference information for Triticeae gene mapping.

Frequently Asked Questions

You can also check the FAQ for questions you may encounter while using Triti-Map.

Citing

Zhao F, Tian S, Li Z, et al. Utility of Triti-Map for bulk-segregated mapping of causal genes and regulatory elements in Triticeae[J]. Plant Communications, 2022: 100304.

https://doi.org/10.1016/j.xplc.2022.100304

Author/Support

Fei Zhao ([email protected])

Lab Home Page: http://bioinfo.cemps.ac.cn/zhanglab/

Issues can be raised at: https://github.com/fei0810/Triti-Map/issues

We also encourage you to contribute to Triti-Map! To fix bugs or add new features, please create a pull request.

Maintainers

Fei Zhao ([email protected])

Shilong Tian ([email protected])

Acknowledgements

Thanks to @xuzhougeng who provided installation support for Bioconda, and thanks to @zwbao who offered Docker installation support.

triti-map's Issues

GATK index by samtools may be wrong?

Describe the bug
A clear and concise description of what the bug is.
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
To Reproduce
Steps to reproduce the behavior:

Select jobs to execute...

[Sun May 29 13:12:58 2022]
Job 13:
Processing /public-dss/share/TYP_lab/HZX/allcleandata/28/duo28_RRA114646-V_1.clean.fq and /public-dss/share/TYP_lab/HZX/allcleandata/28/duo28_RRA114646-V_2.c
lean.fq with fastp

Reason: Missing output files: results/01_cleandata/duo28_rnaseq_duo28_pool_fastp_R1.fq.gz, results/01_cleandata/duo28_rnaseq_duo28_pool_fastp_R2.fq.gz

[Sun May 29 13:12:58 2022]
Job 14:
Processing /public-dss/share/TYP_lab/HZX/allcleandata/28/shao28_RRA114647-V_1.clean.fq and /public-dss/share/TYP_lab/HZX/allcleandata/28/shao28_RRA114647-V_2
.clean.fq with fastp

Reason: Missing output files: results/01_cleandata/shao28_rnaseq_shao28_pool_fastp_R1.fq.gz, results/01_cleandata/shao28_rnaseq_shao28_pool_fastp_R2.fq.gz

/public-supool/home/tong_lab/miniconda3/envs/tritimap/lib/python3.9/site-packages/tritimap/rules/gatk4_calling.smk:4: FutureWarning: The squeeze argument has
been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.

contigs = pd.read_csv(config["ref"]["genome"] + ".fai", sep = '\t', header=None, usecols=[0], squeeze=True, dtype=str)
/public-supool/home/tong_lab/miniconda3/envs/tritimap/lib/python3.9/site-packages/tritimap/rules/gatk4_calling.smk:4: FutureWarning: The squeeze argument has
been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.

contigs = pd.read_csv(config["ref"]["genome"] + ".fai", sep = '\t', header=None, usecols=[0], squeeze=True, dtype=str)
[Sun May 29 13:17:05 2022]
Finished job 14.
1 of 59 steps (2%) done
[Sun May 29 13:17:14 2022]
Finished job 13.
2 of 59 steps (3%) done
Select jobs to execute...

[Sun May 29 13:17:14 2022]
Job 12:
Merge different type of ChIP-seq data to one file to calling snp. Input fils: results/01_cleandata/duo28_rnaseq_duo28_pool_fastp_R1.fq.gz results/01_cleandat
a/duo28_rnaseq_duo28_pool_fastp_R2.fq.gz results/01_cleandata/shao28_rnaseq_shao28_pool_fastp_R1.fq.gz results/01_cleandata/shao28_rnaseq_shao28_pool_fastp_R
2.fq.gz

Reason: Missing output files: results/02_mergedata/shao28_pool_merge_fastp_R1.fq.gz, results/02_mergedata/duo28_pool_merge_fastp_R1.fq.gz, results/02_mergeda
ta/shao28_pool_merge_fastp_R2.fq.gz, results/02_mergedata/duo28_pool_merge_fastp_R2.fq.gz; Input files updated by another job: results/01_cleandata/duo28_rna
seq_duo28_pool_fastp_R1.fq.gz, results/01_cleandata/shao28_rnaseq_shao28_pool_fastp_R1.fq.gz, results/01_cleandata/duo28_rnaseq_duo28_pool_fastp_R2.fq.gz, re
sults/01_cleandata/shao28_rnaseq_shao28_pool_fastp_R2.fq.gz

contigs = pd.read_csv(config["ref"]["genome"] + ".fai", sep = '\t', header=None, usecols=[0], squeeze=True, dtype=str)
[Sun May 29 13:17:39 2022]
Finished job 12.
3 of 59 steps (5%) done
Select jobs to execute...

[Sun May 29 13:17:39 2022]
Job 47:
Assembly results/02_mergedata/shao28_pool_merge_fastp_R1.fq.gz results/02_mergedata/shao28_pool_merge_fastp_R2.fq.gz

Reason: Missing output files: results/07_assembleout/shao28_merge_denovo_scaffolds.fasta; Input files updated by another job: results/02_mergedata/shao28_poo
l_merge_fastp_R1.fq.gz, results/02_mergedata/shao28_pool_merge_fastp_R2.fq.gz

[Sun May 29 13:17:39 2022]
Job 48:
Assembly results/02_mergedata/duo28_pool_merge_fastp_R1.fq.gz results/02_mergedata/duo28_pool_merge_fastp_R2.fq.gz

Reason: Missing output files: results/07_assembleout/duo28_merge_denovo_scaffolds.fasta; Input files updated by another job: results/02_mergedata/duo28_pool_
merge_fastp_R2.fq.gz, results/02_mergedata/duo28_pool_merge_fastp_R1.fq.gz

[Sun May 29 15:39:45 2022]
Finished job 47.
4 of 59 steps (7%) done
Select jobs to execute...

[Sun May 29 15:39:45 2022]
Job 15:
Mapping results/02_mergedata/shao28_pool_merge_fastp_R1.fq.gz results/02_mergedata/shao28_pool_merge_fastp_R2.fq.gz with STAR (step 1)

Reason: Missing output files: results/03_mappingout/shao28_pool_step1/shao28_poolSJ.out.tab; Input files updated by another job: results/02_mergedata/shao28_
pool_merge_fastp_R1.fq.gz, results/02_mergedata/shao28_pool_merge_fastp_R2.fq.gz

[Sun May 29 15:41:41 2022]
Finished job 48.
5 of 59 steps (8%) done
Select jobs to execute...

[Sun May 29 15:41:41 2022]
Job 22:
Mapping results/02_mergedata/duo28_pool_merge_fastp_R1.fq.gz results/02_mergedata/duo28_pool_merge_fastp_R2.fq.gz with STAR (step 1)

Reason: Missing output files: results/03_mappingout/duo28_pool_step1/duo28_poolSJ.out.tab; Input files updated by another job: results/02_mergedata/duo28_poo
l_merge_fastp_R2.fq.gz, results/02_mergedata/duo28_pool_merge_fastp_R1.fq.gz

[Sun May 29 15:55:36 2022]
Finished job 15.
6 of 59 steps (10%) done
Select jobs to execute...

[Sun May 29 15:55:36 2022]
Job 11:
Mapping results/02_mergedata/shao28_pool_merge_fastp_R1.fq.gz results/02_mergedata/shao28_pool_merge_fastp_R2.fq.gz with STAR (Step 2)

Reason: Missing output files: results/03_mappingout/shao28_pool_step2/shao28_poolAligned.sortedByCoord.out.bam; Input files updated by another job: results/0
2_mergedata/shao28_pool_merge_fastp_R1.fq.gz, results/03_mappingout/shao28_pool_step1/shao28_poolSJ.out.tab, results/02_mergedata/shao28_pool_merge_fastp_R2.
fq.gz

[Sun May 29 15:57:50 2022]
Finished job 22.
7 of 59 steps (12%) done
Select jobs to execute...

[Sun May 29 15:57:50 2022]
Job 21:
Mapping results/02_mergedata/duo28_pool_merge_fastp_R1.fq.gz results/02_mergedata/duo28_pool_merge_fastp_R2.fq.gz with STAR (Step 2)

Reason: Missing output files: results/03_mappingout/duo28_pool_step2/duo28_poolAligned.sortedByCoord.out.bam; Input files updated by another job: results/03_
mappingout/duo28_pool_step1/duo28_poolSJ.out.tab, results/02_mergedata/duo28_pool_merge_fastp_R2.fq.gz, results/02_mergedata/duo28_pool_merge_fastp_R1.fq.gz

open: No such file or directory
[bam_index_build2] fail to open the BAM file.
[Sun May 29 16:39:14 2022]
Finished job 21.
8 of 59 steps (14%) done
Select jobs to execute...

[Sun May 29 16:39:14 2022]
Job 20: Replace ReadsGroups results/03_mappingout/duo28_pool_step2/duo28_poolAligned.sortedByCoord.out.bam with GATK4
Reason: Missing output files: results/04_GATKout/duo28_pool_reprg.bam; Input files updated by another job: results/03_mappingout/duo28_pool_step2/duo28_poolA
ligned.sortedByCoord.out.bam

open: No such file or directory
[bam_index_build2] fail to open the BAM file.
[Sun May 29 16:41:56 2022]
Finished job 11.
9 of 59 steps (15%) done
Select jobs to execute...

[Sun May 29 16:41:56 2022]
Job 10: Replace ReadsGroups results/03_mappingout/shao28_pool_step2/shao28_poolAligned.sortedByCoord.out.bam with GATK4
Reason: Missing output files: results/04_GATKout/shao28_pool_reprg.bam; Input files updated by another job: results/03_mappingout/shao28_pool_step2/shao28_po
olAligned.sortedByCoord.out.bam

[Sun May 29 16:45:02 2022]
Finished job 20.
10 of 59 steps (17%) done
Select jobs to execute...

[Sun May 29 16:45:02 2022]
Job 19:
Remove Duplicates results/04_GATKout/duo28_pool_reprg.bam with GATK4

Reason: Missing output files: results/04_GATKout/duo28_pool_rmdup.bam; Input files updated by another job: results/04_GATKout/duo28_pool_reprg.bam

open: No such file or directory
[bam_index_build2] fail to open the BAM file.
[Sun May 29 16:48:39 2022]
Finished job 10.
11 of 59 steps (19%) done
Select jobs to execute...

[Sun May 29 16:48:39 2022]
Job 9:
Remove Duplicates results/04_GATKout/shao28_pool_reprg.bam with GATK4

Reason: Missing output files: results/04_GATKout/shao28_pool_rmdup.bam; Input files updated by another job: results/04_GATKout/shao28_pool_reprg.bam

open: No such file or directory
[bam_index_build2] fail to open the BAM file.
[Sun May 29 16:57:17 2022]
Finished job 19.
12 of 59 steps (20%) done
Removing temporary output results/04_GATKout/duo28_pool_reprg.bam.
Select jobs to execute...

[Sun May 29 16:57:17 2022]
Job 18:
Filter results/04_GATKout/duo28_pool_rmdup.bam to uniqmap and prepore pair with sambamba

Reason: Missing output files: results/04_GATKout/duo28_pool_uniqmap.bam; Input files updated by another job: results/04_GATKout/duo28_pool_rmdup.bam

open: No such file or directory
[bam_index_build2] fail to open the BAM file.
open: No such file or directory
[bam_sort_core] fail to open file results/04_GATKout/duo28_pool_uniqmap.bam
[samopen] SAM header is present: 22 sequences.
[Sun May 29 16:57:17 2022]
Error in rule rnaFilter2Uniqmap:
jobid: 18
output: results/04_GATKout/duo28_pool_uniqmap.bam
log: results/logs/duo28_pool_rna_uniqmap.log (check log file(s) for error message)
shell:
samtools index -c results/04_GATKout/duo28_pool_rmdup.bam && samtools view -h results/04_GATKout/duo28_pool_rmdup.bam |egrep 'NH:i:1[^0-9]|^@' | samt
ools view -h -f 3 -S -b - | samtools sort -o results/04_GATKout/duo28_pool_uniqmap.bam -
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Sun May 29 17:02:12 2022]
Finished job 9.
13 of 59 steps (22%) done
Removing temporary output results/04_GATKout/shao28_pool_reprg.bam.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-05-29T131247.147336.snakemake.log
Errors.
[2022-05-d 17:02 CRITICAL] Command 'snakemake --snakefile /public-supool/home/tong_lab/miniconda3/envs/tritimap/lib/python3.9/site-packages/tritimap/Snakefil
e --directory /public-dss/share/TYP_lab/HZX/call28 --jobs 30 --rerun-incomplete --configfile 'config.yaml' --nolock --config module=all' returned non-ze
ro exit status 1.
Expected behavior
A clear and concise description of what you expected to happen.

Error log
supool/home/tong_lab/miniconda3/envs/tritimap/lib/python3.9/site-packages/tritimap/Snakefil
e --directory /public-dss/share/TYP_lab/HZX/call28 --jobs 30 --rerun-incomplete --configfile 'config.yaml' --nolock --config module=all' returned non-ze
ro exit status 1.

Additional context
Add any other context about the problem here.
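
For readers following the failed rule rnaFilter2Uniqmap: its shell command filters SAM text with egrep 'NH:i:1[^0-9]|^@', which keeps header lines and records whose NH tag is exactly 1. The Python sketch below only illustrates what that pattern keeps; it is not a diagnosis of the error above, and the sample records in the usage are made up.

```python
import re

# Same pattern as in the rule's egrep call: header lines, or an NH:i:1 tag
# followed by a non-digit character (so NH:i:10, NH:i:12, ... are rejected).
UNIQ_OR_HEADER = re.compile(r"NH:i:1[^0-9]|^@")


def keep_line(sam_line):
    """Return True if egrep with the pattern above would keep this SAM line."""
    # egrep matches against the line without its trailing newline.
    return UNIQ_OR_HEADER.search(sam_line.rstrip("\n")) is not None


def filter_sam(lines):
    """Keep header lines and uniquely mapped records (NH:i:1)."""
    return [line for line in lines if keep_line(line)]


if __name__ == "__main__":
    # Hypothetical, shortened SAM lines for illustration only.
    demo = [
        "@SQ\tSN:chr1A\tLN:1000",
        "r1\t99\tchr1A\t100\t255\t4M\t=\t200\t104\tACGT\tIIII\tNH:i:1\tHI:i:1",
        "r2\t99\tchr1A\t100\t3\t4M\t=\t200\t104\tACGT\tIIII\tNH:i:10\tHI:i:1",
    ]
    for line in filter_sam(demo):
        print(line.split("\t")[0])
```

One detail of the pattern: because [^0-9] must match a character, a record where NH:i:1 is the very last field on the line would not be kept.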
