smsk_popoolation's Introduction

smsk_popoolation: A Snakemake pipeline for population genomics

1. Description

This repository contains installers and Snakemake scripts that run the pipelines described by Kofler et al. for popoolation 1 and 2:

  • Mapping with bwa-mem2

  • BAM wrangling and SNP calling with samtools and picard

  • Population measures with popoolation, plus the expected heterozygosity (h_p), which is computed with a Python script

  • Pairwise comparisons between populations with popoolation2

2. First steps

  1. Install (ana|mini)conda

  2. Clone and install the software

    git clone https://github.com/jlanga/smsk_popoolation.git smsk_popoolation
    cd smsk_popoolation
    snakemake --use-conda --conda-create-envs-only -c 1
  3. Run the test dataset:

    snakemake --use-conda -c 8
  4. Modify the following files:

    • config/features.yaml: the path to the genome reference, and the names of every chromosome to be processed.

    • config/samples.tsv: paths and library information of each of the samples.

    • config/params.yml: execution parameters of different tools.

  5. Execute the pipeline:

    snakemake --use-conda -j
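Before launching a full run it can help to confirm that the three configuration files from step 4 are in place. The following is a minimal sketch, not part of the pipeline; the helper name `missing_configs` is hypothetical, and it only checks that the files exist, not that their contents are valid.

```shell
# Print every expected config file that is missing under the given
# pipeline root (defaults to the current directory).
missing_configs() {
    root=${1:-.}
    for f in config/features.yaml config/samples.tsv config/params.yml; do
        if [ ! -f "$root/$f" ]; then
            echo "$f"
        fi
    done
}

# Usage: list whatever is missing in the current checkout, if anything.
missing_configs .
```

An empty output means all three files were found.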

Representation of the pipeline

[Figure: smsk_popoolation pipeline]

smsk_popoolation's People

Contributors

github-actions[bot], jlanga

smsk_popoolation's Issues

Add test data

  • Find some reads
  • Find a test genome
  • Look for Kofler's original exercises

Possible issue with mkfifo

I got the following error while running the example dataset. The file 2R.mpileup.log is empty, so this is possibly a problem with the fifo. Our system administrator checked and mkfifo is definitely installed on the system being used. The installed version is 8.28.
The system administrator also tried rebuilding the software (running the snakemake pipeline) on another of our development machines and hit other problems: some packages that are definitely installed were still rejected.
I'm currently testing the popoolation2 part by running each step separately from the command line.

    Activating conda environment: /home/dyern/lstm_scratch/smsk_popoolation/.snakemake/conda/5af75ca214322542d38ca9626894a8ef
    mkfifo: cannot create fifo 'results/mpileup/filt/pop2/pop2.2R.mpileup': Operation not permitted
    [Mon Dec 6 12:29:00 2021]
    Error in rule mpileup_popoolation_filter_indels:
        jobid: 27
        output: results/mpileup/filt/pop2/pop2.2R.mpileup, results/mpileup/filt/pop2/pop2.2R.mpileup.gz
        log: results/mpileup/filt/pop2/pop2.2R.mpileup.log (check log file(s) for error message)
        conda-env: /home/dyern/lstm_scratch/smsk_popoolation/.snakemake/conda/5af75ca214322542d38ca9626894a8ef
        shell:
            mkfifo results/mpileup/filt/pop2/pop2.2R.mpileup
            (cat results/mpileup/filt/pop2/pop2.2R.mpileup | gzip --fast > results/mpileup/filt/pop2/pop2.2R.mpileup.gz &)
            perl src/popoolation_1.2.2/basic-pipeline/filter-pileup-by-gtf.pl --input <(gzip --decompress --stdout results/mpileup/raw/pop2/pop2.2R.mpileup.gz) --gtf results/mpileup/filt/pop2/pop2.2R.gtf --output results/mpileup/filt/pop2/pop2.2R.mpileup 2> results/mpileup/filt/pop2/pop2.2R.mpileup.log 1>&2
            (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    [Mon Dec 6 12:29:03 2021]
    Finished job 6.
    33 of 71 steps (46%) done
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: /home/dyern/lstm_scratch/smsk_popoolation/.snakemake/log/2021-12-06T122740.426984.snakemake.log
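Since the log shows mkfifo itself running and then failing with "Operation not permitted", one thing worth checking is whether the filesystem that holds the results directory accepts named pipes at all. This is an assumption about the cause, not something the report confirms. The sketch below uses a hypothetical helper, `fifo_supported`, that tries to create and remove a probe fifo in a given directory.

```shell
# Return 0 if a named pipe can be created in the given directory, 1 otherwise.
fifo_supported() {
    dir=${1:-.}
    probe="$dir/.fifo_probe_$$"
    if mkfifo "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Usage: probe the temporary directory; point it at the results
# directory of the pipeline to test the location that failed.
if fifo_supported "${TMPDIR:-/tmp}"; then
    echo "named pipes work in ${TMPDIR:-/tmp}"
else
    echo "named pipes are not permitted in ${TMPDIR:-/tmp}"
fi
```

If the probe fails on the scratch directory but succeeds elsewhere, running the pipeline from a different filesystem would be a reasonable next test.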

Memory in Java

By default, picard runs with just 1 GB of RAM, which causes problems when working on big files.

Add a setting in config.yaml to raise this limit, and modify the map and mpileup steps to use it.
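One way to approach this is to build the JVM heap flag from a single number and pass it wherever Java is started. The sketch below is illustrative only: the helper name `java_heap_flag`, the default of 4 GB, and the value 8 are all hypothetical, and it assumes the picard launcher forwards `-Xmx` to the JVM (calling `java -Xmx... -jar picard.jar` directly always does).

```shell
# Build a JVM maximum-heap flag from a size in gigabytes.
# The default of 4 is a placeholder, not a recommendation.
java_heap_flag() {
    echo "-Xmx${1:-4}g"
}

# Usage: in the pipeline the number would come from the config file.
heap_flag=$(java_heap_flag 8)
echo "JVM heap flag: $heap_flag"
# The flag would then go right after the launcher, for example:
#   java "$heap_flag" -jar picard.jar <tool> <arguments>
```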

tmp folder in samtools / picard steps

When sorting, samtools writes a bunch of temporary files next to the final BAM. If snakemake is interrupted, it removes some of the output files, but not these intermediate ones. Store the intermediate files in /tmp/ instead, so they are easier to clean up later.
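`samtools sort` accepts `-T PREFIX` to choose where its temporary chunks are written. A possible sketch of the idea above is a small wrapper that creates a private directory under the temporary folder, substitutes a prefix inside it for a `{}` placeholder in the command, and removes the directory when the command finishes. The wrapper name `with_sort_tmp` and the `{}` convention are hypothetical, not part of the pipeline.

```shell
# Run a command with every "{}" argument replaced by a prefix inside a
# fresh temporary directory, then delete that directory whether the
# command succeeded or not.
with_sort_tmp() {
    _tmp_dir=$(mktemp -d "${TMPDIR:-/tmp}/sort_tmp.XXXXXX") || return 1
    _n=$#
    while [ "$_n" -gt 0 ]; do
        _arg=$1
        shift
        if [ "$_arg" = "{}" ]; then
            _arg="$_tmp_dir/chunk"
        fi
        # Rotate the (possibly rewritten) argument to the end of the list.
        set -- "$@" "$_arg"
        _n=$((_n - 1))
    done
    "$@" && _status=0 || _status=$?
    rm -rf "$_tmp_dir"
    return "$_status"
}

# Intended usage (requires samtools and real input, so it is not run here):
#   with_sort_tmp samtools sort -T {} -o sorted.bam input.bam
```

Note that the cleanup only runs if the wrapper itself gets to finish; a hard kill of the whole job would still leave the directory behind, but it would be under the temporary folder rather than next to the results.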

snakemake doesn't work

Hi, thank you for your contributions!
I tried to install smsk_popoolation after installing anaconda3 and ran into the following problem.
    git clone https://github.com/jlanga/smsk_popoolation.git smsk_popoolation  # runs fine
    cd smsk_popoolation  # runs fine
    snakemake --use-conda --create-envs-only  # doesn't work and prints this message:

The program 'snakemake' is currently not installed. To run 'snakemake' please ask your administrator to install the package 'snakemake'.

So do I need to install 'snakemake' myself before using it, or is 'snakemake' a script that ships with 'smsk_popoolation'?

Any reply will be welcomed.
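The message above comes from the shell, which means no `snakemake` executable is on the PATH. A quick check along these lines can confirm that before running the pipeline; the helper name `have_tool` is hypothetical, and the conda command in the message is one common way to install snakemake from the conda-forge and bioconda channels, offered as a suggestion rather than the repository's documented procedure.

```shell
# Return 0 if the named program can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool snakemake; then
    snakemake --version
else
    echo "snakemake not found; one option is:"
    echo "  conda install -c conda-forge -c bioconda snakemake"
fi
```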

error in rule "raw_extract_genome"

Working through the example data, I ran "snakemake --use-conda --cores 24".
However, I got an error in rule "raw_extract_genome" saying that the 2R.fa.gz file (part of the supplied example data) is not in gzip format. Is the example data or the command at fault?
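To tell whether the file or the rule is at fault, the file can be tested on its own with `gzip -t`, which checks the integrity of a compressed file without extracting it. A minimal sketch, with a hypothetical helper name `is_gzip`; the file path is passed as an argument because the location of 2R.fa.gz inside the repository is not given here.

```shell
# Return 0 if the file passes gzip's integrity test, non-zero otherwise.
is_gzip() {
    gzip -t "$1" 2>/dev/null
}

# Usage: pass the path to the reference as the first argument.
genome=${1:-}
if [ -n "$genome" ] && [ -f "$genome" ]; then
    if is_gzip "$genome"; then
        echo "$genome is a valid gzip file"
    else
        echo "$genome is not in gzip format"
    fi
fi
```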

use-envs-only problem

This may be due to a change in snakemake's command-line options. When I pasted the command from the instructions, "snakemake --use-conda --create-envs-only", I got the error message "snakemake: error: unrecognized arguments: --create-envs-only".
First I tried replacing it with "snakemake --use-conda --conda-create-envs-only", but I got the following error message asking for the number of cores: "Error: you need to specify the maximum number of CPU cores to be used at the same time. If you want to use N cores, say --cores N or -cN. For all cores on your system (be sure that this is appropriate) use --cores all. For no parallelization use --cores 1 or -c1."
I then tried "snakemake --use-conda --conda-create-envs-only --cores 24". This seems to run as far as:
    Building DAG of jobs...
    Creating conda environment src/snakefiles/hp.yml...
    Downloading and installing remote packages.
However, this step is taking rather a long time (> 5 minutes), so there may still be an issue.
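Because the flag name differs between snakemake versions, a script that has to work with both can pick the spelling from the installed version's help text. This is a rough sketch under that assumption; the helper name `pick_envs_flag` is hypothetical, and the command is only printed, not executed.

```shell
# Given snakemake's help text as one string, print the spelling of the
# "create environments only" flag that this version advertises.
pick_envs_flag() {
    case "$1" in
        *--conda-create-envs-only*) echo "--conda-create-envs-only" ;;
        *) echo "--create-envs-only" ;;
    esac
}

# Usage: build the command for the installed snakemake, if there is one.
# --cores is included because newer versions refuse to run without it.
if command -v snakemake >/dev/null 2>&1; then
    flag=$(pick_envs_flag "$(snakemake --help 2>&1)")
    echo "snakemake --use-conda $flag --cores 1"
fi
```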
