make_prg

A tool to create and update PRGs for input to Pandora from a set of Multiple Sequence Alignments.

Dependencies

make_prg has two subcommands: from_msa and update. The update subcommand requires MAFFT to be on your PATH. MAFFT can be installed:

  1. from source: https://mafft.cbrc.jp/alignment/software/; or
  2. using conda: conda install -c bioconda mafft.
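Before running make_prg update, it can help to confirm that MAFFT is actually reachable. A minimal Python sketch using the standard library's shutil.which; the helper name locate_mafft is hypothetical and not part of make_prg:

```python
import shutil
from typing import Optional


def locate_mafft(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return the path of a runnable MAFFT executable, or None if none is found.

    With no argument, `mafft` is searched for on PATH. If explicit_path
    contains a directory, shutil.which checks only that file.
    """
    # shutil.which returns None when the command is missing or not executable.
    return shutil.which(explicit_path or "mafft")


found = locate_mafft()
if found is None:
    print("MAFFT not found on PATH: install it before running `make_prg update`")
else:
    print(f"MAFFT found at {found}")
```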

Install

No installation needed - precompiled portable binary

You can use make_prg without any installation by downloading the precompiled binary and running it. All libraries are statically linked into this binary.

  • Requirements:

    • GLIBC >= 2.17 (present on Ubuntu >= 13.04, Debian >= 8.0, CentOS >= 7, RHEL >= 7.9, Fedora >= 19, etc);
  • Download:

    wget https://github.com/leoisl/make_prg/releases/download/v0.3.0/make_prg_0.3.0
    
  • Running:

chmod +x make_prg_0.3.0
./make_prg_0.3.0 -h
  • Credits:

  • Notes:

    • We provide precompiled binaries for Linux OS only;
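To check the GLIBC requirement on a machine before downloading the binary, one option is to ask the C library for its version from Python. This is a sketch assuming a Linux system where os.confstr supports CS_GNU_LIBC_VERSION (it reports a string such as "glibc 2.31" on glibc-based systems and is unavailable on musl, macOS and Windows); the helper names are hypothetical:

```python
import os
from typing import Optional, Tuple

REQUIRED_GLIBC = (2, 17)  # minimum GLIBC version stated for the precompiled binary


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a dotted version string such as '2.31' into (2, 31)."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def installed_glibc() -> Optional[str]:
    """Return the glibc version string (e.g. '2.31'), or None if unavailable."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # Not a POSIX system, or the C library does not know this name.
        return None
    if not value or not value.startswith("glibc "):
        return None
    return value.split(maxsplit=1)[1]


def glibc_ok(version: Optional[str]) -> bool:
    """True if `version` is at least REQUIRED_GLIBC."""
    return version is not None and parse_version(version) >= REQUIRED_GLIBC


current = installed_glibc()
print(f"glibc {current or 'not detected'}: binary supported = {glibc_ok(current)}")
```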

pip

  • Requirements: python>=3.7

  • Installing:

pip install git+https://github.com/leoisl/make_prg
  • Running:
make_prg -h

Running on a sample example

For a worked example showing the input files for both make_prg from_msa and make_prg update, and the outputs each of them creates, see the sample example.

Usage

$ make_prg --help
usage: make_prg <subcommand> <options>

Subcommand entrypoint

optional arguments:
  -h, --help     show this help message and exit
  -V, --version  show program's version number and exit

Available subcommands:
  
    from_msa     Make PRG from multiple sequence alignment dir
    update       Update PRGs given new sequences output by pandora.
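The help output describes a two-level interface: global options first, then a subcommand. A minimal argparse sketch that reproduces that shape, as an illustration only: it is not make_prg's source, it registers no subcommand options, and the version string is an assumption taken from the release file name above.

```python
import argparse

VERSION = "0.3.0"  # assumption: taken from the release file name above


def build_parser() -> argparse.ArgumentParser:
    """Top-level parser with global options and one sub-parser per subcommand."""
    parser = argparse.ArgumentParser(
        prog="make_prg",
        usage="make_prg <subcommand> <options>",
        description="Subcommand entrypoint",
    )
    parser.add_argument("-V", "--version", action="version", version=f"%(prog)s {VERSION}")
    subparsers = parser.add_subparsers(title="Available subcommands", dest="subcommand")
    # Options for each subcommand would be registered on these sub-parsers.
    subparsers.add_parser("from_msa", help="Make PRG from multiple sequence alignment dir")
    subparsers.add_parser("update", help="Update PRGs given new sequences output by pandora.")
    return parser


args = build_parser().parse_args(["update"])
print(args.subcommand)  # update
```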

from_msa

$ make_prg from_msa --help
usage: make_prg from_msa

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input dir: all files in this will try to be read as the supported alignment_format. If not aligned in fasta alignment_format, use -f to input the alignment_format type
  -o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
                        Output prefix: prefix for the output files
  -t THREADS, --threads THREADS
                        Number of threads
  -f ALIGNMENT_FORMAT, --alignment_format ALIGNMENT_FORMAT
                        Alignment format of MSA, must be a biopython AlignIO input alignment_format. See http://biopython.org/wiki/AlignIO. Default: fasta
  --max_nesting MAX_NESTING
                        Maximum number of levels to use for nesting. Default: 5
  --min_match_length MIN_MATCH_LENGTH
                        Minimum number of consecutive characters which must be identical for a match. Default: 7
  -v, --verbose         Increase output verbosity
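--min_match_length sets how many consecutive identical characters are needed for a match. The sketch illustrates that idea on a single aligned FASTA: it checks that all rows have the same length, as an alignment in the default fasta format should, and reports runs of columns where every sequence carries the same character. It is a simplified illustration, not make_prg's own interval-finding code: gaps ('-') are compared like any other character, matching is case-insensitive by assumption, and --max_nesting is ignored. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def read_aligned_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence} and require equal lengths."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Upper-case so that matching is case-insensitive (an assumption).
            records[name] += line.upper()
    if not records:
        raise ValueError("no records found")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"not aligned: sequence lengths differ {sorted(lengths)}")
    return records


def match_intervals(seqs: List[str], min_match_length: int = 7) -> List[Tuple[int, int]]:
    """Return [start, end) column ranges where all sequences are identical
    for at least `min_match_length` consecutive columns."""
    width = len(seqs[0])
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for col in range(width + 1):
        # col == width acts as a sentinel that closes any open run.
        agree = col < width and len({s[col] for s in seqs}) == 1
        if agree and start is None:
            start = col
        elif not agree and start is not None:
            if col - start >= min_match_length:
                intervals.append((start, col))
            start = None
    return intervals


msa = ">s1\n" + "A" * 8 + "CG" + "T" * 8 + "\n>s2\n" + "A" * 8 + "TG" + "T" * 8 + "\n"
seqs = list(read_aligned_fasta(msa).values())
print(match_intervals(seqs, min_match_length=7))  # [(0, 8), (9, 18)]
```

Raising min_match_length to 9 in this example would drop the first run, which is only 8 columns long.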

update

$ make_prg update --help
usage: make_prg update

optional arguments:
  -h, --help            show this help message and exit
  -u UPDATE_DS, --update_DS UPDATE_DS
                        Filepath to the update data structures. Should point to a file *.update_DS.
  -d DENOVO_PATHS, --denovo_paths DENOVO_PATHS
                        Filepath containing denovo sequences output by pandora. Should point to a denovo_paths.txt file.
  -o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
                        Output prefix: prefix for the output files
  -t THREADS, --threads THREADS
                        Number of threads
  --mafft MAFFT         Path to MAFFT executable. By default, it is assumed to be on PATH
  --keep_temp           Keep temp files.
  -v, --verbose         Increase output verbosity
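When make_prg update is driven from a script or a workflow manager, building the command as an argument list avoids shell-quoting problems with long paths. A sketch that assembles that list from the long option names shown in the help above; build_update_command is a hypothetical helper, and the file names in the example are placeholders:

```python
import shlex
from typing import List, Optional


def build_update_command(
    update_ds: str,
    denovo_paths: str,
    output_prefix: str,
    threads: Optional[int] = None,
    mafft: Optional[str] = None,
    keep_temp: bool = False,
    verbose: bool = False,
    executable: str = "make_prg",
) -> List[str]:
    """Assemble an argv list for `make_prg update` from its documented long options."""
    cmd = [
        executable, "update",
        "--update_DS", update_ds,
        "--denovo_paths", denovo_paths,
        "--output_prefix", output_prefix,
    ]
    if threads is not None:
        cmd += ["--threads", str(threads)]  # omitted: make_prg's own default applies
    if mafft is not None:
        cmd += ["--mafft", mafft]  # omitted: MAFFT is assumed to be on PATH
    if keep_temp:
        cmd.append("--keep_temp")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_update_command("pangenome.update_DS", "denovo_paths.txt", "out/pangenome_updated", threads=3)
# shlex.quote makes the printed form safe to paste into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run the command without going through a shell.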

make_prg's People

Contributors

bricoletc, leoisl, martinghunt, mbhall88

make_prg's Issues

Build a concatenated prg

I've been building 'genome-wide' prgs for gramtools by building one prg per msa/locus, and then adding in the linear reference genome seq in between.

I'd like an option for make_prg to do that. I can attempt implementing it myself.

The minimal info to do this is:
--ref: a ref genome
--fofn: a file of file names (or a bed file?), with the information: file name, chrom, start, end

@leoisl, is the --input option currently a directory name? Can a fofn be passed to make_prg?
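The stitching described in this issue needs, for each chromosome, the stretches of linear reference that lie between the loci. A hedged sketch of that step only, assuming a tab-separated fofn with the columns listed above and BED-style 0-based, half-open coordinates; parse_fofn and layout are hypothetical names, and building or joining the PRG strings themselves is out of scope:

```python
from typing import Dict, List, NamedTuple, Tuple


class Locus(NamedTuple):
    prg_file: str
    chrom: str
    start: int  # 0-based, inclusive (BED-style, assumed)
    end: int    # 0-based, exclusive


def parse_fofn(text: str) -> List[Locus]:
    """Parse tab-separated lines of: file name, chrom, start, end."""
    loci = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        prg_file, chrom, start, end = line.split("\t")[:4]
        loci.append(Locus(prg_file, chrom, int(start), int(end)))
    return loci


def layout(loci: List[Locus], chrom_lengths: Dict[str, int]) -> Dict[str, List[Tuple[str, int, int]]]:
    """Order the pieces of each chromosome as (label, start, end) tuples.

    label is "ref" for a stretch of linear reference, or the PRG file name
    for a locus. Loci on chromosomes missing from chrom_lengths are ignored.
    """
    pieces: Dict[str, List[Tuple[str, int, int]]] = {}
    for chrom, length in chrom_lengths.items():
        pos = 0
        out: List[Tuple[str, int, int]] = []
        for loc in sorted((x for x in loci if x.chrom == chrom), key=lambda x: x.start):
            if loc.start < pos or loc.end > length or loc.end <= loc.start:
                raise ValueError(f"invalid or overlapping locus: {loc}")
            if loc.start > pos:
                out.append(("ref", pos, loc.start))  # reference between loci
            out.append((loc.prg_file, loc.start, loc.end))
            pos = loc.end
        if pos < length:
            out.append(("ref", pos, length))  # reference after the last locus
        pieces[chrom] = out
    return pieces


fofn = "gene1.prg.fa\tchr1\t10\t20\ngene2.prg.fa\tchr1\t30\t40\n"
print(layout(parse_fofn(fofn), {"chr1": 50}))
```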

multiprocessing

Dear development team,

I got this error from make_prg update after the updated alignments were available:

2021-11-24 12:49:25.877 | INFO     | make_prg.subcommands.update:update:131 - Write PRG file to /net/metagenomics/data/from_moni/old.tzuhao/test_pandora/results/subset_test/by_pipeline/pangenome_updated/pangenome_updated_tmp/GC00000862_2_na_aln/GC00000862_2_na_aln.prg.fa
2021-11-24 12:49:28.138 | INFO     | make_prg.subcommands.update:update:131 - Write PRG file to /net/metagenomics/data/from_moni/old.tzuhao/test_pandora/results/subset_test/by_pipeline/pangenome_updated/pangenome_updated_tmp/GC00002479_na_aln/GC00002479_na_aln.prg.fa
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "multiprocessing/pool.py", line 119, in worker
  File "multiprocessing/pool.py", line 47, in starmapstar
  File "make_prg/subcommands/update.py", line 119, in update
  File "make_prg/prg_builder.py", line 385, in batch_update
  File "make_prg/prg_builder.py", line 418, in _update_leaf
  File "make_prg/io_utils.py", line 21, in load_alignment_file
  File "Bio/AlignIO/__init__.py", line 392, in read
ValueError: No records found in handle
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "make_prg/__main__.py", line 49, in <module>
  File "make_prg/__main__.py", line 43, in main
  File "make_prg/subcommands/update.py", line 186, in run
  File "multiprocessing/pool.py", line 274, in starmap
  File "multiprocessing/pool.py", line 644, in get
ValueError: No records found in handle
[41767] Failed to execute script __main__

My command was:

/net/metagenomics/data/from_moni/old.tzuhao/test_pandora/test_make_prg/bin/make_prg update --threads 3 \
  --update_DS /net/metagenomics/data/from_moni/old.tzuhao/test_pandora/test_make_prg/results/pangenome.update_DS   \
  --denovo_paths /net/metagenomics/data/from_moni/old.tzuhao/test_pandora/results/subset_test/by_pipeline/all_denovo_paths.txt \
  --output_prefix /net/metagenomics/data/from_moni/old.tzuhao/test_pandora/results/subset_test/by_pipeline/pangenome_updated/pangenome_updated 

The command was executed using Snakemake.

Could this be an issue with the multiprocessing code? If not, what should I check first? Is it possible to rerun starting from the alignments that were already updated? Thank you for considering my question.
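The traceback shows the ValueError being raised by Bio.AlignIO.read inside a worker (via load_alignment_file and _update_leaf), and then re-raised in the parent process by the multiprocessing pool. Biopython raises "No records found in handle" when the input holds no records, so one hedged first check is whether any alignment file involved is empty. A standalone sketch, not part of make_prg, assuming FASTA files with the suffixes listed in the code; the helper names are hypothetical:

```python
import sys
from pathlib import Path
from typing import Iterable, List, Sequence


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records, i.e. header lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def find_empty_alignments(root: Path, suffixes: Sequence[str] = (".fa", ".fasta")) -> List[Path]:
    """Return files under `root` with one of `suffixes` that hold no FASTA records.

    The suffixes are an assumption; compressed files are not handled.
    """
    empty = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            with path.open(errors="replace") as handle:
                if count_fasta_records(handle) == 0:
                    empty.append(path)
    return empty


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in find_empty_alignments(Path(sys.argv[1])):
        print(f"no FASTA records: {path}")
```

For example, after rerunning with --keep_temp so that temporary files are kept, the script could be pointed at the pangenome_updated_tmp directory that appears in the log.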
