marbl / harvest Goto Github PK

License: Other

harvest's Issues

What application opens the gingr file on linux?

I've downloaded the .ggr file on my linux machine but could not open the downloaded file. The message reads: Could not open "run_mers.gingr1.ggr" Archive type not supported. I tried running it from the command line (sudo ./run_mers) but this error message popped up sudo: sudo: ./run_mers: command not found.

Problem running parsnp

First of all...thank you very much for this software.
I have a problem running parsnp on Linux (I am using the precompiled package). I cannot get rid of this error:

lfreschi@katak:~/sw/harvest/all/harvest-Linux64-v1.0.1> ./parsnp -v -d gen/ -r PA14.fasta
Warning: Cannot determine OS, defaulting to linux
|--Parsnp v1.0--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
sh: symbol lookup error: /scratch/_MEIKYhGiu/libreadline.so.5: undefined symbol: PC
sh: symbol lookup error: /scratch/_MEIKYhGiu/libreadline.so.5: undefined symbol: PC
sh: symbol lookup error: /scratch/_MEIKYhGiu/libreadline.so.5: undefined symbol: PC
sh: symbol lookup error: /scratch/_MEIKYhGiu/libreadline.so.5: undefined symbol: PC
sh: symbol lookup error: /scratch/_MEIKYhGiu/libreadline.so.5: undefined symbol: PC
sh: symbol lookup error: /scratch/_MEIKYhGiu/libreadline.so.5: undefined symbol: PC

-->Reading Genome (asm, fasta) files from gen/..
|->[OK]
-->Reading Genbank file(s) for reference (.gbk) ..
|->[WARNING]: no genbank file provided for reference annotations, skipping..
-->Calculating MUMi..
/bin/bash: symbol lookup error: /scratch/_MEIKYhGiu/libreadline.so.5: undefined symbol: PC
ERROR
The following command failed:

/scratch/_MEIKYhGiu/parsnp /project/rclevesq/users/lfreschi/sw/harvest/all/harvest-Linux64-v1.0.1/P_2014_10_20_120717242134/all_mumi.ini
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.
ERROR

I set the TMP, TMPDIR and TEMP environment variables, and I am sure that there is enough space on /scratch. From the documentation it is not clear to me what I should do.

UPDATE: I tried to install the Mac version of parsnp on my laptop and, well, it works perfectly. However, it would be really important for me to be able to run the analyses on Linux. Could you please give some hints about how to solve this problem?

Parsnp Error

I have been using parsnp to align genomes from different bacterial species on a Linux cluster without problems. However, when I ran parsnp on hundreds of complete E. coli genomes, the following error occurred:

ERROR
The following command failed:

/tmp/_MEIUxjmZE/parsnp Ecoli_SNP_out/parsnpAligner.ini
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.
ERROR

Is it due to the limited space on the /tmp directory on my cluster?
How can I solve this problem? Can I redirect the tmp files to another directory?
Thanks.
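
The `/tmp/_MEI...` path in the failing command looks like the unpack directory of a self-extracting bundled executable, which would normally follow the system temporary directory. A minimal sketch, assuming the launcher honors the standard `TMPDIR` environment variable (another report on this page sets `TMPDIR` the same way, and it is not confirmed that this resolves the failure): check the free space and build an environment that points temporary files elsewhere. The helper names and the example paths are hypothetical.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Return the free space at `path` in GiB."""
    return shutil.disk_usage(path).free / 2**30


def env_with_tmpdir(tmpdir, base_env=None):
    """Copy an environment and point TMPDIR at `tmpdir` (created if missing)."""
    os.makedirs(tmpdir, exist_ok=True)
    env = dict(os.environ if base_env is None else base_env)
    env["TMPDIR"] = tmpdir
    return env


if __name__ == "__main__":
    print(f"free in system temp dir: {free_gib(tempfile.gettempdir()):.1f} GiB")
    # Hypothetical usage (needs parsnp on PATH; all paths are placeholders):
    # import subprocess
    # subprocess.run(["parsnp", "-r", "ref.fasta", "-d", "genomes/"],
    #                env=env_with_tmpdir("/scratch/me/tmp"), check=True)
```

If the free space is ample and the error persists with a different `TMPDIR`, disk space is probably not the cause.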

Parameter : -a (default = 1.1*Log(S)), VALUE OF S ?

Hi,

Sorry for the stupid question but I couldn't find an answer.
As said in the title, for the following parameter :
-a = : min (a)NCHOR length (default = 1.1*Log(S))
What does S stand for ?

Regards,

Sam

alignment of 1000 Klebsiella pneumoniae genomes with parsnp

Hi Treangen, Thank you very much for the software. I have no difficulty aligning small sets of genomes (up to 45 bacterial genomes), but I run into a problem with more than 1000 genomes. I run parsnp with the following command:
parsnp -r ~/Documents/tsvetkova/KPN_genomes/GCF_000409715.1_Klebsiella_pneumoniae_ATCC_25955.fna -d ~/Documents/tsvetkova/KPN_genomes -p 24 -c

And I receive the error message:
-->Running Parsnp multi-MUM search and libMUSCLE aligner..
ERROR
The following command is failed:

/tmp/_MEIxyqNbZ/parsnp /home/biouser/Documents/tsvetkova/P_2016_12_21_155218224432/parsnpAligner.ini

I am asking for your help in resolving this problem.

parsnp ignores file called "1.fasta"

Interesting 'bug'. I had a sequence file called '1.fasta' and parsnp persistently ignored it from the alignment until I renamed it 1_seq.fasta.

parsnp -c -p 30 -d seqs/ -o26 -r ref.fasta -z 60 -o parsnp-o26

MUM calculation

Dear Dr. Treangen,

Here's a simple bug. If parsnp is run with both -c and -M, it crashes. The output is below.
Once I realized that -c could be incompatible with -M, I removed it and it behaved as expected (it also behaved as expected if -c was retained and -M was dropped).

Thanks for providing such a useful tool.

Sincerely,
Adam

p.s. are you familiar with any simple tool that will quickly calculate MUMi for pairs of genomes (so that I can do an all v. all calculation)? Thanks again!

======= run with M flag ============
parsnp -c -r ! -o ParSNP_250 -d assemblies_250/ -M
|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest

-->Reading Genome (asm, fasta) files from assemblies_250/..
|->[OK]
-->Reading Genbank file(s) for reference (.gbk) ..
|->[WARNING]: no genbank file provided for reference annotations, skipping..
Traceback (most recent call last):
File "", line 876, in
AttributeError: 'str' object has no attribute 'close'

Parsnp: Error creating Gingr input file

I'm getting an error with Parsnp at the last step creating the Gingr input file:

chaconas@chaconas-X10DRi:~/Programs/Parsnp-Linux64-v1.2$ ./parsnp -d /home/chaconas/Desktop/Projects/RDVParsnpTrial/References/GenomesAndContigs/ -g /home/chaconas/Desktop/Projects/RDVParsnpTrial/Queries/11-282_06052015.gbk -P 30000000
|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest

SETTINGS:
|-refgenome: /home/chaconas/Desktop/Projects/RDVParsnpTrial/Queries/11-282_06052015.gbk.fna
|-aligner: libMUSCLE
|-seqdir: /home/chaconas/Desktop/Projects/RDVParsnpTrial/References/GenomesAndContigs/
|-outdir: /home/chaconas/Programs/Parsnp-Linux64-v1.2/P_2015_07_03_152715335014
|-OS: Linux
|-threads: 32

-->Reading Genome (asm, fasta) files from /home/chaconas/Desktop/Projects/RDVParsnpTrial/References/GenomesAndContigs/..
|->[OK]
-->Reading Genbank file(s) for reference (.gbk) /home/chaconas/Desktop/Projects/RDVParsnpTrial/Queries/11-282_06052015.gbk..
|->[OK]
-->Calculating MUMi..
|->[OK]
-->Running Parsnp multi-MUM search and libMUSCLE aligner..
|->[OK]
-->Running PhiPack on LCBs to detect recombination..
|->[SKIP]
-->Reconstructing core genome phylogeny..
|->[OK]
-->Creating Gingr input file..
ERROR
The following command failed:

/tmp/_MEIXUzQtH/harvest --midpoint-reroot -u -q -i /home/chaconas/Programs/Parsnp-Linux64-v1.2/P_2015_07_03_152715335014/parsnp.ggr -o /home/chaconas/Programs/Parsnp-Linux64-v1.2/P_2015_07_03_152715335014/parsnp.ggr -n /home/chaconas/Programs/Parsnp-Linux64-v1.2/P_2015_07_03_152715335014/parsnp.tree
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.
ERROR

Work with 1000 genomes

Dear team parsnp

I have a directory with 1320 genome assemblies. I ran the command

parsnp -g ../../DB_all/genome_reference_VP/CP007004.gbk,../../DB_all/genome_reference_VP/CP007005.gbk -d ../../Data/all_v_p_for_parsnp/ -p 6

It runs without errors but aligns only 73 genomes.

######################

For detailed documentation please see --> http://harvest.readthedocs.org/en/latest

-->Reading Genome (asm, fasta) files from ../../Data/all_v_p_for_parsnp/..
|->[OK]
-->Reading Genbank file(s) for reference (.gbk) ../../DB_all/genome_reference_VP/CP007004.gbk,../../DB_all/genome_reference_VP/CP007005.gbk..
|->[OK]
-->Calculating MUMi..
|->[OK]
-->Running Parsnp multi-MUM search and libMUSCLE aligner..

|->[OK]
-->Running PhiPack on LCBs to detect recombination..
|->[SKIP]
-->Reconstructing core genome phylogeny..
|->[OK]
-->Creating Gingr input file..
|->[OK]
-->Calculating wall clock time..
|->Aligned 73 genomes in 1.00 hours

<<Parsnp finished! All output available in /home/ubuntu/Results/all_1327_parsnp/P_2018_06_11_185050006955>>

Validating output directory contents...
1)parsnp.tree: newick format tree [OK]
2)parsnp.ggr: harvest input file for gingr (GUI) [OK]
3)parsnp.xmfa: XMFA formatted multi-alignment [OK]

parsnp error

The following command failed:

fasttree -nt -quote -gamma -slow -boot 100 /home/hitesh/Downloads/P_2019_05_03_114053207553/parsnp.snps.mblocks > /home/hitesh/Downloads/P_2019_05_03_114053207553/parsnp.tree
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.

parsnp not creating vcf file

I followed the given pipeline to create a VCF file from parsnp for further use with genotyphi. The commands run OK, but no VCF file is created. Any help?

majority of input genomes get ignored

Hi,

I ran parsnp in the fasta-reference mode only (so no genbank file). I put more than 500 query genomes into a folder, but only 36 of them were actually taken into account in the analysis.

Is that a known issue? What should I do?

Thank you!

Edit: I just reran the exact same command, and now 29 instead of 36 genomes were analyzed. I am utterly at a loss.
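
To see exactly which inputs were dropped, one option is to compare the file names in the query folder with the leaf labels of the output tree. A minimal sketch, assuming `parsnp.tree` is a Newick file and that each leaf label is an input file name, possibly with an extra suffix appended (an assumption, check your own tree). The function names are hypothetical.

```python
import re


def newick_leaves(newick):
    """Extract leaf labels: tokens that directly follow '(' or ','."""
    found = re.findall(r"[(,]\s*('[^']*'|[^():,;'\s]+)", newick)
    return {label.strip("'") for label in found}


def missing_inputs(input_names, newick):
    """Return the input file names that have no matching leaf in the tree."""
    leaves = newick_leaves(newick)

    def present(name):
        # Accept an exact match, or the name followed by an extra ".suffix".
        return any(leaf == name or leaf.startswith(name + ".") for leaf in leaves)

    return sorted(name for name in input_names if not present(name))


# Hypothetical usage with real files:
# import os
# tree = open("P_.../parsnp.tree").read()
# print(missing_inputs(os.listdir("genomes/"), tree))
```

Running this after each attempt would show whether the same genomes are dropped every time or a different set on each run.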

Issues with harvest-Linux64.tar.gz (1.0)

None of these are critical :-)

1. The gingr-Linux64 binary did not have the executable bit set (parsnp and harvesttools did).
2. It would be good if they were statically linked binaries (harder these days, I know). My Ubuntu/Debian seems to have libtiff.so.4, and v3 doesn't seem to be in APT: ./gingr: error while loading shared libraries: libtiff.so.3: cannot open shared object file: No such file or directory. I can get it working if needed, just thought I'd let you know.
3. It would be good if the tar file had everything in a folder, namely "harvest-Linux64-1.0".
4. It would be good if the binaries did not have the "-Linux64" suffix. See 3.
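
Until the archive is repackaged, the missing executable bit can be restored after unpacking. A minimal sketch using only the standard library. The function name is hypothetical, and the file name in the usage comment is the one mentioned above.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for every class that already has read permission."""
    mode = os.stat(path).st_mode
    # Shift each read bit (r--) down to the matching execute bit (--x).
    exec_bits = (mode & (stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)) >> 2
    os.chmod(path, mode | exec_bits)


# Hypothetical usage:
# make_executable("gingr-Linux64")
```

This mirrors what `chmod +x` typically does, without granting execute permission to anyone who cannot already read the file.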

parsnp does not recruit all genomes provided

Hi,

I was trying to align 16 genomes from different pathotypes of the same protist species, but the program only recruited 12. Is there a reason for this? The bar below the Gingr interface says that the core is only 87%.
Also, is there a way to use the Gingr interface to search only for polymorphic regions? I do not want to scan everything manually, but rather have something that takes me to the next polymorphism. In the little I have scanned, I cannot find many polymorphisms; I only see polymorphisms between all my sequences and the reference genome, but none among my sequences. I know they are very closely related, but there should be differences between pathotypes.

thanks for your help

ignoring sequences ending with the same numbers as the reference genome

I noticed this odd little bug in parsnp v1.2. When running parsnp with the -c -d options and a reference whose file name ends in digits, genomes whose names match the trailing digits of the reference file name get excluded.

For example:
when using the reference "H4476.fasta" , the genomes 6.fasta, 76.fasta and 476.fasta get silently ignored (they are not listed in the ini files). When I rename these three genomes to bla6bla.fasta , bla76bla.fasta and bla476bla.fasta they do get included. I'm assuming this is some sort of bug in the code that excludes the reference sequence from being selected as a query genome.
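
The pattern reported above is what one would expect if the reference were excluded with a suffix test rather than an exact name comparison. The sketch below illustrates that class of bug. It is not Parsnp's actual code, which is not shown in this report, and both function names are hypothetical.

```python
def queries_buggy(files, ref):
    """Suspected behaviour: drop any file whose name is a suffix of the reference name."""
    return [f for f in files if not ref.endswith(f)]


def queries_fixed(files, ref):
    """Intended behaviour: drop only the reference itself."""
    return [f for f in files if f != ref]


files = ["H4476.fasta", "6.fasta", "76.fasta", "476.fasta", "bla6bla.fasta"]
print(queries_buggy(files, "H4476.fasta"))  # ['bla6bla.fasta']
print(queries_fixed(files, "H4476.fasta"))  # ['6.fasta', '76.fasta', '476.fasta', 'bla6bla.fasta']
```

Under this assumption, renaming the affected files works because the new names are no longer suffixes of the reference file name.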

parsnp with genbank file error

I've downloaded a full GenBank file with sequence. When running parsnp as follows:

parsnp -g ../HE681097.1.gb -d .. -p 4

I receive the following error:

-->Reading Genome (asm, fasta) files from ....
|->[OK]
-->Reading Genbank file(s) for reference (.gbk) ../HE681097.1.gb..
|->[OK]
Traceback (most recent call last):
File "", line 624, in
NameError: name 'nameok' is not defined

any help appreciated - thanks!

not creating vcf file

I followed the given pipeline to create a VCF file from parsnp for further use with genotyphi. The commands run OK, but no VCF file is created. Any help?

Question about parsnp

Hi Treangen,

Thank you for the software. I am new to bioinformatics and programming, so I apologize if these are stupid questions. I have three issues when running Parsnp.

1. The step "determining repetitive regions..." did not show up in my output, unlike in your tutorial examples and other people's output that I found online.
2. The .vcf file is missing from my output directory, but I can export one via gingr.
3. When I force inclusion of all the genomes in my data directory (13 fasta files), I get a lot of unaligned regions, with 87% coverage of the alignment. I tried different -C values (1000, 10000, and 100000), but the coverage does not seem to improve.

I am using Ubuntu Linux and linked Parsnp into ~/bin using:
ln -fs ~/bifido/harvest/src/Parsnp-Linux64-v1.2/parsnp ~/bin
I don't know if the linking matters.

Thank you for your time and help! Let me know if any other information I should provide.

A small subset of samples running in sample directory

Hello,

I have been having an issue while running parsnp. Sometimes parsnp only includes a very limited number of the samples in the sample_directory, normally fewer than 10.

I run them like this:
parsnp -g Enterobacter_aerogenes_KCTC_2190.gb -r Enterobacter_aerogenes_KCTC_2190.fna -d sample_directory -p 20 -o output_directory/

I have 26 contig files, and I downloaded 150 scaffolds of the same bacterial genome. However, I'm unable to include all 176 samples together; I've only been able to get 100 samples to run together. All of them work in smaller batches, so I do not think the files themselves are the problem.

Any tips?

Thanks
Samantha

Low cluster coverage

I'm trying to use parsnp to infer core genes (and importantly, get rid of signal introduced by recombination in core genes) on around 1000 bins retrieved from metagenomes. The problem is that parsnp throws an error reporting that cluster coverage is too low (below 1%)

I have run roary on the same set of genomes without any problems. Pairwise core gene distances are above 97%. When I lower the number of genomes to a smaller subset (e.g. 100 randomly selected genomes), parsnp works fine.

I am running parsnp like so:

./parsnp -r /path/to/ref_genome.fa -d genomes_folder -p 30 -x -c

Any suggestions would be very much appreciated.
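
Since a random subset of 100 genomes runs fine, growing the subset step by step may help find where the failure starts. A minimal sketch for drawing reproducible subsets of file names. The function name and the seed handling are assumptions for illustration, and nothing here calls parsnp itself.

```python
import random


def random_subset(names, k, seed=0):
    """Return a reproducible, sorted random subset of at most `k` names."""
    rng = random.Random(seed)
    pool = sorted(names)  # sort first so the result does not depend on input order
    return sorted(rng.sample(pool, min(k, len(pool))))


# Hypothetical usage: symlink the chosen files into a fresh folder and pass it to -d.
# import os
# for name in random_subset(os.listdir("genomes_folder"), 200, seed=1):
#     os.symlink(os.path.abspath(os.path.join("genomes_folder", name)),
#                os.path.join("subset_200", name))
```

Fixing the seed means a failing subset can be rerun or shared exactly.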

ParSNP duplicates randomly chosen reference in tree

I always use the " -r ! " switch to choose the reference sequence. In version 1.2 that works great, but version 1.5.6 duplicates the chosen reference in the tree, so adding one extra entry. Can that be prevented?

parSNP issue

Hi,

When I run parSNP on 14 genomes with a random reference genome, the output file only includes 12 of them. I am using parSNP with the -c command. I have tried to circumvent the problem by forcing the program to use one of the dropped genomes as the reference (using -r). The error I receive is:
-->Running Parsnp multi-MUM search and libMUSCLE aligner..
ERROR
The following command failed:

/tmp/_MEIIrMqqu/parsnp /path/to/my/directory//parsnpAligner.ini
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.
ERROR

Can you help me determine what is going on? It seems like the two genomes that are removed are random.

Thanks.

parsnp fails on Manjaro (Archlinux)

I am trying to run parsnp (downloaded binary) on my Manjaro (a flavour of Arch Linux) laptop, but the program fails with the following error:

$ parsnp -r ./0.0_ref/Scedosporium_apiospermum.ScApio1_0.31.dna.toplevel.fna -p 6 -c -d ./0.2_data 2>&1 | tee test_harvest1_Sapio.log
Warning: Cannot determine OS, defaulting to linux
|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
-->Reading Genome (asm, fasta) files from ./0.2_data..
  |->[OK]
-->Reading Genbank file(s) for reference (.gbk) ..
  |->[WARNING]: no genbank file provided for reference annotations, skipping..
**ERROR**
The following command failed:
>>/tmp/_MEIkmsiuD/parsnp /home/ludo/Documents/01_professional/06_Projects_experiments/2016-2017_Angers/2016-2017_Scedosporium/01_Pipeline_CompGeno/05_Harvest/02_Ssapio_only/P_2016_10_25_175622540458/parsnpAligner.ini
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.
**ERROR**


**********************************************************************************************************************************************************************************************
SETTINGS:
|-refgenome:    ./0.0_ref/Scedosporium_apiospermum.ScApio1_0.31.dna.toplevel.fna
|-aligner:  libMUSCLE
|-seqdir:   ./0.2_data
|-outdir:   /home/ludo/Documents/01_professional/06_Projects_experiments/2016-2017_Angers/2016-2017_Scedosporium/01_Pipeline_CompGeno/05_Harvest/02_Ssapio_only/P_2016_10_25_175622540458
|-OS:       linux
|-threads:  6
**********************************************************************************************************************************************************************************************

<<Parsnp started>>

-->Running Parsnp multi-MUM search and libMUSCLE aligner..

It's important to note here that the very same setup works on the Ubuntu laptop of a colleague (same folders, files, commands).
I first thought it was due to the small size of the /tmp folder (only 8 GB), but when I changed it by setting 'TMPDIR="/home/ludo/tmp"', I got the very same error:

$ parsnp -r ./0.0_ref/Scedosporium_apiospermum.ScApio1_0.31.dna.toplevel.fna -p 6 -c -d ./0.2_data 2>&1 | tee test_harvest1_Sapio.log
Warning: Cannot determine OS, defaulting to linux
|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
sh: symbol lookup error: sh: undefined symbol: rl_signal_event_hook
-->Reading Genome (asm, fasta) files from ./0.2_data..
  |->[OK]
-->Reading Genbank file(s) for reference (.gbk) ..
  |->[WARNING]: no genbank file provided for reference annotations, skipping..
**ERROR**
The following command failed:
>>/home/ludo/tmp/_MEI8uE6ZN/parsnp /home/ludo/Documents/01_professional/06_Projects_experiments/2016-2017_Angers/2016-2017_Scedosporium/01_Pipeline_CompGeno/05_Harvest/02_Ssapio_only/P_2016_10_25_175011659051/parsnpAligner.ini
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.
**ERROR**


**********************************************************************************************************************************************************************************************
SETTINGS:
|-refgenome:    ./0.0_ref/Scedosporium_apiospermum.ScApio1_0.31.dna.toplevel.fna
|-aligner:  libMUSCLE
|-seqdir:   ./0.2_data
|-outdir:   /home/ludo/Documents/01_professional/06_Projects_experiments/2016-2017_Angers/2016-2017_Scedosporium/01_Pipeline_CompGeno/05_Harvest/02_Ssapio_only/P_2016_10_25_175011659051
|-OS:       linux
|-threads:  6
**********************************************************************************************************************************************************************************************

<<Parsnp started>>

-->Running Parsnp multi-MUM search and libMUSCLE aligner..

Important note: the default python on Arch Linux is python3. Could that be a problem?

Thank you for any help!
Ludovic

PPS: so far, compiling from the git source is not an option either, as libMUSCLE fails to compile due to CXXFLAGS='-fopenmp'.

ERROR: ref genome sequence seems to aligned! remove and restart

Hello,

When running parsnp v1.2 with the following command:
parsnp -r 1339277.3.fna -d genomes -p 1 -o out

I get the following strange error:

-->Reading Genbank file(s) for reference (.gbk) ..
  |->[WARNING]: no genbank file provided for reference annotations, skipping..
ERROR: ref genome sequence 1339277.3.fna seems to aligned! remove and restart

However, if I specify a different genome with -r, the program runs to completion. All genomes are in the genomes directory. All are draft genomes from isolates from the same species.

I'm trying to incorporate parsnp into a pipeline, but this unpredictable error is problematic. Any help would be appreciated.

Thanks,
Stephen

parsnp calculates genome size wrongly

Hi,
I used parsnp on closed genomes with known numbers of plasmids and noticed that the genome size reported in the log file deviated from the actual size of the input. Unfortunately, this affects the calculated cluster coverage percentage. It seems that for each plasmid (or contig) the genome size was extended by 310 bp, i.e. with 4 plasmids the genome size was overestimated by 1240 bp.
Is there a reason for this behaviour? And is there a possibility to switch this off?
Thanks
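
The arithmetic above can be used to correct the reported size by hand when computing coverage. A minimal sketch, assuming the 310 bp per plasmid (or contig) observed in this report holds in general. It is an observation, not a documented constant, and the function names are hypothetical.

```python
PAD_BP = 310  # bp added per plasmid/contig, as observed above (not a documented constant)


def true_genome_size(reported_size, n_plasmids):
    """Remove the apparent per-plasmid padding from the reported genome size."""
    return reported_size - PAD_BP * n_plasmids


def coverage_percent(aligned_bp, genome_size):
    """Cluster coverage as a percentage of the genome size."""
    return 100.0 * aligned_bp / genome_size


# With 4 plasmids the reported size is 4 * 310 = 1240 bp too large.
print(true_genome_size(5_001_240, 4))  # 5000000
```

For a genome of a few megabases the effect on the coverage percentage is small, but it grows with the number of contigs in a draft assembly.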

Using parsnp with a draft reference

Hi,

I'd like to use parsnp with the genbank option on, but my reference is a draft genome. Is this a problem for parsnp or can it handle the situation? I've had a look in the manual, tutorial and this forum, and couldn't find the answer. Sorry if this is here somewhere...

I have about 43 scaffolds in my reference by the way.

Many thanks for your help!

Mathieu

Question about Gingr

I ran into a problem when using gingr on a cluster.
The genome names are not shown in the phylogenetic tree; they are all black blocks.
I would like to know how this problem can be solved.
Thanks!

gingr on hidpi displays

The default font sizes are too small when using gingr to visualize alignments on hidpi displays. Is it possible to specify a custom font or font size when starting gingr?

Error when using parsnp in Harvest v1.1 release

The parsnp v1.1 binary is not properly packaged in Harvest v1.1 release, causing call to libMUSCLE to fail if not built and installed from source. This affects only the Linux 64-bit release.

Missing samples from tree

Thanks for a great software, super easy to use! I have been using harvest to make a pan-core tree of 83 bacterial isolates (where I know some will diverge a lot from the others). When the final tree is generated several of the strains in my query folder are not included. How can I force Harvest to include all isolates?

Missing VCF file

Hi Todd,

I ran parsnp with the following command:
parsnp -r genome_dir/ref_genome.fna -d genome_dir/ -p 10

And got the following output files:
parsnpAligner.ini parsnpAligner.log parsnp.ggr parsnp.tree parsnp.xmfa

How can I get parsnp to produce a VCF file?

Thanks,
Stephen

Index Error

I have been able to run Parsnp on a small subset of draft genomes (~20) in .fasta format. However, when I tried to scale up to a larger number (~200), I get the following error:
File "", line 656, in
IndexError: string index out of range

Any suggestion on how to fix this error?

Thanks!

indels

Hi,
I am looking for insertions and deletions in my dataset with Parsnp and Harvesttools. I have SNP information in the VCF output file but no indel information. My commands are:
Harvest-Linux64-v1.1.2/parsnp -r -d -p 32 -o -c -x

Harvest-Linux64-v1.1.2/harvesttools -x -f
-n -V

Generating a whole genome POA

Hi Guys
I'd like to be able to take a Harvest core genome MSA + the non-core bits that were cut out, and make a partial order alignment. i.e. I'd like something like

block {a,b,c} block {d,e,f} block {h,i,j} block

where the blocks are the core MSA, and a, b, c are the sequences that the various genomes have connecting the blocks,
i.e. I don't care about aligning the connecting sequence that joins the bits of core genome.

Is that something Harvest can generate??

cheers

Zam

parsnp treats '-' in contig names as gaps

When the name of a contig contains dashes, parsnp identifies them as gaps and produces the following error:
"ERROR: ref genome sequence %s seems to aligned! remove and restart"

Even if it's easy to remove dashes from names, parsnp should be able to differentiate between contigs names and sequences.
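
As a stopgap, dashes can be replaced in the header lines only, leaving the sequence lines untouched. A minimal sketch, assuming plain FASTA input where header lines start with `>`. The function names are hypothetical, and the second function shows the kind of check that distinguishes names from sequences.

```python
def sanitize_fasta_headers(lines, replacement="_"):
    """Yield FASTA lines with '-' replaced in header lines only."""
    for line in lines:
        if line.startswith(">"):
            yield line.replace("-", replacement)
        else:
            yield line


def has_gap_in_sequence(lines):
    """True only if a '-' occurs in a sequence line (header lines are ignored)."""
    return any("-" in line for line in lines if not line.startswith(">"))


# Hypothetical usage:
# with open("ref.fasta") as src, open("ref.clean.fasta", "w") as dst:
#     dst.writelines(sanitize_fasta_headers(src))
```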

What is the difference between the Newick formatted core genome SNP tree and the tree in the Gingr file?

Only 2% for core percentage

Hi,

I used parsnp on 26 bacterial genomes from the same subspecies, each about 2.8 Mb in length. The reference genome is a single chromosome, but all of the other genomes are in contigs.

I am only getting 2% for my core percentage score and I'm only seeing results for a tiny bit of the snp heatmap (see attached image).

I'm not sure what's wrong, does anyone have an idea?

Thanks!
Korin

each input fasta file has more than one chromosome and plasmid?

I have dozens of high-quality bacterial genome assemblies, and each genome has two chromosomes and two or more plasmids. Since parSNP only aligns core genes and calls SNPs from core genes, how should I format my input files? The core genes of my bacteria are on two chromosomes. Thanks.

Question

Hi-

Is there a way to create an alignment with parsnp/harvesttools that includes the unaligned sequences in addition to the core sequences? I have 99% coverage in my genome alignments but there is still about 21k bp omitted from the alignment when creating the xmfa or multi-fasta alignment file. For example, my reference genome shrinks from 4043846 to 4023750, as do the rest of the aligned genomes. Those missing bases throw off the annotation results from the gwas on the snps since I'm using the reference genome's genbank file.

Suggestions?

Thanks!

Glen

gingr crashes when you close the window on CentOS 6.6

Open gingr
Close the window
Receive abort report:
*** glibc detected *** gingr: free(): invalid pointer: 0x00007fffccce7f10 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3085075e66]
gingr[0x427439]
gingr[0x42258f]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x308501ed5d]
gingr[0x40e6b9]

Very strange error, puzzled, help!

I have 1654 strains, each about 7 Mb. The command is: ./parsnp -g merge.chr12.gbk -d test -c -o test.out2 -p 15

The error message is as follows:

|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
-->Reading Genome (asm, fasta) files from test..
|->[OK]
-->Reading Genbank file(s) for reference (.gbk) merge.chr12.gb..
|->[OK]
ERROR
The following command failed:

/tmp/_MEIBF1c04/parsnp test.out2/parsnpAligner.ini
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.
ERROR

<>
-->Running Parsnp multi-MUM search and libMUSCLE aligner..

I then divided the strains into two folders and ran the same command on each, and it worked! What is happening? What should I do?

gingr .vcf export problem with reference state coding

Hi,

when I export a vcf file, the SNP states in the 'sample' columns appear to be wrong. The reference genome is often not recorded in state '0'. SNPs with more than two states seem to also be missing the '0' state, and only have two: 1 and 2.

Thanks
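
For comparison, in VCF the genotype index 0 always denotes the REF allele, and 1, 2, ... denote the ALT alleles in the order they are listed. A minimal sketch of the expected coding for a single site. It is not gingr's export code, and the function name is hypothetical.

```python
def encode_site(ref_base, sample_bases):
    """Return (alt_alleles, codes), with the reference allele always coded as 0."""
    alts = []
    for base in sample_bases:
        if base != ref_base and base not in alts:
            alts.append(base)
    alleles = [ref_base] + alts
    codes = [alleles.index(base) for base in sample_bases]
    return alts, codes


# A site with three states: reference A, alternates G and T.
print(encode_site("A", ["A", "G", "T", "G"]))  # (['G', 'T'], [0, 1, 2, 1])
```

If the reference sample is one of the columns, its code should therefore be 0 at every site.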

Make source code or ppc64le binary available

Hello! I would like to use your suite in an IBM Power 9 little endian cluster. Would it be possible to get a binary for ppc64le or the source code so I can compile it?

Thanks in advance,

need some help with -c

thanks for the programme which really helps our work!!!

I've come across a few problems and have a few questions :)

1. In a folder I have 24 fasta files, but every time I ran "parsnp -d -r 1.fasta -c", it only included 23 of them, even with "-c". I also tried -U with a few values, which did help as well.
2. When I used "parsnp -d -r ! -c", the number of included fasta files varied each time (from 4 to 23), and sometimes it showed "you have to include at least 2 fasta etc..".

So I need some help with this. What should I do to include all the fasta files? Is there any way other than adding "-c"?

Really appreciate it.

Parsnp Error

Hi, I am using parsnp to align genomes from different bacterial species. The following command failed:
<>

-->Reading Genome (asm, fasta) files from /media/xwg/HJ/rename/fna/import/..
|->[OK]
-->Reading Genbank file(s) for reference (.gbk) ..
|->[WARNING]: no genbank file provided for reference annotations, skipping..
-->Running Parsnp multi-MUM search and libMUSCLE aligner..
ERROR
The following command failed:

/tmp/_MEI8TX9MJ/parsnp tree3/parsnpAligner.ini
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.
*ERROR

Unable to build Parsnp on Ubuntu 18.04.1

I am trying to build Parsnp on my Linux machine. I have built libMUSCLE successfully, but while building Parsnp I get these errors.

$ sudo make install
Making all in src
make[1]: Entering directory '/home/sroot/parsnp_src/src'
g++ -O3 -m64 -fopenmp -funroll-all-loops -fomit-frame-pointer -ftree-vectorize  -g -O2 -lgomp -lstdc++ -lpthread -std=gnu++0x -Wl,-rpath,/home/sroot/parsnp_src/muscle/lib -L/home/sroot/parsnp_src/muscle/lib -lMUSCLE-3.7   -o parsnp parsnp-MuscleInterface.o parsnp-parsnp.o parsnp-LCB.o parsnp-LCR.o parsnp-TMum.o parsnp-Converter.o ./ext/parsnp-iniFile.o  
parsnp-MuscleInterface.o: In function `MuscleInterface::CallMuscleFast(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
/home/sroot/parsnp_src/src/MuscleInterface.cpp:44: undefined reference to `muscle::g_uMaxIters'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:48: undefined reference to `muscle::g_SeqWeight1'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:43: undefined reference to `muscle::g_SeqType'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:45: undefined reference to `muscle::g_bStable'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:46: undefined reference to `muscle::g_bVerbose'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:47: undefined reference to `muscle::g_bQuiet'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:49: undefined reference to `muscle::SetMaxIters(unsigned int)'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:50: undefined reference to `muscle::SetSeqWeightMethod(muscle::SEQWEIGHT)'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:51: undefined reference to `muscle::g_ulMaxSecs'
parsnp-MuscleInterface.o: In function `MuscleInterface::CallMuscleFast(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
/home/sroot/parsnp_src/src/../muscle/libMUSCLE/seqvect.h:14: undefined reference to `vtable for muscle::SeqVect'
parsnp-MuscleInterface.o: In function `MuscleInterface::CallMuscleFast(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
/home/sroot/parsnp_src/src/MuscleInterface.cpp:62: undefined reference to `muscle::SeqVect::AppendSeq(muscle::Seq const&)'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:58: undefined reference to `muscle::Seq::SetName(char const*)'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:65: undefined reference to `muscle::MSA::MSA()'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:66: undefined reference to `muscle::MUSCLE(muscle::SeqVect&, muscle::MSA&)'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:72: undefined reference to `muscle::MSA::GetSeqIndex(unsigned int) const'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:73: undefined reference to `muscle::MSA::GetSeqBuffer(unsigned int) const'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:65: undefined reference to `muscle::MSA::~MSA()'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:52: undefined reference to `muscle::SeqVect::~SeqVect()'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:52: undefined reference to `muscle::SeqVect::~SeqVect()'
/home/sroot/parsnp_src/src/MuscleInterface.cpp:65: undefined reference to `muscle::MSA::~MSA()'
collect2: error: ld returned 1 exit status
Makefile:364: recipe for target 'parsnp' failed
make[1]: *** [parsnp] Error 1
make[1]: Leaving directory '/home/sroot/parsnp_src/src'
Makefile:340: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

Please can you help?

parsnp failed to generate phylogeny

I am trying to run parsnp on 330 bacterial genomes (genome size ~2 Mb). The alignment seems to work properly, but it fails in the last step; the error message is pasted below.
Your help would be greatly appreciated!
Li

-bash-4.1$ ./parsnp -g ./ref/2603V-R_NC004116.gb -d GBS_WGS_2017/ -c -C -p 32

-->Reading Genome (asm, fasta) files from GBS_WGS_2017/..
|->[OK]
-->Reading Genbank file(s) for reference (.gbk) ./ref/2603V-R_NC004116.gb..
|->[OK]
-->Running Parsnp multi-MUM search and libMUSCLE aligner..
|->[OK]
-->Running PhiPack on LCBs to detect recombination..
|->[SKIP]
-->Reconstructing core genome phylogeny..
|->[OK]
-->Creating Gingr input file..
ERROR
The following command failed:

/tmp/_MEI2r348L/harvest --midpoint-reroot -u -q -i /afs/grid.pfizer.com/vaccine/ngs/Tools/Parsnp-Linux64-v1.2/P_2017_03_16_161557720869/parsnp.ggr -o /afs/grid.pfizer.com/vaccine/ngs/Tools/Parsnp-Linux64-v1.2/P_2017_03_16_161557720869/parsnp.ggr -n /afs/grid.pfizer.com/vaccine/ngs/Tools/Parsnp-Linux64-v1.2/P_2017_03_16_161557720869/parsnp.tree
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.

Snapshot failed

Hi.
When I create a snapshot of the tree and alignment in PNG or JPEG format and save it to the desktop, the file is not created.
For now, I take a screenshot and edit it with GIMP.

Thanks.

How to use parsnp to align against a multi-chromosome reference?

Hi! I am trying to align my draft assemblies against the reference genome of the same species with parsnp to see how divergent these bacterial isolates are. My reference has three chromosomes, plus a number of plasmids. Should I concatenate the .fna files when I specify my reference (this won't work very well for the .gbk files)? Or would it be better to align the draft genomes against each of the chromosomes separately?

Thanks!

gray zones

Hi,
What are the gray zones in the alignment bar, please? Unaligned regions in the reference? That would be strange in my genomes set.

Thanks.
