harvest-tools's Introduction

HarvestTools is a part of the Harvest software suite and provides file conversion between Gingr files and various standard text formats (see Harvest).

HarvestTools is normally distributed as a binary for OS X or Linux (see Harvest link above). However, the source is provided here for unique build environments or for development.

CITATION provides details on how to cite HarvestTools. INSTALL.txt provides instructions for building from source and installing. LICENSE.txt provides licensing information.

harvest-tools's People

Contributors

bkille, frikiluser, ondovb, treangen

harvest-tools's Issues

Rerooting with non-binary root

Rerooting code assumes a binary root. When the root has more than two children, rerooting operations have incorrect results.
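
One way to avoid the binary-root assumption is to treat the tree as unrooted before placing the new root. The following is a minimal Python sketch of that idea, not the HarvestTools implementation: the function name `reroot`, the dictionary-based tree representation, and the `new_label` parameter are all hypothetical. The old root is suppressed only when it has exactly two children; with three or more children it is kept as an ordinary internal node.

```python
from collections import defaultdict


def reroot(children, lengths, root, target, frac=0.5, new_label="new_root"):
    """Place a new root on the branch above `target`.

    children: dict mapping a node to the list of its children
    lengths:  dict mapping a non-root node to the length of the branch above it
    frac:     fraction of the split branch that stays on the `target` side
    Returns (children, lengths, root) for the rerooted tree.
    """
    if target == root:
        raise ValueError("cannot reroot on the branch above the root")
    parent_of = {kid: p for p, kids in children.items() for kid in kids}

    # Forget the direction of every branch: build an undirected weighted graph.
    adj = defaultdict(dict)
    for parent, kids in children.items():
        for kid in kids:
            adj[parent][kid] = adj[kid][parent] = lengths[kid]

    up = parent_of[target]
    if len(children[root]) == 2:
        # A binary root has degree 2 in the unrooted tree, so merge its two
        # branches into one. A root with more children is left in place.
        a, b = children[root]
        merged = adj[root][a] + adj[root][b]
        del adj[a][root], adj[b][root], adj[root]
        adj[a][b] = adj[b][a] = merged
        if up == root:
            up = b if target == a else a

    # Split the chosen branch and insert the new root on it.
    span = adj[target].pop(up)
    del adj[up][target]
    adj[new_label][target] = adj[target][new_label] = span * frac
    adj[new_label][up] = adj[up][new_label] = span * (1.0 - frac)

    # Re-orient every branch away from the new root.
    new_children, new_lengths = {}, {}
    stack = [(new_label, None)]
    while stack:
        node, parent = stack.pop()
        kids = [n for n in adj[node] if n != parent]
        new_children[node] = kids
        for kid in kids:
            new_lengths[kid] = adj[node][kid]
            stack.append((kid, node))
    return new_children, new_lengths, new_label
```

For a root `R` with three children `A`, `B`, `C`, calling `reroot({"R": ["A", "B", "C"]}, {"A": 1.0, "B": 2.0, "C": 3.0}, "R", "A")` keeps `R` as an internal node with children `B` and `C` below the new root.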

parsnp only recruits a subset of genome fasta files in the directory

Hi

I am trying to align 190 Mycobacterium tuberculosis genome sequences using parsnp. It seems to work fine and creates an .xmfa file. However, it doesn't recruit all the genome fasta files into the alignment, only 95 of the total 190. I'm not sure why, because all of the files are in fasta format, and it doesn't seem to be related to the file name.

Any advice would be greatly appreciated.

Thanks
Tasha

Abort trap: 6 when converting file-type

Dear Harvest developers,

I am trying to use your software to convert an xmfa file into a multi-fasta, using
harvesttools -x ${samplename}.xmfa -M ${samplename}.fasta
But it throws an error:
libc++abi.dylib: terminating with uncaught exception of type std::length_error: basic_string
Abort trap: 6

My best guess is that there must be something wrong with the input-file specification. However, the xmfa loads into the Mauve viewer without any problems, and nothing seems wrong when eyeballing the file itself.
I am working on macOS 10.13.6 with 16 GB of memory.

Do you know what might be causing this error, how I can diagnose it further, or how I could resolve it? Any help would be greatly appreciated!

configure: WARNING: unrecognized options: --with-capnp

I am upgrading the brew package to 1.2 and get this problem:

./configure --with-capnp=/bio/linuxbrew/opt/capnp --with-protobuf=/bio/linuxbrew/opt/protobuf
configure: WARNING: unrecognized options: --with-capnp
checking for protoc... yes
checking for capnp... yes
checking whether the C++ compiler works... yes
<snip>
config.status: creating Makefile
configure: WARNING: unrecognized options: --with-capnp

However, I think it has created the Makefile just fine, and it compiles.

default actions/warnings for no output

It is currently possible to load files without writing any output, which is counterintuitive.

Possible improvements:

  • provide a warning when imported data would be lost
  • automatically write to the input Gingr file if no output was specified
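
The two proposals above could be combined in a single decision step. The Python sketch below is only an illustration of that logic under assumptions: `resolve_output` and its parameters are hypothetical names and do not reflect how harvesttools parses its options.

```python
import warnings


def resolve_output(input_gingr, imported, outputs):
    """Decide where to write when the user may have omitted an output.

    input_gingr: path of the Gingr file given as input, or None
    imported:    paths of other files whose data was loaded
    outputs:     output paths requested explicitly
    Returns the list of paths that should be written.
    """
    if outputs:
        # The user asked for output explicitly; nothing to decide.
        return list(outputs)
    if not imported:
        # Nothing new was loaded, so nothing can be lost.
        return []
    if input_gingr is not None:
        # Second proposal: fall back to updating the input Gingr file.
        return [input_gingr]
    # First proposal: warn that the imported data will be discarded.
    warnings.warn("imported data will be lost: no output was specified")
    return []
```

Returning the target list instead of writing directly keeps the policy separate from the file I/O, so either behaviour can be changed without touching the writers.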

Variant annotation

There should be an output format that includes gene and amino acid information for variants, based on reference position and annotations, either in the VCF output or in a new format.
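
As a rough illustration of what such an annotation step involves, the Python sketch below maps a single-base variant to its codon and amino acid change. It rests on simplifying assumptions that are not part of HarvestTools: the function `annotate_snp` is hypothetical, coordinates are 0-based, the gene lies on the forward strand without introns, and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def annotate_snp(reference, gene_start, gene_end, position, alt):
    """Return (codon number, reference amino acid, variant amino acid).

    reference:  reference sequence as a string
    gene_start: 0-based start of the coding sequence (inclusive)
    gene_end:   0-based end of the coding sequence (exclusive)
    position:   0-based reference position of the variant
    alt:        variant base
    Returns None when the position falls outside the gene.
    """
    if not gene_start <= position < gene_end:
        return None
    offset = position - gene_start
    in_codon = offset % 3
    codon_start = position - in_codon
    ref_codon = reference[codon_start:codon_start + 3].upper()
    alt_codon = ref_codon[:in_codon] + alt.upper() + ref_codon[in_codon + 1:]
    return offset // 3 + 1, CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
```

The codon number in the result is 1-based, which matches the usual way amino acid changes are written (for example, alanine to aspartate at codon 2).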

Add ability to extract clade defining SNPs

Currently, there is no easy way to extract clade-defining SNPs. Add a function to harvest-tools that extracts these SNPs, both for tie-in with Gingr and to improve command-line access to this information.
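
One possible definition of a clade-defining SNP is a column where every member of the clade carries the same allele and no genome outside the clade carries it. The Python sketch below implements that definition only. The function `clade_defining_snps` and the dictionary-of-strings alignment are hypothetical and are not part of the HarvestTools API.

```python
def clade_defining_snps(alignment, clade):
    """Find columns whose allele is shared by, and exclusive to, a clade.

    alignment: dict mapping genome name to its aligned sequence
               (all sequences must have the same length)
    clade:     set of genome names forming the clade
    Returns a list of (0-based column, allele) pairs.
    """
    inside = [alignment[name] for name in clade]
    outside = [seq for name, seq in alignment.items() if name not in clade]
    if not inside or not outside:
        raise ValueError("clade must be a non-empty proper subset of genomes")

    hits = []
    for col in range(len(inside[0])):
        alleles = {seq[col] for seq in inside}
        if len(alleles) != 1:
            continue  # the clade itself is polymorphic at this column
        allele = alleles.pop()
        if allele in "-N":
            continue  # gaps and ambiguous bases do not define a clade
        if all(seq[col] != allele for seq in outside):
            hits.append((col, allele))
    return hits
```

In a real tool the clade would come from a node in the tree rather than from an explicit set of names, and the columns would be restricted to variant positions to avoid scanning the full alignment.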

Protobuf error, possibly related to C++ version

I'm trying to add HarvestTools to EasyBuild, for use on HPC clusters.

Having included the fix to use C++17 which allows the configure step to complete, I see this error during build:

g++ -c -O2 -ftree-vectorize -march=native -fno-math-errno -std=c++17 -Isrc -I/apps/eb/el8/upstream/software/protobuf/23.0-GCCcore-12.2.0/include -I/apps/eb/el8/upstream/software/CapnProto/0.10.3-GCCcore-12.2.0/include -include src/harvest/memcpyLink.h -Wl,--wrap=memcpy -I/apps/eb/el8/upstream/software/CapnProto/0.10.3-GCCcore-12.2.0/include -I/apps/eb/el8/upstream/software/protobuf/23.0-GCCcore-12.2.0/include -I/apps/eb/el8/upstream/software/zlib/1.2.12-GCCcore-12.2.0/include -o src/harvest/pb/harvest.pb.o src/harvest/pb/harvest.pb.cc
src/harvest/HarvestIO.cpp: In member function bool HarvestIO::loadHarvestProtocolBuffer(const char*):
src/harvest/HarvestIO.cpp:211:39: error: no matching function for call to google::protobuf::io::CodedInputStream::SetTotalBytesLimit(int, int)
  211 |         coded_input.SetTotalBytesLimit(INT_MAX, INT_MAX);
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
In file included from src/harvest/pb/harvest.pb.h:24,
                 from src/harvest/HarvestIO.h:10,
                 from src/harvest/HarvestIO.cpp:7:
/apps/eb/el8/upstream/software/protobuf/23.0-GCCcore-12.2.0/include/google/protobuf/io/coded_stream.h:390:8: note: candidate: void google::protobuf::io::CodedInputStream::SetTotalBytesLimit(int)
  390 |   void SetTotalBytesLimit(int total_bytes_limit);
      |        ^~~~~~~~~~~~~~~~~~
/apps/eb/el8/upstream/software/protobuf/23.0-GCCcore-12.2.0/include/google/protobuf/io/coded_stream.h:390:8: note:   candidate expects 1 argument, 2 provided
make: *** [Makefile:38: src/harvest/HarvestIO.o] Error 1
make: *** Waiting for unfinished jobs....
 (at easybuild/tools/run.py:682 in parse_cmd_output)

I'm using protobuf 23.0 and CapnProto 0.10.3.

There seems to be discussion of a similar problem here:
onnx/onnx#2678

API documentation

The HarvestIO interface should be documented for third party development, ideally with an automated tool such as Doxygen.

NameError

When trying to run parsnp I get the following error:

Traceback (most recent call last):
File "", line 624, in
NameError: name 'nameok' is not defined

What might be the reason for that?
thx

Please enable parallel builds

Hi,
I received a patch to the Debian package of harvest-tools which enables parallel builds. Please feel free to apply this for your next release.
Kind regards, Andreas.

Please port to Python3

Hello,
The Debian Med team maintains harvest-tools for official Debian. The recently released Debian 10 was the last Debian release to ship Python 2, since that language is end-of-life. If you would like us to keep maintaining harvest-tools in official Debian (and would like users of other modern distributions to be able to install it without problems), I recommend porting the code to Python 3. The 2to3 tool might be of great help here.
Kind regards, Andreas.

command failed >>/tmp/_MEIVWgmag/harvest --midpoint-reroot -u -q -i

The following command failed:

/tmp/_MEIVWgmag/harvest --midpoint-reroot -u -q -i /project/genomicfoodbornepathogenecology/Continuum/ST58fasta/parsnp/parsnp.ggr -o /project/genomicfoodbornepathogenecology/Continuum/ST58fasta/parsnp/parsnp.ggr -n /project/genomicfoodbornepathogenecology/Continuum/ST58fasta/parsnp/parsnp.tree
Please veryify input data and restart Parsnp. If the problem persists please contact the Parsnp development team.

Protocol buffer size limit

Messages cannot be greater than 2GB (before compression) due to the implementation of Protocol Buffers. This limit is hit with large alignments because Gingr files rely on zlib compression after serialization. There are several potential solutions:

  • Reduce the serialized size by storing only variant rows rather than entire columns (also enables row-based filters)
  • Write multiple protobuf messages to each Gingr file
  • Use another serialization format, such as HDF5
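
The second option, several messages per file, needs some framing so a reader can tell where each message ends. The Python sketch below shows one generic way to do this with a length prefix per chunk. It is not the Gingr file format: `write_chunks`, `read_chunks`, and the 8-byte little-endian prefix are assumptions, and plain byte strings stand in for serialized protobuf messages.

```python
import io
import struct
import zlib

# Hypothetical framing: an unsigned 64-bit little-endian length per chunk.
LENGTH_PREFIX = struct.Struct("<Q")


def write_chunks(stream, payloads):
    """Compress each payload separately and write it with a length prefix."""
    for payload in payloads:
        compressed = zlib.compress(payload)
        stream.write(LENGTH_PREFIX.pack(len(compressed)))
        stream.write(compressed)


def read_chunks(stream):
    """Yield the decompressed payloads in the order they were written."""
    while True:
        header = stream.read(LENGTH_PREFIX.size)
        if not header:
            return  # clean end of file
        if len(header) != LENGTH_PREFIX.size:
            raise ValueError("truncated length prefix")
        (size,) = LENGTH_PREFIX.unpack(header)
        compressed = stream.read(size)
        if len(compressed) != size:
            raise ValueError("truncated chunk")
        yield zlib.decompress(compressed)


# Usage: round-trip two chunks through an in-memory file.
buffer = io.BytesIO()
write_chunks(buffer, [b"header", b"alignment block"])
buffer.seek(0)
chunks = list(read_chunks(buffer))
```

Because every chunk is serialized and compressed on its own, no single message has to approach the 2GB limit, and a reader can process one chunk at a time.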

how to use masking repeat file (.bed) in the reference genome

Hi,
I am working with 70 assembled genomes to identify core SNPs, and I would like to use a repeat-masking .bed file (generated with MUMmer) for the reference genome. Please let me know how to use that file in the analysis with the following command.

parsnp -g /ref/ref_genomic -d scaffolds/*.fasta -c

Thank you!

alignment file broken?

Hi,

I have used Parsnp successfully a couple of times.
The last time, I ran it on 50 K. pneumoniae genomes and it didn't produce any error message.
But when I tried to convert the .ggr file or the .xmfa file to a multi-FASTA file with harvesttools, it was not possible, although I was able to open the files in Gingr without problems.
The message I got was:
ERROR:LCB 634 extends beyond reference (position 47356)
It appeared when the program tried to write the new file.
I am using version 1.2.

Is the gingr file broken?

Thank you,
Lisa
