
binnacle's People

Contributors

hsmurali, shahnidhi

binnacle's Issues

understanding the output of Estimate_Abundances.py and Collate.py

Hey!

Thanks for the great software! I am following the wiki to combine the scaffold coverages from two samples.
When I run the code Estimate_Abundances.py and Collate.py, I get two files Scaffolds.fasta and Feature-Matrix-concoct.txt.

I wanted to ask:
a) Does the Scaffolds.fasta file correspond to the "final" scaffolds file for the two samples? Can I use it to run binning software (e.g. CONCOCT)? If yes, do I just concatenate the read files from the two samples, since CONCOCT requires one paired-end reads file along with the scaffolds file?
b) What is Collate.py doing? I get the Feature-Matrix-concoct.txt file, but what does it mean?
c) I ran Estimate_Abundances.py using either sample1 or sample2 as the starting file for Coords_After_Delinking.txt. The two runs generate Scaffolds.fasta files with very different numbers of scaffolds. Do I just use the one with the most scaffolds for (a)?

Looking forward to your reply!

Issue running binnacle output through CONCOCT

Hi,

I was trying to run the binnacle output through CONCOCT:

concoct -t 10 --composition_file data/processed/megahit/binnacle/Scaffolds.fasta --coverage_file data/processed/megahit/binnacle/Feature-Matrix-concoct.txt -b test

But I ran into the following issue:

Up and running. Check /data/san/data0/users/chris/prophage_mag_binning_comparison/test_log.txt for progress
/data/san/data0/users/chris/Programs/miniconda3/envs/concoct/lib/python3.8/site-packages/sklearn/utils/validation.py:1673: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
  warnings.warn(
Traceback (most recent call last):
  File "/data/san/data0/users/chris/Programs/miniconda3/envs/concoct/bin/concoct", line 90, in <module>
    results = main(args)
  File "/data/san/data0/users/chris/Programs/miniconda3/envs/concoct/bin/concoct", line 37, in main
    transform_filter, pca = perform_pca(
  File "/data/san/data0/users/chris/Programs/miniconda3/envs/concoct/lib/python3.8/site-packages/concoct/transform.py", line 5, in perform_pca
    pca_object = PCA(n_components=nc, random_state=seed).fit(d)
  File "/data/san/data0/users/chris/Programs/miniconda3/envs/concoct/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 382, in fit
    self._fit(X)
  File "/data/san/data0/users/chris/Programs/miniconda3/envs/concoct/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 430, in _fit
    X = self._validate_data(
  File "/data/san/data0/users/chris/Programs/miniconda3/envs/concoct/lib/python3.8/site-packages/sklearn/base.py", line 557, in _validate_data
    X = check_array(X, **check_params)
  File "/data/san/data0/users/chris/Programs/miniconda3/envs/concoct/lib/python3.8/site-packages/sklearn/utils/validation.py", line 797, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 138)) while a minimum of 1 is required.

After some digging around, it looks like CONCOCT freaks out when numbers are used as FASTA headers (I'm not sure whether the problem is their use in the FASTA itself or in the Feature-Matrix file, but either way it doesn't matter since the two are linked).

It looks to have run OK after I appended '_contig' to the end of the FASTA headers and to their associated column in the Feature-Matrix using a couple of bash one-liners, then re-ran CONCOCT on the edited files as per below:

sed 's/>.*/&_contig/' data/processed/megahit/binnacle/Scaffolds.fasta > data/processed/megahit/binnacle/Scaffolds_edit.fasta
awk 'BEGIN{FS=OFS="\t"}{$1=$1"_contig"}1' data/processed/megahit/binnacle/Feature-Matrix-concoct.txt > data/processed/megahit/binnacle/Feature-Matrix-concoct_edit.txt
concoct -t 10 --composition_file data/processed/megahit/binnacle/Scaffolds_edit.fasta --coverage_file data/processed/megahit/binnacle/Feature-Matrix-concoct_edit.txt -b test
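
For anyone who prefers Python to sed/awk, the same renaming can be sketched roughly as follows. This is only an illustration of the workaround above, not part of binnacle or CONCOCT; the function names are made up for this example, and it assumes the Feature-Matrix is tab-separated, as the awk command does.

```python
def add_suffix_to_fasta(lines, suffix="_contig"):
    """Append `suffix` to every FASTA header line (lines starting with '>')."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Sequence lines pass through unchanged; only headers are renamed.
        out.append(line + suffix if line.startswith(">") else line)
    return out


def add_suffix_to_matrix(lines, suffix="_contig", sep="\t"):
    """Append `suffix` to the first column of every row of the feature matrix."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        fields[0] += suffix  # first column holds the scaffold ID
        out.append(sep.join(fields))
    return out


# Tiny made-up inputs, just to show the effect.
print(add_suffix_to_fasta([">1\n", "ACGT\n"]))
print(add_suffix_to_matrix(["1\t5.0\t0.0\n"]))
```

Like the awk one-liner, this touches every row of the matrix, so a header row (if the file has one) would get the suffix on its first field too.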

Just wanted to post this as a heads up that this might be an issue that needs fixing in a future release (as I think using numbers only is the default scaffold naming scheme for binnacle?) and in case someone else runs into the same issue and is looking for a fix.
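
To check whether a given Scaffolds.fasta is affected before running the binner, a minimal sketch like the one below lists the record IDs that consist only of digits. The helper name is hypothetical, and it assumes the ID is the first whitespace-delimited token after '>'.

```python
def numeric_only_ids(fasta_lines):
    """Return the FASTA record IDs that consist solely of digits."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            # Take the first token after '>' as the record ID.
            tokens = line[1:].split()
            if tokens and tokens[0].isdigit():
                ids.append(tokens[0])
    return ids


# Made-up example records.
example = [">1", "ACGT", ">2 length=8", "GGCCTTAA", ">Binnacle_Scaffold_3", "TTGA"]
print(numeric_only_ids(example))  # ['1', '2']
```

An empty result would suggest the headers already contain non-numeric characters and the renaming workaround is probably not needed.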

Thanks for developing this awesome addon!

Chris

Issue with binnacle output with concoct and metabat2

Hi everyone, and thank you for your attention.

I've run through the complete Binnacle pipeline flawlessly and ran Collate.py to get a Feature-Matrix for concoct and another for metabat.

However, when it came to feeding Binnacle's output to the binning tools, I started struggling. Let's start with CONCOCT. I'm using version 1.0.0:

concoct --version

concoct 1.0.0

concoct -t 30 --composition_file Scaffolds.fasta --coverage_file Feature-Matrix-concoct.txt -b test_concoct

/mnt/mini1/work/marco/miniconda3/envs/metawrap-new/lib/python2.7/site-packages/concoct/input.py:82: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  cov = p.read_table(cov_file, header=0, index_col=0)
Traceback (most recent call last):
  File "/mnt/mini1/work/marco/miniconda3/envs/metawrap-new/bin/concoct", line 88, in <module>
    results = main(args)
  File "/mnt/mini1/work/marco/miniconda3/envs/metawrap-new/bin/concoct", line 40, in main
    args.seed
  File "/mnt/mini1/work/marco/miniconda3/envs/metawrap-new/lib/python2.7/site-packages/concoct/transform.py", line 5, in perform_pca
    pca_object = PCA(n_components=nc, random_state=seed).fit(d)
  File "/mnt/mini1/work/marco/miniconda3/envs/metawrap-new/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 340, in fit
    self._fit(X)
  File "/mnt/mini1/work/marco/miniconda3/envs/metawrap-new/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 381, in _fit
    copy=self.copy)
  File "/mnt/mini1/work/marco/miniconda3/envs/metawrap-new/lib/python2.7/site-packages/sklearn/utils/validation.py", line 573, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/mnt/mini1/work/marco/miniconda3/envs/metawrap-new/lib/python2.7/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I checked my Feature-Matrix for NaN, NA and Inf values, but all values seem to be fine. Here is the feature matrix structure (I have 12 samples in my test dataset; output truncated for clarity):

head -4 Feature-Matrix-concoct.txt

Binnacle_Scaffold_1 53.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Binnacle_Scaffold_2 125.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Binnacle_Scaffold_3 63.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Binnacle_Scaffold_4 40.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
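
A check like this can also be scripted so that nothing is missed by eye. The sketch below is a hypothetical helper (not part of binnacle or CONCOCT) that assumes a tab-separated file with the scaffold ID in the first column, and reports every cell that is empty, fails to parse as a number, or parses to NaN/Inf.

```python
import math


def find_bad_cells(lines, sep="\t"):
    """Return (row, column, raw_text) for every cell that is not a finite float.

    Rows and columns are 1-based; column 1 (the scaffold ID) is skipped.
    """
    bad = []
    for row_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        for col_no, cell in enumerate(fields[1:], start=2):
            try:
                value = float(cell)
            except ValueError:
                # Empty or non-numeric text, e.g. from a stray or trailing tab.
                bad.append((row_no, col_no, cell))
                continue
            if not math.isfinite(value):
                bad.append((row_no, col_no, cell))
    return bad


# Made-up rows: one clean, one with a literal NaN, one with an empty last cell.
rows = [
    "Binnacle_Scaffold_1\t53.8\t0.0",
    "Binnacle_Scaffold_2\tnan\t0.0",
    "Binnacle_Scaffold_3\t63.4\t",
]
print(find_bad_cells(rows))  # [(2, 2, 'nan'), (3, 3, '')]
```

Empty cells are worth flagging because pandas reads empty fields as NaN. The traceback above also shows CONCOCT loading the file with header=0, so the first line is treated as a header row; whether the matrix actually starts with one may be worth checking as well.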

I then tested the metabat-formatted feature matrix with metabat v2.15:

metabat2 -t 30 -i Scaffolds.fasta -a Feature-Matrix-metabat.txt -o test_metabat

MetaBAT 2 (2.15 (Bioconda)) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, maxEdges 200 and minClsSize 200000. with random seed=1641895722
terminate called after throwing an instance of 'boost::wrapexcept<boost::bad_lexical_cast>'
  what():  bad lexical cast: source type value could not be interpreted as target
Aborted

I would also like to point out that both binners work without problems in classic binning pipelines such as metaWRAP, so I don't think my installation is the issue here.

Is there any way I can overcome this issue? Am I missing something? If more data are needed, I would be more than willing to add them to this post.

Thanks!

Marco
