bigdatabiology / macrel
Predict AMPs in (meta)genomes and peptides
Home Page: http://big-data-biology.org/software/macrel
License: Other
Hi, I have tested it with my own data and got the following warning:
/opt/conda/lib/python3.10/site-packages/sklearn/base.py:299: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.22.1 when using version 1.2.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
warnings.warn(
Could you help me fix it? Thanks a lot.
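To see how far apart the two scikit-learn versions are, the version numbers can be pulled straight out of the warning text. The following is a minimal sketch; the function name is hypothetical and is not part of Macrel or scikit-learn.

```python
import re

# Matches the two version numbers in the scikit-learn unpickling warning.
_WARNING_RE = re.compile(
    r"from version (?P<pickled>[\w.]+) when using version (?P<running>[\w.]+)"
)


def parse_unpickle_warning(message):
    """Return (pickled_version, running_version), or None if no match."""
    match = _WARNING_RE.search(message)
    if match is None:
        return None
    # The running version is followed by a sentence period; strip it.
    pickled = match.group("pickled").rstrip(".")
    running = match.group("running").rstrip(".")
    return pickled, running
```

Passing the warning quoted above should return the version the model was saved with and the version currently installed, which makes the size of the gap easy to report.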
Hi Luis,
I ran macrel contigs but it got interrupted because of the peptides issue (now solved).
That means I have (I think) the input file ready to feed to macrel peptides without having to run starting from the contigs again.
macrel contigs produced two output files:
macrel.out.all_orfs.faa
macrel.out.smorfs.faa
Should macrel peptides be run on one, the other, or both?
Example (the --fasta value could instead be macrel.out.smorfs.faa or macrel.out.*.faa):
macrel peptides \
--fasta macrel.out.all_orfs.faa \
--output out_peptides \
-t 8
Thank you!
Dany
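Before choosing between the two files, it may help to see how they differ in peptide length. The sketch below counts sequences at or below a length cut-off; the function names are hypothetical and the 100-residue default is an assumption for illustration, not a value confirmed in this thread.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_by_length(lines, max_len=100):
    """Return (short, long) counts; max_len is an assumed cut-off."""
    short = long_ = 0
    for _, seq in read_fasta(lines):
        seq = seq.rstrip("*")  # drop a trailing stop marker, if present
        if len(seq) <= max_len:
            short += 1
        else:
            long_ += 1
    return short, long_
```

Opening macrel.out.all_orfs.faa and macrel.out.smorfs.faa in turn and passing each file handle to count_by_length would show whether one file is simply the short subset of the other.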
I encountered an error while running the abundance subcommand:
macrel abundance -1 SRR16178793_1.fastq.gz -2 SRR16178793_2.fastq.gz --fasta peptide.fasta --output ./output/SRR16178793_abun --tag ./outtag/SRR16178793_outtag -t 16
The error is:
......
[M::mem_process_seqs] Processed 3299460 protein sequences in 743.597 CPU sec, 47.076 real sec
[M::process] Read 714582 protein sequences (34476126 AA)...
[M::mem_process_seqs] Processed 3298914 protein sequences in 688.743 CPU sec, 43.647 real sec
[M::mem_process_seqs] Processed 714582 protein sequences in 146.474 CPU sec, 9.278 real sec
[M::renderNumberAligned] Aligned 34710242 out of 51321549 total detected ORF sequences (67.63%)
[main] Version: 1.4.6
[main] CMD: paladin align -t 16 -T 20 -f 10 -z 11 -a -V -M /tmp/tmpdcdn385h/paladin.faa /tmp/tmpdcdn385h/preproc.pair.1.fq.gz
[main] Real time: 8120.267 sec; CPU: 84019.436 sec
NGLess v1.5.0 (C) NGLess authors
https://ngless.embl.de/
When publishing results from this script, please cite the following references:
- Coelho, L.P., Alves, R., Monteiro, P., Huerta-Cepas, J., Freitas, A.T., and Bork, P.,
NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. in
Microbiome 7:84 (2019). DOI: https://doi.org/10.1186/s40168-019-0684-8
[Mon 29-04-2024 21:57] Line 9: /tmp/counts.paladin154619-0.txt: renameFile:renamePath:rename: does not exist (No such file or directory)
Exiting after fatal error:
/tmp/counts.paladin154619-0.txt: renameFile:renamePath:rename: does not exist (No such file or directory)
Traceback (most recent call last):
File "/path/to/bin/macrel", line 10, in <module>
sys.exit(main())
File "/path/to/macrel/main.py", line 371, in main
do_abundance(args, tdir,logfile)
File "/path/to/macrel/main.py", line 222, in do_abundance
subprocess.check_call([
File "/path/to/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ngless', '--no-create-report', '--quiet', '-j', '16', '/path/to/scripts/count.ngl', '/tmp/tmpdcdn385h/paladin.out.sam', './output/SRR16178793/ ./outtag/.abundance.txt']' returned non-zero exit status 1.
Could you please help me work out what caused the error? Thanks a lot.
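The last argument of the failing ngless command places a tag beginning with ./outtag/ after the output directory. One possible reading, which is an assumption and not confirmed in this thread, is that --tag is used as a file-name prefix inside --output, so a tag with a directory component would point at a folder that does not exist. A minimal sketch of a pre-flight check under that assumption (both function names are hypothetical):

```python
import os


def tag_looks_like_path(tag):
    """Return True if the tag contains a directory component."""
    return os.path.basename(tag) != tag


def expected_abundance_path(output_dir, tag):
    # Assumed naming scheme, inferred from the failing path in the log;
    # this is not taken from the Macrel source.
    return os.path.join(output_dir, tag + ".abundance.txt")
```

If tag_looks_like_path returns True for the value given to --tag, retrying with a plain name such as SRR16178793_outtag would be a cheap way to test this hypothesis.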
Hello, I'm currently trying to run:
macrel abundance -1 may10.fq.gz --fasta macrelabundancepeptides.faa --output out_abundancemay15latest --force
However, I am receiving this error:
[main] Version: 1.3.1
[main] CMD: paladin index -r3 /tmp/tmpdrzjismz/paladin.faa
[main] Real time: 0.051 sec; CPU: 0.009 sec
align: invalid option -- 'z'
Output folder already exists, but --force flag was used
Traceback (most recent call last):
File "/home/user/.conda/envs/macrelabundancemay15/bin/macrel", line 10, in <module>
sys.exit(main())
File "/home/user/.conda/envs/macrelabundancemay15/lib/python3.9/site-packages/macrel/main.py", line 340, in main
do_abundance(args, tdir,logfile)
File "/home/user/.conda/envs/macrelabundancemay15/lib/python3.9/site-packages/macrel/main.py", line 195, in do_abundance
subprocess.check_call([
File "/home/user/.conda/envs/macrelabundancemay15/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['paladin', 'align', '-t', '1', '-T', '20', '-f', '10', '-z', '11', '-a', '-V', '-M', '/tmp/tmpdrzjismz/paladin.faa', '/tmp/tmpdrzjismz/preproc.fq.gz']' returned non-zero exit status 1.
The reads file is 14 GB.
Looking forward to your help and feedback.
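The log reports paladin 1.3.1 rejecting the -z option, while another report in this thread shows paladin 1.4.6 running the same align command. A small sketch for comparing the installed version against that known-good one; whether 1.4.6 is the true minimum is an assumption, and the function names are hypothetical.

```python
import re

# Version seen accepting '-z' in another log in this thread; treating it
# as the minimum is an assumption.
KNOWN_GOOD = (1, 4, 6)


def parse_paladin_version(log_text):
    """Extract the version tuple from a '[main] Version: X.Y.Z' line."""
    match = re.search(
        r"^\[main\] Version:\s*(\d+(?:\.\d+)*)", log_text, re.MULTILINE
    )
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_older_than_known_good(log_text):
    """Return True if the logged version is below KNOWN_GOOD."""
    version = parse_paladin_version(log_text)
    return version is not None and version < KNOWN_GOOD
```

Feeding the first lines of the log above to is_older_than_known_good would flag the 1.3.1 install as a candidate for upgrading.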
I encountered an error while running the contigs subcommand:
macrel contigs \
--fasta example_seqs/excontigs.fna.gz \
--output out_contigs
The error is as follows.
Traceback (most recent call last):
File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/bin/macrel", line 10, in <module>
sys.exit(main())
File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/lib/python3.10/site-packages/macrel/main.py", line 331, in main
do_smorfs(args, tdir,logfile)
File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/lib/python3.10/site-packages/macrel/main.py", line 146, in do_smorfs
predict_genes(args.fasta_file, all_peptide_file)
File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/lib/python3.10/site-packages/macrel/ORFs_prediction.py", line 34, in predict_genes
gorf, morf_finder = create_pyrodigal_orffinder()
File "/lustre/home/acct-clslj/clslj-1/.conda/envs/macrel/lib/python3.10/site-packages/macrel/ORFs_prediction.py", line 4, in create_pyrodigal_orffinder
gorf = pyrodigal.OrfFinder(closed=True,
AttributeError: module 'pyrodigal' has no attribute 'OrfFinder'
How can I solve this problem?
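The AttributeError indicates that the installed pyrodigal does not expose OrfFinder. To my knowledge, pyrodigal 3.0 renamed that class to GeneFinder; treat this as an assumption to check against the pyrodigal changelog. A minimal compatibility sketch (the helper name is hypothetical and is not how Macrel itself resolves the class):

```python
def resolve_finder_class(module):
    """Return module.GeneFinder if present, otherwise module.OrfFinder."""
    for name in ("GeneFinder", "OrfFinder"):
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    raise AttributeError("module exposes neither GeneFinder nor OrfFinder")
```

With pyrodigal imported, resolve_finder_class(pyrodigal)(closed=True) would then build the finder under either name. Pinning pyrodigal to a release that still provides OrfFinder, or upgrading Macrel, are the other obvious routes.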
I have been testing the functions of macrel with some of my own data. I used the contigs subcommand without issues and it produced all expected files. I then ran the reads subcommand on some metagenomic data, but I have not been receiving the AMP prediction output for any of the sequences I have tested.
Because of this, I ran the reads subcommand on the test files that you provide to see whether I could reproduce all of the correct files. However, I am also not receiving the expected macrel.out.prediction.gz file for the test reads.
Here is the command that I am using:
macrel reads -1 ./test_reads_data/R1.fq.gz -2 ./test_reads_data/R2.fq.gz --output expected_output
With this command I get macrel.out.all_orfs.faa and macrel.out.smorfs.faa, but the reads subcommand never produces macrel.out.prediction.gz for any input data I have used.
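When reporting a case like this, it can help to state exactly which output files are absent or empty. A short sketch that checks an output folder for the three file names mentioned above; the default macrel.out prefix is assumed, and the function name is hypothetical.

```python
import os

# File names taken from the report above; assumes the default prefix.
EXPECTED = (
    "macrel.out.all_orfs.faa",
    "macrel.out.smorfs.faa",
    "macrel.out.prediction.gz",
)


def missing_outputs(output_dir, expected=EXPECTED):
    """Return expected file names that are absent or empty in output_dir."""
    missing = []
    for name in expected:
        path = os.path.join(output_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

Calling missing_outputs("expected_output") after the run gives a list that can be pasted into the issue alongside the command.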
Hello, the conda install command doesn't work and gives the following message. Could I ask for help?
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
Current channels:
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
I get this error partway through running macrel in a conda environment:
Traceback (most recent call last):
File "/user/conda_macrel_env/bin/macrel", line 11, in <module>
sys.exit(main())
File "/user/conda_macrel_env/lib/python3.7/site-packages/macrel/main.py", line 282, in main
do_predict(args, tdir)
File "/user/conda_macrel_env/lib/python3.7/site-packages/macrel/main.py", line 241, in do_predict
fs)
File "/user/conda_macrel_env/lib/python3.7/site-packages/macrel/AMP_predict.py", line 7, in predict
model1 = pickle.load(gzip.open(model1, 'rb'))
ModuleNotFoundError: No module named 'sklearn'
The commandline that I used after activating the conda macrel environment is this:
macrel contigs --fasta test_bacterium_genome.fasta --output test_contigs
I would appreciate your feedback here.
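The traceback ends in a missing sklearn module, so a quick check of which packages the environment's Python can actually locate may narrow things down. A minimal sketch using only the standard library (the function name is hypothetical):

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running missing_modules(["sklearn", "joblib"]) with the Python interpreter from the activated macrel environment would show whether scikit-learn is absent from that environment or only from a different interpreter on the PATH.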
Hi, I searched for raw metagenomic data on NCBI and found some SRA entries labeled as RNA metagenomics. Can those data be used with this tool?
Hi,
Any chance that you are considering a Macrel release for macOS?
Thanks
Somak
From the amphsphere-users mailing list:
I tested Macrel AMP detection on metagenome assemblies and was wondering whether a flag exists to save the log information, something like --logfile path/to/logfile.txt. The log is usually quite big.
It seems that N-terminal methionine excision is not working when macrel is called in this mode. This may affect the prediction. Also, the option --keep-negatives seems to be on by default in this mode.
Is it possible to retrain the model used in Macrel with new training data?
I'm trying to optimize specifically for shorter peptides (< 50 aa), but the training data used in the Macrel paper (downloaded from the original Bhadra 2018 paper) contains many much longer peptides. I found that retraining the model in amPEPpy with only shorter peptides improved accuracy on my data, so I was hoping to try the same with Macrel.
Thanks,
Carter
Hi, I have tested gut 16S amplicon data from many hosts, such as duck, black soldier fly, silkworm, and housefly, as well as some environmental metagenomes sequenced on Illumina with paired ends. Unfortunately, there are no results in the prediction .gz files. I also tested with the example data, and that result is normal. I don't know what the problem is.
Hi Luis,
I am having a problem with running macrel peptides.
macrel contigs is running fine, but I am guessing that is because it hasn't reached the peptide part yet.
Steps I ran:
conda create -n macrel_env
conda activate macrel_env
conda install -c bioconda macrel
macrel get-examples
macrel peptides \
--fasta example_seqs/expep.faa.gz \
--output out_peptides \
-t 4
The error message from the last command is:
rpy2.rinterface_lib.embedded.RRuntimeError: Error: package or namespace load failed for ‘Peptides’:
package ‘Peptides’ was installed before R 4.0.0: please re-install it
I tried a few re-install attempts, none of which worked. I hope you can suggest a solution.
Thank you,
Dany
Hello
I am trying to run macrel but I got the following error:
1_302_409_-
Traceback (most recent call last):
File "/home/ofm/anaconda3/envs/env_macrel/bin/macrel", line 11, in <module>
sys.exit(main())
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/site-packages/macrel/main.py", line 282, in main
do_predict(args, tdir)
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/site-packages/macrel/main.py", line 241, in do_predict
fs)
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/site-packages/macrel/AMP_predict.py", line 7, in predict
model1 = pickle.load(gzip.open(model1, 'rb'))
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/site-packages/sklearn/__init__.py", line 82, in <module>
from .base import clone
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/site-packages/sklearn/base.py", line 17, in <module>
from .utils import _IS_32BIT
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/site-packages/sklearn/utils/__init__.py", line 25, in <module>
from . import _joblib
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/site-packages/sklearn/utils/_joblib.py", line 7, in <module>
import joblib
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/site-packages/joblib/__init__.py", line 113, in <module>
from .memory import Memory, MemorizedResult, register_store_backend
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/site-packages/joblib/memory.py", line 16, in <module>
import pydoc
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/pydoc.py", line 370, in <module>
class Doc:
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/pydoc.py", line 402, in Doc
def getdocloc(self, object, basedir=sysconfig.get_path('stdlib')):
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/sysconfig.py", line 521, in get_path
return get_paths(scheme, vars, expand)[name]
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/sysconfig.py", line 511, in get_paths
return _expand_vars(scheme, vars)
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/sysconfig.py", line 172, in _expand_vars
_extend_dict(vars, get_config_vars())
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/sysconfig.py", line 559, in get_config_vars
_init_posix(_CONFIG_VARS)
File "/home/ofm/anaconda3/envs/env_macrel/lib/python3.7/sysconfig.py", line 430, in _init_posix
_temp = __import__(name, globals(), locals(), ['build_time_vars'], 0)
ModuleNotFoundError: No module named '_sysconfigdata_x86_64_conda_linux_gnu'
What can I do?
Thanks in advance
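CPython's sysconfig takes the name of its build-time data module from the _PYTHON_SYSCONFIGDATA_NAME environment variable when that variable is set. The module name in the traceback looks like such an override, although that reading is an assumption and not something confirmed in this thread. A minimal sketch for checking whether an override is set and whether the named module can be found (the function name is hypothetical):

```python
import importlib.util
import os

ENV_VAR = "_PYTHON_SYSCONFIGDATA_NAME"


def sysconfigdata_status(environ=None):
    """Return (override_name, importable) for the sysconfigdata override.

    Both values are None when the environment variable is not set.
    """
    environ = os.environ if environ is None else environ
    name = environ.get(ENV_VAR)
    if not name:
        return None, None
    return name, importlib.util.find_spec(name) is not None
```

If this reports a name that is not importable, unsetting the variable in the shell before running macrel is one thing to try.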
Hi Luis,
Just wanted to know if this is something I should worry about.
I am using the conda installed macrel, using a conda environment.
Warning message:
In options(stringsAsFactors = TRUE) :
'options(stringsAsFactors = TRUE)' is deprecated and will be disabled
/shared/homes/12705859/miniconda3/envs/macrel_env/lib/python3.6/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.22.1 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
Hi!
I normally don't like making this kind of advertisement out of the blue, but this could be of interest to you: in the last two years I developed Python bindings for Prodigal in a package named pyrodigal, and I just released a new version that supports setting a custom minimum gene length. Maybe this could save you the trouble of having to compile and maintain a customized Prodigal fork just for this feature.
Cheers!
Hi,
I'm trying to run macrel in peptide mode for ~200,000 sequences. I get the following error:
Traceback (most recent call last):
File "Programs/miniconda3/envs/macrel/bin/macrel", line 11, in <module>
sys.exit(main())
File "Programs/miniconda3/envs/macrel/lib/python3.7/site-packages/macrel/main.py", line 282, in main
do_predict(args, tdir)
File "Programs/miniconda3/envs/macrel/lib/python3.7/site-packages/macrel/main.py", line 241, in do_predict
fs)
File "Programs/miniconda3/envs/macrel/lib/python3.7/site-packages/macrel/AMP_predict.py", line 7, in predict
model1 = pickle.load(gzip.open(model1, 'rb'))
File "Programs/miniconda3/envs/macrel/lib/python3.7/site-packages/sklearn/__init__.py", line 82, in <module>
from .base import clone
File "Programs/miniconda3/envs/macrel/lib/python3.7/site-packages/sklearn/base.py", line 17, in <module>
from .utils import _IS_32BIT
File "Programs/miniconda3/envs/macrel/lib/python3.7/site-packages/sklearn/utils/__init__.py", line 25, in <module>
from . import _joblib
File "Programs/miniconda3/envs/macrel/lib/python3.7/site-packages/sklearn/utils/_joblib.py", line 7, in <module>
import joblib
File "Programs/miniconda3/envs/macrel/lib/python3.7/site-packages/joblib/__init__.py", line 113, in <module>
from .memory import Memory, MemorizedResult, register_store_backend
File "Programs/miniconda3/envs/macrel/lib/python3.7/site-packages/joblib/memory.py", line 16, in <module>
import pydoc
File "Programs/miniconda3/envs/macrel/lib/python3.7/pydoc.py", line 370, in <module>
class Doc:
File "Programs/miniconda3/envs/macrel/lib/python3.7/pydoc.py", line 402, in Doc
def getdocloc(self, object, basedir=sysconfig.get_path('stdlib')):
File "Programs/miniconda3/envs/macrel/lib/python3.7/sysconfig.py", line 521, in get_path
return get_paths(scheme, vars, expand)[name]
File "Programs/miniconda3/envs/macrel/lib/python3.7/sysconfig.py", line 511, in get_paths
return _expand_vars(scheme, vars)
File "Programs/miniconda3/envs/macrel/lib/python3.7/sysconfig.py", line 172, in _expand_vars
_extend_dict(vars, get_config_vars())
File "Programs/miniconda3/envs/macrel/lib/python3.7/sysconfig.py", line 559, in get_config_vars
_init_posix(_CONFIG_VARS)
File "Programs/miniconda3/envs/macrel/lib/python3.7/sysconfig.py", line 430, in _init_posix
_temp = __import__(name, globals(), locals(), ['build_time_vars'], 0)
ModuleNotFoundError: No module named '_sysconfigdata_x86_64_conda_linux_gnu'
It did run OK for a different dataset of ~600,000 peptides on the same machine so I wonder what could raise this error...
Can you please advise?
Thanks!
Hi Luis,
I am using cd-hit to cluster Macrel's output. Do you have a better idea?
Out of 70k predicted AMPs, only about half get clustered. Can you advise on tools to use for clustering Macrel's output?
This is my current use of cd-hit on Macrel's output:
cd-hit -i macrel.out.prediction.all.fasta -o /.../.../cd_hit_onMacrel98 -c 0.98 -n 5 -d 0 -M 30000 -T 10
Thank you
Dany
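To put a number on "only about half get clustered", the .clstr file that cd-hit writes next to its output can be summarized. The sketch below counts cluster sizes and singletons; it assumes the usual .clstr layout (a ">Cluster N" line followed by one line per member), and the function names are hypothetical.

```python
def cluster_sizes(clstr_lines):
    """Return the number of members in each cluster, in file order."""
    sizes = []
    for line in clstr_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            sizes.append(0)  # start a new cluster
        elif sizes:
            sizes[-1] += 1  # member line of the current cluster
    return sizes


def summarize(clstr_lines):
    """Return counts of clusters, sequences, and single-member clusters."""
    sizes = cluster_sizes(clstr_lines)
    singletons = sum(1 for size in sizes if size == 1)
    return {
        "clusters": len(sizes),
        "sequences": sum(sizes),
        "singletons": singletons,
    }
```

A high singleton count at -c 0.98 would simply mean that many predicted peptides have no near-identical neighbour at that identity threshold.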
What do CLP and CDP mean in the results? I could not find any information on them. I checked with ChatGPT, which said that CLP stands for Classical Lipopeptides and CDP for Cyclic Depsipeptides. Is that right?
When prodigal_sm fails, its error messages are not shown to the user correctly.
See
4267c99 added a test
IMHO, the right behaviour is to classify very short sequences as non-AMPs.