geronimp / enrichm

Toolbox for comparative genomics of MAGs

Python 97.69% R 1.94% Scheme 0.36%

enrichm's Introduction

EnrichM is a set of comparative genomics tools for large sets of metagenome assembled genomes (MAGs). The current functionality includes:

A basic annotation pipeline for MAGs.
A pipeline to determine the metabolic pathways that are encoded by MAGs, using KEGG modules as a reference (although custom pathways can be specified)
A pipeline to identify genes or metabolic pathways that are enriched within and between user-defined groups of genomes (groups can be genomes that are related functionally, phylogenetically, recovered from different environments, etc).
A pipeline to construct metabolic networks from annotated population genomes.
A pipeline to construct random forest machine learning models from the functional composition of MAGs, metagenomes or transcriptomes.
A pipeline to apply these random forest machine learning models to classify new MAGs or metagenomes.

EnrichM is under active development, so there is no guarantee that master is stable. It's recommended that you install EnrichM via either PyPI or conda (see below).

Installation

Dependencies

EnrichM is written in Python 3 and requires v3.6 or later to run. EnrichM requires the following non-Python dependencies:

hmmer >= 3.1b
diamond == 0.9.22
prodigal >= 2.6.3
parallel >= 20180222
mmseqs >= 2-23394
R >= 3.0.1
mcl >= 14-137

PyPi

Install from PyPi like this:

sudo pip3 install enrichm

conda (recommended)

Install the conda package like so:

# Create a python3 environment for EnrichM. Replace "X.X.X" with the EnrichM version number
conda create -c bioconda -n enrichm_X.X.X enrichm=X.X.X

After this, you'll need to set up EnrichM to run by downloading its back end databases.

Setup

Loading EnrichM's database

Before running EnrichM, you'll need to download the back-end database. This file is large (5.7G) and contains all the reference databases EnrichM needs to annotate and compare your genomes. This includes Pfam-A HMMs, TIGRFAM HMMs, a DIAMOND database of the sequences in UniRef100 with EC and KO annotations, and KofamKOALA HMMs. By default the database will be installed in your home directory. To download it, run:

enrichm data

This should take approximately 15 minutes. To check for and install updates, run the same command again. You can uninstall the database using:

enrichm data --uninstall

Specifying the location of the EnrichM database

If you would like to store the EnrichM database outside of your home directory, you need to tell EnrichM where to look. To do this, export a BASH variable named "ENRICHM_DB":

export ENRICHM_DB=/path/to/database/

After this, EnrichM should be able to find the database. It may be worthwhile putting this line in your .bashrc so you don't have to re-run it every time you open a terminal.
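As a rough illustration of how an environment variable override like this is typically resolved, here is a minimal Python sketch. The function name `resolve_database_dir` and the fallback directory are hypothetical and are not taken from EnrichM's source; only the variable name `ENRICHM_DB` comes from the text above.

```python
import os
from pathlib import Path


def resolve_database_dir(environ=None, default=None):
    """Return the database directory, preferring ENRICHM_DB when it is set."""
    environ = os.environ if environ is None else environ
    # Hypothetical fallback location; the real default used by EnrichM may differ.
    fallback = Path.home() / "databases" if default is None else Path(default)
    override = environ.get("ENRICHM_DB")
    return Path(override).expanduser() if override else fallback


if __name__ == "__main__":
    print(resolve_database_dir({"ENRICHM_DB": "/path/to/database/"}))
```

Passing the environment in as a dictionary keeps the lookup easy to check without changing your shell.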

Subcommands

annotate

Annotate is a function that allows you to annotate your population genomes with KO, PFAM, TIGRFAM, and CAZY using dbCAN. The result will be a .gff file for each genome, and a frequency matrix for each annotation type where the rows are annotation IDs and the columns are genomes.

See the annotate help page for more
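To make the matrix layout concrete, the following sketch builds a frequency matrix with annotation IDs as rows and genomes as columns. It is an illustration only: the function name, the "ID" header label, and the example KO lists are assumptions, not EnrichM's actual code or output.

```python
from collections import Counter


def frequency_matrix(annotations_by_genome):
    """Build a matrix (list of rows): annotation IDs as rows, genomes as columns."""
    genomes = sorted(annotations_by_genome)
    counts = {genome: Counter(annotations_by_genome[genome]) for genome in genomes}
    annotation_ids = sorted({a for counter in counts.values() for a in counter})
    header = ["ID"] + genomes
    rows = [[a] + [counts[genome][a] for genome in genomes] for a in annotation_ids]
    return [header] + rows


if __name__ == "__main__":
    # Made-up annotations for two hypothetical genomes.
    example = {
        "genome_1": ["K00001", "K00001", "K00002"],
        "genome_2": ["K00002"],
    }
    for row in frequency_matrix(example):
        print("\t".join(str(value) for value in row))
```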

classify

Classify quickly reads in KO annotations in the form of a matrix (KO IDs as rows, genomes as columns) and determines which KEGG modules are complete. Annotation matrices can be generated using the annotate function.

See the classify help page for more
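The idea of module completeness can be sketched as follows. This is a simplification under stated assumptions: a module is modelled as a list of steps, each step being a set of alternative KOs, whereas real KEGG module definitions are more expressive. The module name `M_demo` and both function names are hypothetical.

```python
def module_completeness(steps, kos_present):
    """Fraction of steps for which at least one alternative KO is present."""
    if not steps:
        return 0.0
    satisfied = sum(1 for alternatives in steps if kos_present & set(alternatives))
    return satisfied / len(steps)


def complete_modules(modules, kos_present, cutoff=1.0):
    """Return {module: completeness} for modules at or above the cutoff."""
    result = {}
    for name, steps in modules.items():
        fraction = module_completeness(steps, kos_present)
        if fraction >= cutoff:
            result[name] = fraction
    return result


if __name__ == "__main__":
    modules = {"M_demo": [["K00001", "K00002"], ["K00003"]]}  # hypothetical module
    print(complete_modules(modules, {"K00001"}, cutoff=0.5))
```

Lowering the cutoff reports partially complete modules as well, which is the general idea behind a completeness threshold.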

enrichment

Enrichment will read in KO or PFAM annotations in the form of a matrix (IDs as rows, genomes as columns) and a metadata file that separates genomes into groups to compare, and will run some basic stats to determine the enrichment of modules or pfam clans between and within the groups.

See the enrichment help page for more
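The text above does not say which statistical tests are used, so the sketch below is only one common way to test presence/absence enrichment between two groups: a one-sided Fisher's exact test computed from the hypergeometric distribution. The function name and argument layout are assumptions, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def enrichment_p_value(with_1, without_1, with_2, without_2):
    """One-sided Fisher's exact test: P(group 1 has >= with_1 genomes carrying the gene)."""
    total = with_1 + without_1 + with_2 + without_2
    carriers = with_1 + with_2          # genomes carrying the gene, both groups
    group_1_size = with_1 + without_1
    upper = min(carriers, group_1_size)
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, group_1_size - k)
        for k in range(with_1, upper + 1)
    )
    return favourable / comb(total, group_1_size)


if __name__ == "__main__":
    # Gene present in all 3 genomes of group 1 and none of the 3 in group 2.
    print(enrichment_p_value(3, 0, 0, 3))
```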

pathway

Pathway reads in a KO matrix and generates a Cytoscape-readable metabolic network and metadata file. Only reactions that are possible given the KOs present in the input matrix are shown, and the modules and reactions that are included in the output can be customized.

See the pathway help page for more

explore

Explore is similar to pathway, but rather than generating a specified pathway it will start from a given query compound ID, and explore the possible reactions that use that compound given the enzymes present in the input KO matrix.

See the explore help page for more

Contact

If you have any feedback about EnrichM, drop an email to the SupportM public help forum. Software by Joel A. Boyd (@geronimp) at the Australian Centre for Ecogenomics (ACE).

License

EnrichM is licensed under the GNU GPL v3+. See LICENSE.txt for further details.

Contributing

I want EnrichM to be as useful as possible, so please feel free to leave feature requests and bug reports.

Citation

A manuscript is in the final stages of preparation and a bioRxiv pre-print will be up shortly. If you find EnrichM useful and use it in your work, please cite it as follows:

Comparative genomics using EnrichM. Joel A Boyd Ben J Woodcroft Gene W Tyson. 2019. In preparation.

enrichm's Issues

classify error : ZeroDivisionError: division by zero

Hi,
When running the following command

enrichm classify --genome_and_annotation_matrix ko_frequency_table.tsv --aggregate --output Classify

I got this error

[2019-06-09 10:25:34 AM] INFO: Command: /home/michoug/miniconda3/envs/enrichm/bin/enrichm classify --genome_and_annotation_matrix ko_frequency_table.tsv --aggregate --output Classify
[2019-06-09 10:25:34 AM] INFO: Running the classify pipeline
[2019-06-09 10:25:39 AM] INFO: Reading in abundances: ko_frequency_table.tsv
[2019-06-09 10:25:55 AM] INFO: Read in annotations for 937 genomes
Traceback (most recent call last):
  File "/home/michoug/miniconda3/envs/enrichm/bin/enrichm", line 342, in <module>
    run.run_enrichm(args, sys.argv)
  File "/home/michoug/miniconda3/envs/enrichm/lib/python3.6/site-packages/enrichm/run.py", line 317, in run_enrichm
    args.genome_and_annotation_matrix, args.output)
  File "/home/michoug/miniconda3/envs/enrichm/lib/python3.6/site-packages/enrichm/classifier.py", line 143, in classify_pipeline
    pathway_average_abundance = sum(pathway_abundance) / len(pathway_abundance)
ZeroDivisionError: division by zero

It doesn't appear if I add the option --cutoff 0.5
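The traceback shows a mean being computed as sum(...) / len(...), which fails when the list is empty. A guarded version might look like the sketch below. The helper name is hypothetical, and returning 0.0 for an empty list is an assumption about what the pipeline would want, not the project's actual fix.

```python
def mean_or_zero(values):
    """Average of values, or 0.0 when there is nothing to average."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    print(mean_or_zero([2, 4]))
    print(mean_or_zero([]))
```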

enrichm classify fails on blank lines in input file

Using version 0.5.0. Running enrichm classify with a --genome_and_annotation_file that includes a blank line (or a new line at EOF) causes:

Traceback (most recent call last):
  File "/path/to/.local/lib/python3.8/site-packages/enrichm/parser.py", line 43, in parse_genome_and_annotation_file_lf
    genome, annotation = line.strip().split("\t")
ValueError: not enough values to unpack (expected 2, got 1)

To fix add:

if line == '\n': continue

Just above where the error occurs.
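A slightly more defensive variant of the same idea is sketched below. It skips any whitespace-only line, not just a bare newline, and reports the line number for malformed rows. The function name is hypothetical and this is not the parser in enrichm/parser.py.

```python
def parse_genome_and_annotation_lines(lines):
    """Parse tab-separated (genome, annotation) pairs, ignoring blank lines."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank line or trailing newline at EOF
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {number}: expected 2 tab-separated fields, got {len(fields)}"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


if __name__ == "__main__":
    print(parse_genome_and_annotation_lines(["g1\tK00001\n", "\n", "g2\tK00002\n"]))
```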

Error: ln: missing file operand when running 'enrichm annotate'

Working with enrichm 0.5.1

[143470@ermdc14 enrichm_test]$ enrichm annotate --output EAC.bins9.0.out --verbosity 5 --genome_files EAC.bin.9.fa --ko --force
[2019-11-08 10:04:16 AM] INFO: Command: enrichm annotate --output EAC.bins9.0.out --verbosity 5 --genome_files EAC.bin.9.fa --ko
[2019-11-08 10:04:16 AM] INFO: Running the annotate pipeline
[2019-11-08 10:04:16 AM] INFO: Running pipeline: annotate
[2019-11-08 10:04:16 AM] INFO: Setting up for genome annotation
[2019-11-08 10:04:16 AM] INFO: Calling proteins for annotation
[2019-11-08 10:04:16 AM] INFO: Preparing genomes for annotation
[2019-11-08 10:04:16 AM] DEBUG: xargs --arg-file=/dev/stdin ln -s --target-directory=EAC.bins9.0.out/genome_bin
ln: missing file operand
Try `ln --help' for more information.
[2019-11-08 10:04:16 AM] INFO: - Calling proteins for 0 genomes
[2019-11-08 10:04:16 AM] DEBUG: ls EAC.bins9.0.out/genome_bin/*.fna | sed 's/.fna//g' | grep -o '[^/]*$' | parallel -j 5 prodigal -q -p meta -o /dev/null -d EAC.bins9.0.out/genome_genes/{}.fna -a EAC.bins9.0.out/genome_proteins/{}.faa -i EAC.bins9.0.out/genome_bin/{}.fna > /dev/null 2>&1
ls: cannot access EAC.bins9.0.out/genome_bin/*.fna: No such file or directory
[2019-11-08 10:04:16 AM] DEBUG: Finished
[2019-11-08 10:04:16 AM] ERROR: No files found with .fna suffix in input directory
[2019-11-08 10:04:16 AM] INFO: Finished running EnrichM

Rpair option needs to be re-implemented in pathway and explore

Add CAZY annotation

enrichm data command error

When I run the enrichm data command, I get the following error
IOError: [Errno socket error] [Errno 110] Connection timed out

Any workaround to this?

Issues installing/referencing the database

Hello,

I've been trying to install EnrichM and I keep running into the same problem (tried with 2 versions of python, 3.6 & 3.8).

When I try running enrichm data --output /path/
I get the following error

[2020-03-12 18:48:37 PM] INFO: Command: /opt/miniconda3/envs/enrichm/bin/enrichm data --output /data/databases/enrichM/
[2020-03-12 18:48:37 PM] INFO: Running the data pipeline
Traceback (most recent call last):
  File "/opt/miniconda3/envs/enrichm/lib/python3.8/site-packages/enrichm/data.py", line 114, in do
    version_remote = urllib.request.urlopen(self.ftp + self.VERSION).readline().strip().decode("utf-8")
AttributeError: module 'urllib' has no attribute 'request'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/enrichm/bin/enrichm", line 342, in <module>
    run.run_enrichm(args, sys.argv)
  File "/opt/miniconda3/envs/enrichm/lib/python3.8/site-packages/enrichm/run.py", line 288, in run_enrichm
    d.do(args.uninstall, args.dry)
  File "/opt/miniconda3/envs/enrichm/lib/python3.8/site-packages/enrichm/data.py", line 116, in do
    raise Exception(
Exception: Unable to find most current EnrichM database VERSION in ftp. Please complain at https://github.com/geronimp/enrichM

I tried to get around that by manually installing and unpacking the latest database version (v10). When I point ENRICHM_DB to the unpacked tarball I get this error:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/enrichm/bin/enrichm", line 38, in <module>
    from enrichm.run import Run
  File "/opt/miniconda3/envs/enrichm/lib/python3.8/site-packages/enrichm/run.py", line 24, in <module>
    from enrichm.network_analyzer import NetworkAnalyser
  File "/opt/miniconda3/envs/enrichm/lib/python3.8/site-packages/enrichm/network_analyzer.py", line 22, in <module>
    from enrichm.network_builder import NetworkBuilder
  File "/opt/miniconda3/envs/enrichm/lib/python3.8/site-packages/enrichm/network_builder.py", line 24, in <module>
    from enrichm.databases import Databases
  File "/opt/miniconda3/envs/enrichm/lib/python3.8/site-packages/enrichm/databases.py", line 28, in <module>
    class Databases:
  File "/opt/miniconda3/envs/enrichm/lib/python3.8/site-packages/enrichm/databases.py", line 36, in Databases
    PICKLE_VERSION = open(os.path.join(CUR_DATABASE_DIR, 'VERSION')).readline().strip()
FileNotFoundError: [Errno 2] No such file or directory: '/data/databases/enrichM/enrichm_database_v10/26-11-2018/VERSION'

The first issue seems to be a urllib error, and I saw somewhere online that changing the import statement from import urllib to import urllib.request as urllib might fix it, but I haven't tried this modification yet.
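For context on the urllib error: in Python 3, `import urllib` on its own does not guarantee that the `urllib.request` submodule is loaded, so importing the submodule explicitly is the usual remedy. The sketch below shows that import together with a version lookup similar to the line in the traceback. The function name, the placeholder URL and the fake opener are assumptions for illustration, and no network access is made.

```python
import io
import urllib.request  # explicit submodule import makes urllib.request available


def read_remote_version(url, opener=urllib.request.urlopen):
    """Read the first line of a remote VERSION file and return it as text."""
    with opener(url) as handle:
        return handle.readline().strip().decode("utf-8")


def fake_opener(url):
    """Stand-in for urlopen so the example runs offline."""
    return io.BytesIO(b"10\n")


if __name__ == "__main__":
    # Placeholder URL; the real database location is not shown here.
    print(read_remote_version("ftp://example.invalid/VERSION", opener=fake_opener))
```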

The database error is clearly an issue with the path specification, since the file VERSION within enrichm_database_v10 points to 26-11-2018.
That is probably because I haven't formatted something the way the enrichm data command does.

To install enrichM I've tried:

conda create -n enrichm
conda activate enrichm
conda install -c bioconda mcl R hmmer diamond prodigal parallel openmp mmseqs2 moreutils seqmagick
conda install -c geronimp enrichm

I also forced it to try using python3.8:

conda create -n enrichm python=3.8
conda activate enrichm
conda install -c bioconda mcl R hmmer diamond prodigal parallel openmp mmseqs2 moreutils seqmagick
conda install -c geronimp enrichm
#dependency issue
pip install enrichm
#worked fine, but gave the exact same errors as above

About my environment:

    active environment : enrichm
    active env location : /opt/miniconda3/envs/enrichm
            shell level : 2
       user config file : /home/li49pol/.condarc
 populated config files : /home/li49pol/.condarc
          conda version : 4.8.2
    conda-build version : not installed
         python version : 2.7.11.final.0
       virtual packages : __glibc=2.27
       base environment : /opt/miniconda3  (read only)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /opt/miniconda3/pkgs
                          /home/li49pol/.conda/pkgs
       envs directories : /home/li49pol/data/programs/conda
                          /home/li49pol/.conda/envs
                          /opt/miniconda3/envs
               platform : linux-64
             user-agent : conda/4.8.2 requests/2.22.0 CPython/2.7.11 Linux/4.15.0-64-generic ubuntu/18.04.3 glibc/2.27
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False


python --version
#Python 3.8.1

Thanks for your time!

Best,
Will

super long run time for 2 genomes

Hello,

I have 2 genomes in fna, this is my script:

source activate enrichm_0.5.0
export ENRICHM_DB=/path_to_enrichm_db

enrichm annotate \
--output /path_to_output \
--genome_directory /path_to_genomes \
--ko_hmm \
--ec \
--pfam \
--orthologs \
--threads 8 \
--log /path_to_out/LOG

conda deactivate

I submitted it to a server, requested 8 cores and 250 GB RAM. It was killed after 108 hours, because not enough wall-time:
PBS: job killed: walltime 388897 exceeded limit 388800 (unit is minutes)

Can the program simply pick it up from where it left if I re-run the job?

The genomes are 2M in size, does this run time seem normal to you?

Many thanks!

write splits characters in module path header

The code to convert values to strings for use with join in the writer acts on the header string in the genome_lines array and tab-splits the header.
A possible fix is to modify line 104 in classifier.py like this:

genome_output_lines = [["Genome_name", "Module_id", "Module_name"]]

I haven't experienced a bug with the similar definition of output_lines above, but I imagine a similar thing would be happening.
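The behaviour described here can be reproduced in isolation. In the sketch below, `render` is a hypothetical stand-in for the writer: it joins every value of a row with tabs, so a header stored as a plain string is iterated character by character, while a header stored as a list of strings is written correctly.

```python
def render(lines):
    """Join each row's values with tabs, as a simple writer might do."""
    return ["\t".join(str(value) for value in line) for line in lines]


# Header stored as one string: join iterates over its characters.
broken = render(["Genome_name\tModule_id\tModule_name"])

# Header stored as a list of column names: join works on whole names.
fixed = render([["Genome_name", "Module_id", "Module_name"]])

if __name__ == "__main__":
    print(repr(broken[0][:11]))
    print(repr(fixed[0]))
```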

Scripts to update the database

Hey @geronimp!

I see that it's been some time since EnrichM's database was last updated. I imagine that you don't have the time to maintain the database these days, which is understandable.

I just updated my database to use the latest versions of the Pfam and KofamKOALA databases. Replacing the HMM and threshold files is easy, but the remaining files require some manual work and I can't guarantee that I'm doing things the same way you did.

Do you think you can provide the scripts to generate the files in the database (eg.: the dictionaries in the pickle files, the KEGG module definition file etc.)?

AttributeError: 'Databases' object has no attribute 'REF_DIR'

Thanks for the nice tool! I downloaded the database manually and specified its location. However, when I used enrichm annotate, the following error message came up. Could you give me some hints on how I can solve it? Thanks!

-bash-4.2$ enrichm annotate --genome_files reference.faa
[2020-03-30 06:44:36 AM] INFO: Command:enrichm annotate --genome_files reference.faa
[2020-03-30 06:44:36 AM] INFO: Running the annotate pipeline
Traceback (most recent call last):
  File "/enrichm/bin/enrichm", line 342, in <module>
    run.run_enrichm(args, sys.argv)
  File "/enrichm/lib/python3.7/site-packages/enrichm/run.py", line 305, in run_enrichm
    args.threads, args.parallel, args.suffix, args.light)
  File "/enrichm/lib/python3.7/site-packages/enrichm/annotate.py", line 122, in __init__
    self.databases = Databases()
  File "/enrichm/lib/python3.7/site-packages/enrichm/databases.py", line 98, in __init__
    self.KO_DB = os.path.join(self.REF_DIR, self.KO_DB_NAME + self.DMND_SUFFIX)
AttributeError: 'Databases' object has no attribute 'REF_DIR'

Cannot install using conda

Hello,
I was trying to install enrichM using conda, and got error messages below:

conda install -c geronimp enrichm
Collecting package metadata: done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

enrichm

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.

I also was not able to install one package (moreutils):

conda install -c bioconda moreutils

Collecting package metadata: done
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.

Could you help me to solve this problem so that I can finish installation? Thanks.

Using custom databases for annotation

Is there an option for users to provide their own databases for the annotation step? I would like to use more recent versions of PFAM and dbCAN for my dataset.

ZeroDivisionError in annotate

Hi,

I have encountered this error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/lib/python3.6/site-packages/enrichm/annotate.py", line 43, in parse_genomes
genome = Genome(*params)
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/lib/python3.6/site-packages/enrichm/genome.py", line 53, in __init__
self.gc = round((gc_list/float(self.length))*100, 2)
ZeroDivisionError: float division by zero
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/bin/enrichm", line 342, in <module>
run.run_enrichm(args, sys.argv)
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/lib/python3.6/site-packages/enrichm/run.py", line 310, in run_enrichm
args.protein_files)
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/lib/python3.6/site-packages/enrichm/annotate.py", line 790, in annotate_pipeline
genome_files, protein_files)
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/lib/python3.6/site-packages/enrichm/annotate.py", line 771, in parse_genome_inputs
genomes_list += self.pool.map(parse_genomes, chunk)
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/lib/python3.6/multiprocessing/pool.py", line 288, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/30days/uqgni1/tools/Miniconda3/envs/enrichm_0.5.0/lib/python3.6/multiprocessing/pool.py", line 670, in get
raise self._value
ZeroDivisionError: float division by zero

My code is:
source activate enrichm_0.5.0
export ENRICHM_DB=/30days/uqgni1/tools/enrichm_dada

enrichm annotate \
--output /30days/uqgni1/16_1_unitem_consensus/enrichm_archaea \
--genome_directory /30days/uqgni1/16_1_unitem_consensus/Archaea_MAGs \
--ko_hmm \
--orthologs \
--threads 24 \
--log /30days/uqgni1/16_1_unitem_consensus/enrichm_archaea/log

conda deactivate

Not sure how to troubleshoot, could you help?

Cheers!
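The traceback shows GC content being divided by the genome length, so a zero length (for example an empty or unreadable FASTA file, which is an assumption about the cause here) would trigger this error. A guarded calculation could look like the sketch below. The function name is hypothetical and this is not the code in enrichm/genome.py.

```python
def gc_percent(sequence):
    """GC content as a percentage, rounded to 2 decimals; rejects empty input."""
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence: cannot compute GC content")
    gc_count = sum(1 for base in sequence.upper() if base in "GC")
    return round(gc_count / length * 100, 2)


if __name__ == "__main__":
    print(gc_percent("GGCCAATT"))
```

Checking the input directory for zero-length files before running the pipeline is a quick way to test this hypothesis.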

Custom Database

How do I create a custom database? I plan on using PhyloDB (https://github.com/allenlab/PhyloDB)

Large disk usage for lotsa genomes input

I note that one of the first steps is to copy the genomes/proteomes into the results folder. Especially when --light is specified, maybe it would be sufficient to softlink instead?

Error when using protein sequences

Hi Joel and community,

I ran EnrichM on my MAGs and it worked fine for nucleotide fasta files (.fa). I'd like to run protein through the pipeline so I annotated the MAGs using Prokka and I tried running the .faa files on EnrichM, however, I got the below error. Any help would be appreciated.

Thanks,
Ashley

(EnrichM) ai37@aduae387-lap:~/Representatives$ enrichm annotate --output alpha_rep_output/ --protein_directory genome_proteins/ --ko --pfam --threads 16
[2019-11-07 18:23:44 PM] INFO: Running command: /home/ai37/miniconda3/envs/EnrichM/bin/enrichm annotate --output alpha_rep_output/ --protein_directory genome_proteins/ --ko --pfam --threads 16
[2019-11-07 18:23:44 PM] INFO: Loading databases
[2019-11-07 18:23:44 PM] INFO: Loading reference db paths
[2019-11-07 18:23:44 PM] INFO: Running pipeline: annotate
[2019-11-07 18:23:44 PM] INFO: Setting up for genome annotation
[2019-11-07 18:23:44 PM] INFO: Using provided proteins
[2019-11-07 18:23:44 PM] INFO: Preparing genomes for annotation
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/genome.py", line 227, in __init__
= description.split(' # ')
ValueError: not enough values to unpack (expected 5, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/annotate.py", line 50, in parse_genomes
genome = Genome(*params)
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/genome.py", line 67, in __init__
sequence = Sequence(description, sequence)
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/genome.py", line 231, in __init__
raise Exception("Error parsing genome proteins. Was the output from prodigal?")
Exception: Error parsing genome proteins. Was the output from prodigal?
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/ai37/miniconda3/envs/EnrichM/bin/enrichm", line 357, in <module>
r.main(args, sys.argv)
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/run.py", line 323, in main
args.protein_files)
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/annotate.py", line 641, in do
genomes_list = self.parse_genome_inputs(genome_directory, protein_directory, genome_files, protein_files)
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/site-packages/enrichm/annotate.py", line 622, in parse_genome_inputs
genomes_list += self.pool.map(parse_genomes, chunk)
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/ai37/miniconda3/envs/EnrichM/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
Exception: Error parsing genome proteins. Was the output from prodigal?
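The "expected 5, got 1" message is consistent with a parser that expects Prodigal-style FASTA descriptions, where five fields (name, start, end, strand, attributes) are separated by " # ". Protein files from other annotators, such as Prokka, do not use that layout. The sketch below checks whether a description has the Prodigal shape. The function name and the example headers are illustrative, and this is not EnrichM's parser.

```python
def parse_prodigal_description(description):
    """Return the parsed fields of a Prodigal-style description, or None."""
    fields = description.split(" # ")
    if len(fields) != 5:
        return None
    name, start, end, strand, attributes = fields
    try:
        return {
            "name": name,
            "start": int(start),
            "end": int(end),
            "strand": int(strand),
            "attributes": attributes,
        }
    except ValueError:
        return None  # coordinates or strand were not integers


if __name__ == "__main__":
    print(parse_prodigal_description("contig_1_1 # 2 # 181 # 1 # ID=1_1;partial=10"))
    print(parse_prodigal_description("ABC_00001 hypothetical protein"))
```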

Error opening file /home/databases/enrichm_database_v10/databases/uniref100.dmnd

I get this error when I use the following command:

enrichm annotate --genome_directory all_cat/ --ko --threads 40 --output 1
[2020-06-28 09:21:07 AM] INFO: Running command: /home/emma/anaconda2/envs/enrichm_0.5.0/bin/enrichm annotate --genome_directory all_cat/ --ko --threads 40 --output 1
[2020-06-28 09:21:07 AM] INFO: Loading databases
[2020-06-28 09:21:07 AM] INFO: Loading reference db paths
[2020-06-28 09:21:08 AM] INFO: Running pipeline: annotate
[2020-06-28 09:21:08 AM] INFO: Setting up for genome annotation
[2020-06-28 09:21:08 AM] INFO: Calling proteins for annotation
[2020-06-28 09:21:08 AM] INFO: - Calling proteins for 4 genomes
[2020-06-28 09:22:34 AM] INFO: Starting annotation:
[2020-06-28 09:22:34 AM] INFO: - Annotating genomes with ko ids
[2020-06-28 09:22:34 AM] INFO: - BLASTing genomes
No such file or directory
Error: Error opening file /home/emma/databases/enrichm_database_v10/databases/uniref100.dmnd
Traceback (most recent call last):
File "/home/emma/anaconda2/envs/enrichm_0.5.0/bin/enrichm", line 357, in <module>
r.main(args, sys.argv)
File "/home/emma/anaconda2/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/run.py", line 323, in main
args.protein_files)
File "/home/emma/anaconda2/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/annotate.py", line 661, in do
self.annotate_ko(genomes_list)
File "/home/emma/anaconda2/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/annotate.py", line 230, in annotate_ko
for genome_name, batch in self.get_batches(output_annotation_path):
File "/home/emma/anaconda2/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/annotate.py", line 242, in get_batches
input_file_io = open(input_file)
FileNotFoundError: [Errno 2] No such file or directory: '1/annotations_ko/DIAMOND_search.tsv'

Could you help me?
Thank you!
Looking forward to your reply.

bioconda package

Hey @geronimp

An interesting tool. I'd like to try it out, so I'm building a bioconda package: bioconda/bioconda-recipes#24200

Could you add a bit more documentation and one or two images would be nice. Thank you.

the --suffix option is not writeable

Hi Joel,
Very simple and easy to use software, nice!

However, when trying to set the --suffix argument it is non-responsive? In my case the files are .fa and when setting --suffix .fa it prints an error saying no file with suffix .fna was found, irrespective if I use --genome_files or --genome_directory. If this is not a simple bug in the program I will of course start trouble-shooting my own stuff...

Best,
Thomas

Error when downloading database

First I installed enrichM via conda following the steps below.
conda create -n enrichm_0.5.0 python=3
conda activate enrichm_0.5.0
conda install -c bioconda mcl R hmmer diamond prodigal parallel openmp mmseqs2 moreutils
conda install -c geronimp enrichm
R
install.packages('gridExtra')
install.packages('optparse')
q()

After the installation steps above, I tried to download the database using the "enrichm data" command, and the error shown below occurred. Can you help me solve the problem? Thank you!

(enrichm_0.5.0) jinsong@server:/data/liangjinsong/N_update/single_group_assembly_bin/5_N_genes_usearch$ enrichm data
Traceback (most recent call last):
File "/data/software/miniconda3/envs/enrichm_0.5.0/bin/enrichm", line 357, in <module>
r.main(args, sys.argv)
File "/data/software/miniconda3/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/run.py", line 283, in main
self._check_general(args)
File "/data/software/miniconda3/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/run.py", line 116, in _check_general
raise Exception('The following dependencies need to be installed to run enrichm:\n%s' % (dependency_string))
Exception: The following dependencies need to be installed to run enrichm:
seqmagick https://fhcrc.github.io/seqmagick

add marker genes for modules

Make --metadata optional in enrichm pathway/explore?

Hi joel,

I've started a new issue to keep things separated. I've noticed that you've made it mandatory to specify groups of genomes to compare using the --metadata option in both the pathway and explore functions. Would it be possible to skip the comparative analysis and just extract the relevant information for all genomes as "one group"? Or am I mistaken about how to use the tools?

I'm asking because this is what I would like to do in many cases: to use the tools exploratively on a set of genomes. Of course it would still be relevant to make comparisons under some circumstances, for which the --metadata option is great.

Best,
Thomas

Incorrect database file name

When running the "annotate" step of the latest version of enrichm (installed via conda), the error below occurred.

[2019-08-17 15:04:12 PM] INFO: Running command: /data/software/miniconda3/envs/enrichm_0.5.0/bin/enrichm annotate --genome_directory /data/liangjinsong/N_update/single_group_assembly_bin/enrichm_test --output /data/liangjinsong/N_update/single_group_assembly_bin/enrichm_test_out --force --threads 95 --suffix fa --ko --parallel 95
[2019-08-17 15:04:12 PM] INFO: Loading databases
[2019-08-17 15:04:13 PM] INFO: Loading reference db paths
[2019-08-17 15:04:13 PM] INFO: Running pipeline: annotate
[2019-08-17 15:04:13 PM] INFO: Setting up for genome annotation
[2019-08-17 15:04:13 PM] INFO: Calling proteins for annotation
[2019-08-17 15:04:13 PM] INFO: - Calling proteins for 11 genomes
[2019-08-17 15:04:30 PM] INFO: Starting annotation:
[2019-08-17 15:04:30 PM] INFO: - Annotating genomes with ko ids
[2019-08-17 15:04:30 PM] INFO: - BLASTing genomes
diamond v0.9.25.126 | by Benjamin Buchfink [email protected]
Licensed under the GNU GPL https://www.gnu.org/licenses/gpl.txt
Check http://github.com/bbuchfink/diamond for updates.

No such file or directory
Error: Error opening file /home/jinsong/databases/enrichm_database_v10/databases/uniref100.dmnd
Traceback (most recent call last):
File "/data/software/miniconda3/envs/enrichm_0.5.0/bin/enrichm", line 357, in <module>
r.main(args, sys.argv)
File "/data/software/miniconda3/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/run.py", line 323, in main
args.protein_files)
File "/data/software/miniconda3/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/annotate.py", line 661, in do
self.annotate_ko(genomes_list)
File "/data/software/miniconda3/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/annotate.py", line 230, in annotate_ko
for genome_name, batch in self.get_batches(output_annotation_path):
File "/data/software/miniconda3/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/annotate.py", line 242, in get_batches
input_file_io = open(input_file)
FileNotFoundError: [Errno 2] No such file or directory: '/data/liangjinsong/N_update/single_group_assembly_bin/enrichm_test_out/annotations_ko/DIAMOND_search.tsv'

Then, I checked the database directory /home/jinsong/databases/enrichm_database_v10/databases/, and found files as below:
cazy.hmm ko.hmm pfam.hmm tigrfam.hmm uniref100.EC.dmnd uniref100.KO.dmnd

There is not a file named "uniref100.dmnd", which is required for the script. I think the mistake should be corrected.

missing ec_to_description.26-11-2018.pickle

Hi Joel,
When I try to annotate a genome using the 'enrichm annotate' command, I get this error:
FileNotFoundError: [Errno 2] No such file or directory: '/media/kris/HGST/Metabolism/enrichm/database/enrichm_database_v7/ec_to_description.26-11-2018.pickle'.
I've checked and there is no 'ec_to_description.26-11-2018.pickle' in any enrichm_database_v7 folder. Could you please advise me how to solve this issue?

Best regards
Chris

Enrichment - PCA plots and KO breakdown plots not generated

Hi All,

I am unsure why none of the plots are being produced. The log states that summary, ko breakdown and PCA plots are being generated, however, these are never produced. I have checked that gridExtra and optparse R packages are installed but still nothing.

Any help would be appreciated.

Thanks,
Ashley

Directionality on reactions for enrichm pathway/explore?

Hi Joel,

Since you're doing a major upgrade - could you make the network being generated from the enrichm pathway and explore functions directional? This information would be very helpful for generating meaningful de-novo metabolic networks and is available in the KEGG database.

Best,
Thomas

Missing ko_cutoffs.tsv in the database

Hi,
I was running version 0.5.0 of the pipeline with the --ko_hmm parameter.
It failed due to a missing ko_cutoffs.tsv file in the database (version 10).
Best
Greg

Nucleotide sequences of genes in 'genome_genes' directory all have identical sequences

Hey Joel,

Hope you are doing well! Found a funky bug in the nucleotide sequence output from enrichm annotate. Here's an example:

>contig_112_pilon_1
TATTTAGTTAATATGTCATTTATATCTTTTGCATTTAGAGAAGAGTATGAGAAGGTAAAGCTTTTGGGAGACAAATTGAACGAGATTGACTCATTGATCAACTGGGAATCATTTAGACCGATAGTGAAAGATATGTTTGACAACAAAAGTGAAAAGGGTGGACGTCCTAATATCGATGAAGTTGTAATGATCAAAACCCTGATTTTACAGGAGTGGCATGGTCTTTCTGATCCAGAACTTGAGCGACAAATCACCGACAGGATATCCTTCCGCAAGTTTTTAGGTTTTCCTGAAAACATACCTGATTTCACAACAGTCTGGACTTTTCGAGAGCGGTTAAGCAAAAAAGGTAAGGACAAAGAAATCTGGAAAGAATTACAGAGACAGCTTGATTCAAAGGGATTGAAGGTAAAAAAGGGGGTTATACAGGATGCAACATTTATCACATCTGATCCAGGACATGCAAAAGCAGATAAACCAAGAGGTGATGAGGCAAAAACACGAAGAAGTAAAGATGGTACCTGGGTAAAAAAGAACAGTAAGTCATACTTCGGGTATAAGTTTCACTCAAAGGAAGATGTTGATTACGGTCTTATAAGGAAGATCGAGACTACAACGGCATCAGTACACGATAGTCAGATTGATCTCTCTGAACCAGGAGAAGTCGTGTACAAGGATAAAGGATATTTTGGAGCGTCATCAAAAGGATACAGTGCGACTATGAGAAGATCTGTTCGTGGTCATCCGATTGGTATCAAAGATATTCTGCGTAACAAACGAATTAGCAAGAAAAGAGCACCTGGAGAAAGACCCTATGCAGTGATTAAAAATGTATTCAAATCAGGGCATATTATGGTTACAACCGTTGCCAGGGCAGCAGTCAAAACGGTATTTACAGCATTTGGATTCAATCTATATCAACTCTTAACTTTGAAGAAACAAGGAATTGTATAG
>contig_112_pilon_2 K20155
TATTTAGTTAATATGTCATTTATATCTTTTGCATTTAGAGAAGAGTATGAGAAGGTAAAGCTTTTGGGAGACAAATTGAACGAGATTGACTCATTGATCAACTGGGAATCATTTAGACCGATAGTGAAAGATATGTTTGACAACAAAAGTGAAAAGGGTGGACGTCCTAATATCGATGAAGTTGTAATGATCAAAACCCTGATTTTACAGGAGTGGCATGGTCTTTCTGATCCAGAACTTGAGCGACAAATCACCGACAGGATATCCTTCCGCAAGTTTTTAGGTTTTCCTGAAAACATACCTGATTTCACAACAGTCTGGACTTTTCGAGAGCGGTTAAGCAAAAAAGGTAAGGACAAAGAAATCTGGAAAGAATTACAGAGACAGCTTGATTCAAAGGGATTGAAGGTAAAAAAGGGGGTTATACAGGATGCAACATTTATCACATCTGATCCAGGACATGCAAAAGCAGATAAACCAAGAGGTGATGAGGCAAAAACACGAAGAAGTAAAGATGGTACCTGGGTAAAAAAGAACAGTAAGTCATACTTCGGGTATAAGTTTCACTCAAAGGAAGATGTTGATTACGGTCTTATAAGGAAGATCGAGACTACAACGGCATCAGTACACGATAGTCAGATTGATCTCTCTGAACCAGGAGAAGTCGTGTACAAGGATAAAGGATATTTTGGAGCGTCATCAAAAGGATACAGTGCGACTATGAGAAGATCTGTTCGTGGTCATCCGATTGGTATCAAAGATATTCTGCGTAACAAACGAATTAGCAAGAAAAGAGCACCTGGAGAAAGACCCTATGCAGTGATTAAAAATGTATTCAAATCAGGGCATATTATGGTTACAACCGTTGCCAGGGCAGCAGTCAAAACGGTATTTACAGCATTTGGATTCAATCTATATCAACTCTTAACTTTGAAGAAACAAGGAATTGTATAG
>contig_112_pilon_3
TATTTAGTTAATATGTCATTTATATCTTTTGCATTTAGAGAAGAGTATGAGAAGGTAAAGCTTTTGGGAGACAAATTGAACGAGATTGACTCATTGATCAACTGGGAATCATTTAGACCGATAGTGAAAGATATGTTTGACAACAAAAGTGAAAAGGGTGGACGTCCTAATATCGATGAAGTTGTAATGATCAAAACCCTGATTTTACAGGAGTGGCATGGTCTTTCTGATCCAGAACTTGAGCGACAAATCACCGACAGGATATCCTTCCGCAAGTTTTTAGGTTTTCCTGAAAACATACCTGATTTCACAACAGTCTGGACTTTTCGAGAGCGGTTAAGCAAAAAAGGTAAGGACAAAGAAATCTGGAAAGAATTACAGAGACAGCTTGATTCAAAGGGATTGAAGGTAAAAAAGGGGGTTATACAGGATGCAACATTTATCACATCTGATCCAGGACATGCAAAAGCAGATAAACCAAGAGGTGATGAGGCAAAAACACGAAGAAGTAAAGATGGTACCTGGGTAAAAAAGAACAGTAAGTCATACTTCGGGTATAAGTTTCACTCAAAGGAAGATGTTGATTACGGTCTTATAAGGAAGATCGAGACTACAACGGCATCAGTACACGATAGTCAGATTGATCTCTCTGAACCAGGAGAAGTCGTGTACAAGGATAAAGGATATTTTGGAGCGTCATCAAAAGGATACAGTGCGACTATGAGAAGATCTGTTCGTGGTCATCCGATTGGTATCAAAGATATTCTGCGTAACAAACGAATTAGCAAGAAAAGAGCACCTGGAGAAAGACCCTATGCAGTGATTAAAAATGTATTCAAATCAGGGCATATTATGGTTACAACCGTTGCCAGGGCAGCAGTCAAAACGGTATTTACAGCATTTGGATTCAATCTATATCAACTCTTAACTTTGAAGAAACAAGGAATTGTATAG

As you can see, these are all the same sequence.
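One way to confirm the scale of this problem is to group FASTA records by sequence and list every sequence that appears under more than one header. The following is a minimal sketch, not part of EnrichM; the function names `read_fasta` and `duplicated_sequences` are hypothetical.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def duplicated_sequences(lines):
    """Map each sequence found under more than one header to those headers."""
    headers_by_sequence = defaultdict(list)
    for header, sequence in read_fasta(lines):
        headers_by_sequence[sequence].append(header)
    return {
        sequence: headers
        for sequence, headers in headers_by_sequence.items()
        if len(headers) > 1
    }
```

An open file handle can be passed directly, for example `duplicated_sequences(open(path))`, where `path` points at one of the files in the genome_genes directory.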

Thanks,

Rhys

enrichm enrichment error

Hello,

I was running enrichm enrichment and ran into a problem whose cause I could not find. The command and files are below:
enrichm enrichment --output MGII_enrichment --annotate_output mgii_annotation/ko_frequency_table.tsv --metadata metadata.txt --threshold 32 --ko --force

The error:
Traceback (most recent call last):
File "/home/jianchang/software/miniconda3/envs/enrichm_0.5.0/bin/enrichm", line 357, in <module>
r.main(args, sys.argv)
File "/home/jianchang/software/miniconda3/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/run.py", line 356, in main
args.output)
File "/home/jianchang/software/miniconda3/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/enrichment.py", line 389, in do
= self._parse_annotation_matrix(annotation_matrix)
File "/home/jianchang/software/miniconda3/envs/enrichm_0.5.0/lib/python3.7/site-packages/enrichm/enrichment.py", line 185, in _parse_annotation_matrix
matrix_file_io = open(annotation_matrix)
TypeError: expected str, bytes or os.PathLike object, not NoneType

bin_enrichm.o52650.txt
bin_enrichm.e52650.txt
metadata.txt
metadata.txt

enrichm explore AttributeError

Hi Joel,

I am running enrichm version 0.4.7 with the database version 2019-03-19-v_8. I have an issue with the explore function, which throws the error

AttributeError: 'Namespace' object has no attribute 'enrichment_output'

I am pretty sure the data needed for the explore function are correctly formatted. First I generated the annotation and KO matrix:

enrichm annotate --output annot_test --genome_directory bins --ko

Then I called the explore function. I provided the --metadata and --queries as specified in the documentation:

enrichm explore --matrix annot_test/ko_frequency_table.tsv --queries compounds.txt --metadata groups.txt

This is the output in the terminal:

Traceback (most recent call last):
File "/space/sharedbin_ubuntu_14_04/software/EnrichM/0.4.7-foss-2018a-Python-3.6.4/bin/enrichm", line 4, in <module>
__import__('pkg_resources').run_script('enrichm==0.4.7', 'enrichm')
File "/space/sharedbin_ubuntu_14_04/software/Python/3.6.4-foss-2018a/lib/python3.6/site-packages/pkg_resources/__init__.py", line 664, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/space/sharedbin_ubuntu_14_04/software/Python/3.6.4-foss-2018a/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1444, in run_script
exec(code, namespace, namespace)
File "/space/sharedbin_ubuntu_14_04/software/EnrichM/0.4.7-foss-2018a-Python-3.6.4/lib/python3.6/site-packages/enrichm-0.4.7-py3.6.egg/EGG-INFO/scripts/enrichm", line 359, in <module>
r.main(args, sys.argv)
File "/space/sharedbin_ubuntu_14_04/software/EnrichM/0.4.7-foss-2018a-Python-3.6.4/lib/python3.6/site-packages/enrichm-0.4.7-py3.6.egg/enrichm/run.py", line 410, in main
args.enrichment_output,
AttributeError: 'Namespace' object has no attribute 'enrichment_output'
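This kind of AttributeError is what argparse produces when code reads an attribute for an option that was never registered on the parser or subparser that handled the command line. The sketch below reproduces the pattern with a toy parser and shows one defensive way to read an optional attribute. It is an illustration only, not EnrichM's actual parser or its fix; the parser and the helper `optional_argument` are hypothetical.

```python
import argparse


def optional_argument(args, name, default=None):
    """Read an attribute from an argparse Namespace, tolerating its absence."""
    return getattr(args, name, default)


# Toy parser (hypothetical): it defines --matrix but not --enrichment_output.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--matrix")
args = parser.parse_args(["--matrix", "ko_frequency_table.tsv"])

# Accessing args.enrichment_output directly would raise AttributeError here,
# because no such option was ever added to this parser.
enrichment_output = optional_argument(args, "enrichment_output")
```

Falling back to a default hides the symptom; whether the option should instead be registered for the subcommand is a design decision for the maintainers.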

I am not sure if this is a bug or I am missing some input files?

I've provided the files used (I just included two bins):
groups.txt
compounds.txt
bins.zip

Looking forward to your reply.
Best,
Thomas

Diamond database version

Hi! I just started using enrichM and everything's going well, but I'm primarily interested in the KO annotations. However, when I go to annotate using the uniref diamond database, I get an error stating:

Error: Database was built with a different version of Diamond and is incompatible.

I'm downloading the uniref100 database on my own and I'll make a new diamond database out of it, but I'm not sure if that will mess anything up, so I thought I'd ask- which version of Diamond was used to construct uniref100.dmnd? I can just install that version into a conda virtual environment and use that to annotate with KO if need be.

Thank you so much!!

Add tests

Database downloader is not working

Hi!

The database downloader script is currently not working. I could solve the issue by altering a single line:

From:

import urllib

To:

import urllib.request
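The change matters because in Python 3 `import urllib` loads only the package itself; the `urllib.request` submodule becomes available as an attribute only once it has been imported somewhere. A minimal sketch of the working pattern follows; the helper name `fetch_bytes` is hypothetical and is not taken from the EnrichM downloader.

```python
import urllib.request  # explicit submodule import makes urllib.request usable


def fetch_bytes(url):
    """Return the body of `url` as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

The helper can be tried without network access using a `data:` URL, for example `fetch_bytes("data:text/plain,hello")`.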

Add metabolome option to all network options

Change output file specification

Change the specification of outputs from an 'output prefix' to which file name endings are appended, to files named explicitly by the user

It just makes it easier to use and easier to understand.

Add doco

Annotate pooled proteins in KO pipeline

"xargs: illegal option -- -" leading to "No files found with .faa suffix in input directory"

Hi!

I've been trying to use this tool to annotate Prodigal output and get the percentage completeness of KEGG modules.

Running this:
enrichm annotate --protein_directory Samples --ko --threads 2 --force

Leads to this:

[2020-08-06 10:54:32 AM] INFO: Command: /usr/local/bin/enrichm annotate --protein_directory Samples --ko --threads 2 --force --verbosity 5
[2020-08-06 10:54:32 AM] INFO: Running the annotate pipeline
[2020-08-06 10:54:32 AM] INFO: Running pipeline: annotate
[2020-08-06 10:54:32 AM] INFO: Setting up for genome annotation
[2020-08-06 10:54:32 AM] INFO: Using provided proteins
[2020-08-06 10:54:32 AM] INFO: Preparing genomes for annotation
[2020-08-06 10:54:32 AM] DEBUG: xargs --arg-file=/dev/stdin ln -s --target-directory=2020-08-06_10-54-enrichm_annotate_output/genome_proteins
xargs: illegal option -- -
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
             [-L number] [-n number [-x]] [-P maxprocs] [-s size]
             [utility [argument ...]]
[2020-08-06 10:54:32 AM] ERROR: No files found with .faa suffix in input directory
[2020-08-06 10:54:32 AM] INFO: Finished running EnrichM

It seems to be the annotate.py script that's running into issues recognising files within the supplied directory. Specifically I think this is the line that's failing:

cmd = "xargs --arg-file=/dev/stdin ln -s --target-directory=%s" % genome_directory
I've already made the changes suggested in issue #94, but they didn't seem to help.

These are the package versions I'm using on a Mac:
enrichM 0.5.0
hmmer 3.3.1
diamond 0.9.36 (I can't seem to install version 0.9.22 as the installation instructions recommend)
prodigal 2.6.2
parallel 0200722
MMseqs2 11-e1a1c_1
R 4.0.2_1
mcl 14-137
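The usage message in the log above looks like it comes from a BSD-style xargs, which does not accept GNU long options such as `--arg-file`. One portable alternative is to create the symbolic links from Python rather than shelling out. This is a minimal sketch under that assumption, not the code EnrichM uses; the function name `link_files_into` is hypothetical.

```python
import os


def link_files_into(paths, target_directory):
    """Symlink each file in `paths` into `target_directory`; return the link paths."""
    os.makedirs(target_directory, exist_ok=True)
    linked = []
    for path in paths:
        destination = os.path.join(target_directory, os.path.basename(path))
        # Link to the absolute path so the link stays valid from any working directory.
        os.symlink(os.path.abspath(path), destination)
        linked.append(destination)
    return linked
```

Because it relies only on the standard library, this approach behaves the same on Linux and macOS.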

What does --depth in enrichm explore mean?

Hi Joel,
This is related to #17 (you're probably busy now!), but could I get just a brief elaboration on the --depth parameter in enrichm explore beyond what's available from -h: "Number of steps to take into the metabolic network"?

GFF generating failure when supplying own proteomes

Supplying Prokka-annotated .faa files gave this error:

$ enrichm annotate --protein_directory proteomes/ --ko_hmm --output ko_hmm_test
[2019-07-02 11:12:26 AM] INFO: Command: /srv/sw/miniconda3/envs/enrichm_0.5.0rc1/bin/enrichm annotate --protein_directory proteomes/ --ko_hmm --output ko_hmm_test                                                
[2019-07-02 11:12:26 AM] INFO: Running the annotate pipeline
[2019-07-02 11:12:26 AM] INFO: Running pipeline: annotate
[2019-07-02 11:12:26 AM] INFO: Setting up for genome annotation
[2019-07-02 11:12:26 AM] INFO: Using provided proteins
[2019-07-02 11:12:26 AM] INFO: Preparing genomes for annotation
[2019-07-02 11:12:28 AM] INFO: Starting annotation:
[2019-07-02 11:12:28 AM] INFO:     - Annotating genomes with ko ids using HMMs
[2019-07-02 11:49:59 AM] INFO:     - Generating ko frequency table
[2019-07-02 11:49:59 AM] INFO:     - Writing results to file: ko_hmm_test/ko_hmm_frequency_table.tsv
[2019-07-02 11:50:00 AM] INFO: Generating .gff files:
...
[2019-07-02 11:50:00 AM] INFO:     - Generating .gff file for wierdbin1.prokka_archaea
Traceback (most recent call last):
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/bin/enrichm", line 374, in <module>
    r.run_enrichm(args, sys.argv)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/run.py", line 359, in run_enrichm                                                                                            
    args.protein_files)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/annotate.py", line 893, in annotate_pipeline                                                                                 
    self.generate_gff_files(genomes_list)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/annotate.py", line 634, in generate_gff_files                                                                                
    Writer.write_gff(genome, gff_output)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/writer.py", line 69, in write_gff                                                                                            
    'prodigal_id=%s' % sequence.prod_id,
AttributeError: 'Sequence' object has no attribute 'prod_id'

I guess it is expecting a prodigal style header maybe?
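For reference, Prodigal writes protein FASTA headers of the form `>name # start # end # strand # ID=...;partial=...;...`, whereas Prokka headers carry a locus tag followed by a product description. A minimal sketch of a parser that recognises the Prodigal layout and returns None for anything else is shown below. It is an assumption that the `prod_id` attribute in the traceback corresponds to the `ID` field; the function name and the returned keys are hypothetical and not taken from EnrichM.

```python
def parse_prodigal_header(header):
    """Return Prodigal coordinates from a FASTA header, or None if not Prodigal-style."""
    fields = header.lstrip(">").split(" # ")
    if len(fields) != 5:
        return None
    name, start, end, strand, attributes = fields
    try:
        start, end, strand = int(start), int(end), int(strand)
    except ValueError:
        return None
    # Attributes are semicolon-separated key=value pairs, e.g. "ID=1_1;partial=10".
    attribute_map = dict(
        item.split("=", 1) for item in attributes.split(";") if "=" in item
    )
    return {
        "name": name,
        "start": start,
        "end": end,
        "strand": strand,
        "prod_id": attribute_map.get("ID"),  # assumed mapping, see lead-in
    }
```

A caller could skip the Prodigal-specific GFF attributes whenever this returns None, which is the case for a header such as `>ABC_00001 hypothetical protein`.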

enrichm data components missing

Hi Joel,
I've installed the latest version of enrichm. The installation worked without issues. Downloading the database with enrichm data also worked without complaints.

However, when running enrichm annotate it says:
ko_descriptions.23-08-2018.pickle FileNotFoundError: [Errno 2] No such file or directory: '/home/thomas.michaelsen/databases/enrichm_database_v4/ko_descriptions.23-08-2018.pickle'

And I've checked the downloaded folders - there is no ko_descriptions.23-08-2018.pickle

Best,
Thomas

Improve definition of KEGG steps

Empty TIGRfam Pfam annotations

Hi Joel

I managed to run enrichm annotate on a bunch of genomes (~350 of them). However, the tsv files summarizing the tigrfam and pfam annotations only have zeroes. I can understand that this can be the case for a few genomes, but not for all. These genomes are fairly complete (~70%)

help page/wiki

Hi there,

I cannot access the information on the help pages/wiki for annotate, classify, etc.

When can that happen?

Cheers

enrichm annotate fasta file error

Hi Joel,

I generated data using enrichm v0.5.0 and the enrichm annotate function. The fasta files in the genome_genes subfolder seem to contain only one sequence per file, which is copied multiple times throughout the file. The headers, however, are different. This issue does not occur in the genome_proteins subfolder.

Best,
Thomas

don't pass args to class

--custom_modules not working

Hi Joel,

Hope your writing is going well!

I wanted to use a custom module in the enrichm classify function using the following command:

enrichm classify --output test --genome_and_annotation_matrix ko_frequency_table.tsv --custom_modules woodcroft_modules.tsv

But it returns nonsense results. I'm not sure about the format of the custom_modules input file; please see the attached files for an example. I have upgraded to the latest (0.5.0) version of enrichm.

Best,
Thomas

example.zip

Pathway-cytoscape

Hi Joel, thanks for your nice tool.
I have completed enrichm pathway and got two .tsv files (see attachments). I think they are what you called the "Cytoscape-readable metabolic network", but when I load them into Cytoscape, I can't get any edges. Should I use the CoNet module of Cytoscape to calculate them?
I know your paper is still being prepared, but it would be nice if there were examples you could share now.

metadata.txt
network.txt

gene naming issue

I guess this is caused by there being a ~ character in some gene name, as well as separating the genome from the gene name?

enrichm annotate --genome_directory dereplicated_representatives_fasta/ --parallel 20 --ko --ko_hmm     
[2019-10-28 13:42:57 PM] INFO: Command: /srv/sw/miniconda3/envs/enrichm_0.5.0rc1/bin/enrichm annotate --genome_directory dereplicated_representatives_fasta/ --parallel 20 --ko --ko_hmm                          
[2019-10-28 13:42:57 PM] INFO: Running the annotate pipeline
[2019-10-28 13:42:57 PM] INFO: Running pipeline: annotate
[2019-10-28 13:42:57 PM] INFO: Setting up for genome annotation
[2019-10-28 13:42:57 PM] INFO: Calling proteins for annotation
[2019-10-28 13:42:57 PM] INFO:     - Calling proteins for 716 genomes
[2019-10-28 20:00:27 PM] INFO: Starting annotation:
[2019-10-28 20:00:27 PM] INFO:     - Annotating genomes with ko ids using DIAMOND
[2019-10-28 20:00:27 PM] INFO:     - BLASTing genomes
Traceback (most recent call last):
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/bin/enrichm", line 374, in <module>
    r.run_enrichm(args, sys.argv)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/run.py", line 359, in run_enrichm                                                                                            
    args.protein_files)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/annotate.py", line 815, in annotate_pipeline                                                                                 
    self.GENOME_KO)
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/annotate.py", line 256, in annotate_diamond                                                                                  
    for genome_name, batch in self.get_batches(output_annotation_path):
  File "/srv/sw/miniconda3/envs/enrichm_0.5.0rc1/lib/python3.6/site-packages/enrichm/annotate.py", line 277, in get_batches                                                                                       
    genome_id, _ = split_line[0].split('~')
ValueError: too many values to unpack (expected 2)
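The ValueError arises because `split('~')` returns more than two pieces whenever the identifier contains more than one `~`, so the two-name unpacking fails. A minimal sketch of a more tolerant split follows, assuming the first `~` is the separator between genome name and gene name and that any later `~` belongs to the gene name. The function name is hypothetical and this is not the fix adopted in EnrichM.

```python
def split_genome_and_gene(identifier, separator="~"):
    """Split 'genome~gene' at the first separator only."""
    genome_id, found, gene_id = identifier.partition(separator)
    if not found:
        raise ValueError("no %r separator in %r" % (separator, identifier))
    return genome_id, gene_id
```

If genome names themselves can contain `~`, splitting at the first occurrence would be wrong, and a separator that cannot occur in either name would be the safer design.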
