aureme / metage2metabo

44.0 6.0 5.0 37.16 MB

From annotated genomes to metabolic screening in large scale microbiotas

Home Page: https://metage2metabo.readthedocs.io

License: GNU Lesser General Public License v3.0

Python 17.13% Dockerfile 0.11% R 4.96% Jupyter Notebook 77.66% Singularity 0.09% Batchfile 0.02% Shell 0.03%

bioinformatics bioinformatics-pipeline metabolic-models

metage2metabo's Introduction

AuReMe

License

This project is licensed under the GNU GPL-3.0-or-later, see the LICENSE file for details.

Documentation

AuReMe documentation

metage2metabo's Issues

Issue with pyasp.

Since binaries of pyasp have been removed, there is an error in the build (12922da).

m2m_analysis - deal with powergrasp config file creation when launching multiple runs

By default, m2m_analysis creates a powergrasp config file and then deletes it. This causes problems when several m2m_analysis runs are launched at the same time, because different processes manipulate the same file. An option that prevents the creation of this file (and asks the user to create it instead) would be one way to fix this issue.
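Another way to avoid the clash can be sketched as follows, under assumptions: each run writes its configuration into its own temporary directory, so concurrent processes never touch the same file. The file name `powergrasp.cfg`, its contents and the helper `private_config` are hypothetical and only illustrate the isolation pattern; this is not how m2m_analysis currently behaves, and powergrasp may look for its configuration elsewhere.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def private_config(contents, filename="powergrasp.cfg"):
    """Yield the path of a config file stored in a per-run temporary directory.

    The directory and the file are deleted when the block exits, so parallel
    runs cannot overwrite or delete each other's file.
    `filename` and `contents` are placeholders, not real powergrasp settings.
    """
    with tempfile.TemporaryDirectory(prefix="m2m_analysis_") as tmp_dir:
        config_path = os.path.join(tmp_dir, filename)
        with open(config_path, "w") as handle:
            handle.write(contents)
        yield config_path


if __name__ == "__main__":
    # Two simultaneous "runs" receive two distinct files.
    with private_config("[hypothetical options]\n") as path_a, \
         private_config("[hypothetical options]\n") as path_b:
        print(path_a != path_b)
```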

Error with test data.

It seems that test data (especially tca_cycle_ecoli) are not compatible with Pathway Tools 23.0.

Replace them with compatible test data.

m2m_analysis powergraph not working as expected

Hi @ArnaudBelcour,

I have been having some issues running m2m_analysis powergraph. Using genome-scale metabolic models for almost 3300 gut microbes, I am trying to visualize the minimal communities required to produce the output metabolites from addedvalue. All steps in the m2m pipeline worked except "m2m_analysis powergraph", for which I get the error attached below. Previously, this step worked for a smaller microbial sample, but I had issues with obtaining the full image (only a section of the powergraph image was observed in the output).

Do you have any recommendations on how I could rectify that?

Thank you!

Best,

Noor Alsmadi

Oog.jar no longer available at link

The Oog.jar file is no longer available at the link provided in the documentation.

Help with creating seed file

Hi, I am importing my GSM models that were made using gapseq into m2m. I am confused by how to create the seed file. In your tutorial, it is said:

Once a list of metabolites has been designed, these metabolites must be converted to a list of IDs of the metabolic database corresponding to your metabolic networks. For example, in the VMH seeds ethanol is etoh. In the MetaCyc database, the ID of ethanol is ETOH. Then the ID must be checked with the ID in the SBML files of the metabolic networks. In this example ETOH is associated to M_ETOH_c in the SBML file in the species field (<species id="M_ETOH_c" name="ETOH" compartment="c"/>). The M_ corresponds to metabolite (another possibility for this prefix is the R_ for reaction). And _c corresponds to the cytosol compartment.

I basically want the following metabolites in the seed file. These metabolites were used to fill gaps in the models in gapseq, so it's good to keep things consistent.

compounds,name,maxFlux
cpd00363,Ethanol,0.1
cpd00001,H2O,100
cpd01420,beta-Carotene,0.1
cpd00365,Retinol,0.1
cpd00305,Thiamin,0.1
cpd03424,Vitamin B12,0.1
cpd00220,Riboflavin,0.1
cpd00218,Niacin,0.1
cpd00133,Nicotinamide,0.1
cpd00644,PAN,0.1
cpd00419,PM,0.1
cpd00263,Pyridoxol,0.1
cpd00215,Pyridoxal,0.1
cpd00104,BIOT,0.1
cpd00201,10-Formyltetrahydrofolate,0.1
cpd00345,5-Methyltetrahydrofolate,0.1
cpd00087,Tetrahydrofolate,0.1
cpd00059,L-Ascorbate,0.1
cpd00857,Provitamin D3,0.1
cpd01628,Vitamin E,0.1
cpd01401,Vitamin K1,0.1
cpd00063,Ca2+,100
cpd00099,Cl-,5
cpd00205,K+,100
cpd00254,Mg,7
cpd00971,Na+,4
cpd00009,Phosphate,6
cpd00058,Cu2+,0.1
cpd10515,Fe2+,0.1
cpd10516,fe3,0.1
cpd00030,Mn2+,0.1
cpd00034,Zn2+,0.1
cpd00314,D-Mannitol,0.1
cpd00588,Sorbitol,0.1
cpd00306,Xylitol,0.1
cpd00208,LACT,0.1
cpd00179,Maltose,0.1
cpd00076,Sucrose,3
cpd00082,D-Fructose,1
cpd00108,Galactose,0.1
cpd00027,D-Glucose,2
cpd00035,L-Alanine,3
cpd00051,L-Arginine,2
cpd00041,L-Aspartate,4
cpd00084,L-Cysteine,0.1
cpd00023,L-Glutamate,9
cpd00033,Glycine,3
cpd00300,Urate,4
cpd00119,L-Histidine,1
cpd00322,L-Isoleucine,2
cpd00107,L-Leucine,4
cpd00039,L-Lysine,3
cpd00060,L-Methionine,1
cpd00066,L-Phenylalanine,2
cpd00129,L-Proline,5
cpd00054,L-Serine,3
cpd00161,L-Threonine,2
cpd00065,L-Tryptophan,0.1
cpd00069,L-Tyrosine,1
cpd00156,L-Valine,3
cpd01107,Decanoate,0.1
cpd03847,Myristic acid,1
cpd15622,pentadecanoate (C15:0),0.1
cpd00214,Palmitate,4
cpd15609,heptadecanoate (C17:0),0.1
cpd01080,ocdca,1
cpd15269,octadecenoate,4
cpd01122,Linoleate,3
cpd03850,Linolenate,0.1
cpd15016,Stearidonic acid,0.1
cpd03848,Arachidic acid,0.1
cpd00188,Arachidonate,0.1
cpd16342,Adrenic acid,0.1
cpd16301,Docosapentaenoic acid,0.1
cpd03852,Docosahexaenoic acid,0.1
cpd00211,Butyrate,1
cpd03846,octanoate,0.1
cpd00160,Cholesterol,1
cpd27519,Pectin,10
cpd00158,CELB,100
cpd11732,Xylan,0.1
cpd11970,Arabinoxylan,0.1
cpd11955,Glucomannan,0.1
cpd00656,Galactomannan,0.1
cpd11696,beta-Glucan,1
cpd00067,H+,100
cpd00149,Co,0.1
cpd00011,CO2,100
cpd11640,H2,100
cpd00048,SO4,0.1
cpd00029,Acetate,1
cpd00141,Propionate,1
cpd00256,Cholate,1
cpd01663,Chemodeoxycholate,1
cpd02733,Deoxycholate,1
cpd02475,Lithocholate,1
cpd03246,Taurochenodeoxycholate,1
cpd03247,Glycochenodeoxycholate,1
cpd03047,Taurocholate,1
cpd01318,Glycocholate,1

The gapseq website (https://gapseq.readthedocs.io/en/latest/database/biochemistry.html#) says:
The gapseq database for chemical compounds and reactions originated from the SEED database.

Any advice will be greatly appreciated.
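A possible starting point, as a minimal sketch: list the species IDs present in one of the SBML files and look for those that contain each compound ID of the medium table. It assumes that the gapseq species IDs embed the compound ID (for instance something like `M_cpd00363_e0`), which has to be checked against the actual files. The function names are made up for the illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def species_ids(sbml_text):
    """Return all species IDs found in an SBML document, whatever its namespace."""
    root = ET.fromstring(sbml_text)
    return [elem.get("id") for elem in root.iter()
            if elem.tag.split("}")[-1] == "species"]


def match_seeds(csv_text, sbml_text):
    """Map each compound ID of the medium table to the species IDs containing it.

    An empty list means the compound does not occur in this metabolic network.
    """
    ids = species_ids(sbml_text)
    matches = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        compound = row["compounds"]
        matches[compound] = [sid for sid in ids if compound in sid]
    return matches
```

The matched species IDs are the ones that would go into the seed file. Compounds with an empty match list are absent from that particular model.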

Individual scopes include seeds even if non producible or absent from the GSMNs

Problem

The default implementation of the scope includes as reachable metabolites:

the seeds
the products of a reaction provided all of its reactants are producible.

However, this definition can lead to error-prone interpretation as some of the seeds may not occur in every metabolic network and yet appear in the corresponding individual scope.

For instance with the test data test/metabolic_data/toy_bact/

M_CELLULOSE_c is in the seeds.
It is included in all individual scopes, and therefore in the intersection of producible metabolites. However, only 4 of the 17 models in the toy set include this compound.

Solution

Issue 15 of MeneTools addresses the problem of dealing with seeds in the scope.

It creates new atoms that can be used in M2M to properly address the different statuses of metabolites:

non producible seed
producible seed
other producible metabolite

M2M should be updated to be compatible with these changes
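The distinction between these statuses can be illustrated with a small pure-Python sketch of the scope computation. It is not the ASP encoding used by MeneTools; the function name and the input format (a list of reactant/product pairs) are assumptions, and the extra "absent seed" category is added here only to make the toy_bact observation explicit.

```python
def scope_with_seed_status(reactions, seeds):
    """Compute the scope and classify seeds.

    reactions: list of (reactants, products) pairs of metabolite IDs.
    seeds: iterable of metabolite IDs available from the start.
    """
    seeds = set(seeds)
    network_metabolites = {m for reactants, products in reactions
                           for m in (*reactants, *products)}
    available = set(seeds)  # usable as reactants
    produced = set()        # products of at least one activated reaction
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            # A reaction is activated once all of its reactants are available.
            if set(reactants) <= available and not set(products) <= produced:
                produced |= set(products)
                available |= set(products)
                changed = True
    return {
        "producible_seeds": produced & seeds,
        "non_producible_seeds": (seeds - produced) & network_metabolites,
        "absent_seeds": seeds - network_metabolites,
        "other_producible": produced - seeds,
    }
```

With this separation, a seed such as M_CELLULOSE_c would be reported as absent for the networks that do not contain it, instead of being counted as reachable.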

Producibility_targets file is not created when `m2m mincom` is called

There is no reason producibility_targets.json should only be created when m2m metacom is called. It should also be created with mincom.

CRITICAL:metage2metabo.m2m.reconstruction:Something went wrong running Pathway Tools. See the log file in /home/chencong/output3/pgdb_log/log_error.txt

Hi @cfrioux,
It alerted me to this problem when I used the recon pipeline.

gff file format problem while running m2m recon

Hello,
I have a problem running the following command : m2m recon -g rocks -o rocks_m2m -c 16
Here is the error message :

######### Running metabolic network reconstruction with Pathway Tools #########
---------- Launching mpwt ----------
No region feature in the GFF file of rocks_bin_231_1, GFF file must have region features.

and here is an example of my gff file (obtained with prokka) :

##gff-version 3
##sequence-region c_000000064644 1 1078188
##sequence-region c_000000108022 1 72333
##sequence-region c_000000177873 1 343976
##sequence-region c_000000291080 1 70581
##sequence-region c_000000322418 1 191404
##sequence-region c_000000063443 1 369988
##sequence-region c_000000303010 1 359558
##sequence-region c_000000318942 1 41289
##sequence-region c_000000318773 1 77515
##sequence-region c_000000223418 1 76416
##sequence-region c_000000203556 1 36017
##sequence-region c_000000344245 1 16104
##sequence-region c_000000367754 1 171665
##sequence-region c_000000255903 1 128623
##sequence-region c_000000275449 1 42783
##sequence-region c_000000354647 1 95858
##sequence-region c_000000024095 1 45991
##sequence-region c_000000372217 1 178286
##sequence-region c_000000107681 1 114885
##sequence-region c_000000035827 1 97699
##sequence-region c_000000026743 1 214901
##sequence-region c_000000063357 1 165766
##sequence-region c_000000013580 1 89699
##sequence-region c_000000299742 1 96705
##sequence-region c_000000021783 1 29717
##sequence-region c_000000429765 1 46987
##sequence-region c_000000271007 1 23297
##sequence-region c_000000291991 1 29199
##sequence-region c_000000297940 1 11292
c_000000064644  Prodigal:002006 CDS     137     1213    .       -       0       ID=KJGKFCMM_00001;Name=mreB_1;gene=mreB_1;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:A0A0H3C7V4;locus_tag=KJGKFCMM_00001;product=Cell shape-determining protein MreB
c_000000064644  Prodigal:002006 CDS     1420    2079    .       -       0       ID=KJGKFCMM_00002;inference=ab initio prediction:Prodigal:002006;locus_tag=KJGKFCMM_00002;product=hypothetical protein
c_000000064644  Prodigal:002006 CDS     2445    3521    .       -       0       ID=KJGKFCMM_00003;eC_number=5.2.1.8;Name=surA_1;gene=surA_1;inference=ab initio prediction:Prodigal:002006,protein motif:HAMAP:MF_01183;locus_tag=KJGKFCMM_00003;product=Chaperone SurA
c_000000064644  Prodigal:002006 CDS     3621    7397    .       -       0       ID=KJGKFCMM_00004;eC_number=3.6.4.-;Name=mfd;gene=mfd;inference=ab initio prediction:Prodigal:002006,protein motif:HAMAP:MF_00969;locus_tag=KJGKFCMM_00004;product=Transcription-repair-coupling factor
c_000000064644  Prodigal:002006 CDS     7642    8118    .       -       0       ID=KJGKFCMM_00005;Name=hldE_1;db_xref=COG:COG2870;gene=hldE_1;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P76658;locus_tag=KJGKFCMM_00005;product=Bifunctional protein HldE
c_000000064644  Prodigal:002006 CDS     8330    9919    .       +       0       ID=KJGKFCMM_00006;eC_number=6.1.1.6;Name=lysU;db_xref=COG:COG1190;gene=lysU;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P0A8N5;locus_tag=KJGKFCMM_00006;product=Lysine--tRNA ligase%2C heat inducible
c_000000064644  Prodigal:002006 CDS     10088   10498   .       -       0       ID=KJGKFCMM_00007;inference=ab initio prediction:Prodigal:002006;locus_tag=KJGKFCMM_00007;product=hypothetical protein
c_000000064644  Prodigal:002006 CDS     10679   11710   .       -       0       ID=KJGKFCMM_00008;db_xref=COG:COG4447;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:Q8DI95;locus_tag=KJGKFCMM_00008;product=Ycf48-like protein

I tried to convert to gbk but the taxonomy was missing and it was still not working. Do you have any idea how I could fix this problem ?
Thanks a lot,
Guillaume
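The error message asks for region features, while the Prokka output above only declares the contigs through ##sequence-region pragmas. A hedged sketch of a workaround is to derive one region feature per pragma line. The `pragma` source column and the `ID=` attribute are arbitrary choices, and it has not been verified here that mpwt or Pathway Tools accept the resulting file; mpwt may also expect taxonomy information in addition to the region features.

```python
def add_region_features(gff_text):
    """Insert one `region` feature per ##sequence-region pragma.

    The new lines are placed just before the first non-comment line.
    """
    lines = gff_text.splitlines()
    regions = []
    for line in lines:
        if line.startswith("##sequence-region"):
            _, seqid, start, end = line.split()[:4]
            regions.append("\t".join(
                [seqid, "pragma", "region", start, end, ".", "+", ".", f"ID={seqid}"]))
    first_feature = next(
        (i for i, line in enumerate(lines) if not line.startswith("#")),
        len(lines))
    return "\n".join(lines[:first_feature] + regions + lines[first_feature:]) + "\n"
```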

A few process questions

Hi there! I have a few process questions:

In your paper you have several sets of metabolic targets, did you run the workflow separately for each list of targets?
Some of the genome annotations pass the build step and an sbml file is created, but have several warnings. Is there a way to evaluate annotation quality?
In one of my metagenome sets, a few of the builds fail and then the recon process stops, without continuing to create the sbml files. Is there a command to ignore the failed builds and continue?

Create a list of targets for the content of the community metabolic potential

An interesting analysis can be to select minimal communities suitable to sustain the producibility of all compounds the community can produce. This consists in building a set of targets consisting of the metabolites in the community scope (community metabolic potential).

An easy solution can be to create a SBML file containing compounds occurring in the community scope (community_analysis/comm_scopes.json).
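A minimal sketch of such a file writer is given below. It takes a plain list of metabolite IDs, because the exact layout of comm_scopes.json is not assumed here; extracting the IDs from the json is left to the caller. The function name, the model ID and the default compartment are hypothetical, and whether m2m accepts this minimal SBML has not been checked.

```python
import xml.etree.ElementTree as ET


def targets_sbml(metabolite_ids, compartment="c"):
    """Return a minimal SBML (level 2) document listing the given species."""
    sbml = ET.Element("sbml", {
        "xmlns": "http://www.sbml.org/sbml/level2",
        "level": "2",
        "version": "1",
    })
    model = ET.SubElement(sbml, "model", {"id": "targets"})
    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", {"id": compartment})
    species_list = ET.SubElement(model, "listOfSpecies")
    # Duplicates are removed and IDs sorted to make the output reproducible.
    for metabolite in sorted(set(metabolite_ids)):
        ET.SubElement(species_list, "species", {
            "id": metabolite,
            "name": metabolite,
            "compartment": compartment,
        })
    return ET.tostring(sbml, encoding="unicode")
```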

KeyError 'Producible' when running test data

Hi there,

when running m2m test I get an error message when it comes to "Running minimal community selection":

Running minimal community selection
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/m2m", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.6/site-packages/metage2metabo/__main__.py", line 337, in main
    main_test(args.out, args.cpu)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/metage2metabo/__main__.py", line 473, in main_test
    padmet_bool, host_mn, targets_file)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/metage2metabo/__main__.py", line 346, in main_workflow
    run_workflow(*allargs)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/metage2metabo/m2m_workflow.py", line 72, in run_workflow
    metacom_analysis(sbml_dir, out_dir, seeds, host_mn, targets_file)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/metage2metabo/m2m_workflow.py", line 131, in metacom_analysis
    mincom(instance_w_targets, out_dir)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/metage2metabo/m2m_workflow.py", line 280, in mincom
    producible_targets = all_results['producible']
KeyError: 'producible'

Any idea what might cause this error and how to fix this?

Thanks a lot,
Denny

Non optimal powergraph visualisation

We have observed in several experiments that the visualisation offered by the powergraph does not reflect exactly the combinatorics of the solution. This seems to be normal as powergrasp uses some heuristics and can output sub-optimal visualisation.

But as it is also possible that powergrasp outputs optimal visualisation (such as the lipid powergraph in the metage2metabo article), we need to find a way to identify when a visualisation is optimal and when it is not.

One possibility would be to automatically create a list of equations, using powernode names, that describes the generic AND/OR associations between powernodes, and then to use the nodes in each powernode to compute the combinatorics. This would give the combinatorics expected from the powergraph visualisation.
It could be compared to the observed combinatorics (stored in the community json in the enum_bacteria field). If the two differ, the powergraph is suboptimal; if they are equal, the powergraph is an optimal visualisation of the solution.

We also have to add this to the readme and the doc so that people do not misinterpret the result of the powergraph.
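The comparison between expected and observed combinatorics could look like the following sketch. It assumes the simplest case, in which the powergraph reduces to an AND of OR-groups (a community takes exactly one organism from each powernode), and that the enumerated communities are available as a list of lists of organism names. Both function names and both input formats are assumptions, not the structure of the actual json file.

```python
from itertools import product


def expected_communities(powernodes):
    """Enumerate the communities implied by the powergraph.

    powernodes: list of OR-groups; each community picks one member per group.
    """
    return {frozenset(choice) for choice in product(*powernodes)}


def matches_enumeration(powernodes, observed):
    """True if the powergraph implies exactly the observed communities."""
    return expected_communities(powernodes) == {frozenset(c) for c in observed}
```

A `False` result would flag a powergraph whose visualisation does not reflect the enumerated solutions exactly.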

Progress problem

Traceback (most recent call last):
File "D:\Metage2Metabo\method_tutorial\test", line 3, in
instance_path, network_scopes = community_scope.cscope('data/community', 'data/seeds.sbml', 'output_folder')
File "C:\Users\陈聪聪\AppData\Roaming\Python\Python310\site-packages\metage2metabo\m2m\community_scope.py", line 50, in cscope
community_reachable_metabolites = comm_scope_run(instance_com, out_dir, host)
File "C:\Users\chen\AppData\Roaming\Python\Python310\site-packages\metage2metabo\m2m\community_scope.py", line 108, in comm_scope_run
microbiota_scope = run_scopes(lp_instance_file=instance)
File "C:\Users\chen\AppData\Roaming\Python\Python310\site-packages\miscoto\miscoto_scopes.py", line 171, in run_scopes
model = query.get_scopes(lp_instance_file, commons.ASP_SRC_SCOPES)
File "C:\Users\chen\AppData\Roaming\Python\Python310\site-packages\miscoto\query.py", line 34, in get_scopes
for model in models.discard_quotes.by_arity:
File "C:\Program Files\Python310\lib\site-packages\clyngor\answers.py", line 205, in iter
for answer_set, optimization, optimality, answer_number in self._answers:
File "C:\Program Files\Python310\lib\site-packages\clyngor\solving.py", line 231, in _gen_answers
for payload in validate_clasp_stderr(stderr):
File "C:\Program Files\Python310\lib\site-packages\clyngor\parsing.py", line 244, in validate_clasp_stderr
line = next(stderr).strip()
File "C:\Program Files\Python310\lib\site-packages\clyngor\solving.py", line 115, in
stderr = (line.decode() for line in clingo.stderr)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 9: invalid start byte
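The last frame shows clyngor decoding the stderr of clingo as UTF-8, which fails on byte 0xb3. That byte possibly comes from a message or a path written in the Windows locale encoding, although this is a guess. The sketch below only illustrates a tolerant decoding strategy; the helper name is hypothetical, and applying it would require a change inside clyngor rather than in user code.

```python
def decode_stderr_line(raw, encodings=("utf-8",)):
    """Decode a stderr line, trying each encoding in turn.

    If every candidate fails, undecodable bytes are replaced by U+FFFD
    instead of raising UnicodeDecodeError.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")
```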

Singularity recipe uses python3.5 while clyngor requires a more recent version

We probably need to install python 3.6 (3.7) in the image or use an ubuntu version that natively includes more recent versions of python

When using `--pwt-xml`, the xml files extracted from Pathway Tools can be incompatible with python-libsbml.

Some XML files created by Pathway Tools (and used by m2m with the --pwt-xml option) contain IDs that are invalid in SBML. In particular, some metabolite IDs begin with a number, which is forbidden in SBML and leads to errors when these files are read by python-libsbml, for example when creating the target files from the addedvalue.

A fix is to rename all of these IDs by prefixing them with M_ just after the reconstruction.
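The renaming rule can be sketched as a small helper. SBML identifiers must start with a letter or an underscore, so only IDs violating that rule are prefixed. The function name is hypothetical, and a real fix would also have to rename every reference to these species in the reactions consistently.

```python
import re


def sbml_safe_id(raw_id, prefix="M_"):
    """Prefix IDs that do not start with a letter or an underscore."""
    if re.match(r"[A-Za-z_]", raw_id):
        return raw_id
    return prefix + raw_id
```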

I have the following problems when running the t2d_m2m_target_producers.R file:

target_producers$sample <- droplevels(target_producers$sample)
Error in UseMethod("droplevels") :
no applicable method for 'droplevels' applied to an object of class "character" @cfrioux

Issue with pathway tools API limitation and PGDB entries verification

Hello,

I am trying to use M2M on a set of a few hundreds of bacterial genome. I am using m2m 1.5.0 with pathway tools 25.5 in a conda environment with python 3.6.

My issue is linked to the recon step and to populating the local PGDB instance with my own genomes. During the process, I start getting error messages about having reached an API limit, usually after 256 genome insertions. Rerunning the same command allows me to go forward, but I then get error messages about the PathoLogic files (already present in PGDB) or about erroneous flat files from mpwt.
If I restart with the same outdir and the --clean option, the accessory .log and .lisp files are removed from my input directory, as is the content of my local PGDB instance (ptools-local). It keeps all the entries inside my outdir but then fails with a File Error 17 because the genome is already present.

Basically, I think something goes wrong during the verification step of either the flat files creation or the insertion into the PGDB. It is difficult for me to figure out, as the log states that some genomes will be skipped because they are already present, yet I still get an error message about them in the end, probably because three sets of genome entries need to be handled (input dir, output dir, local PGDB).

Could you provide a guideline to how to run the workflow properly while avoiding issues linked to that pathway tools API limitation ?

Thank you

Compute deadend and orphan metabolites

To add more details on the metabolic networks used we could compute the deadend and the orphan metabolites for both individual and community.

For individual organism we could use menetools dead (for dead end). But we will have to implement menetools orphan.

For community, we will have to create a new command in miscoto, that could do both analysis (deadend + orphan).

This will create new results in the json output file.
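For a single network, the two notions can be sketched in a few lines, taking "dead end" to mean a metabolite that is produced but never consumed and "orphan" to mean one that is consumed but never produced. These definitions vary between tools, so they are stated here as assumptions, and the sketch ignores reaction reversibility. It is not the menetools or miscoto implementation.

```python
def deadends_and_orphans(reactions):
    """reactions: list of (reactants, products) pairs of metabolite IDs."""
    consumed = {m for reactants, _ in reactions for m in reactants}
    produced = {m for _, products in reactions for m in products}
    return {
        "deadends": produced - consumed,  # produced, never consumed
        "orphans": consumed - produced,   # consumed, never produced
    }
```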

No /home/chencong/.ncbirc file, please fix it before using the program

Hi @clémence Frioux,
This question arises when I reconstruct the metabolic network. I checked my home directory with this filename.
(base) chencong@chencong-QiTianM428-A606:$ m2m recon -g /home/chencong/test -o /home/chencong/output -c 1
######### Running metabolic network reconstruction with Pathway Tools #########
No /home/chencong/.ncbirc file, please fix it before using the program
(base) chencong@chencong-QiTianM428-A606:$ ls -a
. .config output ptools-local
.. Desktop output1 Public
AIC-prefs Documents output2 .python_history
anaconda3 Downloads output3 snap
.bash_history .gnupg PathoLogic-Console-Logs .sudo_as_admin_successful
.bash_logout .local .pathway-tools Templates
.bashrc .mozilla pathway-tools test
.cache Music Pictures Videos
.conda .ncbirc .profile

Non-caught error in mincom if target sbml is not ok

This might also occur when reading other sbml files along the pipeline (seeds?): has to be checked
Todo: catch the error and exit properly

m2m_analysis : ignore unproducible targets

Running m2m_analysis with a target that is not producible by the community leads to the following error:

Traceback (most recent call last):
  File "xxx/miniconda3/envs/m2m/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/logging/__init__.py", line 608, in format
    record.message = record.getMessage()
  File "xxx/miniconda3/envs/m2m/lib/python3.7/logging/__init__.py", line 369, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "xxx/miniconda3/envs/m2m/bin/m2m_analysis", line 8, in <module>
    sys.exit(main())
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/__main_analysis__.py", line 289, in main
    args.oog, new_arg_modelhost, args.level)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/__main_analysis__.py", line 303, in main_analysis_workflow
    run_analysis_workflow(*allargs)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/m2m_analysis/m2m_analysis_workflow.py", line 46, in run_analysis_workflow
    gml_output = graph_analysis(json_file_folder, target_folder_file, output_dir, taxon_file, taxonomy_level)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/m2m_analysis/solution_graph.py", line 58, in graph_analysis
    create_gml(json_paths, target_paths, output_dir, taxonomy_output_file)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/m2m_analysis/solution_graph.py", line 116, in create_gml
    logger.warning('ERROR ', dicti["still_unprod"], ' is unproducible')
Message: 'ERROR '
Arguments: (['M_CPD__45__1823_c', 'M_HISTAMINE_c', 'M_TYRAMINE_c'], ' is unproducible')
--- Logging error ---
Traceback (most recent call last):
  File "xxx/miniconda3/envs/m2m/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/logging/__init__.py", line 608, in format
    record.message = record.getMessage()
  File "xxx/miniconda3/envs/m2m/lib/python3.7/logging/__init__.py", line 369, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "xxx/miniconda3/envs/m2m/bin/m2m_analysis", line 8, in <module>
    sys.exit(main())
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/__main_analysis__.py", line 289, in main
    args.oog, new_arg_modelhost, args.level)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/__main_analysis__.py", line 303, in main_analysis_workflow
    run_analysis_workflow(*allargs)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/m2m_analysis/m2m_analysis_workflow.py", line 46, in run_analysis_workflow
    gml_output = graph_analysis(json_file_folder, target_folder_file, output_dir, taxon_file, taxonomy_level)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/m2m_analysis/solution_graph.py", line 58, in graph_analysis
    create_gml(json_paths, target_paths, output_dir, taxonomy_output_file)
  File "xxx/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/m2m_analysis/solution_graph.py", line 116, in create_gml
    logger.warning('ERROR ', dicti["still_unprod"], ' is unproducible')
Message: 'ERROR '
Arguments: (['M_CPD__45__1823_c', 'M_HISTAMINE_c', 'M_TYRAMINE_c'], ' is unproducible')

An enhancement would be to drop targets that are not producible from the analysis instead of exiting in error.
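The TypeError in the traceback comes from the way the logging call is written: the first argument of `logger.warning` is a format string, and the remaining arguments are substituted into its `%` placeholders, so passing three separate strings fails at formatting time. A sketch of a corrected call, together with the proposed filtering of unproducible targets, is shown below. The logger name and both helper names are hypothetical.

```python
import logging

logger = logging.getLogger("m2m_analysis_sketch")


def warn_unproducible(unproducible):
    """Log the unproducible targets with a proper format string."""
    logger.warning("Unproducible targets: %s", ", ".join(unproducible))


def keep_producible(targets, unproducible):
    """Drop unproducible targets instead of stopping the analysis."""
    if unproducible:
        warn_unproducible(unproducible)
    excluded = set(unproducible)
    return [target for target in targets if target not in excluded]
```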

Producibility_targets.json does not include essential and alternative symbiont list

Current versions require users to look into logs or mincom.json file in order to get a machine readable list of essential and alternative symbionts. The latter two lists should also be included in producibility_targets.json.
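As an illustration of what the two lists contain, the sketch below derives them from a set of enumerated minimal communities: essential symbionts occur in every community, alternative symbionts occur in some but not all. The input format and the output key names are assumptions for the example, not the actual layout of mincom.json or producibility_targets.json.

```python
import json


def key_species(minimal_communities):
    """Split organisms into essential and alternative symbionts."""
    communities = [set(community) for community in minimal_communities]
    if not communities:
        return {"essential_symbionts": [], "alternative_symbionts": []}
    essential = set.intersection(*communities)
    union = set.union(*communities)
    return {
        "essential_symbionts": sorted(essential),
        "alternative_symbionts": sorted(union - essential),
    }


if __name__ == "__main__":
    print(json.dumps(key_species([["a", "b"], ["a", "c"]]), indent=4))
```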

New GenBank file format

GenBank has changed the file format from .gbk to .gbff
"*_genomic.gbff.gz (Genomic GenBank format)
GenBank flat file format of the genomic sequence(s) in the assembly. This file includes both the genomic sequence and the CONTIG description (for CON records), hence, it replaces both the .gbk .gbs format files that were provided in the old genomes FTP directories."
https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#files

Recon won't run with the new file format. Suggestions?

metabolic objective

Hi, I am new to metage2metabo. It looks like a great tool.

I have a general question. Can the metabolic objective be modified to subset on only certain classes of metabolites, for example, vitamins as the cooperation potential?

Error running iscope -- Something went wrong running Menetools

Hi there,

I am trying to run Metage2metabo and I am stuck at the iscope step.

The only error I have is Something went wrong running Menetools.

I already had something similar at the pathway tool step where I had to run the python code within python for it to see what the error was but I have no knowledge of menetool.

Is there a way to get a more comprehensive error message to know what's wrong?

Thank you,

Adrien

Hi

m2m_analysis - deal with empty last row when reading taxon_id

An if statement here would prevent failures due to blank lines in the taxon_id.tsv file.
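A minimal sketch of such a check, assuming a tab-separated file with a header row followed by the species name in the first column and the taxon ID in the second (the function name and this column layout are assumptions):

```python
import csv
import io


def read_taxon_ids(handle):
    """Return {species: taxon_id}, ignoring the header and blank lines."""
    taxon_ids = {}
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        # Blank or incomplete lines (such as an empty last row) are skipped.
        if len(row) < 2 or not row[0].strip():
            continue
        taxon_ids[row[0]] = row[1]
    return taxon_ids


if __name__ == "__main__":
    sample = io.StringIO("species\ttaxon_id\nGCA_003433675\t1\n\n")
    print(read_taxon_ids(sample))
```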

Issue during installation with Python 3.8.

Due to the padmet dependency, metage2metabo will try to install scipy 1.3.0, which is incompatible with Python 3.8. This leads to an error when using pip install metage2metabo.

How to prepare inputs for metage2metabo

According to the documentation (https://metage2metabo.readthedocs.io/en/latest/input.html), the genomes are expected to be in GenBank format (.gbk or .gbff). I would like to know which software and parameters are suitable for producing these input files, because the several programs I tried reported errors. In addition, the metabolic networks must be in SBML format. I would also like to know the detailed procedure to obtain these SBML files: with CarveMe, ModelSEED or gapseq? Thank you very much!

Error with analysis graph when using taxon_id

Hi! I am running m2m with a singularity on a cluster. I get the following error when running m2m_analysis graph (below). I get the same error with other genomes too. Have you encountered this error before?

Edit: Actually I have an issue when running without using the taxon function as well. Enumeration runs fine. The html was created fine (as I've shared with you before, I was able to simplify it quite a bit by separating out individual pathways). But the svg was not created.

I think these are likely two separate issues:

maybe an issue with the taxon id version?
maybe an issue with the file path of the Oog.jar or powergraph programs with the singularity?

Error log for running analysis with --taxon:

Traceback (most recent call last):
File "/usr/local/bin/m2m_analysis", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/metage2metabo/__main_analysis__.py", line 288, in main
main_analysis_workflow(network_dir, args.targets, args.seeds, args.out, args.taxon,
File "/usr/local/lib/python3.8/dist-packages/metage2metabo/__main_analysis__.py", line 303, in main_analysis_workflow
run_analysis_workflow(*allargs)
File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/m2m_analysis_workflow.py", line 46, in run_analysis_workflow
gml_output = graph_analysis(json_file_folder, target_folder_file, output_dir, taxon_file, taxonomy_level)
File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/solution_graph.py", line 58, in graph_analysis
create_gml(json_paths, target_paths, output_dir, taxonomy_output_file)
File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/solution_graph.py", line 124, in create_gml
key_species_data[target_category]['essential_symbionts'][taxon] = [organism for organism in key_species_types if key_species_types[organism] == 'ES' and taxon_named_species[organism].split('')[0] == taxon]
File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/solution_graph.py", line 124, in
key_species_data[target_category]['essential_symbionts'][taxon] = [organism for organism in key_species_types if key_species_types[organism] == 'ES' and taxon_named_species[organism].split('')[0] == taxon]
KeyError: 'GCA_900475215'

..................
Here is the end of the analysis workflow log for graph and powergraph sections:

######### Graph of targets_tolp1_sty #########
Number of nodes: 17
Number of edges: 70
--- Graph runtime 0.03 seconds ---

######### Graph compression: targets_tolp1_sty #########
Number of powernodes: 2
Number of poweredges: 1
--- Compression runtime 0.94 seconds ---

######### PowerGraph visualization: targets_tolp1_sty #########
######### Creation of the powergraph website accessible at analysis/09analysis/html/targets_tolp1_sty #########
--- Powergraph runtime 1.05 seconds ---

m2m recon not working with gbff file inputs

Hi @ArnaudBelcour,

I am trying to use .gbff files downloaded from NCBI for metabolic reconstruction using Pathway Tools (m2m recon). However, I am obtaining the error attached. Please note that m2m recon works for me when using .gbk files as input.

Thank you.

Best regards,

Noor Alsmadi

Do not allow abbreviation in command arguments.

By default, argparse allows arguments to be abbreviated (for example, --taxon could be shortened to --tax). To avoid ambiguous or unintended argument matches, Metage2Metabo should not accept these abbreviations.
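The standard library supports this directly through the `allow_abbrev` parameter of `argparse.ArgumentParser` (available since Python 3.5). The sketch below uses a hypothetical program name and a single `--taxon` option to show the effect; it is not the actual m2m parser.

```python
import argparse


def build_parser():
    """Build a parser that rejects abbreviated long options."""
    parser = argparse.ArgumentParser(prog="m2m_sketch", allow_abbrev=False)
    parser.add_argument("--taxon")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # The full option name is accepted as usual.
    print(parser.parse_args(["--taxon", "taxon_id.tsv"]).taxon)
    # An abbreviation is no longer expanded and is left unrecognised.
    print(parser.parse_known_args(["--tax", "taxon_id.tsv"]))
```

With `parse_args`, the abbreviated form would now exit with an "unrecognized arguments" error instead of being silently matched to `--taxon`.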

Error during recon: XML not well formed

With certain .gbff files, I get the following error during the recon step: Fatal error: XML not well-formed - encountered token at illegal syntax position: 'START-TAG' following: '(:COMMENT "[if gt IE 8]>

Typically, it will be all genomes in a particular metagenomic assembly. Any advice for dealing with this? Is it the quality of annotation? Do you have recommendations for being able to screen for which annotated assemblies will run without issue?

Thanks!

Multiple errors when running test data

Hello,

I have installed Pathway Tools v25.5 and Metage2Metabo v1.5.0 both locally and with docker. In both cases when I run either m2m test or m2m workflow, I am getting the following output with error from MiSCoTo:

Uncompressing test data to /shared/test_output_directory
Launching workflow on test data
######### Running metabolic network reconstruction with Pathway Tools #########
---------- Launching mpwt ----------
|Input Check|GCA_003433675| Missing flat_files_creation.lisp; genetic-elements.dat; organism-params.dat. Inputs file created for GCA_003433675
|PathoLogic|GCA_003433675| pathway-tools -no-web-cel-overview -no-cel-overview -no-patch-download -disable-metadata-saving -nologfile -patho /shared/test_output_directory/workflow_genomes/GCA_003433675
|Flat files creation|GCA_003433675| pathway-tools -no-patch-download -disable-metadata-saving -nologfile -load /shared/test_output_directory/workflow_genomes/GCA_003433675/flat_files_creation.lisp
|Output Check|/shared/test_output_directory/workflow_genomes/GCA_003433675| 23 out of 23 dat files created.
|Moving output files|GCA_003433675| 
|Input Check|GCA_003433665| Missing flat_files_creation.lisp; genetic-elements.dat; organism-params.dat. Inputs file created for GCA_003433665
|PathoLogic|GCA_003433665| pathway-tools -no-web-cel-overview -no-cel-overview -no-patch-download -disable-metadata-saving -nologfile -patho /shared/test_output_directory/workflow_genomes/GCA_003433665
|Flat files creation|GCA_003433665| pathway-tools -no-patch-download -disable-metadata-saving -nologfile -load /shared/test_output_directory/workflow_genomes/GCA_003433665/flat_files_creation.lisp
|Output Check|/shared/test_output_directory/workflow_genomes/GCA_003433665| 23 out of 23 dat files created.
|Moving output files|GCA_003433665| 
|Output Check| 2 on 2 builds have passed!
-------------- Checking mpwt runs --------------
All runs are successful.
-------------- mpwt has finished in 449.76s! Thank you for using it. --------------
######### Creating SBML files #########
######### Stats GSMN reconstruction #########
Number of genomes: 2
Number of reactions in all GSMN: 2220
Number of compounds in all GSMN: 2250
Average reactions per GSMN: 1566.50(+/- 760.14)
Average compounds per GSMN: 1655.50(+/- 696.50)
Average genes per GSMN: 922.50(+/- 499.92)
Average pathways per GSMN: 256.50(+/- 139.30)
Percentage of reactions associated with genes: 72.03(+/- 3.73)
--- Recon runtime 455.74 seconds ---

######### Running individual metabolic scopes #########
Individual scopes for all metabolic networks available in /shared/test_output_directory/indiv_scopes/indiv_scopes.json
2 metabolic models considered.

153 metabolites in core reachable by all organisms (intersection) 

"M_P3I_c"
"M_RIBOSE__45__1P_c"
"M_CPD0__45__2472_c"
...

318 metabolites reachable by individual organisms altogether (union), among which 26 seeds (growth medium) 

"M_N__45__FORMIMINO__45__L__45__GLUTAMATE_c"
"M_P3I_c"
"M_RIBOSE__45__1P_c"
...

intersection of scope 153
union of scope 318
max metabolites in scope 314
min metabolites in scope 157
average number of metabolites in scope 235.50 (+/- 111.02)
Analysis of functional redundancy (producers of all metabolites) is computed as a dictionary in /shared/test_output_directory/indiv_scopes/rev_iscope.json and as a matrix in /shared/test_output_directory/indiv_scopes/rev_iscope.tsv.
--- Indiv scopes runtime 0.75 seconds ---

######### Creating metabolic instance for the whole community #########
Created instance in /shared/test_output_directory/community_analysis/miscoto_v47dbvv5.lp
Running whole-community metabolic scopes
/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:19:14-22: info: atom does not occur in any rule head:
  draft(D)

/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:22:14-22: info: atom does not occur in any rule head:
  draft(D)

/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:26:19-28: info: atom does not occur in any rule head:
  target(M)

/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:27:21-30: info: atom does not occur in any rule head:
  target(M)

/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:31:16-24: info: atom does not occur in any rule head:
  draft(O)

/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:42:19-28: info: atom does not occur in any rule head:
  target(M)

/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:43:21-30: info: atom does not occur in any rule head:
  target(M)

/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:48:72-80: info: atom does not occur in any rule head:
  draft(O)

/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:70:51-60: info: atom does not occur in any rule head:
  target(T)

/usr/local/lib/python3.6/dist-packages/miscoto/encodings/scopes.lp:72:51-60: info: atom does not occur in any rule head:
  target(T)

Community scopes for all metabolic networks available in /shared/test_output_directory/community_analysis/comm_scopes.json
--- Community scope runtime 0.71 seconds ---

Added value of cooperation over individual metabolism: 27 newly reachable metabolites: 

"M_UTP_c"
"M_DCDP_c"
"M_CTP_c"
...

Added-value of cooperation written in /shared/test_output_directory/community_analysis/addedvalue.json

Target file created with the addedvalue targets in: /shared/test_output_directory/community_analysis/targets.sbml
Setting 27 compounds as targets 

Running minimal community selection
/usr/local/lib/python3.6/dist-packages/miscoto/encodings/community_soup.lp
Traceback (most recent call last):
  File "/usr/local/bin/m2m", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/__main__.py", line 380, in main
    main_test(args.out, args.cpu)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/__main__.py", line 520, in main_test
    use_pwt_xml)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/__main__.py", line 389, in main_workflow
    run_workflow(*allargs)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/m2m/m2m_workflow.py", line 53, in run_workflow
    metacom_analysis(sbml_dir, out_dir, seeds, host_mn, targets_file, nb_cpu)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/m2m/m2m_workflow.py", line 114, in metacom_analysis
    mincom(instance_w_targets, out_dir)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/m2m/minimal_community.py", line 46, in mincom
    all_results = compute_mincom(instance_w_targets, miscoto_dir)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/m2m/minimal_community.py", line 114, in compute_mincom
    output_json=mincom_json_file)
  File "/usr/local/lib/python3.6/dist-packages/miscoto/miscoto_mincom.py", line 216, in run_mincom
    score = one_model[1]
TypeError: 'NoneType' object is not subscriptable

Do you have any suggestions as to what might be wrong? Thank you very much.

Ugly exit when target sbml file has empty metabolites

With a targets SBML file that contains an "empty metabolite" (a species element without any attribute), such as:

<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model>
    <listOfSpecies>
      <species/>
      <species id="M_MYO__45__INOSITOL_c" name="MYO-INOSITOL" compartment="c"/>
      <species id="M_METHYLNICOTINATE_c" name="METHYLNICOTINATE" compartment="c"/>
      <species id="M_PRO_c" name="PRO" compartment="c"/>
[...]

The file is read, but the following traceback is produced when the ASP instance is created:

Traceback (most recent call last):
  File "/hpc-home/frioux/bin/miniconda3/envs/m2m/bin/m2m", line 8, in <module>
    sys.exit(main())
  File "/hpc-home/frioux/bin/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/__main__.py", line 322, in main
    main_metacom(network_dir, args.out, args.seeds, new_arg_modelhost, args.targets)
  File "/hpc-home/frioux/bin/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/__main__.py", line 348, in main_metacom
    metacom_analysis(*allargs)
  File "/hpc-home/frioux/bin/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/m2m_workflow.py", line 88, in metacom_analysis
    instance_com, targets_cscope = cscope(sbml_dir, seeds, out_dir, targets_file, host_mn)
  File "/hpc-home/frioux/bin/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/m2m_workflow.py", line 215, in cscope
    instance_com = instance_community(sbmldir, seeds, out_dir, targets_file, host)
  File "/hpc-home/frioux/bin/miniconda3/envs/m2m/lib/python3.7/site-packages/metage2metabo/m2m_workflow.py", line 819, in instance_community
    output=outputfile)
  File "/hpc-home/frioux/bin/miniconda3/envs/m2m/lib/python3.7/site-packages/miscoto/miscoto_instance.py", line 111, in run_instance
    targets = sbml.readSBMLspecies_clyngor(targets_file, 'target')
  File "/hpc-home/frioux/bin/miniconda3/envs/m2m/lib/python3.7/site-packages/miscoto/sbml.py", line 500, in readSBMLspecies_clyngor
    all_atoms.add(Atom(speciestype, ["\""+e.attrib.get("id")+"\""]))
TypeError: can only concatenate str (not "NoneType") to str

Catching this error and exiting the program properly would solve this issue.
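
A minimal sketch of one way to handle this, assuming the targets file is parsed with the standard library xml.etree.ElementTree. The helper name read_species_ids is hypothetical and this is not the actual miscoto code; it only illustrates skipping species elements that lack an id instead of failing with a TypeError:

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger(__name__)


def read_species_ids(sbml_text):
    """Return the ids of all <species> elements, skipping those without an id.

    Hypothetical helper for illustration; not part of miscoto or m2m.
    """
    root = ET.fromstring(sbml_text)
    species_ids = []
    for element in root.iter():
        # Strip the XML namespace, e.g. '{http://www.sbml.org/sbml/level2}species'.
        tag = element.tag.rsplit('}', 1)[-1]
        if tag != 'species':
            continue
        species_id = element.attrib.get('id')
        if species_id is None:
            logger.warning('Ignoring a species element without an id attribute.')
            continue
        species_ids.append(species_id)
    return species_ids


# Reduced version of the targets file shown above.
EXAMPLE_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model>
    <listOfSpecies>
      <species/>
      <species id="M_PRO_c" name="PRO" compartment="c"/>
    </listOfSpecies>
  </model>
</sbml>"""

print(read_species_ids(EXAMPLE_SBML))
```

An alternative design would be to stop with a clear error message naming the file, which may be preferable if an empty species is considered a sign of a corrupted targets file.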

Error for teste with PathoLogic subprocess, return code: 255

Hi,

I'm trying to run your pipeline but I got an error. Do you have any idea how I can solve this? I'm leaving the input and outputs here. Thank you very much.
m2m_test.zip
######### Running metabolic network reconstruction with Pathway Tools #########
Check and delete unfinished builds of Pathway Tools.

Checking inputs for teste: missing flat_files_creation.lisp; genetic-elements.dat; organism-params.dat. Inputs file created for teste.
----------End of creation of input data from Genbank/GFF/PF: 0.41s----------
~~~~~~~~~~Inference on the data~~~~~~~~~~
pathway-tools -no-web-cel-overview -no-cel-overview -no-patch-download -disable-metadata-saving -nologfile -patho gbk/teste
!!!!!!!!!!!!!!!!!----------------------------------------!!!!!!!!!!!!!!!!!
Error for teste with PathoLogic subprocess, return code: 255
=== Error in pathologic.log for gbk/teste===
	Error from the pathologic.log file: gbk/teste/pathologic.log
	batch-pathologic: A fatal error occurred for gbk/teste/ version 1.0.

	See pathologic log file gbk/teste/pathologic.log for more details.

	03-Mar-2021  00:31:11 Fatal error: file #P"gbk/teste/migs.dat" does not exist: No such file or directory [errno=2].

	Evaluation stack:

	

	 ->(TPL::ZOOM-COMMAND :FROM-READ-EVAL-PRINT-LOOP NIL ...)

	   (SYS::..RUNTIME-OPERATION "applyn" :UNKNOWN-ARGS)

	   (TPL:DO-COMMAND "zoom" :FROM-READ-EVAL-PRINT-LOOP ...)

=== Pathway Tools log ===
	;; Optimization settings: safety 3, space 1, speed 1, debug 3.

	;; For a complete description of all compiler switches given the

	;; current optimization settings evaluate (EXPLAIN-COMPILER-SETTINGS).

	

	*** This Pathway Tools executable built on Thu Jan 14, 2021 at 17:34:28. ***

	[oot::acache-connect: Note that another Ocelot KB named NCBI-TAXONOMY already exists (#<OCELOT-FILE-KB

	                                                                                        NCBI-TAXONOMY NIL

	                                                                                        @

	                                                                                        #x10044b029d2>); creating new KB named NCBI-TAXONOMY]

	[Opened acache database /shared/software/apps/pathway-tools/24.5/aic-export/pathway-tools/ocelot-acache/, which contains 1 ocelot KBs]

	[Reading Pathway Tools init file "/raeslab/scratch/pathway-tools/ptools-local/ptools-init.dat" ]

	[Scanning PGDB directories in   /raeslab/scratch/pathway-tools/ptools-local/pgdbs/user/   3 total PGDBs have now been found]

	[Scanning PGDB directories in   /raeslab/scratch/pathway-tools/ptools-local/pgdbs/registry/   3 total PGDBs have now been found]

	[Loading Ocelot KB from /shared/software/apps/pathway-tools/24.5/aic-export/pgdbs/biocyc/PGDB-METADATA.ocelot

	 KB name=PGDB-METADATA, format=V1-SEXPR

	Warning: Skipping load of frame ECOLI-0 which already exists in PGDB-METADATA

	Warning: Skipping load of frame META-NIL which already exists in PGDB-METADATA

	 0 frames loaded]

	[Scanning PGDB directories in   /shared/software/apps/pathway-tools/24.5/aic-export/pgdbs/biocyc/   3 total PGDBs have now been found]

	====== SCIP, package: #<The SCIP

	                        package>, #+-scip: SCIP, (fboundp 'load-scip-lib-at-execution-time): #<Function LOAD-SCIP-LIB-AT-EXECUTION-TIME>.

	Loading SCIP Library at Start Up (Runtime?: PTOOLS-RUNTIME)...

	Loading scip lib from scip-interface: /shared/software/apps/pathway-tools/24.5/aic-export/pathway-tools/ptools/24.5/exe/libScipAll.so

	

	        ::::::::  Begin Batch PathoLogic for gbk/teste/  ::::::::

	

	[Redirecting standard-output and error-output to gbk/teste/pathologic.log]

	20 40 60 80 100 120 140 20 40 60 80 100 Copying file gbk/teste/genetic-elements.dat

	    to       /raeslab/scratch/pathway-tools/ptools-local/pgdbs/user/testecyc/1.0/input/genetic-elements.dat

	Copying file /raeslab/scratch/lucmac/bin/emapper_to_gbk-0.0.7/m2m_test/gbk/teste/teste.gbk

	    to       /raeslab/scratch/pathway-tools/ptools-local/pgdbs/user/testecyc/1.0/input/teste.gbk

	[Indexed TESTEBASE class People of 21 frames yielding hash table of 53 entries]

	[Indexed TESTEBASE class Growth-Media of 1 frames yielding hash table of 2 entries]

	[Indexed TESTEBASE class Extragenic-Sites of 3 frames yielding hash table of 4 entries]

	[Indexed TESTEBASE class Organisms of 22 frames yielding hash table of 101 entries]

	[Indexed TESTEBASE class Transcription-Units of 1 frames yielding hash table of 1 entries]

	[Indexed TESTEBASE class Gene-Ontology-Terms of 3 frames yielding hash table of 14 entries]

	[Indexed TESTEBASE class DNAs of 7 frames yielding hash table of 11 entries]

	[Indexed TESTEBASE class Reactions of 40 frames yielding hash table of 51 entries]

	[Indexed TESTEBASE class Pathways of 672 frames yielding hash table of 1145 entries]

	[Indexed TESTEBASE class Compounds of 4662 frames yielding hash table of 12745 entries]

	[Indexed TESTEBASE class Polynucleotides of 281 frames yielding hash table of 637 entries]

	[Indexed TESTEBASE class Proteins of 168 frames yielding hash table of 451 entries]

	[Indexed TESTEBASE class Enzymatic-Reactions of 1 frames yielding hash table of 0 entries]

	[Indexed TESTEBASE class RNAs of 248 frames yielding hash table of 572 entries]

	[Indexed TESTEBASE class All-Genes of 10 frames yielding hash table of 12 entries]

	100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000 4100 4200 4300 4400 4500 4600 4700 4800 4900 5000 5100 5200 5300 5400 5500 5600 5700 5800 5900 6000 6100 6200 6300 6400 6500 6600 6700 6800 6900 7000 7100 7200 7300 7400 7500 7600 7700 7800 7900 8000 8100 8200 8300 8400 8500 8600 8700 8800 8900 9000 9100 9200 9300 9400 9500 9600 9700 9800 9900 10000 10100 10200 10300 10400 10500 10600 10700 10800 10900 11000 11100 11200 11300 11400 11500 11600 11700 
!!!!!!!!!!!!!!!!!----------------------------------------!!!!!!!!!!!!!!!!!
~~~~~~~~~~Check inference~~~~~~~~~~
No log directory, it will be created.
WARNING: 1 build has failed! See the log for more information.

"Can't connect to X11 window server using..." m2m_analysis

The following Java error occurred during a run of m2m_analysis on an HPC cluster:

<II> Create SVG ... Exception in thread "main" java.awt.AWTError: Can't connect to X11 window server using ':0' as the value of the DISPLAY variable.
	at sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
	at sun.awt.X11GraphicsEnvironment.access$200(X11GraphicsEnvironment.java:65)
	at sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:115)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:74)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at java.awt.GraphicsEnvironment.createGE(GraphicsEnvironment.java:103)
	at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:82)
	at java.awt.image.BufferedImage.createGraphics(BufferedImage.java:1181)
	at org.apache.batik.svggen.SVGGraphics2D.<init>(Unknown Source)
	at org.apache.batik.svggen.SVGGraphics2D.<init>(Unknown Source)
	at org.mattlab.eaglevista.ui.GraphImageFactory.getSVGGraphicsOfGraph(GraphImageFactory.java:157)
	at org.mattlab.eaglevista.ui.GraphImageFactory.createImage(GraphImageFactory.java:80)
	at org.mattlab.eaglevista.ui.EvCommandLineInterpreter.runImpl(EvCommandLineInterpreter.java:466)
	at org.mattlab.eaglevista.ui.EvCommandLineInterpreter.main(EvCommandLineInterpreter.java:81)

It prevents the powergraph Java extension from creating the SVG file of the powergraph, and the program exits with an error.
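
One possible workaround, sketched under the assumption that the Java code involved can run without a display: the standard JVM system property java.awt.headless=true asks AWT not to connect to an X11 server. The helper names and the jar path below are hypothetical, and whether the powergraph jar renders correctly in headless mode has not been verified here.

```python
import subprocess


def headless_java_command(jar_path, jar_args):
    """Build a java command line that asks AWT to run in headless mode."""
    return ['java', '-Djava.awt.headless=true', '-jar', jar_path, *jar_args]


def run_headless(jar_path, jar_args):
    """Run the jar in headless mode and raise if java exits with an error."""
    return subprocess.run(headless_java_command(jar_path, jar_args), check=True)


# Hypothetical jar path, shown only to illustrate the resulting command.
print(headless_java_command('path/to/tool.jar', []))
```

If the jar does need a display, running it under a virtual X server on the cluster would be another option to investigate.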

A better focus on the host metabolism when a host is provided

Currently, providing a host will not impact the first part of m2m which is the calculation of individual scopes for bacterial members of the community.

We want the host to be considered with each individual symbiont already at that step. This would lead to calculating what is producible by the symbiont in a community consisting of the host + the symbiont. More precisely, the activated metabolism of the host in the medium will provide new metabolites that can be used by the symbiont. Using miscoto focus should do the job.

The community scope will be calculated as it is currently, considering the added value of the host metabolism as well as the other symbionts' metabolism.

A specific focus on the host gain with its community (what it produces with the community - what it produces alone) should be performed too.
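
The host gain described above is a set difference. A minimal sketch with made-up metabolite identifiers (the variable names and values are illustrative assumptions, not m2m output):

```python
# Hypothetical scopes: metabolites the host can produce in each setting.
host_scope_alone = {"M_PYRUVATE_c", "M_ACET_c"}
host_scope_with_community = {"M_PYRUVATE_c", "M_ACET_c", "M_BUTYRIC_ACID_c"}

# Host gain = what it produces with the community minus what it produces alone.
host_gain = host_scope_with_community - host_scope_alone
print(sorted(host_gain))
```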

Should the targets to be considered be the host metabolites that are producible only in community?

Install metage2metabo problem

Installing collected packages: immutables, contextvars, sniffio, idna, dataclasses, six, rfc3986, mpmath, h11, certifi, anyio, sympy, swiglpk, ruamel.yaml.clib, pytz, python-dateutil, pygments, numpy, httpcore, commonmark, charset-normalizer, async-generator, simplejson, ruamel.yaml, rich, python-libsbml, pyfaidx, pydantic, pandas, optlang, importlib-resources, httpx, future, diskcache, depinfo, argh, argcomplete, appdirs, lxml, gffutils, cobra, clyngor-with-clingo, chardet, biopython, padmet, mpwt, miscoto, menetools, Metage2Metabo
Running setup.py install for python-libsbml: started
Running setup.py install for python-libsbml: finished with status 'error'
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/setup.py'"'"'; file='"'"'/tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-pmi3rl57/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6/python-libsbml
cwd: /tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/
Complete output (26 lines):
Using libSBML from: /tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/libsbml_source
Using VERSION.txt: /tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/libsbml_source/VERSION.txt
Creating: python-libsbml
Version is: 5.20.0
building for python: 3.6.9 (default, Mar 10 2023, 16:46:00)
[GCC 8.4.0]
running install
/usr/local/lib/python3.6/dist-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
running build
running build_py
package init file 'libsbml/__init__.py' not found (or not a regular file)
running build_ext
name: _libsbml.cpython-36m-x86_64-linux-gnu.so
build temp: build/temp.linux-x86_64-3.6
extension name: _libsbml
extension dir: build/lib.linux-x86_64-3.6/libsbml/_libsbml.cpython-36m-x86_64-linux-gnu.so
target_dir_path: /tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/build/lib.linux-x86_64-3.6/libsbml
target_lib_path: /tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/build/lib.linux-x86_64-3.6/libsbml/_libsbml.cpython-36m-x86_64-linux-gnu.so
suffix: linux-x86_64-3.6
cwd: /tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6
name: _libsbml.cpython-36m-x86_64-linux-gnu.so, tmp: build/temp.linux-x86_64-3.6
compiling dependencies
cmake /tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/libsbml_dependencies -DCMAKE_BUILD_TYPE=Release -DWITH_STATIC_RUNTIME=ON -DCMAKE_INSTALL_PREFIX=/tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/install_dependencies_linux-x86_64 -DWITH_BZIP2=ON -DWITH_CHECK=OFF -DWITH_EXPAT=ON -DWITH_XERCES=OFF -DWITH_ICONV=OFF -DWITH_LIBXML=OFF
unable to execute 'cmake': No such file or directory
error: command 'cmake' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/setup.py'"'"'; file='"'"'/tmp/pip-install-uduyfhq3/python-libsbml_4ef9866f78b6401fafd40149055eeff6/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-pmi3rl57/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6/python-libsbml Check the logs for full command output.
The command '/bin/sh -c curl https://bootstrap.pypa.io/pip/3.6/get-pip.py | python3; pip install graphviz ete3 networkx; pip install powergrasp; pip install Metage2Metabo' returned a non-zero code: 1

Hi, I'm trying to run your pipeline but I got an error. Do you have any idea how I can solve this? I'm leaving the input and outputs here. Thank you very much.

import logging
logging.basicConfig()
from metage2metabo.m2m import individual_scope
individual_scope.iscope('community','seeds.sbml','output_folder')
Invalid syntax in SBML file: community\OrgA.sbml
---------------Something went wrong running Menetools on OrgA---------------
Invalid syntax in SBML file: community\OrgB.sbml
---------------Something went wrong running Menetools on OrgB---------------
Invalid syntax in SBML file: community\OrgC.sbml
---------------Something went wrong running Menetools on OrgC---------------
CRITICAL:metage2metabo.m2m.individual_scope:------------An error occurred during M2M run of Menetools, M2M will stop-------------

@ArnaudBelcour I'm sorry to bother you, but I have tried many approaches and could not solve this. I would appreciate your help.

D:\Metage2Metabo\method_tutorial>set PYTHONIOENCODING=UTF-8 python3

D:\Metage2Metabo\method_tutorial>py -3
Fatal Python error: init_stdio_encoding: failed to get the Python codec name of the stdio encoding
Python runtime state: core initialized
LookupError: unknown encoding: UTF-8 python3

Current thread 0x00001474 (most recent call first):

D:\Metage2Metabo\method_tutorial>

Sanitize use of tarfile extractall.

Following PR #44, we should implement a function to check that only files in the output folder are modified during the tar extraction.

It is not a critical issue in our case (as the tarfile used has been created by us and contains only metabolic networks), but it is useful to have this implemented.
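
A minimal sketch of such a check, assuming the archive is a regular tar file and that links inside the archive can simply be refused. The function name safe_extractall is hypothetical and this is not the implementation used in m2m:

```python
import os
import tarfile


def safe_extractall(tar_path, output_dir):
    """Extract tar_path into output_dir, refusing members that would escape it."""
    output_root = os.path.realpath(output_dir)
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(output_root, member.name))
            # commonpath equals output_root only when target lies inside it.
            if os.path.commonpath([output_root, target]) != output_root:
                raise ValueError(f'Unsafe path in archive: {member.name}')
            # Links could point outside the output folder; reject them here.
            if member.issym() or member.islnk():
                raise ValueError(f'Link not allowed in archive: {member.name}')
        tar.extractall(output_root)
```

All members are checked before anything is written, so a bad archive leaves the output folder untouched. Recent Python versions also provide extraction filters in the tarfile module (for example filter='data'), which may be a simpler option where they are available.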

Powergraph labels

Hello, me again,

I have a couple of questions about the m2m_analysis workflow.

Would it be possible to change the taxonomy level of the stat summary steps? The class level is not really informative for what I am looking at, and I was wondering whether it could be changed to a lower level.
I was looking at the powergraph output from the m2m pipeline, and it seems that at some point my bacteria names were replaced by their taxonomies, but I can't find how to link the changed names back to the originals. Is there a way to keep the original names in the figure?

Example below:

UnicodeDecodeError when running m2m_analysis with Singularity

The following error was obtained when running m2m_analysis in a Singularity container:

######### Creation of the powergraph website accessible at results_m2m_analysis_scfa/html/metabolic_targets_scfa #########
Traceback (most recent call last):
  File "/usr/local/bin/m2m_analysis", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/__main_analysis__.py", line 289, in main
    args.oog, new_arg_modelhost, args.level)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/__main_analysis__.py", line 303, in main_analysis_workflow
    run_analysis_workflow(*allargs)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/m2m_analysis/m2m_analysis_workflow.py", line 48, in run_analysis_workflow
    powergraph_analysis(gml_output, output_dir, oog_jar, taxon_file, taxonomy_level)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/m2m_analysis/graph_compression.py", line 174, in powergraph_analysis
    merge_html_css_js(html_target +'_taxon', output_html_merged)
  File "/usr/local/lib/python3.6/dist-packages/metage2metabo/m2m_analysis/graph_compression.py", line 473, in merge_html_css_js
    cytoscape_min_js_str =  input_cytoscape_min_js.read()
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 234868: ordinal not in range(128)
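
The traceback shows the file being decoded with the ascii codec, which is what Python 3.6 falls back to when the container's locale is not a UTF-8 one. A minimal sketch of a fix on the reading side, assuming the JavaScript file is UTF-8 encoded; the helper name read_text_utf8 is hypothetical and not taken from the m2m code:

```python
def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the locale's default encoding."""
    with open(path, 'r', encoding='utf-8') as handle:
        return handle.read()
```

Setting a UTF-8 locale inside the container would likely avoid the error as well, but passing the encoding explicitly does not depend on how the container is configured.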

[Question] Where can I find seed files?

I'm trying to test the m2m workflow, but it says I need a seed file. Where can I find these? Is there a general one I can use, or do they all have to be domain-specific? For example, can I use one for human oral microbiomes and then use the same one for marine microbiomes? If not, where can I find different ones?

mpwt could not find the version of Pathway Tools

I have this error message:

######### Running metabolic network reconstruction with Pathway Tools #########
Remove genomes temporary datas.
mpwt could not find the version of Pathway Tools. It is possibly an issue with the installation of Pathway Tools (maybe it is not in the PATH). Or it can be due to a change in the output of pathway-tools -id command.

I have followed all the necessary steps and added Pathway Tools to the PATH.

Can you assist?
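
A quick check that may help narrow this down, sketched with the standard library only: it tests whether an executable named pathway-tools is visible on the PATH of the Python process that would launch mpwt. The helper name find_executable is hypothetical.

```python
import shutil


def find_executable(name='pathway-tools'):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


path = find_executable()
if path is None:
    print('pathway-tools was not found on PATH for this Python process.')
else:
    print(f'pathway-tools found at: {path}')
```

If the executable is found, running the pathway-tools -id command mentioned in the error message by hand should show whether its output is what mpwt expects.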