jotech / gapseq

Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks

License: GNU General Public License v3.0

Shell 18.49% Jupyter Notebook 10.43% R 46.47% Python 3.07% Perl 2.10% HTML 19.44%

cobra curation gap-filling metabolic-models metabolic-pathways metabolic-reconstruction r

gapseq's Issues

gapseq_quickstart missing

Dear All,
I have noticed that the script gapseq_quickstart is missing from the main folder of gapseq. Has it been removed intentionally?
Kind Regards,
Georgios

running GapSeq with customized pathways

Dear GapSeq developers,
Thanks for developing this wonderful tool.
I am trying to use GapSeq to see the occurrence of a few customized pathways in my genomes.

To do this, I removed kegg_pwy.tbl, meta_pwy.tbl and seed_pwy.tbl from the dat folder, made up my own pathways according to the pwy file format and added them to custom_pwy.tbl.

One thing I'd like to know is what the last four columns in the pwy files refer to (e.g. 13 13 FALSE TRUE). Also, can I simply leave the "taxrange" column blank if I am not sure what to put there? Do you have any documentation describing the pwy files?

Thanks in advance,
Weizhi

definition of model variable

Dear All,

I would like to inform you that if someone uses a fasta file without the .fna file extension but with the .fasta extension, the gapseq_quickstart.sh breaks when it comes to the following line because of the model variable.

gapseq/gapseq_quickstart.sh

Line 14 in 0675eee

 Rscript $dir/src/generate_GSdraft.R -r $model-all-Reactions.tbl -t "$model-Transporter.lst" -c $fasta -b 200 

Therefore, please revise the following line accordingly (e.g. with a stop function)

gapseq/gapseq_quickstart.sh

Line 4 in 0675eee

model="${fasta/.fna*/}"

Regards,
Georgios
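A possible way to make the model name independent of the extension is sketched below. This is only an illustration, not the code used in gapseq: the helper name model_name and the list of accepted extensions (.fna, .fasta, .fa, optionally followed by .gz) are assumptions, and unlike the original line it also drops the directory part of the path.

```shell
# Hypothetical helper (not part of gapseq): derive a model name from a
# genome file name, whatever common FASTA extension it carries.
model_name() {
    name=$(basename "$1")
    name=${name%.gz}                 # drop an optional .gz suffix first
    case "$name" in
        *.fna|*.fasta|*.fa) name=${name%.*} ;;
        *) echo "Unsupported FASTA extension: $1" >&2; return 1 ;;
    esac
    printf '%s\n' "$name"
}

model_name NZ_CP015698_1.fasta   # prints NZ_CP015698_1
model_name toy/myb71.fna.gz      # prints myb71
```

Stopping with an error on an unknown extension matches the "stop function" idea above, so the later Rscript call never sees a wrong prefix.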

Pipeline error BlastHSPMapperParamsNew

Dear All,

I am working with the GapSeq version fe42620 and I am facing the following issue.
My input is
~/gapseq/./gapseq_quickstart.sh NZ_CP015698_1.fasta ~/gapseq/dat/media/MM_glu.csv
The output is:

....
No transporter found for compounds (existing transporter/exchanges should be removed?):
acetoin
butyrate
cholesterol
cysteine
d-lactate
fucose
gluconate
glycogen
hydrogen
hydrogen sulfide
inosine
melibiose
n-acetyl-d-glucosamine
n-acetylgalactosamine
propionate
pyruvate
raffinose
raffinose
rhamnose
ribitol
salicin
sucrose
tryptophan
tyrosine
uridine
index file NZ_CP015698_1.fasta.tmp.fai not found, generating...
blastn: symbol lookup error: /usr/lib/ncbi-blast+/libxblast.so: undefined symbol: BlastHSPMapperParamsNew

Predicted gram staining:  ambiguous 
Error in build_draft_model_from_blast_results(blast.res = blast.res, transporter.res = transporter.res,  : 
  ERROR: Gram-staining prediction failed or ambiguous result. Please check whether genome sequence contains 16S rRNA gene(s).
No traceback available 
Error in saveRDS(mod$mod, file = paste0(model.name, ".RDS")) : 
  object 'mod' not found
2: stop("ERROR: Gram-staining prediction failed or ambiguous result. Please check whether genome sequence contains 16S rRNA gene(s).")
1: build_draft_model_from_blast_results(blast.res = blast.res, transporter.res = transporter.res, 
       gram = gram, model.name = model.name, genome.seq = genome.seq, 
       high.evi.rxn.BS = high.evi.rxn.BS, script.dir = script.dir)
Error in saveRDS(mod$cand.rxns, file = paste0(model.name, "-rxnWeights.RDS")) : 
  object 'mod' not found
1: saveRDS(mod$mod, file = paste0(model.name, ".RDS"))
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file 'NZ_CP015698_1-rxnWeights.RDS', probable reason 'No such file or directory'
Execution halted

I would be grateful if you had a look at that. Just to inform you that the fasta file worked with a previous commit.

Kind Regards,
@maringos

Update_sequence.sh

The script is missing $seqpath/rxn on line 16

Gapseq on multiple files

Hi Johannes and Silvio,

It is beautiful to see how gapseq evolves!

I have an easy noob question for you, probably not related to gapseq, but rather to Linux. How can I run gapseq on multiple genomes at once?

I tried something like the following, but it did not work:

./gapseq find -e 5.4.99.17 {1}:::/MarReF_Genomes/*.fa

Best,
Martin
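The `{1}` and `:::` pieces in the attempt above look like GNU parallel syntax, which only takes effect when the command is passed to parallel and the tokens are separated by spaces. Without extra tools, a plain shell loop does the same job one genome at a time. A minimal sketch, assuming the genomes end in .fa; the wrapper name for_each_genome is made up for this example:

```shell
# Hypothetical wrapper: run one command once per genome file in a directory.
# Usage: for_each_genome DIR COMMAND [ARGS...]
for_each_genome() {
    dir=$1
    shift
    for genome in "$dir"/*.fa; do
        [ -e "$genome" ] || continue   # no *.fa files: the pattern stays literal
        "$@" "$genome"
    done
}

# Dry run: echo only shows the commands that would be executed.
for_each_genome /MarReF_Genomes echo ./gapseq find -e 5.4.99.17

# With GNU parallel installed, the equivalent would be:
#   parallel ./gapseq find -e 5.4.99.17 {1} ::: /MarReF_Genomes/*.fa
```

Dropping the echo runs the real command for every file.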

Production of methane in bacteria

Hi,
I am using gapseq (1.1 f9e1ce6) and checking some well-known bacterial species (Clostridium ultunense, Moorella thermoacetica, Tepidanaerobacter acetatoxydans) that have a reductive acetyl coenzyme A pathway I (homoacetogenic bacteria) for conversion of acetate to CO2 (reversible).
In the list of pathways there are no enzymes related to methane production, however in the output of gapseq after the gapfilling one of the main products is methane.
Additionally, the pathway is present in only one of the species tested.
How is this possible? To me this sounds very strange.
I have attached one model just in case you would like to have a check
Moorella_thermoacetica_ATCC39073.xml.gz
Thanks in advance for your help

gapseq transporter error

Hi,
I think the transporter file should be called by $dir/src/transporter.sh -b 200 $file

gapseq/gapseq

Line 78 in 3cb64e7

$dir/transporter.sh -b 200 $file

Regards,
Georgios
Thanks for your rapid support beforehand :)

Include in-vitro data in the pipeline

Dear All,

I ran some simulations with gapseq models on a specific minimal medium. When my collaborators tested the bacteria on this medium, some failed to grow and others grew. Would it be possible to include this growth/no-growth information in the model creation pipeline?

Kind Regards,
Georgios

awk: illegal primary in regular expression +-NEOMENTHOL-DEHYDROGENASE-RXN at -NEOMENTHOL-DEHYDROGENASE-RXN

Hi,

I am no expert in the field, merely a programmer who is using this package to get things done. I have already run gapseq doall on 10 different genome files, and each time I ran into the following warning / error:

awk: illegal primary in regular expression +-NEOMENTHOL-DEHYDROGENASE-RXN at -NEOMENTHOL-DEHYDROGENASE-RXN
input record number 1, file
source line number 1

Should I worry about this? The script finishes, but I don't understand if I have missed data due to this problem. Thanks.
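The message suggests that a reaction ID starting with "+" is being used as a regular expression inside awk, where a leading "+" has nothing to repeat. Whether any hits were lost depends on where that call sits in the script, which cannot be told from the message alone. As an illustration of the general workaround (not the actual gapseq code), a literal comparison with index() avoids the problem; the helper name find_literal and the two-column demo file are assumptions:

```shell
# Hypothetical helper: print lines of FILE that contain ID as literal text.
# index() does a plain substring search, so "+" is not treated as a
# regular-expression operator (unlike `$0 ~ id`).
find_literal() {
    awk -v id="$1" 'index($0, id) > 0' "$2"
}

# Made-up demo data.
printf '%s\n' 'PWY-A +-NEOMENTHOL-DEHYDROGENASE-RXN' 'PWY-B OTHER-RXN' > rxn_demo.tbl
find_literal '+-NEOMENTHOL-DEHYDROGENASE-RXN' rxn_demo.tbl
rm rxn_demo.tbl
```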

Cspr

Readme for toy files outdated

Hi,
I would like to comment that the readme info for toy files is outdated.
e.g ./gapseq doall dat/myb71.fna
Kind Regards,
@maringos

Enzymes with two or more subunits in Bacteria but only one in Archaea

Hello all,

currently I am trying to use gapseq to analyse the completeness of a specific pathway that happens both in Bacteria and Archaea. I am able to add the protein sequences that I am interested in for both Bacteria and Archaea in the EC.fasta file, however when I run gapseq, it misses the Archaea sequences as it is expecting that protein to have subunits instead of a single hit.

Is there a way around this?

thank you in advance,
Alessandro

Query subunit fasta issue

I encountered an issue when predicting pathways with polymer protein complexes.

Example command that reproduces the error:

../../gapseq/./gapseq.sh -b 200 -p dichloroethane GCF_000005845.2_ASM584v2_genomic.fna.gz

Please note that you can add any fasta file to reproduce the error. The output of the command is stated below.

Number of pathways to be considered: 1

1/1: Checking for pathway |12DICHLORETHDEG-PWY| 1,2-dichloroethane degradation
(Degradation,CHLORINATED-COMPOUNDS-DEG)
	DHLAXANAU-RXN 1,2-dichloroethane dehalogenase 3.8.1.5
		/home/swaschina/workspace-kiel/2018/gapseq/dat/seq/Bacteria/unipac90/3.8.1.5.fasta
		NO blast hit
	MOXXANAU-RXN 2-chloroethanol dehydrogenase 1.1.2.7
		/home/swaschina/workspace-kiel/2018/gapseq/dat/seq/Bacteria/unipac90/1.1.2.7.fasta
ls: cannot access 'query_subunit.part-*.fasta': No such file or directory
rm: cannot remove 'query_subunit.part-*.fasta*': No such file or directory
		total subunits found: 0 / 3
		NO hit because of missing subunits
	ALDXANAU-RXN chloroacetaldehyde dehydrogenase 1.2.1.4
		/home/swaschina/workspace-kiel/2018/gapseq/dat/seq/Bacteria/unipac90/1.2.1.4.fasta
		Blast hit (5x)
			bit=350 id=43.142 cov=99 hit=UniRef90_Q8GAK7
			bit=252 id=34.879 cov=99 hit=UniRef90_Q8GAK7
			bit=224 id=32.969 cov=99 hit=UniRef90_Q8GAK7
		Candidate reaction for import: 6
	DHLBXANAU-RXN chloroacetate dehalogenase 3.8.1.3
		/home/swaschina/workspace-kiel/2018/gapseq/dat/seq/Bacteria/unipac90/3.8.1.3.fasta
		NO good blast hit
			(best one: bit=54.3 id=25.225 cov=91)
Pathway completeness: 1/4 (25%)
Hits with candidate reactions in database: 1/4
Key reactions: 0/1

Pathways found:

Candidate reactions found: 6 

%CPU %MEM CMD
 0.7  0.0 /bin/bash ../../gapseq/./gapseq.sh -b 200 -p dichloroethane GCF_000005845.2_ASM584v2_genomic.fna.gz
Running time: 6 s.

The error messages are:

ls: cannot access 'query_subunit.part-*.fasta': No such file or directory
rm: cannot remove 'query_subunit.part-*.fasta*': No such file or directory
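Both messages appear when the pattern query_subunit.part-*.fasta matches no file, so the shell hands the unexpanded pattern to ls and rm. The sketch below only shows how such a clean-up step can tolerate an empty match; it does not explain why no split files were written for 1.1.2.7. The function name remove_parts is made up for this example:

```shell
# Hypothetical clean-up helper: remove split query files only if any exist.
remove_parts() {
    for part in query_subunit.part-*.fasta; do
        [ -e "$part" ] || continue   # unmatched pattern stays literal; skip it
        rm -- "$part"
    done
}

remove_parts   # silent, with exit status 0, even when nothing matches
```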

duplicated pathway ID and permission denied warning

Hi,
I am running a gapseq session with the doall command. It only returns this message (nothing else) and I don't know if this is normal.

Mon 24 Feb 2020 11:26:15 AM CET
Duplicated pathway IDs found: |NADPHOS-DEPHOS-PWY| will only use /home/minerva/gapseq/src/../dat/custom_pwy.tbl
\|NADPHOS-DEPHOS-PWY\|

Kind Regards,
Georgios Marinos

gapseq's sbml in memote

Hi, jotech, thanks for providing that media file. It seems that the composition of my medium was wrong; now the growth rate is 0.056.

However, the generated XML file is now reported as valid by the SBML validator, but with the following warning:

Warning: As a principle of best modeling practice, the <species> should set an initial value (amount or concentration) rather than be left undefined. Doing so improves the portability of models between different simulation and analysis systems, and helps make it easier to detect potential errors in models. The <species> with the id 'M_cpd00001_c0' does not have an 'initialConcentration' or 'initialAmount' attribute, nor is its initial value set by an <initialAssignment> or <assignmentRule>.

Also, when I try to run it in memote, it shows the following error, which was not shown previously:

critical: The model could not be loaded due to the following SBML errors.
error: Something went wrong reading the SBML model. Most likely the SBML model is not valid. Please check that your model is valid using the cobra.io.sbml.validate_sbml_model function or via the online validator at http://sbml.org/validator .
error: (model, errors) = validate_sbml_model(filename)
error: If the model is valid and cannot be read please open an issue at https://github.com/opencobra/cobrapy/issues .
error: Line 2, Column 0 - #1013: Invalid or undefined XML namespace prefix.
error: - Category: XML content, Severity: 2

The error occurs even when I do a fresh install of memote in a new virtual environment.

I also checked with the cobra validation command cobra.io.sbml.validate_sbml_model('TelongatusBP-1.xml')
and it gives the following error:

(None, {'SBML_FATAL': [], 'SBML_ERROR': ['E0 (Error): XML content (core, L2); Bad XML prefix; Invalid or undefined XML namespace prefix.\n'], 'SBML_SCHEMA_ERROR': [], 'SBML_WARNING': [], 'COBRA_FATAL': [], 'COBRA_ERROR': ['No SBML model detected in file.'], 'COBRA_WARNING': [], 'COBRA_CHECK': []})

Originally posted by @hites77 in #41 (comment)

Multimer Enzymes not predicted

GapSeq predicts that the enzyme with EC 1.6.5.3 should consist of 12 different subunits. But only 6 are found in E. coli, where all should be present.

The command for reproduction of this issue:

gapseq.sh -p PWY0-1335 -b 200 GCF_000005845.2_ASM584v2_genomic.fna.gz

This genome of E. coli can be downloaded here

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz

Addition of a compound in the network

Dear All,

I need to check the production or consumption of a molecule (EX_cpd00281_e0) in some models. Unfortunately, it is not included in the networks. How can I include it (and the relevant reactions) in the model?
I have tried to add it to the predefined medium during model construction, but this leads only to consumption, as expected.

Kind Regards,
Georgios

BRENDA E.C. alias (transfer from 1 EC to 2 or more)

I came across an issue where an EC number is wrongly assigned to reactions in the seed DB, due to a specific structure in the brenda_ec.csv table.

Example: Searching for the EC 4.2.99.20 with the command
./gapseq.sh -e 4.2.99.20 -b 200 -i 35 bsub.fasta
results in hits that also correspond to reactions with EC 2.2.1.9

This is due to the following line in the brenda_ec.csv
2.5.1.64;2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase;transferred to EC 2.2.1.9 and EC 4.2.99.20.
Apparently, the EC number 2.5.1.64 was deleted by the EC Commission and replaced by the two individual sub-reactions EC 2.2.1.9 and EC 4.2.99.20. B. subtilis, for instance, has two enzymes, one for EC 2.2.1.9 and one for EC 4.2.99.20. However, each of the two genes is assigned to both new reactions. This causes false-negative predictions for gene essentiality.

@jotech : My suggestion would be to omit all lines in the brenda_ec.csv where there are two or more EC numbers in the 3rd column.

Cheers,
Silvio
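The suggested filter could look roughly like the sketch below. It assumes the table is semicolon-separated, as in the line quoted above, and that EC numbers in the third column are written as "EC a.b.c.d" with purely numeric parts; the function name filter_brenda is made up for this example.

```shell
# Hypothetical filter: keep only rows whose third column names fewer than
# two EC numbers. Assumes ';' as separator and "EC a.b.c.d" notation.
filter_brenda() {
    awk -F';' '{
        note = $3
        # gsub returns the number of matches; "&" puts each match back unchanged
        n = gsub(/EC [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/, "&", note)
        if (n < 2) print
    }' "$1"
}

# Possible usage (file location is an assumption):
#   filter_brenda brenda_ec.csv > brenda_ec.filtered.csv
```

Counting on a copy of the third column leaves the printed rows byte-for-byte identical to the input.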

doall stall

Sorry to start another issue thread.

When I run gapseq doall I can see that some commands are running but at one point the scripts seem to hang. I don't have any output printed on the terminal and there are no files created in the folder.

Do you have an idea of what could be the issue?

I did just try to run gapseq find -p all and this seems to work, I can see the script looking for pathways.

Smetana compatibility

Hello,
I have generated some models with your tool and tried to analyze them with the SMETANA software (https://github.com/cdanielmachado/smetana). I set the parameter --ext to e0 to change the extracellular compartment identifier in the models. However, I get an error for an intracellular compound, probably also due to the compartment nomenclature. Below are the command line and the error I get. Do you have any suggestions?

smetana --global *.xml --flavor fbc2 --verbose --ext e0

Loading community: all
Traceback (most recent call last):
  File "/mnt/data_SSD/anaconda3/envs/flux_balance/bin/smetana", line 468, in <module>
    ignore_coupling=args.no_coupling,
  File "/mnt/data_SSD/anaconda3/envs/flux_balance/bin/smetana", line 333, in main
    comm_models = [model_cache.get_model(org_id, reset_id=True) for org_id in organisms]
  File "/mnt/data_SSD/anaconda3/envs/flux_balance/bin/smetana", line 333, in <listcomp>
    comm_models = [model_cache.get_model(org_id, reset_id=True) for org_id in organisms]
  File "/mnt/data_SSD/anaconda3/envs/flux_balance/lib/python3.6/site-packages/framed/io/cache.py", line 24, in get_model
    model = load_cbmodel(self.paths[model_id], **self.load_args)
  File "/mnt/data_SSD/anaconda3/envs/flux_balance/lib/python3.6/site-packages/framed/io/sbml.py", line 146, in load_cbmodel
    load_gprs=load_gprs, load_metadata=load_metadata)
  File "/mnt/data_SSD/anaconda3/envs/flux_balance/lib/python3.6/site-packages/framed/io/sbml.py", line 104, in load_sbml_model
    load_gprs=load_gprs, load_metadata=load_metadata)
  File "/mnt/data_SSD/anaconda3/envs/flux_balance/lib/python3.6/site-packages/framed/io/sbml.py", line 306, in _load_cbmodel
    _load_metabolites(sbml_model, model, flavor, load_metadata=load_metadata)
  File "/mnt/data_SSD/anaconda3/envs/flux_balance/lib/python3.6/site-packages/framed/io/sbml.py", line 180, in _load_metabolites
    model.add_metabolite(_load_metabolite(species, flavor, load_metadata=load_metadata), clear_tmp=False)
  File "/mnt/data_SSD/anaconda3/envs/flux_balance/lib/python3.6/site-packages/framed/model/model.py", line 323, in add_metabolite
    raise KeyError("Failed to add metabolite '{}' (invalid compartment)".format(metabolite.id))
KeyError: "Failed to add metabolite 'M_cpd15561_c0' (invalid compartment)"

Sincerely,
Arianna Basile

alternative transporters

During the transporter search, it can happen that no candidate reaction is found for a certain transporter type. In this case, an alternative transporter (i.e. a different transporter type) can be used. Up to now, these alternatives were not listed in *-Transporter.tbl (only in *-Transporter.lst).

The question is how to handle alternative transporters: should they be included?

grep regular expression error for Pathway ""

gapseq.sh -b 200 -p "carvone" GCF_000005845.2_ASM584v2_genomic.fna.gz

The command produces the following output, including grep error messages:

################################
metacyc,custom
Checking for pathways and reactions in: GCF_000005845.2_ASM584v2_genomic.fna.gz carvone
Number of pathways to be considered: 2

1/2: Checking for pathway |PWY-5928| (4R)-carvone biosynthesis
(Biosynthesis,SECONDARY-METABOLITE-BIOSYNTHESIS,Terpenoid-Biosynthesis,MONOTERPENOID-SYN)
--LIMONENE-6-MONOOXYGENASE-RXN (-)-limonene-6-hydroxylase 1.14.14.51
grep: unrecognized option '--LIMONENE-6-MONOOXYGENASE-RXN'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
grep: unrecognized option '--LIMONENE-6-MONOOXYGENASE-RXN'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
/home/swaschina/workspace-kiel/2018/gapseq/dat/seq/Bacteria/unipac90/1.14.14.51.fasta
cat: unrecognized option '--LIMONENE-6-MONOOXYGENASE-RXN.blast'
Try 'cat --help' for more information.
cat: unrecognized option '--LIMONENE-6-MONOOXYGENASE-RXN.blast'
Try 'cat --help' for more information.
cat: unrecognized option '--LIMONENE-6-MONOOXYGENASE-RXN.blast'
Try 'cat --help' for more information.
cat: unrecognized option '--LIMONENE-6-MONOOXYGENASE-RXN.blast'
Try 'cat --help' for more information.
cat: unrecognized option '--LIMONENE-6-MONOOXYGENASE-RXN.blast'
Try 'cat --help' for more information.
NO good blast hit
(best one: id= bit= cov=)
CARVEOL-DEHYDROGENASE-RXN (-)-carveol dehydrogenase 1.1.1.243
/home/swaschina/workspace-kiel/2018/gapseq/dat/seq/Bacteria/unipac90/1.1.1.243.fasta
Blast hit (1x)
bit=250 id=35.811 cov=77 hit=UniRef90_O85057
Candidate reaction for import: 1
4.2.3.16-RXN limonene synthase 4.2.3.16
NO sequence data found
Pathway completeness: 1/3 (33%)
Hits with candidate reactions in database: 1/3
Key reactions: 0/0

2/2: Checking for pathway |PWY-7443| (4S)-carvone biosynthesis
(Biosynthesis,SECONDARY-METABOLITE-BIOSYNTHESIS,Terpenoid-Biosynthesis,MONOTERPENOID-SYN)
4.2.3.20-RXN (4R)-limonene synthase 4.2.3.20
NO sequence data found
RXN-10746 (R)-limonene 6-monooxygenase 1.14.14.53
NO sequence data found
RXN-9397 (+)-trans-carveol dehydrogenase 1.1.1.275
/home/swaschina/workspace-kiel/2018/gapseq/dat/seq/Bacteria/unipac90/1.1.1.275.fasta
NO good blast hit
(best one: id=34.182 bit=152 cov=100)
Pathway completeness: 0/3 (0%)
Hits with candidate reactions in database: 0/3
Key reactions: 0/0

Pathways found:

Candidate reactions found: 1

%CPU %MEM CMD
0.5 0.0 /bin/bash .././gapseq.sh -b 200 -p carvone GCF_000005845.2_ASM584v2_genomic.fna.gz
Running time: 10 s.
################################

Genome (E. coli K12) can be downloaded here:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz
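The grep and cat messages above come from the reaction ID starting with "--", which both tools read as a long option. The sketch below illustrates the usual guard for that; it is not the fix applied in gapseq, and the helper names find_rxn and show_file are made up for this example.

```shell
# Hypothetical helper: search FILE for a literal ID that may start with "-".
# -F: fixed string, -e: the next argument is the pattern, --: end of options.
find_rxn() {
    grep -F -e "$1" -- "$2"
}

# Hypothetical helper: print a file whose name may start with "-".
show_file() {
    cat -- "$1"
}
```

For example, find_rxn '--LIMONENE-6-MONOOXYGENASE-RXN' some_table.tbl treats the ID as plain text, so neither the dashes nor any parentheses in an ID are interpreted.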

Complex Blast Results not reported in second pathway query

BLAST results for reactions with polymer enzyme structures (complexes) are only stated for the pathway with the first occurrence of this reaction but not for subsequent pathways.

An Example:

The two pathways PWY-6122 and PWY-6121 both contain the reaction EC 6.3.5.3. However, the BLAST results table reports the results for this reaction only for the first pathway. To reproduce the issue, run for instance the following command:

./gapseq.sh -b 200 -p "PWY-6122|PWY-6121" -g -i 35 -k Bifidobacterium_longum_longum_JCM_1217.fasta

The genome can be downloaded here

tmp files error & archaea

Dear GapSeq developers,
Thank you for the development of such an interesting source. I have installed it and everything went fine. I tried to run ./gapseq doall toy/myb71.fna.gz and I got the following error.

Checking updates for Bacteria /mnt/data_SSD/gapseq/src/../dat/seq/Bacteria
2020-07-24 16:30:56 URL: ftp://ftp.rz.uni-kiel.de/pub/medsystbio/Bacteria/rev/sequences.tar.gz [193] -> ".listing" [1]
2020-07-24 16:31:22 URL: ftp://ftp.rz.uni-kiel.de/pub/medsystbio/Bacteria/rev/sequences.tar.gz [21329707] -> "sequences.tar.gz" [1]
2020-07-24 16:31:22 URL: ftp://ftp.rz.uni-kiel.de/pub/medsystbio/Bacteria/unrev/sequences.tar.gz [194] -> ".listing" [1]
2020-07-24 16:36:55 URL: ftp://ftp.rz.uni-kiel.de/pub/medsystbio/Bacteria/unrev/sequences.tar.gz [254056885] -> "sequences.tar.gz" [1]
2020-07-24 16:36:56 URL: ftp://ftp.rz.uni-kiel.de/pub/medsystbio/Bacteria/rxn/sequences.tar.gz [193] -> ".listing" [1]
2020-07-24 16:36:57 URL: ftp://ftp.rz.uni-kiel.de/pub/medsystbio/Bacteria/rxn/sequences.tar.gz [734237] -> "sequences.tar.gz" [1]
Updated Bacteria reviewed sequences
Updated Bacteria unreviewed sequences
Updated Bacteria additional reaction sequences
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object "finalize_filexp" not found
Calls: readAAStringSet ... .read_XStringSet -> fasta.index -> .finalize_filexp_list -> ::: -> get
Execution halted
cat: /tmp/tmp.DjAQqNaSp8/subunit_tmp.fasta: No such file or directory
cat: /tmp/tmp.DjAQqNaSp8/subunit_tmp.fasta: No such file or directory
cp: cannot stat '/tmp/tmp.DjAQqNaSp8/subunit_tmp.fasta': No such file or directory
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object "finalize_filexp" not found
Calls: readAAStringSet ... .read_XStringSet -> fasta.index -> .finalize_filexp_list -> ::: -> get
Execution halted
cat: /tmp/tmp.DjAQqNaSp8/subunit_tmp.fasta: No such file or directory
cat: /tmp/tmp.DjAQqNaSp8/subunit_tmp.fasta: No such file or directory
cp: cannot stat '/tmp/tmp.DjAQqNaSp8/subunit_tmp.fasta': No such file or directory
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object "finalize_filexp" not found
Calls: readAAStringSet ... .read_XStringSet -> fasta.index -> .finalize_filexp_list -> ::: -> get
Execution halted
cat: /tmp/tmp.DjAQqNaSp8/subunit_tmp.fasta: No such file or directory
cat: /tmp/tmp.DjAQqNaSp8/subunit_tmp.fasta: No such file or directory
cp: cannot stat '/tmp/tmp.DjAQqNaSp8/subunit_tmp.fasta': No such file or directory

Do you have any suggestion on how to solve it?
Many thanks in advance.

Arianna Basile

Error in readRDS(rxnXgene.table) : bad 'file' argument Execution halted

Dear All,

In the folder where fna file is located I run
/home/minerva/gapseq/gapseq_quickstart.sh *.fna /home/minerva/HydMic/Media/R2A.csv

But I am facing the following issue


Creating Gene-Reaction list... 1519 unique genes on 27 genetic element(s)
Constructing draft model: 
 1817 / 1817
Error in readRDS(rxnXgene.table) : bad 'file' argument
Execution halted

Such a file does exist, though: /home/minerva/HydMic/Model_Creation_GapSeq/230419/C1-1_S10/C1-1_S10_07042014-rxnXgenes.RDS

Since this RDS file is an intermediate file, there seems to be something wrong with the path passed to readRDS.

Could you replicate this error and, if possible, fix it?

Moreover, is it possible to continue from the point where the pipeline failed, as it takes hours to reach this point?

Kind Regards,
Georgios

could the annotated sequence as the input for gapseq

Hi @jotech ,
Thanks for your nice work! I have a question about the input sequence format for building a model with gapseq.
For example, I have 2000 annotated proteins from a genome. Can these be used as input for gapseq to generate the metabolic model? Looking forward to your help!

Best regards,
Hongzhong

../ error file not found

Hi everyone,

Nice application! I have encountered a minor issue, though. The ../ in the file paths in the code causes confusion, especially if you use the application for the first time and do not know what the directory structure looks like, so you end up searching for missing files :)

Regards
George

Propagate existing model to new draft model

Is there any way to propagate an existing model for a genome to a new draft model using gapseq?

Format of Diet

Hi,

I am planning to use gapseq with my own nutritional input. Should the exchange reactions comply with a specific format? For instance, the compounds in the uploaded diets are not formatted like EX_MetabolitesID, but MetabolitesID. Should I put filler numbers for the available compounds, or can I assume flux = molecular quantity?

Kind Regards,
Georgios

assemblyTaxo.tsv is missing

Hi,

Where is the file "/nuuk/2018/bacref/metadata/assemblyTaxo.tsv"? Unfortunately computer cannot find it...

Error in fread("/nuuk/2018/bacref/metadata/assemblyTaxo.tsv", quote = "",  : 
  File '/nuuk/2018/bacref/metadata/assemblyTaxo.tsv' does not exist or is non-readable. getwd()=='/home/athena/github/gapseq/src'
Calls: build_draft_model_from_blast_results -> fread

Blast problems

for bug reports and errors please report the output of: ./gapseq test.
the output:
gapseq version: 1.1 1fa79b7
#######################
#Checking dependencies#
#######################
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options: GNU long options:
-f progfile --file=progfile
-F fs --field-separator=fs
-v var=val --assign=var=val
-m[fr] val
-O --optimize
-W compat --compat
-W copyleft --copyleft
-W copyright --copyright
-W dump-variables[=file] --dump-variables[=file]
-W exec=file --exec=file
-W gen-po --gen-po
-W help --help
-W lint[=fatal] --lint[=fatal]
-W lint-old --lint-old
-W non-decimal-data --non-decimal-data
-W profile[=file] --profile[=file]
-W posix --posix
-W re-interval --re-interval
-W source=program-text --source=program-text
-W traditional --traditional
-W usage --usage
-W use-lc-numeric --use-lc-numeric
-W version --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
gawk '{ sum += $1 }; END { print sum }' file
gawk -F: '{ print $1 }' /etc/passwd

GNU sed version 4.2.1
grep (GNU grep) 2.10
This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi
tblastn: 2.2.29+
exonerate from exonerate version 2.2.0
bedtools v2.29.2
barrnap 0.9 - rapid ribosomal RNA prediction
R version 3.5.0 (2018-04-23) -- "Joy in Playing"
git version 2.19.2
Missing dependencies: 0

#####################
#Checking R packages#
#####################
ARGUMENT 'avail.packages~+<-+~installed.packages()' ignored

ARGUMENT 'i=0' ignored

ARGUMENT 'for(~~+~~pkg~~+in+~~needed.packages~~+~~){+' ignored

ARGUMENT '~~+++~~+~~idx~~+~~<-~+~~match(pkg,~~+~avail.packages[,"Package"])' ignored

ARGUMENT '~~+++~~+~~if(~~+~~!~~+~~is.na(idx)~~+~~){' ignored

ARGUMENT '~~+++++++~~+~~cat("~~+~~+~~",pkg,~~+~~avail.packages[idx,"Version"],~~+~~"\n")+' ignored

ARGUMENT '~~+++~~+~~}else{' ignored

ARGUMENT '~~+++++++~~+~~cat("~~+~~+~~",~~+~~pkg,~~+~~"NOT~+~~FOUND",~~+~"\n")' ignored

ARGUMENT '~+++++++~~+~i=i+1' ignored

ARGUMENT '~~+++~~+~~}' ignored

ARGUMENT '}' ignored

ARGUMENT 'cat("\nMissing~+R+~~packages:~~+",+i,+~"\n\n")' ignored

##############################
#Checking basic functionality#
##############################
ARGUMENT 'data(Ec_core)' ignored

ARGUMENT 'sybil::optimizeProb(Ec_core)' ignored

Loading required package: Matrix
Loading required package: lattice.

Question related to model quality

Dear All,

I would like to ask you concerning the model that is produced after the step Rscript src/generate_GSdraft.R -r myb71-all-blast.tbl -t myb71-Transporter.lst -c myb71.fna -b 200. How much should someone trust it? What is the difference with a model that is produced by ModelSeed? Can I have a robust gap-filled model without any knowledge on the diet?

These questions may sound naive, but they will help me to understand the general concepts of gap-filling.

Kind Regards,
Georgios

Issue report: '-b archae' lead to error in build_draft_model_from_blast_results

Hi all,

I found a tiny issue running ./gapseq draft.

-b|--biomass Gram "pos" OR "neg" OR "archae" OR "auto"? Default: "auto". Please note: if set to "auto", the external programms barrnap, usearch, and bedtools are required.

When using '-b archae' as a suggested flag for archaeal genome, an error occurs as follows:

Error in build_draft_model_from_blast_results(blast.res = blast.res, transporter.res = transporter.res,

The way to fix this is to use '-b archaea' instead. I hope this helps in addressing the issue.

Best, and Merry Xmas :)

Shan

Unable to run model xml in memote

I tried to check the constructed model in memote, but it hangs at around testconsistency.

Also, when I tried to load the draft model in the COBRA Toolbox, it gives an error:
modelFileName = 'Test1_gapseq_draft.xml';
model = readCbModel(modelFileName);
Output argument "qualifiers" (and maybe others) not assigned during call to "getDataBases".

Error in parseCVTerms (line 27)
[databases,identifiers,relations] = cellfun(@getDataBases, {CVTerms.resources},
{CVTerms.qualifier},'UniformOutput',0);

Error in readSBML (line 121)
[databases,identifiers,qualifiers] = cellfun(@parseCVTerms, cvterms,'UniformOutput',0);

Error in readCbModel (line 211)
model = readSBML(fileName,defaultBound);

Please advise on the compatibility and standard output of the model.

-draft error in automated pipeline

The addition of "-draft" in the line below causes an error in gapseq automated pipeline, as the input of gf.suite.R is gf.suite.R -m ./${id}.RDS

gapseq/src/generate_GSdraft.R

Line 359 in 7d69773

saveRDS(mod$mod,file = paste0(model.name, "-draft.RDS"))

gapseq/gapseq

Line 80 in 7d69773

 Rscript $dir/src/gf.suite.R -m ./${id}.RDS -n $medium -c ./${id}-rxnWeights.RDS -b 100 -g ./${id}-rxnXgenes.RDS 

Error in BacArena after loading model generated after gapfilling with LB media

The following message popped up when the model was loaded in BacArena. (The model was gap-filled with LB supplemented with glucose.)

"Median lower bound for non-zero and non-Inf exchanges is:06"
Warning message in Organism(model = model, ...):
"Many lower bounds of of the model seems to be set to non -infinity. Please be aware that they will be used as maximal uptake rates even when the available medium is more abundant! (set setAllExInf=TRUE to reset all exchanges to -INF)"

When I used LB medium without glucose for gap-filling, the notification was

"Median lower bound for non-zero and non-Inf exchanges is:-16"
Warning message in Organism(model = model, ...):
"Many lower bounds of of the model seems to be set to non -infinity. Please be aware that they will be used as maximal uptake rates even when the available medium is more abundant! (set setAllExInf=TRUE to reset all exchanges to -INF)"

This message was not observed when the model was gap-filled with minimal medium.
In that case it shows
"Median lower bound for non-zero and non-Inf exchanges is:-1006"

How can I solve this issue?

Media definition for new isolated gut strains

Hi GapSeq developers [@jotech],
Could I ask how to define the medium used for the gap-filling step? I have several isolated human gut strains, but I do not know their detailed nutrient requirements.

Thanks a lot!

Best regards,
Hongzhong

Files not found

Hi,
I would like to inform you about the following errors:

stat: cannot stat ‘/home/***/software/gapseq/src/../dat/seq/Bacteria/rev/sequences.tar.gz’: No such file or directory
stat: cannot stat ‘/home/***/software/gapseq/src/../dat/seq/Bacteria/unrev/sequences.tar.gz’: No such file or directory
stat: cannot stat ‘/home/***/software/gapseq/src/../dat/seq/Bacteria/rxn/sequences.tar.gz’: No such file or directory

*** stands for my account name
I went through this path in the repository and indeed there are no such files.
Kind Regards,
Georgios

Reaction rxn00068

Hi all,

Thanks for your great work. I have a question concerning the direction of a reaction.

According to ModelSEED, the reaction rxn00068 should have the direction <. In the database here, however, I read that the direction is =.

Any opinion from your side?

Kind Regards,
Georgios

Required R Packages

Dear All,

It would be helpful to provide a script that installs all the required R packages automatically.
For example, my brand-new R installation needed data.table, getopt, stringr and sybil, and surely some others that I may already have installed but cannot directly identify.

Kind Regards,
George

Exchange Reaction of Manganese

Dear All,

I would like to ask you concerning the exchange reaction of manganese in TSB medium. Why did you choose the identifier cpd00030 instead of cpd20864 or cpd20863?

Kind Regards,
Georgios

P.S.: This type of question may be valid for other compounds also.

Mac compatibility

Finally getting the time to test your tools 😉
Here are a few notes about issues I encountered while trying to run gapseq on macOS Mojave 10.14.6.

Mac users need to have coreutils and GNU binutils installed; otherwise the following errors occur:
readlink: illegal option -- f
stat: illegal option -- c

Both can be installed with homebrew and symlinked with:

ln -s /usr/local/bin/gstat /usr/local/bin/stat
and
ln -s /usr/local/bin/greadlink /usr/local/bin/readlink
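
An alternative to symlinking over the system tools is to let a script pick the GNU variant when it is available. The sketch below assumes that Homebrew's coreutils installs the GNU versions with a g prefix (gstat, greadlink), as the symlinks above suggest. The helper name pick_tool and the variables READLINK and STAT are hypothetical and not part of gapseq.

```shell
# Hypothetical helper (not part of gapseq): prefer the g-prefixed GNU binary
# if it is on the PATH, otherwise fall back to the plain tool name.
pick_tool() {
    name="$1"
    if command -v "g${name}" >/dev/null 2>&1; then
        printf '%s\n' "g${name}"
    else
        printf '%s\n' "${name}"
    fi
}

READLINK="$(pick_tool readlink)"
STAT="$(pick_tool stat)"

# Example use inside a script (commented out so the sketch has no side effects):
# script_dir="$(dirname "$("$READLINK" -f "$0")")"
# size_bytes="$("$STAT" -c %s some_file)"
```

This leaves /usr/local/bin/stat and /usr/local/bin/readlink untouched, so other programs that expect the BSD behaviour keep working.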

Question/issue about transporter.sh

Hi,

As I have mentioned in my previous issue: I am a simple programmer, assigned to get your algorithm working. I have this weird issue with /src/transporter.sh. Every time it is called I get a syntax error:

sed: 1: "all.fasta": command a expects \ followed by text
/data/0/src/transporter.sh: line 111: syntax error near unexpected token `('
/data/0/src/transporter.sh: line 111: `fastafetch -f all.fasta -i tcdb.idx -Fq <(sort -u hits ) > tcdbsmall.fasta'

It obviously has to do with this part <(sort -u hits ) >. I don't get it! I think the problems started when I installed Exonerate on my Mac. This was a mistake, I found out Exonerate wasn't in your Mac-install instructions. I probably shouldn't have done it, I just followed the instructions from somebody else. And now I get this syntax error. I hope you can help me out here, and my apologies if this is caused by me.

Thanks in advance

Cspr
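
One plausible reading of the two messages, offered as an assumption and not as a confirmed diagnosis: BSD sed on macOS requires a suffix argument after -i, so a GNU-style call such as sed -i 'expr' all.fasta ends up interpreting "all.fasta" as the sed script, which produces exactly the "command a expects \ followed by text" message. The "unexpected token `('" message is what bash typically prints for <( ... ) when it runs in POSIX mode, for example when the script is started with sh. The sketch below shows forms that avoid both problems. The file names mirror the error output, while the sed expression and the sample data are placeholders and not taken from transporter.sh.

```shell
# Sketch only: sample data and the sed expression are placeholders.
workdir="$(mktemp -d)"
printf '>seq 1\nACGT\n' > "$workdir/all.fasta"

# Attaching the backup suffix directly to -i is accepted by GNU and BSD sed.
sed -i.bak 's/ /_/' "$workdir/all.fasta"
rm -f "$workdir/all.fasta.bak"

# Process substitution, <(sort -u hits), needs bash in non-POSIX mode.
# Writing the sorted list to a file works in any POSIX shell.
printf 'b\na\nb\n' > "$workdir/hits"
sort -u "$workdir/hits" > "$workdir/hits.uniq"

# The original call could then read the file instead (not executed here):
# fastafetch -f all.fasta -i tcdb.idx -Fq hits.uniq > tcdbsmall.fasta
```

If the script is meant to rely on bash features, starting it with bash instead of sh should also remove the second error.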

Errors in HPC cluster

Dear All,

I would like to report some errors that arose while I was running 6 instances of GapSeq on my HPC cluster. In each instance, this was the error output:

Warning: [tblastn] Query_1 UniRef50_O30642 M.. : One or more O characters replaced by X for alignment score calculations at positions 201 
Warning: [tblastn] Query_1 UniRef90_Q18TV3 T.. : One or more O characters replaced by X for alignment score calculations at positions 330 
grep: invalid option -- 'g'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
grep: invalid max count
index file AEP.fna.tmp.fai not found, generating...

Only the last line seems unproblematic.
Please be aware that the pipeline nevertheless finished successfully.
I will repeat the pipeline with the myb71.fna file and report back.

Kind Regards,
Georgios

Unzipping Error due to Git LFS

Hi,

I am facing an error that appears twice:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

This happens at the very beginning, and the pipeline does not stop.
I suspect the step that unpacks the sequence archives (sequences.tar.gz) in the folders $seqpath/rev/ and $seqpath/unrev/.

Could you replicate it?

Kind Regards,
Georgios
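
A quick way to test this suspicion is to check whether an archive is a real gzip file or only a Git LFS pointer. A pointer is a small text file whose first line is "version https://git-lfs.github.com/spec/v1". The function name archive_kind below is hypothetical and not part of gapseq; this is a minimal sketch, not the pipeline's own check.

```shell
# Hypothetical check (not part of gapseq): classify a file as a Git LFS
# pointer, a valid gzip stream, or something else.
archive_kind() {
    if head -c 200 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'; then
        echo "lfs-pointer"
    elif gzip -t < "$1" 2>/dev/null; then
        echo "gzip"
    else
        echo "unknown"
    fi
}

# Example call (path taken from the issue text, commented out):
# archive_kind "$seqpath/rev/sequences.tar.gz"
```

If the result is lfs-pointer, running git lfs pull inside the clone (with Git LFS installed) normally replaces the pointers with the actual files.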

missing biomass components for archaea

The biomass reaction for archaea is taken from the Methanosarcina barkeri iAF692 model.

Right now not all of the components can be produced even when all reactions from the seed database are used.

List of missing biomass compounds:

cpd15817 3hdpgpe 2-O-(3'-hydroxy)phytanyl-3-O-phytanyl-sn-glycero-1-phosphoethanolamine
cpd15818 3hdpgpg 2-O-(3'-hydroxy)phytanyl-3-O-phytanyl-sn-glycerol-1-phospho-3'-sn-glycerol
cpd15819 3hdpgpi 2-O-(3'-hydroxyl)phytanyl-3-O-phytanyl-sn-glycero-1-phospho-myo-inositol
cpd15820 3hdpgps 2-O-(3'-hydroxy)phytanyl-3-O-phytanyl-sn-glycero-1-phosphoserine
cpd15834 adocblhbi Adenosylcobalamin-HBI
cpd02817 cob coenzyme b
cpd15858 dpgpe 2,3-O-phytanyl-sn-glycero-1-phosphoethanolamine
cpd15859 dpgpg 2,3-di-O-phytanyl-sn-glycerol-1-phospho-3'-sn-glycerol
cpd15860 dpgpi 2,3-O-phytanyl-sn-glycero-1-phospho-myo-inositol
cpd15861 dpgps 2,3-O-phytanyl-sn-glycero-1-phosphoserine
cpd15862 f390a coenzyme F390 (adenosine)
cpd15863 f390g coenzyme F390 (guanosine)
cpd03425 f430 coenzyme F430
cpd15880 gdpgpi glucosaminyl archaetidyl-myo-inositol
cpd15884 h4spt tetrahydrosarcinapterin

All other biomass compounds (incl. coenzyme M: cpd02246) can already be produced.

Could I apply this tool to other species?

Hi,

As the title of your article says, gapseq can be used to predict bacterial metabolic pathways. I would still like to know whether this tool can be applied to other organisms, such as fungi.

Best,

Subunit substring match

In the subunit search, it occasionally happens that the name of one subunit is a substring of the name of another subunit. E.g. "Subunit 1" is a substring of "Subunit 11". Thus, searching the database for "Subunit 1" also returns hits for Subunit 11.

Example to reproduce the issue:
./gapseq.sh -e 7.1.2.2 [fasta]
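
The effect can be illustrated with plain grep. The headers below are made up for the illustration and are not gapseq database entries, and gapseq's actual matching code may work differently. The point is only that whole-word matching (grep -w) rejects "Subunit 11" when searching for "Subunit 1".

```shell
# Made-up headers for illustration; these are not gapseq database entries.
headers='ATP synthase Subunit 1
ATP synthase Subunit 11
ATP synthase Subunit 12'

# Plain substring search: "Subunit 1" also matches Subunit 11 and Subunit 12.
loose="$(printf '%s\n' "$headers" | grep -c 'Subunit 1')"

# -w only accepts matches that are not directly followed or preceded by
# a word character (letter, digit or underscore).
strict="$(printf '%s\n' "$headers" | grep -c -w 'Subunit 1')"

echo "substring matches: $loose, whole-word matches: $strict"
# prints: substring matches: 3, whole-word matches: 1
```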

cplexAPI related bug on Virtual machine

Hello Johannes,

So I still can't install cplexAPI on my virtual machine and couldn't find out how to have the glpk solver used instead. I have dug a bit into the code and found that the adapt.R and gf.suite.R scripts both load cplexAPI before checking whether it is among the installed packages.

If you move the

suppressMessages(library(cplexAPI))

into the if block below

if( "cplexAPI" %in% rownames(installed.packages()) ){
  sybil::SYBIL_SETTINGS("SOLVER","cplexAPI"); ok <- 1
}else{
  warning("glpkAPI is used but cplexAPI is recommended because it is much faster")
  sybil::SYBIL_SETTINGS("SOLVER","glpkAPI"); ok <- 5
}

like this

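if( "cplexAPI" %in% rownames(installed.packages()) ){
  suppressMessages(library(cplexAPI))
  sybil::SYBIL_SETTINGS("SOLVER","cplexAPI"); ok <- 1
}else{
  warning("glpkAPI is used but cplexAPI is recommended because it is much faster")
  sybil::SYBIL_SETTINGS("SOLVER","glpkAPI"); ok <- 5
}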

it seems to fix the crash: the script no longer fails while trying to load cplexAPI, so it can go on to choose the solver =)

Photosynthesis

Dear All,

I would like to ask you about photosynthesis pathways. According to my collaborator, a marine bacterium should be able to make photosynthesis, but I see that there is oxygen dependency in the respective model. Assuming that the genomic information is complete, would gapseq add photosynthesis pathways?

Kind Regards,
Georgios


Error in build_draft_model_from_blast_results(blast.res = blast.res, transporter.res = transporter.res,
