
metAMOS's People

Contributors

bodington, cmhill-zz, dsommer, keuv-grvl, koadman, mihaipop, mlangill, skoren, treangen, wookietreiber


metAMOS's Issues

Report error if appropriate version of python is not found

metAMOS requires Python 2.6. Running it under another Python version causes library errors. There should therefore be a configuration script that makes runPipeline.py and createProject.py check for Python 2.6 and use it; otherwise, an installation error should be reported.
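A minimal version guard along these lines could be run from such a script (a sketch; the function name and message are hypothetical, not the actual metAMOS code):

```python
import sys

REQUIRED = (2, 6)  # version stated in the issue

def version_ok(info=None, required=REQUIRED):
    """Check that the running interpreter matches the required major.minor."""
    info = sys.version_info if info is None else info
    return tuple(info[:2]) == required
```

A caller would then do `if not version_ok(): sys.exit("metAMOS requires Python %d.%d" % REQUIRED)` before importing anything version-sensitive.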

more efficient snippet

Hi Todd,

I made some changes to the FragGeneScan portion of findorfs.py that have yielded efficiency improvements of orders of magnitude.
I am not quite sure how to submit the code to GitHub, so I will post it here and leave it to your discretion to incorporate (old code commented out for comparison).
I am still trying to get the pipeline to complete end-to-end on an actual data set; I will keep you posted.

Cheers!

for seq in seqs:
    hdr, gene = seq.split("\n", 1)
    hdr = hdr.rstrip("\n")
    # split the header into a key/value pair
    parts = hdr.split('_')
    orfkey = '_'.join(parts[:6])
    orfval = '_'.join(parts[7:])
    orfhdrs[orfkey] = orfval

# old code, kept for comparison:
# for key in gene_ids:
#     genecnt = 1
#     gkey = ""
#     if not is_scaff:
#         for ckey in cvg_dict.keys():
#             if ckey in key:
#                 gkey = ckey
#         if gkey != "":
#             cvgg.write("%s\t%s\n" % (key, cvg_dict[gkey]))
#         else:
#             cvgg.write("%s\t%s\n" % (key, 1.0))
for key in orfhdrs.keys():
    if key in cvg_dict:
        cvgg.write("%s\t%s\n" % (key + orfhdrs[key], cvg_dict[key]))
    else:
        cvgg.write("%s\t%s\n" % (key + orfhdrs[key], str(1.0)))
cvgg.close()
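The speedup in the snippet comes from replacing the per-gene substring scan over cvg_dict with a one-time dictionary build followed by O(1) lookups. A stripped-down sketch of the same idea (function name and default coverage are taken from the snippet, not from metAMOS proper):

```python
def write_coverage(orfhdrs, cvg_dict):
    """Emit one coverage line per ORF using O(1) dict lookups.

    orfhdrs maps an ORF key to its header suffix; cvg_dict maps keys to
    coverage values. Keys absent from cvg_dict default to 1.0, as in the
    posted snippet.
    """
    lines = []
    for key, suffix in orfhdrs.items():
        cvg = cvg_dict.get(key, 1.0)
        lines.append("%s\t%s" % (key + suffix, cvg))
    return lines
```

The original loop scanned every coverage key per gene (O(n*m) substring tests); the dict version does a single hash lookup per gene.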

Newbler split reads not supported

When Newbler runs, it can split a read into multiple pieces within the assembly. Currently, metAMOS only keeps the first occurrence of the read.
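A fix would need to keep every placement of a split read rather than just the first. A minimal sketch, assuming a simplified, hypothetical tuple representation of Newbler's per-read output:

```python
from collections import defaultdict

def collect_read_placements(records):
    """Group every placement of a read by read id, not just the first.

    `records` is an iterable of (read_id, contig, start, end) tuples -- a
    simplified stand-in for Newbler's per-read assembly output.
    """
    placements = defaultdict(list)
    for read_id, contig, start, end in records:
        placements[read_id].append((contig, start, end))
    return placements
```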

createProject parameters order dependent

Library insert sizes need to appear before the -1, -2, -s, and -sm parameters; otherwise createProject will fail. It would be convenient to make these order-independent, especially for backwards compatibility with previous test scripts and user scripts.
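One way to make the options order-independent is a declarative parser such as argparse, which accepts flags in any order. A sketch (the option names mirror the issue; the dest names and the "-i" insert-size flag are assumptions):

```python
import argparse

# Order-independent parsing: argparse matches flags by name, not position.
parser = argparse.ArgumentParser(prog="createProject", add_help=False)
parser.add_argument("-i", dest="insert", default=None)   # insert sizes
parser.add_argument("-1", dest="mate1", default=None)
parser.add_argument("-2", dest="mate2", default=None)
parser.add_argument("-s", dest="single", default=None)

# Both orderings parse identically:
a = parser.parse_args(["-i", "100:600", "-1", "r1.fq", "-2", "r2.fq"])
b = parser.parse_args(["-1", "r1.fq", "-2", "r2.fq", "-i", "100:600"])
```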

Propagate missing BLAST support

Propagate supports MetaPhyler and PhyloSift, but not BLAST, phmmer, or PhymmBL. Add support for propagating their classifications as well.

Generate an interleaved/non-interleaved file for each library

Some tools accept only interleaved files, others only non-interleaved files. To support both, the pipeline should generate a corresponding interleaved and non-interleaved file for each library, regardless of the input format.
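Generating the interleaved variant from a non-interleaved pair is straightforward when the two files are record-synchronized; a sketch assuming standard 4-line FASTQ records:

```python
from itertools import islice

def interleave(fq1_lines, fq2_lines):
    """Interleave two mate files record by record (4 lines per FASTQ record).

    Yields lines; stops at the shorter file. Assumes the inputs are already
    in matching order -- re-pairing out-of-order files is a separate problem.
    """
    it1, it2 = iter(fq1_lines), iter(fq2_lines)
    while True:
        r1 = list(islice(it1, 4))
        r2 = list(islice(it2, 4))
        if len(r1) < 4 or len(r2) < 4:
            break
        for line in r1 + r2:
            yield line
```

The reverse direction (de-interleaving) is the same loop with alternating output targets.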

Amphora lib error

Amphora fails with:

"perl: symbol lookup error: /home/ondovb/metAMOS/Amphora-2/lib/auto/Math/Random/Random.so: undefined symbol: Perl_Tstack_sp_ptr"

on Linux 2.6.41.4-1.fc15.x86_64 x86_64 with Perl v5.12.4.

Fixed by running amphora_install.pl.

FASTQ file filtering

Mated FASTQ reads in non-interleaved format MUST be aligned, or all reads could potentially be discarded. We need to index read IDs and check whether pairs that are slightly out of order are still present in the file; if so, the order should be fixed. If a mate cannot be found, the read should be discarded or placed in the unpaired/unmated/frag file.
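The re-pairing step could start by indexing the read IDs from both mate files and partitioning them into paired and orphan sets; a simplified sketch (IDs are assumed to already have their /1 and /2 mate suffixes stripped, and a real implementation would also rewrite the records on disk in matching order):

```python
def repair_pairs(ids1, ids2):
    """Partition read ids into paired and orphan sets, tolerating reordering.

    ids1/ids2 are the read ids from the two mate files. Ids present in both
    files are paired; the rest go to the unpaired/frag file.
    """
    set1, set2 = set(ids1), set(ids2)
    paired = set1 & set2
    orphans = (set1 | set2) - paired
    return paired, orphans
```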

bowtie segmentation fault

bowtie is segfaulting during the scaffold step on a CentOS 5 system with the following kernel:
uname -a
Linux jumbo-0-1.merlot.genomecenter.ucdavis.edu 2.6.18-194.11.4.el5 #1 SMP Tue Sep 21 05:04:09 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Running the exact same bowtie command using bowtie-0.12.7 downloaded from sourceforge completes without error.

I'm not sure whether it's relevant, but there appears to be a difference in which libs were linked. bowtie in metamos has:
-bash-3.2$ ldd /home/koadman/software/metAMOS/Utilities/cpp/Linux-x86_64/bowtie
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00000031d8c00000)
    libm.so.6 => /lib64/libm.so.6 (0x00000031d9000000)
    libc.so.6 => /lib64/libc.so.6 (0x00000031d8400000)
    /lib64/ld-linux-x86-64.so.2 (0x00000031d8000000)

bowtie from sf.net has:

-bash-3.2$ ldd ~/software/bowtie-0.12.7/bowtie
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00000031d8c00000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000031db000000)
    libm.so.6 => /lib64/libm.so.6 (0x00000031d9000000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000031dac00000)
    libc.so.6 => /lib64/libc.so.6 (0x00000031d8400000)
    /lib64/ld-linux-x86-64.so.2 (0x00000031d8000000)
-bash-3.2$ file ~/software/bowtie-0.12.7/bowtie
/home/koadman/software/bowtie-0.12.7/bowtie: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), not stripped

Krona charts require internet connection

There is currently no way to make local charts for isolated systems, since the Krona folder doesn't include the web resources. Would it be feasible to drop in the complete KronaTools directory rather than the flat folder? I'm working on integrating the Amphora and Phmmer scripts, and the next release of KronaTools will be able to include the Perl module from anywhere, which should make it easier to add more scripts in the future.

Test MetaPhyler with FragGeneScan

The FCP test dataset (test_fcp) yields no MetaPhyler classifications when using FragGeneScan, but it does when using MetaGeneMark. Make sure the genes are being passed properly.

Support multiple file types/formats

Currently, in 0.33, you cannot mix paired and unpaired data (such as by using the command -1 pairs_1.fq,unpaired.fq -2 pairs_2.fq). Support this, as well as mixing multiple file types (FASTA and FASTQ) in one input.
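Supporting mixed input would require splitting the comma-separated -1/-2 values and deciding which entries have mates; one possible convention, sketched here (the trailing-entries-are-unpaired rule is an assumption, not metAMOS behavior):

```python
def parse_inputs(arg1, arg2):
    """Split comma-separated -1/-2 values into paired and unpaired files.

    Hypothetical convention: entries are paired positionally, and any file
    without a counterpart at the same position is treated as unpaired.
    """
    files1 = arg1.split(",") if arg1 else []
    files2 = arg2.split(",") if arg2 else []
    paired = list(zip(files1, files2))       # zip stops at the shorter list
    unpaired = files1[len(files2):] + files2[len(files1):]
    return paired, unpaired
```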

Glimmer-MG not supported

This is a metagenomic gene finder, and support should be added soon. Some code stubs may be in place, but input/output parsers and integration into the FindORFs step are still required.

warning from curl during taxonomy download

Warning: Illegal date format for -z/--timecond (and not a file name).
Warning: Disabling time condition. See curl_getdate(3) for valid date syntax.

This is on ubuntu 10.04 LTS with curl version:

koadman@edhar:~$ curl --version
curl 7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
Protocols: tftp ftp telnet dict ldap ldaps http file https ftps
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz

Help debugging failed run

Hi folks,

I was happy to get the pipeline to run end-to-end on a sub-sample of my data (10M paired reads).
I then attempted to run it on the entire data set of 76M paired reads.
Unfortunately, it crashed at the FindORFS step.

The error log is at bottom of this message.
Here are my additional questions:

  • Is there a flag for printing the list of commands to a file or to STDOUT / STDERR ?
  • What does the --fastest flag do, specifically?

Here is the command used:
${metAMOS}/runPipeline -c amphora2 -d METAMOS_BS27FULL -g fraggenescan -k 43 -p 22 -a velvet 1> METAMOS_BS27FULL.run.out 2> METAMOS_BS27FULL.run.err &

Here is the STDERR log:

Job = [[SGI_BS27.1.fastq, SGI_BS27.2.fastq] -> preprocess.success] completed

Completed Task = preprocess.Preprocess
Job = [[lib1.seq] -> [proba.asm.contig]] completed
Completed Task = assemble.Assemble
Job = [proba.asm.contig -> proba.bout] completed
Completed Task = mapreads.MapReads
Traceback (most recent call last):
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/runPipeline", line 367, in <module>
    pipeline_run([preprocess.Preprocess, assemble.Assemble, findorfs.FindORFS, findreps.FindRepeats, annotate.Annotate, abundance.Abundance, scaffold.Scaffold, findscforfs.FindScaffoldORFS, propagate.Propagate, classify.Classify, postprocess.Postprocess], verbose = 1)
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 2680, in pipeline_run
    raise errt
ruffus.ruffus_exceptions.RethrownJobError:

Exceptions running jobs for

'def findorfs.FindORFS(...):'

Original exception:

Exception #1
exceptions.ValueError(need more than 1 value to unpack):
for findorfs.FindORFS.Job = [proba.asm.contig -> proba.faa]

Traceback (most recent call last):
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 524, in run_pooled_job_without_exceptions
    return t_job_result(task_name, JOB_COMPLETED, job_name, return_value, None)
  File "/bio_bin/python26/lib/python2.6/contextlib.py", line 34, in __exit__
    self.gen.throw(type, value, traceback)
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 232, in do_nothing_semaphore
    yield
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 517, in run_pooled_job_without_exceptions
    return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/Utilities/ruffus/task.py", line 447, in job_wrapper_io_files
    ret_val = user_defined_work_func(*param)
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/src/findorfs.py", line 243, in FindORFS
    parse_fraggenescanout("%s/FindORFS/out/%s.orfs"%(_settings.rundir,_settings.PREFIX))
  File "/bioinformatics/asm/bio_bin/metAMOS/metAMOS-6b17a08-0.35/src/findorfs.py", line 191, in parse_fraggenescanout
    hdr,gene = seq.split("\n",1)
ValueError: need more than 1 value to unpack

Thanks
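The ValueError at findorfs.py line 191 means `seq.split("\n", 1)` received a chunk containing no newline. A guess at the failure mode, with a defensive version of the record split (a sketch, assuming the FragGeneScan output is being split on ">" headers; not the actual metAMOS code):

```python
def split_records(fasta_text):
    """Split FastA-style records, skipping chunks that lack a sequence body.

    Splitting on ">" can yield an empty leading chunk or a header-only
    trailing chunk; unpacking seq.split("\n", 1) on those raises
    "need more than 1 value to unpack", matching the traceback above.
    """
    records = []
    for seq in fasta_text.split(">"):
        if "\n" not in seq:            # empty or header-only chunk
            continue
        hdr, gene = seq.split("\n", 1)
        records.append((hdr.rstrip("\n"), gene))
    return records
```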

Create new importer for Amphora2

The current Krona importer for Amphora 2 uses only species-level classifications to build the visualization. Update the importer to record the lowest accurately classified level (by contig) from Amphora instead.

Assembly error in runPipeline

Hello, I am trying to run MetAMOS on our UC Davis servers, and I keep getting a persistent error as follows (the error is repeatable regardless of which Illumina dataset I try to process; it seems like a broken pipe somewhere?):

-bash-3.2$ initPipeline -1 Aphelenchus_1510-KO-4_L4_1.fastq -2 Aphelenchus_1510-KO-4_L4_2.fastq -d Aphelenhcus_4Mar -i 100:600 -q
Project dir /share/jumbo-0-1-scratch-2/hbik/Aphelenhcus_4Mar successfully created!
Use runPipeline.py to start Pipeline
-bash-3.2$ runPipeline -k 45 -d Aphelenhcus_4Mar/
Starting metAMOS pipeline
Warning: Newbler is not found, some functionality will not be available
Warning: FCP is not found, some functionality will not be available
Warning: PHmmer is not found, some functionality will not be available


Tasks which will be run:

Task = preprocess.Preprocess
Task = assemble.Assemble
Task = findorfs.FindORFS
Task = findreps.FindRepeats
Task = annotate.Annotate
Task = abundance.Abundance
Task = scaffold.Scaffold
Task = findscforfs.FindScaffoldORFS
Task = propagate.Propagate
Task = classify.Classify
Task = postprocess.Postprocess


Job = [[Aphelenchus_1510-KO-4_L4_1.fastq, Aphelenchus_1510-KO-4_L4_2.fastq] -> preprocess.success] completed
Completed Task = preprocess.Preprocess
Running SOAPdenovo on input reads...
Traceback (most recent call last):
  File "/home/koadman/software/metAMOS/runPipeline", line 358, in <module>
    pipeline_run([preprocess.Preprocess, assemble.Assemble, findorfs.FindORFS, findreps.FindRepeats, annotate.Annotate, abundance.Abundance, scaffold.Scaffold, findscforfs.FindScaffoldORFS, propagate.Propagate, classify.Classify, postprocess.Postprocess], verbose = 1)
  File "/home/koadman/software/metAMOS/Utilities/ruffus/task.py", line 2680, in pipeline_run
    raise errt
ruffus.ruffus_exceptions.RethrownJobError:
ruffus.ruffus_exceptions.RethrownJobError:

Exceptions running jobs for

'def assemble.Assemble(...):'

Original exception:

Exception #1
exceptions.ValueError(invalid literal for int() with base 10: 'ggaggdfadae]gggggcggfdfefbgggaffdcdfdffdaggggggg_ggdgggfggggffffdggf_ggggggggggg'):
for assemble.Assemble.Job = [[lib1.seq] -> [proba.asm.contig]]

Traceback (most recent call last):
  File "/home/koadman/software/metAMOS/Utilities/ruffus/task.py", line 517, in run_pooled_job_without_exceptions
    return_value = job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
  File "/home/koadman/software/metAMOS/Utilities/ruffus/task.py", line 447, in job_wrapper_io_files
    ret_val = user_defined_work_func(*param)
  File "/home/koadman/software/metAMOS/src/assemble.py", line 464, in Assemble
    map2contig()
  File "/home/koadman/software/metAMOS/src/assemble.py", line 110, in map2contig
    epos = int(spos)+len(read_seq)
ValueError: invalid literal for int() with base 10: 'ggaggdfadae]gggggcggfdfefbgggaffdcdfdffdaggggggg_ggdgggfggggffffdggf_ggggggggggg'

-bash-3.2$
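The ValueError above shows what looks like a quality string being parsed as a start position, which suggests a field or offset mix-up while reading the mapping output. Validating the field before conversion would at least fail fast with a clearer message; a sketch (function name hypothetical, not the map2contig fix itself):

```python
def safe_end_pos(spos, read_seq):
    """Compute the read end position, rejecting malformed position fields.

    If spos is not an integer (e.g. a quality string landed in the wrong
    column), raise a descriptive error instead of a bare int() failure.
    """
    if not spos.lstrip("-").isdigit():
        raise ValueError("malformed start position field: %r" % spos)
    return int(spos) + len(read_seq)
```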

Support gzipped input files

The default for Illumina instruments is now gzipped FASTQ files. Support these files in the pipeline. Rather than extracting the file in initPipeline, it would be better if the file could remain gzipped as long as possible to save space.
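Keeping the files gzipped is largely a matter of opening them transparently wherever the pipeline reads input; a sketch (helper name hypothetical):

```python
import gzip

def open_maybe_gzipped(path, mode="rt"):
    """Open a sequence file transparently, gzipped or not.

    Dispatches on the .gz extension; callers iterate lines the same way
    either branch is taken, so downstream code needs no changes.
    """
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)
```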

Parsing of MetaGeneMark output slow

Inefficient parsing of MetaGeneMark output to create the gene files and gene coverage file is a bottleneck in the pipeline, requiring an hour or more to parse 50-100K ORFs from the output file. We need to look closer at whether we can simply generate the files via the command line (probably not) or parse the output more efficiently.
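A single streaming pass that accumulates one ORF at a time usually removes this kind of bottleneck, since nothing is re-scanned. A sketch assuming a simplified FastA-like layout (the real MetaGeneMark output format differs):

```python
def parse_orfs(lines):
    """Single-pass ORF parse: O(total lines), no re-scanning of the file.

    Assumes ">" header lines followed by sequence lines -- a simplification
    of the actual MetaGeneMark output.
    """
    orfs = {}
    hdr = None
    chunks = []
    for line in lines:
        if line.startswith(">"):
            if hdr is not None:
                orfs[hdr] = "".join(chunks)
            hdr = line[1:].strip()
            chunks = []
        else:
            chunks.append(line.strip())
    if hdr is not None:            # flush the final record
        orfs[hdr] = "".join(chunks)
    return orfs
```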

Preprocess/out not properly initialized for unmated fastq files

There are no libX.seq/libX.seq.mates files created when running the pipeline on unmated FASTQ files. The following code should be added at line 1163 (in release 0.2):

elif lib.format == "fastq" and not lib.mated:
    run_process("ln -s %s/Preprocess/in/%s %s/Preprocess/out/lib%d.seq"%(rundir, lib.fq, rundir, lib.id), "Preprocess")
    run_process("touch %s/Preprocess/out/lib%d.seq.mates"%(rundir, lib.id), "Preprocess")

and line 1230 should be:

soapd = soapd.replace("LIB%dQ1REPLACE"%(lib.id), "%s/Preprocess/out/%s"%(rundir, lib.f1.fname))
