galaxytools's Introduction

Galaxy Tool wrappers

This repository contains a variety of tools that can be installed and used inside Galaxy. Many tools are already included in the Galaxy Tool Shed; others need some more love.

Wrapping tools for use in Galaxy is easy! If you want to get started, please see the Galaxy wiki or get in contact. If you want to contribute to this repository, please see our contributing guidelines and our list of issues.

Highlights

  • ChemicalToolBox integrates many cheminformatics tools into Galaxy
  • RNA tools are integrated as part of the de.NBI RNA Bioinformatics Center
  • Genome Annotation tools

Other repositories with high quality tools

Running the tests

First, install Planemo. Run the following commands from the project root directory.

python3 -m venv planemo
. planemo/bin/activate
pip install planemo

To run the tests for a specific tool (e.g. pca) in a specific folder (e.g. sklearn):

planemo test --biocontainers tools/sklearn/pca.xml

To run the tests for all tools in a specific folder (e.g. sklearn):

planemo test --biocontainers tools/sklearn

To run the tests for all tools in all folders:

planemo test --biocontainers tools
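
The three invocations above differ only in the target path. As a minimal sketch, assuming you want to script them from Python, the helper below (the function name `planemo_test_command` is hypothetical) builds the same command lines without running anything:

```python
import shlex


def planemo_test_command(target):
    """Return the argv list for testing one tool file or folder.

    Only the `planemo test --biocontainers <target>` form shown in this
    README is used; no other flags are assumed.
    """
    return ["planemo", "test", "--biocontainers", target]


# The three targets mentioned in this README: one tool, one folder, everything.
targets = ["tools/sklearn/pca.xml", "tools/sklearn", "tools"]

for target in targets:
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(planemo_test_command(target)))
```

To actually run a command, the list could be passed to `subprocess.run(cmd, check=True)` from within the activated virtual environment.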

3 steps to get your tool into Galaxy - A real-world example

In this blog post, we will explain how you can get your software tool into a Galaxy server and with this, exposed to thousands of researchers. For this purpose, we will follow David’s steps to add the very generic UNIX diff tool to Galaxy.

The first step to getting your software tool deployed into a Galaxy instance is to develop a Conda package for it. Conda is the de facto standard in many different communities for deploying software easily and reproducibly. The European Galaxy team is heavily involved in the conda-forge and Bioconda projects, and Galaxy has built-in support for both channels. If your software tool is from the biomedical domain, we recommend the Bioconda channel. Otherwise, create a Conda package for conda-forge. Here, David has created the following Pull Request (PR) against the conda-forge repo:

Step 1 - the Conda package: conda-forge/staged-recipes#11170

Once the PR is merged, a diffutils repository is created and the Conda package is usually available within 30 minutes.

The second step is to create the Galaxy wrapper. A Galaxy wrapper is a formal description of all inputs, outputs and parameters of your tool, so that Galaxy can generate a GUI from it and later a command to send to the cluster. You will find a tutorial on how to create such a wrapper in the planemo documentation. The community has created a few best practices for Galaxy wrapper development, and we recommend following them, as this will ensure your tools are high-quality and can be deployed on the big public Galaxy servers. David has created the following PR against a public repository that collects a variety of different tools.
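
To make "a formal description of all inputs, outputs and parameters" more concrete, here is a rough sketch that assembles a minimal wrapper skeleton for diff with Python's standard library. It is not the wrapper from David's PR: the tool id, labels, parameter names and version string are all made up for illustration, and a real wrapper would also need tests, help text and proper handling of diff's exit codes.

```python
import xml.etree.ElementTree as ET


def build_diff_wrapper():
    """Build a minimal, illustrative Galaxy tool description for diff."""
    # id, name and version are hypothetical placeholders.
    tool = ET.Element("tool", id="diff_sketch", name="diff (sketch)", version="0.1.0")
    ET.SubElement(tool, "description").text = "compare two text files"

    # The Conda package from step 1 is declared as a requirement.
    requirements = ET.SubElement(tool, "requirements")
    ET.SubElement(requirements, "requirement", type="package").text = "diffutils"

    # The command template refers to the inputs and outputs declared below.
    # Note: diff exits with status 1 when the files differ; a real wrapper
    # has to account for that, which this sketch does not.
    ET.SubElement(tool, "command").text = "diff '$input1' '$input2' > '$diff_output'"

    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input1", type="data", format="txt", label="First file")
    ET.SubElement(inputs, "param", name="input2", type="data", format="txt", label="Second file")

    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="diff_output", format="txt")
    return tool


wrapper = build_diff_wrapper()
if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
    ET.indent(wrapper)
print(ET.tostring(wrapper, encoding="unicode"))
```

In practice the XML is written by hand (or scaffolded with planemo); generating it from Python here only serves to show the pieces a wrapper is made of.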

Step 2 - the Galaxy wrapper: #966

We recommend the submission of your tool to one of the bigger community projects like the ones listed below. This has the advantage that you will most likely get a review and can improve your tool, but also get some infrastructure for automated testing and ToolShed deployment for free.

Other repositories with Galaxy tools:

Once David’s Galaxy wrapper PR passed all tests and was merged, it was automatically pushed to the Galaxy ToolShed, an app store for Galaxy. From there, every Galaxy instance can install tools (apps).

Furthermore, a bot automatically creates (Bio)Containers (Docker, rkt and Singularity) by tracking all Galaxy tools, ensuring that a container exists for each tool. You can see the bot in action in the following PR:

Automatic containers: BioContainers/multi-package-containers#1236

Last but not least, David wanted to get the Galaxy diff tool into the European Galaxy server. For that, a new PR was created against the tool repository from usegalaxy-eu.

Step 3 - request for installation: usegalaxy-eu/usegalaxy-eu-tools#318

Once this is merged, another bot automatically installs all new tools and tool updates every Saturday. The installed diff tool can then be used on the European Galaxy server via this link: https://usegalaxy.eu/root?tool_id=diff

That's it - 3 steps to get your tool exposed to thousands of researchers!

galaxytools's People

Contributors

anuprulez, beatrizserrano, bebatut, bernt-matthias, bgruening, damcorreia, dyusuf, eggzilla, elischberg, erxleben, gallardoalba, gregvonkuster, hassanamr, hexylena, jfallmann, lorrainealisha75, martin-raden, mblue9, michauhl, mmiladi, nsoranzo, pkohvaei, qiagu, simonbray, smithcr, sunyi000, tdudgeon, torhou, yhoogstrate, zwanli

galaxytools's Issues

restructure repository

galaxytools should be restructured to look and feel like the tools-iuc and tools-devteam repositories.

To reflect the different areas of tools and to keep the links stable I will have different tools directories.

  • rna-tools
  • chemicaltoolbox

Remove useless script from the root and improve the readme file.

opt_validate error on command line

Got this error trying to implement MACS2 callpeak...this doesn't happen with MACS1.4:

Traceback (most recent call last):
  File "/local/cluster/bin/macs2", line 362, in <module>
    main()
  File "/local/cluster/bin/macs2", line 45, in main
    run( args )
  File "/local/cluster/lib/python2.7/site-packages/MACS2/callpeak.py", line 55, in run
    options = opt_validate( args )
  File "/local/cluster/lib/python2.7/site-packages/MACS2/OptValidator.py", line 184, in opt_validate
    if options.refine_peaks:
AttributeError: 'Namespace' object has no attribute 'refine_peaks'
Full Command: macs2 callpeak -t aGFP_align_sorted.bam -c input_align_sorted.bam -f BAM -g 245000000 -p 1.00e-5 -n aGFP --verbose 3

aragorn wrapper update

@erasche I thought a little bit more about it, and I think we should remove the FASTA output completely and offer a tool which extracts the correct sequences from the input FASTA + the GFF file. So what we need is one wrapper that extracts sequences from a FASTA file given a corresponding GFF/GTF/BED file.
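
As a rough illustration of what such an extraction tool would do, here is a minimal pure-Python sketch for the GFF case only. It assumes the standard nine-column, tab-separated GFF layout with 1-based inclusive coordinates; the function names are hypothetical and this is not an existing wrapper in this repository.

```python
def parse_fasta(text):
    """Parse FASTA text into {sequence id: sequence}."""
    sequences, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # id is the first word of the header
            sequences[current] = []
        elif current is not None:
            sequences[current].append(line)
    return {name: "".join(parts) for name, parts in sequences.items()}


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_features(fasta_text, gff_text):
    """Return (label, sequence) pairs for every feature line in the GFF text."""
    sequences = parse_fasta(fasta_text)
    extracted = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        seqid, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        # GFF coordinates are 1-based and inclusive.
        subseq = sequences[seqid][start - 1:end]
        if strand == "-":
            subseq = reverse_complement(subseq)
        extracted.append(("%s:%d-%d(%s)" % (seqid, start, end, strand), subseq))
    return extracted


fasta = ">chr1 example\nACGTACGTAA\n"
gff = "chr1\tsketch\ttRNA\t2\t5\t.\t+\t.\tID=a\nchr1\tsketch\ttRNA\t2\t5\t.\t-\t.\tID=b\n"
for label, seq in extract_features(fasta, gff):
    print(label, seq)
```

GTF and BED input would need their own coordinate handling (BED, for instance, is 0-based and half-open), which this sketch leaves out.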

aragorn_out_to_gff3.py: Handle introns

Does aragorn_out_to_gff3.py handle introns? It doesn't appear to after scanning the code. Feature request? Here's an example of a few tRNA with introns.

3   tRNA-Val              c[883351,883869]  35      (tac)i(40,443)
5   tRNA-seC               [244644,244785]  87      (tca)i(29,57)
7   tRNA-Arg               [615338,616255]  36      (tcg)i(40,822)
2   tRNA-Ser               [324667,326116]  35      (gga)i(40,1354)
1   tRNA-Val                c[84563,87002]  2383    (tac)i(34,2347)
1   tRNA-Asp              c[114384,114687]  39      (atc)i(43,206)
4   tRNA-seC               [324290,324369]  36      (tca)i(38,6)
5   tRNA-Ser               [345441,345776]  28      (gct)i(33,251)
7   tRNA-Val               [405066,405293]  34      (cac)i(35,133)
3   tRNA-Asn               [367853,369971]  2062    (att)i(32,2028)
3   tRNA-Pro               [234486,236430]  1887    (agg)i(31,1856)
1   tRNA-Ser                 [66020,66741]  32      (gct)i(37,633)
2   tRNA-Leu                 [84137,84224]  36      (tag)i(38,10)
3   tRNA-Arg               [168865,170209]  1285    (gcg)i(32,1253)
4   tRNA-His               [175766,176602]  776     (gtg)i(34,742)
1   tRNA-seC                c[88201,88280]  36      (tca)i(38,6)
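
For anyone picking this up, the intron annotation in these lines can be pulled apart with a regular expression. The sketch below is not taken from aragorn_out_to_gff3.py; the field names are my own, and it assumes that a leading `c` marks the complement strand and that `i(a,b)` gives the intron position within the tRNA followed by the intron length (which fits the coordinates above, e.g. 519 bp total for the first entry with a 443 bp intron).

```python
import re
from dataclasses import dataclass
from typing import Optional

# index, name, optional "c" + [start,end], a numeric column, (anticodon), optional i(pos,len)
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(tRNA-\S+)\s+(c?)\[(\d+),(\d+)\]\s+(\d+)\s+\((\w+)\)"
    r"(?:i\((\d+),(\d+)\))?\s*$"
)


@dataclass
class TrnaHit:
    index: int
    name: str
    strand: str
    start: int
    end: int
    anticodon_pos: int  # assumed meaning of the fourth column
    anticodon: str
    intron_pos: Optional[int]  # assumed: offset of the intron within the tRNA
    intron_len: Optional[int]  # assumed: intron length in bp


def parse_hit(line):
    """Parse one ARAGORN batch-output hit line, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    idx, name, comp, start, end, pos, anticodon, i_pos, i_len = match.groups()
    return TrnaHit(
        index=int(idx),
        name=name,
        strand="-" if comp else "+",
        start=int(start),
        end=int(end),
        anticodon_pos=int(pos),
        anticodon=anticodon,
        intron_pos=int(i_pos) if i_pos is not None else None,
        intron_len=int(i_len) if i_len is not None else None,
    )


print(parse_hit("3   tRNA-Val              c[883351,883869]  35      (tac)i(40,443)"))
```

Converting the intron into genomic coordinates for GFF3 output would still need the strand-aware arithmetic, which is left open here.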

Prinseq bowtie2 dep in tool_dependencies.xml

Presently prinseq/tool_dependencies.xml downloads and installs bowtie2 v2.1.0, which may be provided instead by devteam's package_bowtie2_2_1_0 in the Tool Shed.
I just noticed this while looking at the file; I have not tested the possible change.

GATK2 Unified Genotyper creates too many threads

My colleague Paolo Uva noticed that the gatk2_unified_genotyper tool creates a number of threads which is 6 times the number of cores it has been assigned in job_conf.xml (i.e. $GALAXY_SLOTS). This is due to the options "--num_threads ${GALAXY_SLOTS:-4}" (specified by the macro @threads@) and "--num_cpu_threads_per_data_thread 6".

When this job runs on a machine where it is supposed to occupy only part of the available CPU cores, it harms the performance of other jobs.

The easiest solution is obviously to change the second option to "--num_cpu_threads_per_data_thread 1".
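
The arithmetic behind this can be sketched in a few lines. The helper names are hypothetical; the default of 4 slots mirrors the `${GALAXY_SLOTS:-4}` fallback in the macro.

```python
import os


def resolve_slots(env=None):
    """Mimic ${GALAXY_SLOTS:-4}: use GALAXY_SLOTS if set and non-empty, else 4."""
    env = os.environ if env is None else env
    return int(env.get("GALAXY_SLOTS") or 4)


def total_threads(num_threads, cpu_threads_per_data_thread):
    """Total threads = data threads x CPU threads per data thread."""
    return num_threads * cpu_threads_per_data_thread


slots = resolve_slots({"GALAXY_SLOTS": "4"})
print("current setting: ", total_threads(slots, 6), "threads for", slots, "slots")
print("proposed setting:", total_threads(slots, 1), "threads for", slots, "slots")
```

With 4 slots, the current options ask for 24 threads, while the proposed setting keeps the job within the 4 cores it was assigned.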

GATK: missing --interval_padding common option

Option --interval_padding, which is accepted by all GATK tools, is missing.

The param is a non-negative integer, default is 0. Help text should be "Amount of padding (in bp) to add to each interval" .

It should be added to standard_gatk_options and gatk_param_type_conditional macro.

package_blat_35x1 broken

orphan_tool_dependencies/package_blat_35x1/tool_dependencies.xml contains:

<url_template os="linux" architecture="x86_64">https://github.com/bgruening/download_store/raw/master/ucsc/linux/x86_64/blat_35x1</url_template>
<url_template os="darwin" architecture="i686">https://github.com/bgruening/download_store/raw/master/ucsc/darwin/i386/blat_35x1</url_template>
<url_template os="darwin" architecture="i386">https://github.com/bgruening/download_store/raw/master/ucsc/darwin/i386/blat_35x1</url_template>
<url_template os="darwin" architecture="x86_64">https://github.com/bgruening/download_store/raw/master/ucsc/darwin/x86_64/blat_35x1</url_template>

but since bgruening/download_store@bea3c95 all these files have been removed.

RNACode - output columns

The documentation of the tool specifies 10 output columns, but 11 are actually produced. Columns 2 and 3 should be combined.

Improve the unix tools

The Unix tools are now known as text-processing tools. We need to add some tests and the missing tools.

  • tool tests
  • missing tools ?
  • send Assaf Gordon a beer for his initial wrappers

MSA Datatypes, "infernal"

On L18 of msa/datatypes_conf.xml, the sniffer is listed as

<sniffer type="galaxy.datatypes.infernal:Stockholm_1_0"/>

whilst the actual datatype is listed as

<datatype extension="stockholm" type="galaxy.datatypes.msa:Stockholm_1_0" display_in_upload="True" />

Should galaxy.datatypes.infernal be galaxy.datatypes.msa?
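
A mismatch like this can be spotted mechanically. The following is a heuristic sketch, not part of Galaxy: it flags every sniffer whose `type` does not appear on any datatype element in the same file. The wrapping `datatypes`, `registration` and `sniffers` elements in the sample are assumed from the usual layout of datatypes_conf.xml.

```python
import xml.etree.ElementTree as ET


def unmatched_sniffers(xml_text):
    """Return sniffer types that no <datatype> in the same file declares."""
    root = ET.fromstring(xml_text)
    datatype_types = {d.get("type") for d in root.iter("datatype")}
    return [
        s.get("type")
        for s in root.iter("sniffer")
        if s.get("type") not in datatype_types
    ]


# Sample built from the two lines quoted in this issue.
sample = """
<datatypes>
  <registration>
    <datatype extension="stockholm" type="galaxy.datatypes.msa:Stockholm_1_0" display_in_upload="True" />
  </registration>
  <sniffers>
    <sniffer type="galaxy.datatypes.infernal:Stockholm_1_0"/>
  </sniffers>
</datatypes>
"""

print(unmatched_sniffers(sample))
```

A sniffer may legitimately refer to a datatype registered elsewhere, so a hit from this check is a prompt to look, not proof of a bug.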

Add tools tests

Tracking issue for missing tool tests.

  • suspenders
  • blockbuster
  • diffbind
  • ChemicalToolBox

Getting biopython tools into the TS

@peterjc @erasche (because you are both heavily involved in genome annotation).
We are running a genome annotation workshop here at the end of this month, and I would like to get our biopython tools from https://github.com/bgruening/galaxytools/tree/master/tools/biopython into the TS before then. I started this many years ago but never put them into the TS.

I'm planning to add

  • tests
  • tool dependencies
  • citations
  • macros

What is your opinion on one biopython tool repository vs. many single repositories?
Should I put everything in one GitHub folder and try the planemo demultiplexing feature to create several TS repositories?
Are these tools good enough?
Should I skip a few (e.g. translate?), because they are already in EMBOSS?
General comments?

Useful imaging tools for Brain data

Brain cell finder is a tool for fully automated localization of soma in 3D mouse brain images acquired by confocal light sheet microscopy.

https://github.com/paolo-f/bcfind

Started here:
https://github.com/bgruening/galaxytools/blob/master/tools/image_processing/bcfind/bcfind.xml

GotohScan

The GotohScan program is a search tool that finds shorter sequences
(usually genes) in large database sequences (chromosomes, genomes, ..)
by computing all semi-global alignments. Thus, the query sequence is
never truncated or split into subsequences, but always mapped to the
database over its complete length. The alignment is computed via the
Gotoh-alignment algorithm using affine gap costs.

http://www.bioinf.uni-leipzig.de/Software/GotohScan/

lncPro

Predicting the interaction between long noncoding RNAs and proteins. By coding RNA and protein sequences into vectors, we use matrix multiplication to score each RNA-protein pair. This score serves as a measure of the interaction between the RNA and the protein.

http://bioinfo.bjmu.edu.cn/lncpro/
Requested by Gregor Klaus via mail.

SAMblaster

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. When marking duplicates, samblaster will require approximately 20MB of memory per 1M read pairs.

https://github.com/GregoryFaust/samblaster

Integrate tbl2asn into Galaxy

Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin but is driven generally by data files. Tbl2asn generates .sqn files for submission to GenBank. Additional manual editing is not required before submission.

http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/

Samtools does not build on centos

I see this error:
bam_tview_curses.c:5:20: error: curses.h: No such file or directory
bam_tview_curses.c:7:2: warning: #warning "_CURSES_LIB=1 but NCURSES_VERSION not defined; tview is NOT compiled"
bam_tview_curses.c:287:2: warning: #warning "No curses library is available; tview with curses is disabled."
make[1]: *** [bam_tview_curses.o] Error 1
make: *** [all-recur] Error 1

no curses library (or ncurses)...

I can build manually if I set -D_CURSES_LIB=0 and comment out -lcurses.

IPyNB HTML report

(03:39:04 PM) jmchilton: erasche2: Can you just build the HTML report inside the docker container at the end and push it out instead of requiring ipython on the machine (and executing arbitrary code outside a docker container).
(03:39:26 PM) jmchilton: That is my last big concern with the datatypes.
