smashGC

smashGC is a tool that works downstream of antiSMASH 7 results to look for evidence of horizontal gene transfer (HGT) events.

The tool takes as input a folder containing one or more .gbff files and a single .tsv file describing the biosynthetic gene cluster (BGC) regions within the genomes. It was built around correlating the GC content of antiSMASH BGCs with the GC content of the host genome.

The tool then calculates the GC content of each BGC and of the whole genome it came from.
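The comparison described above can be sketched in a few lines of plain Python. This is an illustration only, not the code in smashGC.py: the function names `gc_fraction` and `bgc_vs_genome_gc` are hypothetical, the toy sequences are made up, and the coordinates are assumed to be 0-based and end-exclusive, which may differ from the convention the tool uses.

```python
def gc_fraction(seq):
    """Return the fraction of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Ambiguous bases such as N are ignored; an empty sequence gives 0.0.
    return gc / total if total else 0.0


def bgc_vs_genome_gc(contigs, locus, start, end):
    """Compare the GC of one region with the GC of all contigs combined.

    contigs: dict mapping contig name -> sequence string (hypothetical layout)
    start, end: assumed 0-based, end-exclusive coordinates on `locus`
    """
    region = contigs[locus][start:end]
    genome = "".join(contigs.values())
    return gc_fraction(region), gc_fraction(genome)


# Toy data, invented for illustration.
contigs = {"contig_1": "ATGCGCGCATAT", "contig_2": "ATATATAT"}
bgc_gc, genome_gc = bgc_vs_genome_gc(contigs, "contig_1", 2, 8)
print(f"BGC GC: {bgc_gc:.2f}, genome GC: {genome_gc:.2f}")
```

A region whose GC content departs strongly from that of its host genome is the kind of signal the tool is meant to surface.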

The output can be plotted to show the correlation between the two GC content values. For the example plot, antiSMASH was run on 616 enterococcal genomes, and smashGC was then run on those genomes together with a .tsv file describing the products.

Figure: example of a plot created with the tool.

Features

  • BGC Prediction Correlation: Correlates the GC content of antiSMASH-predicted BGCs with whole-genome GC content.

Requirements

  • Python 3.7.12
  • antiSMASH (for generating initial data)
  • Biopython

Installation

  1. Clone the repository to your local machine:
    git clone https://github.com/DEHourigan/smashGC.git
  2. Navigate to the cloned directory:
    cd smashGC
  3. Install the required dependencies:
    conda env create -f smashGC.yml
    conda activate smashGC

Usage

The tool comes with a Biopython script that pulls the necessary information out of antiSMASH results: the start and end of each region in the context of the whole genome.

  1. Prepare your data:

    • Ensure your .gbff files are located within a single folder.
    • Create a .tsv file containing the headers product, assembly, orig_start, orig_end, locus, filename as described:
      • product: Predicted by antiSMASH, e.g. lanthipeptide-i.
      • assembly: Accession of the genome.
      • orig_start and orig_end: antiSMASH regions in the context of the whole genome.
      • locus: Contig name.
      • filename: Name of each individual .gbff file.
  2. Run smashGC:

    python smashGC.py -f /path/to/folder -t /path/to/file.tsv

Replace /path/to/folder and /path/to/file.tsv with the actual paths to your .gbff files folder and .tsv file, respectively.
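Before running the tool, it can help to check that the .tsv file has the expected headers. The sketch below does this with the standard library. The helper name `read_bgc_table` and every value in the example row (accession, coordinates, file name) are hypothetical placeholders, and smashGC.py itself may parse the file differently.

```python
import csv
import io

# Column names taken from the data preparation step above.
REQUIRED = ["product", "assembly", "orig_start", "orig_end", "locus", "filename"]


def read_bgc_table(handle):
    """Read a tab-separated BGC table and check the required headers."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    rows = []
    for row in reader:
        # Coordinates are stored as text in the file; convert them to integers.
        row["orig_start"] = int(row["orig_start"])
        row["orig_end"] = int(row["orig_end"])
        rows.append(row)
    return rows


# Made-up example row; replace with open("/path/to/file.tsv", newline="").
example = (
    "product\tassembly\torig_start\torig_end\tlocus\tfilename\n"
    "lanthipeptide-i\tGCF_000000000.1\t1200\t24500\tcontig_1\texample.gbff\n"
)
rows = read_bgc_table(io.StringIO(example))
print(rows[0]["product"], rows[0]["orig_start"], rows[0]["orig_end"])
```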

Output

A .tsv file containing the GC content of each BGC alongside that of its genome. The results can be plotted from this file.
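As a rough sketch of how the two GC columns might be summarised before plotting, the snippet below computes a Pearson correlation coefficient in plain Python. The choice of Pearson is an assumption, since the README does not say which correlation measure is intended. The function name `pearson_r` is hypothetical and the numbers are invented for illustration, not results from the tool.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented values: one genome GC and one BGC GC per row of the output.
genome_gc = [0.36, 0.37, 0.38, 0.40]
bgc_gc = [0.33, 0.35, 0.34, 0.39]
r = pearson_r(genome_gc, bgc_gc)
print(f"Pearson r = {r:.3f}")
```

BGCs that fall far from the overall trend are the ones worth inspecting as possible HGT candidates.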

Contributing

DEHourigan

Contact

For any queries, please reach out via GitHub issues or directly to [email protected].

