README for Golden Gate Cloning Junction Sequence Checker and Generator

Please read the following article for a precis of Golden Gate cloning.

This is a two-tool program that:

  1. Checks whether the provided 4nt sequences (i.e. the overhangs left by a Type IIS restriction enzyme that produces a 4nt overhang) are compatible with one another in an in-silico cloning design.
  2. Generates a set of compatible 4nt overhang sequences from a list of 2AA sequences that you provide. This is useful when you want to dictate the amino acids used at the junction sites.

Below is a list of common Type IIS restriction enzymes that leave a 4nt overhang sequence:

This program is tailored for Type IIS restriction enzymes that leave a 4nt overhang sequence. There are plans to extend the program to cover Type IIS restriction enzymes that leave 3nt overhang sequences and overhang sequences >4nt in length.

Clone the repository and run AA_to_NT_generator.py and NT_checker.py from the repository directory. This is imperative because the scripts use relative paths to the .csv and .json files, which are used for sequence input, verification, and output.

CSV: Please format your CSV as follows. (Note: an example 'input.csv' is provided in the /CSV directory.)

  • The column headers are not case sensitive; however, col 1 must be named 'NT sequence' (for the NT_checker.py script) and col 2 must be named 'AA sequence' (for the AA_to_NT_generator.py script).
  • In your CSV file, each row corresponds to an individual sequence (whether NT or AA) that is to be checked.
    • Please list all NT sequences in column one. The entries are not case sensitive, but make sure to write one 4nt sequence per row. At least two rows are required, or an exception will be thrown.
    • For the AA column, please write each AA sequence as two single-letter AA codes. Exactly two amino acids must be written per row. The cells are not case sensitive. More than two rows are required to run the program, or an exception will be thrown. (Please reference the figure below.)
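The CSV rules above can be sketched as a small validating reader. This is only an illustration under stated assumptions: `read_column` and its parameters are hypothetical names, not functions from the repository, and the minimum row count is left as a parameter because the README states it separately for each column.

```python
import csv
import io

def read_column(csv_text, header, width, min_rows=2):
    """Return the upper-cased, non-empty cells found under `header`.

    Hypothetical helper: header lookup is case-insensitive and every
    cell must be exactly `width` characters long.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Map lower-cased header names back to the names used in the file.
    names = {name.strip().lower(): name for name in (reader.fieldnames or [])}
    key = names.get(header.lower())
    if key is None:
        raise ValueError(f"missing column {header!r}")
    values = []
    for row in reader:
        cell = (row[key] or "").strip().upper()
        if not cell:
            continue  # skip blank cells
        if len(cell) != width:
            raise ValueError(f"{cell!r} must be exactly {width} characters")
        values.append(cell)
    if len(values) < min_rows:
        raise ValueError(f"at least {min_rows} sequences are required")
    return values

sample = "NT sequence,AA sequence\nagcg,GS\nTGCG,ga\n"
nt_sequences = read_column(sample, "nt sequence", width=4)
aa_pairs = read_column(sample, "AA sequence", width=2)
```

Normalising everything to upper case once, at read time, keeps the later comparison steps free of case handling.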

JSON: There is a JSON file 'gg_ntdict.json' in the /JSON directory that is used to search a library of established nucleotide sequences for a sequence of amino acids. Do not change the name of this file or an exception will be thrown!

Running the program: Please run the program in your terminal, and ensure your working directory is set to the repository. Run the AA_to_NT_generator.py script by typing 'python AA_to_NT_generator.py' in your terminal. Run the NT_checker.py script by typing 'python NT_checker.py' in your terminal. For both scripts, a prompt will appear that requests the .csv filename for your input sequence(s). You may enter the filename either with or without the .csv extension.
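Accepting the filename with or without its extension could be handled along these lines. This is a minimal sketch: `resolve_csv_path` is a hypothetical name, and the `CSV` directory default is an assumption based on the /CSV directory mentioned above.

```python
from pathlib import Path

def resolve_csv_path(user_input, csv_dir="CSV"):
    """Build the input path, appending '.csv' only when it is missing."""
    name = user_input.strip()
    if not name.lower().endswith(".csv"):
        name += ".csv"
    # Relative path, so the script must be run from the repository directory.
    return Path(csv_dir) / name
```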

NT_checker.py: This script looks at the content in col1 (where only NT sequences are listed). A detailed result is printed to your terminal. If the NT sequences fail, the reasons for failure are outlined in the terminal.

AA_to_NT_generator.py: This script looks at the content in col2 (where only AA sequences are listed). A load bar and the script status are printed to your terminal. If the script completes successfully, you will find a summary .csv file written to your CSV directory, named '"input file name"_results.csv'. Please use this file as a reference when working with the NT sequences generated from your input AA sequences (please reference the figure below).

The beauty of the AA_to_NT_generator.py script is its memoization. Whether the AA sequence checks succeed or fail, the script appends the results to an expanding dictionary stored in the JSON file. On startup, the script first checks the JSON file for the input AA sequences, i.e. it will never check the same set of 2AA sequences twice. The more the script is run with different AA combinations, the more extensive your dictionary becomes.
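The memoization idea can be sketched as a JSON-backed lookup. The layout of 'gg_ntdict.json' is not shown here, so the key format, the function names, and the `fake_search` stand-in below are all assumptions made for illustration.

```python
import json
import os
import tempfile

def load_cache(path):
    """Read the cache dictionary, or start empty if the file is absent."""
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        return json.load(handle)

def lookup_or_compute(aa_pairs, path, compute):
    """Return the cached result for `aa_pairs`, computing it at most once."""
    cache = load_cache(path)
    # Hypothetical key format: upper-cased pairs joined by commas.
    key = ",".join(pair.upper() for pair in aa_pairs)
    if key not in cache:
        # Successes and failures are both stored, so neither is re-checked.
        cache[key] = compute(aa_pairs)
        with open(path, "w") as handle:
            json.dump(cache, handle, indent=2)
    return cache[key]

calls = []

def fake_search(pairs):
    """Stand-in for the real search; None represents a failed search."""
    calls.append(tuple(pairs))
    return None

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "demo_cache.json")
    first = lookup_or_compute(["GS", "GA"], cache_path, fake_search)
    second = lookup_or_compute(["gs", "ga"], cache_path, fake_search)
```

In this demo the second call is answered from the file, so `fake_search` runs only once even though the stored result is a failure.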

Criteria for Selection

For example, let's compare two 3' to 5' sequences:

  1. AGCG
  2. TGCG

Rule 1: Compare the two sequences nucleotide for nucleotide, counting direct NT matches in the 3' to 5' direction. Note: we are not comparing complementary similarity. No more than two matches are allowed. Two offset comparisons, with one sequence shifted by one position in either direction, are employed as well. If any of these comparisons shows more than two matches, the sequence (in its entirety) will be deemed void.

A) 3'- AGCG -5'
       ||||
   3'- TGCG -5'
   3 of 4 NT match - no good [FAIL]

B) 3'- AGCG -5'
       |||
  3'- TGCG -5'
   0 of 4 NT match - good

C) 3'- AGCG -5'
        |||
    3'- TGCG -5'
   0 of 4 NT match - good
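Rule 1 can be expressed as three position-wise comparisons. The sketch below is an interpretation of the diagrams above, not the repository's code; `count_matches` and `passes_rule_1` are hypothetical names, and the one-position offsets are read off the diagrams.

```python
def count_matches(a, b, offset=0):
    """Count identical bases with `b` shifted `offset` positions to the right."""
    return sum(
        1
        for i, base in enumerate(a)
        if 0 <= i - offset < len(b) and b[i - offset] == base
    )

def passes_rule_1(a, b, max_matches=2):
    """True when the aligned and both offset comparisons stay within the limit."""
    return all(count_matches(a, b, off) <= max_matches for off in (-1, 0, 1))

aligned = count_matches("AGCG", "TGCG")        # comparison A
shift_left = count_matches("AGCG", "TGCG", -1)  # comparison B
shift_right = count_matches("AGCG", "TGCG", 1)  # comparison C
rule_1_ok = passes_rule_1("AGCG", "TGCG")
```

For the example pair, the aligned comparison yields three matches and both offsets yield none, so the pair fails Rule 1.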

Rule 2: As in Rule 1, compare the sequences NT for NT, but first convert the second sequence to its reverse complement. Aside from this change, all conditions are identical to Rule 1.

A) 3'- AGCG -5'
       ||||
   3'- CGCA -5'
   2 of 4 NT match - good

B) 3'- AGCG -5'
       |||
  3'- CGCA -5'
   0 of 4 NT match - good

C) 3'- AGCG -5'
        |||
    3'- CGCA -5'
   0 of 4 NT match - good

For this example, Rule 2 did not fail, but Rule 1 did. If this pair were checked in the NT_checker.py script, a warning would be printed to the terminal stating that Rule 1 was violated. If this set of sequences were to arise in AA_to_NT_generator.py, the generated NT sequence would not be considered, and a new set of AA-compatible NT sequences would be checked.

Rule 3: Palindromic sequences will be rejected. If checked in the NT_checker.py script, a warning will be printed to the terminal. For AA_to_NT_generator.py, the generated palindromic NT sequence will not be considered, and a new set of AA-compatible NT sequences will be checked.
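Rule 2 reuses the same comparison after taking the reverse complement of the second sequence. The following self-contained sketch uses hypothetical names (`reverse_complement`, `max_matches`, `passes_rule_2`) and assumes one-position offsets, as drawn in the diagrams.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def max_matches(a, b):
    """Largest identical-base count over the aligned and two offset comparisons."""
    best = 0
    for offset in (-1, 0, 1):
        hits = sum(
            1
            for i, base in enumerate(a)
            if 0 <= i - offset < len(b) and b[i - offset] == base
        )
        best = max(best, hits)
    return best

def passes_rule_2(a, b, limit=2):
    """Compare `a` against the reverse complement of `b`."""
    return max_matches(a, reverse_complement(b)) <= limit

rc = reverse_complement("TGCG")
rule_2_ok = passes_rule_2("AGCG", "TGCG")
```

Here `rc` is the CGCA sequence used in the Rule 2 diagrams, and the pair passes because the best comparison has exactly two matches.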

Palindromic sequence:

3'- AGCT -5', which in reverse complement, is:
3'- AGCT -5'
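A palindrome check amounts to comparing a sequence with its own reverse complement. This is a minimal sketch; `is_palindromic` is a hypothetical name rather than a function from the repository.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def is_palindromic(seq):
    """True when the sequence equals its own reverse complement."""
    seq = seq.upper()
    return seq == seq.translate(COMPLEMENT)[::-1]

# Enumerate every 4nt sequence and keep the palindromic ones.
palindromes = [
    "".join(bases)
    for bases in product("ACGT", repeat=4)
    if is_palindromic("".join(bases))
]
```

Of the 256 possible 4nt sequences, 16 are palindromic, since the last two bases are fully determined by the first two.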

This is the summary for now.

Contributors

irahorecka
