crypt-dnasequence's Introduction

Encrypt and decrypt strings to DNA Sequences

SYNOPSIS

use Crypt::DNASequence;
Crypt::DNASequence->encrypt($input_filename);
Crypt::DNASequence->decrypt($encrypted_file);

DESCRIPTION

The module is naive and just for fun. It transforms text strings into DNA sequences. A DNA sequence is composed of four nucleotides which are represented as A, T, C, G. If we transform "abcdefghijklmnopqistuvwxyzABCDEFGHIJKLMNOPQISTUVWXYZ", the corresponding sequence would be:

TAGACCTGTCGGTGGGTGGTTCCCCGTATGAAGGGGGGGGCTATGAGTCTACAGACTGACAGGGGGAGTC
AGCGAGTTAGCTAGTAAGCAAGTCCGCCCGTGCGCGCGTTCGCTCGTACGCACGTCGTCCGTTGCGCGGT
CGGTCTGTTAGTCAGTTCTTCCTTTGTTCGACGTACAGACGTACATACGAACAAACGCCCACCCGGCCAG
CCGTCCATCCGACCAACCGCGGCCGGTGCCAGGGCGGGCTGGTAGGCAGGTCTGCCTGTGTGCGGGGTGG
GGCCCACGCCGCACAGCAGCAGGGGGGGGGGGGGGC

or its complement:

CGAGTTCACTAACAAACAACCTTTTACGCAGGAAAAAAAATCGCAGACTCGTGAGTCAGTGAAAAAGACT
GATAGACCGATCGACGGATGGACTTATTTACATATATACCTATCTACGTATGTACTACTTACCATATAAC
TAACTCACCGACTGACCTCCTTCCCACCTAGTACGTGAGTACGTGCGTAGGTGGGTATTTGTTTAATTGA
TTACTTGCTTAGTTGGTTATAATTAACATTGAAATAAATCAACGAATGAACTCATTCACACATAAAACAA
AATTTGTATTATGTGATGATGAAAAAAAAAAAAAAT

The transformation is not unique because the mapping is chosen at random, but every transformed sequence decrypts correctly to the original string.

ALGORITHM

When encrypting, the text file is first compressed with gzip. When decrypting, the DNA sequence is first decoded into a gzip file, which is then uncompressed into the original text file.
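The compression step can be sketched as below. This is only an illustration, assuming the core modules IO::Compress::Gzip and IO::Uncompress::Gunzip and in-memory buffers; the source does not say which gzip interface the module itself uses, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Sample input; the module works on files, a string is used here for brevity.
my $text = "hello, DNA\n" x 3;

# Compress the plain text into an in-memory gzip buffer.
my $gz;
gzip(\$text => \$gz) or die "gzip failed: $GzipError";

# Reverse the step: uncompress the buffer back into plain text.
my $restored;
gunzip(\$gz => \$restored) or die "gunzip failed: $GunzipError";

print $restored eq $text ? "round trip ok\n" : "mismatch\n";
```

The bytes in `$gz` are what would then be translated into nucleotides.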

The algorithm behind the module is simple. Two binary bits represent one nucleotide, such as '00' for A and '01' for C. If you have some knowledge of molecular biology, you know that A only pairs with T and C only pairs with G. So if '00' is chosen to be A, then '11' should be used to represent T. In the module, the correspondence between bit pairs and nucleotides is assigned randomly. The information needed to recover this correspondence dictionary is also stored in the final sequence.
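One way to generate such a dictionary is sketched here. It assumes that complementary bit pairs ('00'/'11' and '01'/'10') must always map to complementary bases (A/T and C/G), as the paragraph above suggests; `random_dict` is a hypothetical helper, not a function of the module.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Build a random bit-pair => nucleotide dictionary in which complementary
# bit pairs (00/11, 01/10) map to complementary bases (A/T, C/G).
sub random_dict {
    # Randomly decide which bit-pair group goes with A/T and which with C/G.
    my @bit_groups  = shuffle(['00', '11'], ['01', '10']);
    my @base_groups = (['A', 'T'], ['C', 'G']);

    my %dict;
    for my $i (0, 1) {
        # Randomly decide the orientation inside each group.
        my @bits = shuffle(@{ $bit_groups[$i] });
        @dict{@bits} = @{ $base_groups[$i] };
    }
    return \%dict;
}

my $dict = random_dict();
print "$_ => $dict->{$_}\n" for sort keys %$dict;
```

Under this assumption there are eight possible dictionaries, and flipping both bits of a key always yields the complementary base.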

Here is the procedure for encryption:

  1. Split the string into a set of letters or characters.
  2. For each letter, convert it to its binary form and translate every two bits into A, T, C or G using a randomly generated dictionary. The dictionary may look like:
$dict = { '00' => 'A',
          '11' => 'T',
          '01' => 'C',
          '10' => 'G' };
  3. Join the A, T, G, C into a single sequence.
  4. Find the first nucleotide of the sequence.
  5. Count how many times that nucleotide occurs in the sequence.
  6. There is a database storing all arrangements of '00', '11', '01', '10'.
  7. Calculate an index value from the count of the first nucleotide, modulo the number of arrangements.
  8. Retrieve the arrangement at that index and map it through the dictionary to get four nucleotides. E.g. the first nucleotide of the sequence is G. The number of G in the sequence is 40. The number of arrangements in the database is 24. The index value is then 40 % 24 = 16. The 16th arrangement is retrieved and may look like ['01', '11', '10', '00']. The four items in the array are mapped through the dictionary to four nucleotides such as CTGA. Note that this information is used in the decryption procedure.
  9. Put the first two of these nucleotides at the beginning of the sequence and the last two at the end of the sequence.
  10. That is the final sequence.
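The encryption steps above can be sketched as follows. This is an illustration rather than the module's own code: it skips the gzip step, assumes a non-empty input, and assumes a particular order for the arrangement database (the source does not state it). `arrangements` and `encode` are hypothetical names.

```perl
use strict;
use warnings;

# All 24 orderings of the four bit pairs. The order of this list is an
# assumption; encryption and decryption only need to agree on it.
sub arrangements {
    my ($done, @rest) = @_;
    return [@$done] unless @rest;
    return map {
        my @r = @rest;
        my ($pick) = splice(@r, $_, 1);
        arrangements([@$done, $pick], @r);
    } 0 .. $#rest;
}
my @ARRANGEMENTS = arrangements([], '00', '11', '01', '10');

sub encode {
    my ($text, $dict) = @_;

    # Steps 1-3: characters -> bits -> one nucleotide per two bits.
    my $bits = unpack('B*', $text);
    my $seq  = join '', map { $dict->{$_} } $bits =~ /../g;

    # Steps 4-7: the count of the first nucleotide selects an arrangement.
    my $first = substr($seq, 0, 1);
    my $count = () = $seq =~ /$first/g;
    my $index = $count % @ARRANGEMENTS;

    # Step 8: map the chosen arrangement through the dictionary.
    my @key = map { $dict->{$_} } @{ $ARRANGEMENTS[$index] };

    # Steps 9-10: wrap the sequence with the four key nucleotides.
    return join '', @key[0, 1], $seq, @key[2, 3];
}

# The example dictionary from step 2.
my $dict = { '00' => 'A', '11' => 'T', '01' => 'C', '10' => 'G' };
print encode('Hi', $dict), "\n";
```

Because the four key nucleotides are the dictionary images of a known arrangement, they are enough to rebuild the dictionary later without storing it separately.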

Here is the procedure for decryption:

  1. Extract the first two and the last two nucleotides from the sequence, e.g. CT and GA. What remains is the real sequence.
  2. Count the occurrences of the first nucleotide in the real sequence, e.g. 40 for G.
  3. Use this number to calculate the index into the arrangement database, e.g. 16.
  4. Rebuild the dictionary, i.e. generate it from the 16th arrangement ['01', '11', '10', '00'] and CTGA.
  5. Translate the DNA sequence into binary form according to the dictionary, and finally into the original format.
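The decryption steps can be sketched in the same spirit. As before, this is an illustration under assumptions: no gzip step, an assumed order for the arrangement database, and hypothetical names `arrangements` and `decode`. It will not decode output of the real module, whose arrangement order and compression step are not reproduced here.

```perl
use strict;
use warnings;

# All 24 orderings of the four bit pairs, in an assumed order that must
# match whatever order was used when encrypting.
sub arrangements {
    my ($done, @rest) = @_;
    return [@$done] unless @rest;
    return map {
        my @r = @rest;
        my ($pick) = splice(@r, $_, 1);
        arrangements([@$done, $pick], @r);
    } 0 .. $#rest;
}
my @ARRANGEMENTS = arrangements([], '00', '11', '01', '10');

sub decode {
    my ($dna) = @_;

    # Step 1: peel off the four key nucleotides.
    my @key = split //, substr($dna, 0, 2) . substr($dna, -2);
    my $seq = substr($dna, 2, length($dna) - 4);

    # Steps 2-3: count the first nucleotide to find the arrangement index.
    my $first = substr($seq, 0, 1);
    my $count = () = $seq =~ /$first/g;
    my $arr   = $ARRANGEMENTS[ $count % @ARRANGEMENTS ];

    # Step 4: rebuild the nucleotide => bit-pair dictionary.
    my %bits_for;
    @bits_for{@key} = @$arr;

    # Step 5: nucleotides -> bits -> original bytes.
    return pack('B*', join '', map { $bits_for{$_} } split //, $seq);
}

print decode('ACCAGACGGCGT'), "\n";
```

The input here is a sequence built by hand with the dictionary shown in the encryption procedure and the arrangement order assumed in this sketch.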

SUBROUTINES

Crypt::DNASequence->encrypt($input_file): encrypts the text file into a DNA sequence.

Crypt::DNASequence->decrypt($encrypted_file): decrypts the DNA sequence back into the original text file.
