Giter VIP home page Giter VIP logo

cryfa's Introduction

Cryfa

License: GPL v3 CI

Cryfa is an ultrafast encryption tool specifically designed for genomic data. Besides providing robust security, it also compresses FASTA/FASTQ sequences by a factor of three, making it an efficient solution for managing genomic data.

Installation

Conda

conda install -c bioconda -y cryfa

Docker

# Pull & Run the image
docker pull smortezah/cryfa;
docker run -it smortezah/cryfa;

Build from source

Linux

# Install git and cmake
sudo apt update;
sudo apt install git cmake;

# Clone and install Cryfa
git clone https://github.com/cobilab/cryfa.git;
cd cryfa;
sh install.sh;

macOS

# Install Homebrew, git and cmake
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)";
brew install git cmake;

# Clone and install Cryfa
git clone https://github.com/cobilab/cryfa.git;
cd cryfa;
sh install.sh;

Note

Pre-compiled versions of Cryfa, optimized for 64-bit Linux and macOS, can be found in the bin/ directory.

Usage

To execute Cryfa in stand-alone mode, utilize the command below:

./cryfa [OPTION]... -k [KEY_FILE] [-d] [IN_FILE] > [OUT_FILE]

For instance, to compact and encrypt data, execute the following command:

./cryfa -k pass.txt in.fq > comp

To decrypt and unpack the data, execute the command below:

./cryfa -k pass.txt -d comp > orig.fq

A sample file, "in.fq", is available in the example/ directory. Detailed descriptions of the options are provided in the subsequent sections.

Note

Cryfa supports a maximum file size of 64 GB. For larger files, consider splitting them into smaller chunks, e.g. using the split command in Linux, and then encrypt each chunk separately. After decryption, you can reassemble the chunks using the cat command.

Input file format

Cryfa identifies the format of a genomic data file by examining its content, not its extension. For instance, a FASTA file named "test" can be input into Cryfa with any extension, such as "test", "test.fa", "test.fasta", "test.fas", "test.fsa", etc. Based on this, executing the command

./cryfa -k pass.txt test > comp

is equivalent to running

./cryfa -k pass.txt test.fa > comp

Note

The password file extension is not a limiting factor for Cryfa. It can have any extension or even no extension at all. For instance, "pass", "pass.txt", "pass.dat", and so on, are all valid and yield the same result.

Benchmarking Cryfa Against Other Methods

To benchmark Cryfa against other methods, configure the parameters in the run.sh bash script and execute it:

./run.sh

This script automates the process of downloading datasets, installing dependencies, setting up compression and encryption tools, executing these tools, and finally, displaying the results.

Options

To explore the available options in Cryfa, execute the command below:

./cryfa

which will yield the following:

SYNOPSIS
      ./cryfa [OPTION]... -k [KEY_FILE] [-d] [IN_FILE] > [OUT_FILE]

SAMPLE
      Encrypt and compact:  ./cryfa -k pass.txt in.fq > comp     
      Decrypt and unpack:   ./cryfa -k pass.txt -d comp > orig.fq
      
      Encrypt:              ./cryfa -k pass.txt in > enc
      Decrypt:              ./cryfa -k pass.txt -d enc > orig

OPTIONS
      Compact & encrypt FASTA/FASTQ files.
      Encrypt any text-based genomic data, e.g., VCF/SAM/BAM.

      -k [KEY_FILE],  --key [KEY_FILE]
           key file name -- MANDATORY
           The KEY_FILE should contain a password.
           To make a strong password, the "keygen" program can be
           used via the command "./keygen".

      -d,  --dec
           decrypt & unpack
           
      -f,  --force
           force to consider input as non-FASTA/FASTQ
           Forces Cryfa not to compact, but shuffle and encrypt.
           If the input is FASTA/FASTQ, it is considered as
           non-FASTA/FASTQ; so, compaction will be ignored, but 
           shuffling and encryption will be performed.
           
      -s,  --stop_shuffle
           stop shuffling the input

      -t [NUMBER],  --thread [NUMBER]
           number of threads

      -v,  --verbose
           verbose mode (more information)
      
      -h,  --help
           usage guide

      --version
           version information

Cryfa leverages the standard output stream, allowing seamless integration with existing data processing pipelines.

Creating a Key File

There are two approaches to create a "KEY_FILE" that can be used with the -k or --key flags. You can either save a raw password in a file or use the provided "keygen" program to generate a robust password. The latter method is strongly recommended for enhanced security.

To utilize the first method, use the commands below to save a raw password in a file, which can then be passed to Cryfa. In this example, "Such a strong password!" is the raw password and "pass.txt" is the file where the password is stored. Alternatively, you can use a text editor to save the password in a file:

echo "Such a strong password!" > pass.txt
./cryfa -k pass.txt IN_FILE > OUT_FILE

While the password must contain at least 8 characters, it's highly recommended to use a strong password for better security. A strong password:

  • Is at least 12 characters long
  • Includes a mix of lowercase (a-z) and uppercase (A-Z) letters, digits (0-9), and symbols (e.g., !, #, $, %, and })
  • Is not a simple repetition of characters (e.g., zzzzzz), a keyboard pattern (e.g., qwerty), or a sequence of digits (e.g., 123456)

Alternatively, you can leverage the "keygen" program to automatically generate a robust password. To do this, execute:

./keygen

You'll be prompted with:

Enter a password, then press 'Enter':

At this point, input a raw password, for example, "A keygen raw pass!", and press "Enter". Subsequently, you'll see:

Enter a file name to save the generated key, then press 'Enter':

The robust password generated by the "keygen" program will be stored in the file you specify, such as "key.txt". Note that "keygen" requires an initial raw password to generate a strong password, but this initial password doesn't need to be particularly strong. Once the key file is created, you can use it with Cryfa as shown below:

./cryfa -k key.txt IN_FILE > OUT_FILE

For a deeper understanding of "key management" - which encompasses the generation, exchange, storage, usage, and replacement of keys - consider exploring [1], [2], [3] and [4].

Citation

If you utilize Cryfa in your research, please acknowledge the tool by citing the following references:

  • M. Hosseini, D. Pratas and A.J. Pinho, "Cryfa: a secure encryption tool for genomic data," Bioinformatics, vol. 35, no. 1, pp. 146--148, 2018. DOI: 10.1093/bioinformatics/bty645
  • [OPTIONAL] D. Pratas, M. Hosseini and A.J. Pinho, "Cryfa: a tool to compact and encrypt FASTA files," 11'th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB), Springer, June 2017. DOI: 10.1007/978-3-319-60816-7_37

Codebase

Visualization of this repo

License

Cryfa is licensed under the GPLv3.

cryfa's People

Contributors

smortezah avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

cryfa's Issues

AES/GCM: message length exceeds maximum

Dear Morteze,

I am testing some encryption tools for bam files. Cryfa works perfectly on exome file, but can not work with bigger genome file.

I am using this command

cryfa -f -k simonsSS6004478.srt.aln.key -t 1 Data/simonsSS6004478.srt.aln.bam > Compression/output/simonsSS6004478.srt.aln.1.cryfa

and the issue is

This is not a FASTA/FASTQ file and we just encrypt it.
Encrypting...
Caught InvalidArgument...
AES/GCM: message length exceeds maximum
Done, in 620.7435 seconds.

and the output is much small file (64 GB) than the original bam file (106 GB).

I would like to ask for any suggestion for that, as I read on google that AES/GCM has strange behaviour when the maximum message size of ~64G is exceeded .

Thanks in advance,

Emad

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.