HiC-inspector

About

High-throughput chromosome conformation capture (Hi-C) makes it possible to study the three-dimensional architecture of whole genomes through the detection of long-range chromosomal interactions. We developed HiC-inspector, a bioinformatics pipeline to facilitate the analysis of Hi-C datasets. The tool performs read alignment, filtering of reads that fall within the DNA fragment size window around restriction enzyme sites, counting of interactions at a user-defined resolution, and generation of contact matrices and heatmaps with a complete mapping of the interactions in the reference genome. HiC-inspector is publicly available as open source software and can be used with paired-end sequencing data from different platforms.
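
The filtering and counting stages described above can be illustrated with a minimal Python sketch. This is not HiC-inspector's own code (the pipeline is written in Perl and relies on Bowtie and BEDTools); the function names, the in-memory data layout, and the idea of keeping a pair only when both mates are near a site are assumptions made for illustration.

```python
from bisect import bisect_left
from collections import Counter


def near_restriction_site(pos, sites, fragment_size):
    """Return True if pos lies within fragment_size bp of any site.

    `sites` must be a sorted list of restriction site positions on the
    same chromosome as `pos` (hypothetical in-memory layout).
    """
    i = bisect_left(sites, pos)
    # Only the nearest site on each side can be the closest one.
    neighbours = sites[max(i - 1, 0):i + 1]
    return any(abs(pos - s) <= fragment_size for s in neighbours)


def contact_counts(pairs, sites_by_chrom, fragment_size, bin_size):
    """Count read pairs per pair of genomic bins.

    `pairs` is an iterable of ((chrom1, pos1), (chrom2, pos2)) tuples.
    A pair is kept only if both mates are near a restriction site
    (an assumption of this sketch).
    """
    counts = Counter()
    for (c1, p1), (c2, p2) in pairs:
        ok1 = near_restriction_site(p1, sites_by_chrom.get(c1, []), fragment_size)
        ok2 = near_restriction_site(p2, sites_by_chrom.get(c2, []), fragment_size)
        if not (ok1 and ok2):
            continue
        bin_a = (c1, p1 // bin_size)
        bin_b = (c2, p2 // bin_size)
        # Store each contact once, regardless of mate order.
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts
```

The resulting counter maps a pair of `(chromosome, bin_index)` keys to a number of read pairs, which is the information a contact matrix holds for one bin size.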

Authors

HiC-inspector is written by Giancarlo Castellano, François Le Dily, Antoni Hermoso Pulido and Guglielmo Roma from Bioinformatics Core and Miguel Beato's Lab @ CRG.

License

This software is distributed under the terms of GPL 3.0

Source

You can get the latest code from:

[https://github.com/HiC-inspector/HiC-inspector](https://github.com/HiC-inspector/HiC-inspector)

Contact

If you have any comments, suggestions, questions or bug reports, you can submit an issue on GitHub or contact: [email protected], [email protected] or [email protected]. Please attach your command line and log messages if possible.

README

Introduction

As sequencing techniques have improved, chromatin digestion with a restriction enzyme and ligation of the resulting sticky ends, followed by high-throughput sequencing, has become a popular way to study genome-wide DNA-DNA interactions. We present a novel pipeline, named HiC-inspector, for identifying interacting DNA sites.

Requirements

Install

Please check the file 'INSTALL' in the distribution.

conf.pl

  • In this file you can change some of the paths and parameters used by the application.
  • Don't forget the trailing slash at the end of each directory path.
BEGIN {
    package main;

    %conf = (
        # Email address
        'email'       => '[email protected]',
        # Path to local tools
        'rdir'        => "/soft/general/R-2.13/bin/",
        'bowtiedir'   => "/data/projects/hic/bin/",
        'bedtoolsdir' => "/data/projects/hic/bin/",
        # Debug while running
        'debug'       => 1
    );
}

1;
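
The trailing-slash note above matters if a tool path is built by plain string concatenation of a directory and an executable name. That is an assumption about how the directory entries are used, not something confirmed here. The Python sketch below only illustrates the failure mode and a defensive normalisation; `with_trailing_slash` is a hypothetical helper, not part of HiC-inspector.

```python
def with_trailing_slash(path):
    """Return path unchanged if it already ends with '/', else append one."""
    return path if path.endswith("/") else path + "/"


# Hypothetical example mirroring the 'bowtiedir' entry above.
raw_dir = "/data/projects/hic/bin"

# Naive concatenation without the slash yields a non-existent path.
broken = raw_dir + "bowtie"

# Normalising first gives the intended executable path.
fixed = with_trailing_slash(raw_dir) + "bowtie"
```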

Usage

Usage: perl hic-inspector.pl [-n missmatches] [-m multiplemappings]
[-cf chrsizefile] [-df designfile] [-sf selectfile] [-dd datadir]
[-rd restrictiondir] [-dfo dataformat] [-pd projectdir] [-g genome]
[-fs fragmentsize] [-b bin] [-s step] [-t test] [-u utils]
[-pr processors] [-h help]

Options:

-n, -missmatches= Bowtie option -n: Max mismatches in seed (can be 0-3, default: -n 2) 
-m, -multiplemappings= Bowtie option -m: Suppress all alignments if more than this number exist (def: no limit)
-pd, -projectdir= Directory where to write all the results. This folder is created by the pipeline, if it does not exist 
-df, -designfile= Input file describing the experimental design (tab separated text: 1st-column is sample_name, 2nd-column is read1_file, 3rd-column is read2_file, 4th-column is restriction_enzyme_file) 
-dd, -datadir= Directory containing data to be analysed. These can be raw reads in qseq or fastq format (compressed or not), or mapped reads in BED format
-rd, -restrictiondir= Directory containing restriction enzyme sites to be considered in the analysis. Should be provided in BED format 
-dfo, -dataformat= Format of sequencing reads. Valid options are: qseq (default), fastq, and bed 
-g, -genome= Indexed genome file for the reads alignment 
-sf, -selectfile= Input file with user-defined genomic regions of interest (BED format) 
-fs, -fragment_size= Maximum expected fragment size 
-cf, -chrsizefile= Input file with chromosome sizes (tab separated text: 1st-column is chr, 2nd-column is size) 
-b, -bin= Genomic windows, or "bins", to count for chromatin interactions. Several bins can be provided as a comma separated string (e.g. -b 100000,1000000) 
-s, -step= Analysis steps to be performed. Multiple steps can be provided either as a comma-separated list [e.g. 1,2,3] or as a dash-separated range [e.g. 1-3]
Available steps: 
	STEP 1: copying files to local directory; 
	STEP 2: converting qseq file to FASTQ format; 
	STEP 3: mapping reads to genome; 
	STEP 4: converting mapping output to BED format; 
	STEP 5: filtering reads by proximity to restriction sites; 
	STEP 6: filtering reads for regions of interest; 
	STEP 7: combining mate filtered outputs; 
	STEP 8: calculating distances distribution between mate pairs; 
	STEP 9: generating contact matrix; 
	STEP 10: analyzing contact matrix; 
-t, -test Test mode. Prints out commands without executing them 
-u, -utils Specify the genome release to be used among those already provided by HiC-inspector (e.g. hg19) 
-pr, -processors= Number of processors to be used for parallelization (default: 2)
-h, -help This documentation.
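
To make the accepted forms of the -s and -b values concrete, here is a minimal Python sketch of how such strings could be expanded. It does not reproduce the pipeline's own Perl parsing; the function names are hypothetical, and accepting a mix of lists and ranges (e.g. 1-3,5) is an assumption of this sketch.

```python
def parse_steps(spec):
    """Expand a step string such as '1,2,3' or '1-3' into a sorted list."""
    steps = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Dash-separated range, inclusive on both ends.
            first, last = (int(x) for x in part.split("-", 1))
            steps.update(range(first, last + 1))
        elif part:
            steps.add(int(part))
    return sorted(steps)


def parse_bins(spec):
    """Split a comma-separated bin string such as '100000,1000000'."""
    return [int(x) for x in spec.split(",") if x.strip()]
```

With these helpers, `parse_steps("1-3")` and `parse_steps("1,2,3")` both describe steps 1 to 3, and `parse_bins("100000,1000000")` yields two bin sizes.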

Example result

We provide an example result at: http://biocore.crg.cat/software/HiC-inspector/

This uses hg19 processed with HindIII and a couple of reads from SRR027956.

Design file

Named design.GM.hindIII.hg19

GM.hindIII SRR027956.lite.sra_1.fastq SRR027956.lite.sra_2.fastq hindIII.hg19.bed
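
According to the -df option description, the design file is tab-separated with four columns: sample name, read 1 file, read 2 file and restriction enzyme file. The following Python sketch shows one way to read and validate such a file before launching a run. The `Sample` type and `read_design` function are hypothetical names introduced for illustration and are not part of HiC-inspector.

```python
import csv
import io
from typing import NamedTuple


class Sample(NamedTuple):
    sample_name: str
    read1_file: str
    read2_file: str
    restriction_enzyme_file: str


def read_design(handle):
    """Parse a tab-separated design file into a list of Sample records."""
    samples = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines.
        if not row or all(not field.strip() for field in row):
            continue
        if len(row) != 4:
            raise ValueError(f"expected 4 tab-separated columns, got {len(row)}: {row}")
        samples.append(Sample(*(field.strip() for field in row)))
    return samples


# The example design line from this README, with explicit tabs.
design_text = (
    "GM.hindIII\tSRR027956.lite.sra_1.fastq\t"
    "SRR027956.lite.sra_2.fastq\thindIII.hg19.bed\n"
)
samples = read_design(io.StringIO(design_text))
```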

Chromosome sizes file

We used the UCSC fetchChromSizes script to create the chrom.sizes file for the genome assembly in use (e.g. hg19).
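
The chromosome sizes file is tab-separated text with the chromosome name in the first column and its size in the second. As a rough illustration of how those sizes relate to the -b bin option, the Python sketch below reads such a file and computes how many bins each chromosome would span. The function names and the toy chromosome sizes are made up for the example and are not real hg19 values.

```python
import io


def read_chrom_sizes(handle):
    """Read 'chrom<TAB>size' lines into a dict of chromosome -> size."""
    sizes = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        chrom, size = line.split()[:2]
        sizes[chrom] = int(size)
    return sizes


def bins_per_chromosome(sizes, bin_size):
    """Number of bins of width bin_size needed to cover each chromosome."""
    # Integer ceiling division avoids floating-point rounding.
    return {chrom: (size + bin_size - 1) // bin_size for chrom, size in sizes.items()}


# Toy sizes, not taken from any real assembly.
toy = io.StringIO("chrA\t2500000\nchrB\t1000000\n")
sizes = read_chrom_sizes(toy)
n_bins = bins_per_chromosome(sizes, 1000000)
```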

Executed command

perl mypath/hic-inspector.pl -df design.GM.hindIII.hg19 -dd inputreadsdir -pd output/myproject.hindIII.hg19 -dfo fastq -u hg19 -b 1000000,10000000
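
When launching the pipeline from a script, the same command can be assembled as an argument list. The Python sketch below does this using only options listed in this README. `build_command` is a hypothetical helper, and treating -t as a bare flag without a value is an assumption based on the option list above.

```python
import shlex


def build_command(script, design, datadir, projectdir, bins,
                  dataformat="fastq", utils="hg19", test=False):
    """Assemble a hic-inspector.pl invocation as a list of arguments."""
    cmd = [
        "perl", script,
        "-df", design,
        "-dd", datadir,
        "-pd", projectdir,
        "-dfo", dataformat,
        "-u", utils,
        "-b", ",".join(str(b) for b in bins),
    ]
    if test:
        # Test mode: print commands without executing them.
        cmd.append("-t")
    return cmd


cmd = build_command(
    "mypath/hic-inspector.pl",
    "design.GM.hindIII.hg19",
    "inputreadsdir",
    "output/myproject.hindIII.hg19",
    [1000000, 10000000],
)
command_line = shlex.join(cmd)
```

The list form can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues in file names.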
