hictk's Introduction

hictk

hictk is a blazing fast toolkit to work with .hic and .cool files.

This repository hosts hictk: a set of CLI tools to work with Cooler, as well as libhictk: the C++ library underlying hictk.

Python bindings for libhictk are available at paulsengroup/hictkpy.

hictk is capable of reading files in .cool, .mcool, .scool and .hic format (including hic v9) as well as writing .hic, .cool and .mcool files.

Installing hictk

hictk is developed on Linux and tested on Linux, MacOS and Windows.

hictk can be installed using containers, bioconda or directly from source. Refer to Installation for more information.

Running hictk

hictk provides the following subcommands:

subcommand          description
balance             Balance HiC matrices using ICE, SCALE or VC.
convert             Convert matrices to a different format.
dump                Dump data from .hic and Cooler files to stdout.
fix-mcool           Fix corrupted .mcool files.
load                Build .cool and .hic files from interactions in various text formats.
merge               Merge multiple Cooler or .hic files into a single file.
rename-chromosomes  Rename chromosomes found in a Cooler file.
validate            Validate .hic and Cooler files.
zoomify             Convert single-resolution Cooler and .hic files to multi-resolution by coarsening.

Refer to Quickstart (CLI) and CLI Reference for more details.
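
As a rough illustration of driving a subcommand from a script, the sketch below assembles a `hictk convert` command line in Python. It uses only the positional input/output arguments and the `--threads` and `--tmpdir` options that appear in the convert invocation quoted in the issues further down this page; the helper names `build_convert_command` and `run_if_available` are hypothetical and not part of hictk. Consult the CLI Reference for the authoritative list of options.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_convert_command(
    src: str,
    dst: str,
    threads: Optional[int] = None,
    tmpdir: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `hictk convert` (hypothetical helper).

    Only options seen elsewhere on this page are emitted.
    """
    cmd = ["hictk", "convert"]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if tmpdir is not None:
        cmd += ["--tmpdir", tmpdir]
    # Input and output paths are passed as positional arguments.
    cmd += [src, dst]
    return cmd


def run_if_available(cmd: List[str]) -> Optional[int]:
    """Run the command if the executable is on PATH; return its exit code.

    Returns None when the executable cannot be found.
    """
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # File names here are placeholders chosen for the example.
    command = build_convert_command("matrix.mcool", "matrix.hic", threads=8)
    print(shlex.join(command))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when paths contain spaces.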

Using libhictk

libhictk can be installed in various ways, including with Conan and CMake FetchContent. The Quickstart (API) section of the hictk documentation contains further details on how this can be accomplished.

Quickstart (API) also showcases the basic functionality offered by libhictk. For more complex examples refer to the sample programs under the examples/ folder as well as to the source code of hictk.

The public C++ API of hictk is documented in the C++ API Reference section of the hictk documentation.

Citing

If you use hictk in your research, please cite the following publication:

Roberto Rossini, Jonas Paulsen. hictk: blazing fast toolkit to work with .hic and .cool files. bioRxiv 2023.11.26.568707. https://doi.org/10.1101/2023.11.26.568707

BibTex
@article {hictk,
	author = {Roberto Rossini and Jonas Paulsen},
	title = {hictk: blazing fast toolkit to work with .hic and .cool files},
	elocation-id = {2023.11.26.568707},
	year = {2023},
	doi = {10.1101/2023.11.26.568707},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2023/11/27/2023.11.26.568707},
	eprint = {https://www.biorxiv.org/content/early/2023/11/27/2023.11.26.568707.full.pdf},
	journal = {bioRxiv}
}

hictk's Issues

Is HDF5::Cooler format_version 3 supported?

Firstly, thanks for hictk - it is a very handy converter, hard to find elsewhere, and it works fast.
Does it know how to handle format_version:3 from a recent cooler release?
I'm new to Hi-C, so I'm just guessing it might not. It's odd that the recent one has 'nchroms': 37688; I'm currently rebuilding with a dm3.fasta reference to see if that helps.

Attached are two samples in coolsamples.zip:

  • hicex_dm4.cooler is a cooler-0.9.3 file generated using the hicexplorer infrastructure on Galaxy, and
  • matrix.cool is an older one that's part of the Galaxy library test data from cooler-0.7.6

With hictk 0.0.8 on the command line:

  • Both validate as ok.
  • The old one works with convert into hic
  • The new one freezes with 100% cpu spinning - no output with dump or convert.

Both can be opened with the Python API, which is how I found out the cooler versions, so that was really handy - thanks!

>>> import hictkpy as htk
>>> f = htk.File("matrix.mcool")
>>> f = htk.File("matrix.cool", resolution=1)
>>> f.attributes()
{'bin_size': 5000, 'bin_type': 'fixed', 'format': 'HDF5::Cooler', 'format_version': 2, 'storage-mode': None, 'creation-date': '2017-11-15T19:04:25.627510', 'generated-by': 'cooler-0.7.6', 'assembly': 'unknown', 'metadata': '{}', 'format-url': 'https://github.com/mirnylab/cooler', 'nbins': 33754, 'nchroms': 15, 'nnz': 35857, 'sum': 37321, 'cis': None}
>>> f = htk.File("hicex_dm4.cool", resolution=1)
>>> f.attributes()
{'bin_size': 10000, 'bin_type': 'fixed', 'format': 'HDF5::Cooler', 'format_version': 3, 'storage-mode': 'symmetric-upper', 'creation-date': '2024-02-03T03:46:02.531452', 'generated-by': 'HiCMatrix-16', 'assembly': "b'dm3'", 'metadata': '{"format": "HDF5::Cooler", "format-url": "https://github.com/mirnylab/cooler", "generated-by": "HiCMatrix-16", "generated-by-cooler-lib": "cooler-0.9.3", "tool-url": "https://github.com/deeptools/HiCMatrix", "matrix-generated-by": "b\'HiCExplorer-3.7.2\'", "matrix-generated-by-url": "b\'https://github.com/deeptools/HiCExplorer\'", "genome-assembly": "b\'dm3\'"}', 'format-url': 'https://github.com/mirnylab/cooler', 'nbins': 50717, 'nchroms': 37688, 'nnz': 19482, 'sum': 4314105, 'cis': None}
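
To make the differences between the two attribute dictionaries above easier to spot, here is a minimal sketch in plain Python. It works on ordinary dicts like the ones printed by `f.attributes()` and does not call hictkpy itself. The helper names `compare_attributes` and `find_bytes_repr_values` are hypothetical, and the checks are only a diagnostic aid, not a statement about what causes the freeze.

```python
import re
from typing import Any, Dict, Iterable, List, Tuple

# Matches strings that look like the repr of a bytes object, e.g. "b'dm3'".
_BYTES_REPR = re.compile(r"^b(['\"]).*\1$")


def compare_attributes(
    old: Dict[str, Any],
    new: Dict[str, Any],
    keys: Iterable[str] = ("format_version", "bin_size", "nbins", "nchroms", "nnz"),
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (old_value, new_value)} for the keys whose values differ."""
    return {
        key: (old.get(key), new.get(key))
        for key in keys
        if old.get(key) != new.get(key)
    }


def find_bytes_repr_values(attrs: Dict[str, Any]) -> List[str]:
    """List the keys whose string value looks like a stringified bytes object."""
    return sorted(
        key
        for key, value in attrs.items()
        if isinstance(value, str) and _BYTES_REPR.match(value)
    )


if __name__ == "__main__":
    # Values copied from the two attribute dumps in this report.
    old_attrs = {"format_version": 2, "nbins": 33754, "nchroms": 15, "assembly": "unknown"}
    new_attrs = {"format_version": 3, "nbins": 50717, "nchroms": 37688, "assembly": "b'dm3'"}
    print(compare_attributes(old_attrs, new_attrs))
    print(find_bytes_repr_values(new_attrs))
```

On the values shown in this report, the comparison highlights the jump in 'nchroms' from 15 to 37688, and the second helper flags 'assembly', whose value is the text "b'dm3'" rather than "dm3".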

No build for Mac found via mamba/conda - Mac installation fails.

It looks like there are only versions of hictk available for the linux-64 platform, so the installation instructions you give for Mac are not working. To check, I ran mamba search hictk --info and conda search hictk --info which only list bioconda/linux-64 in the URL.

Would you be able to publish builds for Mac? This is useful software, and it would be very nice to have an easy install for my non-Linux-running collaborators. Thank you!

Error creating All:All matrix

Hi, I am trying out hictk to convert .mcool to .hic, and I am encountering an issue. After writing all by-chromosome pixels, it tries to create the All:All matrix and fails like this:

> hictk convert --threads 8 --tmpdir results/coolers_library_group results/coolers_library_group/all.sacCer3.mapq_30.1000.mcool results/coolers_library_group/all.sacCer3.mapq_30.1000.hic

...
[2024-05-22 13:55:03.595] [info]: writing pixels for All:All matrix...
FAILURE! hictk convert encountered the following error: an error occurred while writing file "results/coolers_library_group/all.sacCer3.mapq_30.1000.hic": an error occurred while writing the All:All matrix to file "results/coolers_library_group/all.sacCer3.mapq_30.1000.hic": position is greater than chromosome size: 4140417920 >= 1531933

This is a tiny .mcool with some yeast data that we use for testing in distiller. The file is attached (I changed the extension to .txt so GitHub doesn't complain):
all.sacCer3.mapq_30.1000.txt

Am I doing something wrong here?
