vaquerizaslab / chess

Comparison of Hi-C Experiments using Structural Similarity.

License: Other

Python 100.00%

hi-c structural-similarity comparative-genomics chromatin chromatin-organisation

chess's Introduction

chess-hic

CHESS is a tool for the comparison and automatic feature extraction for chromatin contact data, developed in the Vaquerizas Lab.

If you use CHESS in your research, please cite the CHESS paper.

Please check out the online documentation for detailed installation and usage instructions.

chess's People

liz-is evianyiyun lie-ne jakob-zerbs yang0577 fismathack 97xsl1013

chess's Issues

conservation analysis when only a few syntenic blocks are available

Hi,

I would like to compare Hi-C maps, but only a few (e.g. fewer than five, or even one) syntenic blocks (≥ 1 Mb) are available. As I understand it, the raw SN and SSIM calculations do not depend on the sample size, but the z-SSIM does. In my case, with such a small sample, is it reasonable to use CHESS to quantify the differences between the contact maps?
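To illustrate why a small sample matters for z-SSIM, here is a minimal Python sketch. It assumes that z-SSIM is a standard z-score of each SSIM value relative to all region pairs in one run, computed with the sample standard deviation; that is an assumption, and CHESS's exact normalisation may differ. With a single block the z-score is undefined, and with n values no sample z-score can exceed (n − 1)/√n in magnitude (Samuelson's inequality), so with three blocks even a clear outlier only reaches about −1.15, short of a cutoff such as −1.2.

```python
import statistics


def z_scores(ssim_values):
    """Hypothetical helper: z-score each SSIM value against all values.

    Assumes z-SSIM = (ssim - mean) / sample standard deviation over all
    region pairs of one run; CHESS's exact normalisation may differ.
    """
    if len(ssim_values) < 2:
        return None  # a standard deviation needs at least two values
    mean = statistics.mean(ssim_values)
    sd = statistics.stdev(ssim_values)
    if sd == 0:
        return None  # all values identical: z-scores are undefined
    return [(v - mean) / sd for v in ssim_values]


print(z_scores([0.42]))              # a single syntenic block
print(z_scores([0.10, 0.90, 0.90]))  # three blocks, one clear outlier
```

Under this assumption, the raw SSIM and SN values remain usable for one or a few blocks; only the z-normalised score becomes uninformative.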

Many thanks

Best,
Yang

No valid region pairs found?

Hi, I get the following error:

2021-06-04 17:21:41,544 WARNING 26638 region pairs have been dropped, because they involve chromosomes that are not present in the provided contact data.
2021-06-04 17:21:41,544 ERROR No valid region pairs found; aborting.

I think the problem is that the default mm10 chromosome names I pass to chess pairs are "chr1", "chr2" and so on, while my data is labelled just "1", "2", "3", .... What do you recommend I do to solve this problem?
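One possible workaround is to rewrite the pairs file so that its chromosome names match the contact data. The sketch below assumes the whitespace-separated 10-column layout that chess pairs writes (e.g. `chr1 1 3000001 chr1 1 3000001 0 . + +`, as shown in another issue on this page) and writes tab-separated output. The file names in the usage line are hypothetical, and names that differ by more than the prefix (e.g. chrM vs MT) would still need an explicit mapping.

```python
def strip_chr_prefix(line):
    """Drop a leading 'chr' from the two chromosome columns (1 and 4)
    of a whitespace-separated BEDPE line."""
    fields = line.split()
    for i in (0, 3):
        if fields[i].startswith("chr"):
            fields[i] = fields[i][len("chr"):]
    return "\t".join(fields)


def convert_pairs_file(in_path, out_path):
    """Write a copy of a pairs file with 'chr' removed from chromosome names."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(strip_chr_prefix(line) + "\n")


# Usage (hypothetical file names):
# convert_pairs_file("mm10_pairs.bed", "mm10_pairs_nochr.bed")
```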

No valid pairs found problem

Hi, I am using two .hic matrices (generated with Juicer using -k KR, at a single resolution of 1000000 and for chr1 only) with chess sim, but it fails with the following error. Do you know the possible reason? The output is the same if I remove the @1000000@KR suffixes.

2021-02-03 14:11:52,877 INFO Running '/p/keles/schic/volumeC/SiqiShen/miniconda3/envs/chess/bin/chess sim /p/keles/schic/volumeA/Script/Downstream/higashi/higashi_H1Esc_chr1_estimated.bed.hic@1000000@KR /p/keles/schic/volumeA/Script/Downstream/higashi/higashi_HAP1_chr1_estimated.bed.hic@1000000@KR /p/keles/schic/volumeA/Script/chess/chr1.bed /p/keles/schic/volumeA/Script/chess/higashi_H1Esc_HAP1_chr1_results.tsv --oe-input'
2021-02-03 14:11:53,828 INFO CHESS version: 0.3.6
2021-02-03 14:11:53,828 INFO FAN-C version: 0.9.14
2021-02-03 14:11:53,831 INFO Loading reference contact data
2021-02-03 14:12:04,050 INFO Loading query contact data
2021-02-03 14:12:14,455 INFO Loading region pairs
2021-02-03 14:12:14,455 WARNING 15 region pairs have been dropped, because they involve chromosomes that are not present in the provided contact data.
2021-02-03 14:12:14,455 ERROR No valid region pairs found; aborting.

Error when running CHESS on JUICER HIC files

Hello,

I would highly appreciate your help with the issues I have been facing when trying to run CHESS on my HIC files.

I have installed CHESS (0.3.6) and successfully tested it on the data provided in the repository.
All output files and plots (for CHESS data sets) have been generated by following the instructions provided in the online manual and the jupyter notebook.

Unfortunately, the tool crashes when I provide new data (Juicer .hic files).
Here is the CHESS error message for the command I have used:

chess sim -p 6 CTRL_1.hic ./Tip5_1.hic mm10_chr2_3mb_win_100kb_step.bed ./chr2.result
2021-02-11 12:36:01,017 INFO CHESS version: 0.3.6
2021-02-11 12:36:01,018 INFO FAN-C version: 0.9.10
2021-02-11 12:36:01,019 INFO Loading reference contact data
2021-02-11 12:37:47,218 INFO Loading query contact data
2021-02-11 12:39:05,585 INFO Loading region pairs
2021-02-11 12:39:05,589 WARNING 2392 region pairs have been dropped, because they involve chromosomes that are not present in the provided contact data.
2021-02-11 12:39:05,589 ERROR No valid region pairs found; aborting.

I am working with the mm10 genome, and the .hic files were produced with Juicer.
Here is the validation of the .hic files done with Juicer tools:

TIP5_1 juicer tool validation:
Reading file: ./Tip5_1.hic
File has normalization: VC
Description: Coverage
File has normalization: VC_SQRT
Description: Coverage (Sqrt)
File has normalization: KR
Description: Balanced
File has zoom: BP_2500000
File has zoom: BP_1000000
File has zoom: BP_500000
File has zoom: BP_250000
File has zoom: BP_100000
File has zoom: BP_50000
File has zoom: BP_25000
File has zoom: BP_10000
File has zoom: BP_5000

CTRL_1 validation:
Reading file: ./CTRL_1.hic
File has normalization: VC
Description: Coverage
File has normalization: VC_SQRT
Description: Coverage (Sqrt)
File has normalization: KR
Description: Balanced
File has zoom: BP_2500000
File has zoom: BP_1000000
File has zoom: BP_500000
File has zoom: BP_250000
File has zoom: BP_100000
File has zoom: BP_50000
File has zoom: BP_25000
File has zoom: BP_10000
File has zoom: BP_5000

Any help on the current issue would be highly appreciated.
With best regards.
Ross

plot output?

Sorry, this is a question rather than an issue: does the chess-hic CLI tool output any plots, like those in your example here https://github.com/vaquerizaslab/chess/blob/master/examples/dlbcl/example_analysis.ipynb?

I ran the chess pairs, sim, and extract commands on my data, but I don't see any accompanying plots for the gained or lost features; I only see the tables.

NaN continued

Dear chess team,
I have similar results to those described in #5.

I can rule out the chromosome naming issue, because I successfully ran chess sim on the same matrices with larger window sizes:

win=500Kb ; step=100Kb => works
win=250Kb ; step=50Kb => works
win=100Kb ; step=20Kb => only NaN
win=50Kb ; step=25Kb => only NaN

My genome is dm6, and I am using an ICE-balanced matrix with 10 kb bins (so I should not have many NA bins).
Is this expected?
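One possible explanation, assuming the minimum reported by chess sim 0.3.6 in another issue on this page ("All regions need to span at least 20 bins") also applies here: count how many 10 kb bins each window spans. This is arithmetic only, not a confirmed diagnosis.

```python
MIN_BINS = 20  # minimum reported by chess sim 0.3.6 in another issue on this page


def bins_per_window(window_bp, bin_bp):
    """Approximate number of bins a window spans along one axis."""
    return window_bp // bin_bp


BIN_BP = 10_000  # 10 kb bins, as in the post above
for win_kb, step_kb in [(500, 100), (250, 50), (100, 20), (50, 25)]:
    n = bins_per_window(win_kb * 1000, BIN_BP)
    verdict = "ok" if n >= MIN_BINS else "below the 20-bin minimum"
    print(f"win={win_kb}kb step={step_kb}kb: {n} bins, {verdict}")
```

The two window sizes that work span 50 and 25 bins, while the two that return only NaN span 10 and 5 bins, which lines up with that minimum.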

Chess sim command runtime improvement heuristics?

I'm encountering severe runtime problems when calculating a similarity score for a single target region. I'm comparing two species over a single syntenic region, using balanced (but not O/E) cooler-format inputs and 18 threads, although most of those threads seem to be idle most of the time. I'm using the --background-query and --limit-background options.

It has been running for more than six days and has still not returned any output. Is there a way to speed this up significantly, for example by restricting the background in some other way or by using a different input format? I will need to run the sim command many more times on many more regions to incorporate it into my project, and at this speed that is unfeasible.

Output below:

2020-11-09 11:28:04,305 INFO Running '/home/jmcbroome/anaconda3/bin/chess sim --background-query -p 18 --limit-background human_chr1_hic_test.corrected.cool chicken_hic.corrected.cool chess_test.bedpe chess_single_test.txt'
2020-11-09 11:28:09,670 INFO CHESS version: 0.3.5
2020-11-09 11:28:09,670 INFO FAN-C version: 0.9.7
2020-11-09 11:28:09,673 INFO Loading reference contact data
Expected 100% (20923106 of 20923106) |################################################################################################################################################################| Elapsed Time: 1:01:49 Time: 1:01:49
2020-11-09 17:21:29,197 INFO Loading query contact data
Expected 99% (25409016 of 25665672) |###################################################################################################################################################### | Elapsed Time: 1 day, 19:55:08 ETA: 0:00:45
Expected 100% (25665672 of 25665672) || Elapsed Time: 6 days, 18:56:44 Time: 6 days, 18:56:44

It's difficult to find any significant changes in the 3D organization of chromatin in highly dissimilar regions

Dear all,

After running chess sim, I extracted the highly dissimilar regions, defined by low z-ssim and high SN values. When I visualized them, I could not see any significant changes in the 3D organization of chromatin in these regions. I have attached all the figures (https://drive.google.com/file/d/1TBcSvr6QKDqioP0EEefldHmhEjIPq4JE/view?usp=sharing). Does this mean my results have a high false-positive (FP) rate? If so, do you have any suggestions for reducing it? Thank you very much.

Best wishes,
Zheng zhuqing

error on running chess sim

Hi, I am using two .hic matrices (generated with Juicer at 50000 resolution, all chromosomes), and chess sim fails with the following error. Do you know the possible reason? Thank you!
2022-05-25 09:52:23,118 INFO Running '/home/zhaoqianyi/miniconda3/envs/hicex3.6/bin/chess sim -p 1 SAMN09691012.hic@50kb SAMEA7629254.hic@50kb susScr11_chr1_5mb_win_200kb_step.bed SAMN09691012_vs_SAMEA7629254_chess_results.tsv'
2022-05-25 09:52:37,504 INFO CHESS version: 0.3.7
2022-05-25 09:52:37,505 INFO FAN-C version: 0.9.23
2022-05-25 09:52:37,510 INFO Loading reference contact data
Traceback (most recent call last):
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/bin/chess", line 585, in <module>
    Chess()
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/bin/chess", line 75, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/bin/chess", line 118, in sim
    reference_matrix_file, reference_regions_file)
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/chess/helpers.py", line 554, in load_contacts
    edges = oe_edges_dict_from_fanc(reference_loaded)
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/chess/helpers.py", line 389, in oe_edges_dict_from_fanc
    for e in hic.edges(oe=True, lazy=True):
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/fanc/matrix.py", line 696, in __call__
    bias = self._regions_pairs.bias_vector()
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/fanc/compatibility/juicer.py", line 1198, in bias_vector
    x = np.array([r.bias for r in self.regions(lazy=True)])
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/fanc/compatibility/juicer.py", line 1198, in <listcomp>
    x = np.array([r.bias for r in self.regions(lazy=True)])
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/fanc/compatibility/juicer.py", line 856, in _region_iter
    norm = self.normalisation_vector(chromosome)
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/fanc/compatibility/juicer.py", line 745, in normalisation_vector
    JuicerHic._skip_to_normalisation_vectors(req)
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/fanc/compatibility/juicer.py", line 569, in _skip_to_normalisation_vectors
    n_vectors = struct.unpack('<i', req.read(4))[0]
struct.error: unpack requires a buffer of 4 bytes

chess filter Unrecognized command

Dear @nickmachnik

I found that the filter command is not recognized in the latest version of CHESS.

Best wishes,
Zheng zhuqing

speed up the chess run

Dear all,

Thank you for providing this tool. I'm wondering whether it is possible to speed up CHESS runs, because chess sim is very slow for me: it takes more than 4 days just to read a balanced primate reference cool file. The cool file is at 20 kb resolution, generated from ~30X Hi-C reads, and is about 2 GB in size. Since I'm only interested in one particular chromosome, is it possible to generate a subset of the data as input for CHESS?

Many thanks

Best,
Yang

questions about the results of chess extract

Hi @nickmachnik

Running chess extract generates two files, gained_features.tsv and lost_features.tsv. I found that the number of columns varies between rows (as shown below, the first row has 2457 columns but the second has 22). Could you explain what each column represents? Thank you very much.

perl -e 'while(<>){chomp; @A=split/,/; print scalar(@A),"\n"}' lost_features.tsv  | head
2457
22
294
2424
15

Best wishes,
Zheng Zhuqing
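The file is named .tsv, but the Perl one-liner above splits on commas. If the rows are tab-separated and some columns hold comma-separated lists (an assumption; the exact layout of these files is not described here), counting commas would mix the two. A small Python sketch that counts fields for either separator, so the two counts can be compared:

```python
def field_counts(lines, sep):
    """Number of `sep`-separated fields on each line.

    Unlike Perl's split, Python's str.split keeps trailing empty fields,
    so counts can differ slightly from the one-liner when lines end in sep.
    """
    return [len(line.rstrip("\n").split(sep)) for line in lines]


# Usage, reading the file from the post:
# with open("lost_features.tsv") as fh:
#     lines = fh.readlines()
# print(field_counts(lines, "\t")[:5])
# print(field_counts(lines, ",")[:5])
```

If the tab-based counts are constant while the comma-based counts vary, the variation comes from commas inside columns rather than from a changing number of columns.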

CNV bias in normalization

Hello CHESS team,

This is not an issue; I wanted to ask whether the obs/exp normalization in CHESS somehow corrects for CNV bias, or whether this is something that does not need to be taken into account.

Many thanks
Jorge

Different resolutions produce different results

Dear all

Following your published paper, I defined regions with a z-normalized similarity score ≤ −1.2 and a signal-to-noise ratio r ≥ 0.6 as significantly changing. At 40 kb resolution, I can identify some changing regions. However, I cannot see any significant changes in the interaction heatmaps of these regions.

Following up on the previous discussion (#34), where @liz-is suggested adjusting the parameters, I tried 25 kb resolution. In this case, no regions are identified as changing.

Does this mean that there are no significant changes in the 3D organization of chromatin between these two samples?

Best regards,
Zheng zhuqing
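The thresholds described above can be applied to a chess sim results table with a few lines of Python. This sketch assumes a tab-separated file with the header `ID SN ssim z_ssim` (the header shown in another issue on this page) and skips rows whose values are nan; the file name in the usage comment is hypothetical.

```python
import csv
import math


def changing_regions(rows, z_max=-1.2, sn_min=0.6):
    """Return IDs of region pairs with z_ssim <= z_max and SN >= sn_min.

    rows: dicts with the columns ID, SN, ssim, z_ssim of a chess sim
    results table. Rows with nan values are skipped.
    """
    hits = []
    for row in rows:
        sn, z = float(row["SN"]), float(row["z_ssim"])
        if math.isnan(sn) or math.isnan(z):
            continue
        if z <= z_max and sn >= sn_min:
            hits.append(row["ID"])
    return hits


def read_results(path):
    """Load a tab-separated chess sim results file as a list of dicts."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


# Usage (hypothetical file name):
# print(changing_regions(read_results("sample_40kb_chess_results.tsv")))
```

Running it on the 40 kb and 25 kb results side by side shows directly how many region pairs pass each cutoff at each resolution.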

BED / BEDPE file handling - BEDPE file produced by `chess pairs` is not 0-based

Hello,

I was doing some manual processing of my CHESS results to figure out the best parameters for chess extract and noticed that the region coordinates returned by chess.helpers.load_pairs_iter don't match the coordinates given in the pairs file - e.g. the file contains a region "chr9:21900001-31900001", which is loaded as "chr9:21900002-31900001". This occurs because when converting the region into a GenomicRegion object, chess.helpers.load_pairs_iter adds 1 to the start coordinate. This makes sense because BED files, and by extension BEDPE files, are 0-based.

Tracing this back through my file processing (because I've definitely made 0-based / 1-based coordinate conversion errors a bunch of times!), I realised that this comes from the original pairs BEDPE file that was generated with chess pairs, which returns pairs starting with e.g. chr1 1 10000001 chr1 1 10000001 0 . + +

This is of course a very minor issue that's not going to have any impact on actual results given that Hi-C data is binned at much lower resolution and CHESS runs on large genomic regions, but for consistency with other software and ease of handling, it would be nice if the BEDPE files were 0-based! :)

I would be happy to make the change and make a pull request if you are happy with that and it's easiest for you.
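A minimal sketch of the proposed conversion, assuming the chess pairs output uses 1-based starts: subtract one from the two start columns (2 and 5) and leave the ends unchanged, which is the usual 1-based closed to 0-based half-open conversion. With that change, the +1 that load_pairs_iter applies on loading would give back the start written by chess pairs (e.g. chr9:21900001-31900001 rather than chr9:21900002-31900001).

```python
def to_zero_based(line):
    """Shift the two start columns (2 and 5) of a BEDPE line down by one.

    Assumes the input starts are 1-based; ends are left unchanged.
    Output is tab-separated.
    """
    fields = line.split()
    for i in (1, 4):
        fields[i] = str(int(fields[i]) - 1)
    return "\t".join(fields)


print(to_zero_based("chr1 1 10000001 chr1 1 10000001 0 . + +"))
```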

Nan Continued

Hello,
Sorry for another post about NaN results, but I think I've ruled out the causes discussed in the other posts.

I tried three comparisons between a wild type and a mutant HiC dataset, subsetting for a particular chromosome.

chess sim
WT_combined_chr5.hic
MUT_het_new_combined_chr5.hic
hg38_chr5_1mb_win_10kb_step_2.bed
OUTPUT_1mb_win_10kb_step_chess_results.tsv

I tried three bed files:
3mb_win_100kb_step
1mb_win_10kb_step
100kb_win_10kb_step

The first two give me this log file:
2021-05-19 03:40:54,378 INFO Running '/home/lkw10/.conda/envs/my_CHESS_env_374/bin/chess sim PGP1_combined_chr5.hic CHD4_het_new_combined_chr5.hic hg38_chr5_3mb_win_100kb_step_2.bed PGP1_combined_chr5_vs_CHD4_het_new_combined_chr5_3mb_win_100kb_step_chess_results.tsv'
2021-05-19 03:40:56,359 INFO CHESS version: 0.3.6
2021-05-19 03:40:56,359 INFO FAN-C version: 0.9.18
2021-05-19 03:40:56,361 INFO Loading reference contact data
2021-05-19 03:45:26,042 INFO Loading query contact data
2021-05-19 03:48:41,747 INFO Loading region pairs
2021-05-19 03:48:41,913 WARNING 147 region pairs have been dropped, because they involve chromosomes that are not present in the provided contact data.
2021-05-19 03:48:41,926 INFO Launching workers
2021-05-19 03:48:42,115 INFO Submitting pairs for comparison
2021-05-19 04:13:58,868 INFO Could not compute similarity for 28497 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins
2021-05-19 04:14:04,193 INFO Finished '/home/lkw10/.conda/envs/my_CHESS_env_374/bin/chess sim PGP1_combined_chr5.hic CHD4_het_new_combined_chr5.hic hg38_chr5_3mb_win_100kb_step_2.bed PGP1_combined_chr5_vs_CHD4_het_new_combined_chr5_3mb_win_100kb_step_chess_results.tsv'

The third (highest-resolution) run gives me this log:
2021-05-19 04:59:58,617 INFO Running '/home/lkw10/.conda/envs/my_CHESS_env_374/bin/chess sim PGP1_combined_chr5.hic CHD4_het_new_combined_chr5.hic hg38_chr5_100kb_win_10kb_step_2.bed PGP1_combined_chr5_vs_CHD4_het_new_combined_chr5_100kb_win_10kb_step_chess_results.tsv'
2021-05-19 04:59:59,942 INFO CHESS version: 0.3.6
2021-05-19 04:59:59,942 INFO FAN-C version: 0.9.18
2021-05-19 04:59:59,943 INFO Loading reference contact data
2021-05-19 05:04:33,082 INFO Loading query contact data
2021-05-19 05:07:54,585 INFO Loading region pairs
2021-05-19 05:07:58,064 WARNING 9089 region pairs have been dropped, because they involve chromosomes that are not present in the provided contact data.
2021-05-19 05:07:58,243 ERROR All regions need to span at least 20 bins. The provided reference regions span at most 10 bins. Please try again with larger regions or a smaller bin size. The bin size of the input data has been detected to be 10000

the bed files look like this:
1 1 100001 1 1 100001 0 . + +
1 10001 110001 1 10001 110001 1 . + +
1 20001 120001 1 20001 120001 2 . + +
1 30001 130001 1 30001 130001 3 . + +

They were generated using chess pairs and I removed the "chr" label.

The Hi-C files were generated with HOMER from pairs files produced with cooler/Juicer.

I would very much appreciate your help

Should the users be concerned about the problem raised in the new Contradictory Results bioRxiv preprint?

Hi CHESS,

First of all, thanks so much for keeping users updated about this beautiful software! Like many others, I was recently trying to use CHESS to analyze Hi-C data that our lab generated from mutagenized zebrafish embryos, but today I came across this bioRxiv preprint, which seems to raise a major concern about the software (Lee, H., Blumberg, B., Lawrence, M. S., and Shioda, T. "Revisiting the Use of Structural Similarity Index in Hi-C" bioRxiv (2021)).

"...here we show that the primary outputs of CHESS–namely, the structural similarity index (SSIM) profiles–are nearly identical regardless of the input matrices, even when query and reference reads were shuffled to destroy any significant differences. This issue stems from the dominance of the regional counting noise arising from stochastic sampling in chromatin-contact maps, reflecting a fundamentally incorrect assumption of the CHESS algorithm. Therefore, biological interpretation of SSIM profiles generated by CHESS requires considerable caution."

I am not a bioinformatician and therefore do not fully understand the technical details presented in their preprint...

Should users be concerned about this problem? #34 and #48 seem closely related to the concerns raised by the preprint's authors, but my impression is that the authors argue that SSIM is unable to measure similarities between Hi-C matrices from the same genomic locus, and that it degrades the differential contact analysis, which is actually driven by the signal-to-noise ratio instead.

Is there a way to use this software without running into the concern raised by H. Lee et al.? Or might we have to wait for further major updates to either the software or the manuscript?

Thanks in advance,

Chess sim quits with ERROR "all regions need to span at least 20 bins "

I have two .hic files, and I want to compare them at a given set of loops (BEDPE). Whenever I use the .hic files, even at the finest resolution (5 kb bins), I get the 20-bin error. Is there any way to disable this check?
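One possible workaround, sketched under assumptions rather than taken from CHESS itself: widen each loop region symmetrically until it spans at least 20 bins, snapping both ends to bin boundaries. Turning a loop into a single region (from the first anchor's start to the second anchor's end) is an assumption about how the loops should be compared, and the sketch does not clip at chromosome ends.

```python
MIN_BINS = 20  # minimum enforced by chess sim, per its error message


def padded_region(start, end, bin_size, min_bins=MIN_BINS):
    """Widen [start, end) symmetrically to span at least min_bins bins,
    keeping the start >= 0 and snapping both ends outwards to bin edges."""
    min_len = min_bins * bin_size
    if end - start < min_len:
        center = (start + end) // 2
        start = center - min_len // 2
        end = start + min_len
    if start < 0:  # keep the full width near a chromosome start
        end -= start
        start = 0
    start = (start // bin_size) * bin_size  # round down to a bin edge
    end = -(-end // bin_size) * bin_size    # round up to a bin edge
    return start, end


# A 10 kb loop region at 5 kb resolution becomes a 100 kb (20-bin) region.
print(padded_region(1_000_000, 1_010_000, 5_000))
```

The trade-off is that a 20-bin window around a short loop also includes surrounding structure, so the similarity score no longer reflects the loop alone.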

The example hic file cannot be dumped

Dear all,

I found that the example .hic files cannot be dumped with juicer_tools; the command gives me the following message. Any suggestions for visualizing the example .hic files would be appreciated.
java -jar juicer_tools_1.22.01.jar dump observed NONE ukm_control_fixed_le_25kb_chr2.hic chr2 chr2 BP 10000 output
WARN [2020-11-21T16:54:14,373] [Globals.java:138] [main] Development mode is enabled
Could not read hic file: null

Best wishes,
Zheng zhuqing

'Series' object has no attribute 'bias' in `chess sim`

thanks a lot.

When I ran the sim command, I got this error:
AttributeError: 'Series' object has no attribute 'bias'.
My pandas version is 0.25.3.
Do you think pandas should be downgraded to 0.22.0?
best,

Originally posted by @BenxiaHu in #2 (comment)

The input image seems to have just one color 0.

Dear author,
When I used chess extract, I ran into the following problem:

  File "/public/home/longxin/tools/miniconda3/bin/chess", line 550, in <module>
    Chess()
  File "/public/home/longxin/tools/miniconda3/bin/chess", line 75, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/public/home/longxin/tools/miniconda3/bin/chess", line 532, in extract
    args.closing_square)
  File "/public/home/longxin/tools/miniconda3/lib/python3.6/site-packages/chess/get_structures.py", line 154, in extract_structures
    filter2 = filters.threshold_otsu(filter_negative, nbins=size)
  File "/public/home/longxin/tools/miniconda3/lib/python3.6/site-packages/skimage/filters/thresholding.py", line 283, in threshold_otsu
    "to have just one color {0}.".format(image.min()))
ValueError: threshold_otsu is expected to work with images having more than one color. The input image seems to have just one color 0.

I don't understand what the input image color refers to. By the way, my input data is in .cool format. I wonder whether it's related to my filtered regions: I only filter regions by z-ssim < -1, because my SN values are all below 0.15, which makes it difficult to set an SN threshold. If the error is not caused by the filtered regions, what is causing it?
Thanks for your answer!

chess sim results are all nan

Hi author:

I tried to use CHESS to compare two .hic files. I ran the following commands:

chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out # generate window for the whole genome

chess sim \
HiC_control.hic \
HiC_treat.hic \
hg38_1mwin_100kstep.out \
Ctrl_Treat_1mwin_100kstep_diff.out.tsv

While ./hg38_1mwin_100kstep.out looks fine to me, the final result Ctrl_Treat_1mwin_100kstep_diff.out.tsv contains only nan values, like this:

ID      SN      ssim    z_ssim
0       nan     nan     nan
1       nan     nan     nan
2       nan     nan     nan
3       nan     nan     nan
4       nan     nan     nan
5       nan     nan     nan
6       nan     nan     nan
7       nan     nan     nan
8       nan     nan     nan
9       nan     nan     nan
10      nan     nan     nan
11      nan     nan     nan
12      nan     nan     nan
13      nan     nan     nan
...

Did I miss anything?
Thank you very much!

Error while running "example analysis"

Hello,

I followed instructions for the example analysis but I keep getting an error.

Log:

wget https://github.com/vaquerizaslab/chess/blob/master/examples/dlbcl/ukm_control_fixed_le_25kb_chr2.hic
wget https://github.com/vaquerizaslab/chess/blob/master/examples/dlbcl/ukm_patient_fixed_le_25kb_chr2.hic
chess pairs hg38 3000000 100000 ./hg38_chr2_3mb_win_100kb_step.bed --chromosome chr2
chess sim ukm_control_fixed_le_25kb_chr2.hic ukm_patient_fixed_le_25kb_chr2.hic hg38_chr2_3mb_win_100kb_step.bed ukm_chr2_3mb_control_vs_patient_chess_results.tsv

2020-11-18 10:32:12,627 INFO CHESS version: 0.3.5
2020-11-18 10:32:12,627 INFO FAN-C version: 0.9.7
2020-11-18 10:32:12,631 INFO Loading reference contact data
2020-11-18 10:32:13,514 ERROR Regions file needs to bespecified for sparse input.
2020-11-18 10:32:13,515 ERROR Reference contact data could not be loaded. Please specify a valid input file. Files in sparse format can only be loaded if --reference-regions is specified.
File type not recognised (ukm_control_fixed_le_25kb_chr2.hic).

I assume the example analysis is supposed to work as is, so I am not sure what to make of the message "Reference contact data could not be loaded. Please specify a valid input file. Files in sparse format can only be loaded if --reference-regions is specified".
What do you think?

Thanks a lot,
Gony.
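One thing worth ruling out: a github.com/.../blob/... URL serves the HTML page that displays a file, not the file itself (the raw content is served from the corresponding /raw/ URL), so wget may have saved an HTML page under the .hic name, which would explain "File type not recognised". Juicer .hic files begin with the magic string HIC followed by a null byte, so a quick check of the first bytes can tell the two apart. A minimal sketch:

```python
HIC_MAGIC = b"HIC\x00"  # magic string at the start of Juicer .hic files


def has_hic_header(first_bytes):
    """True if the given leading bytes start with the .hic magic string."""
    return first_bytes[:4] == HIC_MAGIC


def looks_like_hic(path):
    """Read the first four bytes of a file and check the .hic magic string."""
    with open(path, "rb") as fh:
        return has_hic_header(fh.read(4))


# Usage:
# print(looks_like_hic("ukm_control_fixed_le_25kb_chr2.hic"))
```

A False result means the downloaded file is not a .hic file at all, whatever its extension.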

Error when running chess extract

Hi, recently I keep getting this error when running the chess extract command.
Do you have any idea what causes it?

Traceback (most recent call last):
  File "/home/ch220811/.local/bin/chess", line 585, in <module>
    Chess()
  File "/home/ch220811/.local/bin/chess", line 75, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/home/ch220811/.local/bin/chess", line 567, in extract
    args.closing_square)
  File "/home/ch220811/.local/lib/python3.6/site-packages/chess/get_structures.py", line 148, in extract_structures
    filter1 = filters.threshold_otsu(filter_positive, nbins=size)
  File "/home/ch220811/.local/lib/python3.6/site-packages/skimage/filters/thresholding.py", line 285, in threshold_otsu
    hist, bin_centers = histogram(image.ravel(), nbins, source_range='image')
  File "/home/ch220811/.local/lib/python3.6/site-packages/skimage/exposure/exposure.py", line 139, in histogram
    hist, bin_edges = np.histogram(image, bins=nbins, range=hist_range)
  File "<array_function internals>", line 6, in histogram
  File "/home/ch220811/software/lib/python3.6/site-packages/numpy/lib/histograms.py", line 856, in histogram
    decrement = tmp_a < bin_edges[indices]
IndexError: index -9223372036854775808 is out of bounds for axis 0 with size 102

Sometimes I also get this error:
/slurm/reports/var/spool/slurm/d/job466467/slurm_script: line 15: 45401 Killed /home/ch220811/.local/bin/chess extract ${TSV} ${MATRIX1} ${MATRIX2} ${OUT}
Is this because my job was killed by my IT manager or by the system for exceeding a memory limit?

Thank you!

Prepare the normalized hic input for CHESS

Dear all,

Recently, I found that FAN-C runs slowly, which I discussed here (vaquerizaslab/fanc#42). I will try the pipeline recommended by @kaukrise. Does anyone have experience preparing input for CHESS without using FAN-C? I hope you can share your experience.

Best wishes,
Zheng zhuqing

chess --version doesn't work?

At least not for me, with Python 3.9 and CHESS version 0.3.6.

I don't really understand how the parser currently handles this, although fanc seems to use the same approach successfully. All my Google searches led to a different way of handling it, like this, which works for me:

from .version import __version__
parser.add_argument('--version', action='version', version=__version__)

From what I've seen online it seems like from .version import __version__ only needs to be in __init__.py, but it didn't work until I added this into commands.py as well. I tried just adding that line at the top of commands.py and keeping the code the same, but that didn't help.
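The pattern quoted above is the standard argparse one. Below is a self-contained sketch of how `action="version"` behaves: it prints the version string to stdout and exits with status 0. The program name and version string are placeholders, and this is not CHESS's actual commands.py. Note that a name imported in `__init__.py` is not automatically visible inside `commands.py`, which may be why the import had to be added there as well.

```python
import argparse

__version__ = "0.3.6"  # placeholder; in a package this would come from version.py


def build_parser():
    parser = argparse.ArgumentParser(prog="chess")
    # action="version" prints the string to stdout and exits with status 0
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + __version__)
    return parser


# Usage: `chess --version` would print "chess 0.3.6".
```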

how to get the actual positions of the highly dissimilar regions highlighted by chess extract

Dear all,

After generating the gained and lost feature results, how can I get the actual genomic positions of these features? Thank you.

Best,
Zheng zhuqing
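If a feature's position is reported as row/column bin indices relative to the start of its region (an assumption that should be checked against the CHESS documentation for the extract output), converting it to genomic coordinates amounts to offsetting by the region start and scaling by the bin size. A hypothetical helper:

```python
def feature_to_genomic(region_chrom, region_start, bin_size, first_bin, last_bin):
    """Hypothetical helper: map an inclusive bin-index range inside a
    region to a 0-based, half-open genomic interval."""
    start = region_start + first_bin * bin_size
    end = region_start + (last_bin + 1) * bin_size
    return region_chrom, start, end


# Bins 4..7 of a region starting at 1 Mb, with 25 kb bins.
print(feature_to_genomic("chr2", 1_000_000, 25_000, 4, 7))
```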

how to deal with biological replicates

Dear all,

If we have biological replicates, how should we run chess sim? Thank you very much.

Best wishes,
Zheng zhuqing

BEDPE file format

Hi, I am trying to run the chess sim command.
For the pairs argument, I tried to provide HiCCUPS output, which has 6 columns (chr1, start1, end1, chr2, start2, end2).
However, an error says that a file with 10 columns is expected.
Could you tell me the exact file format for the pairs argument of chess sim?
Thanks.
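For reference, the pairs files generated by chess pairs elsewhere on this page have 10 whitespace-separated columns, e.g. `chr1 1 3000001 chr1 1 3000001 0 . + +`: two regions, then an ID, a score placeholder and two strands. It is assumed here (check against the CHESS documentation) that columns 1-3 give the region in the reference and columns 4-6 the region in the query, so the two anchors of a HiCCUPS loop should not simply be copied into those slots. A hedged sketch that instead turns each intra-chromosomal loop into one region spanning both anchors and uses it for both reference and query; header lines of the HiCCUPS file would need to be skipped first, and the 20-bin minimum still applies.

```python
def loop_to_chess_pair(line, pair_id):
    """Turn one 6-column HiCCUPS-style loop line into a 10-column pairs line.

    The region runs from the first anchor's start to the second anchor's end
    (intra-chromosomal loops only) and is used for reference and query alike.
    """
    chrom1, start1, _end1, chrom2, _start2, end2 = line.split()[:6]
    if chrom1 != chrom2:
        raise ValueError("inter-chromosomal loop: " + line.strip())
    region = [chrom1, start1, end2]
    return "\t".join(region + region + [str(pair_id), ".", "+", "+"])


print(loop_to_chess_pair("chr1\t100000\t105000\tchr1\t300000\t305000", 0))
```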

chess sim results with all NaN

Hi,

I have an issue similar to #5 and #9: the results from chess sim are all NaN. I checked the discussions in #5 and #9, but my situation looks different.

Here is how I run it:

First of all, I downloaded the example files and ran them successfully, so my setup works.

I generated a BED file with chess pairs:
head mm10_chr1_3mb_win_100kb_step.bed
chr1 1 3000001 chr1 1 3000001 0 . + +
chr1 100001 3100001 chr1 100001 3100001 1 . + +
chr1 200001 3200001 chr1 200001 3200001 2 . + +
chr1 300001 3300001 chr1 300001 3300001 3 . + +
chr1 400001 3400001 chr1 400001 3400001 4 . + +
chr1 500001 3500001 chr1 500001 3500001 5 . + +
chr1 600001 3600001 chr1 600001 3600001 6 . + +
chr1 700001 3700001 chr1 700001 3700001 7 . + +
chr1 800001 3800001 chr1 800001 3800001 8 . + +
chr1 900001 3900001 chr1 900001 3900001 9 . + +

Then I ran: chess sim reference.balanced.chr1.cool query.chr1.cool mm10_chr1_3mb_win_100kb_step_test.bed test3_chr1.tsv

The cool files were balanced with cooler, and their resolution is 20 kb.

Here is the log:
2020-11-09 19:38:59,424 INFO Note: detected 72 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2020-11-09 19:38:59,424 INFO Note: NumExpr detected 72 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2020-11-09 19:38:59,424 INFO NumExpr defaulting to 8 threads.
2020-11-09 19:39:02,755 INFO CHESS version: 0.3.4
2020-11-09 19:39:02,755 INFO FAN-C version: 0.9.6
2020-11-09 19:39:02,759 INFO Loading reference contact data
Expected 100% (3209946 of 3209946) |#####| Elapsed Time: 0:06:28 Time: 0:06:28
Expected 100% (5892473 of 5892473) |#####| Elapsed Time: 0:11:48 Time: 0:11:48
2020-11-09 21:00:25,584 INFO Loading region pairs
2020-11-09 21:00:25,783 INFO Launching workers
2020-11-09 21:00:26,240 INFO Submitting pairs for comparison
2020-11-09 21:02:40,942 INFO Could not compute similarity for 1925 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins.

I couldn't figure out the problem. The window size is big enough. I also tried removing 'chr' from the BED file, as suggested in #5, but that did not work.

Could you please help with it?

Thanks

Gang

Get observed/expected from Juicer Hi-C

Hi, is there a way to get observed/expected values from a Juicer .hic file using the fanc.load function?

Run time of chess sim module

Hi,

I wonder what the approximate run time of the sim module would be at different matrix resolutions, window steps and chromosomes (human genome). Have you published these numbers anywhere? I haven't found them in the paper itself.

Thanks,
Mikhail
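As a rough guide to how the workload scales with the window step: if chess pairs emits only full-length windows of size w moved by step s along a chromosome of length L, the number of region pairs is floor((L − w) / s) + 1. This formula is an inference, not documented behaviour, but it reproduces the 1925 region pairs reported in another issue on this page for mm10 chr1 (195,471,971 bp) with 3 Mb windows and a 100 kb step. Runtime per pair also depends on the number of bins per window, which this sketch does not model.

```python
def n_windows(chrom_len, window, step):
    """Number of full-length sliding windows of size `window`, moved by
    `step`, that fit into a chromosome of length chrom_len."""
    if chrom_len < window:
        return 0
    return (chrom_len - window) // step + 1


# mm10 chr1 is 195,471,971 bp long
print(n_windows(195_471_971, 3_000_000, 100_000))
```

Halving the step roughly doubles the number of pairs, so the step is the main lever on the number of comparisons.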

Extracting genomic coordinates of differential features

Dear chesser,

I am using the code in example_analysis.ipynb to find where the differential regions are. I am not really a Python person and would appreciate help with the following issues:

At the end, the script plots 3 matrices. The first 2 are fine for visualizing TAD structures, but the color scale is often not suited to loop visualization. I'd like to replace the Hi-C signal with observed/expected values in the first 2 matrices, but I don't know how to do this. The matrices I am loading are fanc Hi-C matrices.
The script has a commented-out margin that I believe would enlarge the visualized region. It is a great idea for smaller regions and I would like to use it. Unfortunately, when I uncomment that code, the squares (corresponding to the extracted regions) are completely misplaced. Could you help me get a version of the script that works with the margin?

Thank you !

Error when a chromosome in the pairs file is not in the HiC matrix

When I run chess sim, I get the error below, probably because my pairs file contains chromosomes that aren't in the Hi-C data. The problem is that it only crashes after running for several hours, so maybe chess sim could issue a warning instead, or detect the problem earlier?
Thank you

2020-04-24 13:47:35,830 INFO Running 'chess sim pos_1.sparse pos_1.regions neg_1.sparse neg_1.regions grch38.pairs chess_results/sample_1_pos_vs_neg_comparison.tsv'
2020-04-24 13:47:43,779 INFO [MAIN]: Loading and indexing Hi-C regions
2020-04-24 13:49:16,673 INFO [MAIN]: Loading Hi-C contacts
2020-04-24 13:49:16,674 INFO [MAIN]: Converting to observed / expected
2020-04-24 16:28:18,060 INFO [MAIN]: Loading region pairs
2020-04-24 16:28:43,954 INFO Launching workers
2020-04-24 16:28:49,646 INFO Submitting pairs for comparison
2020-04-24 20:46:30,866 ERROR Worker 8862c7f9-5e90-4938-a8a0-3bb337d4d4c1 encountered a problem: 'chr5_GL339449v2_alt'
Traceback (most recent call last):
  File "/home/mrigaud/.local/bin/chess", line 982, in <module>
    Chess()
  File "/home/mrigaud/.local/bin/chess", line 78, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/home/mrigaud/.local/bin/chess", line 274, in sim
    raise out
KeyError: 'chr5_GL339449v2_alt'
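Until such a check exists, a pairs file can be pre-filtered so that every pair only involves chromosomes present in the Hi-C data. This is a minimal sketch assuming a whitespace-separated BEDPE pairs file with chromosome names in columns 1 and 4; the function name is hypothetical, and how you obtain the set of chromosome names from your Hi-C file is left open.

```python
from typing import Iterable, Iterator, Set


def filter_pairs(lines: Iterable[str], allowed_chroms: Set[str]) -> Iterator[str]:
    """Yield BEDPE lines whose two anchors both lie on chromosomes in `allowed_chroms`.

    Assumes chromosome names are in the first and fourth columns (BEDPE layout).
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
            continue
        fields = line.split()
        if len(fields) >= 4 and fields[0] in allowed_chroms and fields[3] in allowed_chroms:
            yield line


# Hypothetical two-line pairs file; the second pair is on an alt contig.
pairs = [
    "chr5\t0\t3000000\tchr5\t0\t3000000\t0\t.\t+\t+\n",
    "chr5_GL339449v2_alt\t0\t3000000\tchr5_GL339449v2_alt\t0\t3000000\t1\t.\t+\t+\n",
]
print(list(filter_pairs(pairs, {"chr5"})))  # only the chr5 pair is kept
```

Writing the kept lines to a new file and passing that to chess sim would avoid the late KeyError, assuming the allowed set matches the Hi-C chromosome names exactly.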

Issues with normalization and chess extract

Hi,
When I used chess, I ran into the following problems:
First, do you normalize your .hic files? If so, could you tell me how, i.e. which software or code you use?
Second, I already have a .hic file produced by Juicer. How can I extract the data for each chromosome and resolution from the .hic file? Juicer Tools pre or dump? Maybe other software?
Third, I ran into an issue. I used HiCExplorer's hicConvertFormat to convert the .hic file to .cool:
hicConvertFormat -m sample_inter_30.hic --inputFormat hic --outputFormat cool -o ~/opt/juicer/chess/sample.cool --resolutions 50000
Then I extracted a single chromosome from the .cool file:
hicConvertFormat -m hub_50000.cool --inputFormat cool --outputFormat cool -o hub_50000.chr1.cool --chromosome 1
chess pairs and chess sim worked, but chess extract failed (see the chess_extract.err file).
So I tried normalizing with:
hicConvertFormat -m hub_50000.cool --inputFormat cool --outputFormat cool -o hub_50000.chr1.cool --chromosome 1 --correction_name KR
Then chess extract no longer crashed, but unfortunately I got all NaN values.
Here are some files in case you need them:
2022-01-23 17:50:44,884 INFO Running '/home/zhaoqianyi/miniconda3/envs/hicex3.6/bin/chess extract Sscrofa11.1_chr1_3mb_win_100kb_step.bed hub_50000.chr1.cool nmg08_50000.chr1.cool features/'
2022-01-23 17:50:54,086 INFO CHESS version: 0.3.7
2022-01-23 17:50:54,086 INFO FAN-C version: 0.9.21
2022-01-23 17:50:54,090 INFO Loading reference contact data
2022-01-23 17:52:32,215 INFO Loading region pairs
2022-01-23 17:52:32,255 INFO Applying image filtering to identify specific structures
Traceback (most recent call last):
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/bin/chess", line 585, in <module>
    Chess()
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/bin/chess", line 75, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/bin/chess", line 567, in extract
    args.closing_square)
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/chess/get_structures.py", line 148, in extract_structures
    filter1 = filters.threshold_otsu(filter_positive, nbins=size)
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/skimage/filters/thresholding.py", line 364, in threshold_otsu
    counts, bin_centers = _validate_image_histogram(image, hist, nbins)
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/skimage/filters/thresholding.py", line 307, in _validate_image_histogram
    image.ravel(), nbins, source_range='image', normalize=normalize
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/skimage/_shared/utils.py", line 338, in fixed_func
    return func(*args, **kwargs)
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/skimage/exposure/exposure.py", line 267, in histogram
    hist, bin_centers = _histogram(image, nbins, source_range, normalize)
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/skimage/exposure/exposure.py", line 301, in _histogram
    hist, bin_edges = np.histogram(image, bins=bins, range=hist_range)
  File "<__array_function__ internals>", line 6, in histogram
  File "/home/zhaoqianyi/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/numpy/lib/histograms.py", line 857, in histogram
    decrement = tmp_a < bin_edges[indices]
IndexError: index -9223372036854775808 is out of bounds for axis 0 with size 61
The file named 'Sscrofa11.1_chr1_3mb_win_100kb_step.bed' has the following format:

and my chess extract command is:
chess extract Sscrofa11.1_chr1_3mb_win_100kb_step.bed hub_50000.chr1.cool sample_50000.chr1.cool features/
Thanks, and I look forward to your answer!

NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

Dear author,

The -t option seems to be ignored: I always get the warning message NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
I tried setting NUMEXPR_MAX_THREADS, but the same warning is issued.

> chess --version
0.3.2
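NumExpr reads NUMEXPR_MAX_THREADS from the environment when it is first imported, so the variable has to be present in the environment of the chess process itself at start-up (a shell assignment without export is not inherited by child processes). A minimal sketch that builds such an environment for a subprocess is below; the helper names are hypothetical, and whether this interacts with chess's own -t option is not established here.

```python
import os
import subprocess
from typing import Dict, List, Optional


def env_with_numexpr_threads(max_threads: int,
                             base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: the current environment) with NUMEXPR_MAX_THREADS set."""
    env = dict(os.environ if base is None else base)  # copy, never mutate the original
    env["NUMEXPR_MAX_THREADS"] = str(max_threads)
    return env


def run_with_numexpr_threads(cmd: List[str], max_threads: int) -> int:
    """Run `cmd` (e.g. a chess command line) with the variable exported to the child process."""
    completed = subprocess.run(cmd, env=env_with_numexpr_threads(max_threads), check=False)
    return completed.returncode
```

From a shell, the equivalent is to run export NUMEXPR_MAX_THREADS=16 before invoking chess.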

How to define the high confidence set of regions showing clear changes?

Dear all,

How can I identify the most differential chromatin contacts from the chess sim results? Based on your Nature Genetics paper, I think that a lower z-ssim (the most important parameter) together with a higher SN (the second parameter) indicates higher confidence. Is this right? Thank you very much.

Sincerely,
Zheng zhuqing
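The selection described in the question (low z-ssim first, high SN second) can be sketched with the standard library. This assumes the chess sim output is a tab-separated table with ID, SN and z_ssim columns; the column names and both thresholds are assumptions for illustration, not values recommended by the CHESS authors.

```python
import csv
import io
from typing import Dict, List


def select_changed_regions(tsv_text: str, max_z_ssim: float, min_sn: float) -> List[Dict[str, str]]:
    """Keep rows with low z-ssim and high signal-to-noise, most dissimilar first.

    Assumes 'ID', 'SN' and 'z_ssim' columns. Rows with NaN scores fail both
    comparisons and are therefore dropped.
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [r for r in rows
            if float(r["z_ssim"]) < max_z_ssim and float(r["SN"]) > min_sn]
    # Sort by z-ssim ascending; break ties by SN descending.
    kept.sort(key=lambda r: (float(r["z_ssim"]), -float(r["SN"])))
    return kept


# Hypothetical results table.
example = (
    "ID\tSN\tssim\tz_ssim\n"
    "0\t0.9\t0.2\t-2.5\n"
    "1\t0.3\t0.1\t-3.0\n"
    "2\t0.8\t0.7\t0.4\n"
    "3\t1.2\t0.3\t-2.5\n"
)
print([r["ID"] for r in select_changed_regions(example, max_z_ssim=-1.0, min_sn=0.5)])  # ['3', '0']
```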

Conditions for conservation analysis of syntenic blocks

Hi!

I am analyzing the level of chromatin conformation conservation between different species, and chess looks like the perfect tool for me! I already have the synteny blocks between the species I am working with, but I am not sure what window size or step I should use. First, I would like to replicate your paper's analysis, but I don't know which settings were used; could you help me with this? Do you have any suggestions on which parameters I should take into account when running this type of analysis? Is chess sensitive enough to capture TAD variation in interspecies analyses?

Many thanks

Lucía

Error in chess extract

Hi, I keep getting this error recently when running the chess extract command. Do you have any idea what causes it?
When I pass a TSV file, it shows:
Traceback (most recent call last):
  File "/miniconda3/envs/hicex3.6/bin/chess", line 585, in <module>
    Chess()
  File "/miniconda3/envs/hicex3.6/bin/chess", line 75, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/miniconda3/envs/hicex3.6/bin/chess", line 555, in extract
    pairs = list(load_pairs_iter(pairs_file))
  File "/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/chess/helpers.py", line 469, in load_pairs_iter
    ).format(len(fields)))
ValueError: 10 columns expected but 4 found in bedpe input
But when I pass a BEDPE file, it shows:
Traceback (most recent call last):
  File "/miniconda3/envs/hicex3.6/bin/chess", line 585, in <module>
    Chess()
  File "/miniconda3/envs/hicex3.6/bin/chess", line 75, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/miniconda3/envs/hicex3.6/bin/chess", line 549, in extract
    query_matrix_file, query_regions_file)
  File "/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/chess/helpers.py", line 581, in load_oe_contacts
    edges = edges_dict_from_fanc(reference_loaded)
  File "/miniconda3/envs/hicex3.6/lib/python3.7/site-packages/chess/helpers.py", line 371, in edges_dict_from_fanc
    for e in hic.edges(lazy=True):
AttributeError: 'Bed' object has no attribute 'edges'

By the way, the example shows that the chess extract command should be given a file named filtered_regions_chr2_3mb_100kb.tsv, but I haven't seen this file before. Could you give me some help with this?
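Judging from the first traceback, the pairs argument of chess extract is parsed as a 10-column BEDPE file, while the TSV passed there had only 4 columns. A minimal sketch for building such a file is to keep only the BEDPE lines of the original pairs file whose IDs were selected from the sim results. It assumes the IDs reported by chess sim match the 7th (name) column of the pairs file; verify this on your own files. The function name is hypothetical.

```python
from typing import Iterable, Iterator, Set


def pairs_for_ids(bedpe_lines: Iterable[str], wanted_ids: Set[str]) -> Iterator[str]:
    """Yield 10-column BEDPE lines whose 7th (name) column is in `wanted_ids`.

    Assumption: this name column holds the same IDs that chess sim reports.
    Raises ValueError on lines that do not have exactly 10 tab-separated columns.
    """
    for line_number, line in enumerate(bedpe_lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 10:
            raise ValueError(f"line {line_number}: expected 10 BEDPE columns, found {len(fields)}")
        if fields[6] in wanted_ids:
            yield line


# Hypothetical pairs file with two windows on chr2.
bedpe = [
    "chr2\t0\t3000000\tchr2\t0\t3000000\t5\t.\t+\t+\n",
    "chr2\t100000\t3100000\tchr2\t100000\t3100000\t6\t.\t+\t+\n",
]
print(list(pairs_for_ids(bedpe, {"6"})))  # only the pair with ID 6
```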

hints to chess extract

Dear,

In the chess extract documentation I read "we are planning to release a guide to this in the future".
Would you by any chance have a draft or a few recommendations to share?

Between the matrix bin size, the window size and the different extract parameters, there is a lot to test. Any guidance would be greatly appreciated.

chess extract fails with `ValueError: Image must contain only positive values`

Dear authors,

I am trying out chess on my Drosophila data. I can run the workflow up to the feature extraction step, which fails after 90 minutes with the error ValueError: Image must contain only positive values (full stderr with the error stack below). The matrices I passed are the same as in the first step and are in cool format, generated with the HiCExplorer suite (with ICE correction).

Also, I checked, and my cool matrices have no negative values but contain a few hundred NaNs (masked bins, I assume).

Could you please help ?

2020-10-21 18:47:53,749 INFO Running '/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/bin/chess extract /g/furlong/project/69_Gaby_Hi-C_Mutants/analysis/HiC-chess/search_250kb_win_50kb_step/features/chr2R_WT_23h_vs_KO_23h_chess_results_filtered.bed /g/furlong/project/69_Gaby_Hi-C_Mutants/data/HiC/HiC_Bridge_Merged/hicmatrices/binned/10K/corrected/WT_23h_raw_10K_mrged.corrected.cool /g/furlong/project/69_Gaby_Hi-C_Mutants/data/HiC/HiC_Bridge_Merged/hicmatrices/binned/10K/corrected/KO_23h_raw_10K_mrged.corrected.cool /g/furlong/project/69_Gaby_Hi-C_Mutants/analysis/HiC-chess/search_250kb_win_50kb_step/features'
2020-10-21 18:47:53,750 INFO Loading reference contact data
2020-10-21 18:48:13,070 INFO Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2020-10-21 20:18:50,367 INFO Loading region pairs
2020-10-21 20:18:50,385 INFO Applying image filtering to identify specific structures
Traceback (most recent call last):
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/bin/chess", line 548, in <module>
    Chess()
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/bin/chess", line 76, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/bin/chess", line 521, in extract
    extract_structures(
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/lib/python3.8/site-packages/chess/get_structures.py", line 123, in extract_structures
    denoise_positive = restoration.denoise_bilateral(
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/lib/python3.8/site-packages/skimage/restoration/_denoise.py", line 205, in denoise_bilateral
    raise ValueError("Image must contain only positive values")
ValueError: Image must contain only positive values
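Because the error concerns negative input, it can help to count NaN and negative entries in the exact sub-matrix that reaches the image-filtering step, not only in the raw .cool file; for example, log-transformed observed/expected values are negative wherever contacts are below expectation. The sketch below uses plain NumPy; loading the sub-matrix is left out, and the function name is hypothetical.

```python
import numpy as np


def matrix_value_report(matrix: np.ndarray) -> dict:
    """Count NaN, infinite, negative and zero entries in a contact matrix.

    NaN and infinite entries are excluded from the negative/zero counts.
    """
    m = np.asarray(matrix, dtype=float)
    finite = np.isfinite(m)
    return {
        "nan": int(np.isnan(m).sum()),
        "inf": int(np.isinf(m).sum()),
        "negative": int((m[finite] < 0).sum()),
        "zero": int((m[finite] == 0).sum()),
        "all_nan": bool(np.isnan(m).all()),
    }


example = np.array([[1.0, np.nan], [-0.5, 0.0]])
print(matrix_value_report(example))
```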

Could not compute similarity for...

Hi folks,

Some of my region pairs are being deemed invalid, but I don't think they fall into any of the possible reasons given. Do you have any other ideas what the issue might be? Is there a way I can get more diagnostic info to try to debug this myself (without having to dig deep into the code and run each step manually, which I can do if necessary)?

Here's the error message:

2021-01-15 14:35:17,634 INFO Running '/home/research/vaquerizas/liz/project_ko/.ko_venv/bin/chess sim data/hic/ko_Rep1/hic/ko_Rep1_10kb.hic data/hic/wt_Rep1/hic/wt_Rep1_10kb.hic data/chess/dm6_pairs_150x_10kb.bedpe data/chess/ko_Rep1_vs_wt_Rep1/genome_scan_150x_10kb.txt -p 8'
2021-01-15 14:35:26,020 INFO CHESS version: 0.3.6
2021-01-15 14:35:26,021 INFO FAN-C version: 0.9.11
2021-01-15 14:35:26,052 INFO Loading reference contact data
2021-01-15 14:38:42,767 INFO Loading query contact data
2021-01-15 14:43:31,332 INFO Loading region pairs
2021-01-15 14:43:31,690 INFO Launching workers
2021-01-15 14:43:33,110 INFO Submitting pairs for comparison
2021-01-15 14:45:01,759 INFO Could not compute similarity for 6316 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins
2021-01-15 14:45:20,267 INFO Finished '/home/research/vaquerizas/liz/project_ko/.ko_venv/bin/chess sim data/hic/ko_Rep1/hic/ko_Rep1_10kb.hic data/hic/wt_Rep1/hic/wt_Rep1_10kb.hic data/chess/dm6_pairs_150x_10kb.bedpe data/chess/ko_Rep1_vs_wt_Rep1/genome_scan_150x_10kb.txt -p 8'
Closing remaining open files:data/hic/ko_Rep1/hic/ko_Rep1_10kb.hic...donedata/hic/wt_Rep1/hic/wt_Rep1_10kb.hic...done

This is Drosophila Hi-C data. I've tried different resolutions and two different window sizes (100x and 150x the bin size). The pairs file for each parameter combo was generated with chess pairs from the same text file with the chromosome sizes (and these files look okay to me from a quick glance).

In each example, all bins from certain chromosomes are missing! In particular, chr 2R and 3R. However I get results for these chrs at 25kb resolution so I don't think there is a chromosome naming mismatch between the files or anything like that.

(N.B., it makes sense that there are no valid pairs on chr 4 at 25kb resolution, since I'm using a window size of at least 2.5 Mb, which is larger than the chromosome size. Same for 10 kb resolution with 150x window size)

I would have thought that it would be a resolution issue (i.e. too many unmappable bins), but having plotted each chromosome at 10kb resolution in both my query and my reference, they look fine. Some unmappable bins but I'd expect to get some results - they don't look any worse than other chromosomes.

I'm happy to look into this further myself since I have some familiarity with the code by now, but I'm not really sure where to start. Do you have any ideas?

I am using a development version of FAN-C, but @kaukrise said that it should work fine.

Also, as a more general comment, would it be possible to implement a more informative version of this message?
2021-01-15 14:45:01,759 INFO Could not compute similarity for 6316 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins
I've seen other questions relating to this, so it seems like a common issue/point of confusion. Although most of the time this is easy to solve, it would be helpful to know which of those three possibilities accounts for the invalid pairs as a starting point for debugging.
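A per-pair diagnosis along the lines requested here could look roughly like the sketch below, which reports which of the three reasons named in the log message applies to a region. It is not CHESS code: the `Region` type, the idea of passing a precomputed unmappable-bin fraction and the `min_bins`/`max_unmappable` thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Region:
    chrom: str
    start: int
    end: int


def invalid_reason(region: Region, chrom_sizes: Dict[str, int], bin_size: int,
                   unmappable_fraction: float, min_bins: int = 20,
                   max_unmappable: float = 0.5) -> Optional[str]:
    """Return which of the three logged reasons applies to `region`, or None.

    Hypothetical diagnostic: `min_bins` and `max_unmappable` are illustrative
    thresholds, not the cut-offs CHESS uses internally.
    """
    size = chrom_sizes.get(region.chrom)
    if size is None or region.start < 0 or region.end <= region.start or region.end > size:
        return "faulty coordinates"
    if (region.end - region.start) // bin_size < min_bins:
        return "region too small"
    if unmappable_fraction > max_unmappable:
        return "too many unmappable bins"
    return None


sizes = {"chr2R": 25_000_000}  # hypothetical chromosome size
print(invalid_reason(Region("chr2R", 0, 1_500_000), sizes, 10_000, unmappable_fraction=0.8))
```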

ValueError: range() arg 3 must not be zero

Good evening,

I am testing chess to compare two .hic matrices that I obtained with HiC-Pro.
I am using the following command:

chess sim WT.allValidPairs.hic \
    KO.allValidPairs.hic \
    mm10_chr2_3mb_win_100kb_step.bed \
    WT_vs_KO_chess.tsv

and getting the following error:

Traceback (most recent call last):
  File "/hpcnfs/home/ieo5073/miniconda3/envs/HiCtools/bin/chess", line 548, in <module>
    Chess()
  File "/hpcnfs/home/ieo5073/miniconda3/envs/HiCtools/bin/chess", line 76, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/hpcnfs/home/ieo5073/miniconda3/envs/HiCtools/bin/chess", line 175, in sim
    for chunk in chunks(pairs, threads):
  File "/hpcnfs/home/ieo5073/miniconda3/envs/HiCtools/lib/python3.7/site-packages/chess/helpers.py", line 533, in chunks
    for i in range(0, len(l), n):
ValueError: range() arg 3 must not be zero

Could you please help me with this issue?

Thank you in advance,
Federico
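The last frame is a helper that steps through the pairs with range(0, len(l), n), and Python raises exactly this error whenever the step n is 0. As a generic illustration (not the CHESS helper, and without claiming what makes n zero in this run), a chunking function with an explicit guard looks like this; the name safe_chunks is hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def safe_chunks(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield successive slices of `items` of length `size` (the last may be shorter).

    Rejects a zero or negative size with an explicit message instead of the
    bare "range() arg 3 must not be zero".
    """
    if size <= 0:
        raise ValueError(f"chunk size must be positive, got {size}")
    for i in range(0, len(items), size):
        yield list(items[i:i + size])


print(list(safe_chunks([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```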
