Comments (6)
Hi Nick and Kaukrise,
After many tests, I think I have found out what the problem is.
- The original .hic file I generated contained multiple resolutions. I have now generated a new set of .hic files at a single fixed resolution, for example 5 kb.
- Kaukrise actually made a great guess: the Juicer output does not contain the "chr" prefix. The actual mismatch is that the .hic file has no "chr" prefix while the CHESS output windows do. So I tried removing the "chr" prefix before running chess sim, and it works!
Hopefully this will be helpful to other people.
Thank you all.
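A minimal sketch of the workaround described above, assuming the windows file written by chess pairs is a whitespace-separated BEDPE file with the chromosome names in columns 1 and 4 (as in the head output further down). The function name strip_chr is hypothetical; it only drops a leading "chr" from those two columns and writes tab-separated output:

```shell
# Hypothetical helper: remove a leading "chr" from the two chromosome
# columns (1 and 4) of a BEDPE-style region-pairs file, so the names
# match a .hic file whose chromosomes are named 1, 2, ..., X.
# Reads the file(s) given as arguments, or stdin if none are given.
strip_chr() {
  awk 'BEGIN { OFS = "\t" }
       {
         sub(/^chr/, "", $1)   # chromosome of the first region
         sub(/^chr/, "", $4)   # chromosome of the second region
         $1 = $1               # force the record to be rebuilt with tabs
         print
       }' "$@"
}
```

For example, strip_chr hg38_1mwin_100kstep.out > hg38_1mwin_100kstep.nochr.out (output name chosen for illustration), then pass the new file to chess sim in place of the original.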
from chess.
Hi,
could you please post the full log in here?
Are you using normalized matrices? What is the bin size of your data?
Thanks Nick! Sorry, I forgot the log. Here I reproduced the error:
Here is my code:
chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out
chess sim \
HiC_control.mapq30.hic \
HiC_treat.mapq30.hic \
./hg38_1mwin_100kstep.out \
H1_ctrl_treat_1mwin_100kstep_diff.out.tsv
Here is the full log:
2020-10-22 10:03:19,635 INFO Running '/software/anaconda/install/bin/chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out'
2020-10-22 10:03:21,908 INFO CHESS version: 0.3.3
2020-10-22 10:03:21,908 INFO FAN-C version: 0.9.5
2020-10-22 10:03:22,011 INFO Finished '/software/anaconda/install/bin/chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out'
2020-10-22 10:03:24,861 INFO Running '/software/anaconda/install/bin/chess sim HiC_control.mapq30.hic HiC_treat.mapq30.hic ./hg38_1mwin_100kstep.out H1_ctrl_treat_1mwin_100kstep_diff.out.tsv'
2020-10-22 10:03:26,368 INFO CHESS version: 0.3.3
2020-10-22 10:03:26,368 INFO FAN-C version: 0.9.5
2020-10-22 10:03:26,369 INFO Loading reference contact data
2020-10-22 10:05:30,980 INFO Loading region pairs
2020-10-22 10:05:31,297 WARNING 392 region pairs have been dropped, because they involve chromosomes that are not present in the provided contact data.
2020-10-22 10:05:31,297 INFO Launching workers
2020-10-22 10:05:31,354 INFO Submitting pairs for comparison
2020-10-22 10:05:33,225 INFO Could not compute similarity for 30654 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins
2020-10-22 10:05:33,389 INFO Finished '/software/anaconda/install/bin/chess sim HiC_control.mapq30.hic HiC_treat.mapq30.hic ./hg38_1mwin_100kstep.out H1_ctrl_treat_1mwin_100kstep_diff.out.tsv'
Here are the results:
ID SN ssim z_ssim
0 nan nan nan
1 nan nan nan
2 nan nan nan
3 nan nan nan
4 nan nan nan
5 nan nan nan
...
The bin size of the .hic file is 1 kb. The .hic file was generated with the default parameters of juicebox_tools pre, which include the normalizations <VC,VC_SQRT,KR,SCALE>.
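To check how many region pairs came back without a similarity score, a small sketch like the following can help. It assumes the chess sim output is a tab-separated table with one header line and the ssim score in the third column, as in the excerpt above; count_nan_pairs is a hypothetical helper name, not part of CHESS:

```shell
# Hypothetical helper: count rows of a chess sim output table whose
# ssim column (column 3) is "nan", skipping each file's header line.
count_nan_pairs() {
  awk 'FNR > 1 && $3 == "nan" { n++ } END { print n + 0 }' "$@"
}
```

Running count_nan_pairs H1_ctrl_treat_1mwin_100kstep_diff.out.tsv gives a number that can be compared with the "Could not compute similarity" count reported in the log.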
OK, I am not sure what is happening there, so let's start with some guesswork. Your bin size is very small, so you may have a large number of unmappable bins. I don't know how long your preprocessing takes, but you could try 10 kb or 25 kb bins and see whether you get the same behaviour.
Another option would be to tweak the parameters of chess sim. By default, unmappable bins are not considered in the comparisons, and matrices with more than 10 percent unmappable bins are not compared at all.
You could therefore try increasing --mappability-cutoff and activating --keep-unmappable-bins. The catch is that these are low / deactivated by default, because I don't think the program behaves very well with a lot of missing data.
Anyhow, I suggest trying these things and seeing whether that gets rid of the NaNs. It might not even be the problem, but then we can take it from there.
Best,
Nick
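A minimal sketch of the parameter change suggested above, wrapped in a shell function so it can be reused for different inputs. Only the two flags named in the comment are used; the cutoff value 0.3 is an arbitrary illustration, not a recommended setting, and run_chess_sim_relaxed is a hypothetical name:

```shell
# Hypothetical wrapper around chess sim with relaxed handling of
# unmappable bins. Arguments, in the same order as a plain chess sim
# call: reference .hic, query .hic, region-pairs file, output .tsv.
run_chess_sim_relaxed() {
  if [ "$#" -ne 4 ]; then
    echo "usage: run_chess_sim_relaxed REF QUERY PAIRS OUT" >&2
    return 1
  fi
  # 0.3 is illustrative only; the comment above says the default
  # excludes matrices with more than 10 percent unmappable bins.
  chess sim \
    --mappability-cutoff 0.3 \
    --keep-unmappable-bins \
    "$1" "$2" "$3" "$4"
}
```

With the files from this thread: run_chess_sim_relaxed HiC_control.mapq30.hic HiC_treat.mapq30.hic ./hg38_1mwin_100kstep.out H1_ctrl_treat_1mwin_100kstep_diff.out.tsv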
Hey, I have an idea what the problem might be. By default, Juicer removes the chr part of the chromosome names. @Irenexzwen, could you please post the first couple of lines of hg38_1mwin_100kstep.out so we can see whether it has the chr prefix or not?
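To answer this for the whole file rather than just the first few lines, one option is to list every distinct chromosome name used in the region-pairs file. This sketch assumes the names sit in columns 1 and 4 of the BEDPE file; list_pair_chroms is a hypothetical helper:

```shell
# Hypothetical helper: print each distinct chromosome name found in
# columns 1 and 4 of a BEDPE-style region-pairs file, one per line.
list_pair_chroms() {
  awk '{ print $1; print $4 }' "$@" | sort -u
}
```

Comparing this list with the chromosome names stored in the .hic file shows whether the two naming schemes agree.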
Hi Kaukrise,
Thanks for the reminder. However, hg38_1mwin_100kstep.out does have the "chr" prefix:
>less hg38_1mwin_100kstep.out|head -n 10
chr1 1 1000001 chr1 1 1000001 0 . + +
chr1 100001 1100001 chr1 100001 1100001 1 . + +
chr1 200001 1200001 chr1 200001 1200001 2 . + +
chr1 300001 1300001 chr1 300001 1300001 3 . + +
chr1 400001 1400001 chr1 400001 1400001 4 . + +
chr1 500001 1500001 chr1 500001 1500001 5 . + +
chr1 600001 1600001 chr1 600001 1600001 6 . + +
chr1 700001 1700001 chr1 700001 1700001 7 . + +
chr1 800001 1800001 chr1 800001 1800001 8 . + +
chr1 900001 1900001 chr1 900001 1900001 9 . + +