Comments (6)

Irenexzwen commented on June 9, 2024

Hi Nick and kaukrise:

After many tests, I think I have found out what the problem is.

  1. The original .hic file I generated contained multiple resolutions; I have now generated a new set of .hic files at one fixed resolution, say 5 kb.
  2. Kaukrise actually had a great guess: the Juicer output does not contain "chr". The problem, however, is that the .hic files do not contain the "chr" prefix while the CHESS output windows do. So I tried removing the "chr" prefix before running chess sim, and it works!
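
In case it helps, here is a minimal Python sketch of the renaming step. It assumes the region-pairs file written by chess pairs is tab-separated with the chromosome names in columns 1 and 4, as in the excerpt further down this thread; the function name and the output file name are only illustrative.

```python
import io


def strip_chr_prefix(src, dst, prefix="chr"):
    """Copy a tab-separated region-pairs file from src to dst,
    removing `prefix` from the chromosome names in columns 1 and 4."""
    for line in src:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            for col in (0, 3):  # assumed positions of the two chromosome names
                if fields[col].startswith(prefix):
                    fields[col] = fields[col][len(prefix):]
        dst.write("\t".join(fields) + "\n")


# Small in-memory demonstration using one line from the pairs file.
example = "chr1\t1\t1000001\tchr1\t1\t1000001\t0\t.\t+\t+\n"
out = io.StringIO()
strip_chr_prefix(io.StringIO(example), out)
print(out.getvalue(), end="")
```

For real files, the same function can be called with two handles from `open(...)`, for example reading `hg38_1mwin_100kstep.out` and writing to a new file such as `hg38_1mwin_100kstep.nochr.out` (a hypothetical name), which is then passed to chess sim.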

Hopefully this will be helpful to other people.

Thank you all.

from chess.

nickmachnik commented on June 9, 2024

Hi,
could you please post the full log in here?
Are you using normalized matrices? What is the bin size of your data?

Irenexzwen commented on June 9, 2024

Thanks Nick! Sorry, I forgot the log. Here I reproduced the error:

Here is my code:

chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out

chess sim \
HiC_control.mapq30.hic \
HiC_treat.mapq30.hic \
./hg38_1mwin_100kstep.out \
H1_ctrl_treat_1mwin_100kstep_diff.out.tsv

Here is the full log:

2020-10-22 10:03:19,635 INFO Running '/software/anaconda/install/bin/chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out'
2020-10-22 10:03:21,908 INFO CHESS version: 0.3.3
2020-10-22 10:03:21,908 INFO FAN-C version: 0.9.5
2020-10-22 10:03:22,011 INFO Finished '/software/anaconda/install/bin/chess pairs hg38 1000000 100000 ./hg38_1mwin_100kstep.out'
2020-10-22 10:03:24,861 INFO Running '/software/anaconda/install/bin/chess sim HiC_control.mapq30.hic HiC_treat.mapq30.hic ./hg38_1mwin_100kstep.out H1_ctrl_treat_1mwin_100kstep_diff.out.tsv'
2020-10-22 10:03:26,368 INFO CHESS version: 0.3.3
2020-10-22 10:03:26,368 INFO FAN-C version: 0.9.5
2020-10-22 10:03:26,369 INFO Loading reference contact data
2020-10-22 10:05:30,980 INFO Loading region pairs
2020-10-22 10:05:31,297 WARNING 392 region pairs have been dropped, because they involve chromosomes that are not present in the provided contact data.
2020-10-22 10:05:31,297 INFO Launching workers
2020-10-22 10:05:31,354 INFO Submitting pairs for comparison
2020-10-22 10:05:33,225 INFO Could not compute similarity for 30654 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins
2020-10-22 10:05:33,389 INFO Finished '/software/anaconda/install/bin/chess sim HiC_control.mapq30.hic HiC_treat.mapq30.hic ./hg38_1mwin_100kstep.out H1_ctrl_treat_1mwin_100kstep_diff.out.tsv'

Here are the results:

ID      SN      ssim    z_ssim
0       nan     nan     nan
1       nan     nan     nan
2       nan     nan     nan
3       nan     nan     nan
4       nan     nan     nan
5       nan     nan     nan
...

The bin size of the .hic file is 1 kb. The .hic file was generated with juicebox_tools pre using the default parameters, which are <VC,VC_SQRT,KR,SCALE>.
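
To put a number on how many region pairs came back as nan in a result table like the one above, a small sketch along these lines could be used. It assumes the output of chess sim is tab-separated with the header ID, SN, ssim, z_ssim shown above; the function name is only illustrative.

```python
import csv
import io
import math


def nan_fraction(handle, column="ssim"):
    """Return (number of nan rows, total rows) for one column of a
    tab-separated result table with a header line."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = 0
    total = 0
    for row in reader:
        total += 1
        if math.isnan(float(row[column])):
            missing += 1
    return missing, total


# In-memory demonstration with the first rows of the table above.
table = "ID\tSN\tssim\tz_ssim\n0\tnan\tnan\tnan\n1\tnan\tnan\tnan\n2\tnan\tnan\tnan\n"
print(nan_fraction(io.StringIO(table)))
```

If every row is nan, as in my case, the cause is more likely systematic (for example a naming or resolution mismatch) than a few bad regions.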

nickmachnik commented on June 9, 2024

Ok, I am not sure what is happening there, so let's start with some guesswork. Your bin size is very small, so maybe you have a large number of unmappable bins. I don't know how long your preprocessing takes, but you could try 10 kb or 25 kb bins and see whether you get the same behaviour.
Another way would be to tweak the parameters of chess sim. By default, unmappable bins are not considered in the comparisons, and matrices with more than 10 percent unmappable bins are not compared at all.
You could therefore try to increase --mappability-cutoff and activate --keep-unmappable-bins. The problem here is that these are low / deactivated by default because I don't think the program behaves very well with a lot of missing data.
Anyhow, I suggest trying these things to see whether that gets rid of the NaNs. It might not even be the problem; then we can take it from there.
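
To illustrate the idea behind the cutoff, here is a rough Python sketch. It is not the actual CHESS code: it assumes that a bin counts as unmappable when its row in the contact matrix has no contacts at all, and that a matrix is skipped when the fraction of such bins exceeds the cutoff (0.1 for the 10 percent mentioned above). The function names are made up for this example.

```python
def unmappable_fraction(matrix):
    """Fraction of bins whose row holds no contacts (assumed definition
    of 'unmappable' for this sketch)."""
    if not matrix:
        return 0.0
    empty = sum(1 for row in matrix if not any(row))
    return empty / len(matrix)


def passes_cutoff(matrix, cutoff=0.1):
    """True if the matrix would still be compared under the given cutoff."""
    return unmappable_fraction(matrix) <= cutoff


# Toy 4x4 symmetric contact matrix in which the third bin has no contacts.
toy = [
    [5, 2, 0, 1],
    [2, 6, 0, 3],
    [0, 0, 0, 0],
    [1, 3, 0, 4],
]
print(unmappable_fraction(toy))         # 0.25
print(passes_cutoff(toy))               # False
print(passes_cutoff(toy, cutoff=0.5))   # True
```

With very small bins, many rows end up empty, so a region can fail such a filter even when the coordinates are fine.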
Best,
Nick

kaukrise commented on June 9, 2024

Hey, I have an idea what the problem might be. Juicer by default removes the chr part of the chromosome names. @Irenexzwen could you post the first couple of lines of hg38_1mwin_100kstep.out to see if it has the chr prefix or not, please?
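
A quick way to check for this kind of mismatch is sketched below. It assumes the pairs file is tab-separated with chromosome names in columns 1 and 4, and that the chromosome names stored in the contact data have been obtained separately (the set used here is a made-up example, not read from a .hic file). The function names are only illustrative.

```python
import io


def pair_chromosomes(handle):
    """Collect the chromosome names used in a tab-separated pairs file."""
    names = set()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            names.update((fields[0], fields[3]))
    return names


def missing_chromosomes(pair_names, contact_names):
    """Names used in the pairs file that the contact data does not know."""
    return sorted(set(pair_names) - set(contact_names))


# Demonstration: pairs use 'chr1', contact data (hypothetical) uses '1'.
pairs = io.StringIO("chr1\t1\t1000001\tchr1\t1\t1000001\t0\t.\t+\t+\n")
names = pair_chromosomes(pairs)
print(missing_chromosomes(names, {"1", "2"}))  # ['chr1']
```

A non-empty result means those region pairs cannot be matched to the contact data by name.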

Irenexzwen commented on June 9, 2024

Hi Kaukrise:

Thanks for the reminder. However, hg38_1mwin_100kstep.out does have the "chr" prefix:

>less hg38_1mwin_100kstep.out|head -n 10
chr1    1       1000001 chr1    1       1000001 0       .       +       +
chr1    100001  1100001 chr1    100001  1100001 1       .       +       +
chr1    200001  1200001 chr1    200001  1200001 2       .       +       +
chr1    300001  1300001 chr1    300001  1300001 3       .       +       +
chr1    400001  1400001 chr1    400001  1400001 4       .       +       +
chr1    500001  1500001 chr1    500001  1500001 5       .       +       +
chr1    600001  1600001 chr1    600001  1600001 6       .       +       +
chr1    700001  1700001 chr1    700001  1700001 7       .       +       +
chr1    800001  1800001 chr1    800001  1800001 8       .       +       +
chr1    900001  1900001 chr1    900001  1900001 9       .       +       +
