Giter VIP home page Giter VIP logo

Comments (5)

yeban avatar yeban commented on August 31, 2024

Do you mean it is taking too long or that the program aborts at blat step. Is it because mygenome.fasta.out can't be found or is it because it should have been split as well. I haven't worked with repeat masked data, so I am not sure what to expect.

from flo.

TakashiKoyama avatar TakashiKoyama commented on August 31, 2024

Hi,

Thank you for your quick response. I mean the program aborts. The log is as below.

mkdir run
cp ../source/source.fasta run/source.fa
cp ../target/mygenome.fasta.masked run/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 35 run/chunk_
parallel --joblog run/joblog.faSplit -j 8 -a run/joblst.faSplit
946 pieces of 1371 written
2718 pieces of 3871 written
3401 pieces of 4619 written
3154 pieces of 4711 written
2942 pieces of 4086 written
3777 pieces of 5254 written
58962 pieces of 67611 written
59191 pieces of 68066 written
parallel --joblog run/joblog.blat -j 8 -a run/joblst.blat
Loaded 631070304 letters in 358189 sequences
Searched 18871550 bases in 3777 sequences
Loaded 631070304 letters in 358189 sequences
Searched 4728198 bases in 946 sequences
Loaded 631070304 letters in 358189 sequences
Searched 13581909 bases in 2718 sequences
Loaded 631070304 letters in 358189 sequences
Searched 14694033 bases in 2942 sequences
Loaded 631070304 letters in 358189 sequences
Searched 15754023 bases in 3154 sequences
Loaded 631070304 letters in 358189 sequences
Searched 16996379 bases in 3401 sequences
Loaded 631070304 letters in 358189 sequences
Searched 295953418 bases in 59191 sequences
Loaded 631070304 letters in 358189 sequences
Searched 294805428 bases in 58962 sequences
parallel --joblog run/joblog.liftUp -j 8 -a run/joblst.liftUp
Got 3401 lifts in run/chunk_01.fa.lft
Lifting run/chunk_01.fa.psl
run/chunk_01.fa.psl is empty
Got 3154 lifts in run/chunk_04.fa.lft
Lifting run/chunk_04.fa.psl
run/chunk_04.fa.psl is empty
Got 2718 lifts in run/chunk_00.fa.lft
Lifting run/chunk_00.fa.psl
run/chunk_00.fa.psl is empty
Got 2942 lifts in run/chunk_03.fa.lft
Lifting run/chunk_03.fa.psl
run/chunk_03.fa.psl is empty
Got 946 lifts in run/chunk_05.fa.lft
Lifting run/chunk_05.fa.psl
run/chunk_05.fa.psl is empty
Got 3777 lifts in run/chunk_02.fa.lft
Lifting run/chunk_02.fa.psl
run/chunk_02.fa.psl is empty
Got 59191 lifts in run/chunk_07.fa.lft
Lifting run/chunk_07.fa.psl
run/chunk_07.fa.psl is empty
Got 58962 lifts in run/chunk_06.fa.lft
Lifting run/chunk_06.fa.psl
run/chunk_06.fa.psl is empty
parallel --joblog run/joblog.axtChain -j 8 -a run/joblst.axtChain
run/chunk_02.fa.psl.lifted is empty
run/chunk_01.fa.psl.lifted is empty
run/chunk_03.fa.psl.lifted is empty
run/chunk_05.fa.psl.lifted is empty
run/chunk_00.fa.psl.lifted is empty
run/chunk_07.fa.psl.lifted is empty
run/chunk_04.fa.psl.lifted is empty
run/chunk_06.fa.psl.lifted is empty
parallel --joblog run/joblog.chainSort -j 8 -a run/joblst.chainSort
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
rake aborted!
Errno::ENOENT: No such file or directory @ rb_file_s_rename - (run/000.chain, run/combined.chn.sorted)
/home/koyama/Documents/GenomeAssembly/13LiftOver/flo/wRM/Rakefile:189:in `block in <top (required)>'
/home/koyama/Documents/GenomeAssembly/13LiftOver/flo/wRM/Rakefile:211:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)

The problem seems that run/chunk_*.fa.psl is empty. So I checked joblst.blat and the code is as below.

blat -noHead -fastMap -tileSize=12 -minIdentity=98 -qMask=../target/mygenome.fasta.out run/source.fa run/chunk_01.fa run/chunk_01.fa.psl

In the ../target/mygenome.fasta.out, position of the repeats are shown as below.

$ head ../target/mygenome.fasta.out
   SW   perc perc perc  query     position in query                 matching          repeat                  position in repeat
score   div. del. ins.  sequence  begin     end            (left)   repeat            class/family        begin   end    (left)      ID

   15    0.0  0.0  0.0  chr1              1        17   (5013216) + (GT)n             Simple_repeat             1     17     (0)      1  
   14   15.9  0.0  0.0  chr1           1149      1176   (5012057) + (ATA)n            Simple_repeat             1     28     (0)      2  
   42    0.0  0.0  0.0  chr1           3000      3035   (5010198) + (GT)n             Simple_repeat             1     36     (0)      3  
   33    0.0  0.0  0.0  chr1           3415      3442   (5009791) + (GT)n             Simple_repeat             1     28     (0)      4  
   37    5.1  0.0  0.0  chr1           3871      3911   (5009322) + (GT)n             Simple_repeat             1     41     (0)      5  
   22    0.0  0.0  0.0  chr1           4073      4096   (5009137) + (CA)n             Simple_repeat             1     24     (0)      6  
   42    0.0  0.0  0.0  chr1           6914      6949   (5006284) + (CT)n             Simple_repeat             1     36     (0)      7

I think the problem is conflict between ../target/mygenome.fasta.out and run/chunk_*.fa.

Thanks,
Takashi

from flo.

yeban avatar yeban commented on August 31, 2024

Very likely - because of the chunking, the coordinates specified in RepeatMasker's output no longer tally with target assembly's coordinate.

I think you will need to get your hands dirty here. Relevant file here is Rakefile. The first half is helper fucntions, and the second half the scripting. I would comment out L166-168, remove -fastMap from blat_opts and try again.

from flo.

TakashiKoyama avatar TakashiKoyama commented on August 31, 2024

Mmm, I still got errors as below.

mkdir run
cp ../source/source.fasta run/source.fa
cp ../target/mygenome.fasta.masked run/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 35 run/chunk_
parallel --joblog run/joblog.blat -j 8 -a run/joblst.blat
Loaded 631070304 letters in 358189 sequences
Searched 26256550 bases in 5 sequences
Query sequence chr10 has size 6424017, it might take a while.
Query sequence chr12 has size 8541060, it might take a while.
Query sequence chr14 has size 5470449, it might take a while.
Loaded 631070304 letters in 358189 sequences
Searched 6853198 bases in 1 sequences
Query sequence chr24 has size 6853198, it might take a while.
Loaded 631070304 letters in 358189 sequences
Searched 23539023 bases in 5 sequences
Query sequence chr19 has size 5582122, it might take a while.
Loaded 631070304 letters in 358189 sequences
Searched 23086379 bases in 5 sequences
Query sequence chr5 has size 5469732, it might take a while.
Query sequence chr9 has size 5583017, it might take a while.
Loaded 631070304 letters in 358189 sequences
Searched 20414033 bases in 4 sequences
Query sequence chr15 has size 6390933, it might take a while.
Query sequence chr18 has size 5455016, it might take a while.
Loaded 631070304 letters in 358189 sequences
Searched 19346909 bases in 4 sequences
Query sequence chr1 has size 5013233, it might take a while.
Query sequence chr2 has size 5346433, it might take a while.
Query sequence chr4 has size 5249697, it might take a while.
Loaded 631070304 letters in 358189 sequences
Searched 338050428 bases in 1 sequences
Query sequence chrUn1 has size 338050428, it might take a while.
Loaded 631070304 letters in 358189 sequences
Searched 340328418 bases in 1 sequences
Query sequence chrUn2 has size 340328418, it might take a while.
parallel --joblog run/joblog.liftUp -j 8 -a run/joblst.liftUp
Couldn't open run/chunk_01.fa.lft , No such file or directory
Couldn't open run/chunk_07.fa.lft , No such file or directory
Couldn't open run/chunk_04.fa.lft , No such file or directory
Couldn't open run/chunk_00.fa.lft , No such file or directory
Couldn't open run/chunk_06.fa.lft , No such file or directory
Couldn't open run/chunk_03.fa.lft , No such file or directory
Couldn't open run/chunk_02.fa.lft , No such file or directory
Couldn't open run/chunk_05.fa.lft , No such file or directory
rake aborted!
Command failed with status (8): [parallel --joblog run/joblog.liftUp -j 8 -...]
/home/koyama/Documents/GenomeAssembly/13LiftOver/flo/wRM/Rakefile:143:in `parallel'
/home/koyama/Documents/GenomeAssembly/13LiftOver/flo/wRM/Rakefile:174:in `block in <top (required)>'
/home/koyama/Documents/GenomeAssembly/13LiftOver/flo/wRM/Rakefile:211:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)

I might have to seriously look into or give up to use masked one. Since I have never use Ruby, it takes a while. If you find out a solution, please let me know.

Takashi

from flo.

yeban avatar yeban commented on August 31, 2024

My bad. Since we aren't chunking liftUp doesn't need to happen either and thus L174-176 should be commented out as well.

Repeats mess up alignment edges, causing the resulting chains to be miss out parts of an annotation. Annotations that are only partially covered by a chain are not lifted. I have observed such cases in our fire ant dataset when trying to understand why some annotations aren't lifted, but don't have an exact count.

I am very interested in your experiment - understanding the effect of repeat masking on the lift over process. But it is unlikely that this is something I will pursue myself in the foreseeable future. I am happy to help if you would like to continue and incorporate a solution in the pipeline once it presents itself (I am thinking of adding an option to disable chunking, but can make the change only if it works).

from flo.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.