Hello, I am running arcs with arch-make arcs-tigmint I am runnin

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Thanks <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-u

problem with tigmint-cut about arcs HOT 5 CLOSED

bcgsc commented on August 19, 2024

problem with tigmint-cut

from arcs.

Comments (5)

Dharmendra-G-1 commented on August 19, 2024 1

Hello Lauren,
Thanks for pointing to the bedtools Version: v2.27.1, after updating from Version: v2.25.0 to this newer version (v2.27.1) now I have the .fa and .bed output after running tigmint-cut.
Thanks for your help !!

from arcs.

lcoombe commented on August 19, 2024

Hi @Dharmendra-G-1 -
Could you please enclose copy-and-paste blocks in triple back ticks? Using that formatting will make your pasted commands much easier to read.
(https://help.github.com/articles/basic-writing-and-formatting-syntax/#quoting-code)

Also, can you include the standard output from your the tigmint command? It looks like something went wrong in the tigmint-cut command.

from arcs.

Dharmendra-G-1 commented on August 19, 2024

Hello Lauren,
Thanks for your response, Sorry for confusion with format: As per your request here are details:

I am using
draft.fasta and rename linked reads as reads.fa.gz

The main error I am getting is from tigmint-cut:


ubuntu@ip-10-68-102-172:/mid_large_2t/running_arcs$ python3 tigmint/bin/tigmint-cut -p 40 -w 1000 -n 20 -t 0 draft.fa draft.reads.as0.65.nm5.molecule.size2000.bed --fastaout draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
Started at: 2019-01-06 18:04:53.506702
Reading contig lengths...
Finding breakpoints...
Cutting assembly at breakpoints...

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

Options: 
	-fi	Input FASTA file
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-fo	Output file (can be FASTA or TAB-delimited)
	-name	Use the name field for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab is used.

DONE!
Ended at: 2019-01-06 18:16:46.167144

The initial command I ran are as follows:

./arcs-master/Examples/arcs-make arcs-tigmint draft=draft reads=reads m=40

which ran this 
./arcs-master/Examples/arcs-make arcs-tigmint -n draft=draft reads=reads m=40

ln -s draft.fasta draft.fa
./tigmint/bin/tigmint tigmint draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40 
touch empty.fof
perl -ne 'chomp; if(/>/){$ct+=1; print ">$ct\n";}else{print "$_\n";} ' < draft.tigmint.fa > draft.tigmint.renamed.fa 
bwa index draft.tigmint.renamed.fa 
sh -c 'bwa mem -t40 -C -p draft.tigmint.renamed.fa reads.fq.gz | /mid_large_2t/important/samtools-1.6/samtools view -Sb - | /mid_large_2t/important/samtools-1.6/samtools sort -@40 -n - -o draft.tigmint.sorted.bam' 
echo draft.tigmint.sorted.bam > draft.tigmint_bamfiles.fof
./arcs-master/Arcs/arcs --bx -v -f draft.tigmint.renamed.fa -a draft.tigmint_bamfiles.fof -c 5 -m 40 -s 98 -r 0.05 -e 30000 -z 500 -d 0 --gap 100 -b draft.tigmint_c5_m40_s98_r0.05_e30000_z500
python /mid_large_2t/running_arcs/arcs-master/Examples/makeTSVfile.py draft.tigmint_c5_m40_s98_r0.05_e30000_z500_original.gv draft.tigmint_c5_m40_s98_r0.05_e30000_z500.tigpair_checkpoint.tsv draft.tigmint.renamed.fa
ln -s draft.tigmint_c5_m40_s98_r0.05_e30000_z500.tigpair_checkpoint.tsv draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3.tigpair_checkpoint.tsv
/opt/links_v1.8.6/LINKS -f draft.tigmint.renamed.fa -s empty.fof -b draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3 -l 5 -a 0.3 -z 500 
rm draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3.tigpair_checkpoint.tsv

I believe the Tigmint tigmint-make will run as follows from the starting part of arcs-tigment
./tigmint tigmint draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40

./tigmint/bin/tigmint tigmint -n draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40
bwa index draft.fa
bwa mem -t40 -pC draft.fa reads.fq.gz | /mid_large_2t/important/samtools-1.6/samtools view -u -F4 | /mid_large_2t/important/samtools-1.6/samtools sort -@40 -tBX -m 2G -T tmp/ -o draft.reads.sortbx.bam
tigmint/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >draft.reads.as0.65.nm5.molecule.size2000.bed
/mid_large_2t/important/samtools-1.6/samtools faidx draft.fa
tigmint/bin/tigmint-cut -p40 -w1000 -n20 -t0 -o draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.fa draft.reads.as0.65.nm5.molecule.size2000.bed
ln -sf draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.tigmint.fa

The tail end of error message I got is as follows:

[M::mem_pestat] mean and std.dev: (242.76, 127.77)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 849)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 2619274 reads in 1927.504 CPU sec, 50.692 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t40 -pC draft.fa reads.fq.gz
[main] Real time: 52106.558 sec; CPU: 1897517.768 sec
[bam_sort_core] merging from 440 files and 40 in-memory blocks...
tigmint/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >draft.reads.as0.65.nm5.molecule.size2000.bed
/mid_large_2t/important/samtools-1.6/samtools faidx draft.fa
tigmint/bin/tigmint-cut -p40 -w1000 -n20 -t0 -o draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.fa draft.reads.as0.65.nm5.molecule.size2000.bed
Started at: 2019-01-05 19:06:03.814077
Reading contig lengths...
Finding breakpoints...

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

Options: 
	-fi	Input FASTA file
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-fo	Output file (can be FASTA or TAB-delimited)
	-name	Use the name field for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab is used.

Cutting assembly at breakpoints...
DONE!
Ended at: 2019-01-05 19:17:49.508013
ln -sf draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.tigmint.fa
make[1]: Leaving directory '/mid_large_2t/running_arcs'
touch empty.fof
perl -ne 'chomp; if(/>/){$ct+=1; print ">$ct\n";}else{print "$_\n";} ' < draft.tigmint.fa > draft.tigmint.renamed.fa 
bwa index draft.tigmint.renamed.fa 
[bwa_index] Pack FASTA... [bns_fasta2bntseq] Failed to allocate 0 bytes at bntseq.c line 303: Success
arcs-master/Examples/arcs-make:157: recipe for target 'draft.tigmint.renamed.fa.bwt' failed
make: *** [draft.tigmint.renamed.fa.bwt] Error 1

the output file I am getting are as follows:

wxrwxrwx  1 ubuntu ubuntu           11 Jan  4 19:03 draft.fa -> draft.fasta
-rw-rw-r--  1 ubuntu ubuntu       406757 Jan  4 19:40 draft.fa.amb
-rw-rw-r--  1 ubuntu ubuntu      1411478 Jan  4 19:40 draft.fa.ann
-rw-rw-r--  1 ubuntu ubuntu   2494337664 Jan  4 19:40 draft.fa.bwt
-rw-rw-r--  1 ubuntu ubuntu       440254 Jan  5 19:06 draft.fa.fai
-rw-rw-r--  1 ubuntu ubuntu    623584396 Jan  4 19:40 draft.fa.pac
-rw-rw-r--  1 ubuntu ubuntu   1247168840 Jan  4 19:51 draft.fa.sa
-rw-rw-r--  1 ubuntu ubuntu   2526643067 Jan  3 04:49 draft.fasta
-rw-rw-r--  1 ubuntu ubuntu    697296771 Jan  7 03:20 draft.reads.as0.65.nm5.molecule.size2000.bed
-rw-rw-r--  1 ubuntu ubuntu            0 Jan  7 04:00 draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
-rw-rw-r--  1 ubuntu ubuntu       342869 Jan  7 04:00 draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
-rw-rw-r--  1 ubuntu ubuntu 261106244339 Jan  5 12:02 draft.reads.sortbx.bam
lrwxrwxrwx  1 ubuntu ubuntu           77 Jan  5 19:17 draft.tigmint.fa -> draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
-rw-rw-r--  1 ubuntu ubuntu            0 Jan  5 19:17 draft.tigmint.renamed.fa
-rw-rw-r--  1 ubuntu ubuntu            0 Jan  5 19:17 draft.tigmint.renamed.fa.pac
-rw-rw-r--  1 ubuntu ubuntu            0 Jan  5 19:17 empty.fof

So after the tigmint-molecule I get the bed file " draft.reads.as0.65.nm5.molecule.size2000.bed" which has 16813449 lines but when it ran the tigmint-cut we get .bed file but no .fa file. draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa file get created but completely empty, so no problem with permission as such. The bed file output are as follows :


head draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
0	0	15517657	0-1
0	15517657	15517670	0-2
0	15517670	17147778	0-3
0	17147778	17147785	0-4
0	17147785	23376074	0-5
0	23376074	23376100	0-6
0	23376100	29696924	0-7
0	29696924	29697326	0-8
0	29697326	29697665	0-9
0	29697665	29697943	0-10

tail draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
204375	0	33106	204375
204415	0	57665	204415
204459	0	25333	204459
204499	0	64113	204499
204541	0	58322	204541
204581	0	3400	204581
204621	0	53185	204621
204661	0	20990	204661
204701	0	60883	204701
204741	0	113995	204741

Thanks

from arcs.

lcoombe commented on August 19, 2024

Thanks @Dharmendra-G-1 -
Please try using a newer version of bedtools. The issue is that the more recent versions (I am using 2.27.1) output the fasta to standard out by default, whereas the older version you are using requires an output fasta file specified by -fo. You can see that the help page is being printed in standard out, indicating that the bedtools getfasta command isn't running properly as a result.

Take a look at the different help pages:

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

Options: 
	-fi	Input FASTA file
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-fo	Output file (can be FASTA or TAB-delimited)
	-name	Use the name field for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab is used.

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.27.1
Summary: Extract DNA sequences from a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

Options: 
	-fi	Input FASTA file
	-fo	Output file (opt., default is STDOUT
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-name	Use the name field for the FASTA header
	-name+	Use the name field and coordinates for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequences
		from the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab 
		is used.

Hope that helps!
Lauren

from arcs.

lcoombe commented on August 19, 2024

Excellent - I'm glad you got it working!

from arcs.

problem with tigmint-cut about arcs HOT 5 CLOSED

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent