
Comments (23)

Duda5 commented on July 17, 2024

Hi @lcoombe,
Thank you for your suggestions!

The curl command finished successfully, and test_reads.fq was indeed in the tests directory, but it still could not be found for some reason.

However, I tried adding -c conda-forge and was able to install GoldRush with:
conda install -c bioconda -c conda-forge goldrush

Trying it out on real ONT data right now and hope the run will complete successfully!

from goldrush.

lcoombe commented on July 17, 2024

Thanks @vlad0x00, that's really helpful!
On my end, I found one of our servers that had a much lower max user processes, and was able to reproduce the error on that machine.
@Duda5 - You could try running ulimit -u 256000 right before running your GoldRush command, which should change that limit for you in your current terminal session. I'm going to test the same on my end!
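For reference, that session-level change could look like this (a minimal sketch; the 256000 value and the GoldRush command are from this thread, and the soft limit can only be raised up to the hard limit shown by `ulimit -Hu`):

```shell
# Print the current soft limit on user processes (on Linux, threads count toward it).
ulimit -u

# Raise the limit for this terminal session only, then run GoldRush as usual:
# ulimit -u 256000
# goldrush run t=18 G=3e9 reads=ONT_pcd_treads
```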


vlad0x00 commented on July 17, 2024

If that solves it, perhaps it can be added to the goldrush makefile.


Duda5 commented on July 17, 2024

Hi @vlad0x00 and @lcoombe,
Thanks for your replies! I will increase the max user processes value to 256000 and run GoldRush overnight.
Will let you know whether it worked!


Duda5 commented on July 17, 2024

I think you are right, @lcoombe; it might have been a system error (possibly caused by the high max user processes value?).
I re-ran the command; GoldRush restarted from the checkpoint and completed!

Done GoldRush-Path + GoldRush-Edit + Tigmint-long + 5 ntLink rounds! Your final assembly can be found in: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.ntLink-5rounds.fa

I have another question about the resultant assembly, but I think I will open a new issue for it at some point.
Thank you for your help!


lcoombe commented on July 17, 2024

Very glad to hear that it finished successfully!

All the extra processes should be cleared up from GoldRush-Edit now, and ntLink does not use multi-processing, but you could check with ps -u <username> to be sure that there aren't any processes hanging around.
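For example, a quick check could look like this (a hedged sketch assuming a Linux procps-style `ps`; nothing here is GoldRush-specific):

```shell
# Count processes owned by the current user; compare before and after a run
# to confirm that nothing was left behind.
ps -u "$(whoami)" --no-headers | wc -l

# List any process whose command name mentions goldrush, if one is still around.
ps -u "$(whoami)" -o pid,etime,comm | grep -i goldrush || echo "no goldrush processes found"
```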

Thanks for your patience working through this! It's very helpful for us to have this feedback from users :)


jwcodee commented on July 17, 2024

Hi @Duda5,

I have not seen this error before. While we are trying to figure this out, you can try installing GoldRush via conda. That should alleviate the issues you are experiencing.


Duda5 commented on July 17, 2024

Hi @jowong4,
I tried installing through conda before, but it did not work either.
Strangely, the error was not specified:

$ conda create -n goldrush
$ conda activate goldrush 
$ conda install -c bioconda goldrush
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: | 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                                                                                       

UnsatisfiableError: 


lcoombe commented on July 17, 2024

Hi @Duda5,

A couple of things could be causing issues with your command in (2):

  • Specifying -t just touches files for the GoldRush makefile; if you want a run that prints the commands without executing them, use -n
  • Don't specify the reads file extension in your command.
    So you could try this command, if you were intending to use 18 threads:
goldrush run t=18 G=3e9 reads=ONT_pcd_treads

For the Tigmint error in the demo, I see this:

sh -c 'gunzip -c  | \
/home/linuxbrew/.linuxbrew/Cellar/tigmint/1.2.5/libexec/bin/tigmint_estimate_dist.py - -n 1000000 -o test_reads.tigmint-long.params.tsv'
gzip: compressed data not read from a terminal. Use -f to force decompression.

It looks like it cannot find the input reads file. Would you be able to share the full output from the test demo? Did the curl command execute successfully, and do you see the reads file test_reads.fq in the working directory?

For the conda issues - that's quite strange; I haven't seen that before in a fresh environment. Could you also try adding -c conda-forge to your install command?


Duda5 commented on July 17, 2024

So, after installing GoldRush with conda, the test run (goldrush_test_demo.sh) completed successfully:

...
echo "Done GoldRush-Path + GoldRush-Edit + Tigmint-long + 5 ntLink rounds! Your final assembly can be found in: goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.fa"
Done GoldRush-Path + GoldRush-Edit + Tigmint-long + 5 ntLink rounds! Your final assembly can be found in: goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.fa

Test successful!

When trying my own dataset (a ~170 GB fastq file), GoldRush terminated after running for ~7 hours with the following error:

...
Loading bloom filter from `/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k20.bf'...
Bloom filter FPR: 0.00121%
Starting K run with k = 20
Flanks inserted into k run = 29
0 unique gaps closed for k20
No start/goal kmer: 2
No path: 24
Unique path: 0
Multiple paths: 0
Too many paths: 2
Too many branches: 0
Too many path/path mismatches: 0
Too many path/read mismatches: 0
Contains cycle: 0
Max cost exceeded: 1
Exceeded mem limit: 0
Skipped: 0
29 flanks left
k20 run complete
Total gaps closed so far = 70

K sweep complete
Creating new scaffold with gaps closed...
New scaffold complete
Gaps closed = 70
70.7%

terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
make[1]: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-make:48: /dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/batch.ntedited.prepd.sealer_scaffold.upper.fa] Aborted (core dumped)

Traceback (most recent call last):
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 143, in <module>
    polishing_stdout, polishing_stderr = run_polishing(
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 96, in run_polishing
    raise e
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 78, in run_polishing
    sealer_protocol_process = sp.run(
  File "/home/duda5/anaconda3/envs/goldrush/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '["         goldrush-edit-make         seqs_to_polish=/dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/batch.fa         bfs='/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k32.bf /dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k28.bf /dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k24.bf /dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k20.bf'         K='32 28 24 20'         t=1         /dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/batch.ntedited.prepd.sealer_scaffold.upper.fa     "]' returned non-zero exit status 2.
['goldrush-edit-polish-batch', 'batch.fa', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs', '/dev/shm', '6yAujHN4TKm4REbs7yvOvQ', '18', '-k32', '-k28', '-k24', '-k20', '-b/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k32.bf', '-b/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k28.bf', '-b/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k24.bf', '-b/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k20.bf', '--seq-ids', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/seq_ids', '--bfs-ids-pipe', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-target_ids_input', '--bfs-ready-pipe', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-bfs_ready', '--batch-done-pipe', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/polishing_done', '--threads', '1'] failed!
make: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush.make:226: w16_x10_golden_path.goldrush-edit-polished.fa] Terminated
make: *** Deleting file 'w16_x10_golden_path.goldrush-edit-polished.fa'

My full command was:
goldrush run t=18 G=3e9 k_ntLink=64 w_ntLink=500 reads=ONT_pcd_treads

My system has 20 threads and 128 GB of RAM, and no other processes were running in parallel.
In terms of disk space, my SSD only had the fastq file on it, with ~830 GB of free space.


Duda5 commented on July 17, 2024

I think the issue might be the lack of CPU threads(?)

I re-ran GoldRush with the default command:
$goldrush run G=3e9 reads=ONT_pcd_treads

This time I got the same error at the end, but this part looks different (84.4% when all 20 threads are used, compared to 70.7% with 18 threads):

...
K sweep complete
Creating new scaffold with gaps closed...
New scaffold complete
Gaps closed = 92
84.4%
...

There was also a goldrush-edit error:

...
[2022-12-11 21:13:48][INFO] SeqIndex::SeqIndex: Building index for ONT_pcd_treads.fastq... 
[2022-12-11 21:16:09][INFO] SeqIndex::SeqIndex: Done.
[2022-12-11 21:16:09][INFO] SeqIndex::save: Saving index to ONT_pcd_treads.fastq.index... 
[2022-12-11 21:16:09][INFO] SeqIndex::save: Done.

[2022-12-11 21:16:10][INFO] Indexes and mappings built.
[2022-12-11 21:16:10][INFO] Subsampling mapped reads to 40
[2022-12-11 21:16:10][INFO] SeqIndex::SeqIndex: Loading index from /media/duda5/w16_x10_golden_path.fa.index... 
[2022-12-11 21:16:10][INFO] SeqIndex::SeqIndex: Done!
[2022-12-11 21:16:10][INFO] SeqIndex::SeqIndex: Loading index from /media/duda5/ONT_pcd_treads.fastq.index... 
[2022-12-11 21:16:12][INFO] SeqIndex::SeqIndex: Done!
[2022-12-11 21:16:12][INFO] AllMappings::load_paf: Loading PAF mappings from /media/duda5/w16_x10_golden_path.fa.ONT_pcd_treads.fastq.paf... 
[2022-12-11 21:16:35][INFO] AllMappings::load_paf: Done!
[2022-12-11 21:16:36][INFO] serve: Accepting batch names at batch_name_input
[2022-12-11 21:16:38][INFO] goldrush-edit-targeted-bfs is ready!
[2022-12-11 21:16:38][INFO] Polishing batches...
Traceback (most recent call last):
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit", line 562, in <module>
    polish_seqs(
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit", line 516, in polish_seqs
    polish_batch(
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit", line 384, in polish_batch
    watch_process(process)
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush_edit_utils.py", line 40, in watch_process
    threading.Thread(target=_watch_process, args=(process,), daemon=True).start()
  File "/home/duda5/anaconda3/envs/goldrush/lib/python3.9/threading.py", line 899, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
make: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush.make:226: w16_x10_golden_path.goldrush-edit-polished.fa] Error 1
make: *** Deleting file 'w16_x10_golden_path.goldrush-edit-polished.fa'
...


lcoombe commented on July 17, 2024

Hi @Duda5,

Glad you got the demo working! For the benefit of future users, I took another look at your initial (pre-Conda) command, and I think another issue was the version of Tigmint (1.2.5). We added updates to Tigmint in 1.2.6 to be able to detect uncompressed fastq files, so that's why it couldn't properly find the input file with v1.2.5. Thanks for bringing this to our attention - we'll make that version requirement clear in the README.

For the goldrush-edit error - I haven't seen that before, but we also generally run GoldRush on machines with more threads. The default thread count is 48, so it would make sense that you'd see that failure when running on your machine with 20 threads. Have you tried with fewer threads (e.g. 14)? I will also try to reproduce the issue on my side, but using fewer threads may help based on what the logs are showing.


Duda5 commented on July 17, 2024

Hi @lcoombe,
I tried running GoldRush with 14 threads, and this time the error is different (but also in goldrush-edit):

...
K sweep complete
Creating new scaffold with gaps closed...
New scaffold complete
Gaps closed = 295
87.3%

[2022-12-13 04:13:01][ERROR] Process pipeline: Error on fork.
[2022-12-13 04:13:01][ERROR] Process pipeline: Spawner process failed.
[2022-12-13 04:13:01][ERROR] Process pipeline: Communication failure.
make[1]: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-make:48: /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/batch.ntedited.prepd.sealer_scaffold.upper.fa] Error 1

Traceback (most recent call last):
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 143, in <module>
    polishing_stdout, polishing_stderr = run_polishing(
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 96, in run_polishing
    raise e
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 78, in run_polishing
    sealer_protocol_process = sp.run(
  File "/home/duda5/anaconda3/envs/goldrush/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '["         goldrush-edit-make         seqs_to_polish=/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/batch.fa         bfs='/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k32.bf /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k28.bf /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k24.bf /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k20.bf'         K='32 28 24 20'         t=1         /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/batch.ntedited.prepd.sealer_scaffold.upper.fa     "]' returned non-zero exit status 2.
['goldrush-edit-polish-batch', 'batch.fa', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs', '/dev/shm', 'xYbFEfAbTnaLUbeaX3zmPQ', '14', '-k32', '-k28', '-k24', '-k20', '-b/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k32.bf', '-b/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k28.bf', '-b/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k24.bf', '-b/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k20.bf', '--seq-ids', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/seq_ids', '--bfs-ids-pipe', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-target_ids_input', '--bfs-ready-pipe', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-bfs_ready', '--batch-done-pipe', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/polishing_done', '--threads', '1'] failed!
make: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush.make:226: w16_x10_golden_path.goldrush-edit-polished.fa] Terminated
make: *** Deleting file 'w16_x10_golden_path.goldrush-edit-polished.fa'


lcoombe commented on July 17, 2024

Thanks for the update, @Duda5.

Can I ask how large your /dev/shm/ is? I'm just trying to figure out whether the issue could be that shared memory location or the threads side of things.

Also, can you confirm the version of GoldRush that you're using?

Thanks for your patience with troubleshooting - I haven't been able to reproduce the issue yet on my end, so all this information should hopefully help us figure this out!


Duda5 commented on July 17, 2024

Hi @lcoombe,
No worries - GoldRush is my only hope for assembling anything from Nanopore data at the moment, due to the high RAM requirements of other packages, so I am happy to help with troubleshooting.

My GoldRush version is v1.0.2.

As for /dev/shm/, it was half of my system's RAM by default (63G).
I increased it to 126G and re-ran GoldRush.
The error is the same as the one from my earlier posts:

...
terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
make[1]: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-make:48: /dev/shm/cmpFeTuLRQmEkOL1Q1ejDw-2751/batch.ntedited.prepd.sealer_scaffold.upper.fa] Aborted (core dumped)
...

This time I also created a shell script that outputs the usage of /dev/shm/ every 20 seconds during the GoldRush run.
It stays at 0 most of the time until goldrush-edit is in progress, then increases to 1.1G at most before dropping back to 38M.
So it's unlikely to be a limiting factor(?)


lcoombe commented on July 17, 2024

Thanks for the info, @Duda5!

Hmm, yes, you're right that /dev/shm is unlikely to be the culprit given your tracking. It's strange that you get different error messages when you change it, though. The fact that the test demo works fine while you see issues with the real data does suggest something on the resource allocation side.

Could you share the result of ulimit -a on your machine? It does appear that something is going on with the allocation of resources (whether memory or threads), so that would be a good sanity check.

@vlad0x00 - do you have any ideas?


Duda5 commented on July 17, 2024

Sure, here is the output of ulimit -a on my system:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 513891
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 8192
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


vlad0x00 commented on July 17, 2024

I think we used to have this problem in our lab and the culprit was this:

max user processes              (-u) 8192

This is a hard limit on the number of processes (which I believe also includes threads, aka lightweight processes, on Linux). An HPC machine would ideally have this limit set to a high number (at least 2x the current value). Raising it does require root access, so you might need to ask your system admin.

Alternatively, you can further decrease the number of threads used by goldrush, which should make the error go away. As far as I remember, goldrush-edit spawns more threads/processes than specified, and this is likely causing the issue.
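One rough way to gauge how close a session is to that limit (a sketch assuming procps `ps`, whose `nlwp` column reports the number of threads per process):

```shell
# Threads are lightweight processes on Linux, so they count toward `ulimit -u`.
# Sum the per-process thread counts for the current user and compare to the limit.
total_threads=$(ps -u "$(whoami)" -o nlwp= | awk '{s += $1} END {print s}')
echo "threads in use: ${total_threads} / limit: $(ulimit -u)"
```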


lcoombe commented on July 17, 2024

Yes, definitely! @jowong4 remembered that we also have something similar (but for stack size) in the abyss-pe makefile.


Duda5 commented on July 17, 2024

Okay, so this time goldrush-edit did manage to complete, but at the cost of overall system performance - the screen response time (on logging in, navigating, or opening any applications) was delayed by 30-40 sec.
At some point I was thinking of killing it, but I had htop running in one of the open terminal windows, which indicated that GoldRush was not stuck.
The glitches disappeared only after I rebooted, so I am not sure whether upping max user processes to 256000 is a good long-term solution(?)

As for the assembly, it did not complete; GoldRush exited during ntLink gap-filling with the following error:

...
Done ntLink! Final post-ntLink scaffolds can be found in: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.ntLink.scaffolds.fa
/home/duda5/anaconda3/envs/goldrush/bin/share/ntlink-1.3.6-0/bin/ntlink_patch_gaps.py --path w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.trimmed_scafs.path \
 --mappings w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.verbose_mapping.tsv -s w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa --reads ONT_pcd_treads.fastq -o w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.ntLink.scaffolds.gap_fill.fa --large_k 64 --min_gap 1 \
 --trims w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.trimmed_scafs.tsv -k 20 -w 10
Running ntLink gap-filling...

Parameters:
	--path w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.trimmed_scafs.path
	--mappings w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.verbose_mapping.tsv
	--trims w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.trimmed_scafs.tsv
	-s w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa
	--reads ONT_pcd_treads.fastq

	-z 1000
	-k 20
	-w 10
	-t 4
	--large_k 64
	-x 0
	--min_gap 1
	-o w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.ntLink.scaffolds.gap_fill.fa

2022-12-16 13:30:17.384275 Reading ntLink read mappings..
2022-12-16 13:30:45.889062 Reading scaffolds..
2022-12-16 13:30:47.841054 Reading trim coordinates..
2022-12-16 13:30:47.856365 Choosing best read..
2022-12-16 13:30:47.867963 Collecting reads...
[2022-12-16 13:32:14][ERROR] SeqReader: Quality string length (47399) does not match sequence length (34349).
[2022-12-16 13:32:14][ERROR] SeqReader: Unexpected character in a FASTQ file.
make[2]: *** [/home/duda5/anaconda3/envs/goldrush/bin/share/ntlink-1.3.6-0/ntLink:249: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.ntLink.scaffolds.gap_fill.fa] Segmentation fault (core dumped)
make[2]: Leaving directory '/media/duda5/'
make[1]: *** [/home/duda5/anaconda3/envs/goldrush/bin/share/ntlink-1.3.6-0/ntLink_rounds:122: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.ntLink.gap_fill.fa] Error 2
make[1]: Leaving directory '/media/duda5/'
make: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush.make:252: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.gap_fill.5rounds.fa] Error 2

I performed adapter trimming on my ONT reads with porechop-abi (I had to split the file in two and then merge it back into one due to a RAM limitation):

fastqsplitter -i ONT_mreads.fastq.gz -t 15 -o ONT_mreads_part1.fastq.gz -o ONT_mreads_part2.fastq.gz
porechop_abi -t 18 -v 2 -i ONT_mreads_part1.fastq.gz -o ONT_pcd_treads_part1.fastq.gz
porechop_abi -t 18 -v 2 -i ONT_mreads_part2.fastq.gz -o ONT_pcd_treads_part2.fastq.gz

So now I am wondering whether the splitting or adapter trimming could have caused this error, and why it didn't appear earlier.


lcoombe commented on July 17, 2024

Hi @Duda5,

I'm glad that the polishing worked for you in the end with that tweak! It also worked on my end, although we didn't see any glitches on our server (the server I was using does have >100 threads, though). The difficulty is that raising max user processes essentially just prevents the previous errors you were seeing (it doesn't itself control the number of processes spawned). So, I don't think there's an easy alternative solution for that one, other than reducing the threads further (unless you have any other ideas, @vlad0x00?)

For the new error: entries where the quality string is longer than the sequence would be an issue with the file itself, and perhaps porechop-abi is the culprit, as you say. I have not used fastqsplitter or porechop-abi myself, but have you checked for such cases in the file? I.e., as a sanity check, I would read through the file using bioawk or something similar and print the lengths of the sequence and quality strings. That should give a quick idea of whether this is an edge case due to file splitting or a more systematic issue. The steps that you have done so far should still be fine, by the way, but it's important to know what could be wrong with the file for the ntLink step.
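As an illustration of that sanity check, here is a self-contained sketch using plain `awk` on a tiny demo file with one deliberately malformed record (useful if bioawk isn't installed; the file name is made up):

```shell
# Build a tiny FASTQ where read2's quality string is longer than its sequence.
cat > /tmp/demo.fastq <<'EOF'
@read1
ACGT
+
IIII
@read2
ACGT
+
IIIII
EOF

# Report any record whose quality length differs from its sequence length.
awk 'NR % 4 == 2 { seqlen = length($0) }
     NR % 4 == 0 && length($0) != seqlen { print "mismatch at record", NR / 4 }' /tmp/demo.fastq
# → mismatch at record 2
```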


Duda5 commented on July 17, 2024

Hi @lcoombe,

I ran the following command to identify reads in which the length of the quality string does not match the length of the read sequence, and strangely, it returned nothing:
bioawk -c fastx 'length($seq) != length($qual) {print length($seq), length($qual), $name}' ONT_pcd_treads.fastq

To inspect the output manually, I ran:
bioawk -c fastx '{print length($seq), length($qual), $name}' ONT_pcd_treads.fastq > bioawk_out
and then searched for the 47399 and 34349 values using grep.
However, I could not find any cases that would cause this GoldRush error.

[2022-12-16 13:32:14][ERROR] SeqReader: Quality string length (47399) does not match sequence length (34349).

Is it possible to re-run the ntLink gap-filling using the command that I posted in my previous comment, or do I need to re-run GoldRush from the beginning?


lcoombe commented on July 17, 2024

Hi @Duda5,

That's strange... I wonder what is triggering that error, then.
If you think it could be some transient system issue or something like that, then yes, you can re-launch the same command and it should pick up where it left off (i.e. you don't need to start from the beginning). If you want to double-check that it's starting where you expect, you can run the same command but specify a dry run (-n).
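To illustrate what a dry run does, here is a generic `make` sketch (GoldRush's pipeline is driven by make, and -n is make's standard dry-run flag; the demo makefile is made up):

```shell
# `make -n` prints the commands it would run without executing them.
printf 'all:\n\techo would-run-this-step\n' > /tmp/demo.mk
make -n -f /tmp/demo.mk
# → echo would-run-this-step

# For the run in this thread, the dry run would look like:
# goldrush run t=18 G=3e9 k_ntLink=64 w_ntLink=500 reads=ONT_pcd_treads -n
```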

