Comments (23)
Hi @lcoombe,
Thank you for your suggestions!
The `curl` command finished successfully, and `test_reads.fq` was indeed in the `tests` directory, but it still could not be found for some reason.
However, I tried adding `-c conda-forge` and was able to install GoldRush with:

```
conda install -c bioconda -c conda-forge goldrush
```
Trying it out on real ONT data right now and hope the run will complete successfully!
from goldrush.
Thanks @vlad0x00, that's really helpful!
On my end, I found one of our servers that had a much lower `max user processes` limit, and was able to reproduce the error on that machine.
@Duda5 - You could try running `ulimit -u 256000` right before running your GoldRush command, which should change that limit for your current terminal session. I'm going to test the same on my end!
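For anyone scripting around this, the same limit can also be read programmatically before a run. A minimal Python sketch (the 256000 threshold simply mirrors the value suggested above; it is not an official requirement):

```python
import resource

# Read the per-user process/thread limit -- the same value `ulimit -u` reports.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"max user processes: soft={soft}, hard={hard}")

# Warn if the soft limit looks low for a heavily multi-threaded run.
# (256000 mirrors the suggestion above; tune it to your workload.)
if soft != resource.RLIM_INFINITY and soft < 256000:
    print("consider running `ulimit -u 256000` before launching GoldRush")
```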
If that solves it, perhaps it can be added to the goldrush makefile.
Hi @vlad0x00 and @lcoombe,
Thanks for your replies, I will increase the `max user processes` value to 256000 and run GoldRush overnight.
Will let you know whether it worked!
I think you are right, @lcoombe - it might have been a system error (possibly caused by the high `max user processes` setting?)
I re-ran the command; GoldRush restarted from the checkpoint and completed!
```
Done GoldRush-Path + GoldRush-Edit + Tigmint-long + 5 ntLink rounds! Your final assembly can be found in: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.ntLink-5rounds.fa
```
I have another question about the resultant assembly, but I think I will open a new issue for it at some point.
Thank you for your help!
Very glad to hear that it finished successfully!
All the extra processes should be cleared up from GoldRush-Edit now, and ntLink does not use multi-processing, but you could check with `ps -u <username>` to be sure that there aren't any processes hanging around.
Thanks for your patience working through this! It's very helpful for us to have this feedback from users :)
Hi @Duda5,
I have not seen this error before. While we are trying to figure this out, you can try installing GoldRush via conda. That should alleviate the issues you are experiencing.
Hi @jowong4,
I tried installing through `conda` before, but it did not work either. Strangely, the error was not specified:
```
$ conda create -n goldrush
$ conda activate goldrush
$ conda install -c bioconda goldrush
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError:
```
Hi @Duda5,
A couple of things could be causing issues with your command in (2):
- Specifying `-t` just touches files for the GoldRush makefile. If you want a run that prints the commands but doesn't execute them, use `-n`.
- Don't specify the reads extension in your command.
So you could try this command, if you were intending to use 18 threads:

```
goldrush run t=18 G=3e9 reads=ONT_pcd_treads
```
For the Tigmint error in the demo, I see this:
```
sh -c 'gunzip -c | \
/home/linuxbrew/.linuxbrew/Cellar/tigmint/1.2.5/libexec/bin/tigmint_estimate_dist.py - -n 1000000 -o test_reads.tigmint-long.params.tsv'
gzip: compressed data not read from a terminal. Use -f to force decompression.
```
It looks like it cannot find the input reads file. Would you be able to share the full output from the test demo? Did the `curl` command execute successfully, and do you see the reads file `test_reads.fq` in the working directory?
For the conda issues - that's quite strange, I haven't seen that before in a fresh environment. Could you also try adding `-c conda-forge` to your install command?
So, after installing GoldRush with `conda`, the test run (`goldrush_test_demo.sh`) completed successfully:
```
...
echo "Done GoldRush-Path + GoldRush-Edit + Tigmint-long + 5 ntLink rounds! Your final assembly can be found in: goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.fa"
Done GoldRush-Path + GoldRush-Edit + Tigmint-long + 5 ntLink rounds! Your final assembly can be found in: goldrush_test_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.fa
Test successful!
```
When trying my own dataset (a ~170 GB FASTQ file), GoldRush terminated after running for ~7 hours with the following error:
```
...
Loading bloom filter from `/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k20.bf'...
Bloom filter FPR: 0.00121%
Starting K run with k = 20
Flanks inserted into k run = 29
0 unique gaps closed for k20
No start/goal kmer: 2
No path: 24
Unique path: 0
Multiple paths: 0
Too many paths: 2
Too many branches: 0
Too many path/path mismatches: 0
Too many path/read mismatches: 0
Contains cycle: 0
Max cost exceeded: 1
Exceeded mem limit: 0
Skipped: 0
29 flanks left
k20 run complete
Total gaps closed so far = 70
K sweep complete
Creating new scaffold with gaps closed...
New scaffold complete
Gaps closed = 70
70.7%
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
make[1]: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-make:48: /dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/batch.ntedited.prepd.sealer_scaffold.upper.fa] Aborted (core dumped)
Traceback (most recent call last):
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 143, in <module>
    polishing_stdout, polishing_stderr = run_polishing(
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 96, in run_polishing
    raise e
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 78, in run_polishing
    sealer_protocol_process = sp.run(
  File "/home/duda5/anaconda3/envs/goldrush/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '[" goldrush-edit-make seqs_to_polish=/dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/batch.fa bfs='/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k32.bf /dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k28.bf /dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k24.bf /dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k20.bf' K='32 28 24 20' t=1 /dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/batch.ntedited.prepd.sealer_scaffold.upper.fa "]' returned non-zero exit status 2.
['goldrush-edit-polish-batch', 'batch.fa', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs', '/dev/shm', '6yAujHN4TKm4REbs7yvOvQ', '18', '-k32', '-k28', '-k24', '-k20', '-b/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k32.bf', '-b/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k28.bf', '-b/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k24.bf', '-b/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-k20.bf', '--seq-ids', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/seq_ids', '--bfs-ids-pipe', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-target_ids_input', '--bfs-ready-pipe', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-targeted_bfs/3452-bfs_ready', '--batch-done-pipe', '/dev/shm/6yAujHN4TKm4REbs7yvOvQ-3452/polishing_done', '--threads', '1'] failed!
make: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush.make:226: w16_x10_golden_path.goldrush-edit-polished.fa] Terminated
make: *** Deleting file 'w16_x10_golden_path.goldrush-edit-polished.fa'
```
My full command was:

```
goldrush run t=18 G=3e9 k_ntLink=64 w_ntLink=500 reads=ONT_pcd_treads
```
My system has 20 threads and 128 GB of RAM, and no other processes were running in parallel. In terms of disk space, my SSD only had the `fastq` file on it, with ~830 GB of free space.
I think the issue might be the lack of CPU threads (?)
I re-ran GoldRush with the default command:

```
goldrush run G=3e9 reads=ONT_pcd_treads
```
This time I got the same error at the end, but this part looks different (84.4% when all 20 threads are used, compared to 70.7% when 18 threads are used):
```
...
K sweep complete
Creating new scaffold with gaps closed...
New scaffold complete
Gaps closed = 92
84.4%
...
```
There was also a `goldrush-edit` error:
```
...
[2022-12-11 21:13:48][INFO] SeqIndex::SeqIndex: Building index for ONT_pcd_treads.fastq...
[2022-12-11 21:16:09][INFO] SeqIndex::SeqIndex: Done.
[2022-12-11 21:16:09][INFO] SeqIndex::save: Saving index to ONT_pcd_treads.fastq.index...
[2022-12-11 21:16:09][INFO] SeqIndex::save: Done.
[2022-12-11 21:16:10][INFO] Indexes and mappings built.
[2022-12-11 21:16:10][INFO] Subsampling mapped reads to 40
[2022-12-11 21:16:10][INFO] SeqIndex::SeqIndex: Loading index from /media/duda5/w16_x10_golden_path.fa.index...
[2022-12-11 21:16:10][INFO] SeqIndex::SeqIndex: Done!
[2022-12-11 21:16:10][INFO] SeqIndex::SeqIndex: Loading index from /media/duda5/ONT_pcd_treads.fastq.index...
[2022-12-11 21:16:12][INFO] SeqIndex::SeqIndex: Done!
[2022-12-11 21:16:12][INFO] AllMappings::load_paf: Loading PAF mappings from /media/duda5/w16_x10_golden_path.fa.ONT_pcd_treads.fastq.paf...
[2022-12-11 21:16:35][INFO] AllMappings::load_paf: Done!
[2022-12-11 21:16:36][INFO] serve: Accepting batch names at batch_name_input
[2022-12-11 21:16:38][INFO] goldrush-edit-targeted-bfs is ready!
[2022-12-11 21:16:38][INFO] Polishing batches...
Traceback (most recent call last):
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit", line 562, in <module>
    polish_seqs(
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit", line 516, in polish_seqs
    polish_batch(
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit", line 384, in polish_batch
    watch_process(process)
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush_edit_utils.py", line 40, in watch_process
    threading.Thread(target=_watch_process, args=(process,), daemon=True).start()
  File "/home/duda5/anaconda3/envs/goldrush/lib/python3.9/threading.py", line 899, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
make: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush.make:226: w16_x10_golden_path.goldrush-edit-polished.fa] Error 1
make: *** Deleting file 'w16_x10_golden_path.goldrush-edit-polished.fa'
...
```
Hi @Duda5,
Glad you got the demo working! For the benefit of future users, I took another look at your initial (pre-Conda) command, and I think another issue was the version of Tigmint (1.2.5). We added updates to Tigmint in 1.2.6 to be able to detect uncompressed fastq files, so that's why it couldn't properly find the input file with v1.2.5. Thanks for bringing this to our attention - we'll make that version requirement clear in the README.
For the `goldrush-edit` error - I haven't seen that before, but we also typically run GoldRush on machines with more threads. The default thread count is 48, so it would make sense that you'd see that failure when running on your machine with 20 threads. Have you tried with fewer threads (e.g. 14)? I will also try to reproduce the issue on my side, but using fewer threads may help, based on what the logs are showing.
Hi @lcoombe,
I tried running GoldRush with 14 threads, and this time the error is different (but also in `goldrush-edit`):
```
...
K sweep complete
Creating new scaffold with gaps closed...
New scaffold complete
Gaps closed = 295
87.3%
[2022-12-13 04:13:01][ERROR] Process pipeline: Error on fork.
[2022-12-13 04:13:01][ERROR] Process pipeline: Spawner process failed.
[2022-12-13 04:13:01][ERROR] Process pipeline: Communication failure.
make[1]: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-make:48: /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/batch.ntedited.prepd.sealer_scaffold.upper.fa] Error 1
Traceback (most recent call last):
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 143, in <module>
    polishing_stdout, polishing_stderr = run_polishing(
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 96, in run_polishing
    raise e
  File "/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-polish-batch", line 78, in run_polishing
    sealer_protocol_process = sp.run(
  File "/home/duda5/anaconda3/envs/goldrush/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '[" goldrush-edit-make seqs_to_polish=/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/batch.fa bfs='/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k32.bf /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k28.bf /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k24.bf /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k20.bf' K='32 28 24 20' t=1 /dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/batch.ntedited.prepd.sealer_scaffold.upper.fa "]' returned non-zero exit status 2.
['goldrush-edit-polish-batch', 'batch.fa', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs', '/dev/shm', 'xYbFEfAbTnaLUbeaX3zmPQ', '14', '-k32', '-k28', '-k24', '-k20', '-b/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k32.bf', '-b/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k28.bf', '-b/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k24.bf', '-b/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-k20.bf', '--seq-ids', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/seq_ids', '--bfs-ids-pipe', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-target_ids_input', '--bfs-ready-pipe', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-targeted_bfs/3799-bfs_ready', '--batch-done-pipe', '/dev/shm/xYbFEfAbTnaLUbeaX3zmPQ-3799/polishing_done', '--threads', '1'] failed!
make: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush.make:226: w16_x10_golden_path.goldrush-edit-polished.fa] Terminated
make: *** Deleting file 'w16_x10_golden_path.goldrush-edit-polished.fa'
```
Thanks for the update, @Duda5.
Can I ask how large your `/dev/shm/` is? Just trying to figure out whether the issue could be that shared memory location, or if it is the threads part of it.
Also, can you confirm the version of GoldRush that you're using?
Thanks for your patience with troubleshooting - I haven't been able to reproduce the issue yet on my end, so all this information should hopefully help us figure this out!
Hi @lcoombe,
No worries - GoldRush is my only hope to assemble anything from my Nanopore data at the moment, due to the high RAM requirements of other packages, so I am happy to help with troubleshooting.
My GoldRush version is `v1.0.2`.
As for `/dev/shm/`, it was half of my system's RAM by default (63 GB). I increased it to 126 GB and re-ran GoldRush. The error is the same as the one from my earlier posts:
```
...
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
make[1]: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush-edit-make:48: /dev/shm/cmpFeTuLRQmEkOL1Q1ejDw-2751/batch.ntedited.prepd.sealer_scaffold.upper.fa] Aborted (core dumped)
...
```
This time I also created a shell script that outputs the usage of `/dev/shm/` every 20 seconds during the GoldRush run. It stays at 0 most of the time until `goldrush-edit` is in progress, then increases to 1.1 GB at most before dropping back to 38 MB. So it's unlikely to be a limiting factor (?)
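A monitor along these lines can be sketched in a few lines of Python (a minimal sketch only; the interval and iteration count here are illustrative, and the script actually used was a plain shell loop):

```python
import os
import shutil
import time

def monitor_tmpfs(path="/dev/shm", interval=20, iterations=3):
    """Print the usage of a tmpfs mount at a fixed interval."""
    for _ in range(iterations):
        usage = shutil.disk_usage(path)
        used = usage.total - usage.free
        print(f"{path}: {used / 2**20:.0f} MiB used of "
              f"{usage.total / 2**30:.1f} GiB")
        time.sleep(interval)

# Short demonstration run; fall back to '/' where /dev/shm is absent.
monitor_tmpfs("/dev/shm" if os.path.exists("/dev/shm") else "/",
              interval=1, iterations=2)
```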
Thanks for the info, @Duda5!
Hmm, yes, you're right that `/dev/shm` is unlikely to be the culprit, given your tracking. It's strange that you are getting different error messages when you change that, though. The fact that the test demo works fine, but you're seeing issues with the real data, does seem to suggest something going on on the resource allocation side.
Could you share the result of `ulimit -a` on your machine? It does appear that there is something going on with the allocation of resources (whether memory or threads), so that would be good to see as a sanity check.
@vlad0x00 - do you have any ideas?
Sure, here is the output of `ulimit -a` on my system:
```
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 513891
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 8192
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
```
I think we used to have this problem in our lab, and the culprit was this:

```
max user processes (-u) 8192
```
This is a hard limit on the number of processes (which I think also includes threads, a.k.a. lightweight processes, on Linux). An HPC machine would ideally have this limit set to a high number (at least 2x the current value). Changing it does require root access, so you might need to ask your system admin.
Alternatively, you can further decrease the number of threads used by GoldRush, which should make the error go away. As far as I remember, goldrush-edit spawns more threads/processes than specified, and this is likely causing the issue.
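This effect can be illustrated with a small, Linux-specific Python experiment (a sketch only): dropping the soft `RLIMIT_NPROC` makes thread creation fail with the same "can't start new thread" error seen in the earlier traceback.

```python
import resource
import threading

# Save the current per-user process/thread limit (what `ulimit -u` shows).
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

try:
    # Drop the soft limit below the user's current process count. On Linux,
    # threads are clone()d tasks, so they count against RLIMIT_NPROC too.
    resource.setrlimit(resource.RLIMIT_NPROC, (1, hard))
    t = threading.Thread(target=lambda: None)
    t.start()  # typically raises RuntimeError: can't start new thread
    t.join()
    print("thread started anyway (limit not enforced, e.g. running as root)")
except RuntimeError as err:
    print("reproduced:", err)
finally:
    # Always restore the original soft limit.
    resource.setrlimit(resource.RLIMIT_NPROC, (soft, hard))
```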
Yes, definitely! @jowong4 remembered that we do also have something similar (but for stack size) in the `abyss-pe` makefile.
Okay, so this time `goldrush-edit` did manage to complete, but at the cost of overall system performance - the screen response time (on logging in, navigating, or opening any applications) was delayed by 30-40 seconds.
At some point I was thinking of killing it, but I had `htop` running in one of the open terminal windows, which indicated that GoldRush was not stuck. The glitches disappeared only after I rebooted, so I am not sure whether upping `max user processes` to 256000 is a good long-term solution (?)
As for the assembly, it was not completed - GoldRush exited during ntLink gap-filling with the following error:
```
...
Done ntLink! Final post-ntLink scaffolds can be found in: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.ntLink.scaffolds.fa
/home/duda5/anaconda3/envs/goldrush/bin/share/ntlink-1.3.6-0/bin/ntlink_patch_gaps.py --path w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.trimmed_scafs.path \
--mappings w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.verbose_mapping.tsv -s w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa --reads ONT_pcd_treads.fastq -o w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.ntLink.scaffolds.gap_fill.fa --large_k 64 --min_gap 1 \
--trims w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.trimmed_scafs.tsv -k 20 -w 10
Running ntLink gap-filling...
Parameters:
--path w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.trimmed_scafs.path
--mappings w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.verbose_mapping.tsv
--trims w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.trimmed_scafs.tsv
-s w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa
--reads ONT_pcd_treads.fastq
-z 1000
-k 20
-w 10
-t 4
--large_k 64
-x 0
--min_gap 1
-o w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.ntLink.scaffolds.gap_fill.fa
2022-12-16 13:30:17.384275 Reading ntLink read mappings..
2022-12-16 13:30:45.889062 Reading scaffolds..
2022-12-16 13:30:47.841054 Reading trim coordinates..
2022-12-16 13:30:47.856365 Choosing best read..
2022-12-16 13:30:47.867963 Collecting reads...
[2022-12-16 13:32:14][ERROR] SeqReader: Quality string length (47399) does not match sequence length (34349).
[2022-12-16 13:32:14][ERROR] SeqReader: Unexpected character in a FASTQ file.
make[2]: *** [/home/duda5/anaconda3/envs/goldrush/bin/share/ntlink-1.3.6-0/ntLink:249: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.gap_fill.fa.k64.w500.z1000.ntLink.scaffolds.gap_fill.fa] Segmentation fault (core dumped)
make[2]: Leaving directory '/media/duda5/'
make[1]: *** [/home/duda5/anaconda3/envs/goldrush/bin/share/ntlink-1.3.6-0/ntLink_rounds:122: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.ntLink.ntLink.ntLink.gap_fill.fa] Error 2
make[1]: Leaving directory '/media/duda5/'
make: *** [/home/duda5/anaconda3/envs/goldrush/bin/goldrush.make:252: w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k64.w500.z1000.ntLink.gap_fill.5rounds.fa] Error 2
```
I performed adapter trimming on my ONT reads with porechop-abi (I had to split the file in two and then merge it back into one due to a RAM limitation):
```
fastqsplitter -i ONT_mreads.fastq.gz -t 15 -o ONT_mreads_part1.fastq.gz -o ONT_mreads_part2.fastq.gz
porechop_abi -t 18 -v 2 -i ONT_mreads_part1.fastq.gz -o ONT_pcd_treads_part1.fastq.gz
porechop_abi -t 18 -v 2 -i ONT_mreads_part2.fastq.gz -o ONT_pcd_treads_part2.fastq.gz
```
So now I am wondering whether the splitting or adapter trimming could have caused this error, and why it hasn't appeared earlier.
Hi @Duda5,
I'm glad that the polishing worked for you in the end with that tweak! It also worked on my end, although we didn't have any glitches on our server (although the server I was using does have >100 threads). The difficulty is that raising `max user processes` essentially just prevents the previous errors you were seeing (it doesn't itself control the number of processes). So I don't think there's an alternative easy solution for that one, other than reducing threads further (unless you have any other ideas, @vlad0x00?)
For the new error - having entries where the quality string is longer than the sequence would be an issue with the file itself, and perhaps porechop-abi is the culprit, as you say. I have not used fastqsplitter or porechop-abi myself, but have you found these cases in the file? I.e., as a sanity check, I would read through the file using `bioawk` or something similar, and print the lengths of the sequence and quality strings. That should give a quick idea of whether this is an edge case due to file splitting, or a more systematic issue. The steps that you have done so far should still be fine, by the way, but it's important to know what could be wrong with the file for the ntLink step.
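As an illustration, that sanity check could also be done with a short Python script (a sketch assuming a plain, uncompressed 4-line-per-record FASTQ; `bioawk` is the more convenient tool in practice):

```python
import os
import tempfile

def find_length_mismatches(path):
    """Return (read_id, seq_len, qual_len) for records where they differ."""
    mismatches = []
    with open(path) as fh:
        while True:
            header = fh.readline()
            if not header:
                break
            seq = fh.readline().rstrip("\n")
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip("\n")
            if len(seq) != len(qual):
                mismatches.append(
                    (header.split()[0].lstrip("@"), len(seq), len(qual)))
    return mismatches

# Tiny demo on a synthetic two-record FASTQ (read2 is deliberately broken).
demo = "@read1\nACGT\n+\n!!!!\n@read2\nACGTA\n+\n!!!\n"
with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as f:
    f.write(demo)
print(find_length_mismatches(f.name))  # -> [('read2', 5, 3)]
os.unlink(f.name)
```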
Hi @lcoombe,
I ran the following command to identify reads in which the length of the quality string does not match the length of the read sequence, and strangely, it returned nothing:
```
bioawk -c fastx 'length($seq) != length($qual) {print length($seq), length($qual), $name}' ONT_pcd_treads.fastq
```
To inspect the output manually, I ran:

```
bioawk -c fastx '{print length($seq), length($qual), $name}' ONT_pcd_treads.fastq > bioawk_out
```

and then searched for the 47399 and 34349 values using `grep`.
However, I could not find any cases that could have caused this GoldRush error:

```
[2022-12-16 13:32:14][ERROR] SeqReader: Quality string length (47399) does not match sequence length (34349).
```
Is it possible to re-run ntLink gap-filling using the command that I posted in my previous comment, or do I need to re-run GoldRush from the beginning?
Hi @Duda5,
That's strange... I wonder what is triggering that error, then.
If you think it could be some transient system issue or something like that, then yes, you can re-launch the same command, and it should pick up where it left off (i.e. you don't need to start from the beginning). If you want to double-check that it's starting again where you expect, you can run the same command but specify that you want a dry run (`-n`).