Comments (10)
How many adapters got removed, and how many not?
AfterQC trims adapters from paired-end sequencing data by searching for the best overlap between the two reads of a pair. If there are too many sequencing errors, AfterQC fails to find the overlap and consequently fails to trim the adapters.
Recording the adapter information in the report is a good idea; I will implement it in a future release.
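To make the overlap idea concrete, here is a minimal sketch of overlap-based adapter detection for one read pair. It is not AfterQC's actual code: the function names, the 30-base minimum overlap, and the 10% mismatch tolerance are all assumptions chosen for illustration. The idea is that when the template is shorter than the read, read 1 and the reverse complement of read 2 agree over exactly the template, and everything past it is adapter.

```python
# Hypothetical sketch, not AfterQC's implementation.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_insert_length(read1, read2, min_overlap=30, max_mismatch_rate=0.1):
    """Guess the template length from the overlap of a read pair.

    Returns None when no overlap of at least min_overlap bases is found.
    min_overlap and max_mismatch_rate are assumed, illustrative values.
    """
    rc2 = reverse_complement(read2)
    longest = min(len(read1), len(rc2))
    best = None  # (insert_len, mismatch_rate)
    for insert_len in range(min_overlap, longest + 1):
        # With read-through, the start of read 1 matches the end of rc2.
        head = read1[:insert_len]
        tail = rc2[len(rc2) - insert_len:]
        mismatches = sum(a != b for a, b in zip(head, tail))
        rate = mismatches / insert_len
        if rate <= max_mismatch_rate and (best is None or rate < best[1]):
            best = (insert_len, rate)
    return None if best is None else best[0]


def trim_pair(read1, read2, **kwargs):
    """Cut both reads at the inferred template end; leave them alone otherwise."""
    insert_len = find_insert_length(read1, read2, **kwargs)
    if insert_len is None:
        return read1, read2
    return read1[:insert_len], read2[:insert_len]
```

Note that in this sketch a pair whose template is shorter than `min_overlap` never passes the check, so its adapters stay in place, and a pair with many mismatches in the overlap is also left untrimmed.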
from afterqc.
multiqc report before any processing
multiqc report after AfterQC
Also, a quick question: what would happen if I ran AfterQC again on the good reads?
If you run AfterQC on the good reads, every good read should be kept.
But you can give it a try.
From the figures you posted, it seems that the adapters that could not be trimmed are at positions < 40, which would mean these DNA templates are shorter than 40 bp?
Yes, which is weird. The original paper says:
KneadData. KneadData incorporates Trimmomatic and bowtie2 for filtering and human sequence removal, respectively. Reads were scanned with a four-base wide sliding window and trimmed when the average base Phred score drops below 20. Trimmed reads shorter than 70 nt were discarded.
So, I'm not sure if they trimmed adapters.
For me it is important to understand how AfterQC works: whether what I got is expected (many sequencing errors causing AfterQC to fail to find the overlap), and whether additional trimming after AfterQC is advised in such a case.
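For reference, the quality trimming described in the KneadData quote above can be sketched as follows. This is a simplified illustration of a four-base sliding window with a Phred 20 threshold and a 70 nt minimum length, not Trimmomatic's actual implementation; the function names and the Phred+33 encoding are assumptions. It only trims by quality and does nothing about adapters.

```python
# Simplified sketch of sliding-window quality trimming (names are hypothetical).

def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


def sliding_window_trim(quals, window=4, min_avg=20):
    """Return how many leading bases to keep.

    The read is cut at the start of the first window whose mean
    quality drops below min_avg.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def trim_read(seq, quality_string, window=4, min_avg=20, min_len=70):
    """Trim one read; return None if the result is shorter than min_len."""
    keep = sliding_window_trim(phred_scores(quality_string), window, min_avg)
    if keep < min_len:
        return None
    return seq[:keep], quality_string[:keep]
```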
Sequencing errors are not the major cause of overlap detection failure; I think they account for less than 1%.
From the figure you posted, the major reason is that the DNA templates are too short. AfterQC requires at least a 30 bp overlap to detect adapters, which is why the adapters below or near position 30 are not trimmed while the adapters beyond that position are all trimmed.
I am very curious why your sequencing library contains so many short fragments. I suspect that most of them are self-ligated adapters. Did you perform fragment length selection after adapter ligation?
I'm curious myself. It's not my article; I'm analyzing a public dataset.
Here is the article. Please read the section "Shotgun library sequencing and quality control". Regarding the sequencing protocol, though, they reference one more article, so I can't answer your question yet; I need to read more.
Additionally, here is sequence quality after AfterQC (raw is similar, but worse):
OK, it seems the cycle-by-cycle quality drop is fairly severe, and there are two gaps...
Yes, I thought the gaps were bubble-related. That's how I found AfterQC: I googled for a tool that can correct that. But AfterQC didn't find anything, so I have no idea what causes those drops.
UPDATE. With the Nextera XT transposon protocol, when the insert is shorter than the length of a single read, the adapter appears at the end of the read. Article, blog post, biostars post.
But for metagenomic datasets I don't think we can identify read pairs with small insert sizes in advance. I will try playing with the FastQC report by increasing the k-mer size to see whether I can identify the issue from the k-mers (i.e. without knowing the Nextera adapter sequence).
I think AfterQC can deal with this particular problem, but for that you would need either to check against a particular sequence or to change the trimming algorithm somehow. In any case, just having an adapter-related QC report would be great. I don't think the tool should cover every possible situation, but it should give the user enough information to spot potential problems.
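As a rough idea of what checking against a particular sequence could report, here is a small sketch that counts where a known adapter starts in each read. It is only an illustration: the function name is hypothetical, the adapter constant is the commonly cited Nextera transposase read-through sequence (treat it as an assumption and verify it for your library), and matching is exact on the first 12 bases, so reads with errors in that stretch are missed.

```python
from collections import Counter

# Assumed adapter: the commonly cited Nextera transposase read-through sequence.
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def adapter_positions(reads, adapter=NEXTERA_ADAPTER, min_match=12):
    """Count the start position of the adapter in each read.

    Only the first min_match bases of the adapter are searched, with
    exact matching. The start position equals the implied template length.
    """
    seed = adapter[:min_match]
    positions = Counter()
    for read in reads:
        index = read.find(seed)
        if index != -1:
            positions[index] += 1
    return positions


def adapter_fraction(reads, **kwargs):
    """Fraction of reads in which the adapter seed was found."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(adapter_positions(reads, **kwargs).values()) / len(reads)
```

Running this on the reads before and after trimming would show how many adapters remain and at which positions, which is the kind of summary a report could include.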