
Comments (18)

dianitasusilo commented on August 30, 2024

Hi, I have the same problem here.
How did you extract that sequence from the FASTQ and determine the start trim point?
Do we use the R2 FASTQ for it?
Thank you

from cite-seq-count.

sopenaml commented on August 30, 2024

dianitasusilo commented on August 30, 2024

Thanks for your quick reply, but in my case they are not in the same position...

I looked for my hashtag nucleotide sequence in the raw R2 FASTQ files, and it turned out like this.
image

Or did I use the wrong FASTQ file as input?
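
One way to approach the start-trim question is to count where the tag sequence begins in a sample of R2 reads. The sketch below is only an illustration and is not part of CITE-seq-Count: `read_sequences` and `tag_offsets` are hypothetical helper names, and the reads and tag are toy values.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(fastq_path, max_reads=100_000):
    """Yield the sequence line of each FASTQ record (plain or gzipped)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        # In a 4-line FASTQ record the sequence is the second line.
        for line in islice(handle, 1, max_reads * 4, 4):
            yield line.strip()


def tag_offsets(sequences, tag):
    """Count the 0-based position at which `tag` first occurs in each read."""
    offsets = Counter()
    for seq in sequences:
        pos = seq.find(tag)
        if pos >= 0:
            offsets[pos] += 1
    return offsets


# Toy reads standing in for R2; the tag sequence is made up for illustration.
reads = ["ACGTACGTTAGGCC", "NNACGTACGTTAGG", "ACGTACGTTTTTTT", "GGGGGGGGGGGGGG"]
offsets = tag_offsets(reads, "ACGTACGT")
print(offsets.most_common())
```

On real data, `tag_offsets(read_sequences("R2.fastq.gz"), tag)` would give the same kind of histogram (the file name is a placeholder). If nearly all hits share one offset, that offset is a plausible candidate for --start-trim. If the hits are spread over several offsets, no single fixed trim fits all reads, and the --sliding-window option mentioned in this thread may be worth trying.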

sopenaml commented on August 30, 2024

Hoohm commented on August 30, 2024

Happy to see fellow users help each other.

@dianitasusilo a sliding-window approach might help, yes: --sliding-window is the option you are looking for.

@sopenaml Could you check whether your barcodes in R1 are overlapping? It might be a mapping between barcodes, similar to TotalSeq-B.

sopenaml commented on August 30, 2024

Hoohm commented on August 30, 2024

sopenaml commented on August 30, 2024

Hi Patrick,

I've checked my CITE-seq antibody barcodes against R1 and I don't see any matches. If I check my cell hashing barcodes, one of them finds a few (7) matches in R1, but the rest find none. So it's not that my barcodes are overlapping with cell barcodes. Any other ideas about what the problem may be? Thanks

drlaurenwasson commented on August 30, 2024

Hi,
I have the same problem, where I can grep my HTO out of read 2 but still get 100% reads unmapped. I am running 1.4.5 using Python 3.9. Do we know what the solution to this issue is?

sopenaml commented on August 30, 2024

Hoohm commented on August 30, 2024

Hey @sopenaml,
I need to rephrase what I mentioned earlier.
Depending on which chemistry kit you used, it's possible that the R1 barcodes (cell barcodes) for one library (GEX, VDJ, ADTs) use one cell barcode while your HTOs use a different cell barcode for the same cell.
This means that when you compute the overlap, it is going to be very low, because the barcodes need to be translated first.

Here is the translation matrix.
https://github.com/10XGenomics/cellranger/blob/master/lib/python/cellranger/barcodes/translation/3M-february-2018.txt.gz

Is it a bit clearer?
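
To make the translation step concrete, here is a minimal sketch of applying a two-column barcode table before comparing barcode sets. It rests on assumptions: `load_translation` and `translate_barcodes` are hypothetical helpers, the table is assumed to be whitespace-separated with one barcode pair per line, and which column belongs to which library must be checked against the real file. The rows below are toy data.

```python
def load_translation(lines, key_col=1, value_col=0):
    """Build a dict from a two-column, whitespace-separated table.

    ASSUMPTION: key_col holds the barcode as seen in the HTO library and
    value_col the matching barcode in the RNA library. Verify the column
    order in the actual translation file before relying on this.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        mapping[fields[key_col]] = fields[value_col]
    return mapping


def translate_barcodes(barcodes, mapping):
    """Translate each barcode; barcodes missing from the table are kept as-is."""
    return {mapping.get(bc, bc) for bc in barcodes}


# Toy table and toy barcode sets, made up for illustration.
table = [
    "AAACCCAAGAAACACT\tAAACCCATCAAACACT",
    "AAACCCAAGAAACCAT\tAAACCCATCAAACCAT",
]
rna_cells = {"AAACCCAAGAAACACT", "AAACCCAAGAAACCAT", "TTTGGGAAGTTTGGGA"}
hto_cells = {"AAACCCATCAAACACT", "AAACCCATCAAACCAT"}

mapping = load_translation(table)
before = len(rna_cells & hto_cells)
after = len(rna_cells & translate_barcodes(hto_cells, mapping))
print(before, after)
```

For the real table, the lines could come from `gzip.open(path, "rt")`. In the toy data the two sets share no barcodes until the HTO side is translated, which is the effect described above.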

stepanovacz commented on August 30, 2024

Hi everyone,

I am running into an issue where I have about 35% unmapped reads. Is there a way to bring the mapped percentage up? Grepping the R2 file shows that the start trim needs to be --start-trim 0
Grep_R2
I used 10xv3
Attached are the tags
tags.csv

Here is what I run to get this output:

CITE-seq-Count -T ${numThreads} \
  -R1 ${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L001_R1_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L002_R1_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L003_R1_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L004_R1_001.fastq.gz \
  -R2 ${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L001_R2_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L002_R2_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L003_R2_001.fastq.gz,${fastq_path}${nova_id}outs/fastq_path/${fcid}/sham_hto/${outname[$index]}_KS220601_batch1_HTO_${snum[$index]}_L004_R2_001.fastq.gz \
  -t tags.csv -cbf 1 -cbl 16 -umif 17 -umil 28 -cells 5000 --sliding-window --start-trim 0 \
  -o /project/CiteSeq7_sham

Here is the output I get:
Date: 2022-07-13
Running time: 6.0 minutes, 37.25 seconds
CITE-seq-Count Version: 1.4.5
Reads processed: 3575668
Percentage mapped: 64
Percentage unmapped: 36
Uncorrected cells: 0
Correction:
    Cell barcodes collapsing threshold: 1
    Cell barcodes corrected: 16075
    UMI collapsing threshold: 2
    UMIs corrected: 12836
Run parameters:
    Read1_paths: _S17_L004_R1_001.fastq.gz
    Read2_paths: _S17_L004_R2_001.fastq.gz
    Cell barcode:
        First position: 1
        Last position: 16
    UMI barcode:
        First position: 17
        Last position: 28
    Expected cells: 5000
    Tags max errors: 2
    Start trim: 0

Thank you in advance!

stepanovacz commented on August 30, 2024

I am able to bring the percentage of mapped reads above 90 by setting --max-error 6 or higher. However, I do not think that is a good solution, as I get plenty of doublets and negatives (Doublet: 1868, Negative: 35, Singlet: 394). Any idea what else I can do?
Thank you!
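
A rough way to see why a large error tolerance produces ambiguous calls is to compare it with how far apart the tags themselves are. The sketch below uses Hamming distance as a simplified stand-in (the tool's own matching may use a different metric), and the three 15-nt tags are made up, not taken from the attached tags.csv. `hamming`, `min_pairwise_distance` and `is_ambiguous` are hypothetical helper names.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags):
    """Smallest Hamming distance between any two tags in the panel."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def is_ambiguous(tags, max_error):
    """True if some read could sit within max_error of two different tags.

    Under Hamming distance this is possible exactly when two tags differ
    in at most 2 * max_error positions.
    """
    return min_pairwise_distance(tags) <= 2 * max_error


# Made-up 15-nt tags for illustration only.
tags = ["AAAAACCCCCGGGGG", "AAAAACCCCCTTTTT", "TTTTTCCCCCGGGGG"]
print(min_pairwise_distance(tags))
print(is_ambiguous(tags, max_error=2), is_ambiguous(tags, max_error=6))
```

Running the same check on the real tag list would show whether a given --max-error leaves room for a read to match more than one tag, which would be consistent with a rise in doublet calls.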

Hoohm commented on August 30, 2024

Would you be able to send me a sample of your data so that I can run it and have a look?

stepanovacz commented on August 30, 2024

Hoohm commented on August 30, 2024

I asked for access

Hoohm commented on August 30, 2024

results/unmapped.csv
tag,count
AAGCAGTGGTATCAA,38893
GGGGGGGGGGGGGGG,20759
CCGTACCTCAAAAAA,17644
GCAGTGGTATCAACG,10879
TTCCTGCCAAAAAAA,5855
GTGGTATCAACGCAG,5442
AGCAGTGGTATCAAC,4087
CCGTACCCCAAAAAA,3959
CAGTGGTATCAACGC,3894

It seems pretty reasonable from what I see in the first sample. The unmapped.csv file lists the top sequences that are not mapping. The 22% polyG means that either there was no sequence there or it could not be read.

Why do you need to get higher?

I want to make sure about the translation issue. Do you have a high overlap between the cells from the RNA side and the HTO?
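
The overlap check asked about here can be sketched as a simple set comparison. This is an illustration under assumptions: `strip_suffix` and `overlap_report` are hypothetical helpers, the barcodes are toy values, and the `-1` suffix handling assumes the RNA-side barcodes come from Cell Ranger output while the HTO-side barcodes carry no suffix.

```python
def strip_suffix(barcode):
    """Drop a trailing GEM-well suffix such as '-1' if present."""
    return barcode.split("-")[0]


def overlap_report(rna_cells, hto_cells):
    """Summarise how many cell barcodes the two libraries share."""
    rna = {strip_suffix(bc) for bc in rna_cells}
    hto = {strip_suffix(bc) for bc in hto_cells}
    shared = rna & hto
    return {
        "shared": len(shared),
        "fraction_of_rna": len(shared) / len(rna) if rna else 0.0,
        "fraction_of_hto": len(shared) / len(hto) if hto else 0.0,
    }


# Toy barcodes, shortened for readability.
rna_cells = ["AAAC-1", "CCCG-1", "GGGT-1", "TTTA-1"]
hto_cells = ["AAAC", "CCCG", "ACGT"]
report = overlap_report(rna_cells, hto_cells)
print(report)
```

A low shared fraction on real data, despite a reasonable mapping rate, would point toward the barcode translation issue rather than a tag-matching problem.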

leeanapeters commented on August 30, 2024

Hi, is the translation matrix used only with v3 chemistry? I seem to have a similar problem, where grep doesn't return barcodes in my R2 FASTQ that I know exist in my data after Cell Ranger. I used the 5' v2 chemistry with GEX, VDJ and feature barcode libraries.

Thanks

Leeana
