Comments (11)

Hoohm avatar Hoohm commented on July 26, 2024

Hello @bassanio
Did you already go through the documentation? If so, could you tell me specifically what you need help with?

from cite-seq-count.

mbassalbioinformatics avatar mbassalbioinformatics commented on July 26, 2024

Hi
I guess this is in the same spirit as the previous question.

So in the documentation you outline the structure of R1 with the UMI position and then R2 with the Ab barcode data. You also mention how you provide the tag.csv file which will take the input fq files, and generate counts based on the Ab barcodes provided in the csv. That part makes sense.

Now my question is, where does the barcode info for the HTO come into play? Where do you specify those and where does cite-seq-count deal with that? Do I need to run cite-seq-count twice, once for the Ab barcodes and then a 2nd time for the HTO? Or do I make a single csv file with the HTO and Ab sequences and let cite-seq-count loose on all of it in one go?

(I have 1 file of the format [say hto.csv]...

XXXXXX,hashtag1
YYYYYY,hashtag2

... and a 2nd file of format [say abs.csv]...

AAAAAA,Ab1
BBBBBB,Ab2

are you able to provide pseudo-code/commands as to how to run cite-seq-count for each of hto.csv and abs.csv to get the desired counts required for progressing...?)

The 2nd question: assuming the HTO/Ab situation is dealt with, the next step would be loading this information into Seurat for integration, is that correct?

Hoohm avatar Hoohm commented on July 26, 2024

So depending on how your libraries have been sequenced, you can run everything together.
You should have fastqs for ABs and fastqs for HTO.

Does cellranger give you the output you need for the ABs?

If so, you only need to run CSC on the HTO.

You can make a tags.csv with all your HTO tags and all your AB tags. CSC will try to match all of those on the fastqs you provide.
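
As a rough sketch of building one combined tag list from the two files described earlier in the thread: the function name `merge_tag_lists` is made up for illustration, and the only format assumed is the `sequence,name` layout shown above. It also refuses to merge when the same sequence is given two different names, since that would make the counts ambiguous.

```python
import csv
import io


def merge_tag_lists(*tag_csv_texts):
    """Combine several 'sequence,name' tag lists into one list of lines.

    Hypothetical helper, not part of CITE-seq-Count.
    """
    merged = {}
    for text in tag_csv_texts:
        for row in csv.reader(io.StringIO(text)):
            if not row:
                continue  # skip blank lines
            sequence, name = row[0].strip(), row[1].strip()
            if sequence in merged and merged[sequence] != name:
                raise ValueError(
                    f"{sequence} is assigned to both {merged[sequence]} and {name}"
                )
            merged[sequence] = name
    return [f"{sequence},{name}" for sequence, name in merged.items()]


# Placeholder contents taken from the hto.csv / abs.csv example in this thread.
hto_csv = "XXXXXX,hashtag1\nYYYYYY,hashtag2\n"
abs_csv = "AAAAAA,Ab1\nBBBBBB,Ab2\n"

combined = merge_tag_lists(hto_csv, abs_csv)
print("\n".join(combined))
```

The printed lines can be saved as the single file passed with `-t`.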

Pseudo code is very simple.

  1. Take a read from R2 and try to match any of the tags provided in the tags.csv from the start of the read (or from the first base given by --start-trim). If none is found, flag the read as unmapped.
  2. Do some cell aggregation
  3. UMI aggregation
  4. Produce read and umi count matrices
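
The four steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the actual CITE-seq-Count code: matching is exact, no cell barcode or UMI correction is done, and the function name `count_tags`, its parameters, and the toy sequences are all invented for the example.

```python
from collections import defaultdict


def count_tags(read_pairs, tags, cb_len=16, umi_len=10, start_trim=0):
    """Toy version of the steps: match, aggregate per cell, aggregate UMIs.

    read_pairs: iterable of (read1_sequence, read2_sequence)
    tags: dict mapping tag sequence -> tag name
    """
    reads = defaultdict(lambda: defaultdict(int))  # cell -> tag -> read count
    umis = defaultdict(lambda: defaultdict(set))   # cell -> tag -> unique UMIs
    for r1, r2 in read_pairs:
        cell = r1[:cb_len]
        umi = r1[cb_len:cb_len + umi_len]
        tag_name = "unmapped"
        for sequence, name in tags.items():
            # Step 1: the tag must sit at the start of R2 (after any trim).
            if r2.startswith(sequence, start_trim):
                tag_name = name
                break
        reads[cell][tag_name] += 1     # step 2: aggregate per cell
        umis[cell][tag_name].add(umi)  # step 3: keep each UMI once
    # Step 4: turn the nested containers into plain count tables.
    read_counts = {c: dict(per_tag) for c, per_tag in reads.items()}
    umi_counts = {
        c: {t: len(u) for t, u in per_tag.items()} for c, per_tag in umis.items()
    }
    return read_counts, umi_counts


# Made-up data: 4 bp cell barcodes and 3 bp UMIs to keep it readable.
toy_tags = {"ACGTAC": "hashtag1", "TTGGCC": "Ab1"}
toy_reads = [
    ("AAAACCC", "ACGTACGGG"),  # hashtag1
    ("AAAACCC", "ACGTACTTT"),  # hashtag1 again, same UMI
    ("AAAAGGG", "TTGGCCAAA"),  # Ab1
    ("TTTTACG", "GGGGGGGGG"),  # no tag at the start of R2
]
read_counts, umi_counts = count_tags(toy_reads, toy_tags, cb_len=4, umi_len=3)
print(read_counts)
print(umi_counts)
```

The second read shares its UMI with the first, so it adds to the read count for hashtag1 but not to the UMI count.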

Yes, you need then to load up the results into Seurat to do the demultiplexing.

mbassalbioinformatics avatar mbassalbioinformatics commented on July 26, 2024

I have fq files for the Abs and for the HTOs, separate from the expression data (i.e. the fq files have been split into the different samples, and each sample has its corresponding Ab + HTO fq files).

So if I understand you correctly, I need to run cellranger on the Ab+HTO fq files separately to get the counts matrix for those, right? And a 2nd run of cellranger on the expression fq files for those counts?

After which I just run CSC on the Ab+HTO fq files with

CITE-seq-Count -R1 ab-HTO_R1.fastq.gz -R2 ab-HTO_R2.fastq.gz \
-t TAG_LIST_HTO-Ab.csv -cbf 1 -cbl 16 -umif 17 -umil 26 -cells 20000 -o ./out/

Did I understand you correctly?

and from there into R for the rest 👍

Hoohm avatar Hoohm commented on July 26, 2024

bassanio avatar bassanio commented on July 26, 2024

Hi ,

I tried to run CITE-seq-Count with the command below and got the following error.

I am also confused about R2 and R3, because I am finding the ABs in R3 and not in R2.

CITE-seq-Count \
  -R1 hto_S3_L001_R1_001.fastq.gz \
  -R2 hto_S3_L001_R3_001.fastq.gz \
  -t TAGS.txt \
  -cbf 1 -cbl 16 -umif 17 -umil 26 -cells 13641 \
  -o RESULT

Tag File

ACCCACCAGTAAGAC,First_P1_Undivided
GGTCGAGAGCATTCA,Second_P2_late_dividers
CTTGCCGCATGTCAT,Third_P3_Early_dividers

Executing the above command gives the following warning and error:

Read1 length is 51bp but you are using 26bp for Cell and UMI barcodes combined.
This might lead to wrong cell attribution and skewed umi counts.

Counting number of reads
Started mapping
Processing 10,651,191 read
CITE-seq-Count is running with XX cores.
Mapping done for process 2006672. Processed 166,424 reads
Mapping done for process 2006674. Processed 166,424 reads
Mapping done for .......
Mapping done for process 2006731. Processed 166,424 reads
Mapping done
Merging results
Correcting cell barcodes
Looking for a whitelist

Collapsing cell barcodes
Correcting umis
Traceback (most recent call last):
  File "/home/.local/bin/CITE-seq-Count", line 8, in <module>
    sys.exit(main())
  File "/home/.local/lib/python3.9/site-packages/cite_seq_count/__main__.py", line 435, in main
    ) = processing.correct_umis(
  File "/home/.local/lib/python3.9/site-packages/cite_seq_count/processing.py", line 229, in correct_umis
    for TAG in final_results[cell_barcode]:
RuntimeError: dictionary keys changed during iteration

HTO R1:
[screenshot: Screen Shot 2023-05-16 at 11 23 17 AM]

HTO R2:
[screenshot: Screen Shot 2023-05-16 at 11 23 39 AM]

HTO R3:
[screenshot: Screen Shot 2023-05-16 at 11 24 09 AM]

grep of AB tag in R3:
[screenshot: Screen Shot 2023-05-16 at 11 26 28 AM]

Some AB barcodes do not start at the expected position, as shown in the example.

cpflueger2016 avatar cpflueger2016 commented on July 26, 2024

@bassanio try to set up a conda environment with Python version 3.7.16 and run it again. I have had no luck with any Python version > 3.7. The error is actually an issue with changes in the pandas package. If you restrict Python to 3.7.16, pip install CITE-seq-Count==1.4.5 will pull the correct pandas package version. Good luck!
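
For context on the traceback itself: "dictionary keys changed during iteration" is a message the Python interpreter raises when entries of a dict are removed and re-added while a loop is still walking over that dict. As far as I know, older interpreters let this pass silently, which would fit the observation that 3.7 works. The snippet below is a generic sketch of the safe pattern, not the CITE-seq-Count source; `collapse_in_place` and the toy data are made up for illustration.

```python
def collapse_in_place(per_tag_umis, collapse):
    """Replace each tag's UMI list with a collapsed version, in place.

    Hypothetical helper for illustration only.
    """
    # list(...) takes a snapshot of the keys first, so popping and
    # re-inserting entries inside the loop cannot disturb the iteration.
    # Writing `for tag in per_tag_umis:` here instead is the pattern that
    # newer interpreters reject with the RuntimeError from the traceback.
    for tag in list(per_tag_umis):
        per_tag_umis[tag] = collapse(per_tag_umis.pop(tag))
    return per_tag_umis


# Toy data: one cell with two tags and their observed UMIs.
cell = {"hashtag1": ["AAC", "AAC", "GGT"], "Ab1": ["TTT"]}
result = collapse_in_place(cell, lambda umis: sorted(set(umis)))
print(result)
```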

bassanio avatar bassanio commented on July 26, 2024

@cpflueger2016: Thanks for the information, I will do the same.

Can you also help me understand the R2 and R3 fastq files?

cpflueger2016 avatar cpflueger2016 commented on July 26, 2024

Yea, so if you get the index read from the i7 index parsed out (there is an option in bcl2fastq), your read2 is actually the index of the library and read3 is truly the second read.
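
One quick way to check which file is the index read is to look at read lengths, on the assumption that an index read is much shorter than a real second read. The helper below is a hypothetical sketch (the name `read_lengths` and the demo file are invented); it tallies the sequence lengths of the first reads in a gzipped fastq.

```python
import gzip
import os
import tempfile
from collections import Counter


def read_lengths(fastq_gz_path, max_reads=1000):
    """Count sequence lengths among the first `max_reads` reads."""
    lengths = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number // 4 >= max_reads:
                break
            if line_number % 4 == 1:  # second line of each 4-line record
                lengths[len(line.strip())] += 1
    return lengths


# Self-contained demo on a tiny made-up file with two 8 bp reads.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_R2.fastq.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("@read1\nACGTACGT\n+\nFFFFFFFF\n@read2\nTTGGCCAA\n+\nFFFFFFFF\n")
    demo_lengths = read_lengths(demo_path)

print(dict(demo_lengths))
```

Running it on the real R2 and R3 files and comparing the lengths should show which one holds the index and which one holds the tag sequences to pass as -R2.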

bassanio avatar bassanio commented on July 26, 2024

@cpflueger2016: I have this warning message at the top:

"Read1 length is 51bp but you are using 26bp for Cell and UMI barcodes combined"

Should I change -umil to 51? Does this have any effect on the analysis?

Hoohm avatar Hoohm commented on July 26, 2024

This is not going to affect the analysis. Back in the day I wanted to make sure people knew what they were running and catch potential wrong lengths.
In hindsight this might have been a mistake as it confuses users more than anything.
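
To illustrate why the extra bases do not matter, here is a small sketch of how the four position flags carve up read 1. It assumes the positions are 1-based and inclusive, which matches the 16 bp barcode plus 10 bp UMI implied by `-cbf 1 -cbl 16 -umif 17 -umil 26`; the function `split_read1` and the read itself are made up.

```python
def split_read1(read1, cbf=1, cbl=16, umif=17, umil=26):
    """Slice read 1 using 1-based, inclusive positions (assumed convention)."""
    cell_barcode = read1[cbf - 1:cbl]
    umi = read1[umif - 1:umil]
    unused = read1[max(cbl, umil):]  # everything past the last position
    return cell_barcode, umi, unused


# A made-up 51 bp read: 16 barcode bases, 10 UMI bases, 25 extra bases.
read1 = "A" * 16 + "C" * 10 + "T" * 25
cell_barcode, umi, unused = split_read1(read1)
print(len(cell_barcode), len(umi), len(unused))
```

Under that assumption the trailing 25 bases are simply never read, so setting -umil to 51 would stretch the UMI over bases that are not part of it.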

Is your general issue resolved, can I close this one?
