
Comments (16)

mizraelson avatar mizraelson commented on July 18, 2024 1

Hi,

You are right: a 9 bp UMI is quite short. As a result, we're seeing about half the reads being dropped because multiple CDR3s are assigned to the same UMI. Considering that the UMIs are attached to multiple V gene primers, a good workaround might be to include a few nucleotides right after the UMI, potentially increasing diversity. I'd recommend giving this a go: --tag-pattern "^(UMI:N{15})(R1:*)" or maybe even longer to capture the difference between primers. If that's not cutting it, let me know, and we can tinker with the parameters to try to save more reads. That said, a non-UMI approach is also a good choice, since MiXCR has very powerful error-correction algorithms even for data without barcodes.
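For a rough sense of scale, the number of distinct sequences a UMI layout can encode can be counted directly. The sketch below is only back-of-the-envelope arithmetic: the `umi_space` helper is a made-up name, the HHHHHNNNN layout is the one quoted from the paper's supplement in the question, and the 15-base figure is an upper bound, since bases borrowed from the primer region are not random.

```shell
# Count the sequences an IUPAC-coded UMI layout can encode.
# N = any of 4 bases, H = A/C/T (3 bases); other codes are not handled here.
umi_space() {
  pattern=$1
  total=1
  while [ -n "$pattern" ]; do
    rest=${pattern#?}             # everything after the first character
    code=${pattern%"$rest"}       # the first character itself
    case $code in
      N) total=$((total * 4)) ;;
      H) total=$((total * 3)) ;;
      *) echo "unsupported code: $code" >&2; return 1 ;;
    esac
    pattern=$rest
  done
  echo "$total"
}

umi_space HHHHHNNNN          # prints 62208
umi_space NNNNNNNNN          # prints 262144
umi_space NNNNNNNNNNNNNNN    # prints 1073741824 (upper bound)
```

With only tens of thousands of possible barcodes, unrelated molecules are bound to share a UMI in a deep library, which is consistent with the collision rate reported by the QC.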

As for the CDR3 discrepancy: in the paper, they include an extra amino acid from FR4 (sourced from the J gene) within the CDR3. The reasoning behind this addition isn't entirely clear. While some researchers opt to exclude the initial and final amino acids from the CDR3 definition (e.g., IMGT), adding an extra one is a bit unusual. However, since this particular amino acid stems from the J gene, which both methods identify correctly, you can safely consider the clones equivalent.

For a quick comparison:

  • CASSQDSDPQGLFAGNTIYFG (from the paper)
  • CASSQDSDPQGLFAGNTIYF (MiXCR)

Check out this link, and you'll see that the terminal 'G' belongs to the FR4.
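If you want to match clones between the two tables automatically, a minimal sketch is to drop one trailing G from the paper's CDR3 before comparing. This assumes the extra FR4 residue is always G, as it is for the clones shown in this thread; `same_clone` is a hypothetical helper name.

```shell
# Return success when the paper's CDR3, minus one trailing G, equals MiXCR's CDR3.
# Assumption: the extra FR4 residue kept by the paper is a single G.
same_clone() {
  paper_cdr3=$1
  mixcr_cdr3=$2
  [ "${paper_cdr3%G}" = "$mixcr_cdr3" ]
}

# Pairs taken from the published table and the MiXCR output in this thread.
same_clone CASSQDSDPQGLFAGNTIYFG CASSQDSDPQGLFAGNTIYF && echo "clone 1: equivalent"
same_clone CASSQGTGRSSPLHFG      CASSQGTGRSSPLHF      && echo "clone 2: equivalent"
same_clone CASSPTGEATEAFFG       CASSPTGEATEAFF       && echo "clone 3: equivalent"
```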

from mixcr.

mizraelson avatar mizraelson commented on July 18, 2024 1

Actually, the 15bp UMI looks much better; the number of unassigned alignments has dropped from 53% to 7.8%! Additionally, 85% of the reads are used in clonotype assembly.

I would suggest going with the 15bp UMI. While it's not perfect, it still allows you to leverage the UMI to correct the data effectively.
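For reference, here is a small sketch that prints (without running) the analyze command for a chosen UMI length. The flags and input file name are copied from the Step 2 command in the question, with only the tag pattern changed; the `umi_cmd` helper and the `umi15` output prefix are made up for illustration.

```shell
# Print the MiXCR command for a given UMI length; nothing is executed here.
# Flags mirror the Step 2 command from the question; the output prefix is hypothetical.
umi_cmd() {
  umi_len=$1
  cat <<EOF
mixcr analyze generic-amplicon-with-umi \\
  --species hsa \\
  --rna \\
  --rigid-left-alignment-boundary \\
  --floating-right-alignment-boundary C \\
  --tag-pattern '^(UMI:N{${umi_len}})(R1:*)' \\
  SRR23603384_trimmed_1.fastq.gz \\
  umi${umi_len}
EOF
}

umi_cmd 15   # review the printed command, then pipe it to sh to run it
```

Printing first makes it easy to compare several lengths (for example 15 and 25) before committing compute time to each run.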

from mixcr.

mizraelson avatar mizraelson commented on July 18, 2024

Hi, yes, of course. Could you please share the size of the UMI?

from mixcr.

bshim181 avatar bshim181 commented on July 18, 2024

The UMI is 7 bp long. It is pretty short.

from mixcr.

mizraelson avatar mizraelson commented on July 18, 2024

It's pretty short. I must say I have read this paper before and contacted the authors on the matter of sharing the data (because, as far as I'm concerned, the raw data is not publicly available), so I can tune the preset, but I never heard back from them. If you have raw data generated by this protocol that you can share with us (the same goes for the single-part data from this publication) - that would be of great help.
Nevertheless, here is the command I suggest using, judging from the scheme:

mixcr analyze generic-amplicon-with-umi \
    --species hsa \
    --rna \
    --tag-pattern "^(R1:*)\(UMI:N{10})(R2:*)" \
    --floating-left-alignment-boundary \
    --floating-right-alignment-boundary C \
      input_R1.fastq.gz \
      input_R2.fastq.gz \
      result

Because the UMI is quite short, I suggest trying to include a few more letters from the TRBC/TRAC primer, which will at least increase the diversity two-fold.

from mixcr.

bshim181 avatar bshim181 commented on July 18, 2024

Hello, I do have the bulk and single-cell data generated from this protocol. I will probably have to talk with the data generator first, because the data we have is of clinical origin. I will get back to you once I have talked with the developer.

In terms of the bulk data, would a single pair of FASTQ files (R1 and R2) suffice? Also, for the single-cell data, would you need all the FASTQ files for the entire batch (384 pairs of FASTQ files in total)?

from mixcr.

mizraelson avatar mizraelson commented on July 18, 2024

A single pair of files will be enough for our purposes. In the case of Single-cell analysis, it's better to see the full picture, as the filtering process includes all cells. If needed, we can provide a secure SFTP server for the data transfer.

Nevertheless, I recommend you try the commands suggested and we can see how well it worked, as these generic presets should cover most cases.

from mixcr.

bshim181 avatar bshim181 commented on July 18, 2024

The command above throws an error stating: "Could not invoke public final void com.milaboratory.mixcr.cli.AlignMiXCRMixins.floatingRightAlignmentBoundary(java.lang.String) with /jsimonlab/users/bshim/BMS-Bulk-Reads/BMS-61_S1_L001_R1_001.fastq.gz (java.lang.IllegalArgumentException: Unknown point: /jsimonlab/users/bshim/BMS-Bulk-Reads/BMS-61_S1_L001_R1_001.fastq.gz)"

Why might this be?

from mixcr.

mizraelson avatar mizraelson commented on July 18, 2024

Please try the following:

mixcr analyze generic-amplicon-with-umi \
    --species hsa \
    --rna \
    --tag-pattern "^(R1:*)\(UMI:N{10})(R2:*)" \
    --floating-left-alignment-boundary \
    --floating-right-alignment-boundary C \
      input_R1.fastq.gz \
      input_R2.fastq.gz \
      result

from mixcr.

bshim181 avatar bshim181 commented on July 18, 2024

I am also in the process of getting access to sample data for both the single-cell and bulk libraries, which we can share with you. I will let you know as soon as possible.

from mixcr.

mizraelson avatar mizraelson commented on July 18, 2024

Upon analyzing the bulk dataset, I see that as expected, such a short UMI sequence leads to a high number of distinct clones within a single UMI group, which in some cases makes it hard to assemble consensus. I tweaked the parameters in the example below to recover as many clones as possible.

mixcr analyze generic-amplicon-with-umi \
  --species hsa \
  --rna \
  --tag-pattern "^(R1:*)gaagcaga\^(UMI:N{11}) || ^(R1:*)taccagct\^(UMI:N{11})" \
  --floating-left-alignment-boundary \
  --floating-right-alignment-boundary C \
  -Massemble.consensusAssemblerParameters.assembler.maxIterations=10 \
  -Massemble.consensusAssemblerParameters.assembler.minRecordSharePerConsensus=0.01 \
  -Massemble.consensusAssemblerParameters.assembler.minRecursiveRecordShare=0.01 \
  -Massemble.consensusAssemblerParameters.assembler.maxConsensuses=10 \
  input_R1_001.fastq.gz \
  input_R2_001.fastq.gz \
  output

Nevertheless, I strongly recommend using a longer UMI, as in this case it doesn't really mark unique molecules and thus is de facto not a true UMI.
Alternatively, you can analyze the data ignoring the UMI sequence. In your case there is then no need for the R2 file at all, as it doesn't cover anything but a portion of the C gene.

mixcr analyze generic-amplicon \
 --species hsa \
 --rna \
 --tag-pattern "^(R1:*)gaagcaga || ^(R1:*)taccagct" \
 --floating-left-alignment-boundary \
 --floating-right-alignment-boundary C \
 input_R1_001.fastq.gz \
 output

Sincerely,
Mark

from mixcr.

bshim181 avatar bshim181 commented on July 18, 2024

Could you explain a little bit about the tag pattern used here?
Are the 8 bp sequences after ^(R1:*) index 1 and index 2 for every sample? What exactly do the 8 bp sequences represent here?
Also, I assume the single-cell presets are still a work in progress?

from mixcr.

mizraelson avatar mizraelson commented on July 18, 2024

In your R1 files, the reads have the UMI and Illumina indices at the end. These 8 bp are a small part of the C gene at the very end of the payload sequence (most likely coming from the primer), which I use to trim off the artificial barcode sequences.
The single-cell preset is still a work in progress; I will get back to you with it later this week.
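The trimming described above can be mimicked with plain string matching to see what ends up in R1. This is only an illustration of the idea for the read 1 part of the pattern, not MiXCR's pattern engine (which may handle mismatches and multiple matches differently). The toy read and the `trim_after_anchor` helper are invented; only the two 8-mers come from the tag pattern.

```shell
# Keep everything before the first C-gene anchor; drop the anchor and what follows.
# Plain substring matching, used here only to illustrate the tag pattern's intent.
trim_after_anchor() {
  read_seq=$1
  for anchor in GAAGCAGA TACCAGCT; do
    case $read_seq in
      *"$anchor"*) echo "${read_seq%%"$anchor"*}"; return 0 ;;
    esac
  done
  return 1   # no anchor found: such a read would not match the pattern
}

# Invented read: payload + anchor + made-up barcode/index bases.
payload=TGTGCCAGCAGTTACGTTGGGGGTGGCTACACCTTC
toy_read="${payload}GAAGCAGAACGTACGTACG"

trim_after_anchor "$toy_read"   # prints the payload only
```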

from mixcr.

jxshi avatar jxshi commented on July 18, 2024

Hi @mizraelson,

I recently read a paper entitled "TCR sequencing and cloning methods for repertoire analysis and isolation of tumor-reactive TCRs". In this paper, the authors introduce a TCR sequencing method, named SEQTR, for RNA extracted from T cells. The library structure is [UMI 9 bases][VDJ][C constant region], and the sequencing strategy is SE150. I downloaded the raw sequencing files from the GEO website and analyzed GSM7061297 (SRR23603384) with the following protocol:

# Step 1. Trim adaptor.
fastp -i SRR23603384_1.fastq.gz -o SRR23603384_trimmed_1.fastq.gz -w 8

# Step 2. Analyze the data with the first 10 bases assigned as the UMI.
# From the supplementary file of the paper, I learned that the 9-base UMI is HHHHHNNNN.
# I then counted the occurrence of G in the first 9 bases of each trimmed fastq; it turned
# out that the first base had a higher frequency of G, so I chose to use the first 10 bases
# as the UMI. Maybe I should have chosen 1 to 10 bases as UMI?

mixcr analyze generic-amplicon-with-umi \
	--threads 16 \
	--species hsa \
	--rna \
	--rigid-left-alignment-boundary \
	--floating-right-alignment-boundary C \
	--tag-pattern '^(UMI:N{10})(R1:*)' \
	-Massemble.consensusAssemblerParameters.assembler.maxIterations=6 \
	-Massemble.consensusAssemblerParameters.assembler.minRecordSharePerConsensus=0.02 \
	-Massemble.consensusAssemblerParameters.assembler.minRecursiveRecordShare=0.1 \
	-Massemble.consensusAssemblerParameters.assembler.maxConsensuses=6 \
	../fastqs/SRR23603384_trimmed_1.fastq.gz \
	output

# Alternative Step 2. Ignore the UMI and run mixcr while trimming the first 10 bases.
# After reading this post and several other posts discussing UMIs, I think a 9-base UMI is too short.

mixcr analyze generic-amplicon \
	--threads 16 \
	--species hsa \
	--library imgt \
	--rna \
	--rigid-left-alignment-boundary \
	--floating-right-alignment-boundary C \
	--tag-pattern '^N{10}(R1:*)' \
	../fastqs/SRR23603384_trimmed_1.fastq.gz \
	noUMI

The qc output for Step 2 is:

  Successfully aligned reads:                           97.36% [OK]
  Off target (non TCR/IG) reads:                        0.27%  [OK]
  Reads with no V or J hits:                            2.36%  [OK]
  Reads with no barcode:                                0.0%   [OK]
  Alignments that do not cover CDR3:                    0.48%  [OK]
  Tag groups that do not cover CDR3:                    0.018% [OK]
  Barcode collisions in clonotype assembly:             86.56% [ALERT]
  Unassigned alignments in clonotype assembly:          53.29% [ALERT]
  Reads used in clonotypes:                             44.95% [ALERT]
  Alignments dropped due to low sequence quality:       1.75%  [OK]
  Clones dropped in post-filtering:                     0.0%   [OK]
  Alignments dropped in clones post-filtering:          0.0%   [OK]
  Reads dropped in tags error correction and filtering: 0.93%  [OK]
  UMIs artificial diversity eliminated:                 12.31% [OK]
  Reads dropped in UMI error correction and whitelist:  0.0%   [OK]
  Reads dropped in tags filtering:                      0.93%  [OK]

The qc output for Alternative step 2 is:

  Successfully aligned reads:                     97.36% [OK]
  Off target (non TCR/IG) reads:                  0.32%  [OK]
  Reads with no V or J hits:                      2.31%  [OK]
  Reads used in clonotypes:                       95.62% [OK]
  Alignments that do not cover CDR3:              0.48%  [OK]
  Alignments dropped due to low sequence quality: 2.10%  [OK]
  Clones dropped in post-filtering:               0.0%   [OK]
  Alignments dropped in clones post-filtering:    0.0%   [OK]

Then I compared the TRB.tsv output files with the results published by the authors. I found a one-amino-acid difference in the most abundant clones.
For example, the first five lines of the results published by the authors read:

#CDR3_sequence	Count	TRBV	TRBJ	Frame	CDR3_aaseq	CDR3_length
TGCGCCAGCAGCCAAGATTCCGATCCCCAGGGGCTGTTTGCGGGAAACACCATATATTTTGGA	174591	hTRBV04-3	hTRBJ01-3	IN	CASSQDSDPQGLFAGNTIYFG	21
TGTGCCAGCAGCCAAGGGACAGGACGGTCTTCACCCCTCCACTTTGGG	158629	hTRBV03-1	hTRBJ01-6	IN	CASSQGTGRSSPLHFG	16
TGTGCCAGCTCACCGACAGGGGAGGCCACTGAAGCTTTCTTTGGA	127792	hTRBV18	hTRBJ01-1	IN	CASSPTGEATEAFFG	15
TGCCAGCAGCTCTTAGCGCAATCCGTTCTTCGGG	87563	hTRBV21	hTRBJ02-1	OUT	_	_
TGTGCCAGCAGTTTCCCGGATACGCAGTATTTTGGC	80302	hTRBV28	hTRBJ02-3	IN	CASSFPDTQYFG	12

For the results from Step 2, the first five lines read:

cloneId readCount       readFraction    uniqueMoleculeCount     uniqueMoleculeFraction  targetSequences targetQualities allVHitsWithScore       allDHitsWithScore       allJHitsWithScore   allCHitsWithScore        allVAlignments  allDAlignments  allJAlignments  allCAlignments  nSeqCDR3        minQualCDR3     aaSeqCDR3       refPoints
0       162907.0        0.031750372110455366    17152   0.027187463840134162    TGCGCCAGCAGCCAAGATTCCGATCCCCAGGGGCTGTTTGCGGGAAACACCATATATTTT    [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[ TRBV4-3*00(563.6)       TRBD1*00(30)    TRBJ1-3*00(458.2)       TRBC1*00(50.5)  347|365|384|0|18||180.0 16|22|36|27|33||30.0    24|42|70|42|60||180.0           TGCGCCAGCAGCCAAGATTCCGATCCCCAGGGGCTGTTTGCGGGAAACACCATATATTTT 58      CASSQDSDPQGLFAGNTIYF    :::::::::0:1:18:27:-4:-2:33:42:-4:60:::
3       82214.0 0.016023406561344676    8470    0.013425712379077446    TGTGCCAGCAGCCAAGGGACAGGACGGTCTTCACCCCTCCACTTT   [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[   TRBV3-1*00(521.2),TRBV3-2*00(520)    TRBD1*00(35)    TRBJ1-6*00(437.8)       TRBC1*00(141.5) 347|363|384|0|16||160.0;347|363|384|0|16||160.0 13|20|36|16|23||35.0    29|45|73|29|45||160.0           TGTGCCAGCAGCCAAGGGACAGGACGGTCTTCACCCCTCCACTTT        58      CASSQGTGRSSPLHF :::::::::0:-1:16:16:-1:-4:23:29:-9:45:::
1       81189.0 0.01582363533350783     10114   0.016031600354426127    TGTGCCAGCAGTTACGGGACAGTCTCTGGAAACACCATATATTTT   [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[   TRBV6-5*00(243.3)   TRBD1*00(35)     TRBJ1-3*00(498.5)       TRBC1*00(106.4) 347|362|384|0|15||150.0 12|19|36|15|22||35.0    20|42|70|23|45||220.0           TGTGCCAGCAGTTACGGGACAGTCTCTGGAAACACCATATATTTT   58  CASSYGTVSGNTIYF  :::::::::0:-2:15:15:0:-5:22:23:0:45:::
2       65077.0 0.01268342653067151     8582    0.013603242460123097    TGTGCCAGCAGTTACGTTGGGGGTGGCTACACCTTC    [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[    TRBV6-5*00(242.5)       TRBD1*00(25)TRBJ1-2*00(408.6)        TRBC1*00(144.3) 347|362|384|0|15||150.0 18|23|36|18|23||25.0    27|40|68|23|36||130.0           TGTGCCAGCAGTTACGTTGGGGGTGGCTACACCTTC    58      CASSYVGGGYTF    :::::::::0:-2:15:18:-6:-1:23:23:-7:36:::
6       49634.0 0.009673604997516015    4385    0.00695061969093915     TGTGCCAGCAGTGACTGGGGGGGGCAGGGAGCTTTCTTT [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[ TRBV6-1*00(209.9)       TRBD1*00(31)TRBJ1-1*00(378.3)        TRBC1*00(117.1) 347|361|384|0|14||140.0 12|21|36|20|29|SA15G|31.0       30|40|68|29|39||100.0           TGTGCCAGCAGTGACTGGGGGGGGCAGGGAGCTTTCTTT 58      CASSDWGGQGAFF:::::::::0:-3:14:20:0:-3:29:29:-10:39:::

For the results from Alternative step 2, the first five lines read:

cloneId readCount       readFraction    targetSequences targetQualities allVHitsWithScore       allDHitsWithScore       allJHitsWithScore       allCHitsWithScore       allVAlignments  allDAlignments       allJAlignments  allCAlignments  nSeqCDR3        minQualCDR3     aaSeqCDR3       refPoints
0       180598.0        0.01654797835251231     TGCGCCAGCAGCCAAGATTCCGATCCCCAGGGGCTGTTTGCGGGAAACACCATATATTTT    [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[    TRBV4-3*01(572.3)    TRBD1*01(30)    TRBJ1-3*01(459.8)       TRBC1*01(50.5),TRBC1*02(50.5),TRBC1*03(50.5)    270|288|307|0|18||180.0 16|22|36|27|33||30.0    24|42|70|42|60||180.0   ;;      TGCGCCAGCAGCCAAGATTCCGATCCCCAGGGGCTGTTTGCGGGAAACACCATATATTTT 58      CASSQDSDPQGLFAGNTIYF    :::::::::0:1:18:27:-4:-2:33:42:-4:60:::
1       163591.0        0.014989647319825477    TGTGCCAGCAGCCAAGGGACAGGACGGTCTTCACCCCTCCACTTT   [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[   TRBV3-1*01(536.4),TRBV3-2*01(535.8)     TRBD1*01(35) TRBJ1-6*02(438.3)       TRBC1*01(142.1),TRBC1*02(142.1),TRBC1*03(142.1) 270|286|307|0|16||160.0;270|286|307|0|16||160.0 13|20|36|16|23||35.0    29|45|73|29|45||160.0   ;;      TGTGCCAGCAGCCAAGGGACAGGACGGTCTTCACCCCTCCACTTT        58      CASSQGTGRSSPLHF :::::::::0:-1:16:16:-1:-4:23:29:-9:45:::
2       127701.0        0.011701089622222697    TGTGCCAGCTCACCGACAGGGGAGGCCACTGAAGCTTTCTTT      [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[      TRBV18*01(532)  TRBD1*01(40)    TRBJ1-1*01(438.9)    TRBC1*01(153.4),TRBC1*02(153.4),TRBC1*03(153.4) 273|287|310|0|14||140.0 14|22|36|14|22||40.0    24|40|68|26|42||160.0   ;;      TGTGCCAGCTCACCGACAGGGGAGGCCACTGAAGCTTTCTTT      58  CASSPTGEATEAFF   :::::::::0:-3:14:14:-2:-2:22:26:-4:42:::
3       121440.0        0.011127401693978311    TGTGCCAGCAGTTACGGGACAGTCTCTGGAAACACCATATATTTT   [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[   TRBV6-5*01(185.2),TRBV6-2*01(183.6),TRBV6-3*01(183.6)        TRBD1*01(35)    TRBJ1-3*01(499) TRBC1*01(110.7),TRBC1*02(110.7),TRBC1*03(110.7) 270|285|307|0|15||150.0;270|285|307|0|15||150.0;270|285|307|0|15||150.0 12|19|36|15|22||35.020|42|70|23|45||220.0    ;;      TGTGCCAGCAGTTACGGGACAGTCTCTGGAAACACCATATATTTT   58      CASSYGTVSGNTIYF :::::::::0:-2:15:15:0:-5:22:23:0:45:::
4       109079.0        0.009994778074583828    TGTGCCAGCAGTTACGTTGGGGGTGGCTACACCTTC    [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[    TRBV6-5*01(197.9)       TRBD1*01(25),TRBD2*01(25)       TRBJ1-2*01(408.4)    TRBC1*01(147.9),TRBC1*02(147.9),TRBC1*03(147.9) 270|285|307|0|15||150.0 18|23|36|18|23||25.0;25|30|48|18|23||25.0       27|40|68|23|36||130.0   ;;      TGTGCCAGCAGTTACGTTGGGGGTGGCTACACCTTC 58      CASSYVGGGYTF    :::::::::0:-2:15:18:-6:-1:23:23:-7:36:::

I truly value your expertise and insight in this matter and I believe your perspective could be of great help.

Best,
Jianxiang

from mixcr.

jxshi avatar jxshi commented on July 18, 2024

Thank you for your clarification of the "G" amino acid shown in the results of the manuscript.

I have run the pipeline both with the first 15 bases set as the UMI and with the first 25 bases set as the UMI. The results are slightly different. The results with the first 15 bases set as the UMI are:

  Successfully aligned reads:                           97.51% [OK]
  Off target (non TCR/IG) reads:                        0.46%  [OK]
  Reads with no V or J hits:                            2.021% [OK]
  Reads with no barcode:                                0.0%   [OK]
  Alignments that do not cover CDR3:                    0.42%  [OK]
  Tag groups that do not cover CDR3:                    0.32%  [OK]
  Barcode collisions in clonotype assembly:             69.56% [ALERT]
  Unassigned alignments in clonotype assembly:          7.69%  [WARN]
  Reads used in clonotypes:                             85.67% [WARN]
  Alignments dropped due to low sequence quality:       6.13%  [OK]
  Clones dropped in post-filtering:                     0.0%   [OK]
  Alignments dropped in clones post-filtering:          0.0%   [OK]
  Reads dropped in tags error correction and filtering: 4.44%  [OK]
  UMIs artificial diversity eliminated:                 11.94% [OK]
  Reads dropped in UMI error correction and whitelist:  0.0%   [OK]
  Reads dropped in tags filtering:                      4.44%  [OK]

The results with the first 25 bp set as the UMI are:

  Successfully aligned reads:                           97.16% [OK]
  Off target (non TCR/IG) reads:                        1.66%  [OK]
  Reads with no V or J hits:                            1.17%  [OK]
  Reads with no barcode:                                0.0%   [OK]
  Alignments that do not cover CDR3:                    0.087% [OK]
  Tag groups that do not cover CDR3:                    0.041% [OK]
  Barcode collisions in clonotype assembly:             63.57% [ALERT]
  Unassigned alignments in clonotype assembly:          5.76%  [WARN]
  Reads used in clonotypes:                             85.98% [WARN]
  Alignments dropped due to low sequence quality:       7.88%  [OK]
  Clones dropped in post-filtering:                     0.0%   [OK]
  Alignments dropped in clones post-filtering:          0.0%   [OK]
  Reads dropped in tags error correction and filtering: 5.87%  [WARN]
  UMIs artificial diversity eliminated:                 12.21% [OK]
  Reads dropped in UMI error correction and whitelist:  0.0%   [OK]
  Reads dropped in tags filtering:                      5.87%  [WARN]

When the first 10 bases are ignored using the aforementioned Alternative step 2, the results are:

  Successfully aligned reads:                     97.36% [OK]
  Off target (non TCR/IG) reads:                  0.32%  [OK]
  Reads with no V or J hits:                      2.31%  [OK]
  Reads used in clonotypes:                       95.62% [OK]
  Alignments that do not cover CDR3:              0.48%  [OK]
  Alignments dropped due to low sequence quality: 2.10%  [OK]
  Clones dropped in post-filtering:               0.0%   [OK]
  Alignments dropped in clones post-filtering:    0.0%   [OK]
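To line the runs up side by side, one metric can be pulled out of each QC report with a small helper. This is a sketch that assumes the QC text keeps the "label: value% [STATUS]" layout shown above; `qc_value` is a hypothetical name, and the report lines are pasted inline rather than read from a file.

```shell
# Extract the percentage for one metric from MiXCR-style QC text on stdin.
# $1 is the metric label exactly as it appears before the colon.
qc_value() {
  grep -F "$1:" | sed -E 's/^.*:[[:space:]]+([0-9.]+)%.*$/\1/'
}

# Two lines from each report above, pasted inline so the sketch is self-contained.
qc_umi15='  Unassigned alignments in clonotype assembly:          7.69%  [WARN]
  Reads used in clonotypes:                             85.67% [WARN]'
qc_no_umi='  Reads with no V or J hits:                      2.31%  [OK]
  Reads used in clonotypes:                       95.62% [OK]'

printf '%s\n' "$qc_umi15"  | qc_value "Reads used in clonotypes"   # prints 85.67
printf '%s\n' "$qc_no_umi" | qc_value "Reads used in clonotypes"   # prints 95.62
```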

Should I try using more bases as the UMI, or should I just ignore the first 10 bases?
Thank you very much!
Best,
Jianxiang

from mixcr.

jxshi avatar jxshi commented on July 18, 2024

Thank you for your clarification!
Best,
Jianxiang

from mixcr.
