alphabetr's Introduction

alphabetr

alphabetr implements the ALPHABETR (algorithm for pairing alpha-beta T cell receptors) algorithms for obtaining TCR sequence pairs. This approach determines CDR3A/CDR3B pairs from high-throughput sequencing data from repeated samples of antigen-specific T cell populations.

With alphabetr, you can

  • Determine CDR3A/CDR3B pairs
  • Determine dual TCR-alpha clones and clones that share CDR3A or CDR3B sequences
  • Estimate clonal frequencies

You can install

  • the latest version released on CRAN with
install.packages("alphabetr")
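  • the latest development version found on GitHub with
# Install devtools if it is missing or older than 1.6
if (!requireNamespace("devtools", quietly = TRUE) ||
    packageVersion("devtools") < "1.6") {
  install.packages("devtools")
}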
devtools::install_github("edwardslee/alphabetr")

If you encounter any bugs, please file an issue.

More information

Please see our paper in PLOS Computational Biology for more details about the algorithm and how it was tested (DOI: 10.1371/journal.pcbi.1005313).

alphabetr's People

Contributors

edwardslee

alphabetr's Issues

Error in app[, 1] : incorrect number of dimensions

Hello,

I'm trying to run your bagpipe function on some data generated using the read_alphabetr function. However, with the following inputs I receive the error in the title of this issue.

dat <- read_alphabetr(data = "LH136_139_alphabetr_input.csv")
> dat
$alpha
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]    1    1    1    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[2,]    0    0    0    1    1    1    1    1    1     1     1     1     1     1     1     1     0     0
[3,]    0    0    0    0    0    0    0    0    0     0     1     0     0     0     0     0     1     1
[4,]    0    0    1    0    0    0    1    0    0     0     1     0     0     1     0     0     0     0
     [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[3,]     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
[4,]     0     0     0     0     1     0     0     0     0     0     0     0     0     0     0     0
     [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[3,]     1     1     1     1     0     0     0     0     0     0     0     0     0     0     0     0
[4,]     0     1     0     0     1     1     1     1     1     1     1     1     1     1     1     1
     [,51] [,52]
[1,]     0     0
[2,]     0     0
[3,]     0     0
[4,]     1     1

$beta
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]    1    1    1    1    1    0    0    0    0     0     0     0     0     0     0     0     0     0
[2,]    0    0    0    0    0    1    1    1    1     1     1     1     1     1     1     1     1     1
[3,]    0    0    0    0    0    0    0    0    0     0     0     0     1     0     0     0     0     0
[4,]    0    0    0    0    1    0    0    0    0     0     0     0     0     0     0     0     0     0
     [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
[3,]     0     0     0     0     0     0     0     0     0     0     0     1     0     0     0     0
[4,]     0     0     0     0     0     0     0     0     0     0     0     1     0     0     0     0
     [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     1     1     1     1     1     1     1     1     1     1     1     1     0     0     0     0
[3,]     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1     1
[4,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
     [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[3,]     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
[4,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
     [,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77] [,78] [,79] [,80] [,81] [,82]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[3,]     1     1     1     1     1     1     1     1     1     1     1     1     0     0     0     0
[4,]     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1     1
     [,83] [,84] [,85] [,86] [,87] [,88] [,89] [,90] [,91]
[1,]     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0
[3,]     0     0     0     0     0     0     0     0     0
[4,]     1     1     1     1     1     1     1     1     1

$alpha_lib
 [1] "CLVGADP_YGQNFVF"    "CIVSFRNNAGNMLTF"    "CVLTTDSWGKLQF"      "CAASLTGRRALTF"     
 [5] "CAAGAHRG_GGSYIPTF"  "CALRRNNNAGNMLTS"    "CATGNGH_GRRALTF"    "CTTGNGH_GRRALTF"   
 [9] "CVVNAMDSSYKLIF"     "CAADNAGNNRKLIW"     "CVVNSGMDTGRRALTF"   "CAYRGYGGSQGNLIF"   
[13] "CLGE*II_AGNMLTF"    "CVVNRNQAGTALIF"     "CIVRSTGTASKLTF"     "CAGTGGFGNVLHC"     
[17] "CATDHYNQGGKLIF"     "CAGAVETSGSRLTF"     "CAVSPYGGSQGNLIF"    "CVVHRNAGNNRKLIW"   
[21] "CAVRALSGGYNKLIF"    "CAVNDTGNNRKLIW"     "CAADGGATNKLIF"      "CATNTGTASKLTF"     
[25] "CALTGGGNKLTF"       "CAVGDYKLSF"         "CAVEDQGIM_GATNKLIF" "CAVGDRQR_STLGRLYF" 
[29] "CAVDAGDTGRRALTF"    "CAARVSGEYGNKLVF"    "CAVSPPG_QDYKLSF"    "CAVTTFTGGLKTIF"    
[33] "CAVSGGNQGGKLIF"     "CAVEDPNDYKLSF"      "CAVHSGTYKYIF"       "CAAGTGRRALTF"      
[37] "CATPQGEKLTF"        "CAVGDHKLSF"         "CAAKKGDYKLSF"       "CAASASDWGKLQF"     
[41] "CAVSLTGTASKLTF"     "CAVEDRSGNTPLVF"     "CAASNHDMRF"         "CAAPNQAGTALIF"     
[45] "CLGWNTNAGKSTF"      "CIVRRPFTGGGNKLTF"   "CVVNTETDSSYKLIF"    "CAVSEMNQAGTALIF"   
[49] "CAGHGGHNAGNMLTF"    "YAASASDWGKLQF"      "CAGSSNTGKLIF"       "CAAKGSGDMRF"       

$beta_lib
 [1] "CASRPEGQGNTEASF"     "CASRPTSGDSTDTQYF"    "CASSYRGPYNSPLHF"     "CASSPPYNEQFF"       
 [5] "CASSETVVEQFF"        "CASSLRGPRNTYNEQFF"   "CSAGEGQTTEAFF"       "CASSGDGTRASGELFF"   
 [9] "CASSLTPGLARNEQFF"    "CASSWDRGREQFF"       "CASSNRDREYV"         "CASMGMDPANEQFF"     
[13] "CSVDPTGENYGYTF"      "CASSGDGT_GRPGSCFF"   "CASSLRGQ_NTYNEQFF"   "CASSFQNGGRSDEQFF"   
[17] "CASSMWDRGIGYEQYF"    "CASSLTPGPARNEQFF"    "CASSRLAGGITNTDTQYF"  "CASSLRGPR_HTYNEQFF" 
[21] "CASSLAQGGSYNSPLHF"   "CASSPPGP_GRDNEQFF"   "CASSGDGG_RASGELFF"   "CASSGDGD_RASGELFF"  
[25] "CASSQDLAGSYNEQFF"    "CGSSFVAGGLPNEQFF"    "CSARDRAGGTTGELFF"    "CASSGDGL_ASGELFF"   
[29] "CASRRDRAVNTEAFF"     "CASSEDRAYSHEQFF"     "CASSLALVAYNEQFF"     "CSVGGGTAENTEAFF"    
[33] "CPGPPGC_STDTQYF"     "CAIRGPAGKNEQFF"      "CASTGGGSYNEQFF"      "CASSTTGTASERFF"     
[37] "CASSLEENTEAFF"       "CASSSSAGSPLHF"       "CASSYPNTGELFF"       "CASSRLVPYEQYV"      
[41] "CSAGEEQTTEAFF"       "CASSGT_GREQFF"       "CSAGEG_DTEAFF"       "CASRAPPWGYTF"       
[45] "CANSGGENEQYV"        "CASSSRDREYV"         "CASSLQTGGPYEQYF"     "CSVARDRGVNEQFF"     
[49] "CASSSGLAGRNEQFF"     "CASSLAQVNTGELFF"     "CASGEGGAANEKLFF"     "CASSPDNLRTDTQYF"    
[53] "CASSYSLAGGPYEQFF"    "CASSGEGQVNEKLFF"     "CASSLVGETYNEQFF"     "CASSPITTDTQYF"      
[57] "CASSATASGGRETQYF"    "CASSEGLGTSGFEQFF"    "CASSPGTGNSNQPQHF"    "CASSLSLGVGQPQHF"    
[61] "CDSTPDRGNTEAFF"      "CASSWASSYEQYF"       "CASSYSLA_GGPYEQFF"   "CASSWTGTTNTGELSF"   
[65] "CASRPSGSTYNEQFF"     "CASSQGTGRNEKLFF"     "CASSSGLVGRNEQFF"     "CASRFDYTGDNEQFF"    
[69] "CASSPAGGTFYEQYF"     "CASSQQGTPYYGYTF"     "CASTPDRGNTEAFF"      "CASSTGGSYNEQFF"     
[73] "CASSFNRGTYEQYF"      "CASSLVGDTGEQHF"      "*ASSFNRGTYEQYF"      "CASSSGSNYGYTF"      
[77] "CASSFPGANVLTF"       "CASSQVLNTEAFF"       "CSARKVASGGSYYNEQFF"  "CASRELASAETQYF"     
[81] "CASSLGTEGNQPQHF"     "CASSPRGGEKLFF"       "CASSQLRTSGGLFYNEQFF" "CASSYTMRT_RGMYEQYF" 
[85] "CASSLVYPGANTDTQYF"   "CASSQIDHSTNQPQHF"    "CAISGSGSYNEQFF"      "CASSITGGGTEAFF"     
[89] "CASRGGVSSYEQYF"      "CASSGGQVNTEAFF"      "CASSLRGLDTQYF"      

> data_alpha <-dat$alpha
> data_beta<-dat$beta
> pairs <- bagpipe(alpha = data_alpha, beta = data_beta, rep = 5)
Error in app[, 1] : incorrect number of dimensions

Here are the dimensions:

> dim(data_alpha)
[1]  4 52
> dim(data_beta)
[1]  4 91

I can successfully run the bagpipe function when I follow the steps in your vignette (https://cran.r-project.org/web/packages/alphabetr/vignettes/alphabetr-vignette.html) and generate the data with the create_data function.

> pairs <- bagpipe(alpha = data_alpha_fake, beta = data_beta_fake, rep = 5)
> head(pairs)
     beta1 beta2 alpha1 alpha2 prop_replicates
[1,]     1     1    272    272             1.0
[2,]     3     3    935    935             1.0
[3,]     4     4    351    351             0.2
[4,]     4     4    935    935             0.6
[5,]     4     4   1118   1118             0.2
[6,]     5     5    714    714             1.0

Any suggestions? Thank you.
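
For reference, the message in the title is R's generic error for using two subscripts on an object that has no dimensions. The following is a minimal base R sketch, not taken from the alphabetr source and using a made-up toy matrix, of one common way such an object arises: selecting a single row or column of a matrix drops it to a plain vector. Whether this is what happens inside bagpipe for a 4-well input is an assumption that would need checking against the package code.

```r
# Toy 4 x 3 matrix standing in for a well-by-chain table (hypothetical data).
m <- matrix(c(1, 0, 0, 1,
              0, 1, 0, 0,
              0, 0, 1, 1), nrow = 4)

# Selecting a single row drops the result to a plain vector by default.
one_row <- m[1, ]
is.null(dim(one_row))

# Two-subscript indexing on that vector fails; capture the message.
msg <- tryCatch(one_row[, 1], error = function(e) conditionMessage(e))
print(msg)

# drop = FALSE keeps the matrix shape, so the same indexing works.
kept <- m[1, , drop = FALSE]
dim(kept)
kept[, 1]
```

Checking dim() on intermediate objects in this way is a quick test of whether a subset has collapsed to a vector.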

“Zero” columns in “create_data”-output

I’m interested in using “alphabetr” for my ongoing project, but I’m stuck on the structure of the artificial data generated by the package. When I tried to reproduce the full pipeline, I got “zero” columns (columns whose sum is zero) in both the alpha and beta data after running “create_clones” followed by “create_data”. Could you please explain what these columns mean and whether they can influence subsequent pairing on the synthetic data?

I tried to figure it out on my own. I converted the synthetic data to a 3-column csv file (column indices were used as clone IDs). During the conversion, every clone that wasn’t detected (zero sum in the corresponding column) was deleted. The csv was then imported back using “read_alphabetr”, and “bagpipe” was applied with rep=5. I ran “bagpipe” several times on both data sets (the “normal synthetic data” and the csv-imported data created from it). Surprisingly, the average number of pairs (number of rows in the bagpipe output) differed between the two data sets. It seems that the presence of the “zero columns” can influence the result of pairing. But what are they? And if they have no influence, why did I get different pairing results from the same data in different formats?

`share_alpha <- c(.816, .085, .021, .007, .033, .005, .033)
share_beta <- c(.859, .076, .037, .019, .009)

set.seed(271)   
TCR_pairings_synt <- create_clones(numb_beta = 4052,
                                   dual_beta = 0.06,
                                   dual_alpha = 0.3,
                                   alpha_sharing = share_alpha,
                                   beta_sharing = share_beta)
TCR_clones_synt <- TCR_pairings_synt$TCR # number of clones: 4753

number_plates <- 1      
err_drop <- c(0.15, .01)  
err_seq  <- c(0.02, .005) 
err_mode <- c("constant", "constant") 
number_skewed <- 50      
pct_top <- 0.5          
dis_behavior <- "linear"  

#100 cells per well
numb_cells <- matrix(c(100,
                       96), ncol = 2)

# Creating the data sets
data_tcr_synt <- create_data(TCR = TCR_clones_synt,
                             plates = number_plates,
                             error_drop = err_drop,
                             error_seq = err_seq,
                             error_mode = err_mode,
                             skewed = number_skewed,
                             prop_top = pct_top,
                             dist = dis_behavior,
                             numb_cells = numb_cells)

# Saving the data for alpha chains and data for beta chains
data_alpha <- data_tcr_synt$alpha
data_beta <- data_tcr_synt$beta


pairs <- bagpipe(alpha = data_alpha, beta = data_beta, rep = 5)

###converting the synthetic data to 3-column csv file
data_alpha_real <- t(data_alpha) #transform columns to rows
data_alpha_real <- data_alpha_real[rowSums(data_alpha_real)!=0,] #deleted "zero" rows

data_beta_real <- t(data_beta) #transform columns to rows
data_beta_real <- data_beta_real[rowSums(data_beta_real)!=0,] #deleted "zero" rows

#in which rows chain appeared:
alpha_well <- apply(X = data_alpha_real, MARGIN = 1, FUN = function(x) which(x == 1))
beta_well <- apply(X = data_beta_real, MARGIN = 1, FUN = function(x) which(x == 1))

#lists of data frames: every df - one alpha/beta chain (index) and vector of wells
wells_a <- list()
for (i in 1:length(alpha_well))
{wells_a[[i]] <- data.frame(c(i), c(alpha_well[[i]]))}


wells_b <- list()
for (i in 1:length(beta_well))
{wells_b[[i]] <- data.frame(c(i), c(beta_well[[i]]))}

#from list to 3-column df:
a_all_wells <- do.call(what = rbind, args = wells_a)
a_all_wells <- rename(a_all_wells, c("c.i."="well", "c.alpha_well..i..."="cdr3"))
a_all_wells$chain <- c("TCRA")
a_all_wells <- a_all_wells[,c(3,1,2)]

b_all_wells <- do.call(what = rbind, args = wells_b)
b_all_wells <- rename(b_all_wells, c("c.i."="well", "c.beta_well..i..."="cdr3"))
b_all_wells$chain <- c("TCRB")
b_all_wells <- b_all_wells[,c(3,1,2)]

all_wells <- rbind(a_all_wells, b_all_wells)
write.csv(x = all_wells, file = "alphabetr_data.csv", row.names = F)
dat <- read_alphabetr(data = "alphabetr_data.csv")

pairs_real <- bagpipe(alpha = dat$alpha, beta = dat$beta, rep = 5)
# nrow(pairs_real) over four runs on the csv-imported data:
# 1127, 1142, 1117, 1121 (mean = 1126.75)

# nrow(pairs) over four runs of
# pairs <- bagpipe(alpha = data_alpha, beta = data_beta, rep = 5):
# 1445, 1399, 1453, 1403 (mean = 1425)

`
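
The zero-column check described above can be sketched in base R on a small made-up matrix. This assumes, as in the code above, that rows are wells and columns are chains; the matrix and its column names are hypothetical. The sketch also shows that dropping zero columns renumbers the remaining ones, so column indices from the filtered data no longer refer to the same chains as indices from the unfiltered data.

```r
# Toy well-by-chain matrix: rows are wells, columns are chains (hypothetical data).
m <- matrix(c(1, 0, 1,
              0, 0, 0,
              0, 1, 1,
              0, 0, 0), nrow = 3,
            dimnames = list(NULL, c("a1", "a2", "a3", "a4")))

# Chains that were never observed in any well.
zero_cols <- which(colSums(m) == 0)
print(zero_cols)

# Drop them while keeping the result a matrix.
m_nonzero <- m[, colSums(m) != 0, drop = FALSE]
print(dim(m_nonzero))

# Original column index of each surviving column: position 2 in
# m_nonzero corresponds to column 3 in m.
kept_index <- which(colSums(m) != 0)
print(kept_index)
```

Keeping a lookup such as kept_index makes it possible to compare pairs from the two formats on the same clone IDs.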

bug in read_alphabetr function

When the data_alpha and data_beta arguments of the read_alphabetr function are not NULL (i.e. two separate files are used as input), the respective data frames are supposed to have two columns, but the code that follows refers to column 3 multiple times.

TL;DR: the indexing in the read_alphabetr function is wrong for two-file input.
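
To illustrate the kind of failure described, here is a minimal base R sketch. It is not the package's code: the data frame and its column names (well, cdr3) are assumptions that stand in for a two-column single-chain input.

```r
# Hypothetical two-column input standing in for a single-chain file.
alpha_df <- data.frame(well = c(1, 1, 2),
                       cdr3 = c("CAVGDYKLSF", "CATPQGEKLTF", "CAVGDYKLSF"),
                       stringsAsFactors = FALSE)
print(ncol(alpha_df))

# Asking for a third column of a two-column data frame is an error.
msg <- tryCatch(alpha_df[, 3], error = function(e) conditionMessage(e))
print(msg)

# With only two columns, the sequences sit in column 2.
cdr3 <- alpha_df[, 2]
print(cdr3)
```

A guard such as stopifnot(ncol(alpha_df) == 2) before indexing would surface a wrong column count early.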
