edwardslee / alphabetr

R package for obtaining TCR alpha-beta sequence pairs

R 84.30% C++ 15.70%

alphabetr's Introduction

alphabetr

alphabetr implements the ALPHABETR (algorithm for pairing alpha-beta T cell receptors) algorithms for obtaining TCR sequence pairs. This approach determines CDR3A/CDR3B pairs from high-throughput sequencing data from repeated samples of antigen-specific T cell populations.

With alphabetr, you can

Determine CDR3A/CDR3B pairs
Determine dual TCR-alpha clones and clones that share CDR3A or CDR3B sequences
Estimate clonal frequencies

You can install

the latest version released on CRAN with

install.packages("alphabetr")

the latest development version found on github with

if (!requireNamespace("devtools", quietly = TRUE) ||
    packageVersion("devtools") < "1.6") {
  install.packages("devtools")
}
devtools::install_github("edwardslee/alphabetr")

If you encounter any bugs, please file an issue.

More information

Please see our paper in PLOS Computational Biology for more details about the algorithm and how it was tested (DOI: 10.1371/journal.pcbi.1005313).

alphabetr's Issues

Error in app[, 1] : incorrect number of dimensions

Hello,

I'm trying to run your bagpipe function on some data generated using the read_alphabetr function. However, with the following inputs I receive the error in the title of this issue.

> dat <- read_alphabetr(data = "LH136_139_alphabetr_input.csv")
> dat
$alpha
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]    1    1    1    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[2,]    0    0    0    1    1    1    1    1    1     1     1     1     1     1     1     1     0     0
[3,]    0    0    0    0    0    0    0    0    0     0     1     0     0     0     0     0     1     1
[4,]    0    0    1    0    0    0    1    0    0     0     1     0     0     1     0     0     0     0
     [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[3,]     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
[4,]     0     0     0     0     1     0     0     0     0     0     0     0     0     0     0     0
     [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[3,]     1     1     1     1     0     0     0     0     0     0     0     0     0     0     0     0
[4,]     0     1     0     0     1     1     1     1     1     1     1     1     1     1     1     1
     [,51] [,52]
[1,]     0     0
[2,]     0     0
[3,]     0     0
[4,]     1     1

$beta
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]    1    1    1    1    1    0    0    0    0     0     0     0     0     0     0     0     0     0
[2,]    0    0    0    0    0    1    1    1    1     1     1     1     1     1     1     1     1     1
[3,]    0    0    0    0    0    0    0    0    0     0     0     0     1     0     0     0     0     0
[4,]    0    0    0    0    1    0    0    0    0     0     0     0     0     0     0     0     0     0
     [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
[3,]     0     0     0     0     0     0     0     0     0     0     0     1     0     0     0     0
[4,]     0     0     0     0     0     0     0     0     0     0     0     1     0     0     0     0
     [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     1     1     1     1     1     1     1     1     1     1     1     1     0     0     0     0
[3,]     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1     1
[4,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
     [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[3,]     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
[4,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
     [,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77] [,78] [,79] [,80] [,81] [,82]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
[3,]     1     1     1     1     1     1     1     1     1     1     1     1     0     0     0     0
[4,]     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1     1
     [,83] [,84] [,85] [,86] [,87] [,88] [,89] [,90] [,91]
[1,]     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0
[3,]     0     0     0     0     0     0     0     0     0
[4,]     1     1     1     1     1     1     1     1     1

$alpha_lib
 [1] "CLVGADP_YGQNFVF"    "CIVSFRNNAGNMLTF"    "CVLTTDSWGKLQF"      "CAASLTGRRALTF"     
 [5] "CAAGAHRG_GGSYIPTF"  "CALRRNNNAGNMLTS"    "CATGNGH_GRRALTF"    "CTTGNGH_GRRALTF"   
 [9] "CVVNAMDSSYKLIF"     "CAADNAGNNRKLIW"     "CVVNSGMDTGRRALTF"   "CAYRGYGGSQGNLIF"   
[13] "CLGE*II_AGNMLTF"    "CVVNRNQAGTALIF"     "CIVRSTGTASKLTF"     "CAGTGGFGNVLHC"     
[17] "CATDHYNQGGKLIF"     "CAGAVETSGSRLTF"     "CAVSPYGGSQGNLIF"    "CVVHRNAGNNRKLIW"   
[21] "CAVRALSGGYNKLIF"    "CAVNDTGNNRKLIW"     "CAADGGATNKLIF"      "CATNTGTASKLTF"     
[25] "CALTGGGNKLTF"       "CAVGDYKLSF"         "CAVEDQGIM_GATNKLIF" "CAVGDRQR_STLGRLYF" 
[29] "CAVDAGDTGRRALTF"    "CAARVSGEYGNKLVF"    "CAVSPPG_QDYKLSF"    "CAVTTFTGGLKTIF"    
[33] "CAVSGGNQGGKLIF"     "CAVEDPNDYKLSF"      "CAVHSGTYKYIF"       "CAAGTGRRALTF"      
[37] "CATPQGEKLTF"        "CAVGDHKLSF"         "CAAKKGDYKLSF"       "CAASASDWGKLQF"     
[41] "CAVSLTGTASKLTF"     "CAVEDRSGNTPLVF"     "CAASNHDMRF"         "CAAPNQAGTALIF"     
[45] "CLGWNTNAGKSTF"      "CIVRRPFTGGGNKLTF"   "CVVNTETDSSYKLIF"    "CAVSEMNQAGTALIF"   
[49] "CAGHGGHNAGNMLTF"    "YAASASDWGKLQF"      "CAGSSNTGKLIF"       "CAAKGSGDMRF"       

$beta_lib
 [1] "CASRPEGQGNTEASF"     "CASRPTSGDSTDTQYF"    "CASSYRGPYNSPLHF"     "CASSPPYNEQFF"       
 [5] "CASSETVVEQFF"        "CASSLRGPRNTYNEQFF"   "CSAGEGQTTEAFF"       "CASSGDGTRASGELFF"   
 [9] "CASSLTPGLARNEQFF"    "CASSWDRGREQFF"       "CASSNRDREYV"         "CASMGMDPANEQFF"     
[13] "CSVDPTGENYGYTF"      "CASSGDGT_GRPGSCFF"   "CASSLRGQ_NTYNEQFF"   "CASSFQNGGRSDEQFF"   
[17] "CASSMWDRGIGYEQYF"    "CASSLTPGPARNEQFF"    "CASSRLAGGITNTDTQYF"  "CASSLRGPR_HTYNEQFF" 
[21] "CASSLAQGGSYNSPLHF"   "CASSPPGP_GRDNEQFF"   "CASSGDGG_RASGELFF"   "CASSGDGD_RASGELFF"  
[25] "CASSQDLAGSYNEQFF"    "CGSSFVAGGLPNEQFF"    "CSARDRAGGTTGELFF"    "CASSGDGL_ASGELFF"   
[29] "CASRRDRAVNTEAFF"     "CASSEDRAYSHEQFF"     "CASSLALVAYNEQFF"     "CSVGGGTAENTEAFF"    
[33] "CPGPPGC_STDTQYF"     "CAIRGPAGKNEQFF"      "CASTGGGSYNEQFF"      "CASSTTGTASERFF"     
[37] "CASSLEENTEAFF"       "CASSSSAGSPLHF"       "CASSYPNTGELFF"       "CASSRLVPYEQYV"      
[41] "CSAGEEQTTEAFF"       "CASSGT_GREQFF"       "CSAGEG_DTEAFF"       "CASRAPPWGYTF"       
[45] "CANSGGENEQYV"        "CASSSRDREYV"         "CASSLQTGGPYEQYF"     "CSVARDRGVNEQFF"     
[49] "CASSSGLAGRNEQFF"     "CASSLAQVNTGELFF"     "CASGEGGAANEKLFF"     "CASSPDNLRTDTQYF"    
[53] "CASSYSLAGGPYEQFF"    "CASSGEGQVNEKLFF"     "CASSLVGETYNEQFF"     "CASSPITTDTQYF"      
[57] "CASSATASGGRETQYF"    "CASSEGLGTSGFEQFF"    "CASSPGTGNSNQPQHF"    "CASSLSLGVGQPQHF"    
[61] "CDSTPDRGNTEAFF"      "CASSWASSYEQYF"       "CASSYSLA_GGPYEQFF"   "CASSWTGTTNTGELSF"   
[65] "CASRPSGSTYNEQFF"     "CASSQGTGRNEKLFF"     "CASSSGLVGRNEQFF"     "CASRFDYTGDNEQFF"    
[69] "CASSPAGGTFYEQYF"     "CASSQQGTPYYGYTF"     "CASTPDRGNTEAFF"      "CASSTGGSYNEQFF"     
[73] "CASSFNRGTYEQYF"      "CASSLVGDTGEQHF"      "*ASSFNRGTYEQYF"      "CASSSGSNYGYTF"      
[77] "CASSFPGANVLTF"       "CASSQVLNTEAFF"       "CSARKVASGGSYYNEQFF"  "CASRELASAETQYF"     
[81] "CASSLGTEGNQPQHF"     "CASSPRGGEKLFF"       "CASSQLRTSGGLFYNEQFF" "CASSYTMRT_RGMYEQYF" 
[85] "CASSLVYPGANTDTQYF"   "CASSQIDHSTNQPQHF"    "CAISGSGSYNEQFF"      "CASSITGGGTEAFF"     
[89] "CASRGGVSSYEQYF"      "CASSGGQVNTEAFF"      "CASSLRGLDTQYF"      

> data_alpha <-dat$alpha
> data_beta<-dat$beta
> pairs <- bagpipe(alpha = data_alpha, beta = data_beta, rep = 5)
Error in app[, 1] : incorrect number of dimensions

Here are the dimensions:

> dim(data_alpha)
[1]  4 52
> dim(data_beta)
[1]  4 91

I can run bagpipe successfully on data generated with the create_data function, following the steps in your vignette (https://cran.r-project.org/web/packages/alphabetr/vignettes/alphabetr-vignette.html).

> pairs <- bagpipe(alpha = data_alpha_fake, beta = data_beta_fake, rep = 5)
> head(pairs)
     beta1 beta2 alpha1 alpha2 prop_replicates
[1,]     1     1    272    272             1.0
[2,]     3     3    935    935             1.0
[3,]     4     4    351    351             0.2
[4,]     4     4    935    935             0.6
[5,]     4     4   1118   1118             0.2
[6,]     5     5    714    714             1.0

Any suggestions? Thank you.
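
For readers who land here with the same message: in base R, "incorrect number of dimensions" is raised whenever two-index subsetting such as `x[, 1]` is applied to an object that has no two-dimensional shape. A frequent way to get there is a single-row or single-column subset that silently drops a matrix to a vector. Whether that is what happens inside bagpipe for this 4-well input is not confirmed anywhere in this thread; the sketch below only reproduces the general R behaviour on a toy matrix.

```r
# Toy 4 x 3 matrix; it has nothing to do with the data in the issue.
m <- matrix(1:12, nrow = 4)

# Selecting a single row drops the result to a plain vector.
v <- m[2, ]
is.null(dim(v))   # the vector carries no dim attribute

# Two-index subsetting on that vector fails; capture the message.
msg <- tryCatch(v[, 1], error = function(e) conditionMessage(e))
print(msg)

# drop = FALSE keeps the 1 x 3 matrix shape, so [, 1] still works.
v2 <- m[2, , drop = FALSE]
dim(v2)
v2[, 1]
```

If a function fails this way only on small inputs, checking intermediate objects with `dim()` is usually the quickest way to find where the shape was lost.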

converting indices to CDR3s

Should include a function that converts alpha and beta indices back to their CDR3s
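
Until such a function exists, a small lookup can do the conversion. The sketch below is a hypothetical helper, `indices_to_cdr3`, which is not part of alphabetr. It assumes that the index columns of the bagpipe output (beta1, beta2, alpha1, alpha2, as in the output printed earlier on this page) refer to positions in the alpha_lib and beta_lib vectors returned by read_alphabetr. The toy input is made up for illustration.

```r
# Hypothetical helper, not part of alphabetr.
# Assumes pair indices are positions in alpha_lib / beta_lib.
indices_to_cdr3 <- function(pairs, alpha_lib, beta_lib) {
  pairs <- as.data.frame(pairs)
  data.frame(
    beta1  = beta_lib[pairs$beta1],
    beta2  = beta_lib[pairs$beta2],
    alpha1 = alpha_lib[pairs$alpha1],
    alpha2 = alpha_lib[pairs$alpha2],
    prop_replicates = pairs$prop_replicates,
    stringsAsFactors = FALSE
  )
}

# Toy input shaped like a bagpipe result (values invented for illustration).
pairs <- cbind(beta1 = c(1, 3), beta2 = c(1, 3),
               alpha1 = c(2, 1), alpha2 = c(2, 1),
               prop_replicates = c(1.0, 0.6))
alpha_lib <- c("CLVGADP_YGQNFVF", "CIVSFRNNAGNMLTF")
beta_lib  <- c("CASRPEGQGNTEASF", "CASRPTSGDSTDTQYF", "CASSYRGPYNSPLHF")

named <- indices_to_cdr3(pairs, alpha_lib, beta_lib)
print(named)
```

With real data the call would presumably look like `indices_to_cdr3(pairs, dat$alpha_lib, dat$beta_lib)`, where `dat` is the list returned by read_alphabetr.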

“Zero” columns in “create_data”-output

I'm interested in using alphabetr for an ongoing project, but I'm stuck on the structure of the artificial data generated by the package. When I tried to reproduce the full pipeline, I got "zero" columns (columns whose column sum is zero) in both the alpha and beta data matrices after running create_clones followed by create_data. Could you please explain what these columns mean and whether they can influence pairing on the synthetic data?

I tried to figure it out on my own. I converted the synthetic data to a 3-column CSV file, using the column names as clone IDs. During the conversion, every clone that was not detected (zero sum in the corresponding column) was dropped. I then imported the CSV with read_alphabetr and applied bagpipe with rep = 5. I ran bagpipe several times on each data set (the original synthetic data and the CSV-imported copy created from it). Surprisingly, the two data sets gave different numbers of pairs (rows in the bagpipe output) on average. It seems that the presence of the zero columns can influence the pairing result. If so, what are these columns? If not, why did I get different pairing results from the same data in different formats?

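library(alphabetr)
library(plyr)  # assumed: the rename() calls below match plyr's c("old" = "new") form

share_alpha <- c(.816, .085, .021, .007, .033, .005, .033)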
share_beta <- c(.859, .076, .037, .019, .009)

set.seed(271)   
TCR_pairings_synt <- create_clones(numb_beta = 4052,
                                   dual_beta = 0.06,
                                   dual_alpha = 0.3,
                                   alpha_sharing = share_alpha,
                                   beta_sharing = share_beta)
TCR_clones_synt <- TCR_pairings_synt$TCR # number of clones: 4753

number_plates <- 1      
err_drop <- c(0.15, .01)  
err_seq  <- c(0.02, .005) 
err_mode <- c("constant", "constant") 
number_skewed <- 50      
pct_top <- 0.5          
dis_behavior <- "linear"  

#100 cells per well
numb_cells <- matrix(c(100,
                       96), ncol = 2)

# Creating the data sets
data_tcr_synt <- create_data(TCR = TCR_clones_synt,
                             plates = number_plates,
                             error_drop = err_drop,
                             error_seq = err_seq,
                             error_mode = err_mode,
                             skewed = number_skewed,
                             prop_top = pct_top,
                             dist = dis_behavior,
                             numb_cells = numb_cells)

# Saving the data for alpha chains and data for beta chains
data_alpha <- data_tcr_synt$alpha
data_beta <- data_tcr_synt$beta


pairs <- bagpipe(alpha = data_alpha, beta = data_beta, rep = 5)

### converting the synthetic data to a 3-column csv file
data_alpha_real <- t(data_alpha) # transpose so that each chain is a row
data_alpha_real <- data_alpha_real[rowSums(data_alpha_real)!=0,] # drop "zero" rows (chains never observed)

data_beta_real <- t(data_beta) # transpose so that each chain is a row
data_beta_real <- data_beta_real[rowSums(data_beta_real)!=0,] # drop "zero" rows (chains never observed)

# wells in which each chain appeared:
alpha_well <- apply(X = data_alpha_real, MARGIN = 1, FUN = function(x) which(x == 1))
beta_well <- apply(X = data_beta_real, MARGIN = 1, FUN = function(x) which(x == 1))

#lists of data frames: every df - one alpha/beta chain (index) and vector of wells
wells_a <- list()
for (i in 1:length(alpha_well))
{wells_a[[i]] <- data.frame(c(i), c(alpha_well[[i]]))}


wells_b <- list()
for (i in 1:length(beta_well))
{wells_b[[i]] <- data.frame(c(i), c(beta_well[[i]]))}

#from list to 3-column df:
a_all_wells <- do.call(what = rbind, args = wells_a)
a_all_wells <- rename(a_all_wells, c("c.i."="well", "c.alpha_well..i..."="cdr3"))
a_all_wells$chain <- c("TCRA")
a_all_wells <- a_all_wells[,c(3,1,2)]

b_all_wells <- do.call(what = rbind, args = wells_b)
b_all_wells <- rename(b_all_wells, c("c.i."="well", "c.beta_well..i..."="cdr3"))
b_all_wells$chain <- c("TCRB")
b_all_wells <- b_all_wells[,c(3,1,2)]

all_wells <- rbind(a_all_wells, b_all_wells)
write.csv(x = all_wells, file = "alphabetr_data.csv", row.names = F)
dat <- read_alphabetr(data = "alphabetr_data.csv")

pairs_real <- bagpipe(alpha = dat$alpha, beta = dat$beta, rep = 5)
#> pairs_real <- bagpipe(alpha = dat$alpha, beta = dat$beta, rep = 5)
#> nrow(pairs_real)
#[1] 1127 [1] 1142 [1] 1117 [1] 1121  mean=1126.75

#> pairs <- bagpipe(alpha = data_alpha, beta = data_beta, rep = 5)
#[1] 1445 [1] 1399 [1] 1453 [1] 1403  mean=1425
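
A minimal sketch for inspecting the zero columns directly in R, without the CSV round trip. `drop_zero_columns` is a hypothetical helper, not part of alphabetr. It removes the columns whose sum is zero and records which original column each remaining column came from, so that indices in a bagpipe result on the reduced matrix can be mapped back to the original numbering. It does not explain why the two runs above differ; note also that the row counts reported above vary from run to run within each data set.

```r
# Hypothetical helper, not part of alphabetr.
# Drops all-zero columns and remembers the original index of each kept column.
drop_zero_columns <- function(m) {
  kept <- which(colSums(m) > 0)
  list(data = m[, kept, drop = FALSE], kept = kept)
}

# Toy 3-well x 4-chain matrix; columns 2 and 4 are never observed.
m <- cbind(c(1, 0, 1), c(0, 0, 0), c(0, 1, 0), c(0, 0, 0))

n_zero <- sum(colSums(m) == 0)   # number of chains with no observations
res <- drop_zero_columns(m)

print(n_zero)
print(dim(res$data))
print(res$kept)                  # original column index of each kept column
```

An index `j` in a result computed on `res$data` corresponds to column `res$kept[j]` of the original matrix, which is needed before comparing pairs across the two formats.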

bug in read_alphabetr function

When the data_alpha and data_beta arguments of the read_alphabetr function are not NULL (i.e. two separate files are used as input), the respective data frames are supposed to have two columns, but the subsequent code refers to column 3 in several places.

TL;DR: the column indexing in read_alphabetr is wrong for two-file input.
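
To illustrate what correct two-column indexing could look like, here is a sketch of a hypothetical helper, `two_col_to_matrix`, that builds a wells x CDR3 binary matrix from a two-column data frame. It is not the alphabetr implementation, and the column order (well number first, CDR3 sequence second) is an assumption made for this example; the issue does not state the expected order.

```r
# Hypothetical helper, not the alphabetr implementation.
# Assumes column 1 = well number and column 2 = CDR3 sequence.
two_col_to_matrix <- function(df) {
  lib   <- unique(df[[2]])            # CDR3 library, in order of first appearance
  wells <- sort(unique(df[[1]]))      # wells present in the file
  m <- matrix(0, nrow = length(wells), ncol = length(lib))
  # Mark each (well, CDR3) observation; only columns 1 and 2 are used.
  m[cbind(match(df[[1]], wells), match(df[[2]], lib))] <- 1
  list(data = m, lib = lib)
}

# Toy input for one chain type (values chosen for illustration).
alpha_df <- data.frame(
  well = c(1, 1, 2, 3),
  cdr3 = c("CAVGDYKLSF", "CAASNHDMRF", "CAVGDYKLSF", "CAASNHDMRF"),
  stringsAsFactors = FALSE
)

res <- two_col_to_matrix(alpha_df)
print(res$lib)
print(res$data)
```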
