This is the development page of the assigner package for the R software.
The name assigner |əˈsʌɪn| is rooted in the Latin word assignare. Its first use in French dates back to the 13th century.
Genomic datasets produced by next-generation sequencing techniques that sequence a reduced representation of the genome (e.g. genotyping-by-sequencing (GBS) and restriction-site-associated DNA sequencing (RADseq)) contain huge numbers of markers that hold great promise for assignment analysis. After hitting the bioinformatic wall with the different workflows, you'll likely end up with several folders containing whitelists and blacklists of markers and individuals, datasets built with various de novo and/or filtering parameters and ... missing data. These realities of GBS/RADseq data are hard on the GUI software traditionally used for population assignment analysis. The usual result is limited data exploration, constrained by time, and poor reproducibility.
assigner was tailored to make it easy to conduct population assignment analysis using GBS/RADseq data within R. Additionally, combining it with tools like R Notebook, RStudio and GitHub makes documenting your workflows and pipelines effortless.
The keywords to remember: 3 different algorithms implementing frequentist, likelihood and the latest machine learning methods, marker selection (with Weir and Cockerham's (1984) Fst, WC84), cross-validation techniques (classic leave-one-out, and THL: training, holdout, leave-one-out), resampling/bootstrap/subsampling, imputation, filters, ggplot2-based plotting, fast!
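To illustrate what leave-one-out means for assignment, here is a minimal base-R sketch of likelihood-based assignment. It is not assigner's implementation or API: the function name `loo_assign`, the 0/1/2 genotype coding (count of the alternate allele), the assumptions of Hardy-Weinberg equilibrium and independent markers, and the +0.5 frequency correction are all hypothetical choices made for this example. Each individual is removed from its population's baseline before allele frequencies are estimated, so its own genotype cannot inflate its assignment score.

```r
# Hypothetical sketch, not assigner's API: leave-one-out (LOO) likelihood
# assignment with biallelic SNPs coded 0/1/2 (alternate-allele counts),
# assuming Hardy-Weinberg equilibrium and independent markers.
loo_assign <- function(geno, pop) {
  pop <- as.factor(pop)
  pops <- levels(pop)
  n <- nrow(geno)
  assigned <- character(n)
  for (i in seq_len(n)) {
    g <- geno[i, ]
    # Log-likelihood of individual i's multilocus genotype in each population
    loglik <- sapply(pops, function(p) {
      # LOO step: the baseline for population p excludes individual i
      ref <- geno[pop == p & seq_len(n) != i, , drop = FALSE]
      # Alternate-allele frequency; +0.5 / +1 keeps q strictly inside (0, 1)
      q <- (colSums(ref) + 0.5) / (2 * nrow(ref) + 1)
      # HWE genotype probabilities: 0 -> (1-q)^2, 1 -> 2q(1-q), 2 -> q^2
      prob <- ifelse(g == 0, (1 - q)^2,
                     ifelse(g == 1, 2 * q * (1 - q), q^2))
      sum(log(prob))
    })
    # Assign to the population with the highest log-likelihood
    assigned[i] <- pops[which.max(loglik)]
  }
  assigned
}

# Usage on simulated data: two populations with independent allele frequencies
set.seed(1)
n_loci <- 200
q_a <- runif(n_loci, 0.05, 0.95)
q_b <- runif(n_loci, 0.05, 0.95)
sim <- function(n, q) t(replicate(n, rbinom(length(q), 2, q)))
geno <- rbind(sim(30, q_a), sim(30, q_b))
pop <- rep(c("A", "B"), each = 30)
res <- loo_assign(geno, pop)
mean(res == pop)  # proportion of individuals assigned to their source population
```

Without the leave-one-out step, each individual would contribute to the allele frequencies it is scored against, which biases assignment success upward; the same concern motivates selecting markers on a training set and scoring a separate holdout set.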
To try out the dev version of assigner:
if (!require("devtools")) install.packages("devtools") # install devtools if missing
devtools::install_github("thierrygosselin/assigner", build_vignettes = TRUE) # to install WITH vignettes
assigner::install_gsi_sim(fromSource = TRUE) # for Linux and macOS
assigner::install_gsi_sim() # for Windows
library(assigner)
Notes
- Problems during installation: see the Installation problems notes.
- Windows users: Install Rtools.
- I recommend using RStudio to run assigner. The R GUI is unstable with functions that run computations in parallel.
- Optimize speed by enabling OpenMP parallel computing in the R package randomForestSRC (e.g. to run imputations in parallel). Follow the steps in the notebook vignette. This is a one-time setup: you don't need to repeat it when updating assigner.