ericarcher / strataG

strataG is a toolkit for haploid sequence and multilocus genetic data summaries, and analyses of population structure.

strataG

Description

strataG is a toolkit for haploid sequence and multilocus genetic data summaries, and analyses of population structure. One can select specific individuals, loci, or strata using standard R '[' indexing methods. The package contains functions for summarizing haploid and diploid loci (e.g., allelic richness, heterozygosity, haplotypic diversity) and haploid sequences by locus and by strata, as well as functions for computing by-site base frequencies and identifying variable and fixed sites among strata. There are both overall and pairwise standard tests of population structure such as PHIst, Fst, Gst, and Jost's D. If individuals are stratified according to multiple schemes, these stratifications can be changed with the stratify() function and summaries or tests can be re-run on the new object. The package also includes wrappers for several external programs such as fastsimcoal2, STRUCTURE, and mafft. There are also multiple conversion functions for data objects used by other population genetics packages such as adegenet, pegas, and phangorn.

Installation

strataG is not available via CRAN, so installing the stable version with install.packages requires making an extra repository available first:

options(repos = c(
            zkamvar = 'https://zkamvar.r-universe.dev',
            CRAN = 'https://cloud.r-project.org'))

install.packages('strataG')

NB! Make sure that you have installed the development version of the dependency sprex prior to installing strataG

To install the latest version from GitHub including the development version of sprex:

# make sure you have Rtools installed
if (!require('devtools')) install.packages('devtools')
# install sprex development version
devtools::install_github("ericarcher/sprex")
# install strataG latest version
devtools::install_github('ericarcher/strataG', build_vignettes = TRUE)

Vignettes

Vignettes are available on several topics:

Creating and manipulating gtypes ("gtypes")
Genotype and sequence summaries ("summaries")
Working with sequences ("sequences")
Tests of population structure ("population.structure")
Installing external programs ("external.programs")

To see the list of all available vignettes:

browseVignettes("strataG")

To open a specific vignette:

vignette("gtypes", "strataG")

There is also a tutorial detailing running fastsimcoal2 through strataG available through the function fscTutorial().

Citation

The preferred citation is the paper:

Archer, F. I., Adams, P. E. and Schneiders, B. B. (2016), strataG: An R package for manipulating, summarizing and analysing population genetic data. Mol Ecol Resour. doi:10.1111/1755-0998.12559

If desired, the current release version of the package can be cited as:

Archer, F. 2016. strataG: An R package for manipulating, summarizing and analysing population genetic data. R package version 1.0.6. Zenodo. http://doi.org/10.5281/zenodo.60416

Contact

submit suggestions and bug-reports: https://github.com/ericarcher/strataG/issues
send a pull request: https://github.com/ericarcher/strataG/
e-mail: [email protected]

version 2.5.01 (devel)

removed melt from structurePlot
fixed ldNe error when one individual is present
fixed mafft error and now have mafft .fasta files written to temporary file rather than working directory
fixed error with readGenData() not recognizing NAs.
fixed error with fs2gtypes() not formatting multi-block DNA sequence data as gtypes properly

version 2.4.905

Deleted functions: alleleFreqFormat, as.array.gtypes
Changed structure of gtypes object, making it no longer compatible with previous versions
Fixed and enhanced arlequinRead() so that it will read and parse all .arp files. Added arp2gtypes() to create gtypes object from parsed .arp files.
Improved performance of several standard summary functions, most notably dupGenotypes().
Full rework of fastsimcoal2 wrapper.
Removed strataGUI().

version 2.1

fixed error in ldNe when missing data are present
added STANDARD marker type to fastsimcoal
added na.rm = TRUE to calculation of mean locus summaries by strata in summary.gtypes. This avoids NaNs when there is a locus with genotypes missing for all samples.
explicitly convert x to a data.frame in df2gtypes in case it is a data.table or tibble.

version 2.0.2

NOTE: In order to speed up indexing the data in large data sets, this version changes the underlying structure of the gtypes object by replacing the @loci data.frame slot with a @data data.table slot. The data.table has a id character column, a strata character column, and every column afterwards represents one locus. The @strata slot has been removed.
The loci accessor has been removed.
Added as.array which returns a 3-dimensional array with dimensions of [id, locus, allele].
The print (show) function for gtypes objects no longer shows a by-locus summary. The display was getting too slow for data sets with a large number of loci.
The summary function now includes by-sample results.
Fixed computational errors in population structure metrics due to incorrect sorting of stratification.
Added maf to return minor allele frequency for each locus.
Added ldNe to calculate Ne.
Added expandHaplotypes to expand the haplotypes in a gtypes object to one sequence per individual.

version 1.0.6

Added read.arlequin back. Fixed missing function error with write.arlequin.
Added summarizeSamples
Changed evanno from base graphics to ggplot2
Updated logic in labelHaplotypes to assign haplotypes if possible alternative site combinations match a present haplotype
Added Zenodo DOI
Added shiny app (strataGUI) for creating gtypes objects, QA/QC, and population structure analyses
Added type argument to structurePlot to select between area and bar charts
Changed haplotypeLikelihoods to sequenceLikelihoods
neiDa now creates haplotypes before calculating metric
Fixed error in writePhase that was creating improper input files for PHASE

version 1.0.5

Fixed error in dupGenotypes, propSharedLoci, and propSharedIDs where missing genotypes were not being properly counted.
Added as.data.frame.gtypes.
Removed gtypes2df.
Added arguments to as.matrix.gtypes to include id and strata columns in output.
Removed the jmodeltest function as this functionality is available in the modeltest function in the phangorn package.
Added conversion functions gtypes2phyDat and phyDat2gtypes to facilitate interoperability with the phangorn package.
Removed read.arlequin.
Added alleleNames accessor for gtypes object, which returns list of allele names for each locus.

version 1.0

New version with different gtypes format from previous versions. See vignettes for instructions and examples.

stratag's Issues

fixedDifferences function returns 0 differences

Hi again,

I'm now having trouble with the fixedDifferences function. I have tried it both on my own gtypes object, and on the example dloop.g data. In both cases it returns no differences.

I'm using strataG compiled from the source code today. My gtypes object is attached.

Thanks!
gtypes.RData.zip

enhancement request: gtypes2genind add Lat/Lon data

Latitude and Longitude data can be added to a gtypes object in the 'other' object, but the gtypes2genind function doesn't transfer the Lat/Lon data. Currently, I have to convert back to a dataframe, combine alleles to a single column, add the locus names, use the Adegenet function df2genind to create the genind object, then add the lat/lon data from the dataframe to the genind object.
Can the gtypes2genind function be modified to include other/xy for lat/lon data?

Wrong Structure of write.arlequin()

Hi,
When I use write.arlequin() to write out my data, the [[Structure]] section of the output file is wrong:
[[Structure]]
StructureName="A group of 6 populations analyzed for DNA"
NbGroups=1
Group= {
"1"
"2"
"3"
"4"
"5"
"6"
}
This is quite different from the Arlequin example data:
[[Structure]]
StructureName="An example of structure with 5 geographic regions"
NbGroups=5
#Asia
Group={
"Oriental"
"Tharu"
}
#Africa
Group={
"Wolof"
"Peul"
}
#America
Group={
"Pima"
"Maya"
}
#Europe
Group={
"Finnish"
"Sicilian"
}
#Middle East
Group={
"Israeli Jew"
"Israeli Arab"
}

I actually have six groups, each with a different sample size. The strata of my gtypes object are as follows:
strata(gt)
aa ab ac ad ae af ag ah ai aj ak al am an ao ap aq ar as at au av aw ax ay az ba bb bc bd be bf bg bh bi bj bk bl bm bn bo bp bq br bs bt bu bv bw bx by bz ca cb cc cd ce cf cg ch ci cj ck cl cm
6 6 6 6 3 3 3 3 3 3 3 3 1 1 1 1 3 3 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 1 1 5 5 6 6 6 6 6 6 4 6 1 1
cn co cp cq cr cs ct cu cv cw cx cy cz da db dc dd de df dg dh di dj dk dl dm dn do dp dq dr ds dt du dv dw dx dy dz ea eb ec ed ee ef eg eh ei ej ek el em en eo ep eq er es et eu ev ew ex ey ez
6 3 6 6 6 6 4 1 6 6 6 6 4 5 5 6 6 5 6 6 6 6 6 6 5 6 3 5 5 4 4 4 5 5 5 5 4 4 4 4 6 6 4 4 4 6 5 5 4 5 1 6 6 4 5 4 1 1 5 1 5 1 1 1 1
fa fb fc fd fe ff fg fh fi fj fk fl fm fn fo fp fq fr fs ft fu fv fw fx fy fz ga gb gc gd ge gf gg gh gi gj gk gl gm gn go gp gq gr gs gt gu gv gw gx gy gz ha hb hc hd he hf hg hh hi hj hk hl hm
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 3 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 1 1 1 1 1 4 6 6 3 6 6 6 6 6 6 6 6 6 6 6 6
hn ho hp hq hr hs ht hu hv hw hx hy hz ia ib ic id ie if ig ih ii ij ik il im in io ip iq ir is it iu iv iw ix iy iz ja jb jc jd je jf jg jh ji jj jk jl jm jn jo jp jq jr js jt ju jv jw jx jy jz
6 1 1 2 5 6 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 6 3 3 6 6 6 6 4 3 6 6 6 5 6 6 6 1 1 1 6 6 6 6 6 6 6 6 6 6
ka kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg lh li lj lk ll lm ln
6 4 2 5 6 6 6 6 6 5 6 6 6 4 1 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 1 3 6 6 6 6 6 6 6 6
Levels: 1 2 3 4 5 6
So, is there something wrong with the write.arlequin() function?

Best wishes!

Bug when running Structure with USEPOPINFO

There was a small bug in the structureRead() code when doing USEPOPINFO (line 407 of structure.r). I forked it and corrected it.

Registering Routines

I noticed your comment on an Rcpp thread a while ago about Registering Routines and saw this post recently. I thought it might help when R 3.4 is released: https://ironholds.org/registering-routines/

dist.dna model as object

The new implementation of overallTest and pairwiseTest won't accept a saved object for the model (used for PHIst).
This works:
model = "JC69"

This doesn't work:
mdl <- "JC69"
model = mdl

Problems to run FastSimCoal with strataG r

Dear Eric
I am having some issues running fastsimcoal with strataG in R.
I am following the simulation script at http://onlinelibrary.wiley.com/store/10.1111/1755-0998.12559/asset/supinfo/men12559-sup-0001-AppendixS1.R?v=1&s=de6dcdaf0c2c298d4ba925ea282432429d203d80

I have set the working directory to:
setwd("C:/Users/JuanM/Desktop/SIMULATION IN ECOLOGY AN GENETIC/FASTSIMCOAL/fsc_win64")
But when I try to run it, an error appears in the R console. I'm using RStudio, R 3.4.1, and Windows 10.

Error in fastsimcoal(pop.info, msat.params, mig.rates, hist.ev) :
fastsimcoal exited with error 1
In addition: Warning messages:
1: running command 'C:\WINDOWS\system32\cmd.exe /c fsc252 -i fsc.run.par -n 1 -q -S ' had status 1
2: In shell(cmd.line, intern = F) :
'fsc252 -i fsc.run.par -n 1 -q -S ' execution failed with error code 1

sim.sub <- lapply(c(50, 100, 500), function(x) {
  # draw x loci at random and subset the simulated gtypes object to them
  ran.loc <- sample(locNames(sim.msats), x)
  sim.msats[, ran.loc, ]
})

Any advice would be welcome, as I'm interested in using fastsimcoal with strataG in R.
Thanks in advance
Have a good day
Dr. Juan Manuel Penaloza Ramirez
Secretaría de Desarrollo Institucional
Torre de Rectoría 8° piso, Ciudad Universitaria, UNAM
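
One thing worth ruling out with an error like this is whether R can locate the fastsimcoal executable at all. The sketch below uses only base R. The helper prepend_to_path() and the folder fsc_dir are hypothetical names, and it is only an assumption that the failure is related to how the executable is found rather than to the simulation parameters.

```r
# Hypothetical helper: return a PATH string with `dir` placed first,
# dropping empty entries and any earlier copy of `dir`.
prepend_to_path <- function(dir, path = Sys.getenv("PATH")) {
  parts <- strsplit(path, .Platform$path.sep, fixed = TRUE)[[1]]
  parts <- parts[nzchar(parts) & parts != dir]
  paste(c(dir, parts), collapse = .Platform$path.sep)
}

# Assumed location of the executable; replace with the real folder.
fsc_dir <- file.path(tempdir(), "fsc_win64")

# Sys.which() returns "" when an executable is not found on the PATH.
if (!nzchar(Sys.which("fsc252"))) {
  Sys.setenv(PATH = prepend_to_path(fsc_dir))
}
```

If Sys.which("fsc252") still returns an empty string afterwards, the executable name or folder is probably not what R expects.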

Pairwise Statistic Inconsistency: new Gtypes object vs. genind2gtypes

Hello,

I am using strataG to perform pairwise analyses on populations of a globally distributed shark species. I found a discrepancy in my results depending on how I created the gtypes object. My first method created the gtypes object with new() (see code below).

I changed my methodology to use the genind2gtypes function later in my analysis to try to streamline my code (see code below).

When I use the second method (genind2gtypes) on the same VCF data with the same population designations, I find no significant differentiation in two comparisons that are significant when I use the first method (gtypes new). I noticed that the first method retains the original genotypes (AGCT), whereas the second method converts the SNP data to 0s and 1s. I am wondering whether I am losing data in the file conversions.

Can you please offer any insight into this issue?

Thank you,
Cassandra

library("strataG")
library("pegas")
library("vcfR")
library("readr")  # provides read_csv(), used below

#Method 1: Create new Gtypes object
#1: import data from vcf (using pegas), split alleles
Data1 <- read.vcf("data.vcf")
Data1M <- as.matrix(Data1)
Data1M.split <- alleleSplit(Data1M, sep = '/')

#2: import strata for schemes, make sure row names are consistent in both data frames
Strata <- read_csv("Strata.csv", col_names = TRUE)
Data.schemes <- as.data.frame(Strata[, c("Region", "Location")])
rownames(Data.schemes) <- Strata$ID
rownames(Data1M.split) <- Strata$ID

#3: create gtypes objects with schemes attached
Data1.gtype <- new("gtypes", gen.data = Data1M.split, ploidy = 2, schemes = Data.schemes)

#4: stratify gtypes objects
Data1.gtype.loc <- stratify(Data1.gtype, "Location")

#Method 2: Convert genind object to gtypes
#1: import data from vcf (using vcfR) and convert to genind
Data2.vcfr <- read.vcfR("data.vcf", verbose = TRUE)
Data2.genind <- vcfR2genind(Data2.vcfr, sep="[|/]")

#2: add populations to genind object
popID <- read.table("popID.txt", header=FALSE)
pop(Data2.genind) <- as.vector(as.matrix(popID))

#3: convert genind to gtypes object
Data2.gtype.loc <- genind2gtypes(Data2.genind)

#run pairwise statistics
Fstats1.loc <- popStructTest(Data1.gtype.loc, nrep = 10000, stats = c("fst", "gst.dbl.prime", "d"), quietly = FALSE, max.cores = NULL)
Fstats2.loc <- popStructTest(Data2.gtype.loc, nrep = 10000, stats = c("fst", "gst.dbl.prime", "d"), quietly = FALSE, max.cores = NULL)
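
Recoding nucleotides as 0s and 1s does not lose information by itself, as long as the recoding is one-to-one within each locus. A minimal base R sketch for checking that is shown below. The helper same_coding() is hypothetical, and it assumes the allele calls for one locus have been extracted from both objects in the same sample order.

```r
# Hypothetical helper: TRUE if two codings of the same locus group the
# allele copies identically (a one-to-one relabelling) and have missing
# values in the same positions.
same_coding <- function(a, b) {
  stopifnot(length(a) == length(b))
  if (any(is.na(a) != is.na(b))) return(FALSE)
  ok <- !is.na(a)
  tab <- table(a[ok], b[ok])
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Toy allele calls for one locus under each coding
nuc <- c("A", "G", "A", "G", NA)
bin <- c(0, 1, 0, 1, NA)
same_coding(nuc, bin)
```

A FALSE result for any locus would point to alleles being merged, split, or set to missing during conversion. Differences in sample order or population assignment between the two objects would need to be checked separately.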

Issues with compiling from GitHub

Hi Eric,

With the CRAN library not available, I am trying to compile from GitHub, but I get stuck in the final stages of compiling strataG on macOS 11.2. It returns the following error:

clang++ -mmacosx-version-min=10.13 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o strataG.so RcppExports.o calc_FstStats.o calc_Phist.o misc_funcs.o -L/Library/Frameworks/R.framework/Resources/lib -lRlapack -L/Library/Frameworks/R.framework/Resources/lib -lRblas -L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0 -L/usr/local/gfortran/lib -lgfortran -lquadmath -lm -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: warning: directory not found for option '-L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0'
ld: warning: directory not found for option '-L/usr/local/gfortran/lib'
ld: library not found for -lgfortran
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [strataG.so] Error 1
ERROR: compilation failed for package ‘strataG’
─ removing ‘/private/var/folders/87/q5h9lnyj30161gjy8bq5pwk80000gn/T/RtmpEvbY2v/Rinst97783d66c54b/strataG’
-----------------------------------
ERROR: package installation failed
Error: Failed to install 'strataG' from GitHub:
System command 'R' failed, exit status: 1, stdout + stderr (last 10 lines):
E> clang++ -mmacosx-version-min=10.13 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o strataG.so RcppExports.o calc_FstStats.o calc_Phist.o misc_funcs.o -L/Library/Frameworks/R.framework/Resources/lib -lRlapack -L/Library/Frameworks/R.framework/Resources/lib -lRblas -L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0 -L/usr/local/gfortran/lib -lgfortran -lquadmath -lm -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
E> ld: warning: directory not found for option '-L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0'
E> ld: warning: directory not found for option '-L/usr/local/gfortran/lib'
E> ld: library not found for -lgfortran
E> clang: error: linker command failed with

Hope you can point me in the right direction,

Thanks in advance, Bart

pairwiseTest update

It would be useful to add a value to the pairwiseTest function output, reporting how many loci were used to calculate differentiation between each pair of strata.

Missing data in ldNe

Hello Eric,

My name is Jeronymo and I'd like to estimate Ne using strataG in the same way as NeEstimator. My dataset has 6.3% missing data and represents one panmictic population. When I use the function ldNe:

Ne = ldNe(snps_gtypes, maf.threshold = 0, by.strata = TRUE, ci = 0.95, drop.missing = TRUE, num.cores = 4)

Ne is estimated excluding missing values and matches the values from NeEstimator v2.1. When I use "drop.missing = FALSE", NULL is returned:

Warning message:
Can't compute ldNe in '1' because loci are missing genotypes and 'drop.missing = FALSE'. NULL returned.

I would like to know how I should proceed in your opinion:

Am I doing something wrong? My script is here: https://github.com/jdalapicolla/Ne_StrataG.R/blob/master/Ne_Estimation.R
Should I impute missing values by the mode or mean? If yes, what functions or packages do you recommend?
Should I use NeEstimator when there are missing values in my dataset?
Is there any way to implement the NeEstimator's solution for missing values in the function ldNe?

Thank you so much for your time,
Best regards,
Jeronymo Dalapicolla
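
For reference, the mode imputation asked about above can be sketched in base R as follows. This is only an illustration on a toy 0/1/2 genotype matrix, not a recommendation. The helper impute_mode() is hypothetical and is not part of strataG, and whether imputation is appropriate for an LD-based Ne estimate is left open here.

```r
# Hypothetical helper: replace missing genotypes in each locus (column)
# with that locus's most common genotype. Ties go to the smallest value.
impute_mode <- function(geno) {
  apply(geno, 2, function(x) {
    counts <- table(x)                 # table() ignores NA by default
    if (length(counts) == 0) return(x) # locus missing for all samples
    x[is.na(x)] <- as.numeric(names(counts)[which.max(counts)])
    x
  })
}

# Toy data: 4 individuals (rows) by 2 loci (columns)
geno <- matrix(c(0, 0, NA, 1,
                 2, NA, 2, 1), ncol = 2)
mean(is.na(geno))   # overall proportion of missing genotypes
impute_mode(geno)
```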

arlequinWrite() with haploid microsatellite data

Hi Eric,

I ran into an issue with my haploid microsatellite data, which the arlequinWrite() function does not process as microsatellites because the code requires ploidy > 1. For haploid data the ploidy is 1, so arlequinWrite() switches to the "FREQUENCY" type.

My request would be to use the "MICROSAT" type when ploidy == 1.

Output format for Arlequin is the same as already written in your code for "MICROSAT". The easiest way would be to set up a switch in the command line; for instance: arlequinWrite(g, file = NULL, type= "microsat", ploidy = "haploid"). What do you think?

Warm regards,

Gilles

Duplicate names

Dear Eric,
I found a small problem in .fscCollectParams. I wanted to keep stable growth in a population that is a sink for another population. The thing is I wanted fastsimcoal to estimate this parameter while using the same parameter name between blocks is not allowed by .fscCollectParams. The error is as follows:
Error in .fscCollectParams(p) : Can't have duplicated parameter names.
I have tested it on the tpl file in fastsimcoal and the program works and substitutes the value with the same random number in par file. Of course one can define another growth parameter but as you know, overparametrisation is an enemy of decent modeling. Perhaps, you could change the check to allow for the same names used for the same parameters, I guess it applies only to growth that can be defined in demes and perhaps a time of an event which could be the same as a time of deme sampling, or migration if it is previously named in the migration matrix (gives the same error message; although, I cannot imagine any situation where this functionality could be useful ;).

In my case it is:
demes <- fscSettingsDemes(fscDeme("NFL",15,0,0,"GFL"),
fscDeme("ND34",15,0,0,"GD34"),#3
fscDeme("NPL",15,0,0, "GPL"), #4
and
events <- fscSettingsEvents(
fscEvent("TPL", 2, 1, 1, 1, "GD34", 0), ...
where "GD34" is a problem.
By the way, I think there is a mistake in the manual for the fastsimcoal use in the strataG. It says in the 'events' section: "growth.rate gives a new growth rate for the source deme" while fsc manual says it is a sink deme that is affected.

Troubles running the ldNe function

Dear Eric and strataG users,

I tried to run the ldNe function on a gtypes object consisting of 312 diploid individuals and 42826 SNP loci, converted from a genind object with the following characteristics:

/// GENIND OBJECT /////////

 // 312 individuals; 42,826 loci; 85,652 alleles; size: 135.5 Mb

 // Basic content
   @tab:  312 x 85652 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-2)
   @loc.fac: locus factor for the 85652 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: df2genind(X = xx[, ], sep = "/", ncode = 1, ind.names = gl@ind.names, 
    pop = gl@pop, NA.char = "-", ploidy = 2)

 // Optional content
   @pop: population of each individual (group size range: 8-105)
   @other: a list containing: ind.metrics  loc.metrics  latlong  history

Here is what the gtypes object looks like after conversion using the genind2gtypes function:

<<< gtypes created on 2021-02-01 15:48:21 >>>

Contents: 312 samples, 42826 loci, 8 strata
Other info: genind

Strata summary:
          stratum num.ind num.missing num.alleles
1      Atlantic-N      42  0.57327325    1.891748
2     Atlantic-NE      21  0.47190959    1.757367
3     Atlantic-SE     105  1.43382525    1.966165
4 Indian_Ocean-EC       8  0.07605193    1.547308
5  Indian_Ocean-N      16  0.15123990    1.687059
6 Indian_Ocean-SW      22  0.29995797    1.750595
7   Mediterranean      45  0.60131696    1.876617
8      Pacific-SW      53  0.84348293    1.899827

Everything seems to be going well, unfortunately when I try to run the ldNe function I get the following error:

> LDNe_bystrata <- ldNe(dat1, maf.threshold = 0.05, by.strata = FALSE, ci = 0.95, drop.missing = FALSE, num.cores = 16)
Error in which() : argument "x" is missing, with no default

Traceback:

8. which()
7. eval(lhs, parent, parent)
6. eval(lhs, parent, parent)
5. which() %>% names()
4. FUN(X[[i]], ...)
3. lapply(X = ans[index], FUN = FUN, ...) 
2. tapply(1:nrow(mat), st, function(i) {
        mat.st <- mat[i, ]
        if (maf.threshold > 0) {
              above.thresh <- (colMeans(mat.st)/2) >= maf.threshold ...
1. ldNe(dat1, maf.threshold = 0.05, by.strata = FALSE, ci = 0.95, drop.missing = FALSE, num.cores = 16)

I am working with strataG v2.4.905, which is admittedly not the latest version, but apparently ldNe in strataG v2.4.910 was only corrected for an error when 1 individual is present, which is not the case in my dataset (always >1 individual). I tried digging into the source code but could not figure out what I was doing wrong.

Thank you very much in advance for your feedback,
All the best!

Chrystelle

Request to add population structure program, FLOCK, to StrataG

Hello,
I would like to request that a population structure component be added to StrataG- particularly the FLOCK software which is currently based in excel and aids in inferring population structure and individual assignment. FLOCK randomly divides all of the genotypes into K genetic groups; and reassigns the genotypes at each iteration to the group with the highest probability of belonging, using the multilocus method of maximum likelihood described by Paetkau et al. (1995).

I have been using FLOCK and I have been very pleased with its outcomes and superiority compared to STRUCTURE. Using FLOCK and a wrapper with the DAPC function through adegenet together would be beneficial- as you could take individual assignments from FLOCK and then directly input into a DAPC to visualize the results.

The authors of FLOCK are willing to help with integration, and I think this would be a valuable contribution to the StrataG program, as well as to R users that are interested in answering population structure and individual assignment questions.

Cheers, Julie

Error message trying to run ldNe() on microsatellites data.

Dear Eric,

My name is Giuliano. I am writing because I am having an issue while trying to run the ldNe() function from strataG.

I am using R 4.0.2 and RStudio 1.3.959 on a macOS 10.15.6

I downloaded and installed the dev version of strataG 2.4.910.

I am able to load and read my data (microsats from a diploid species) as gtypes object and I can run functions to calculate allele frequencies and such.

When I try to run ldNe() this is what I get:

> ldNe(jt.ldne.gtype, maf.threshold = 0, by.strata = FALSE,
+   ci = 0.95, drop.missing = FALSE, num.cores = 4)
Error in .local(x, ...) : 
  Can't code SNPs because some loci have more than 2 alleles.

I tried using the example data set that comes with the package, but I get the same error.
I am guessing the function works only with SNP data? If not, any thoughts on how I could fix the problem?

Thanks in advance
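
The error message refers to loci with more than 2 alleles, which is expected for most microsatellites. A quick way to see which loci are affected is to count alleles per locus. The base R sketch below assumes a wide table in which each locus occupies `ploidy` consecutive columns. The helper alleles_per_locus() is hypothetical and is not a strataG function.

```r
# Hypothetical helper: number of distinct alleles per locus, where each
# locus occupies `ploidy` consecutive columns of `allele_cols`.
alleles_per_locus <- function(allele_cols, ploidy = 2) {
  stopifnot(ncol(allele_cols) %% ploidy == 0)
  locus <- rep(seq_len(ncol(allele_cols) / ploidy), each = ploidy)
  vapply(split(seq_len(ncol(allele_cols)), locus), function(cols) {
    length(unique(na.omit(unlist(allele_cols[, cols]))))
  }, integer(1))
}

# Toy data: one microsatellite locus and one SNP locus
toy <- data.frame(
  L1.1 = c(120, 122, 120), L1.2 = c(122, 124, 120),
  L2.1 = c("A", "A", "G"), L2.2 = c("G", "A", "G")
)
n.alleles <- alleles_per_locus(toy)
which(n.alleles > 2)   # loci that are not biallelic
```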

arlequinWrite() output ARP formatting

Hello Eric,

I've been using the arlequinWrite function, but it appears that the output file now contains broken strings (see the examples at the end). The previous release was working great. I am on v2.4.910n using R v4.0.2.
The issue is random: generating several ARP files does not produce the same error on the same sample names. But there are still errors in the output ARP files.

Also, I sometimes use a modified version of arlequinWrite since I am working on microsatellite for haploid... and this function is only working on microsatellite for ploidy>1... If there is a way to trick this, I'll be more than happy!

Hope you could trick and fix this for the community!

Hope the best for you,

kind regards,

Gilles

Examples of broken ARP output files

#the first two letters of "SampleSize" are missing, as is the line break before "SampleData="

SampleName="Guyana"
mpleSize=3SampleData={

or
#"Sa" is missing from "SampleSize" and "SampleData"

SampleName="Uganda247"

mpleSize=18
mpleData={5652 1 7 4 2 2 5 2 3 4 12 3 3 3 3 3
5653 1 7 4 2 2 5 2 3 4 12 3 3 3 3 3

or
#1st sample name is missing

SampleName="Ivory Coast221"
SampleSize=14
SampleData={
1 7 4 2 2 5 2 3 4 9 3 3 3 3 3
1803 1 7 4 2 2 5 2 3 4 9 3 3 3 3 3

or
#some random carriage returns are missing and names are incomplete

[[Structure]]
StructureName="A group of 40 populations analyzed for MICROSAT"
NbGroups=1
Group= {
"Australia"
"Benin"
"Brazil"
"Burkina Faso"
"Cameroon"
"China"
"Colombia"
"Comoros"
"Costa Rica"
"El Salvador"
"French Guiana"
"Guadeloupe"
"Guatemala"
"Guyana"
"Honduras"
"India"
"Indonesia"
"Ivory Coast"
"Japan"
"Kenya"
"Madagascar"
"Malaysia"
"Martinique"
"Mauritius"
"Mayotte"
"Mexico"

ew-Caledonia""Peru"
"Philippines"
"Reunion"
"Rodrigues"
"Seychelles"

outh Africa"
witzerland""Taiwan"
"Tanzania"
"Thailand"
"Trinidad"
"Uganda"

SA"}

error in SFS calculation

Hi, I would appreciate it if you could help me.
I get an error when I run sfs on my SNP data, which I have converted to a data frame.
command: mysfs<- sfs(mydata, strata.col=2, locus.col = 3, sort.strata = FALSE, na.action ="filter")
Error: Error in FUN(X[[i]], ...) : 'x' must be numeric
My data frame, which was produced by converting a gtypes object to a data frame, is structured as shown below.
I suspect the problem is my data frame. Is there a specific function to convert a genind, gl, or gtypes object to a data frame in the 0, 1, 2 format required for the sfs analysis?
id stratum SNP1 SNP2 SNP3 SNP4.......................... SNPn
ind1 1 01 02
ind1 1
ind2 .
ind2 .
. .
indn .
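
The "'x' must be numeric" error suggests that allele labels such as 01 and 02 are being passed where counts are expected. A minimal base R sketch of turning a layout like the one above (one row per allele copy, two rows per individual) into 0/1/2 counts for a single locus is shown below. The helper to_alt_counts() is hypothetical, and the exact input that sfs() expects should be checked against its help page.

```r
# Hypothetical helper: copies of the allele `alt` per individual.
# An individual with any missing allele copy gets NA.
to_alt_counts <- function(id, allele, alt) {
  tapply(allele == alt, id, sum)
}

# Toy data for one SNP, two rows per diploid individual
id     <- c("ind1", "ind1", "ind2", "ind2", "ind3", "ind3")
allele <- c("01",   "02",   "02",   "02",   "01",   "01")
to_alt_counts(id, allele, alt = "02")
```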

Lacking 'locus.info' in p variable (fastsimcoal)

Dear Eric,
Firstly, I love your package!
I have just installed it on Linux and try to run some simple simulations. My code is as follows:

genetics <- fscSettingsGenetics(
  fscBlock_dna(20, 1e-4, chromosome = 1),
  fscBlock_dna(20, 1e-5, chromosome = 2),
  fscBlock_dna(20, 1e-7, chromosome = 3),
  fscBlock_dna(20, 1e-10, chromosome = 4)
)
params <- fscWrite(demes,genetics,use.wd = TRUE)
fscRun(params)
proj <- fscReadArp(params,marker = "dna",sep.chrom = TRUE)

Unfortunately, the last line produces an error. I have tracked it down to the .fscParseAllSites function, where it gets stuck at:
gen.data <- vector("list", nrow(locus.info))
The thing is that fscWrite does not fill in the locus.info slot in the parameters. I would fix it myself, but I am not sure what it should contain. My only guess is that it might be equal to p$settings$genetics. Am I right?
Maciek Konopiński

labelHaplotypes with unambiguously assigned haplotypes

The code for renaming ambiguous haplotypes as NA appears to be breaking the code for generating gtypes. I'm getting the following error when running labelHaplotypes(my.gtype):

Error in .local(.Object, ...) :
the following haplotypes can't be found in sequences for locus 'haps': NA
In addition: Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
display list redraw incomplete
2: In doTryCatch(return(expr), name, parentenv, handler) :
invalid graphics state
3: In doTryCatch(return(expr), name, parentenv, handler) :
invalid graphics state
4: In doTryCatch(return(expr), name, parentenv, handler) :
display list redraw incomplete
5: In doTryCatch(return(expr), name, parentenv, handler) :
invalid graphics state
6: In doTryCatch(return(expr), name, parentenv, handler) :
invalid graphics state

Issues with downloading StrataG on R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"

Dear Dr. Archer,

I am having issues with downloading strataG on the most recent version of R (R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"). What version of R should I use to use strataG? Thank you very much! Below is the error message.

I do look forward to using this package for my work!

aloha,
Jon

install.packages("strataG")
Warning in install.packages :
package ‘strataG’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

Issues with read.arlequin

Hello,
There are a couple of issues with the function read.arlequin. First, it fails to recognize the appropriate DataType even though they are correctly annotated in the input file. The problem lies in the gsub portion of the code where your regular expression is not correctly extracting the DataType (and the title). It is extracting:

data.type
[1] " \tDataTypeDNA"
instead of
data.type
[1] "DNA"
Just modifying this part I was able to read in the file without any errors until I reach the haploid.mat step where I get the following error:
Error in x[i, 3] : subscript out of bounds

Any fixes or suggestions are really appreciated.
Thanks
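
For illustration, a regular expression that strips the key, the equals sign, and any surrounding whitespace or quotes could look like the base R sketch below. The helper extract_value() is hypothetical and is not the code used inside read.arlequin. It assumes header lines of the form key=value.

```r
# Hypothetical helper: return the value from a "key=value" header line,
# dropping leading whitespace, the key, the "=", and enclosing quotes.
extract_value <- function(line, key) {
  pattern <- paste0("^[[:space:]]*", key, "[[:space:]]*=[[:space:]]*")
  value <- sub(pattern, "", line)
  value <- gsub("^[\"']|[\"'][[:space:]]*$", "", value)
  trimws(value)
}

extract_value(" \tDataType=DNA", "DataType")
extract_value("Title=\"An example\"", "Title")
```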

readGenData doesn't exclude NA's

readGenData with na.strings = c(NA, "NA", "", " ", "?", ".") incorporates "NA" as an allele, not missing data.
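
As a point of comparison, base R's read.csv treats every string listed in na.strings as missing, including in character columns. The sketch below uses made-up data and does not call readGenData, so it only shows the behaviour one would expect, not where the bug lies.

```r
# Toy genotype file with three different missing-data codes
raw <- "id,loc1.1,loc1.2\nind1,120,NA\nind2,?,122\nind3,124,."

geno <- read.csv(text = raw, na.strings = c("NA", "", " ", "?", "."),
                 colClasses = "character")

is.na(geno)                                   # TRUE where a code was matched
sort(unique(na.omit(unlist(geno[, -1]))))     # alleles that remain
```

If "NA" shows up in the list of remaining alleles after a similar check on readGenData output, the missing-data codes are not being applied.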

Error in structureRun

I wasn't able to run STRUCTURE with the structureRun function. I keep getting the following error.

structureRun(msats.g, k = 2:5, num.k.rep = 1)
Error in structureRead(files["out"], sw.out$pops) :
the file 'dolphin.msats.structureRun/dolphin.msats.structureRun.k2.r1_out_f' can't be found.

Extraction of confidence intervals in STRUCTURE

Hello, thanks for the very nice package.
Would it be possible to extract the confidence intervals when running STRUCTURE? They correspond to the "credible regions" in the parameters.
Thanks a lot.

structureWrite refers to defunct loci function

Attempting to use structureRun in v2.0.1, encountering "Error: could not find function "loci" ". structureWrite refers to it.

Wandering through the commits, it looks like it was removed during the 2.0 update to the gtypes object.

NaN result in Fu's FS test

I have used the fusFs function to calculate FS on my datasets, however several results return a NaN. I am using read.dna to read in standard alignment files (COI) in FASTA format. Number of sequences per FASTA file range from 100-500.

Calculate Fu's Fs p-value.

For function fusFs. Requires incorporating coalescent simulation (use coala or scrm package?).

Fastsimcoal automation

Hi,
I guess this is a very hard time for you after releasing such a revolutionary version of the software. I really appreciate your work. But I had to downgrade back to 2.0.2 because of the new method of passing data to fastsimcoal. I managed to rewrite the script to produce input files automatically (24 microsatellite loci, 6 different limits on allele number, 4 different mutation rates, and 4 populations with different histories) and to run some automated simulations, but I did not manage to get fastsimcoal running. I know the problem may be with my system, because I was not able to run it manually either, but the par file from 2.0.2 works perfectly. Below you will find the piece of script I used for simulations so that you can understand what automation I'm talking about. Maybe that will inspire you ;)
I managed to rewrite most of the blocks, except for the new format event block. The main obstacle is that you cannot provide multiple data as I did before. In the end I found I can build normal data.frames and process them by the new functions so they get the desired classes and attributes, but building them is much more tricky. And after all that work I was very disappointed to see that fsc hangs.
I will try to attach param files from the old and new versions, but I'm waiting for my simulations to finish, and I have to recreate all the loops I wrote for building new param file.
Cheers
Maciek Konopiński

popSize <- 10000
# mutation rates in loci
mutRates  <- c(0.0001, 0.0002, 0.0005, 0.001)
# max no. of alleles in loci
maxRange <- c(3, 6, 9, 12, 15, 20)
numLoci <- length(mutRates) * length(maxRange)
# bottleneck sizes
botSize <- c(500, 50, 20)
#historical events
botTime <- 20 # since t0
botLength <- 20 # since the beginning of the bottleneck
splitTime <- 50 # since t0
numPops <- length(botSize) + 1

splSize <- rep(popSize, numPops)

pop.info <- strataG::fscPopInfo(
  pop.size = rep(popSize, numPops),
  sample.size = splSize,
  sample.times = c(rep(0, numPops))
)

# Setting the parameters for fastsimcoal
# Parameters of microsatellite loci
locus.params <-
  strataG::fscLocusParams(
    locus.type = "msat",
    num.loci = 1,
    mut.rate = rep(mutRates, each = length(maxRange)),
    #proportion of non-stepwise mutations
    gsm.param = 0.2,
    #maximum number of alleles at locus
    range.constraint = rep(maxRange, length(mutRates)),
    #diploid individuals
    ploidy = 2,
    #24 chromosomes (free recombination)
    chromosome = c(1:(numLoci))
  )
# Parameters of coalescent analysis
# dates (in generations) of historical events
hist.ev <- strataG::fscHistEv(
  num.gen = c(
    rep(botTime, numPops - 1),
    rep(botTime + botLength, numPops - 1),
    rep(splitTime, numPops - 1)
  ),
  source.deme = as.numeric(c(seq(1:(numPops - 1)),
                             seq(1:(numPops - 1)),
                             seq(1:(numPops - 1)))),
  sink.deme = c(seq(1:(numPops - 1)),
                seq(1:(numPops - 1)),
                rep(0, numPops - 1)),
  prop.migrants = c(rep(1, 3 * (numPops - 1))),
  #simulating bottlenecks and recovery to previous popsize
  new.sink.size = c(as.numeric(rep(botSize / popSize)),
                    as.numeric(rep(popSize / botSize)),
                    rep(1, numPops - 1))
)
simulated <- strataG::fastsimcoal(
    pop.info,
    locus.params,
    mig.rates = NULL,
    hist.ev,
    num.cores = 12,
    exec = paste0("fsc26"),
    delete.files = TRUE,
    quiet = TRUE
  )

incorrect result for gtypes2genind

I noticed a bug in gtypes2genind where the ids and strata columns are being incorporated as loci in the genind object:

library('strataG')
library('magrittr')
data(msats.g)
msats.g %>% gtypes2genind() %>% locNames()
## [1] "ids"    "strata" "D11t"   "EV37"   "EV94"   "Ttr11"  "Ttr34"

Here's my session info:

Session info -------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.0 (2016-05-03)
 system   x86_64, darwin13.4.0        
 ui       RStudio (0.99.1172)         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2016-05-06                  

Packages -----------------------------------------------------------------------------------
 package      * version     date       source                                  
 abind          1.4-3       2015-03-13 CRAN (R 3.2.0)                          
 acepack        1.3-3.3     2014-11-24 CRAN (R 3.2.0)                          
 ade4         * 1.7-4       2016-03-01 CRAN (R 3.2.3)                          
 adegenet     * 2.0.2       2016-04-29 Github (thibautjombart/adegenet@e61bc06)
 ADGofTest      0.3         2011-12-28 CRAN (R 3.2.0)                          
 ape          * 3.4-0.3     2016-02-23 local                                   
 apex         * 1.0.2       2016-03-31 CRAN (R 3.2.4)                          
 assertthat     0.1         2013-12-06 CRAN (R 3.2.0)                          
 boot           1.3-18      2016-02-23 CRAN (R 3.2.3)                          
 chron          2.3-47      2015-06-24 CRAN (R 3.2.0)                          
 cluster        2.0.4       2016-04-18 CRAN (R 3.3.0)                          
 coda           0.18-1      2015-10-16 CRAN (R 3.2.0)                          
 colorspace     1.2-6       2015-03-11 CRAN (R 3.2.0)                          
 copula         0.999-14    2015-10-26 CRAN (R 3.2.0)                          
 data.table     1.9.6       2015-09-19 CRAN (R 3.2.0)                          
 DBI            0.4         2016-05-02 CRAN (R 3.2.5)                          
 deldir         0.1-12      2016-03-06 CRAN (R 3.2.4)                          
 devtools       1.11.1      2016-04-21 CRAN (R 3.3.0)                          
 digest         0.6.9       2016-01-08 CRAN (R 3.2.3)                          
 dplyr          0.4.3       2015-09-01 CRAN (R 3.2.0)                          
 fastmatch      1.0-4       2012-01-21 CRAN (R 3.2.0)                          
 foreign        0.8-66      2015-08-19 CRAN (R 3.2.0)                          
 Formula        1.2-1       2015-04-07 CRAN (R 3.2.0)                          
 gdata          2.17.0      2015-07-04 CRAN (R 3.2.0)                          
 ggplot2        2.1.0       2016-03-01 CRAN (R 3.2.3)                          
 gmodels        2.16.2      2015-07-22 CRAN (R 3.2.0)                          
 goftest        1.0-3       2015-07-03 CRAN (R 3.2.0)                          
 gridExtra      2.2.1       2016-02-29 CRAN (R 3.2.4)                          
 gsl            1.9-10.1    2015-12-02 CRAN (R 3.2.3)                          
 gtable         0.2.0       2016-02-26 CRAN (R 3.2.3)                          
 gtools         3.5.0       2015-05-29 CRAN (R 3.2.0)                          
 Hmisc          3.17-4      2016-05-02 CRAN (R 3.3.0)                          
 htmltools      0.3.5       2016-03-21 CRAN (R 3.2.4)                          
 httpuv         1.3.3       2015-08-04 CRAN (R 3.2.0)                          
 igraph         1.0.1       2015-06-26 CRAN (R 3.2.0)                          
 lattice        0.20-33     2015-07-14 CRAN (R 3.2.0)                          
 latticeExtra   0.6-28      2016-02-09 CRAN (R 3.2.3)                          
 LearnBayes     2.15        2014-05-29 CRAN (R 3.2.0)                          
 magrittr       1.5         2014-11-22 CRAN (R 3.2.0)                          
 mapdata        2.2-6       2016-01-14 CRAN (R 3.2.3)                          
 maps           3.1.0       2016-02-13 CRAN (R 3.2.3)                          
 MASS           7.3-45      2015-11-10 CRAN (R 3.2.2)                          
 Matrix         1.2-6       2016-05-02 CRAN (R 3.3.0)                          
 memoise        1.0.0       2016-01-29 CRAN (R 3.2.3)                          
 mgcv           1.8-12      2016-03-03 CRAN (R 3.2.4)                          
 mime           0.4         2015-09-03 CRAN (R 3.2.0)                          
 munsell        0.4.3       2016-02-13 CRAN (R 3.2.3)                          
 mvtnorm        1.0-5       2016-02-02 CRAN (R 3.2.3)                          
 nlme           3.1-127     2016-04-16 CRAN (R 3.3.0)                          
 nnet           7.3-12      2016-02-02 CRAN (R 3.2.3)                          
 nnls           1.4         2012-03-19 CRAN (R 3.2.0)                          
 pegas          0.9         2016-04-16 CRAN (R 3.2.5)                          
 permute        0.9-0       2016-01-24 CRAN (R 3.2.3)                          
 phangorn     * 2.0.3       2016-05-01 CRAN (R 3.2.5)                          
 plyr           1.8.3       2015-06-12 CRAN (R 3.2.0)                          
 polyclip       1.5-6       2016-04-03 CRAN (R 3.2.4)                          
 poppr        * 2.1.1.99-47 2016-05-05 local                                   
 pspline        1.0-17      2015-06-29 CRAN (R 3.2.0)                          
 quadprog       1.5-5       2013-04-17 CRAN (R 3.2.0)                          
 R6             2.1.2       2016-01-26 CRAN (R 3.2.3)                          
 RColorBrewer   1.1-2       2014-12-07 CRAN (R 3.2.0)                          
 Rcpp           0.12.4      2016-03-26 CRAN (R 3.2.4)                          
 reshape2       1.4.1       2014-12-06 CRAN (R 3.2.0)                          
 rpart          4.1-10      2015-06-29 CRAN (R 3.2.0)                          
 rsconnect      0.4.3       2016-05-02 CRAN (R 3.3.0)                          
 scales         0.4.0       2016-02-26 CRAN (R 3.2.3)                          
 seqinr         3.1-3       2014-12-17 CRAN (R 3.2.0)                          
 shiny          0.13.2      2016-03-28 CRAN (R 3.2.4)                          
 sp             1.2-3       2016-04-14 CRAN (R 3.3.0)                          
 spatstat       1.45-0      2016-03-10 CRAN (R 3.2.4)                          
 spdep          0.6-4       2016-04-12 CRAN (R 3.2.4)                          
 stabledist     0.7-0       2015-05-04 CRAN (R 3.2.0)                          
 strataG      * 1.0.5       2016-05-06 Github (ericarcher/strataG@c29d1fc)     
 stringi        1.0-1       2015-10-22 CRAN (R 3.2.0)                          
 stringr        1.0.0       2015-04-30 CRAN (R 3.2.0)                          
 survival       2.39-2      2016-04-16 CRAN (R 3.3.0)                          
 swfscMisc      1.1         2016-03-03 CRAN (R 3.2.4)                          
 tensor         1.5         2012-05-05 CRAN (R 3.2.0)                          
 vegan          2.3-5       2016-04-09 CRAN (R 3.2.4)                          
 withr          1.0.1       2016-02-04 CRAN (R 3.2.3)                          
 xtable         1.8-2       2016-02-05 CRAN (R 3.2.3)

Error with genind2gtypes 'no ids in 'schemes' are in 'gen.data' or 'ind.names''

I was converting a genind to a gtype and got this error:

Error: no ids in 'schemes' are in 'gen.data' or 'ind.names'
Traceback:

1. genind2gtypes(cp)
2. df2gtypes(x = gen.mat, ploidy = x@ploidy[1], id.col = NULL, strata.col = if (has.pop) 1 else NULL, 
 .     loc.col = if (has.pop) 2 else 1, schemes = x@strata, other = list(genind = adegenet::other(x)))
3. methods::new("gtypes", gen.data = gen.data, ploidy = ploidy, 
 .     ind.names = ind.names, strata = strata, schemes = schemes, 
 .     sequences = sequences, description = description, other = if (is.null(other)) list() else other)
4. initialize(value, ...)
5. initialize(value, ...)
6. .local(.Object, ...)
7. stop("no ids in 'schemes' are in 'gen.data' or 'ind.names'", 
 .     call. = FALSE)

These are the parameters of my genind object:

/// GENIND OBJECT /////////

 // 605 individuals; 2,809 loci; 5,618 alleles; size: 14.6 Mb

 // Basic content
   @tab:  605 x 5618 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-2)
   @loc.fac: locus factor for the 5618 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: repool(c(D, N, F, DFf1, DFf2, DFf1xD, DFf1xF, DFf1DxD, DFf1FxF, 
    DFf1xN, DNf1, DNf2, DNf1xD, DNf1DxD, DNf1xN, DNf1NxN, DNf1xF, 
    DNf1FxF, FNf1, FNf2, FNf1xF, FNf1FxF, FNf1xN, FNf1NxN, FNf1NNxN, 
    FNf1xD, FNf1DxD))

 // Optional content
   @pop: population of each individual (group size range: 20-40)
   @strata: a data frame with 2 columns ( INDIVIDUALS, POP_ID )

It seems like genind2gtypes is setting the schemes parameter to strata(genind), so it raises an error because my strata(genind) columns have a different name than the 'id' required by schemes. I was able to avoid the error by changing the column name:
colnames(strata(sim.gind)) <- c("id","POP_ID")

Error in df2gtypes: "the number of genes in 'sequences' is not equal to the number of loci"

I'm using strataG (v2.5.01; R version 4.2.0) to run fastsimcoal2 simulations (with DNA markers) and convert the Arlequin outputs to gtypes objects. However, when I run fsc2gtypes on my fsc parameters object, I get the error in the title. I was able to trace the error back to a line in the df2gtypes function, but I haven't been able to fix it yet.

I've included a reprex below:

library(strataG)
# Simulation settings
demeA <- fscDeme(deme.size = 10, sample.size = 10)
dna <- fscBlock_dna(sequence.length = 15, mut.rate = 1e-3)
DNAgenetics <- fscSettingsGenetics(dna, dna, dna, num.chrom = 1)
# Build parameters and run
DNA_Demo.params <- fscWrite(demes = fscSettingsDemes(demeA), genetics = DNAgenetics, 
                                          label = "DNAmarker_Demo", use.wd=TRUE)
DNA_Demo.params <- fscRun(DNA_Demo.params, num.sims = 5, all.sites = TRUE)
# Convert to gtype
DNA_Demo_gtype <- fsc2gtypes(DNA_Demo.params, marker = "dna")
# Error: the number of genes in 'sequences' is not equal to the number of loci

Ultimately, my goal is to convert the Arlequin outputs to genind, and I was planning on doing this by calling gtypes2genind next. If there's something I'm messing up, or there's another means of achieving this conversion (Arlequin to genind) using strataG, please let me know.

Thank you for your time, and for creating an incredibly useful package!

Enhancement request: extra-parameters ALPHA and UNIFPRIORALPHA as arguments

Hi Eric,

Thanks for creating strataG. I was wondering if it would be possible to make the extra parameters ALPHA and UNIFPRIORALPHA arguments that can be defined by the user. That way one could follow the suggestions in Wang 2017 ("The computer program STRUCTURE for assigning individuals to populations: easy to use but easier to misuse") and try the alternative ancestry prior and different values of ALPHA when analysing unbalanced samples.

Cheers,

Fernanda.

nucleotideDivergence() not recognising alleles

Hi Eric,

Thanks so much for making such a brilliant package!

Something strange is going on with the nucleotideDivergence() function for me and I just can't figure out why it's not working.

I'm using a gtypes file with 122 unique haplotypes, but I get this error message when I try to run the function:

div <- strataG::nucleotideDivergence(mt_haps)

Error in utils::combn(.data$allele, 2) : n < m

Which is from the line "haps <- utils::combn(.data$allele, 2)" in nucleotideDivergence().

When I check my number of alleles:

length(unique(mt_haps@data$allele))
122

I'm using strataG v2.4.905.

My full code and datasets are here: https://github.com/mariaemilyd/test_repo

Thanks very much,

Maria
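
For context, the `n < m` message comes from `utils::combn` itself and is raised whenever the vector it receives has fewer elements than the requested combination size. One possible cause, not confirmed in the report, is that the function is evaluated on a subset (for example a single stratum or locus) that contains only one haplotype, even though the full data set has 122. The sketch below uses made-up haplotype labels to reproduce the message in isolation.

```r
# combn() needs at least as many elements as the combination size
alleles_ok <- c("hap1", "hap2", "hap3")
n_pairs <- ncol(utils::combn(alleles_ok, 2))
n_pairs

# A group that contains a single haplotype triggers the same message
single <- "hap1"
msg <- tryCatch(
  utils::combn(single, 2),
  error = function(e) conditionMessage(e)
)
msg
```

Checking the number of distinct haplotypes within each stratum, rather than across the whole data set, may therefore help narrow down where the call fails.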

Enhancement request: convert genlight to gtypes

It would be great if there were a function to convert genlight to gtypes to facilitate integration of adegenet and strataG with larger SNP datasets.

Calculation of critical M

It would be really useful to calculate the critical M through simulation as described in Garza and Williamson (2001), for comparison with the empirical M-ratio.

ldNe issues

Hello,

I'm trying to calculate ldNe for my data. For this, I first converted my genind object to gtypes (genind2gtypes). The ldNe function, however, does not work after this conversion. I get the warning message "No loci are biallelic", but the summary seems to show that my data are fine.

Thanks for your help!

> ldNe(gen50,maf.threshold = 0.02,ci=0.95)
NULL
Warning message:
No loci are biallelic. NULL returned. 
> summary(gen50)

<<< gtypes created on 2018-08-07 16:05:22 >>>

Contents: 680 samples, 14 loci, 2 strata

Strata summary:
            num.samples num.missing num.alleles prop.unique.alleles heterozygosity
SubPop0-130         130  0.07142857    10.64286          0.06907160      0.7647841
SubPop1-550         550  0.07142857    12.28571          0.01938025      0.7829731

Locus summary:
         num.genotyped num.alleles prop.unique.alleles obsvd.heterozygosity
A24                680          11          0.00000000            0.7323529
B22                680           8          0.00000000            0.7117647
CloneA2            680          12          0.08333333            0.7779412
Kpa24_2            680          17          0.00000000            0.9014706
Mluc8              680          10          0.00000000            0.8132353
Kpa16_2            680          14          0.00000000            0.8794118
CA38               678          16          0.00000000            0.8952802
CA11               680          12          0.00000000            0.6897059
Mnatt6             680          10          0.10000000            0.7808824
G6                 680           5          0.00000000            0.6705882
C112_2             680          11          0.00000000            0.6397059
MS3D02             680          17          0.05882353            0.6941176
H23_Mluc           680          15          0.06666667            0.8500000
B23                680          15          0.00000000            0.8764706

arlequinRead() doesn't read my samples

Hi Eric!

I didn't see this issue reported, so I want to ask you about a problem that I have when executing the function arlequinRead.

I have an Arlequin file (.arp) of SNP data. When I execute the function arlequinRead I get no error in the console. Nevertheless, when I check the object that was created (a list), the "data.info" element shows a "list of length 0". This is how it looks:

I've tried different Arlequin files that I have, but it always shows me the same result. So I can't go on to arp2gtypes to continue with the analyses. I don't know if there is a problem with my data, and I don't understand why the function doesn't read my samples. This is a picture of the structure of my data:

I'm using v2.4.905 of strataG

Cheers,
Valentina

Parameter estimation using fastsimcoal - using non-SNP data

Hi Eric,

Me again! Not really an issue but more of a question.

Does the strataG interface for fastsimcoal allow for parameter estimation using genetic data other than SNPs (e.g., microsats/STRs)?

I noticed if I try and specify parameters using fscSettingsEst() it won't work if you don't specify an obs.sfs?

Error in fscSettingsEst(fscEstParam("NCUR", is.int = TRUE, distr = "unif", :
object 'obs.sfs' not found

Dependency {rmetasim} no longer available for R-4.0.4

Hi Dr. Archer,

I was trying to install {strataG}, and I've also tried installing it from GitHub but it was still unsuccessful. It seems like one of the dependencies, {rmetasim}, was removed from CRAN. I was wondering if there are any fixes? Thank you for helping!

Kind regards,
Erica

.fscWriteEst problem with complex params missing

Dear Eric,
I do not want to spoil you but you might have already noticed I adore your package, although I focus mostly on running fastsimcoal with the help of strataG. I think I found a minor problem with the .fscWriteEst function. When there are no complex parameters to estimate, the function fscWrite returns an error: Error in `$<-.data.frame`(`*tmp*`, "name", value = " = ") : replacement has 1 row, data has 0
The thing is that in .fscWriteEst in line:
cmplx.df$name <- paste0(cmplx.df$name, " = ", cmplx.df$value)
there are empty characters for both cmplx.df$name and cmplx.df$value, so paste returns only =.
The solution that worked for me was to put a condition if(nrow(cmplx.df) != 0) on the two subsequent lines.
I would make a pull request with that fix but I rarely use git and forgot how to do that...
My simulations do not converge anyway (perhaps due to at least a few complex params) so I do not even know if the fix does make any sense.
Cheers
Maciek
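
The behaviour described above can be reproduced with base R alone. In this sketch `cmplx.df` is a stand-in with the two columns mentioned in the report; the real data frame inside .fscWriteEst may have other columns, so this is only an illustration of the guard, not the package's code.

```r
# Stand-in for the data frame of complex parameters; here it has zero rows,
# which is the case that triggers the error described above.
cmplx.df <- data.frame(
  name = character(0),
  value = character(0),
  stringsAsFactors = FALSE
)

# Guard suggested in the report: only build the "name = value" strings
# when there is at least one complex parameter.
if (nrow(cmplx.df) != 0) {
  cmplx.df$name <- paste0(cmplx.df$name, " = ", cmplx.df$value)
}

# Without the guard, paste0() on two empty vectors still returns one string,
# which cannot be assigned to a zero-row column.
unguarded <- paste0(cmplx.df$name, " = ", cmplx.df$value)
length(unguarded)
```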

Can't use ldNe on windows

Hi Eric,

I am trying to use the ldNe function in R v 3.6.0, but when I run the code:

>ne <- ldNe(gty, maf.threshold = 0, by.strata = FALSE, ci = 0.95)

I get this error:

Error in mclapply(1:ncol(loc.pairs), compLoc, loc.pairs = loc.pairs, mat = mat, :
'mc.cores' > 1 is not supported on Windows

Is there a way to solve this? Is it a problem of my R version that maybe I can solve?

Thank you very much!
Anna
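
The error is raised by `parallel::mclapply`, which relies on forking and therefore only accepts `mc.cores = 1` on Windows. A minimal sketch of a platform-aware fallback is shown below; `safe_cores` is a hypothetical helper, and whether ldNe exposes a core-count argument in a given strataG version is not stated in the report, so this illustrates the general pattern rather than a fix for the package.

```r
library(parallel)

# Hypothetical helper: mclapply() relies on forking, which Windows lacks,
# so fall back to a single core there.
safe_cores <- function(requested) {
  if (.Platform$OS.type == "windows") 1L else as.integer(requested)
}

# Toy workload standing in for the per-locus-pair computation
squares <- mclapply(1:4, function(i) i^2, mc.cores = safe_cores(2))
unlist(squares)
```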

Add schemes/strata with gtypes2genind/genind2gtypes

The conversion functions between gtypes and genind would benefit if they took advantage of the fact that the schemes (gtypes) and strata (genind) slots were passed with these functions. Is it possible to implement such a feature?

You do not have STRUCTURE installed.

Hi there,

I have installed strataG via GitHub and tried to run the example code:

library(strataG)

data(msats.g)

sr <- structureRun(msats.g, k.range = 1:4, num.k.rep = 10)

However, I get the following error message: sh: 1: structure: not found error in FUN(X[[i]], ...) : You do not have STRUCTURE installed.

Any help is highly appreciated,

Bastian
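
The `sh: 1: structure: not found` message suggests that the shell could not find an executable named `structure` on the PATH. A quick way to check this from R, using only base functions, is sketched below; the folder in the printed hint is a placeholder and not a real location.

```r
# Sys.which() returns "" for programs that are not on the PATH
structure_path <- Sys.which("structure")

if (nzchar(structure_path)) {
  message("STRUCTURE found at: ", structure_path)
} else {
  message("STRUCTURE not found on PATH; add its folder, for example:")
  # "/path/to/structure/folder" is a placeholder, not a real location
  message('Sys.setenv(PATH = paste("/path/to/structure/folder", ',
          'Sys.getenv("PATH"), sep = .Platform$path.sep))')
}
```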

Unable to install strataG or skeleSim in R v. 3.6.1

Dear Eric, Dear all,

Everything is in the title: I tried to install strataG (and also the package skeleSim) from GitHub on R 3.6.1, but this does not seem to work. I got an error message saying that R could not remove the prior installation of other packages (vctrs and tibble regularly appeared in the error messages).
If I understood correctly, there are currently some issues with package maintenance due to external factors, maybe related to R itself. Might this explain why I could not install the packages?

Sorry for this quite technical question! Many thanks for any answer!

Best regards,
Chrys

Misspelt word in the code for read.arlequin function

The word FREQUENCY is misspelt as FREQUNENCY and for that reason, the function is unable to recognise Arlequin files whose DataType is FREQUENCY
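
A standalone sketch of the check with the spelling corrected is given here for comparison with the listing below. `check_data_type` is a hypothetical function written for illustration; it is not the package's actual code.

```r
# Hypothetical standalone version of the DataType check with the
# spelling corrected; not the package's actual code.
check_data_type <- function(data.type) {
  valid <- c("FREQUENCY", "DNA", "MICROSAT")
  if (!toupper(data.type) %in% valid) {
    stop("DataType must be one of FREQUENCY, DNA, or MICROSAT")
  }
  toupper(data.type)
}

check_data_type("frequency")
```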

read.arlequin
function (file)
{
arp <- scan(file, what = "character", sep = "\n", quiet = TRUE)
getValues <- function(title, arp) {
x <- grep(title, arp, ignore.case = TRUE, value = TRUE)
gsub("^[[:alnum:]]+=", "", x)
}
title <- gsub("[[:punct:]]+", "", getValues("Title", arp))
data.type <- gsub("[[:punct:]]+", "", getValues("DataType",
arp))
if (!toupper(data.type) %in% c("FREQUNENCY", "DNA", "MICROSAT")) {
stop("DataType must be of FREQUENCY, DNA, or MICROSAT")
}
missing.data <- gsub("[']|\"", "", getValues("MissingData",
arp))
locus.separator <- gsub("[']|\"", "", getValues("LocusSeparator",
arp))
sample.name <- gsub("[']|\"", "", getValues("SampleName",
arp))
data.start <- grep("SampleData", arp, ignore.case = TRUE) +
1
data.end <- grep("[}]", arp)[1:length(data.start)] - 1
split.char <- switch(locus.separator, WHITESPACE = "[[:space:]]",
TAB = "[[:space:]]", NONE = "", locus.separator)
gen.data <- do.call(rbind, lapply(1:length(data.start), function(i) {
lines <- strsplit(arp[data.start[i]:data.end[i]], split = split.char)
lines <- lapply(lines, function(x) {
x <- x[x != ""]
x[x == missing.data] <- NA
x
})
max.len <- max(sapply(lines, length))
mat <- do.call(rbind, lapply(lines, function(x) {
if (length(x) < max.len)
c(rep(NA, max.len - length(x)), x)
else x
}))
for (r in 2:nrow(mat)) {
if (is.na(mat[r, 1]))
mat[r, 1] <- mat[r - 1, 1]
}
cbind(rep(sample.name[i], nrow(mat)), mat)
}))
haploid.mat <- function(x) {
new.mat <- do.call(rbind, lapply(1:nrow(x), function(i) {
mat <- x[rep(i, as.numeric(x[i, 3])), , drop = FALSE]
if (nrow(mat) > 1)
mat[, 2] <- paste(mat[, 2], 1:nrow(mat), sep = "")
mat[, 2] <- paste(mat[, 2], x[i, 1], sep = "")
cbind(mat[, c(2, 1), drop = FALSE], rep(x[i, 2],
nrow(mat)))
}))
colnames(new.mat) <- c("id", "strata", "gene")
new.mat
}
switch(toupper(data.type), DNA = {
new.mat <- haploid.mat(gen.data)
seq.mat <- gen.data[, c(2, 4), drop = FALSE]
seq.mat <- seq.mat[!duplicated(seq.mat[, 1]), , drop = FALSE]
seq.mat <- seq.mat[order(seq.mat[, 1]), ]
dna <- lapply(strsplit(seq.mat[, 2], ""), function(x) {
x[x == missing.data] <- "n"
tolower(x)
})
names(dna) <- seq.mat[, 1]
dna <- as.DNAbin(dna)
df2gtypes(new.mat, ploidy = 1, sequences = dna, description = title)
}, FREQUENCY = {
new.mat <- haploid.mat(gen.data)
df2gtypes(new.mat, ploidy = 1, description = title)
}, MICROSAT = {
ploidy <- unname(table(gen.data[, 2])[1])
new.mat <- do.call(rbind, split(gen.data[, -(1:3)], gen.data[,
2]))
freqs <- na.omit(gen.data[, 1:3])
new.mat <- cbind(freqs, new.mat[freqs[, 2], ])
new.mat <- do.call(rbind, lapply(1:nrow(new.mat), function(i) {
x <- new.mat[rep(i, as.numeric(new.mat[i, 3])), ,
drop = FALSE]
if (nrow(x) > 1) x[, 2] <- paste(x[, 2], 1:nrow(x),
sep = "_")
rownames(x) <- x[, 2]
x
}))
new.mat <- new.mat[, c(2, 1, 4:ncol(new.mat))]
loc.names <- paste("Locus", 1:(ncol(gen.data) - 3), sep = "")
loc.names <- paste(rep(loc.names, each = ploidy), 1:ploidy,
sep = ".")
colnames(new.mat) <- c("id", "strata", loc.names)
df2gtypes(new.mat, ploidy = ploidy, description = title)
})
}
<bytecode: 0x7fd2f54b90e8>
<environment: namespace:strataG>

Issue: Cannot execute arp2gtypes on Arlequin input file

Hello,

I hadn't seen this anywhere else, so please excuse me if this has been asked before. I have an Arlequin input file that I'd like to import into strataG (DataType=FREQUENCY), but when I try to execute arp2gtypes, I get the following error:

Error in '$<-.data.frame'('*tmp*', "id", value = c("_1", "_0")) : 
  replacement has 2 rows, data has 0

Here is the first section of my input file:

[Profile]
	Title="HFCR2"
	NbSamples=5
	DataType=FREQUENCY
	GenotypicData=0
	LocusSeparator=NONE

[Data]
[[Samples]]
	SampleName="Catalina"
	SampleSize=29
	SampleData={
		C-F4	17
		C-F8	5
		C-F13	5
		C-F17	1
		C-F53	1
		M-F65	0
		M-F68	0
		PV-F154	0
	}
	SampleName="Malibu"
	SampleSize=28
	SampleData={
		C-F4	7
		C-F8	3
		C-F13	8
		C-F17	6
		C-F53	0
		M-F65	2
		M-F68	2
		PV-F154	0
	}
	SampleName="PV"
	SampleSize=30
	SampleData={
		C-F4	6
		C-F8	1
		C-F13	10
		C-F17	6
		C-F53	0
		M-F65	2
		M-F68	3
		PV-F154	2

Each identifier (e.g., C-F4) is a unique haplotype. I'm sure I'm missing something obvious. The only time I've gotten the function to work is with DataType=DNA, where ID and the sequence for each individual is listed once. The problem is, when I do that, strataG thinks that each haplotype is unique, and Fst ends up being calculated as zero (which I know isn't the case).

Any tips or advice would be much appreciated.

Cheers,
Sean

EDIT: Yikes, sorry about the formatting. I'll work on fixing that...aaaaand fixed!
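
One possible workaround, sketched here under assumptions, is to expand the haplotype counts into one row per individual before building the object. The counts are copied from the "Catalina" block above; `expand_counts` and the generated ids are hypothetical. The read.arlequin listing elsewhere on this page builds a matrix with columns "id", "strata", and "gene" and passes it to df2gtypes with ploidy = 1, so a data frame of this shape may be usable in the same way, though that is not verified here.

```r
# Haplotype counts for one sample, copied from the "Catalina" block above
# (haplotypes with a count of 0 are omitted)
counts <- c("C-F4" = 17, "C-F8" = 5, "C-F13" = 5, "C-F17" = 1, "C-F53" = 1)

# Hypothetical helper: one row per individual, with made-up ids
expand_counts <- function(counts, stratum) {
  haps <- rep(names(counts), times = counts)
  data.frame(
    id = paste(stratum, seq_along(haps), sep = "_"),
    strata = stratum,
    haplotype = haps,
    stringsAsFactors = FALSE
  )
}

catalina <- expand_counts(counts, "Catalina")
nrow(catalina)
table(catalina$haplotype)
```

The row count should match the SampleSize declared for the sample, which is a useful sanity check before combining the per-sample data frames with rbind.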

Enhancement request: ldNe microsat implementation

Hello Eric,

As per our emails I'm writing to request an ldNe implementation for microsats as an enhancement.

Cheers,
Colin

Error code 1 when running STRUCTURE

Dear Eric,

Thank you for creating strataG, it is a wonderful tool.

I am trying to run STRUCTURE via your package, but I am encountering the following error:

Error in FUN(X[[i]], ...) :
Error running STRUCTURE. Error code 1 returned.

Briefly, I first converted my genlight object into a gtypes object, and ran the following code for STRUCTURE mirroring the vignette's:

library(strataG)

gtypes <- genlight2gtypes(gl.rubi)

gtypes

Sys.setenv(PATH = paste("C:\\Program Files (x86)\\Structure2.3.4", Sys.getenv("PATH"), sep = ";"))

set.seed(1804)

sr <- structureRun(gtypes, k.range = 1:6, num.k.rep = 6)

I could not find what this error refers to, so any further guidance would be much appreciated. Thank you in advance!

labelHaplotypes

After applying the function labelHaplotypes, the functions that estimate parameters such as nucleotide diversity or haplotype diversity return an error message:
Example

secprueba <- sequence2gtypes(prueba, strata)
secpruebaL <- labelHaplotypes(secprueba)
ndgL <- nucleotideDivergence(secpruebaL)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘ploidy’ for signature ‘"list"’

The problem is solved using:

ndgL <- nucleotideDivergence(secpruebaL$gtypes)

Cheers


#the two first letters in "SampleSize" is missing and carriage return before "SampleData="

or #"Sa" is missing from "SampleSize" and "SampleData"

or #1st sample name is missing

or #some random carriage returns are missing and names are incomplete
