
spacemix's Introduction

############################################################
############################################################
#	SpaceMix ReadMe
############################################################
############################################################

PREAMBLE
Caveat user: this code is still in beta.  Please contact
Gideon Bradburd at bradburd(at)umich(dot)edu with bug
reports or questions.  Please thoroughly read the 
documentation and go through the vignette first to make sure 
your question has not already been addressed.

Also, remember to check the GitHub repository for updates!

README
This repository contains the R package SpaceMix and accompanying 
vignette.  To install the package you can either:

(1) download the .tar.gz file and install using 

> install.packages("~/SpaceMix_X.tar.gz",repos=NULL)

where "X" in the file name denotes the number of 
the most recent version.

OR

(2) install directly from github using 

> require(devtools)
> install_github("gbradburd/SpaceMix",build_vignettes=TRUE)

You can then look at the vignette with 

> vignette("spacemix_vignette")

and you can see all documented functions using 

> ??SpaceMix

The citation for the accompanying paper is:
Bradburd GS, Ralph PL, Coop GM (2016) A Spatial Framework for Understanding Population Structure and Admixture. PLoS Genet 12(1): e1005703

and the DOI link is below:
http://dx.doi.org/10.1371/journal.pgen.1005703


spacemix's Issues

Should we move from SNPs to averages over regions?

Currently, we think about thinning SNPs to make them independent. But, why not average them in windows, since then the central limit theorem says the result should fit our Gaussian model better anyhow?

Specifically, we could:

  • subtract off SNP means
  • take the average genotype in windows
  • normalize each (across pops) to have variance 1

First step: see how the resulting covariance matrix compares.
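The three steps above can be sketched in base R. This is a minimal illustration under several assumptions and is not SpaceMix code:

- `window.average` is a hypothetical helper.
- The input is a populations-by-SNPs frequency matrix with SNPs in genomic order.
- Windows are fixed counts of consecutive SNPs rather than physical distances.

```r
# Hypothetical helper (not part of SpaceMix): average mean-centered allele
# frequencies in windows of consecutive SNPs, then scale each window so that
# its values across populations have variance 1.
#   freqs:       k x L matrix, populations in rows, SNPs in genomic order
#   window.size: number of consecutive SNPs per window (an assumption)
window.average <- function(freqs, window.size) {
  # Step 1: subtract off each SNP's mean across populations
  centered <- sweep(freqs, 2, colMeans(freqs))
  # Step 2: average within windows of consecutive SNPs
  window.id <- ceiling(seq_len(ncol(centered)) / window.size)
  win <- sapply(split(seq_len(ncol(centered)), window.id),
                function(idx) rowMeans(centered[, idx, drop = FALSE]))
  win <- matrix(win, nrow = nrow(freqs))  # keep a k x n.windows shape
  # Step 3: normalize each window to variance 1 across populations,
  # dropping any window with zero variance
  sds <- apply(win, 2, sd)
  keep <- sds > 0
  sweep(win[, keep, drop = FALSE], 2, sds[keep], "/")
}

set.seed(1)
freqs <- matrix(runif(5 * 1000), nrow = 5)  # simulated: 5 pops, 1000 SNPs
w <- window.average(freqs, window.size = 100)
dim(w)  # 5 10
# Population covariance from windows vs. from individual centered SNPs
cov.windows <- cov(t(w))
cov.snps <- cov(t(sweep(freqs, 2, colMeans(freqs))))
```

The two 5 x 5 matrices can then be compared directly. The SNP-level matrix here is a plain centered covariance, which may differ from the normalization SpaceMix applies internally.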

Negative infinity

I get the following error when running spacemix:

fast run 1 failed
Error in MCMC(model.option = fast.model.option, data.type = data.type, : Initial probability of model is NEGATIVE INFINITY! Please attempt to initiate chain again.

skipping fast_run_1 because it failed
Error in setwd(list.files()[grep("BestRun", list.files())]) :
character argument expected
In addition: Warning messages:
1: In sqrt(mean.freq.mat * (1 - mean.freq.mat)) : NaNs produced
2: In sqrt(mean.freq.mat * (1 - mean.freq.mat)) : NaNs produced
3: In file.rename(paste("../", fast.run.dirs[i], sep = ""), paste("../", :
cannot rename file '../fast_run_1' to '../failed_fast_run_1', reason 'Directory not empty'
4: In file.rename(fast.run.dirs[which.max(last.probs)], paste(fast.run.dirs[which.max(last.probs)], :
cannot rename file 'fast_run_1' to 'fast_run_1_BestRun', reason 'No such file or directory'

I tried initiating the MCMC a number of times but always get this error. The present dataset has only a single individual per population and about 50k SNPs. Let me know if any additional info would be useful for troubleshooting.
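The repeated `sqrt(mean.freq.mat * (1 - mean.freq.mat))` warnings deserve a closer look. In R, `sqrt()` warns "NaNs produced" only for negative arguments, which would mean that some mean frequency lies outside [0, 1]. That is an inference from the warning, not a confirmed diagnosis.

A quick pre-flight check of the input is sketched below. It assumes a populations-by-loci frequency matrix; `check.frequencies` is a hypothetical helper, not a SpaceMix function.

```r
# Hypothetical pre-flight check (not part of SpaceMix) for a k x L matrix of
# sample allele frequencies: populations in rows, loci in columns.
check.frequencies <- function(freqs) {
  locus.means <- colMeans(freqs, na.rm = TRUE)
  list(
    # missing entries (e.g. NaN from a sample size of 0)
    n.missing = sum(is.na(freqs)),
    # entries that cannot be frequencies at all
    n.out.of.range = sum(freqs < 0 | freqs > 1, na.rm = TRUE),
    # loci whose mean frequency is exactly 0 or 1 (invariant, if in range)
    n.invariant = sum(locus.means == 0 | locus.means == 1, na.rm = TRUE)
  )
}

freqs <- rbind(c(0, 0.5, 1, 1.2),
               c(0, 0.5, 1, NA))
str(check.frequencies(freqs))
```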

Parallel runs

When running spacemix on the same data (with different replicates / different models) at the same time, I found that the runs somehow interact with each other: all runs produce nearly identical output figures, some of which are clearly inconsistent with the parameters of the particular run.

Is there a list of temporary files/folders that are written somewhere? Or are there guidelines on how spacemix should be configured to avoid these interactions?
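The traceback in the "Negative infinity" issue above shows SpaceMix calling `setwd()` and scanning `list.files()` for a "BestRun" directory. Runs that share a working directory may therefore see each other's files.

One way to keep runs apart, sketched under that assumption with a hypothetical `run.isolated` helper:

- Give each replicate its own directory.
- Run each replicate in its own forked process via `parallel::mclapply`. Forking is unavailable on Windows, where `cores` must be 1.

Launching each analysis as a separate `Rscript` process from its own directory gives the same isolation.

```r
library(parallel)

# Hypothetical helper (not part of SpaceMix): run each replicate inside its
# own directory. A forked worker's setwd() does not affect other workers,
# and on.exit() restores the previous directory, so relative paths used by
# one run resolve inside that run's directory only.
run.isolated <- function(run.names, run.fun, base.dir = tempdir(), cores = 2) {
  mclapply(run.names, function(name) {
    run.dir <- file.path(base.dir, name)
    dir.create(run.dir, recursive = TRUE, showWarnings = FALSE)
    old.wd <- setwd(run.dir)
    on.exit(setwd(old.wd))
    run.fun(name)  # e.g. a function wrapping run.spacemix.analysis(...)
  }, mc.cores = cores)
}

# Usage: each replicate reports the directory it ran in
res <- run.isolated(c("rep1", "rep2"), function(name) basename(getwd()))
```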

Plotting admixture when there is no admixture

Hi Gideon.

Thank you for the great package, and method!

I am running the 'source_and_target' model, and there seems to be little evidence for admixture. Yet, when I make the geogenetic plot, I get large dotted circles that encompass all populations. Is this by design?

Also, as a minor suggestion, you should add a citation and link to the paper to the repo's README.

Thanks again.

Best.

Anders.

make.spacemix.map requires X11

Not exactly a bug, but undesirable if it can be avoided: even if a different output device is specified (I tend to use svg), make.spacemix.map requires a working X11 connection.

Some people (me at least) tend not to forward X on their server logins unless it is really needed. It causes annoyances with persistent sessions under screen or tmux, and X can be (sometimes very) slow when logging in from home or over other less-than-ideal connections.
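A generic workaround that may help is to open a file-based graphics device explicitly before plotting. Devices such as `pdf()`, and `svg()` on R builds with cairo support, do not need a display. This will not help if the function itself opens an X11 window, and whether `make.spacemix.map` draws onto an already-open device is an assumption. `plot.headless` is a hypothetical wrapper, not part of SpaceMix.

Running R under a virtual X server such as `xvfb-run` is another option.

```r
# Hypothetical wrapper (not part of SpaceMix): draw a plot into a file using
# a device that needs no X11 display, then close that device.
plot.headless <- function(file, plot.fun, device = grDevices::pdf, ...) {
  device(file, ...)                      # open a file device (pdf by default)
  this.dev <- grDevices::dev.cur()
  on.exit(grDevices::dev.off(this.dev))  # close it even if plotting fails
  plot.fun()
  invisible(file)
}

out <- plot.headless(tempfile(fileext = ".pdf"), function() plot(1:10))
# For SVG output on a cairo-enabled build, e.g.:
#   plot.headless("map.svg", function() make.spacemix.map(...), device = grDevices::svg)
# Check what this R build supports:
capabilities(c("X11", "cairo"))
```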

Burnin valid values?

Hi,

What is the valid value for burnin when displaying the SpaceMix results if I want to exclude, for example, the first 10,000 MCMC iterations out of a total of 100,000 iterations?
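Whichever units the burn-in argument expects (check the relevant function's help page), converting between MCMC iterations and saved samples is simple arithmetic. This assumes one sample is stored every `samplefreq` iterations, as the name of the `samplefreq` argument of `run.spacemix.analysis` presumably indicates.

The sketch below assumes `samplefreq = 1e3`, the value used in the example calls on this page, and uses a hypothetical helper, `burnin.in.samples`.

```r
# Hypothetical helper: convert a burn-in expressed in MCMC iterations into a
# number of saved samples, assuming one sample is saved every `samplefreq`
# iterations.
burnin.in.samples <- function(burnin.iter, samplefreq) {
  burnin.iter %/% samplefreq
}

ngen <- 1e5
samplefreq <- 1e3                           # assumed value
n.saved <- ngen / samplefreq                # 100 saved samples
burn <- burnin.in.samples(1e4, samplefreq)  # first 10 saved samples
trace <- rnorm(n.saved)                     # stand-in for a saved parameter trace
post.burnin <- trace[-seq_len(burn)]        # drop roughly the first 10,000 iterations
length(post.burnin)                         # 90
```

Whether the first saved sample corresponds to iteration 0 or to iteration `samplefreq` shifts this boundary by one sample.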

bypass SNP calling

Allow as input a sample covariance matrix estimated directly from e.g. low-coverage data.

Chains not mixing

When running my SpaceMix model, I don't think my chains are mixing well (no "fuzzy caterpillar" traces). As I would with other packages, I have tried running the model longer, but that hasn't seemed to improve anything. How can I tune the run.spacemix.analysis parameters to fix this? Below I have included my run code and various troubleshooting plots.

run.spacemix.analysis(n.fast.reps = 10,
                      fast.MCMC.ngen = 1e5,
                      fast.model.option = "source_and_target",
                      long.model.option = "source_and_target",
                      data.type = "sample.frequencies",
                      sample.frequencies = allele.frequencies,
                      mean.sample.sizes = mean.sample.sizes,
                      counts = NULL,
                      sample.sizes = NULL,
                      sample.covariance = NULL,
                      target.spatial.prior.scale = NULL,
                      source.spatial.prior.scale = NULL,
                      spatial.prior.X.coordinates = LonLat_sorted[,1],
                      spatial.prior.Y.coordinates = LonLat_sorted[,2],
                      round.earth = TRUE,
                      long.run.initial.parameters = NULL,
                      k = nrow(allele.frequencies),
                      loci = ncol(allele.frequencies),
                      ngen = 1e6,
                      printfreq = 1e2,
                      samplefreq = 1e3,
                      mixing.diagn.freq = 50,
                      savefreq = 1e5,
                      directory = NULL,
                      prefix = "G4t90_110")

[troubleshooting plots not reproduced here]
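The "fuzzy caterpillar" check can be made quantitative by estimating the effective sample size (ESS) of each saved parameter trace. A very small ESS relative to the number of saved samples indicates poor mixing.

The sketch below swaps in a standard autocorrelation-based estimator, simplified to truncate the sum at the first negative autocorrelation. It is not a SpaceMix function, and `ess` is a hypothetical name. `coda::effectiveSize` is an established alternative.

```r
# Effective sample size of a 1-D MCMC trace: n / (1 + 2 * sum(rho_k)),
# summing lag-k autocorrelations rho_k up to (not including) the first
# negative one. A simplified truncation rule; not part of SpaceMix.
ess <- function(x, max.lag = min(length(x) - 1, 1000)) {
  rho <- acf(x, lag.max = max.lag, plot = FALSE)$acf[-1]  # drop lag 0
  first.neg <- which(rho < 0)[1]
  if (!is.na(first.neg)) rho <- rho[seq_len(first.neg - 1)]
  length(x) / (1 + 2 * sum(rho))
}

set.seed(42)
well.mixed <- rnorm(5000)                                   # independent draws
sticky <- as.numeric(arima.sim(list(ar = 0.95), n = 5000))  # strongly autocorrelated
c(ess(well.mixed), ess(sticky))
```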

Possible numeric instability?

Hi Gideon.

In a dataset I am currently analysing, some of my short runs and my long run seem to have numerical stability issues:

748200  ----  1.907098e+13 
748300  ----  1.907098e+13 
748400  ----  1.907098e+13 
748500  ----  1.907098e+13 
Error in solve.default(par.cov) : 
  system is computationally singular: reciprocal condition number = 1.41589e-16

I had this problem with both the target and source_and_target models; I have not tried the other models. I am working from a smallish number of SNPs (420) and eight populations. However, they were all found to be outliers (I was hoping to compare the neutral and outlier geogenetic maps), so I wonder if that is what is causing the issue.

Congratulations and good luck with your new position (saw it on Twitter).

Cheers.

Anders.
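Some context on the error: `solve()` refuses to invert a matrix whose reciprocal condition number falls below its tolerance. The condition number is the value reported in the message, and the tolerance defaults to `.Machine$double.eps` (about 2.2e-16). The value reported here, about 1.4e-16, is just below that threshold, meaning the covariance matrix being inverted had become numerically singular.

The sketch below inspects this on a covariance matrix with base R's `rcond()`. `safe.solve` is a hypothetical helper, and whether SpaceMix could recover from such a state is an open question.

```r
# Hypothetical helper: check the reciprocal condition number of a matrix
# before inverting it. solve() stops with "system is computationally
# singular" when rcond falls below tol (default .Machine$double.eps).
safe.solve <- function(S, tol = .Machine$double.eps) {
  rc <- rcond(S)
  if (rc < tol) {
    warning(sprintf("near-singular matrix (rcond = %.3g); not inverting", rc))
    return(NULL)
  }
  solve(S, tol = tol)
}

safe.solve(diag(2))          # well conditioned: returns the inverse
safe.solve(matrix(1, 2, 2))  # exactly singular: warns and returns NULL
```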

compute goodness of fit along the genome

Perhaps we should compute "residuals" for each SNP, globally and for each population, and use this to look for regions that "look different".

"Residual" could mean the same thing as in bayenv.

But, would this be any different than bayenv, really?
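One possible definition of a per-SNP "residual" is sketched below. It is an assumption, not bayenv's or SpaceMix's actual statistic.

Suppose a SNP's standardized frequency deviations across k populations are modeled as multivariate normal with a fitted, full-rank k x k covariance Omega. Then the squared Mahalanobis distance of that vector under Omega is chi-square distributed with k degrees of freedom if the model holds. Unusually large values would flag SNPs (or windows) that fit poorly.

`snp.residuals` is a hypothetical helper built on base R's `mahalanobis()`. It gives only the global statistic, not per-population residuals.

```r
# Hypothetical per-SNP goodness-of-fit statistic (not bayenv's or SpaceMix's
# definition): squared Mahalanobis distance of each SNP's deviation vector
# under a fitted, full-rank k x k covariance Omega, with chi-square upper
# tail probabilities on k degrees of freedom.
#   X: k x L matrix of standardized frequency deviations (SNPs in columns)
snp.residuals <- function(X, Omega) {
  d2 <- mahalanobis(t(X), center = rep(0, nrow(X)), cov = Omega)
  data.frame(d2 = d2, p.value = pchisq(d2, df = nrow(X), lower.tail = FALSE))
}

X <- cbind(c(1, 2, 2), c(0, 0, 0))  # two SNPs in three populations
res <- snp.residuals(X, Omega = diag(3))
res  # d2 = 9 and 0
```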

Error in solve.default(par.cov)?

Here is a truncated version of the output. It seems that fast run 1 ran OK, but all subsequent runs failed. Any ideas?

--------- OUTPUT BEGIN -------------
100000 ---- -38143.28
9369 loci out of the original 100000 left in curated dataset.
LnL: -300572.6
Pr(a0): -5.390294
Pr(a1): -0.06266915
Pr(a2): -0.6418539
Pr(nugget): -100.6171
Pr(admix_target_locations): -1101.546
Pr(admix_source_locations): -1206.868
Pr(admix_proportions): 335.3737
Prob: -302652.3
1000 ---- -269967.7
...
49000 ---- 1.507207e+16
fast run 2 failed
Error in solve.default(par.cov): system is computationally singular: reciprocal condition number = 1.72956e-16

....

Error in make.spacemix.map.list

Hi,

I am running spacemix on a large dataset of ~3,600,000 SNPs. The spacemix analysis runs fine and generates the long run output file. However, when I try to run the make.spacemix.map.list I receive the following error.

Error in rgb(x[1L, ], x[2L, ], x[3L, ], x[4L, ]) :
alpha level NA, not in [0,1]
Calls: make.spacemix.map.list -> fade.admixture.source.points -> adjustcolor -> rgb

I have run the script on a subset of 500 SNPs, and both the spacemix analysis and reading in the Robj file for plotting run without errors.

Do you know what could be causing the above error in the larger dataset?

Please see attached for the commands used.

Many thanks,

Megan
spacemixPlot.txt
spacemixAnalysis.txt
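The traceback shows `adjustcolor()` receiving an NA alpha value inside `fade.admixture.source.points`. A plausible but unverified cause is an NA or NaN among the values used to set transparency in the large run's output, for example the admixture proportions.

The sketch below looks for such values after loading the output file. `find.nonfinite` is a hypothetical helper, it checks only top-level objects, and the file path in the usage comment is a placeholder.

```r
# Hypothetical diagnostic (not part of SpaceMix): return the names of the
# top-level numeric elements of a list that contain NA, NaN or Inf values.
find.nonfinite <- function(objs) {
  is.bad <- vapply(objs,
                   function(x) is.numeric(x) && any(!is.finite(x)),
                   logical(1))
  names(objs)[is.bad]
}

# Usage on a saved output file (path is a placeholder):
#   e <- new.env()
#   load("path/to/long_run_output.Robj", envir = e)
#   find.nonfinite(as.list(e))
find.nonfinite(list(a = 1:3, b = c(0.2, NA), c = "x", d = Inf))  # "b" "d"
```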

Please move the ".tar.gz" distribution file to a release.

Instead of keeping it in the git repository, where it increases the clone size by a very large amount, you can place the "distribution" in the Releases section.
If you find that version-controlling this file is also important, please consider creating a separate repository for it. It really helps to keep things tight and neat.

Thanks for considering!

"system computationally singular" crash

I have no idea what caused this error. It appeared rather late in the long run, after 10 fast runs had completed successfully.
The data set is 100K SNPs in roughly 45 samples.

...
652800  ----  8.926476e+14 
652900  ----  8.926476e+14 
653000  ----  8.926476e+14 
653100  ----  8.926476e+14 
Error in solve.default(par.cov) : 
  system is computationally singular: reciprocal condition number = 9.67127e-17

The command was:

require(SpaceMix)

# load data
# defines ac (allele_counts), ss (sample_size), and loc (lng, lat) matrices
load('data.gzip')
# Need to transpose
ac <- t(ac)
ss <- t(ss)

run.spacemix.analysis(n.fast.reps = 10,
                        fast.MCMC.ngen = 1e5,
                        fast.model.option = "target",
                        long.model.option = "target",
                        data.type = "counts",
                        sample.frequencies=NULL,
                        mean.sample.sizes=NULL,
                        counts = ac, # our data
                        sample.sizes = ss, # our data
                        sample.covariance=NULL,
                        target.spatial.prior.scale=NULL,
                        source.spatial.prior.scale=NULL,
                        spatial.prior.X.coordinates = loc[,1], # our data
                        spatial.prior.Y.coordinates = loc[,2], # our data
                        round.earth = TRUE, # changed
                        long.run.initial.parameters=NULL,
                        k = nrow(ac), # our data
                        loci = ncol(ss), # our data
                        ngen = 1e6,
                        printfreq = 1e2,
                        samplefreq = 1e3,
                        mixing.diagn.freq = 50,
                        savefreq = 1e5,
                        directory=NULL,
                        prefix = outprefix)

data.gzip.zip
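The script above sets `k = nrow(ac)` and `loci = ncol(ss)` after transposing, so the call implies populations in rows and loci in columns. Before a long run it can be worth confirming that the inputs agree with one another.

`check.count.inputs` is a hypothetical helper, not part of SpaceMix. These checks rule out input mistakes; they do not explain the singular-matrix crash.

```r
# Hypothetical pre-flight check (not part of SpaceMix) for data.type = "counts":
# populations in rows, loci in columns, one coordinate row per population.
check.count.inputs <- function(counts, sample.sizes, coords) {
  stopifnot(is.matrix(counts), is.matrix(sample.sizes),
            identical(dim(counts), dim(sample.sizes)),
            nrow(coords) == nrow(counts),
            all(counts >= 0 & counts <= sample.sizes, na.rm = TRUE))
  list(k = nrow(counts),
       loci = ncol(counts),
       zero.sample.size = sum(sample.sizes == 0, na.rm = TRUE))
}

# Toy data: 3 populations x 4 loci
ss <- matrix(10, nrow = 3, ncol = 4)
ac <- matrix(c(1, 5, 9, 0, 10, 3, 2, 2, 7, 4, 6, 8), nrow = 3)
loc <- cbind(lng = c(-1, 0, 1), lat = c(50, 51, 52))
str(check.count.inputs(ac, ss, loc))
```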
