tskit-dev / pyslim

Tools for dealing with tree sequences coming to and from SLiM.

License: MIT License

Python 69.97% Shell 1.36% Slim 27.59% Makefile 1.09%

pyslim's Introduction

pyslim

pyslim is a Python module that provides a few extra tools for dealing with tree sequences produced by SLiM and for preparing other tree sequences for use with SLiM. Most tree sequence functionality is provided by tskit. Please see our documentation for more information. (That link is to documentation for the last release; the latest documentation may have additional examples, but may also describe features you need to install from GitHub to get.)

Installation

To install pyslim, do

pip install pyslim

or read the documentation for how to install from source.

pyslim's Issues

simplify() does some funny things on SLiM-generated treeSeqs

In particular:

Any nodes that are unrelated to all other sample nodes over the course of the simulation have all their edges removed.
It doesn't seem like you can specify arbitrary nodes as "samples" to be retained. I haven't been able to make it work with anything other than nodes from the latest generation.

A notebook showing this behaviour is here.

EDIT: I've now updated the notebook so it works, but the original commit is here.

Uses struct classes to avoid recompiling

Decoding metadata will be a bit more efficient if we create a decoder class for each of the types at module load time, e.g.

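import struct

# compiled once, at module load time
individual_struct = struct.Struct("<iid") # or whatever

def decode_individual(buff):
    # reject buffers that do not match the expected record size
    if len(buff) != individual_struct.size:
        raise ValueError(...)
    return individual_struct.unpack(buff)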

https://docs.python.org/3.4/library/struct.html#classes

Document metadata

There's not currently a good place to see the slim-added metadata fields in the documentation. Add this, maybe here?

write VCF howto

explaining how to get genomes and info about a subset of individuals

samples() outputs extra nodes when applied to SLiM-generated treeSeqs

A notebook showing this behaviour is here.

Add "<" to struct definitions

I had a quick scan over the unpacking code, and it's looking good.

One minor issue though: the struct.pack/unpack format strings should explicitly state that the values are little-endian. The main reason is to ensure standard sizes for ints etc.; otherwise the sizes are defined by the compiler, which can vary.

Changing to

struct.unpack("<ifii", buff)

states that they are little-endian and standard sizes. It's very unlikely that any of this will ever be used on a big-endian system so we might as well be explicit about requiring things be stored in little endian format.

If this seems overly prescriptive, then we can use the '=' prefix which says to use native byte ordering but standard sizes. We should definitely use one of '<' or '=' as the prefix though.
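As a quick illustration of the difference, here is a small sketch using the "ifii" format from the snippet above. That format is only the example given in this issue, not necessarily the layout pyslim uses for any particular table.

```python
import struct

# With "<" or "=", sizes are standard (int and float are 4 bytes each)
# and no padding is inserted between fields.
little = struct.Struct("<ifii")
standard_native_order = struct.Struct("=ifii")

# With no prefix, struct uses native sizes and alignment ("@" mode),
# so this value depends on the platform and compiler.
native_size = struct.calcsize("ifii")

# Round trip through the little-endian layout.
record = little.pack(1, 0.5, 2, 3)
decoded = little.unpack(record)

print(little.size, standard_native_order.size, native_size)
print(decoded)
```

Both prefixed formats report the same size on every platform; only the unprefixed one is free to vary.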

readthedocs not autoclassing

In the python api section, the additional members of the SlimTreeSequence are supposed to be listed, and are when I build locally. But they are not on readthedocs. Looking at the logs there were import errors that I made disappear by this magic, which I thought was odd since tskit does not do this. But now there are no errors and yet the members are not there. Anyone have any idea what's going on? @jeromekelleher or @molpopgen maybe?

General framework

This package needs to deal with the extra information that SLiM stores in a tree sequence, reading it easily, and creating it in tree sequences lacking it. These are defined here and are:

Individuals:

SLiM pedigree ID
age
subpopulation ID

Nodes:

SLiM genome ID
is a null genome?
type (autosome, X, Y, ...)

Mutations:

mutation type ID
selection coefficient
subpopulation ID it occurred in
origin generation

Population:

a whole bunch of stuff

There's one (at least?) bit of additional tree-sequence-global information:

generation : the current generation of the simulation

Proposed method #1:

Define classes like SlimNodeMetadata, etcetera, to hold the information above. This would not be a tuple containing the information for a given node, it would have numpy vectors of the information for all nodes, so that SlimNodeMetadata.type would give you a vector of the same length as the node table it came from, with the type for every node. (These are basically new tables, but we won't call them that.) Define a SlimMetadata class to hold this information, as well as generation, the current generation of the simulation (obtained from provenance).
Extend the TreeSequence class to SlimTreeSequence to include a SlimMetadata object. The metadata object will be immutable, as are the metadata columns they are obtained from.
Provide a method that allows creating a new SlimTreeSequence with a different SlimMetadata object.

The above suffices, but to make some things easier, we might want to also:

Extend Individual, Node, Mutation, and Population to include the above attributes (e.g., creating SlimIndividual);
Extend methods SlimTreeSequence.individuals() and .individual() methods to return SlimIndividuals, and similarly for .nodes(), .mutations(), and .populations().

How's this look, roughly, @jeromekelleher, @bhaller?

write function to check for SLiM-validity

In SLiM internals we have the CrosscheckTreeSeqIntegrity() function; it'd be nice to check all that in pyslim so we know we're outputting something that can be loaded into SLiM.

add pyslim metadata to VCF output from msprime

Hi folks. For background, see this question on slim-discuss: https://groups.google.com/forum/#!topic/slim-discuss/etn7plcQRY8.

SLiM's VCF output contains a bunch of additional metadata: mutation IDs, mutation types, selection coefficients, dominance coefficients, etc. This is all documented in chapter 25 of the SLiM manual, particularly sections 25.2.3 and 25.2.4 (in the current version of the manual as I write this issue, anyway). This metadata is also available to pyslim, of course, and it would be great if it made it into VCF output from msprime following the conventions already defined by SLiM. This would let people who output VCF from Python have access to the metadata they need, for example to find mutations of a particular mutation type within the VCF output (as per the link above).

pyslim test run fails

I installed pyslim as mentioned in the readme document. However, when I ran the following command:

python -m nose tests

it gives me the following error:

I installed SLiM and msprime as suggested. The test runs for SLiM are working and for msprime too. However when I do a test run for pyslim, it fails every time. I activate my msprime environment and then run the pyslim command. Could you tell me what possibly could be causing this?

provide method for setting individuals and remembered nodes in annotation

We could add two arguments to annotate():

remembered_nodes, which is a list of node Ids, defaulting to an empty list, and
individuals, which is a list of pairs of nodes, defaulting to adjacent pairs of nodes among the samples

The first step would be to simplify() down to these nodes, to get the remembered_nodes first, and then annotation would proceed as before (but with some bookkeeping for the case that the two sets of nodes overlap).
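A rough sketch of the bookkeeping this proposal implies, in plain Python. The helper names default_individuals and nodes_to_keep are hypothetical and are not part of pyslim; the sketch only covers the default pairing and the ordering of nodes passed to simplify(), not the annotation itself.

```python
def default_individuals(samples):
    """Pair up adjacent sample nodes: (s0, s1), (s2, s3), ..."""
    if len(samples) % 2 != 0:
        raise ValueError("need an even number of sample nodes for diploids")
    return [(samples[k], samples[k + 1]) for k in range(0, len(samples), 2)]


def nodes_to_keep(remembered_nodes, individuals):
    """Nodes to simplify down to: remembered nodes first, then the nodes of
    the individuals, with nodes that are in both sets listed only once."""
    keep = list(dict.fromkeys(remembered_nodes))  # drop repeats, keep order
    seen = set(keep)
    for pair in individuals:
        for node in pair:
            if node not in seen:
                keep.append(node)
                seen.add(node)
    return keep


pairs = default_individuals([0, 1, 2, 3])
print(pairs)
print(nodes_to_keep([3], pairs))
```

Listing the remembered nodes first means they would receive the lowest node IDs after simplification, which is the order described above.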

mutating to complement SLiM mutations

Currently, the msprime.mutate() function will only do uniform mutations. But, suppose that (a) we want mutations to fall at rate 1e-8, but (b) we want 1/10 of them to be nonneutral only on the first 1/4 of the chromosome. Then it would be nice to lay down only the nonneutral mutations in SLiM, at rate 0.1e-8, on the first 1/4 of the chromosome, and then add the remaining mutations with msprime, at rates 0.9e-8 on the first 1/4, and rate 1e-8 on the remainder. Currently we can't do this, but we'd like to: tskit-dev/msprime#710

Even after this is implemented, maybe we can make this easier for people somehow? At least, this should be worked through in an example in the documentation.
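The arithmetic for the complementary rates can be written down independently of how the mutations end up being added. The sketch below only computes the breakpoints and rates; complement_rates is a hypothetical helper, and the sequence length of 1000 is made up for the example.

```python
def complement_rates(total_rate, slim_rate, slim_end, sequence_length):
    """Return (breakpoints, rates) for the mutations still to be added.

    Assumes SLiM has already added mutations at `slim_rate` on
    [0, slim_end) and none on the rest of the sequence.
    """
    if not 0 < slim_end < sequence_length:
        raise ValueError("slim_end must lie strictly inside the sequence")
    if slim_rate > total_rate:
        raise ValueError("slim_rate cannot exceed total_rate")
    breakpoints = [0, slim_end, sequence_length]
    rates = [total_rate - slim_rate, total_rate]
    return breakpoints, rates


# The example from the text: nonneutral mutations at 0.1e-8 on the first 1/4.
breakpoints, rates = complement_rates(
    total_rate=1e-8, slim_rate=0.1e-8, slim_end=250, sequence_length=1000
)
print(breakpoints, rates)
```

The two rates come out at (approximately, up to floating point) 0.9e-8 on the first quarter and 1e-8 on the remainder, matching the numbers above.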

give example of matching individuals to something written to metadata

Say you want to output something about individuals within SLiM (say, a tag that is not stored in metadata, or a computed, random phenotype), and match it up to the individuals in the tree sequence. This could be stored in top-level metadata, as a vector (if for presently-alive individuals) or a dictionary (more generally, with pedigreeID as keys). We should have an example of this, as discussed in #198 .

write example of how to combine multiple outputs to one population

run two different simulations, output tree sequences
combine these into one tree sequence
load this into slim and do something else

document recapitation with nonuniform rates

We should provide an example of making the recombination map in recapitation match that used in SLiM.

a tree-sequence-validation feature inside pyslim would be useful

The issue MesserLab/SLiM#71 made me think of this. It would be cool if pyslim could perform a validation of a tree sequence, including both the tskit information and the SLiM metadata, to catch a wide variety of problems. SLiM's crosscheck can catch some problems, as we have seen with that issue, but I'm sure there are all kinds of problems that SLiM is not equipped to catch, and in any case it would be good to have a separate validation codebase that doesn't depend on SLiM. I'm thinking of things like:

inconsistencies in the references across tables
table entries like sites or mutations that are not referenced at all but have not been stripped out
as in the linked issue, SLiM metadata inconsistencies like derived states at different positions referring to the same mutation ID

I'm sure one could think of quite a few things to test, and who knows what bugs it might catch for us later; I'm a big believer in self-consistency checks, like SLiM's crosscheck. If it wasn't too slow, it could run automatically on the load of a .trees file; that's the best way to catch problems, is to make the check part of the standard code path when possible, of course.
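To make a couple of these checks concrete, here is a minimal sketch that works on plain Python lists and dictionaries rather than real tables. Both function names are hypothetical, and the input layout (a list of site IDs per mutation row, and a mapping from position to the mutation IDs seen there) is an assumption chosen to keep the example self-contained.

```python
def check_site_references(num_sites, mutation_sites):
    """Return (dangling, unreferenced) for a mutation table's site column.

    dangling: mutation rows whose site ID is not a valid site-table row
    unreferenced: site IDs that no mutation refers to
    """
    dangling = [
        row for row, site in enumerate(mutation_sites)
        if not 0 <= site < num_sites
    ]
    referenced = set(mutation_sites)
    unreferenced = [s for s in range(num_sites) if s not in referenced]
    return dangling, unreferenced


def repeated_mutation_ids(site_mutation_ids):
    """SLiM mutation IDs that appear at more than one position.

    `site_mutation_ids` maps position -> set of mutation IDs seen there.
    """
    first_seen = {}
    repeated = set()
    for position, ids in site_mutation_ids.items():
        for mut_id in ids:
            # setdefault records the first position and returns it afterwards
            if first_seen.setdefault(mut_id, position) != position:
                repeated.add(mut_id)
    return repeated


print(check_site_references(3, [0, 0, 5]))
print(repeated_mutation_ids({10: {1, 2}, 20: {2, 3}}))
```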

more intuitive overlap between handling of genetic data and individual metadata

Hi Peter!

This is a non-urgent request for either new tools, more documentation, or a tutorial for getting genetic data to match up more intuitively to individual data in pyslim. Some areas of (my) confusion that cropped up in trying to run some nonWF simulations for diploids with tree sequence recording:

The individuals included in .genotype_matrix or output by .write_vcf are only the ones flagged as samples, while .individual_locations or .individual_times output data for everyone. So standard stuff like running a spatial sim and outputting a location and a genotype for each sampled individual requires some hoop-jumpery to output only the metadata for the sample-flagged individual set.
The order of the genetic data contained in .genotype_matrix() or output by .write_vcf is determined by the numbering of the nodes associated with the individuals flagged as samples, but the ordering of .individual_locations or .individual_times is determined by individual ID. This is maybe extra confusing, as the sample names associated with each sample in .write_vcf() aren't informative (ie don't match back to individual metadata).

Hope those are clear - happy to write more about what I was trying to do and where I got mired. And again - not urgent! Just wanted to add it to the wishlist...

-g
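The ordering mismatch described in this issue can be illustrated with plain lists. The sketch assumes genotype columns follow the order of the sample nodes and that each node records the ID of the individual it belongs to; the function names and the toy data are hypothetical.

```python
def individuals_in_sample_order(samples, node_individual):
    """For each sample node, in genotype-column order, return the ID of
    the individual it belongs to."""
    return [node_individual[n] for n in samples]


def genome_columns(samples, node_individual):
    """Map individual ID -> positions of its genomes among the samples."""
    columns = {}
    for column, node in enumerate(samples):
        columns.setdefault(node_individual[node], []).append(column)
    return columns


# Toy data: 6 nodes belonging to individuals 2, 1, 0 (two nodes each);
# only nodes 0, 1, 4, 5 are flagged as samples.
node_individual = [2, 2, 1, 1, 0, 0]
samples = [0, 1, 4, 5]

order = individuals_in_sample_order(samples, node_individual)
columns = genome_columns(samples, node_individual)
print(order)
print(columns)
```

With a mapping like columns in hand, per-individual arrays such as locations could be indexed by the individual IDs in order so that they line up with the genotype columns.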

Provide method to output to A/C/G/T

We should be able to output code to e.g. VCF format, for which we'll need to turn alleles into something friendlier like ACGT.

This can be easy, though: this should be a one-way operation, not something that we want to be able to reverse.
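One possible shape for such a one-way mapping, as a sketch: each distinct allele at a site gets the next unused nucleotide in order of first appearance. The assignment is arbitrary, the function name is hypothetical, and the example alleles (comma-separated mutation IDs, with the empty string as the ancestral state) are an assumption about what the input would look like.

```python
NUCLEOTIDES = "ACGT"


def alleles_to_nucleotides(alleles):
    """Assign a nucleotide to each distinct allele at one site, in order of
    first appearance. One-way: the original alleles cannot be recovered."""
    mapping = {}
    for allele in alleles:
        if allele not in mapping:
            if len(mapping) == len(NUCLEOTIDES):
                raise ValueError("more than four distinct alleles at this site")
            mapping[allele] = NUCLEOTIDES[len(mapping)]
    return [mapping[a] for a in alleles]


print(alleles_to_nucleotides(["", "12", "12,40", ""]))
```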

include nucleotide support

We need to update the file version because of MesserLab/SLiM#36

Recapitating with non-uniform recombination rates

You cannot use msprime to recapitate a SLiM simulation using a recombination map that is not uniform. This is because msprime and SLiM both use discrete recombination maps, but SLiM is discrete in physical coordinates (base pairs), while msprime is discrete in genetic map units. You can run a simulation in SLiM using any recombination map you want, and then recapitate using a uniform map in msprime (this is the default). See the msprime documentation for more discussion of recapitation.

FYI, this feature would be useful to me!

loading takes a long time due to python

This stuff should be done more efficiently; it can take like a minute for 10,000 individuals.

move time-shifting over to SLiM instead of pyslim

If shifting of tskit-time was done at writing out from slim instead of inside pyslim, we could say that

each slim tree sequence records at which generation it was written out, and times in msprime are measured in generations before this point

instead of

because msprime works with time moving backwards, times in the tables that slim writes out are measured in generations before the start of the simulation, and are therefore all negative; since this is confusing, we shift these times to be relative to the current generation, so that in the tree sequence that is loaded using pyslim, times are in units of generations before the end of the simulation, rather than the beginning

... which would obviously be much better.
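For reference, the shift described in the longer explanation amounts to adding the generation at which the file was written. A minimal sketch, with made-up node times and a hypothetical function name:

```python
def shift_times(slim_times, generation):
    """Convert times measured from the start of the simulation (negative,
    as described above) into generations before the time of writing."""
    return [t + generation for t in slim_times]


# Hypothetical node times from a simulation written out in generation 10.
written = [-10, -7, -1]
shifted = shift_times(written, generation=10)
print(shifted)
```

If SLiM applied this shift itself when writing, the loaded tables would already hold the shifted values and no adjustment would be needed on the Python side.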

Inconsistent mutations: state already equal to derived state

Hello!

as per part of this discussion:

https://groups.google.com/forum/#!topic/slim-discuss/etn7plcQRY8
relating to outputting a vcf file with the error Inconsistent mutations: state already equal to derived state

Simplified scripts attached! Let me know if you need anything else

Josie
debug_burn_in.slim.gz
debug_mutate_to_vcf.py.gz
debug_run_balancing.slim.gz

`individuals_alive_at( )` does not work with WF models

As discovered in #38, individuals_alive_at( ) doesn't work with WF models.

Here is a simple SLiM recipe:

initialize()
{
    setSeed(23);
    initializeTreeSeq();
    initializeMutationRate(1e-2);
    initializeMutationType("m1", 0.5, "f", -0.1);
    initializeGenomicElementType("g1", m1, 1.0);
    initializeGenomicElement(g1, 0, 99);
    initializeRecombinationRate(1e-2);
}

1 { 
    sim.addSubpop("p1", 10);
}

10 {
    sim.treeSeqOutput("remember.trees");
    catn("Done.");
    sim.simulationFinished();
}

but individuals_alive_at() does not show anyone alive at any time:

import pyslim
import numpy as np

ts = pyslim.load("remember.trees")

print(ts.num_individuals)
print([len(ts.individuals_alive_at(x)) for x in range(20)])

A look at the code reveals that the code uses individual_ages, which are all -1 in WF models. Whoops. I guess this wasn't being tested in a WF model. (damn bifurcating code paths)

One solution would be an if statement in the code, checking if it's a WF model. But I think it would make much more sense to actually set individual_ages all to 0 for a WF model, since that's actually what their ages are.
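The effect of the -1 ages can be reproduced without SLiM. The sketch below assumes the check has the form "born at or before the query time, and not yet past its recorded age"; that form, the function name, and the toy birth times are assumptions for illustration, not a copy of the pyslim code.

```python
def alive_at(birth_times, ages, time):
    """Indices of individuals alive `time` generations ago.

    Assumes an individual is alive from its birth time (in generations
    ago) until `age` generations later, i.e.
    birth_time >= time >= birth_time - age.
    """
    return [
        k for k, (b, a) in enumerate(zip(birth_times, ages))
        if b >= time and b - a <= time
    ]


births = [0, 0, 1, 1]            # hypothetical birth times, generations ago
wf_ages = [-1] * len(births)     # what a WF model records
fixed_ages = [0] * len(births)   # the proposed replacement

wf_counts = [len(alive_at(births, wf_ages, t)) for t in range(3)]
fixed_counts = [len(alive_at(births, fixed_ages, t)) for t in range(3)]
print(wf_counts)
print(fixed_counts)
```

With ages of -1 the two conditions can never hold together, so nobody is ever reported alive; with ages of 0 each individual is alive exactly in its birth generation.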

add multiple population recapitation example

... noting that if you don't have a migration rate, coal time will be infinite.

Issues with `mutate` on same genealogy sampled at two time points

We are conducting some climate change simulations, and are having some issues with overlaying mutations on the same genealogy sampled at two different timepoints.

We run the simulation for a while in SLiM, output the .trees file at the first timepoint (T1), and then read this back into SLiM and run the simulation again under the climate change scenario to output the second timepoint (T2).

When we recapitate both trees, we get the same tree topology, but when we try overlaying neutral mutations to both trees (with the same seed) we get completely different genomes with almost no overlap among the location of variants. (In traditional forward time simulations we see many shared variants in the two timepoints, so this result was not an artifact of the simulation.) For full details including links to the files needed to reproduce the problem see here:

https://github.com/TestTheTests/TTT_Offset_Vulnerability_GF_Sims/blob/master/Notebook/2019_05_03_Mutate_Recap_notes.md

For now, we just want to understand better how "mutate" works and why this problem is happening.

shift mutation metadata time

so that there is only one notion of time

provide method for getting the nodes that correspond to a set of individuals

Proposal: this should be like:

def individual_nodes(self, inds):
  # one row per individual, one column per genome; -1 marks "not filled in"
  out = np.repeat(-1, len(inds) * 2).reshape((len(inds), 2))
  for k, ind in enumerate(inds):
    out[k, :] = self.individual(ind).nodes
  return out

so returns an array with one row per individual that can be flattened to get just a flat list of the nodes corresponding to the genomes of the individuals whose IDs are inds.

provide method for assigning individuals to populations

This could be an argument of the form population, giving a vector of population IDs, or it could be a dictionary of individual : population IDs, or it could be a dictionary of population ID : list of individual IDs, or... ?

write reload from tree sequence HOWTO

also check if this is well-covered in the manual

Simplifying a recapitated .trees file produces weird (wrong?) output

Hi again,

I've noticed that when I run simplify() on recapitated TreeSequences in my simulation pipeline, the output doesn't look quite right.
I've made a short notebook that illustrates my problem.

Use tables.tree_sequence

In a few places we have

ts = msprime.TableCollection.tree_sequence(tables)

This should be

ts = tables.tree_sequence()

Use SlimTreeSequence to automatically decode metadata

We can use the SlimTreeSequence to automatically decode the metadata by overriding the (e.g.) individual method:

class SlimTreeSequence(msprime.TreeSequence):

    def individual(self, id_):
        ind = super(SlimTreeSequence, self).individual(id_)
        ind.metadata = IndividualMetadata.decode(ind.metadata)
        return ind

This would be a major advantage of the subclass approach, and should be quite robust to changes.

provide subset argument to recapitate

Suppose that we are only interested in a subset of the samples. For efficiency, we might want to recapitate only the subset of the initial population from which that subset inherits. To do this, we only need to first mark those individuals we want to retain as samples, and then recapitate. For instance:

ts = pyslim.load("recipe_nonWF.trees")
tt = ts.tables
tt.nodes.set_columns(flags=[0 if k not in [20,21] else 1 for k in range(tt.nodes.num_rows)], 
          population=tt.nodes.population, individual=tt.nodes.individual, time=tt.nodes.time,
          metadata=tt.nodes.metadata, metadata_offset=tt.nodes.metadata_offset)
new_ts = pyslim.TreeSequence(tt.tree_sequence())
nts = new_ts.recapitate(recombination_rate=1.0)

We could do this as an argument, say samples= to recapitate(). I believe this doesn't do the extra work.

pyslim could provide further help in decoding mutations

Pyslim can decode tree-seq mutations from SLiM to provide the metadata for each of the stacked mutations at a site, which are all lumped together into a single "derived state" and considered to be a single mutation by the tree sequence. This decoding is useful, but could go further to provide quite a bit of extra utility.

(1) A given mutation ID might exist in several different "derived states", stacked in different ways with other mutations. When you flip through mutations with ts.mutations(), you may therefore see the same mutation ID referenced multiple times. This makes it difficult to tally up information about mutations – to assess the mean selection coefficient, say – because you first have to do a merge and unique on mutation ID. It would be great if pyslim provided a method like ts.slim_mutations() that would return a list of uniqued mutation records, one per SLiM mutation, just as one would get from sim.mutations in Eidos. One could then loop through that, each element being the metadata for that mutation. (The metadata is given again each time a mutation is listed in a derived state, by the way, and it does not have to be the same in each case, since it is just a snapshot of the state of the mutation at the moment that derived state was constructed; for the purposes proposed here, the most recent metadata for a given mutation ID ought to be provided.)

(2) If this facility were implemented, a really nice extension of it would be for pyslim to provide a frequency count for each mutation, across all extant genomes. Analysis will often want to know about mutation frequency, and figuring that out from the tree sequence in Python would be rather an adventure – requiring use of a vargen_t and then decoding derived states to extract mutation IDs and tallying up the counts for each uniqued mutation.

(3) If (2) is implemented, then one could then provide ts.slim_fixed_mutations() and ts.slim_segregating_mutations() calls, if one wished. :->
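A sketch of what (1) and (2) might involve, using plain Python data in place of a tree sequence. It assumes each derived state is available as a comma-separated string of mutation IDs with one metadata entry per ID, and that records are visited from oldest to newest; the function names are hypothetical and not part of pyslim.

```python
from collections import Counter


def unique_slim_mutations(records):
    """Collapse stacked derived states into one entry per SLiM mutation ID.

    `records` is a list of (derived_state, metadata_list) pairs ordered
    from oldest to newest. Later records overwrite earlier ones, so the
    most recent metadata for each mutation ID wins.
    """
    unique = {}
    for derived_state, metadata_list in records:
        ids = [int(x) for x in derived_state.split(",")] if derived_state else []
        if len(ids) != len(metadata_list):
            raise ValueError("one metadata entry is needed per mutation ID")
        for mut_id, metadata in zip(ids, metadata_list):
            unique[mut_id] = metadata
    return unique


def mutation_counts(genome_states):
    """Count how many genomes carry each mutation ID at one site, given the
    derived state carried by each genome ("" for the ancestral state)."""
    counts = Counter()
    for state in genome_states:
        counts.update(int(x) for x in state.split(",") if x)
    return counts


records = [("5", [{"s": 0.1}]), ("5,9", [{"s": 0.2}, {"s": -0.3}])]
print(unique_slim_mutations(records))
print(mutation_counts(["", "5", "5,9", "5,9"]))
```

Given counts like these and the number of extant genomes, the fixed and segregating sets in (3) would follow by comparing each count with that total.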

Add heterozygosity method

Observed heterozygosity is a property of the individual, so we should have a method to compute it.

setup RTD

We've got good docs in the code, which people don't always read.

could overlaying neutral mutations via treeseq overwrite simulated mutations in SLiM?

I'm running replicate simulations, and in a subset of my simulations a few of the loci that I simulated in SLiM do not have the expected allele frequencies and genotypes after overlaying neutral mutations with pyslim. In other words, the allele frequency at a position outputted from SLiM does not match the allele frequency at that position after recapitation and overlaying neutral mutations. While this happens rarely, it appears to me that the process of recapitation and overlaying neutral mutations can in some cases overwrite the selected mutations output from SLiM. Could this be happening or could there be something I'm missing?

write method to emit SLiM reload script

One can reload a tree sequence into SLiM, but only if one knows some information like length of the genome, etcetera (need to check what, exactly?). It would be handy if pyslim could produce a very simple SLiM script that would, basically, just reload the simulation, so that it could be inspected in the console, and things could be printed, from SLiM.

Document workflow for adding/annotating a single mutation to msprime simulation

This isn't an issue per se but a suggestion for a new feature (@petrelharp suggested I post it here); TLDR I think that for simulating selection from a standing variant it could be useful to have the option of throwing individual mutations onto the treeseq. Does such a method exist? I couldn't find it in the docs.

There’s a pretty straightforward way to simulate SSVs in Slim (the recipe is actually in the manual under a heading something like “Soft sweeps from a randomly selected locus”); you simulate neutrally with mutation until the start of selection, and then you randomly pick a segsite to be the selected allele from that time onward.

But I’m not totally sure how to implement something analogous with un-mutated, burn-in treeseqs. One naive way is to sample the allele frequency p at the start of selection from the SFS and randomly assign haplotypes the derived/ancestral allele w.p. p, 1-p, but this doesn’t model LD of the standing variant. I do have one idea for a recipe, but it seems like somewhat a pain to implement:

—In pyslim/msprime: simulate treeseqs corresponding to start of burnin up to just before the start of selection
—In pyslim/msprime: condition on the event that a particular locus (e.g. the basepair at the halfway point of the locus) is segregating at the start of selection; get the local tree at that locus. Randomly pick a branch w.p. proportional to the branch length. The allele frequency p is the number of leaves/haplotypes that subtend this branch.
—In slim: You assign the derived type (along with the desired selection coefficient) to the subtending haplotypes

but I think this could be made easier if there were simply a ts method to throw a mutation “by hand” onto a particular local tree; the pyslim docs hint at this but I’m not really sure how I would do this in practice. Any advice? How would you go about setting this up?
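The branch-picking step in the recipe above (choose a branch with probability proportional to its length, then read off the number of haplotypes below it) can be sketched with the standard library alone. The branch list here is a made-up local tree on four haplotypes, and pick_branch is a hypothetical helper; extracting the branches from a real tree is not shown.

```python
import random


def pick_branch(branches, rng=random):
    """Choose a branch with probability proportional to its length.

    `branches` is a list of (node, length, num_leaves) tuples for one local
    tree. Returns the chosen node and the number of leaves below it.
    """
    lengths = [length for _, length, _ in branches]
    node, _, num_leaves = rng.choices(branches, weights=lengths, k=1)[0]
    return node, num_leaves


# Hypothetical local tree on 4 haplotypes: (node, branch length, leaves below).
branches = [(0, 1.0, 1), (1, 1.0, 1), (2, 3.0, 1), (3, 3.0, 1), (4, 2.0, 2)]
node, num_leaves = pick_branch(branches, random.Random(1))
frequency = num_leaves / 4
print(node, frequency)
```

The haplotypes below the chosen node are the ones that would carry the derived allele when the tree sequence is loaded into SLiM.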

Importing mutations from msprime

At the moment, annotate_defaults() does not convert the continuous-valued positions of mutations in .trees file into integer values, as is required by SLiM. (When the resulting SlimTreeSequence is loaded into SLiM, an error results.) This means that only .trees files without mutations can be fed into SLiM.

There's already some discussion of this here and here.
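One naive approach to the conversion, sketched here only to show where the difficulty lies: flooring positions is easy, but two mutations can land on the same integer position and then need to be merged or moved. The function name is hypothetical and this is not what annotate_defaults() does.

```python
import math


def discretise_positions(positions):
    """Floor continuous positions to integers and report collisions.

    Returns (integer_positions, collisions), where collisions maps each
    integer position hit more than once to the original positions.
    """
    integer_positions = [math.floor(p) for p in positions]
    groups = {}
    for original, rounded in zip(positions, integer_positions):
        groups.setdefault(rounded, []).append(original)
    collisions = {pos: orig for pos, orig in groups.items() if len(orig) > 1}
    return integer_positions, collisions


integer_positions, collisions = discretise_positions([0.2, 3.7, 3.9, 10.0])
print(integer_positions)
print(collisions)
```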

[docs] recapitation description ungrammatical

As reported here by @winni2k:

To allow this process, the first generation of the SLiM simulation has been recorded in the tree sequence, but are not currently marked as samples, so this process (or, simplify()) will remove any of these that are not needed. If you want to keep them, then set keep_first_generation to True; although this will make more work here.

There's a grammar mistake in the first sentence, but I don't understand the sentence enough to fix it.

VCF metadata question

Hi Peter, a question regarding pyslim's VCF output:
When you output VCF files from SLiM it has that great feature to include the mutation type (MT), as well as the mutation ID value, in the VCF output. However when we recap/mutate and then output a VCF with pyslim all that information is lost. If that metadata is still there after recap/mutate, is there a possibility that information could be included in the VCF of the pyslim write_vcf output?

Make pyslim conda-installable.

I gather that's a better way to do things. I've set up a PyPI account, but not conda-forge.

_msprime LibraryError: Bad population id provided

This is with a version of simple.trees that I produced so that I could contribute, but I am getting the error below with a conda-installed msprime version 0.6.0:

 $ python test_reading.py                            
Traceback (most recent call last):                   
  File "test_reading.py", line 4, in <module>        
    ts = pyslim.load("simple.trees", slim_format=True)                                                    
  File "/home/jaime/lib/pyslim/pyslim/slim_tree_sequence.py", line 29, in load                            
    ts = SlimTreeSequence.load(path)                 
  File "/home/jaime/lib/pyslim/pyslim/slim_tree_sequence.py", line 129, in load                           
    ts = msprime.load(path)                          
  File "/home/jaime/miniconda3/envs/pyslim/lib/python3.6/site-packages/msprime/trees.py", line 1161, in load                                                                                                        
    return TreeSequence.load(path)                   
  File "/home/jaime/miniconda3/envs/pyslim/lib/python3.6/site-packages/msprime/trees.py", line 1588, in load                                                                                                        
    ts.load(path)                                    
_msprime.LibraryError: Bad population id provided.   
(pyslim) jaime@shuksan:~/lib/pyslim/tests$ (master)  
 $ which python                                      
/home/jaime/miniconda3/envs/pyslim/bin/python        
(pyslim) jaime@shuksan:~/lib/pyslim/tests$ (master)  
 $ conda list | grep msprime
msprime                   0.6.0            py36hcb787e7_0    conda-forge

SLiM ancestral state written as empty string to VCF file

I used the workflow below to recapitate and produce a vcf. In this example there is a single sweep mutation at position 500,000, all other mutations are neutral and generated by msprime. Following these steps the vcf produced has a bug at position 500000 where there is a blank in the Ref column and a 0 in the alt column, where it should be 0 in Ref and 1 in Alt. Here is a sample, please let me know if I can provide more detail to help diagnose the bug:

1 499480 . 0 1 . PASS . GT 0|0 0|0 0|0 0|0 0|0
1 499786 . 0 1 . PASS . GT 0|0 0|0 0|0 0|0 0|0
1 500000 . 0 . PASS . GT 1|1 1|1 1|1 1|1 1|1
1 500492 . 0 1 . PASS . GT 0|0 0|0 0|0 0|0 0|0
1 500606 . 0 1 . PASS . GT 0|0 0|0 0|0 0|0 0|0

import pyslim, msprime
import numpy as np
slim_ts = pyslim.load("decap.trees")
recap_ts = slim_ts.recapitate(recombination_rate=1e-8, Ne=7310)
ts = pyslim.SlimTreeSequence(msprime.mutate(recap_ts, rate=1e-8,keep=True))
alive = ts.individuals_alive_at(0)
p1 = [i for i in alive if ts.individual(i).population == 1]
p1_sample = np.random.choice(p1, size=200, replace=False)
outvcf = open("test_out.vcf", "w")
ts.write_vcf(outvcf, individuals=p1_sample)

Provide spatial tools

It could be nicer to do spatial things, especially with Individuals. I've sandboxed out some things that'd make this easier over here:

Here's some ideas:

add a .dim attribute, defaulting to 3
an individual_locations() that returns the array of spatial coordinates (with dim columns)
distance_to_point(xy) : distance of all individuals to a point.
individuals_in_circle(center, radius)

and maybe more? Should these return boolean arrays? lists of individual IDs? iterators over individual objects?
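To make the proposal concrete, here is a minimal sketch of two of these functions on plain tuples of coordinates. It returns lists of individual IDs (one of the options raised above) and treats the list index as the ID; both choices are assumptions for the example.

```python
import math


def distance_to_point(locations, point):
    """Euclidean distance from each individual's location to `point`."""
    return [math.dist(loc, point) for loc in locations]


def individuals_in_circle(locations, center, radius):
    """IDs (list indices) of individuals within `radius` of `center`."""
    return [
        k for k, d in enumerate(distance_to_point(locations, center))
        if d <= radius
    ]


# Hypothetical 2D locations, one entry per individual.
locations = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
distances = distance_to_point(locations, (0.0, 0.0))
nearby = individuals_in_circle(locations, (0.0, 0.0), 2.0)
print(distances)
print(nearby)
```

Returning IDs keeps the result usable as an index into other per-individual arrays; a boolean array would serve the same purpose in a numpy-based version.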

Provide additional tools for individuals and/or "full pedigrees"

One particular use case is when SLiM calls RememberIndividuals on everyone in every generation, so that everyone alive for some chunk of time is present in the tree sequence. These are therefore "full pedigrees", and might be worth a separate python class. I've written some methods for this situation, over here.

Here's a quick run-down:

individuals_alive(time): tells you which individuals are alive at the time. This information can be obtained because the time recorded in the individual's nodes is its birth time, while its age at time of last-Remembering is recorded in metadata.
individuals_age(time): returns the ages of all individuals at the given time.
various ancestry-related methods like individual_parents_dict(), that should just be individual-focused versions of whatever we decide to do for node-based ancestry tools.

Note that these only make sense for full pedigrees because of the assumption that their age was recorded during the last time step they were alive.

Document recapitation with generation_time for a nonWF model

For recapitation, we often probably want to match what happens before the transition to SLiM to what happens after. Here "what happens" means the rates per unit time of coalescence, mutation, and recombination. In SLiM, the rate of mutation is (mutation rate / generation time); same for recombination; and the rate of coalescence is (1 / (2 * Ne * generation time)); all in rates per unit of clock time. Therefore, when we recapitate, if our initial SLiM generation had N individuals in it, it would be natural to recapitate with Ne = N * gentime, recombination_rate = r / gentime, and mutate with mu / gentime, where r and mu are the per-meiosis recombination and mutation rates from SLiM.

To make this more transparent, it'd be nice to add a generation_time argument to recapitate that scales all these things appropriately.
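The scaling itself is short enough to write out. The sketch below just applies the three formulas above and returns the values one would then pass on to recapitation and mutation; recapitation_parameters is a hypothetical helper, and the numbers in the example are made up.

```python
def recapitation_parameters(N, r, mu, generation_time):
    """Scale SLiM parameters for recapitation, following the reasoning above.

    N: number of individuals in the initial SLiM generation
    r, mu: per-meiosis recombination and mutation rates used in SLiM
    generation_time: mean generation time, in SLiM time steps
    """
    if generation_time <= 0:
        raise ValueError("generation_time must be positive")
    return {
        "Ne": N * generation_time,
        "recombination_rate": r / generation_time,
        "mutation_rate": mu / generation_time,
    }


params = recapitation_parameters(N=1000, r=1e-8, mu=2e-8, generation_time=4)
print(params)
```

With generation_time equal to 1 the parameters are returned unchanged, which is the WF case.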
