Why and how to use the Ease package?

Ehouarn Le Faou

2022-10-19

Ease aims to implement in a simple and efficient way in R the possibility to perform population genetics simulations considering multiple loci whose epistasis is fully customizable. Specifically suited to the modelling of multilocus nucleocytoplasmic systems, it is nevertheless possible to simulate purely nucleic, i.e. diploid (or purely cytoplasmic, i.e. haploid) genetic models. The simulations are not individual-centred in that the transition from one generation to the next is done matrix-wise on the basis of deterministic equations. Instead of each individual being described separately, the simulations only handle the genotype frequencies within the population. All possible genotype frequencies considering the loci and alleles defined by the user are explicitly tracked. The simulations are therefore fast only if the number of loci and alleles is not too large.

The consideration of genetic drift and a specific population size is nevertheless introduced as a multinomial draw each generation, which adds to the realism of the simulations by adding randomisation. One can also simulate a simple demography, i.e. an initial population size, a growth rate and a carrying capacity. The size of the population is thus drawn each generation in a Poisson distribution. Varying the carrying capacity is not (yet) included in Ease, but it is possible to run different models in succession, giving the next one as the initial genotype frequencies the frequencies of the previous one.

The population considered can be structured and connected by migration. Migration is only possible at the individual stage and not at the gamete stage, which future updates may add. It is also not possible to define different migration rates between different genotypes.

The consideration of genetic drift and thus a specific population size is nevertheless introduced as a multinomial draw each generation, which adds to the realism of the simulations by adding randomisation. In the Ease package, the life cycle of the simulated population is standard ([migration] - [selection on gamete production] - [gametogenesis (recombination + meiosis + mutation)] - [selection on gametes] - [syngamy] - [selection on individuals] - [demography] - [drift]) and may consider the population dioecious or hermaphroditic.

Genome

Definition

A genome is defined by the set of loci to which lists of alleles are attached. Each loci and each allele is defined by a unique name, which allows it to be unequivocally identified.

There are two types of loci: diploid and haploid. A genotype is defined as an allelic combination of all the alleles of an individual’s loci and a haplotype as only those alleles that have been inherited together from a single parent. A genotype is therefore made up of two haplotypes. A distinction is also made between diploid (resp. haploid) haplotypes which correspond to allelic combinations taking into account only diploid (resp. haploid) loci. The loci are defined by a list of vectors that enumerates their respective alleles. The order in which the loci are placed is not important in the case of haploid loci. It does matter in the case of diploid loci because recombination is likely to affect the haplotypes. In the Ease package, diploid loci are

In the case of diploid loci, however, if several are defined, the order of the diploid loci in the list is not trivial. The rates of two-to-one combinations between them must indeed be defined by a vector of recombination rates. For example, if three diploid loci are defined, this vector must be of length 2, the first of its values defining the recombination rate between the first and second loci, the second of its values the recombination rate between the second and third loci. For example, if we want to define two groups of two loci that are linked to each other but are on two different chromosomes, we can define the recombination rate vector as c(0.1, 0.5, 0.1). The first two loci are thus relatively linked (recombination rate of 0.1), as are the last two loci. On the other hand, the recombination rate of 0.5 between the second and third loci ensures that the two groups are independent.

To create a haplotype ID, we concatenate all diploid alleles and all haploid alleles separately, then concatenate these two strings by separating them with "||". For example "Ab||CD" corresponds to a haplotype with four loci, two diploid with alleles A and b, and two haploid with alleles C and D. The principle is the same for the genotypes, but the second diploid haplotype is added by separating it from the first by a "/", for example "Ab/ab||CD".

Construction

Each loci is represented by a name and a factor vector that lists its alleles. If one wish to consider a system with two loci, a diploid and a haploid, each of which has two alleles, A and a, and B and b respectively, the construction of the genome is done as follows:

DL <- list(dl = c("A", "a"))
HL <- list(hl = c("B", "b"))
genomeObj <- setGenome(listHapLoci = HL, listDipLoci = DL)

The haplotypes and genotypes have been generated automatically, their numbers can be retrieved by simply displaying the Genome object created:

genomeObj
#> -=-=-=-=-=-= GENOME OBJECT =-=-=-=-=-=- 
#>  #  1 haploid locus, with 2 allele(s)
#>  #  1 diploid locus, with 2 allele(s)
#>  #  4 haplotypes 
#>  #  6 genotypes 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 
#> (use print for a list of haplotypes and genotypes)

and an exhaustive list can be displayed using the print method:

print(genomeObj)
#> -=-=-=-=-=-= GENOME OBJECT =-=-=-=-=-=- 
#>               in details 
#> 
#>  #   1  haploid loci: 
#>       - 'hl' with B and b alleles 
#> 
#>  #   1  diploid loci: 
#>       - 'dl' with A and a alleles 
#> 
#>  #   4  alleles: 
#> [1]    1) B    2) b    3) A    4) a
#> 
#>  #   4  haplotypes: 
#> [1]    1) A||B    2) a||B    3) A||b    4) a||b
#> 
#>  #   6  genotypes: 
#> [1]    1) A/A||B    2) A/a||B    3) a/a||B    4) A/A||b    5) A/a||b
#> [6]    6) a/a||b
#> 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The haplotypes and genotypes are numbered, and these numberings will be important in defining the different types of fitness, as we shall now see.

Mutation matrix

Definition

A genome necessarily has a mutation matrix attached to it. This mutation matrix is haplotypic: it is a square probability matrix (the sum of the rows of which is equal to 1), of size equal to the number of haplotypes defined in the genome. This mutation matrix is not provided as is by the user, in which case it would be too tedious to define. Instead the user is asked to either :

NOTE: In practice, the mutation matrix is not used as such in the simulations. It is associated with the recombination matrix and the meiosis matrix which associates to each genotype the probability that they produce each haplotype by chromosomal segregation. It is with a matrix product Recombination matrix x Meiosis matrix x Mutation matrix that a single gametogenesis matrix is produced and used for the simulations.

Construction

Definition of the haplotypic mutation matrix by filling in the allelic mutation matrices :

mutMatrixObj <- setMutationMatrix(
  genomeObj = genomeObj,
  mutHapLoci = list(matrix(c(
    0.95, 0.05,
    0.03, 0.97
  ), 2, byrow = TRUE)),
  mutDipLoci = list(matrix(c(
    0.9, 0.1,
    0.09, 0.91
  ), 2, byrow = TRUE))
)
mutMatrixObj
#> -=-=-=- MUTATION MATRIX OBJECT -=-=-=- 
#>  #  Haplotypic mutation matrix: 
#>        A||B   a||B   A||b   a||b
#> A||B 0.8550 0.0950 0.0450 0.0050
#> a||B 0.0855 0.8645 0.0045 0.0455
#> A||b 0.0270 0.0030 0.8730 0.0970
#> a||b 0.0027 0.0273 0.0873 0.8827
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 
#> (use print to access the allelic mutation matrices)

Definition of the haplotypic mutation matrix by filling in the forward and backward mutation rates:

mutMatrixObj <- setMutationMatrix(genomeObj = genomeObj, forwardMut = 0.01)
mutMatrixObj
#> -=-=-=- MUTATION MATRIX OBJECT -=-=-=- 
#>  #  Haplotypic mutation matrix: 
#>        A||B   a||B   A||b  a||b
#> A||B 0.9801 0.0099 0.0099 1e-04
#> a||B 0.0000 0.9900 0.0000 1e-02
#> A||b 0.0000 0.0000 0.9900 1e-02
#> a||b 0.0000 0.0000 0.0000 1e+00
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 
#> (use print to access the allelic mutation matrices)

Definition of the haplotypic mutation matrix by making a mutation list:

mutMatrixObj <- setMutationMatrix(
  genomeObj = genomeObj,
  mutations = list(
    mutation(from = "A", to = "a", rate = 0.01),
    mutation(from = "B", to = "b", rate = 0.01)
  )
)
mutMatrixObj
#> -=-=-=- MUTATION MATRIX OBJECT -=-=-=- 
#>  #  Haplotypic mutation matrix: 
#>        A||B   a||B   A||b  a||b
#> A||B 0.9801 0.0099 0.0099 1e-04
#> a||B 0.0000 0.9900 0.0000 1e-02
#> A||b 0.0000 0.0000 0.9900 1e-02
#> a||b 0.0000 0.0000 0.0000 1e+00
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 
#> (use print to access the allelic mutation matrices)

Selection

Definition

Selection can be defined at three stages of this cycle: on mature individuals directly, on their gamete production or on the gametes. For individuals and gamete production, a fitness value is associated with each genotype. For gametes, it is with each haplotype. A fitness value is any positive or zero real. Fitness values are relative, so if all genotypes have a fitness value of 3, there will be no effect on the dynamics of the model. By default, as is customary, the fitness of individuals is set to 1, and the additional fitness effects are added multiplicatively.

Construction

In all cases, the construction of a Selection object is done using a genome class object (which is used to check the compatibility between the constructed genome and the desired selection parameters). There are two ways of defining selection : * give a vector of length equal to the number of genotypes. By defining the fitness vector as such, it is therefore necessary to know the order of haplotypes and genotypes (use the method print on an object of class Genome). This way is often impractical and the second way is often preferred. * a simpler and more readable way to define the selection is to create a list of selection formulas, as they will be called here. The principle is explained below.

Selection formulas

A selection formula is a formula (in the sense of R) that associates allelic combinations with a fitness effect. The fitness effect is the left-hand member of the formula, while the right-hand member corresponds to the allelic combinations.

The fitness effect is any literal or numerical expression. If it is literal, it will be evaluated at the time of the definition of the object selection

The definition of allelic combinations is based on several rules. First, each allelic combination is described using the keywords : and h(). The first one, :, is used to separate the alleles from each other in the enumeration of allele combinations and h() is used to specify whether the allele must be in the homozygous state (if this keyword is absent, the allele is in the heterozygous state). For example, in a model with two diploid loci and two alleles each, respectively A and a, B and b, we note h(a):b to mean that the fitness effect must apply to any genotype that has the a allele in the homozygous state and the b allele in the heterozygous state. Other examples are: a:b (a heterozygous and b heterozygous), h(A):h(B) (A and B homozygous). We can therefore construct the formula: 1 + s ~ a:b, which means that the presence of a and b in the heterozygous state in a genotype has an effect (1+s) on the fitness of that genotype. To add other allelic combinations to which the same fitness effect will apply, the operator + must be used. Using the previous example, 1 + s ~ a:b + h(a) means that the fitness effect should also apply to genotypes that are homozygous for a.

To define a selection regime, one has to give as input a list of selection formula that describe how, depending on their allelic combinations, genotypes suffer or benefit from fitness effects, for example of the form list(1 + s ~ h(a):h(b), 1 + h*s ~ a:b + a:h(b) + h(a):b).

An important thing to note is that

list(1 + s ~ a + b)

is not equivalent to

list(1 + s ~ a, 1 + s ~ b)

In the first situation, if a genotype has a OR b in the heterozygous state, the fitness effect will apply. In the second situation the effect will apply twice (multiplicatively) to the same genotype because it fits two different formulas.

NOTE: This way of defining fitness does not allow to distinguish between cis and trans effects on fitness.

Neutral selection

Then it is for example possible to define no selection (neutral model) with the function setSelectNeutral to construct a selection object where the fitnesses are all identical (equal to 1):

selectionObj <- setSelectNeutral(genomeObj = genomeObj)

We can then check that no selection has been defined:

selectionObj
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=- 
#>  #  On individuals:  NO 
#>  #  On gametes:  NO 
#>  #  On gamete production:  NO 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 
#> (use print to access the fitness values)

or with :

print(selectionObj)
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=- 
#>               in details 
#> 
#> No selection defined. 
#> 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Non-neutral selection

Using the example given in the Genome section, one might want to simulate a system of genetic incompatibility where when the derived alleles a and b are put together within the same genotype, they induce a fitness cost through negative epistasis. This cost, which we will call s, is associated with h dominance which reduces this cost when the a nuclear allele is in the heterozygous state. Thus individuals A/A||B, A/a||B, a/a||B and A/A||b do not suffer any fitness cost (because they have only one of the two incompatible alleles), their fitness is equal to 1. The genotype A/a||b undergoing the reduced cost of incompatibility has a fitness of 1-h*s and the genotype a/a||b undergoing the full cost of incompatibility has a fitness of 1 - s. To define selection it is then sufficient to specify that if only one allele a is present with allele b (a:b), the cost is only 1 - h*s, and if allele a is homozygous with allele b, the cost is 1 - s.

indFit <- list(
  1 - h * s ~ a:b,
  1 - s ~ h(a):b
)
s <- 0.8
h <- 0.5
selectionObj <- setSelectOnInds(
  genomeObj = genomeObj,
  indFit = indFit
)

We can then check that selection has been defined:

selectionObj
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=- 
#>  #  On individuals:  YES 
#>  #  On gametes:  NO 
#>  #  On gamete production:  NO 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 
#> (use print to access the fitness values)

Regarding selection on individuals, it is necessary to understand that it will potentially not be identical if the modelled population is hermaphroditic or dioecious. In the case of hermaphroditism there is no distinction between female and male fitness, and so the indFit parameter will govern their fitness. If the sexes are separated, however, one can either define a fitness in individuals indFit that will apply to both males and females, or specify separately for males and females with the parameters femaleFit and maleFit.

In any case it is good to check with the print method that the fitnesses are those wanted:

print(selectionObj)
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=- 
#>               in details 
#> 
#>        Individuals Female Male
#> A/A||B         1.0    1.0  1.0
#> A/a||B         1.0    1.0  1.0
#> a/a||B         1.0    1.0  1.0
#> A/A||b         1.0    1.0  1.0
#> A/a||b         0.6    0.6  0.6
#> a/a||b         0.2    0.2  0.2
#> 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Selection can also be defined on gamete production:

indProdFit <- list(
  1 - h * s ~ a:b,
  1 - s ~ h(a):b
)
s <- 0.8
h <- 0.5
selectionObj <- setSelectOnGametesProd(
  genomeObj = genomeObj,
  indProdFit = indProdFit
)

or on the gametes directly:

gamFit <- list(
  1 - s ~ a:b
)
s <- 0.8
h <- 0.5
selectionObj <- setSelectOnGametes(
  genomeObj = genomeObj,
  gamFit = gamFit
)

For the latter two selection modes, it is possible to specify the fitness effect according to sex or to define the same effect regardless of sex, as desired.

Last but not least, it is obviously possible to combine these different layers of selections. This is done using the selectionObj parameter that each of the setSelect... functions has (except setSelectNeutral), it is then unnecessary to recall the genome to which the selection refers. For example, if we want to combine the three types of selections presented here :

indFit <- list(
  1 - h * s ~ a:b,
  1 - s ~ h(a):b
)
indProdFit <- list(
  1 - h * s ~ a:b,
  1 - s ~ h(a):b
)
gamFit <- list(
  1 - s ~ a:b
)
s <- 0.8
h <- 0.5
selectionObj <- setSelectOnInds(genomeObj = genomeObj, indFit = indFit)
selectionObj <- setSelectOnGametesProd(indProdFit = indProdFit, selectionObj = selectionObj)
selectionObj <- setSelectOnGametes(femaleFit = gamFit, selectionObj = selectionObj)
print(selectionObj)
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=- 
#>               in details 
#> 
#>        Individuals Female Male
#> A/A||B         1.0    1.0  1.0
#> A/a||B         1.0    1.0  1.0
#> a/a||B         1.0    1.0  1.0
#> A/A||b         1.0    1.0  1.0
#> A/a||b         0.6    0.6  0.6
#> a/a||b         0.2    0.2  0.2
#> 
#>  #  On gametes:  
#>      Female gamete Male gamete
#> A||B           1.0           1
#> a||B           1.0           1
#> A||b           1.0           1
#> a||b           0.2           1
#> 
#>  #  On gamete production:  
#>        Female gamete Male gamete
#> A/A||B           1.0         1.0
#> A/a||B           1.0         1.0
#> a/a||B           1.0         1.0
#> A/A||b           1.0         1.0
#> A/a||b           0.6         0.6
#> a/a||b           0.2         0.2
#> 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Population

Definition

Once the previous steps have been completed, i.e. definition of the genome, mutation and selection, the population can be constructed. A population is defined strictly by a name, a size, a sexual system (dioecy or hermaphodite), and the three objects defined previously: genome, mutation matrix and selection. In addition to that, it is possible to define

Two demographic regimes are possible: no demography, i.e. a fixed population size, or demography, i.e. a population where the size fluctuates stochastically. The boolean argument demography is used to define whether there should be stochasticity. For a fixed population size, it is therefore sufficient to define that demography = FALSE (default) and to set the desired population size with the popSize parameter.

For a fluctuating demography, demography must be TRUE and three other parameters are then needed: the initial population size (initPopSize), the population growth rate (growthRate) and the carrying capacity of the population (the population size, popSize).

It is also possible to avoid defining a population size altogether, by setting off the genetic drift (drift parameter). This will allow the model to be simulated deterministically.

Construction

Definition of a population in its simplest form:

DL <- list(dl = c("A", "a"))
HL <- list(hl = c("B", "b"))
mutations <- list(
  mutation(from = "A", to = "a", rate = 1e-3),
  mutation(from = "B", to = "b", rate = 1e-3)
)
genomeObj <- setGenome(listHapLoci = HL, listDipLoci = DL)
pop <- setPopulation(
  name = "A",
  size = 1000,
  dioecy = TRUE,
  genomeObj = genomeObj,
  selectionObj = setSelectNeutral(genomeObj),
  mutMatrixObj = setMutationMatrix(genomeObj, mutations = mutations)
)
pop
#> -=-=-=-=-= Population OBJECT =-=-=-=-=- 
#> Population 'A' of 1000 dioecious individuals
#> There is no demography. 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 
#> (use print for details on the genome)

More information on the population can be accessed using the print method;

print(pop)
#> -=-=-=-=-= Population OBJECT =-=-=-=-=- 
#>               in details 
#> Population 'A' of 1000 dioecious individuals
#> There is no demography. 
#> The initial genotypes frequency are:  
#> A/A||B A/a||B a/a||B A/A||b A/a||b a/a||b 
#>      1      0      0      0      0      0 
#> Selection:  
#> No selection defined. 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

To simulate it is then necessary to go through the metapopulation class, even if only one population is simulated. More information below.

Metapopulation

Definition

A metapopulation is a set of population(s) (from 1) that are simulated with potential migration between them. Only genotypes can migrate, i.e. adult individuals.

Construction

The construction of a Metapopulation object requires only two arguments (one optional). The first is a population(s) list, defined from the population class. The second is a migration matrix, which connects the populations together. This matrix is a probability matrix (square with the sum of the rows equal to 1, whose size is equal to the number of populations) where each value corresponds to the proportion of individuals (genotypes) that disperse from their source population (row) to their target population (column).

Simulate

The nameOutFunct (optionnal) argument is the name of a function that allow to produce a custom output for a simulation. It is called each generation in each population of a simulation and systematically returns a list with the first element being a logical that indicates whether something should be saved. If so, the second element of this list will be saved (no need to add a second element the first is FALSE). The custom output function must take only one argument, pop. This argument is a list of :

An example of such a function could be :

customOutFunct <- function(pop) {
  if (pop$freqAlleles[4] < 0.1 | pop$freqAlleles[4] > 0.9) {
    return(list(TRUE, a = list(gen = pop$gen, freq = pop$freqAlleles[4])))
  }
  return(list(FALSE))
}

Here the function returns orders to save the frequency of the fourth allele of the model if its frequency is between 0 and 0.1 or between 0.9 and 1. After defining it in this way, one will have to give "customOutFunct" as the value of the nameOutFunct parameter in the simulate method of the class Metapopulation for it to be taken into account.

Example of building a metapopulation and generating results

For example, consider a model with a single locus with two alleles, A and a, where the a allele has a small positive fitness effect.

The genome:

DL <- list(dl = c("A", "a"))
genomeObj <- setGenome(listDipLoci = DL)
#> Warning in .local(.Object, ...): No haploid locus has been set. By construction
#> it is necessary that there is at least 1 haploid locus, so it has been defined
#> with only one allele (this will not affect the simulations).
print(genomeObj)
#> -=-=-=-=-=-= GENOME OBJECT =-=-=-=-=-=- 
#>               in details 
#> 
#>  #   1  haploid loci: 
#>       - 'NoneHL' with NoneH alleles 
#> 
#>  #   1  diploid loci: 
#>       - 'dl' with A and a alleles 
#> 
#>  #   3  alleles: 
#> [1]    1) NoneH    2) A        3) a    
#> 
#>  #   2  haplotypes: 
#> [1]    1) A||NoneH    2) a||NoneH
#> 
#>  #   3  genotypes: 
#> [1]    1) A/A||NoneH    2) A/a||NoneH    3) a/a||NoneH
#> 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

This warning is normal when working only with diploid (or only haploid) loci.

The mutation matrix:

mutMatrixObj <- setMutationMatrix(
  genomeObj = genomeObj,
  forwardMut = 1e-3,
  backwardMut = 1e-3
)
print(mutMatrixObj)
#> -=-=-=- MUTATION MATRIX OBJECT -=-=-=- 
#>               in details 
#> 
#>  #   1  haploid locus allelic matrix: 
#> 
#> NoneHL : 
#>       NoneH
#> NoneH     1
#> 
#>  #   1  diploid locus allelic matrix: 
#> 
#> dl : 
#>       A     a
#> A 0.999 0.001
#> a 0.001 0.999
#> 
#>  #  Haplotypic mutation matrix: 
#>          A||NoneH a||NoneH
#> A||NoneH    0.999    0.001
#> a||NoneH    0.001    0.999
#> 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The object of selection is then created which describes that allele A is beneficial in population 1 while allele a is beneficial in population 2. This implements a selection differential between the two populations, a form of local selection.

# Selection in population 1
indFit <- list(
  1 + h * s ~ a,
  1 + s ~ h(a)
)
h <- 0.5
s <- 0.05
selectionObj1 <- setSelectOnInds(
  genomeObj = genomeObj,
  indFit = indFit
)
print(selectionObj1)
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=- 
#>               in details 
#> 
#>            Individuals Female  Male
#> A/A||NoneH       1.000  1.000 1.000
#> A/a||NoneH       1.025  1.025 1.025
#> a/a||NoneH       1.050  1.050 1.050
#> 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

# Selection in population 2
indFit <- list(
  1 + h * s ~ A,
  1 + s ~ h(A)
)
h <- 0.5
s <- 0.05
selectionObj2 <- setSelectOnInds(
  genomeObj = genomeObj,
  indFit = indFit
)
print(selectionObj2)
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=- 
#>               in details 
#> 
#>            Individuals Female  Male
#> A/A||NoneH       1.050  1.050 1.050
#> A/a||NoneH       1.025  1.025 1.025
#> a/a||NoneH       1.000  1.000 1.000
#> 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

We can now define a migration rate of 0.005 from each of the two populations to the other.

migMat <- matrix(c(
  0.995, 0.005,
  0.005, 0.995
), 2, 2)

And finally create the Metapopulation object

metapop <- setMetapopulation(
  populations = list(
    setPopulation(
      name = "pop1", size = 500, dioecy = F, genomeObj = genomeObj,
      selectionObj = selectionObj1, mutMatrixObj = mutMatrixObj, selfRate = 0.5
    ),
    setPopulation(
      name = "pop2", size = 500, dioecy = F, genomeObj = genomeObj,
      selectionObj = selectionObj2, mutMatrixObj = mutMatrixObj, selfRate = 0.5
    )
  ),
  migMat = migMat
)
metapop
#> -=-=-=-= Metapopulation OBJECT =-=-=-=- 
#> 2 populations of size 500 and 500, 
#> forming a metapopulation of 1000 individuals. 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 
#> (use print to get population details)

The print method allows access to all the characteristics of an object of class Metapopulation

print(metapop)
#> -=-=-=-= Metapopulation OBJECT =-=-=-=- 
#>              in details 
#> 
#>  #  Populations:  
#> 
#> Population 'pop1' of 500 hermaphroditic individuals
#>    with a 50% selfing rate. 
#> There is no demography. 
#> The initial genotypes frequency are:  
#> A/A||NoneH A/a||NoneH a/a||NoneH 
#>          1          0          0 
#> Selection:  
#>    - On individuals:  
#>            Fitness
#> A/A||NoneH   1.000
#> A/a||NoneH   1.025
#> a/a||NoneH   1.050
#> ~-~-~ 
#> Population 'pop2' of 500 hermaphroditic individuals
#>    with a 50% selfing rate. 
#> There is no demography. 
#> The initial genotypes frequency are:  
#> A/A||NoneH A/a||NoneH a/a||NoneH 
#>          1          0          0 
#> Selection:  
#>    - On individuals:  
#>            Fitness
#> A/A||NoneH   1.050
#> A/a||NoneH   1.025
#> a/a||NoneH   1.000
#> 
#>  #  Migration matrix:  
#> 
#>       pop1  pop2
#> pop1 0.995 0.005
#> pop2 0.005 0.995
#> 
#>  #  Haploptypes:  
#> 
#>          1          2 
#> "A||NoneH" "a||NoneH" 
#> 
#>  #  Genotypes:  
#> 
#>            1            2            3 
#> "A/A||NoneH" "A/a||NoneH" "a/a||NoneH" 
#> 
#>  #  Matrices involved in gametogenesis:  
#> 
#>    - Mutation matrix:  
#>          A||NoneH a||NoneH
#> A||NoneH    0.999    0.001
#> a||NoneH    0.001    0.999
#> 
#>    - Recombination matrix:  
#>            A/A||NoneH A/a||NoneH a/a||NoneH
#> A/A||NoneH          1          0          0
#> A/a||NoneH          0          1          0
#> a/a||NoneH          0          0          1
#> 
#>    - Meiosis matrix:  
#>            A||NoneH a||NoneH
#> A/A||NoneH      1.0      0.0
#> A/a||NoneH      0.5      0.5
#> a/a||NoneH      0.0      1.0
#> 
#>    - Final gametogenesis matrix:  
#>            A||NoneH a||NoneH
#> A/A||NoneH    0.999    0.001
#> A/a||NoneH    0.500    0.500
#> a/a||NoneH    0.001    0.999
#> 
#>  #  Haplotypes crossing matrix:  
#> 
#>               A||NoneH(Mal) a||NoneH(Mal)
#> A||NoneH(Fem)    A/A||NoneH    A/a||NoneH
#> a||NoneH(Fem)    A/a||NoneH    a/a||NoneH
#> 
#>  #  Allele frequencies from genotype frequencies matrix:  
#> 
#>            NoneH   A   a
#> A/A||NoneH     1 1.0 0.0
#> A/a||NoneH     1 0.5 0.5
#> a/a||NoneH     1 0.0 1.0
#> 
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The model is then simulated 100 times recording the population state every 50 generations up to 800 generations. The seed is specified to ensure reproducibility of the results obtained.

nsim <- 100
metapop <- simulate(metapop,
  nsim = nsim, seed = 123,
  recording = TRUE, recordGenGap = 50,
  threshold = 800
)
metapopDeterminist <- simulate(metapop,
  drift = FALSE, seed = 123,
  recording = TRUE, threshold = 800
)

Then we sum up the recording data frames of each simulation and divide them by the number of simulations (which results in an overall average of the data frames):

rec <- getRecords(metapop)

recMean <- Reduce(
  function(x, y) {
    x$pop1 <- x$pop1 + y$pop1
    x$pop2 <- x$pop2 + y$pop2
    x
  },
  rec
)
recMean$pop1 <- recMean$pop1 / nsim
recMean$pop2 <- recMean$pop2 / nsim

recDeterminist <- getRecords(metapopDeterminist)$s1

This creates a mutation - selection - drift - migration equilibrium visible on these graphs. Dashed lines indicate deterministic predictions (without drift).

par(mfrow = c(2, 1))

### Allelic frequency
plot(c(),
  xlim = c(0, 800), ylim = c(0, 1), col = "blue",
  xlab = "Generation\n(blue for A, red for B)",
  ylab = "Frequency of a"
)

# Stochastic
for (i in seq_len(nsim)) {
  lines(rec[[i]]$pop1$gen, rec[[i]]$pop1$a, lty = "longdash", col = "lightblue")
  lines(rec[[i]]$pop2$gen, rec[[i]]$pop2$a, lty = "longdash", col = "lightpink")
}
lines(recMean$pop1$gen, recMean$pop1$a, lty = "longdash", col = "blue")
lines(recMean$pop2$gen, recMean$pop2$a, lty = "longdash", col = "red")

# Deterministic
lines(recDeterminist$pop1$gen, recDeterminist$pop1$a, lty = "longdash", col = "blue", lwd = 2)
lines(recDeterminist$pop2$gen, recDeterminist$pop2$a, lty = "longdash", col = "red", lwd = 2)

# Mean stochastic
lines(recMean$pop1$gen, recMean$pop1$a, col = "blue")
lines(recMean$pop2$gen, recMean$pop2$a, col = "red")


### Mean fitness
plot(c(),
  xlim = c(0, 800), ylim = c(1, 1 + s), col = "blue",
  xlab = "Generation\n(blue for A, red for B)",
  ylab = "Frequency of a"
)

# Stochastic
for (i in seq_len(nsim)) {
  lines(rec[[i]]$pop1$gen, rec[[i]]$pop1$indMeanFit, lty = "longdash", col = "lightblue")
  lines(rec[[i]]$pop2$gen, rec[[i]]$pop2$indMeanFit, lty = "longdash", col = "lightpink")
}
lines(recMean$pop1$gen, recMean$pop1$indMeanFit, lty = "longdash", col = "blue")
lines(recMean$pop2$gen, recMean$pop2$indMeanFit, lty = "longdash", col = "red")

# Deterministic
lines(recDeterminist$pop1$gen, recDeterminist$pop1$indMeanFit, lty = "longdash", col = "blue", lwd = 2)
lines(recDeterminist$pop2$gen, recDeterminist$pop2$indMeanFit, lty = "longdash", col = "red", lwd = 2)

# Mean stochastic
lines(recMean$pop1$gen, recMean$pop1$indMeanFit, col = "blue")
lines(recMean$pop2$gen, recMean$pop2$indMeanFit, col = "red")

The average fitnesses in the populations equalise while the allelic frequencies are maintained at intermediate values but still as close as possible to the optimum in both populations.