Estimating Migration Rates Based on Different Models

ESTIMATING MIGRATION RATES BASED ON DIFFERENT EVOLUTIONARY MODELS

1. We will start by estimating gene flow based on the “Isolation with Migration” model implemented in IMa.

1.1. Take a look at the input file for IMa in notepad. This is saved in the files folder and is called IMinfile.txt .The following scheme explains what in input consists of:

1.2. Save this file in the IMa folder.

1.3. Open the command line (StartRun>cmd). You now have to go to the IMa directory. To do this, use the cd command (meaning “change directory”). Eg. to get into a folder called “documents” located on the desktop you should write “cd desktop\documents” and press enter.

1.4. IMa works in a command line. Usually, we need to make several optimization runs to pick parameter values and prior distributions appropriate for our run. Since we are time-constrained, we will used directly the following command (which has been tested before and is known to work):

ima -i IMinfile.txt -o IMoutput.txt -p 45 -q1 5 -q2 5 -m1 5 -m2 5 -t8 -b7500 -L7500 -n 5 -fl –g1 0.01

Even with a small data set such as this, IMa takes a long time to run. While IMa runs, move to point 2 and conduct the analyses with DNAsp. We will get back to this analysis later.

2. We will use DNAsp to estimate the degree of gene flow between species A and species B.

2.1. Open PGD_Ph1.fas in DNAsp.

2.2. Define two groups of sequences. To do this, go to Data>Define sequence sets. Select all sequences from species A. Press “” to include them in the first group. Click “Add new sequence set” and write the name you wish to call this species (e.g. “Species A”). Do the same for species B. To finish, click “Update all entries”.

2.3. Menu Analysis>Gene flow and genetic differentiation. Click OK. Save the output (text file) with an appropriate name (suggestion “Ph1_DNAsp_PGD_gf.out”).

Record Nm value based on Hudson et al.’s Fst.

2.4. Repeat for file BF_Ph1.fas

DISCUSSION:

- What do these values indicate?

- What are the limitations of these methods to estimate gene flow?

3. Back to the IM model! When the IMa run is over, open the IMa outfile; if you have chosen the suggested name, the file will be called IMoutput.txt. Before looking at the results, we need to make sure that all went well. For example: are ESS values all >50? Do parameter trends stabilize with time? Are the curves defined enough? Do different users get similar results?

2.5. If the answers are “YES” we can interpret the output. We are especially interested in the “HiPT” value of the parameters. This value corresponds to the estimates that were more often accepted in the MCMC. If the run achieved stationarity then these should corresponde to the peak of the posterior probability distribution.

Record the “HiPT” corresponding to the parameters we want to estimate: θ1, θ2, θa, m1, m2 and t. Record also HPD90Lo and HPD90Hi values, corresponding to the lower and upper limits containing 90% of the posterior probability (and work as a confidence interval).

2.6.1 The estimated values (θ1, θ2, θa, m1, m2 and t )are all scaled by mutation rate µ.

To convert this to biological meaningful units, we must use one value for mutation rate (in the case of multiple loci, we use the geometric mean of the each locus’ mutation rate). We will assume that this value is 3*10-7 mutations per year per gene (not per site!).

It is important to distinguish between mutation rate per generation (µg) and mutation rate per year (µy). The relationship between them is easy to understand:

µg=µy*generation time.

In Podarcis, generation time is about 2.5 years.

Usually, when estimates are based on coalescent theory, we have to use µg for the conversions.

What will these conversions be?

To estimate the effective population size:

θ1=4*N1*µg, in which N1 is the effective size of population 1. That is, N1=θ1/(4*µg).

Do the same to estimate N2 and Na.

To estimate divergence time (in years):

t=t* µy, in which t is divergence time. Thus t=t/µy.

To estimate migration rates:

m1=m1/µg, in which m1 is the migration rate per generation, from population 2 to population 1. Thus m1=m1*µg. We can think in an analogous form to estimate m2.

To estimate population migration rates (the number of gene copies that migrate each generation):

2.N1.m1=2* θ1/(4*µg)*m1*µg = θ1*m1/2

What migration rates were estimated with IMa? Do these coincide to those estimated based on the island model? What appears to be the most plausible scenario?