example uses

We are re-using the same network and simulated tree as before:

julia> writeTopology(net)"((C:0.9,(B:0.2)#H1:0.7::0.6)I1:0.6,(#H1:0.6::0.4,A:1.0)I2:0.5)I3;"
julia> writeTopology(tree, round=true)"((((C:0.9)I1:0.6)I3:0.64,((A:1.0)I2:0.5)I3:0.64):1.424,(((B:0.2)H1:0.6)I2:0.5)I3:2.064);"

example 1, same as in mapping section

counting deep coalescences

The number of deep coalescences can be quantified as the number of "extra" lineages due to incomplete lineage sorting, that can be calculated from embedding the gene tree into the species phylogeny (see Maddison 1997 for species trees). For an edge in the network, say edge 7 going from I2 to I3 going in back in time, lineage sorting is complete if all the gene lineages entering the edge (at I2) coalesce into a single gene lineage by the time they exit the edge (at I3). If they don't, the number of extra lineages is k-1 where k is the number of lineages "exiting" the edge, for that particular edge in the species network and that particular gene tree. The total number of extra lineages, for a given gene tree, is the sum across all edges in the species phylogeny.

In our gene tree above, we can count the number of lineages that exit each species edge using the degree-2 mapping nodes, then count how many lineages are "extra". We do so below using utilities mappingnodes to iterate over degree-2 mapping nodes and population_mappedto to extract the mapping information.

julia> # dictionary to store the count of extra lineages exiting each network edge. initialized at 0
       edge_count = Dict(e.number => 0 for e in net.edge)Dict{Int64, Int64} with 7 entries:
  5 => 0
  4 => 0
  6 => 0
  7 => 0
  2 => 0
  3 => 0
  1 => 0
julia> const PCS = PhyloCoalSimulations; # for lazy typing!
julia> for n in PCS.mappingnodes(tree) # iterate over degree-2 mapping nodes in the gene tree child = getchildedge(n) popid = PCS.population_mappedto(child) # number of species edge that 'n' came from # sanity check below isnothing(popid) && error("""population ID not found for the child edge of node number $(n.number) mapping to species node $(n.name).""") edge_count[popid] += 1 # increment by 1 the number of lineages exiting population edge 'popid' end
julia> edge_countDict{Int64, Int64} with 7 entries: 5 => 1 4 => 1 6 => 1 7 => 2 2 => 1 3 => 0 1 => 1

From this, we see two interesting things.

  • 0 lineages exited edge number 3 in the species network: it's the hybrid edge from H1 to I3 (going back in time). That's because the only lineage at H1 was interited from I2, so there weren't any lineage evolving through edge 3.
  • 2 lineages exited edge number 7 (going from I2 to I3 back in time), so that's 1 extra lineage. All other edges look as expected, with a single gene lineage exiting from them.

We can now calculate the total number of extra lineages:

julia> filter!(p -> p.second > 0, edge_count) # filter out edges without any gene lineageDict{Int64, Int64} with 6 entries:
  5 => 1
  4 => 1
  6 => 1
  7 => 2
  2 => 1
  1 => 1
julia> map!(k -> k-1, values(edge_count)) # calculate number of "extras": k-1ValueIterator for a Dict{Int64, Int64} with 6 entries. Values: 0 0 0 1 0 0
julia> deepcoalescence = sum(values(edge_count)) # sum 'extras' over all edges in the network1

On the particular gene tree we simulated, we counted 1 deep coalescence.

number of lineages inherited via gene flow

Our network has inheritance γ=0.4 on the minor edge, which we'll call the "gene flow" edge, and γ=0.6 on the major hybrid edge, parent to H1 on the major tree. But we may be interested in the realized proportion of lineages inherited from each parent at H1, realized in the gene trees we actually simulated. To do so, we can count the number of gene lineages that are mapped to each hybrid edge in the network.

This mapping is stored in the edge attribute .inCycle internally, but it's best to access it via the function population_mappedto (as the internal representation may change). From the plot above, the minor "gene flow" edge is edge number 5 and the major hybrid edge has number 3. So we can count the gene lineages inherited via gene flow as the number of gene tree edges with inCycle equal to 5.

If the gene trees have been saved to a file and later read from this file, then the .inCycle attributes are no longer stored in memory. In this case, we can retrieve the mapping information by the internal node names. The edges going through gene flow are those whose child node is named "H1" and parent node is named "I2".

We use the first option with the .inCycle attribute below. We get that our one simulated gene tree was indeed inherited via gene flow:

julia> sum(e.inCycle == 5 for e in tree.edge) # or:1
julia> sum(PCS.population_mappedto(e) == 5 for e in tree.edge)1
julia> # or define a function to do this for any edge, so we can re-use later: nlineages_through(edgeID, gt) = sum(PCS.population_mappedto(e) == edgeID for e in gt.edge);
julia> nlineages_through(5, tree) # same as before!1
julia> nlineages_through(3, tree) # lineages that went through edge 3, the major edge.0

To make this more interesting, we can simulate many gene trees then count how many of their lineages were inherited via gene flow. If we ask for 2 individuals in species B, then each gene may have 2 lineages that enter the hybrid node H1, if the two B individuals fail to coalesce. In that case, it's possible that one individual lineage was inherited via gene flow, and the other not. We'll calculate the gene flow proportion among all these lineages. This proportion should be close (but not exactly equal) to the theoretical γ=0.4 from the network model.

julia> ngenes = 100;
julia> genetrees = simulatecoalescent(net, ngenes, Dict("B"=>2, "A"=>1, "C"=>1); nodemapping=true);
julia> length(genetrees)100
julia> nlineages_geneflow = sum(nlineages_through(5,gt) for gt in genetrees)70
julia> nlineages_major = sum(nlineages_through(3,gt) for gt in genetrees)129
julia> # realized γ, close to 0.4: proportion_geneflow = nlineages_geneflow / (nlineages_geneflow + nlineages_major)0.35175879396984927

rate variation across species

The gene trees resulting from simulatecoalescent have their edge lengths in coalescent units. One may want to convert them to substitutions per site, so as to simulate molecular sequences along these gene trees. The mapping information is important to allow for different rates of molecular evolution across different species. Here is an example to do this.

We will use the Distributions package to simulate rates from a log-normal distribution across species, that is, across edges in the species network.

julia> using Distributions
julia> lognormal_rate_dist = LogNormal(-0.125, 0.5) # μ = -σ²/2 to get a mean of 1.Distributions.LogNormal{Float64}(μ=-0.125, σ=0.5)
julia> networkedge_rate = Dict(e.number => rand(lognormal_rate_dist) for e in net.edge)Dict{Int64, Float64} with 7 entries: 5 => 0.68631 4 => 1.23458 6 => 1.84251 7 => 0.438 2 => 1.43724 3 => 1.09321 1 => 0.580741
julia> # add entry for the edge above the network's root. Find its number first. rootedgenumber = PhyloCoalSimulations.get_rootedgenumber(net)8
julia> push!(networkedge_rate, rootedgenumber => rand(lognormal_rate_dist))Dict{Int64, Float64} with 8 entries: 5 => 0.68631 4 => 1.23458 6 => 1.84251 7 => 0.438 2 => 1.43724 8 => 1.39978 3 => 1.09321 1 => 0.580741
julia> writeTopology(tree, round=true, digits=4) # before rate variation"((((C:0.9)I1:0.6)I3:0.64,((A:1.0)I2:0.5)I3:0.64):1.4235,(((B:0.2)H1:0.6)I2:0.5)I3:2.0636);"
julia> # multiply the length of each gene lineage by the rate of the species edge it maps into for e in tree.edge e.length *= networkedge_rate[e.inCycle] end
julia> writeTopology(tree, round=true, digits=4) # after rate variation"((((C:0.5227)I1:0.7407)I3:0.8959,((A:1.8425)I2:0.219)I3:0.8959):1.9926,(((B:0.2874)H1:0.4118)I2:0.219)I3:2.8885);"