mapping gene trees into the species phylogeny

Nodes in gene trees can be mapped to a node or an edge in the species phylogeny. Edges in gene trees can be mapped to an edge in the species phylogeny. Attributes of nodes and edges are used to carry this mapping information, as detailed in the documentation of function simulatecoalescent.

We give examples below of how we may use this mapping information.

naming internal nodes

First, it's useful to name internal nodes in the network, to which we can later map nodes in the gene tree.

julia> net = readTopology("((C:0.9,(B:0.2)#H1:0.7::0.6):0.6,(#H1:0.6,A:1):0.5);");
julia> PhyloNetworks.nameinternalnodes!(net, "I"); # "I" is a prefix to name internal nodes
julia> writeTopology(net)"((C:0.9,(B:0.2)#H1:0.7::0.6)I1:0.6,(#H1:0.6::0.4,A:1.0)I2:0.5)I3;"

Notice the extra node names in the species phylogeny: I1, I2, and I3 at the root.

Next, we use the option nodemapping=true when simulating gene trees, to ask for extra degree-2 nodes in gene trees. These nodes are created each time that a gene tree lineage crosses a node in the species phylogeny, that is, each time that a gene tree lineage crosses a speciation node or a hybridization node. We'll simulate a single tree here each time.

julia> using Random; Random.seed!(261); # to replicate randomness
julia> tree_regular = simulatecoalescent(net,1,1)[1];
julia> writeTopology(tree_regular, round=true) # regular nodes only"((C:2.14,A:2.14):1.424,B:3.364);"
julia> Random.seed!(261); # to replicate the same coalescent simulation
julia> tree = simulatecoalescent(net,1,1; nodemapping=true)[1];
julia> writeTopology(tree, round=true) # extra degree-2 nodes for mapping"((((C:0.9)I1:0.6)I3:0.64,((A:1.0)I2:0.5)I3:0.64):1.424,(((B:0.2)H1:0.6)I2:0.5)I3:2.064);"

Notice that regular nodes in the gene tree (nodes that we get without the nodemapping option) don't have names. With the nodemapping option, there are many new nodes, all of degree-2, named after internal nodes in the network (I1, I2, I3). These extra degree-2 nodes and their names are sufficient to map the gene tree into the species network.

The network is shown on the left below, with edges annotated by their numbers.

plot(net, showedgenumber=true, shownodelabel=true, tipoffset=0.1);
plot(tree, edgelabel=DataFrame(number=[e.number for e in tree.edge],
                               label=[e.inCycle for e in tree.edge]),
           edgelabelcolor="red4", shownodelabel=true, tipoffset=0.1);

example 1: degree-2 node names in gene tree

In the gene tree (right), each lineage is annotated by the network edge it maps into. Degree-2 nodes appear via their names, such that each horizontal line represents a series of gene lineages, separated from each other by degree-2 nodes. For example, the horizontal line tracing B's ancestry back in time maps into the network like this:

from B, go back along edge 2
meet hybrid node H1, was inherited from the minor hybrid edge 5,
from speciation node I2, trace back along edge 7,
meet the network's root node I3, trace back along the network's root edge 8 before coalescing with the ancestor of the other lineages (which have already coalesced by then).

cleaning gene trees

Almost all examples below use this mapping information via the extra degree-2 nodes and the extra edges between these nodes.

But we may want to "clean" gene trees of their degree-2 nodes at some point. This can be done with the PhyloNetworks utility removedegree2nodes!, like this:

julia> PhyloNetworks.removedegree2nodes!(tree, true)PhyloNetworks.HybridNetwork, Rooted Network
4 edges
5 nodes: 3 tips, 0 hybrid nodes, 2 internal tree nodes.
tip labels: C, A, B
((C:2.14,A:2.14):1.424,B:3.364);

The option true is to keep the root, even if it's of degree 2.