quartet concordance factor simulation

using PhyloCoalSimulations

The correction for dependence across 4-taxon uses a simulation approach. We provide more examples here to simulate gene trees and extract the quartet concordance factors observed in these simulated gene trees.

We use PhyloCoalSimulations for the simulation of trees under the network multispecies coalescent, assuming that the network edges lengths are in coalescent units.

julia> using PhyloNetworks, QuartetNetworkGoodnessFit, PhyloCoalSimulations
julia> net1 = readTopology("((((D:0.2,C:0.2):2.4,((A:0.4,B:0.4):1.1)#H1:1.1::0.7):2.0,(#H1:0.0::0.3,E:1.5):3.1):1.0,O:5.6);");
julia> ngenes = 3030
julia> genetrees = simulatecoalescent(net1, ngenes, 1); # 1 individual / species
julia> obsCF, t = countquartetsintrees(genetrees; showprogressbar=true);Reading in trees, looking at 15 quartets in each... 0+------------------------------+100% ******************************
julia> df = writeTableCF(obsCF, t)15×9 DataFrame Row │ qind t1 t2 t3 t4 CF12_34 CF13_24 CF14_23 ngen ⋯ │ Int64 String String String String Float64 Float64 Float64 Floa ⋯ ─────┼─────────────────────────────────────────────────────────────────────────────── 1 │ 1 A B C D 0.966667 0.0333333 0.0 3 ⋯ 2 │ 2 A B C E 0.766667 0.0333333 0.2 3 3 │ 3 A B D E 0.766667 0.0333333 0.2 3 4 │ 4 A C D E 0.0 0.0 1.0 3 5 │ 5 B C D E 0.0 0.0333333 0.966667 3 ⋯ 6 │ 6 A B C O 0.766667 0.0666667 0.166667 3 7 │ 7 A B D O 0.766667 0.0333333 0.2 3 8 │ 8 A C D O 0.0333333 0.0 0.966667 3 9 │ 9 B C D O 0.0 0.0333333 0.966667 3 ⋯ 10 │ 10 A B E O 0.833333 0.133333 0.0333333 3 11 │ 11 A C E O 0.7 0.266667 0.0333333 3 12 │ 12 B C E O 0.766667 0.166667 0.0666667 3 13 │ 13 A D E O 0.7 0.266667 0.0333333 3 ⋯ 14 │ 14 B D E O 0.8 0.166667 0.0333333 3 15 │ 15 C D E O 0.966667 0.0333333 0.0 3 1 column omitted

using Hybrid-Lambda

Warning

Hybrid-Lambda is not recommended (see Allman, Baños & Rhodes 2022) but this section may still be of interest to some and is left for completeness.

The example qCF data used earlier were originally simulated as shown below using hybrid-Lambda, which comes with version v0.3 of QuartetNetworkGoodnessFit but not with later versions.

In the code below, we show how to call hybrid-lambda within julia, asking for the simulation of 200 genes (-num 200). The major issue that this code solves is that hybrid-lambda does not use the standard extended Newick format, but we show code to generate the format expected by hybrid-lambda, from the standard format.

Note that hybrid-lambda assumes that the tree or network is ultrametric, but runs even if this assumption is not met.

If you required QuartetNetworkGoodnessFit v0.3, then do this:

hl = QuartetNetworkGoodnessFit.hybridlambda # path to hybrid-lambda simulator, on local machine

otherwise, install Hybrid-Lambda separately, then define hl as a string that gives the path to the Hybrid-Lambda executable on your local machine. Below, we use Julia to convert the standard Newick network into the format used by Hybrid-Lambda, run it, read the gene trees from the file created by Hybrid-Lambda, and finally calculate quartet concordance factors from these gene trees.

net1 = readTopology("((((D:0.2,C:0.2):2.4,((A:0.4,B:0.4):1.1)#H1:1.1::0.7):2.0,(#H1:0.0::0.3,E:1.5):3.1):1.0,O:5.6);");
net1HL = hybridlambdaformat(net1) # format for the network, to use below by hybrid-lambda
run(`$hl -spcu "((((D:0.2,C:0.2)I1:2.4,((A:0.4,B:0.4)I2:1.1)H1#0.7:1.1)I3:2.0,(H1#0.7:0.0,E:1.5)I4:3.1)I5:1.0,O:5.6)I6;" -num 200 -seed 123 -o "genetrees"`)
treelist = readMultiTopology("genetrees_coal_unit")
obsCF = writeTableCF(countquartetsintrees(treelist)...)