expected concordance factors

The quartet concordance factors expected from a network can be calculated exactly on a network of any level. This can be done here using a recursive algorithm – although it may run slowly as its complexity was not optimized. Below is an example, with a network on 6 taxa with 2 reticulations.

julia> net = readTopology("(D:1,((C:1,#H25:0):0.1,((((B1:10,B2:1):1.5,#H1:0):10.8,((A1:1,A2:1):0.001)#H1:0::0.5):0.5)#H25:0::0.501):1);");
julia> eCFs,t = network_expectedCF(net);Calculation quartet CFs for 15 quartets... 0+---------------+100% ***************
julia> t # taxon list6-element Vector{String}: "A1" "A2" "B1" "B2" "C" "D"
julia> first(eCFs, 2) # first 2 4-taxon sets, each with 3 quartet CFs. taxon numbers are indices in the taxon list above2-element Vector{PhyloNetworks.QuartetT{StaticArraysCore.MVector{3, Float64}}}: 4-taxon set number 1; taxon numbers: 1,2,3,4 data: [0.8885456713760765, 0.05572716431196175, 0.05572716431196175] 4-taxon set number 2; taxon numbers: 1,2,3,5 data: [0.16750297999120484, 0.41624851000439755, 0.41624851000439755]

To look at these quartet CFs, we define below a function to convert them to a string. This string can then be printed to the screen (shown below with print) or written to a file (using write, not shown).

julia> qCFstring(qlist,taxa, sigdigits=4) = join(
         [join(taxa[q.taxonnumber],",") * ": " * string(round.(q.data, sigdigits=sigdigits))
          for q in qlist],   "\n");
julia> print(qCFstring(eCFs,t))A1,A2,B1,B2: [0.8885, 0.05573, 0.05573] A1,A2,B1,C: [0.1675, 0.4162, 0.4162] A1,A2,B2,C: [0.1675, 0.4162, 0.4162] A1,B1,B2,C: [0.03719, 0.03719, 0.9256] A2,B1,B2,C: [0.03719, 0.03719, 0.9256] A1,A2,B1,D: [0.1675, 0.4162, 0.4162] A1,A2,B2,D: [0.1675, 0.4162, 0.4162] A1,B1,B2,D: [0.03719, 0.03719, 0.9256] A2,B1,B2,D: [0.03719, 0.03719, 0.9256] A1,A2,C,D: [0.6898, 0.1551, 0.1551] A1,B1,C,D: [0.793, 0.1035, 0.1035] A2,B1,C,D: [0.793, 0.1035, 0.1035] A1,B2,C,D: [0.793, 0.1035, 0.1035] A2,B2,C,D: [0.793, 0.1035, 0.1035] B1,B2,C,D: [1.0, 9.422e-7, 9.422e-7]

For each 4-taxon set above, the 3 concordance factors are for the quartets listed in this order: 12|34, 13|24, 14|23 where 1,2,3,4 refer to the order of the taxa listed to the left.

We can also convert our list to a data frame:

julia> using DataFrames
julia> df = writeTableCF(eCFs, t) # requires PhyloNetworks v0.16.115×8 DataFrame Row │ qind t1 t2 t3 t4 CF12_34 CF13_24 CF14_23 │ Int64 String String String String Float64 Float64 Float64 ─────┼────────────────────────────────────────────────────────────────────────── 1 │ 1 A1 A2 B1 B2 0.888546 0.0557272 0.0557272 2 │ 2 A1 A2 B1 C 0.167503 0.416249 0.416249 3 │ 3 A1 A2 B2 C 0.167503 0.416249 0.416249 4 │ 4 A1 B1 B2 C 0.0371891 0.0371891 0.925622 5 │ 5 A2 B1 B2 C 0.0371891 0.0371891 0.925622 6 │ 6 A1 A2 B1 D 0.167503 0.416249 0.416249 7 │ 7 A1 A2 B2 D 0.167503 0.416249 0.416249 8 │ 8 A1 B1 B2 D 0.0371891 0.0371891 0.925622 9 │ 9 A2 B1 B2 D 0.0371891 0.0371891 0.925622 10 │ 10 A1 A2 C D 0.689828 0.155086 0.155086 11 │ 11 A1 B1 C D 0.793009 0.103496 0.103496 12 │ 12 A2 B1 C D 0.793009 0.103496 0.103496 13 │ 13 A1 B2 C D 0.793009 0.103496 0.103496 14 │ 14 A2 B2 C D 0.793009 0.103496 0.103496 15 │ 15 B1 B2 C D 0.999998 9.42151e-7 9.42151e-7

If we wanted to compare with the observed frequency among 100 gene trees, we could use data frame manipulations to join the two data frames:

julia> using PhyloCoalSimulations
julia> genetrees = simulatecoalescent(net, 100, 1); # 100 genes, 1 individual / pop
julia> df_sim = writeTableCF(countquartetsintrees(genetrees; showprogressbar=false)...);
julia> first(df_sim, 2)2×9 DataFrame Row │ qind t1 t2 t3 t4 CF12_34 CF13_24 CF14_23 ngenes │ Int64 String String String String Float64 Float64 Float64 Float64 ─────┼─────────────────────────────────────────────────────────────────────────── 1 │ 1 A1 A2 B1 B2 0.94 0.05 0.01 100.0 2 │ 2 A1 A2 B1 C 0.19 0.49 0.32 100.0
julia> select!(df_sim, Not([:qind,:ngenes])); # delete columns with q index and number of genes
julia> df_both = outerjoin(select(df, Not(:qind)), df_sim; on=[:t1,:t2,:t3,:t4], renamecols = "exact" => "sim")15×10 DataFrame Row │ t1 t2 t3 t4 CF12_34exact CF13_24exact CF14_23exact CF12_34sim CF13_24sim CF14_23sim │ String String String String Float64? Float64? Float64? Float64? Float64? Float64? ─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ A1 A2 B1 B2 0.888546 0.0557272 0.0557272 0.94 0.05 0.01 2 │ A1 A2 B1 C 0.167503 0.416249 0.416249 0.19 0.49 0.32 3 │ A1 A2 B2 C 0.167503 0.416249 0.416249 0.21 0.48 0.31 4 │ A1 B1 B2 C 0.0371891 0.0371891 0.925622 0.05 0.0 0.95 5 │ A2 B1 B2 C 0.0371891 0.0371891 0.925622 0.02 0.01 0.97 6 │ A1 A2 B1 D 0.167503 0.416249 0.416249 0.19 0.47 0.34 7 │ A1 A2 B2 D 0.167503 0.416249 0.416249 0.21 0.46 0.33 8 │ A1 B1 B2 D 0.0371891 0.0371891 0.925622 0.05 0.0 0.95 9 │ A2 B1 B2 D 0.0371891 0.0371891 0.925622 0.02 0.01 0.97 10 │ A1 A2 C D 0.689828 0.155086 0.155086 0.68 0.12 0.2 11 │ A1 B1 C D 0.793009 0.103496 0.103496 0.8 0.09 0.11 12 │ A2 B1 C D 0.793009 0.103496 0.103496 0.74 0.17 0.09 13 │ A1 B2 C D 0.793009 0.103496 0.103496 0.8 0.09 0.11 14 │ A2 B2 C D 0.793009 0.103496 0.103496 0.74 0.17 0.09 15 │ B1 B2 C D 0.999998 9.42151e-7 9.42151e-7 1.0 0.0 0.0