internal documentation

Documentation for QuartetNetworkGoodnessFit's internal functions. These functions are not exported, but they can still be called with the module prefix, e.g. QuartetNetworkGoodnessFit.foo() for a function named foo().

index

    functions

    QuartetNetworkGoodnessFit.dirichlet_max - Method
    dirichlet_max(dcf::DataCF)

    Calculate outlier p-values, one for each four-taxon set, using the maximum concordance factor under a Dirichlet distribution. Used by ticr!.

    output

    • vector of outlier p-values, one for each 4-taxon set
    • value of the concentration parameter α
    • value of the pseudo likelihood (optimized at α)
    QuartetNetworkGoodnessFit.dirichlet_min - Method
    dirichlet_min(dcf::DataCF)

    First calculate outlier p-values using each of the three concordance factors for each quartet under a Dirichlet distribution, then take the smallest p-value among the three, as the outlier p-value for each four-taxon set.

    output

    • a vector of outlier p-values, one for each quartet
    • value of the concentration parameter α
    • value of the pseudo likelihood (optimized at α)
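
The "smallest of the three" step can be sketched as follows. This is only an illustration: the matrix `pvals3` and its values are hypothetical, and the Dirichlet tail calculation that would produce these p-values is not shown.

```julia
# Hypothetical input: one row per four-taxon set, one column per concordance factor.
pvals3 = [0.40 0.03 0.75;
          0.20 0.55 0.90]

# Outlier p-value of each four-taxon set: the smallest of its three p-values.
outlier_p = vec(minimum(pvals3, dims=2))
```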
    QuartetNetworkGoodnessFit.expectedCF_ordered - Function
    expectedCF_ordered(dcf::DataCF, net::HybridNetwork, suffix=""::AbstractString)

    Expected quartet concordance factors in dcf, but ordered as they would be if output by PhyloNetworks.countquartetsintrees. Output:

    • 2-dimensional SharedArray (number of 4-taxon sets x 3). dcf.quartet[i].qnet.expCF[j] for 4-taxon set i and resolution j is stored in row qi and column k, where qi is the rank of 4-taxon set i (see PhyloNetworks.quartetrank) and k is the column that resolution j occupies under the taxon order described below. This rank depends on how taxa are ordered.
    • vector of taxon names, whose order matters. These are tip labels in net with suffix suffix added, then ordered alphabetically, or numerically if taxon names can be parsed as integers.
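
The taxon ordering rule described above might look like the sketch below. The helper name `ordered_taxa` is hypothetical, and the sketch assumes that the numeric test is applied to the names after the suffix is added; the package may proceed differently.

```julia
# Hypothetical helper: append `suffix` to each label, then sort numerically
# if every resulting name parses as an integer, alphabetically otherwise.
function ordered_taxa(labels::Vector{String}, suffix::AbstractString="")
    tips = labels .* suffix
    asint = tryparse.(Int, tips)
    if all(!isnothing, asint)
        return tips[sortperm(Int.(asint))]
    end
    return sort(tips)
end

ordered_taxa(["10", "9", "2"])      # all names are integers: numeric order
ordered_taxa(["10", "9", "b"])      # mixed names: alphabetical order
```

Because the row of each 4-taxon set depends on this order, the same data can land in different rows under a different suffix or naming scheme.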
    QuartetNetworkGoodnessFit.multinom_lrt! - Method
    multinom_lrt!(pval::AbstractVector{Float64}, quartet::Vector{Quartet})
    multinom_lrt!(pval::AbstractVector{Float64}, obsCF, expCF::AbstractMatrix{Float64})

    Calculate outlier p-values (one per four-taxon set) using the likelihood ratio test under a multinomial distribution for the observed concordance factors.
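
As an illustration of the test for a single four-taxon set, here is a minimal sketch. It assumes a chi-squared reference distribution with 2 degrees of freedom (3 categories, fully specified expected proportions), whose upper tail is exp(-x/2). The function name `lrt_pvalue` and its arguments are hypothetical, not part of the package.

```julia
# obs, expd: observed and expected CFs of one four-taxon set (each sums to 1).
# ngenes: number of genes informative for this four-taxon set.
function lrt_pvalue(obs, expd, ngenes)
    g = 0.0
    for (o, e) in zip(obs, expd)
        if o > 0                      # a category with o = 0 contributes nothing
            g += o * log(o / e)
        end
    end
    g = max(2 * ngenes * g, 0.0)      # guard against tiny negative rounding error
    return exp(-g / 2)                # upper tail of chi-squared with 2 df (assumed)
end

lrt_pvalue([0.8, 0.1, 0.1], [0.5, 0.25, 0.25], 50)
```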

    QuartetNetworkGoodnessFit.multinom_pearson! - Method
    multinom_pearson!(pval::AbstractVector{Float64}, quartet::Vector{Quartet})
    multinom_pearson!(pval::AbstractVector{Float64}, obsCF, expCF::AbstractMatrix{Float64})

    Calculate outlier p-values (one per four-taxon set) using Pearson's chi-squared statistic under a multinomial distribution for the observed concordance factors.
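
For a single four-taxon set, the calculation could be sketched as below. The sketch assumes a chi-squared reference distribution with 2 degrees of freedom, whose upper tail is exp(-x/2). The name `pearson_pvalue` and its arguments are hypothetical.

```julia
# obs, expd: observed and expected CFs of one four-taxon set (each sums to 1).
# ngenes: number of genes informative for this four-taxon set.
function pearson_pvalue(obs, expd, ngenes)
    x2 = ngenes * sum((o - e)^2 / e for (o, e) in zip(obs, expd))
    return exp(-x2 / 2)               # upper tail of chi-squared with 2 df (assumed)
end

pearson_pvalue([0.6, 0.2, 0.2], [0.5, 0.25, 0.25], 100)
```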

    QuartetNetworkGoodnessFit.multinom_qlog! - Method
    multinom_qlog!(pval::AbstractVector{Float64}, quartet::Vector{Quartet})
    multinom_qlog!(pval::AbstractVector{Float64}, obsCF, expCF::AbstractMatrix{Float64})

    Calculate outlier p-values (one per four-taxon set) using the Qlog statistic (Lorenzen, 1995), under a multinomial distribution for the observed concordance factors.

    QuartetNetworkGoodnessFit.network_expectedCF! - Method
    network_expectedCF!(quartet::QuartetT, net::HybridNetwork, taxa, taxonnumber,
            inheritancecorrelation)

    Update quartet.data to contain the quartet concordance factors expected from the multispecies coalescent along network net for the 4-taxon set taxa[quartet.taxonnumber]. taxa should contain the tip labels in net. quartet.taxonnumber gives the indices in taxa of the 4 taxa of interest. taxonnumber should be a dictionary mapping taxon labels to their indices in taxa, for easier lookup.

    net is not modified.

    For inheritancecorrelation see network_expectedCF. Its value should be between 0 and 1 (not checked by this internal function).
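
The relationship between taxa, the taxonnumber dictionary and the indices stored in a quartet can be sketched with plain Julia containers. The labels and the variable `quartet_idx` below are hypothetical stand-ins, not objects from the package.

```julia
taxa = ["A", "B", "C", "D", "E"]          # hypothetical tip labels

# Dictionary from taxon label to its index in `taxa`, for constant-time lookup.
taxonnumber = Dict(label => i for (i, label) in enumerate(taxa))

# Stand-in for quartet.taxonnumber: indices in `taxa` of the 4 taxa of interest.
quartet_idx = [1, 3, 4, 5]
fourtaxa = taxa[quartet_idx]              # labels of the 4-taxon set
```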

    QuartetNetworkGoodnessFit.network_expectedCF_4taxa! - Method
    network_expectedCF_4taxa!(net::HybridNetwork, fourtaxa, inheritancecorrelation)

    Return the quartet concordance factors expected from the multispecies coalescent along network net, where the 3 quartet topologies are ordered following the ordering of taxon names in fourtaxa, that is: if fourtaxa is a,b,c,d, then the concordance factors are listed in this order:

    (qCF(ab|cd), qCF(ac|bd), qCF(ad|bc))

    Assumptions about net:

    • has 4 taxa, and those are the same as fourtaxa
    • no degree-2 nodes, except perhaps for the root
    • edge lengths are non-missing
    • hybrid edge γ's are non-missing

    The network is modified as follows: what's above the LSA is removed, the 2 edges incident to the root are fused (if the root is of degree 2), and external degree-2 blobs are removed. net is then simplified recursively by removing hybrid edges for the recursive calculation of qCFs.

    For inheritancecorrelation see network_expectedCF. Its value should be between 0 and 1 (not checked by this internal function).
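
To make the ordering convention concrete, the sketch below lists the three resolutions in the order used above. It also gives the expected CFs in the simplest case only: a 4-taxon tree (no hybrid edges) with split ab|cd and an internal edge of length t in coalescent units. Both function names are hypothetical, and the recursive network calculation is not shown.

```julia
# The three quartet resolutions of (a, b, c, d), in the order ab|cd, ac|bd, ad|bc.
function quartet_resolutions(fourtaxa)
    a, b, c, d = fourtaxa
    return ["$a$b|$c$d", "$a$c|$b$d", "$a$d|$b$c"]
end

# Tree case only: expected CFs for the tree ab|cd with internal edge length t.
function tree_expectedCF(t::Real)
    minor = exp(-t) / 3
    return (1 - 2minor, minor, minor)
end

quartet_resolutions(["a", "b", "c", "d"])
tree_expectedCF(1.0)
```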

    QuartetNetworkGoodnessFit.quarnetGoFtest - Method
    quarnetGoFtest(quartet::Vector{Quartet}, outlierp_fun!::Function)
    quarnetGoFtest(outlier_pvalues::AbstractVector)

    Calculate an outlier p-value for each quartet according to function outlierp_fun! (or, in the second version, take outlier p-values as input), then calculate the z-value to test the null hypothesis that 5% of the p-values are < 0.05, versus the one-sided alternative of more outliers than expected.

    See quarnetGoFtest! for more details.

    Output:

    • z-value
    • outlier p-values (first version only)
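
A z-value of this kind can be sketched with the standard one-sample statistic for a proportion. This is an assumption about the exact formula, and `gof_zvalue` is a hypothetical name, not the package's function.

```julia
# Compare the proportion of p-values below `level` to `level` itself.
function gof_zvalue(pvalues::AbstractVector{<:Real}; level=0.05)
    n = length(pvalues)
    prop = count(<(level), pvalues) / n
    return (prop - level) / sqrt(level * (1 - level) / n)
end

# 1 outlier among 20 four-taxon sets: exactly the expected 5%, so z is 0.
gof_zvalue([0.01; fill(0.5, 19)])
```

A large positive z-value indicates more outlier four-taxon sets than expected, which matches the one-sided alternative.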
    QuartetNetworkGoodnessFit.quarnetGoFtest_simulation - Method
    quarnetGoFtest_simulation(net::HybridNetwork, dcf::DataCF, outlierp_fun!::Function,
                              seed::Int, nsim::Int, verbose::Bool, keepfiles::Bool)

    Simulate gene trees under the multispecies coalescent model along network net using PhyloCoalSimulations. The quartet concordance factors (CFs) from these simulated gene trees are used as input to outlierp_fun! to categorize each 4-taxon set as an outlier (p-value < 0.05) or not. For each simulated data set, a goodness-of-fit z-value is calculated by comparing the proportion of outlier 4-taxon sets to 0.05. The function returns the standard deviation of these z-values (assuming a mean of 0) and the z-values themselves.

    Used by quarnetGoFtest!.

    Warning: The quartet CFs expected from net are assumed to be stored in dcf.quartet[i].qnet.expCF. This is not checked.
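
A standard deviation "assuming a mean of 0" is the root mean square of the values. The sketch below shows this on made-up z-values; the divisor (number of simulations rather than one less) is an assumption.

```julia
# Hypothetical z-values, one per simulated data set.
zsim = [0.8, -1.2, 2.1, -0.4, 1.5]

# Standard deviation around an assumed mean of 0: root mean square of the z-values.
sigma = sqrt(sum(abs2, zsim) / length(zsim))
```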

    QuartetNetworkGoodnessFit.reroot! - Method
    reroot!(net, refnet)

    Reroot net to minimize the hardwired cluster distance between net (with the new root position) and the reference network refnet. Candidate root positions are limited to internal nodes (excluding leaves) that are compatible with the direction of hybrid edges.

    QuartetNetworkGoodnessFit.ticr_optimalpha - Method
    ticr_optimalpha(dcf::DataCF)

    Find the concentration parameter α by maximizing the pseudo-log-likelihood of observed quartet concordance factors. The model assumes a Dirichlet distribution with mean equal to the expected concordance factors calculated from a phylogenetic network (under ILS). These expected CFs are assumed to be already calculated, and stored in dcf.

    When calculating the pseudo-log-likelihood, this function checks the observed concordance factors for any values equal to zero: they cause a problem because the Dirichlet density is 0 at 0 (for concentration α > 1). Those 0.0 observed CF values are re-set to the minimum of:

    • the minimum of all expected concordance factors, and
    • the minimum of all nonzero observed concordance factors.

    output

    • maximized pseudo-log-likelihood
    • value of α where the pseudo-log-likelihood is maximized
    • return code of the optimization

    The optimization uses NLopt, with the :LN_BOBYQA method. Optional arguments can tune the optimization differently: nloptmethod, xtol_rel (1e-6 by default), starting α value x_start (1.0 by default).
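
The rule for resetting observed CFs equal to zero, described above, can be sketched on plain matrices (one row per four-taxon set, 3 columns). The function name `replace_zero_cfs` is hypothetical, and the sketch returns a copy, whereas the package may update values in place.

```julia
function replace_zero_cfs(obsCF::AbstractMatrix{<:Real}, expCF::AbstractMatrix{<:Real})
    # Smaller of: the minimum expected CF, and the minimum nonzero observed CF.
    floorval = min(minimum(expCF), minimum(x for x in obsCF if x > 0))
    return [x == 0 ? floorval : x for x in obsCF]
end

obs  = [0.0 0.4 0.6; 0.2 0.3 0.5]
expd = [0.1 0.45 0.45; 0.3 0.3 0.4]
replace_zero_cfs(obs, expd)     # the 0.0 entry becomes 0.1
```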

    QuartetNetworkGoodnessFit.ultrametrize! - Method
    ultrametrize!(net::HybridNetwork, verbose::Bool)

    Assign values to missing branch lengths in net to make the network time-consistent (all paths from the root to a given hybrid node have the same length) and ultrametric (all paths from the root to the tips have the same length), if possible. If this is not possible and verbose is true, warnings are given.

    Output: true if the modified network is ultrametric, false otherwise.

    The major tree is used to calculate the distance from nodes to the root. If a tree edge has a missing length, this length is changed to the following:

    • 0 if the edge is internal,
    • the smallest value possible to make the network ultrametric if the edge is external.

    It is assumed that hybrid nodes are not leaves, such that external edges are necessarily tree edges. If a hybrid edge has a missing length, this length is changed as follows:

    • If both partner hybrid edges lack a length: the shortest lengths are assigned to make the network time-consistent at the hybrid node. In particular, either the major edge or the minor edge is assigned length 0.0.
    • Otherwise: the value needed to make the network time-consistent, based on the partner edge's length, if this value is non-negative, and 0 if the ideal value is negative.
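
The two rules for hybrid edges can be sketched in terms of distances from the root. The function names and arguments are hypothetical: `d_major` and `d_minor` stand for the distances from the root to the parent nodes of the major and minor hybrid edges. The package works on the network object itself, so this is only an arithmetic illustration.

```julia
# Both hybrid edge lengths missing: place the hybrid node as close to the root
# as time-consistency allows, so that at least one of the two lengths is 0.
function both_missing(d_major, d_minor)
    d_hybrid = max(d_major, d_minor)
    return (d_hybrid - d_major, d_hybrid - d_minor)
end

# One length missing: match the path length through the partner edge,
# truncating at 0 when the ideal value would be negative.
function one_missing(d_missing, d_partner, len_partner)
    return max(d_partner + len_partner - d_missing, 0.0)
end

both_missing(2.0, 3.5)          # major edge gets 1.5, minor edge gets 0.0
one_missing(4.0, 3.0, 0.5)      # ideal value is -0.5, so 0.0 is used
```

In the truncated case the network cannot be made time-consistent at that hybrid node, which is consistent with the function returning false when the result is not ultrametric.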