# code from prior sectionsusingCSV, DataFrames, PhyloNetworks, PhyloPlots, RCall, StatsBase, StatsModelsnet =readTopology("data/polemonium_network_calibrated.phy");for tip in net.node tip.name =replace(tip.name, r"_2$" => s""); endtraits = CSV.read("data/polemonium_traits_morph.csv", DataFrame)subset!(traits, :morph =>ByRow(in(tipLabels(net))));
What happened to the trait during gene flow? We can ask this question in various ways.
network model versus tree model
First, we can investigate if the network (with gene flow) is a better explanation for the trait evolution than a tree. Several trees are displayed in the network, but the tree that represents most of the genome has the “major” hybrid edges, those with inheritance γ>0.5. We can delete all the “minor” hybrid edges (with γ<0.5) as shown below.
tree =majorTree(net)R"par"(mar=[0,0,0,0]);plot(tree, tipoffset=0.05, useedgelength=true, xlim=[1,2.5]);
Which model seems to fit best: network or major tree?
Note that this network-vs-tree question applies to the phylogenetic correlation of residuals here. It could be gene flow had an impact on the geographic distribution at reticulations, which impacted evelation and latitude. In that case, this impact would be explained by the predictors that we used, and would not be seen here. But it could appear from a model without any predictors (something you could run if interested).
We can check to see how much of a difference it makes on coefficients:
coeftable(fit_net)
Coef.
Std. Error
t
Pr(>
t
)
(Intercept)
4.22609
1.81331
2.33
0.0352
0.336929
8.11526
elevation
-0.585523
0.16023
-3.65
0.0026
-0.929181
-0.241865
lat
-0.0804964
0.0391184
-2.06
0.0587
-0.164397
0.00340418
coeftable(fit_tree)
Coef.
Std. Error
t
Pr(>
t
)
(Intercept)
4.11819
1.81426
2.27
0.0395
0.226996
8.00938
elevation
-0.602781
0.155119
-3.89
0.0016
-0.935479
-0.270083
lat
-0.077632
0.0392076
-1.98
0.0677
-0.161724
0.00645985
or how much difference it makes on the estimated variance parameters:
Another question we may ask is whether the trait at reticulations deviates from the weighted average of its parent populations (immediately before gene flow). Such would be the case if a hybrid species’s trait is outside the range of its parents. This “transgressive” evolution can be modelled with a shift in the trait along the edge just below the reticulation. This shift quantifies the difference between the trait at the hybrid node and the weighted average of the immediate parents.
The code below builds a data frame with 1 predictor column per reticulation: this column quantifies the effect of a transgressive shift on each taxon in the phylogeny.
df_shift =regressorHybrid(net)
17×5 DataFrame
Row
shift_8
shift_18
shift_25
tipNames
sum
Float64
Float64
Float64
String
Float64
1
0.0
0.0
0.0
micranthum
0.0
2
0.0
0.0
0.0
occidentale_occidentale
0.0
3
0.0
0.0
0.0
reptans
0.0
4
0.0
0.0
0.0
pectinatum
0.0
5
1.0
0.0
0.0
delicatum
1.0
6
1.0
0.0
0.0
pulcherrimum_shastense
1.0
7
0.0
0.0
0.0
elegans
0.0
8
0.0
0.0
0.0
carneum
0.0
9
0.0
1.0
0.0
elusum
1.0
10
0.0
0.0
0.0
aff._viscosum_sp._nov.
0.0
11
0.0
0.0
0.0
pauciflorum
0.0
12
0.0
0.0
1.0
apachianum
1.0
13
0.0
0.0
0.0
foliosissimum
0.0
14
0.0
0.0
0.0
confertum
0.0
15
0.0
0.0
0.0
brandegeei
0.0
16
0.0
0.0
0.0
eddyense
0.0
17
0.0
0.0
0.0
chartaceum
0.0
To know which shift corresponds to which reticulation, we could plot the network and ask to see the edge numbers, as shown below.
But the data frame shows us which tips are impacted by each reticulation: - transgressive shift on edge 8: reticulation leading to delicatum and pulcherrimum_shastense, - shift on edge 18: below reticulation leading to elusum - shift on edge 25: below reticulation leading to apachianum. The values in this data frame are the proportion of the genome that each species inherited from the reticulation, based on the network’s γ values. In our case, they are only 0 or 1, but the values could be in between on more complex networks, especially for older reticulations if they were followed by later reticulations.
Now we can combine our trait data and the new columns that can serve as predictors to model transgressive evolution:
df =innerjoin(select(traits, :morph, r"llength", :elevation, :lat),select(df_shift, Not(:sum)), # excludes the 'sum' column on =:morph =>:tipNames) # join our data with shift predictors
17×9 DataFrame
Row
morph
llength
llength_sd
llength_n
elevation
lat
shift_8
shift_18
shift_25
String31
Float64
Float64
Int64
Float64
Float64
Float64
Float64
Float64
1
micranthum
-0.864849
0.345533
75
1.128
43.7447
0.0
0.0
0.0
2
occidentale_occidentale
0.542457
0.39549
173
2.14
42.5916
0.0
0.0
0.0
3
reptans
0.746643
0.285631
49
0.242
39.9954
0.0
0.0
0.0
4
pectinatum
0.834157
0.204992
9
0.55
47.1283
0.0
0.0
0.0
5
delicatum
0.0316277
0.384536
50
3.307
38.84
1.0
0.0
0.0
6
pulcherrimum_shastense
-1.04488
0.230817
8
3.17
41.3842
1.0
0.0
0.0
7
elegans
-1.06016
0.258922
21
2.287
46.904
0.0
0.0
0.0
8
carneum
0.75653
0.236791
47
1.008
43.2211
0.0
0.0
0.0
9
elusum
0.21314
0.260283
14
1.692
44.7146
0.0
1.0
0.0
10
aff._viscosum_sp._nov.
-1.371
0.269326
38
3.0785
44.0958
0.0
0.0
0.0
11
pauciflorum
0.222039
0.313219
47
2.6375
30.8088
0.0
0.0
0.0
12
apachianum
0.289364
0.304548
62
2.68836
33.927
0.0
0.0
1.0
13
foliosissimum
0.622095
0.315621
274
2.6825
38.786
0.0
0.0
0.0
14
confertum
-0.430446
0.290027
10
3.7045
38.9484
0.0
0.0
0.0
15
brandegeei
-0.67488
0.353546
58
2.7435
40.0324
0.0
0.0
0.0
16
eddyense
-1.21937
0.0652927
3
2.744
41.3189
0.0
0.0
0.0
17
chartaceum
-1.61233
0.156378
16
4.116
37.6344
0.0
0.0
0.0
If we had a particular interest in the reticulation ancestral to the delicatum + pulcherrimum clade, we could look for evidence of transgressive evolution at that reticulation like this (in leaflet length itself, including any effect correlated with latitude and elevation):
fit_noshift =phylolm(@formula(llength ~1), df, net, tipnames=:morph, withinspecies_var=true, y_mean_std=true)# using `df` for both, so the no-shift model is recognized as nested in the shifted modelfit_sh8 =phylolm(@formula(llength ~ shift_8), df, net, tipnames=:morph, withinspecies_var=true, y_mean_std=true)coeftable(fit_sh8) # 95% confidence interval for shift includes 0aic(fit_noshift), aic(fit_sh8)
(674.853189498432, 675.4984901526377)
Note that we fitted the no-shift model with the same data df as the shifted model. The taxa are listed in a different order in df and in the original traits data frame. It helps to use the same data frame for later comparisons. For example, we could run a likelihood ratio test (lrtest) to compare the two models, altough that we requires re-fitting them with ML method instead of the default REML: