系统发育网络推断软件PhyloNetworks和Phylonet的使用及遇到的error

一、系统发育网络推断软件PhyloNetworks的使用流程以及解决报错

该软件的详细说明请参考他的官方网站Home · PhyloNetworks.jl (crsl4.github.io)，该软件的中文流程（包括简介、安装使用流程等）请参考系统发育网络推断 —— PhyloNetworks | 生信技工 (yanzhongsino.github.io)，本笔记参考以上两个站点并测试无报错。

1、安装

wget https://julialang-s3.julialang.org/bin/linux/x64/1.7/julia-1.7.2-linux-x86_64.tar.gz #下载
tar -xzf julia-1.7.2-linux-x86_64.tar.gz #解压
julia-1.7.2/bin/julia -h #若无报错则安装julia成功

julia-1.7.2/bin/julia  #进入julia运行界面，类似于python和r的交互模式
julia> using Pkg #类似于r的library和python的import来加载函数
julia> Pkg.add("PhyloNetworks") #安装PhyloNetworks
julia> Pkg.add("PhyloPlots")

在安装过程中遇到以下报错

1	ERROR: Unable to automatically install 'Bzip2' from '/home/manu/.julia/packages/Bzip2_jll/2H8pU/Artifacts.toml

参考github的论坛ERROR: Unable to automatically install Artifacts.toml issue? · Issue #1705 · JuliaLang/Pkg.jl (github.com)，以下这个解决方法对我的报错有效，在julia交互模式输入以下命令

using Pkg

Pkg.PlatformEngines.probe_platform_engines!()

Pkg.PlatformEngines.download("https://github.com/JuliaBinaryWrappers/MKL_jll.jl/releases/download/MKL-v2020.0.166%2B0/MKL.v2020.0.166.x86_64-apple-darwin14.tar.gz", "MKL_jll.tar.gz"; verbose=true)

再次运行Pkg.add(“PhyloNetworks”)安装成功。

2、软件使用

①准备两个文件，一是“多基因树文件” alltree.rooted.txt，用以计算得到CF表，CF表则用于系统发育网络推断的输入；二是“通过astral软件将多基因树整合得到的树” astral.tre，用于构建系统发育网络起点。在Julia的交互模式下

通过多基因树文件制备CF表tableCF.csv

using PhyloNetworks
using CSV
iqtrees=joinpath("alltree.rooted.txt") #读取多基因树文件
genetrees = readMultiTopology(iqtrees) #解析基因树
q,t = countquartetsintrees(genetrees) #读取基因树，计算四分类群的CFs
df = writeTableCF(q,t) #读取计算得到的CF值到df：基因频率
CSV.write("tableCF.csv", df) #保存df内容为tableCF.csv文件

②构建起始网络

using PhyloNetworks 
astralfile= joinpath("astral.tre") ##读取联合后的基因树文件
astraltree = readMultiTopology(astralfile)[1] #读取文件中的第一棵树
CF = readTableCF("tableCF.csv") #读取CF表的数据
net0 = snaq!(astraltree,CF, hmax=0, filename="net0", seed=1234) #运行评估程序，We first impose the constraint of at most 0 hybrid node, that is, we ask for a tree

最初在运行net0 = snaq!(astraltree,CF, hmax=0, filename=”net0”, seed=1234)过程中遇到以下报错

1	：MethodError: no method matching snaq!(::Vector{HybridNetwork}, ::DataCF; hmax=0, filename="net0", seed=1234)

原因是读取astralfile树时命令错误，astraltree = readMultiTopology(astralfile)，在最末尾没有添加[1]，可能导致数据类型出现问题。当运行astraltree = readMultiTopology(astralfile)[1]时，可正常运行。

③把得到的net0作为起点来构建hmax=1的网络

1	net1 = snaq!(net0, raxmlCF, hmax=1, filename="net1", seed=1235)

④迭代运行得到net2、net3……

net2 = snaq!(net1, raxmlCF, hmax=2, filename="net2", seed=1236)
net3 = snaq!(net2, raxmlCF, hmax=2, filename="net3", seed=1237)
...... 
#若最佳hmax在5以下，可用此方案

另外，如果要运行的h值较多，也可以利用脚本运行，以下脚本未考虑迭代，即每次h值得运行都用同一个起始树astral.tre

当h为1时，设置文件名为runSNaQ_h1.jl，内容如下

#!/usr/bin/env julia

# file "runSNaQ.jl". run in the shell like this in general:
# julia runSNaQ.jl hvalue nruns
# example for h=2 and default 10 runs:
# julia runSNaQ.jl 2
# or example for h=3 and 50 runs:
# julia runSNaQ.jl 3 50

length(ARGS) > 0 ||
    error("need 1 or 2 arguments: # reticulations (h) and # runs (optional, 10 by default)")
h = parse(Int, ARGS[1])
nruns = 10
if length(ARGS) > 1
    nruns = parse(Int, ARGS[2])
end
outputfile = string("net", h, "_", nruns, "runs") # example: "net2_10runs"
seed = 1234 + h # change as desired! Best to have it different for different h
@info "will run SNaQ with h=$h, # of runs=$nruns, seed=$seed, output will go to: $outputfile"

using Distributed
addprocs(nruns)
@everywhere using PhyloNetworks
net0_h1 = readTopology("astral.tre");  #读取起始树，为了避免并行时linux系统环境变量得区分，在h为1时设置为net0_h1
using DataFrames, CSV
df_sp = DataFrame(CSV.File("tableCF.csv", pool=false); copycols=false); #读取CF表
d_sp = readTableCF!(df_sp);
net_h1 = snaq!(net0_h1, d_sp, hmax=h, filename=outputfile, seed=seed, runs=nruns)

当h为2时，设置文件名为runSNaQ_h2.jl，内容如下

#!/usr/bin/env julia

# file "runSNaQ.jl". run in the shell like this in general:
# julia runSNaQ.jl hvalue nruns
# example for h=2 and default 10 runs:
# julia runSNaQ.jl 2
# or example for h=3 and 50 runs:
# julia runSNaQ.jl 3 50

length(ARGS) > 0 ||
    error("need 1 or 2 arguments: # reticulations (h) and # runs (optional, 10 by default)")
h = parse(Int, ARGS[1])
nruns = 10
if length(ARGS) > 1
    nruns = parse(Int, ARGS[2])
end
outputfile = string("net", h, "_", nruns, "runs") # example: "net2_10runs"
seed = 1234 + h # change as desired! Best to have it different for different h
@info "will run SNaQ with h=$h, # of runs=$nruns, seed=$seed, output will go to: $outputfile"

using Distributed
addprocs(nruns)
@everywhere using PhyloNetworks
net0_h2 = readTopology("astral.tre");  #此处net0_h2需要修改
using DataFrames, CSV
df_sp = DataFrame(CSV.File("tableCF.csv", pool=false); copycols=false);
d_sp = readTableCF!(df_sp);
net_h2 = snaq!(net0_h2, d_sp, hmax=h, filename=outputfile, seed=seed, runs=nruns) #此处net0_h2需要修改

以此类推，设置h为3、4、5、6…….时得jl文件。

将运行命令写入到同一目录下run_julia.sh文件中

julia runSNaQ_h1.jl 1  #数字为设置得h值
julia runSNaQ_h2.jl 2
julia runSNaQ_h3.jl 3
julia runSNaQ_h4.jl 4
julia runSNaQ_h5.jl 5
julia runSNaQ_h6.jl 6
julia runSNaQ_h7.jl 7
julia runSNaQ_h8.jl 8
julia runSNaQ_h9.jl 9
julia runSNaQ_h10.jl 10
julia runSNaQ_h11.jl 11
julia runSNaQ_h12.jl 12

利用nohup或screen+ParaFly组合后台运行

1 2	screen -S julia ParaFly -c run_julia.sh -CPU 12

3、结果解读

①选择最佳系统发育网络

选择运行结果中-loglik值最小的hman值时的运行结果，将不同h值得-loglik值统计好，可利用excel做折线图。

②可视化，假设h为3时所得的结果为最佳网络树

using PhyloNetworks
using PhyloPlots
using RCall

net = readTopology("net3_10runs.out")  #将最佳网络树存为变量net
writeTopology(net, "bestnet_h3.tre") #将最佳网络树写到bestnet_h3.tre文件
rootatnode!(net,"mcap")  #对树进行定根
imagefilename = "snaqplot_net_root.svg" #命名
R"svg"(imagefilename, width=4, height=3) #将图片存为svg格式
R"par"(mar=[0,0,0,0])
plot(net, showgamma=true, showedgenumber=true);
R"dev.off()";  #将图片存为svg文件

二、系统发育网络推断软件PhyloNet的使用（2024.05.29更新）

该软件官网链接为PhyloNet Tutorial (rice.edu)

具体用法参考Phylonet Tutorial (SSB 2020) - Phylonet - Rice University Campus Wiki

1、下载安装：下载jar文件后，若系统java版本大于等于1.8.0就可以使用

2、PhyloNet软件使用

准备一个nex格式文件即可，script.nex包括基因树信息以及程序运行命令，举个栗子：

#NEXUS

BEGIN TREES;

Tree gt0=(a:2.299,((e:1.329,f:1.329):0.77,(c:1.684,(b:1.232,d:1.232):0.451):0.416):0.2);
Tree gt1=(a:2.085,((e:1.376,f:1.376):0.696,(d:1.487,(b:1.241,c:1.241):0.246):0.585):0.013);
Tree gt2=(((c:0.52,d:0.52):1.243,(e:1.403,f:1.403):0.36):1.025,(a:1.863,b:1.863):0.925);
Tree gt3=(a:2.82,((b:1.051,(c:0.86,d:0.86):0.19):1.357,(e:1.365,f:1.365):1.043):0.412);
Tree gt4=((e:1.3,f:1.3):0.994,(a:1.869,(b:1.255,(c:0.849,d:0.849):0.405):0.615):0.425);
Tree gt5=(a:2.46,((b:1.077,c:1.077):0.857,(f:1.141,(d:0.505,e:0.505):0.636):0.793):0.526);
Tree gt6=(a:2.025,((b:1.111,c:1.111):0.416,(f:1.304,(d:0.727,e:0.727):0.577):0.223):0.498);
Tree gt7=((d:1.526,(e:1.415,f:1.415):0.111):0.982,(a:2.188,(b:1.532,c:1.532):0.656):0.32);
Tree gt8=(a:2.234,((b:1.057,c:1.057):0.766,(f:1.301,(d:0.849,e:0.849):0.452):0.522):0.411);
Tree gt9=((e:1.644,(b:1.361,(c:0.503,d:0.503):0.858):0.283):0.787,(a:2.226,f:2.226):0.205);
Tree gt10=(a:2.917,((e:1.683,(c:0.961,d:0.961):0.722):0.886,(b:1.779,f:1.779):0.79):0.348);
Tree gt11=(a:2.391,((b:1.041,(c:0.602,d:0.602):0.439):0.516,(e:1.164,f:1.164):0.393):0.834);
Tree gt12=((b:1.21,c:1.21):1.622,(a:2.443,(f:1.804,(d:0.583,e:0.583):1.221):0.639):0.389);
Tree gt13=(a:2.047,((b:1.025,c:1.025):0.519,(f:1.295,(d:0.738,e:0.738):0.556):0.249):0.503);
Tree gt14=(a:2.58,((d:0.919,e:0.919):0.834,(f:1.503,(b:1.228,c:1.228):0.275):0.251):0.827);
Tree gt15=((f:1.267,(d:0.871,e:0.871):0.396):1.67,(a:2.181,(b:1.362,c:1.362):0.819):0.756);
Tree gt16=(a:3.016,(b:1.892,(c:1.816,(f:1.479,(d:0.812,e:0.812):0.667):0.337):0.076):1.124);
Tree gt17=((f:1.186,(d:0.721,e:0.721):0.465):1.822,(a:2.031,(b:1.13,c:1.13):0.902):0.977);
Tree gt18=((c:1.51,(f:1.166,(d:0.521,e:0.521):0.645):0.345):1.218,(a:2.073,b:2.073):0.655);
Tree gt19=(a:2.329,((b:1.354,c:1.354):0.467,(f:1.392,(d:0.955,e:0.955):0.437):0.429):0.508);
Tree gt20=(a:3.31,((e:1.083,f:1.083):2.08,(b:1.923,(c:0.538,d:0.538):1.385):1.241):0.146);

END;


BEGIN PHYLONET;

InferNetwork_MPL (all) 2 -pl 8;

END;

①要准备script.nex文件，需要以下几步：a.对已有的多个序列分别进行比对、剪切、构树，可以参考test_pipeline-of-phylogeny | TianzhenWu’ Blog (wu-tz.github.io)。b. 获取多个treefile后，分别对他们进行定根处理，可以参考利用ete3批量对基因树定根并检测单系性 | TianzhenWu’ Blog (wu-tz.github.io)。c. 将所有treefile放到一个文件中，利用shell命令补全每一行的前缀“Tree gt1=”。d. 手动补全nexus格式文件的开始和结尾。e. 在文件末尾补充PHYLONET程序的参数，方法的选择参考https://phylogenomics.rice.edu/html/phylonetTutorial.html。

②运行命令如下

1	java -jar PhyloNetv3_8_2.jar script.nex

3、PhyloNet结果解读

每次运行输出5个可能的进化网络树，分别对应着统计量Total log probability，该值越大越可靠。

4、 Visualizing a Phylogenetic Network

Phylogenetic network in Rich Newick string can be visualized in Dendroscope or icytree. The former needs downloading, and the latter is online. However, Dendroscope cannot recognize inheritance probabilities (branch lengths are fine), and icytree sometimes can and sometimes cannot. You need to remove those probabilities manually from the Rich Newick string, or use option “-di” so that PhyloNet returns the network that Dendroscope takes directly. 此处参考https://wiki.rice.edu/confluence/pages/viewpage.action?pageId=39500205#PhylonetTutorial(SSB2020)-3.BasicUsage