IgPhyML lineage tree analysis
IgPhyML is a program designed to build phylogenetic trees and test evolutionary hypotheses regarding B cell affinity maturation.
The biology of B cell somatic hypermutation (SHM) violates important assumptions in most standard phylogenetic substitution models. Further, while most phylogenetics programs are designed to analyze single lineages, B cell repertoires typically contain thousands of lineages. IgPhyML addresses both issues by implementing substitution models that correct for the context-sensitive nature of SHM, and by combining information from multiple lineages to estimate repertoire-wide model parameters more precisely.
An in-depth description of IgPhyML installation and usage can be found at the IgPhyML website.
Once installed, IgPhyML can be run through BuildTrees.py by specifying the
--igphyml option. IgPhyML is easiest to run through the
Immcantation Docker image.
If this is not possible, these instructions require Change-O 0.4.6 or higher, Alakazam 0.3.0 or higher,
and IgPhyML to be installed, with the executable in your PATH.
The following commands should work as a first pass on many reasonably sized datasets, but if you really want to understand what’s going on or make sure what you’re doing makes sense, please check out the IgPhyML website.
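When working outside the Docker image, it can help to confirm that the required executables are reachable before running anything else. The following is a minimal sketch: the helper name `check_tools` is hypothetical, and the executable name `igphyml` is an assumption about how the binary is installed on your system (`BuildTrees.py` is the name used in the command in this section).

```shell
# Report whether each named executable can be found on PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumptions; adjust them to match your installation.
check_tools BuildTrees.py igphyml || echo "Install the missing tools or use the Docker image."
```

This only checks that the names resolve on PATH. It does not verify the Change-O or Alakazam versions.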
Build trees and estimate model parameters
Download the IgPhyML repository, move to the
examples folder, and run:
    # Clone IgPhyML repository to get example files
    git clone https://bitbucket.org/kleinstein/igphyml

    # Move to examples directory
    cd igphyml/examples

    # Run BuildTrees
    BuildTrees.py -d example.tsv --outname ex --log ex.log --collapse \
        --sample 3000 --igphyml --clean all --nproc 1
This command processes an AIRR-formatted dataset of BCR sequences that have been
clonally clustered, with germlines reconstructed.
It then quickly builds trees using the GY94 model and, using these
fixed topologies, estimates HLP19 model parameters. This can be sped up by
increasing the --nproc option. Subsampling using the
--sample option isn’t
strictly necessary, but IgPhyML will run slowly when applied to large datasets.
The --collapse flag is used to collapse identical sequences. This is
highly recommended because identical sequences slow down calculations without
affecting likelihood values in IgPhyML.
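To get a rough sense of how many identical sequences a dataset contains before running the full pipeline, the total and distinct sequence counts per clone can be compared. The sketch below is only an approximation: the helper `count_duplicates` is hypothetical, the column names `clone_id` and `sequence_alignment` are assumed from the AIRR Rearrangement schema, and it compares raw strings, which is not necessarily how BuildTrees.py decides that two sequences are identical.

```shell
# Print "clone<TAB>total<TAB>distinct" for each clone in a tab-delimited file.
# Usage: count_duplicates FILE CLONE_COLUMN SEQUENCE_COLUMN
count_duplicates() {
    awk -F'\t' -v clone_col="$2" -v seq_col="$3" '
        NR == 1 {
            # Locate the requested columns by name in the header row.
            for (i = 1; i <= NF; i++) {
                if ($i == clone_col) c = i
                if ($i == seq_col) s = i
            }
            if (!c || !s) { print "column not found" > "/dev/stderr"; exit 1 }
            next
        }
        {
            total[$c]++
            # Count each (clone, sequence) pair only the first time it is seen.
            if (!(($c, $s) in seen)) { seen[$c, $s] = 1; distinct[$c]++ }
        }
        END {
            for (k in total) printf "%s\t%d\t%d\n", k, total[k], distinct[k]
        }
    ' "$1" | sort
}

# Column names are assumptions; check the header of your own file.
if [ -f example.tsv ]; then
    count_duplicates example.tsv clone_id sequence_alignment
fi
```

Clones whose distinct count is much lower than their total are the ones where collapsing is likely to save the most computation.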
The output file of the above command can be read using the
readIgphyml function of Alakazam. After opening an
R session in the
examples subfolder, enter the following commands. Note that
when using the Docker container, you’ll need to run
dev.off() after plotting the tree to create a pdf plot in the
examples directory:
    library(alakazam)
    library(igraph)

    db = readIgphyml("ex_igphyml-pass.tab")

    # Plot largest lineage tree
    plot(db$trees[[1]], layout=layout_as_tree)

    # Show HLP19 parameters
    print(t(db$param[1,]))

    CLONE          "REPERTOIRE"
    NSEQ           "4"
    NSITE          "107"
    TREE_LENGTH    "0.286"
    LHOOD          "-290.7928"
    KAPPA_MLE      "2.266"
    OMEGA_FWR_MLE  "0.5284"
    OMEGA_CDR_MLE  "2.3324"
    WRC_2_MLE      "4.8019"
    GYW_0_MLE      "3.4464"
    WA_1_MLE       "5.972"
    TW_0_MLE       "0.8131"
    SYC_2_MLE      "-0.99"
    GRS_0_MLE      "0.2583"
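The same repertoire-wide estimates can also be inspected from the command line without opening R. This is a sketch under a stated assumption: it treats the output file as a tab-delimited table with a header row whose first data row holds the repertoire-wide values, which is inferred from the column names printed above and is not confirmed here. The helper name `first_row_pairs` is hypothetical.

```shell
# Print the header names and the first data row of a tab-delimited file
# as one "name<TAB>value" pair per line.
first_row_pairs() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        NR == 2 { for (i = 1; i <= NF; i++) printf "%s\t%s\n", name[i], $i; exit }
    ' "$1"
}

# Assumes the file layout described above; skip silently if the file is absent.
if [ -f ex_igphyml-pass.tab ]; then
    first_row_pairs ex_igphyml-pass.tab
fi
```

If the file layout differs from this assumption, reading it with readIgphyml in R remains the documented route.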
To visualize a larger dataset with bigger trees and bifurcating tree topologies,
again open an
R session in the examples subfolder:
    library(alakazam)
    library(ape)

    db = readIgphyml("sample1_igphyml-pass.tab", format="phylo")

    # Plot largest lineage tree
    plot(ladderize(db$trees[[1]]), cex=0.7, no.margin=TRUE)