BuildTrees.py
Converts TSV files into IgPhyML input files
usage: BuildTrees.py [--version] [-h] -d DB_FILES [DB_FILES ...]
                     [--outdir OUT_DIR] [--outname OUT_NAME] [--log LOG_FILE]
                     [--failed] [--format {airr,changeo}] [--collapse]
                     [--ncdr3] [--nmask] [--md META_DATA [META_DATA ...]]
                     [--clones TARGET_CLONES [TARGET_CLONES ...]]
                     [--minseq MIN_SEQ] [--sample SAMPLE_DEPTH]
                     [--append APPEND [APPEND ...]] [--igphyml]
                     [--nproc NPROC] [--clean {none,all}]
                     [--optimize {n,r,l,lr,tl,tlr}] [--omega OMEGA] [-t KAPPA]
                     [--motifs MOTIFS] [--hotness HOTNESS]
                     [--oformat {tab,txt}] [--nohlp] [--asr ASR]
-
--version
Show program’s version number and exit.
-
-h, --help
Show this help message and exit.
-
-d <db_files>
A list of tab-delimited database files. Required.
-
--outdir <out_dir>
Changes the output directory to the specified location. If not specified, the input file directory is used.
-
--outname <out_name>
Changes the prefix of the successfully processed output file to the specified string. May not be specified with multiple input files.
-
--log <log_file>
Writes verbose logging to the specified file. May not be specified with multiple input files.
-
--failed
If specified, create files containing records that fail processing.
-
--format {airr,changeo}
Output format. Also specifies the input format for tools accepting tab-delimited AIRR Rearrangement or Change-O files.
-
--collapse
If specified, collapse identical sequences before exporting to FASTA.
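A minimal sketch of what collapsing identical sequences within one clone amounts to. It assumes each record is a dict keyed by AIRR column names (as `csv.DictReader` would produce from the TSV) and that sequences are compared as exact strings in the `sequence_alignment` column. `collapse_identical` is a hypothetical helper, not BuildTrees' implementation; the tool may also take masking or `--md` metadata into account when deciding what counts as identical.

```python
from typing import Dict, List


def collapse_identical(records: List[Dict[str, str]],
                       seq_field: str = "sequence_alignment") -> Dict[str, List[str]]:
    """Group the records of one clone by exact sequence (hypothetical helper).

    Returns a mapping from each distinct sequence to the IDs that share it,
    in first-seen order; the first ID can serve as the representative.
    """
    groups: Dict[str, List[str]] = {}
    for rec in records:
        groups.setdefault(rec[seq_field], []).append(rec["sequence_id"])
    return groups


clone = [
    {"sequence_id": "s1", "sequence_alignment": "ACGTAC"},
    {"sequence_id": "s2", "sequence_alignment": "ACGTAC"},
    {"sequence_id": "s3", "sequence_alignment": "ACGTAA"},
]
print(collapse_identical(clone))
# {'ACGTAC': ['s1', 's2'], 'ACGTAA': ['s3']}
```

Keeping the list of merged IDs, rather than simply discarding duplicates, makes it easy to see which input records each exported sequence stands for.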
-
--ncdr3
If specified, remove CDR3 from all sequences.
-
--nmask
If specified, do not attempt to mask split codons.
-
--md <meta_data>
List of fields containing metadata to include in the output FASTA file sequence headers.
-
--clones <target_clones>
List of clone IDs to output. If specified, only the listed clones are output.
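Selecting target clones is a membership filter on the `clone_id` column. The sketch below assumes records are dicts read from the TSV; `select_clones` is a hypothetical name. Comparing as strings matters because TSV cells are strings, so a clone ID given as the integer `1` should still match the cell `"1"`.

```python
from typing import Dict, Iterable, List, Union


def select_clones(records: Iterable[Dict[str, str]],
                  target_clones: Iterable[Union[str, int]]) -> List[Dict[str, str]]:
    """Keep only records whose clone_id is in target_clones (hypothetical helper)."""
    # Normalize targets to strings, since values read from a TSV file are strings.
    targets = {str(c) for c in target_clones}
    return [rec for rec in records if rec["clone_id"] in targets]


rows = [
    {"sequence_id": "a", "clone_id": "1"},
    {"sequence_id": "b", "clone_id": "2"},
    {"sequence_id": "c", "clone_id": "1"},
]
print([r["sequence_id"] for r in select_clones(rows, ["1"])])  # ['a', 'c']
```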
-
--minseq <min_seq>
Minimum number of data sequences. Any clones with fewer than the specified number of sequences will be excluded.
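The `--minseq` rule can be sketched as "group by clone, then drop clones below the threshold". This sketch counts input records as given. Whether BuildTrees counts before or after `--collapse`, and exactly which sequences count as "data sequences", is not stated here, so treat both as assumptions. `drop_small_clones` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def drop_small_clones(records: List[Dict[str, str]],
                      min_seq: int) -> Dict[str, List[Dict[str, str]]]:
    """Group records by clone_id and keep clones with at least min_seq records.

    Hypothetical helper: it counts records as given, before any collapsing.
    """
    clones: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for rec in records:
        clones[rec["clone_id"]].append(rec)
    # Clones with *fewer* than min_seq are excluded, so exactly min_seq is kept.
    return {cid: recs for cid, recs in clones.items() if len(recs) >= min_seq}


rows = [{"clone_id": c} for c in ["1", "1", "1", "2", "2", "3"]]
print({cid: len(recs) for cid, recs in drop_small_clones(rows, 2).items()})
# {'1': 3, '2': 2}
```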
-
--sample <sample_depth>
Depth of reads to be subsampled (before deduplication).
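The ordering in the description (subsample, then deduplicate) affects which sequences survive. A minimal sketch, assuming subsampling is done per clone and without replacement; `subsample_then_dedup` and its `seed` parameter are hypothetical, not BuildTrees options.

```python
import random
from typing import List, Optional


def subsample_then_dedup(reads: List[str], depth: int,
                         seed: Optional[int] = None) -> List[str]:
    """Draw `depth` reads without replacement, then drop duplicate sequences."""
    rng = random.Random(seed)
    # Subsampling first means a sequence seen in many reads is more likely to be kept.
    drawn = rng.sample(reads, depth) if len(reads) > depth else list(reads)
    # Deduplicate afterwards, keeping first-seen order.
    return list(dict.fromkeys(drawn))


reads = ["ACGT"] * 8 + ["ACGA", "TCGA"]
print(subsample_then_dedup(reads, depth=4, seed=1))
```

Because duplicates are removed only after sampling, the result can contain fewer than `depth` distinct sequences; sampling after deduplication would instead treat rare and abundant sequences equally.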
-
--append <append>
List of columns to append to the sequence ID to ensure uniqueness.
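A typical case for `--append` is combining files from several samples that reuse the same read names. The sketch below appends column values to `sequence_id` and checks that the result is unique. The `_` separator, the `sample` column, and the `build_unique_ids` helper are all hypothetical; BuildTrees' actual ID format may differ.

```python
from collections import Counter
from typing import Dict, List


def build_unique_ids(records: List[Dict[str, str]], columns: List[str],
                     sep: str = "_") -> List[str]:
    """Append the values of `columns` to each sequence_id and check uniqueness.

    Hypothetical helper; the separator BuildTrees actually uses may differ.
    """
    ids = [sep.join([rec["sequence_id"]] + [rec[col] for col in columns])
           for rec in records]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    if duplicates:
        raise ValueError(f"sequence IDs are still not unique: {duplicates}")
    return ids


# Two samples that reuse the same read name (hypothetical "sample" column).
rows = [
    {"sequence_id": "read1", "sample": "day0"},
    {"sequence_id": "read1", "sample": "day7"},
]
print(build_unique_ids(rows, ["sample"]))  # ['read1_day0', 'read1_day7']
```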
-
--igphyml
If specified, run IgPhyML on the output.
-
--nproc <nproc>
Number of threads to parallelize IgPhyML across.
-
--clean {none,all}
Whether to delete intermediate files. none: keep all intermediate files; all: delete all intermediate files.
-
--optimize {n,r,l,lr,tl,tlr}
Combination of topology (t), branch lengths (l), and parameters (r) for IgPhyML to optimize, or nothing (n).
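Each `--optimize` choice is a string of component letters. The mapping below only restates the help text in code; `describe_optimize` is a hypothetical helper and not part of BuildTrees or IgPhyML.

```python
from typing import List

VALID_CODES = ("n", "r", "l", "lr", "tl", "tlr")  # choices listed for --optimize
COMPONENTS = {"t": "topology", "l": "branch lengths", "r": "parameters"}


def describe_optimize(code: str) -> List[str]:
    """Expand an --optimize code into the components IgPhyML is asked to optimize."""
    if code not in VALID_CODES:
        raise ValueError(f"--optimize must be one of {VALID_CODES}, got {code!r}")
    # 'n' means optimize nothing; every other code is a string of component letters.
    return [] if code == "n" else [COMPONENTS[letter] for letter in code]


print(describe_optimize("tlr"))  # ['topology', 'branch lengths', 'parameters']
```

Note that the allowed choices never include `t` without `l`: topology is only optimized together with branch lengths.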
-
--omega <omega>
Omega parameters to estimate for FWR and CDR, respectively: e = estimate, ce = estimate + confidence interval, or a numeric value.
-
-t <kappa>
Kappa parameters to estimate: e = estimate, ce = estimate + confidence interval, or a numeric value.
-
--motifs <motifs>
Motifs for which to estimate mutability.
-
--hotness <hotness>
Mutability parameters to estimate: e = estimate, ce = estimate + confidence interval, or a numeric value.
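`--omega`, `-t`, and `--hotness` share one value vocabulary: `e`, `ce`, or a number. The usage line shows `--omega` taking a single `OMEGA` argument, so the sketch below assumes multiple values arrive as one comma-separated string (for example `e,ce` for FWR,CDR). That format, and the expected counts (2 for `--omega`, 1 for `-t`, presumably one per motif for `--hotness`), are assumptions rather than documented behavior. `parse_param_spec` is hypothetical.

```python
from typing import List, Union

Spec = Union[str, float]  # "e", "ce", or a numeric value


def parse_param_spec(value: str, expected: int) -> List[Spec]:
    """Validate a comma-separated parameter spec such as 'e,ce' or '0.5,e'."""
    items = [item.strip() for item in value.split(",")]
    if len(items) != expected:
        raise ValueError(f"expected {expected} value(s), got {len(items)}: {value!r}")
    parsed: List[Spec] = []
    for item in items:
        if item in ("e", "ce"):
            parsed.append(item)  # estimate, or estimate + confidence interval
        else:
            parsed.append(float(item))  # float() raises ValueError on anything else
    return parsed


print(parse_param_spec("e,ce", expected=2))   # ['e', 'ce']
print(parse_param_spec("0.5,e", expected=2))  # [0.5, 'e']
```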
-
--oformat {tab,txt}
IgPhyML output format.
-
--nohlp
If specified, do not run the HLP model.
-
--asr <asr>
Ancestral sequence reconstruction interval (0-1).
- output files:
- <folder>
folder containing fasta and partition files for each clone.
- lineages
successfully processed records.
- lineages-fail
database records that failed processing.
- igphyml-pass
parameter estimates and lineage trees from running IgPhyML, if --igphyml is specified.
- required fields:
sequence_id, sequence, sequence_alignment, germline_alignment_d_mask or germline_alignment, v_call, j_call, clone_id, v_sequence_start
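A quick pre-flight check can confirm that an AIRR-format TSV has the required columns before running BuildTrees. This is a hypothetical sketch, not part of the tool: it reads only the header row and treats `germline_alignment_d_mask` and `germline_alignment` as interchangeable, as the list above states. Change-O format files use different column names, which this sketch does not handle.

```python
import csv
import io
from typing import Iterable, List

# AIRR column names listed under "required fields" above.
REQUIRED = ["sequence_id", "sequence", "sequence_alignment",
            "v_call", "j_call", "clone_id", "v_sequence_start"]
# Either germline column satisfies the requirement.
GERMLINE_OPTIONS = ["germline_alignment_d_mask", "germline_alignment"]


def missing_fields(header: Iterable[str]) -> List[str]:
    """Return the required fields absent from a TSV header (hypothetical helper)."""
    columns = set(header)
    missing = [field for field in REQUIRED if field not in columns]
    if not any(field in columns for field in GERMLINE_OPTIONS):
        missing.append(" or ".join(GERMLINE_OPTIONS))
    return missing


def check_tsv(text: str) -> List[str]:
    """Read only the header row of tab-delimited text and report missing fields."""
    header = next(csv.reader(io.StringIO(text), delimiter="\t"))
    return missing_fields(header)


tsv = "\t".join(REQUIRED + ["germline_alignment"]) + "\n"
print(check_tsv(tsv))  # []
print(check_tsv("sequence_id\tsequence\n"))
```

For a real file, pass the contents of the TSV (or its first line) to `check_tsv`; an empty list means all required columns are present.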