BuildTrees.py
Converts TSV files into IgPhyML input files
usage: BuildTrees.py [--version] [-h] -d DB_FILES [DB_FILES ...]
[--outdir OUT_DIR] [--outname OUT_NAME] [--log LOG_FILE]
[--failed] [--format {airr,changeo}] [--collapse]
[--ncdr3] [--nmask] [--md META_DATA [META_DATA ...]]
[--clones TARGET_CLONES [TARGET_CLONES ...]]
[--minseq MIN_SEQ] [--sample SAMPLE_DEPTH]
[--append APPEND [APPEND ...]] [--igphyml]
[--nproc NPROC] [--clean {none,all}]
[--optimize {n,r,l,lr,tl,tlr}] [--omega OMEGA] [-t KAPPA]
[--motifs MOTIFS] [--hotness HOTNESS]
[--oformat {tab,txt}] [--nohlp] [--asr ASR]
- --version
show program’s version number and exit
- -h, --help
show this help message and exit
- -d <db_files>
A list of tab delimited database files.
- --outdir <out_dir>
Changes the output directory to the location specified. The input file directory is used if this is not specified.
- --outname <out_name>
Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
- --log <log_file>
Specify to write verbose logging to a file. May not be specified with multiple input files.
- --failed
If specified, create files containing records that fail processing.
- --format {airr,changeo}
Output format. Also specifies the input format for tools accepting tab delimited AIRR Rearrangement or Change-O files.
- --collapse
If specified, collapse identical sequences before exporting to fasta.
- --ncdr3
If specified, remove CDR3 from all sequences.
- --nmask
If specified, do not attempt to mask split codons.
- --md <meta_data>
List of fields containing metadata to include in output fasta file sequence headers.
- --clones <target_clones>
List of clone IDs to output, if specified.
- --minseq <min_seq>
Minimum number of data sequences. Any clones with fewer than the specified number of sequences will be excluded.
- --sample <sample_depth>
Depth of reads to be subsampled (before deduplication).
- --append <append>
List of columns to append to sequence ID to ensure uniqueness.
- --igphyml
Run IgPhyML on output?
- --nproc <nproc>
Number of threads to parallelize IgPhyML across.
- --clean {none,all}
Delete intermediate files? none: leave all intermediate files; all: delete all intermediate files.
- --optimize {n,r,l,lr,tl,tlr}
Optimize combination of topology (t), branch lengths (l), and parameters (r), or nothing (n), for IgPhyML.
- --omega <omega>
Omega parameters to estimate for FWR and CDR, respectively: e = estimate, ce = estimate + confidence interval, or numeric value.
- -t <kappa>
Kappa parameters to estimate: e = estimate, ce = estimate + confidence interval, or numeric value
- --motifs <motifs>
Motifs for which to estimate mutability.
- --hotness <hotness>
Mutability parameters to estimate: e = estimate, ce = estimate + confidence interval, or numeric value
- --oformat {tab,txt}
IgPhyML output format.
- --nohlp
Don’t run HLP model?
- --asr <asr>
Ancestral sequence reconstruction interval (0-1).
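The options above can be combined into a single invocation. The following is a minimal sketch of assembling such a command line from Python, assuming BuildTrees.py is available on the PATH. The helper name build_command and the input file name clones.tsv are hypothetical and used only for illustration; only flags listed above are used.

```python
import shlex


def build_command(db_files, outname=None, minseq=None, collapse=False,
                  run_igphyml=False, nproc=None, fmt="airr"):
    """Assemble a BuildTrees.py argument list (illustrative helper, not part of the tool)."""
    cmd = ["BuildTrees.py", "-d", *db_files, "--format", fmt]
    if outname is not None:
        # The reference states --outname may not be used with multiple input files.
        if len(db_files) > 1:
            raise ValueError("--outname may not be specified with multiple input files")
        cmd += ["--outname", outname]
    if minseq is not None:
        cmd += ["--minseq", str(minseq)]
    if collapse:
        cmd.append("--collapse")
    if run_igphyml:
        cmd.append("--igphyml")
        if nproc is not None:
            cmd += ["--nproc", str(nproc)]
    return cmd


# "clones.tsv" is a hypothetical input file name.
cmd = build_command(["clones.tsv"], outname="ex", minseq=2,
                    collapse=True, run_igphyml=True, nproc=4)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) to launch the tool. Building a list rather than a single string avoids shell quoting problems with file names.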
- output files:
- <folder>
folder containing fasta and partition files for each clone.
- lineages
successfully processed records.
- lineages-fail
database records that failed processing.
- igphyml-pass
parameter estimates and lineage trees from running IgPhyML, if specified
- required fields:
sequence_id, sequence, sequence_alignment, germline_alignment_d_mask or germline_alignment, v_call, j_call, clone_id, v_sequence_start
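Before running the tool, it can be useful to confirm that an input file carries the required fields listed above. The following is a minimal sketch that checks only the header row of a tab-delimited file. It assumes AIRR-style column names as listed; Change-O formatted files use different column names, which this sketch does not cover. The function name missing_fields is hypothetical and not part of BuildTrees.py.

```python
import csv
import io

# Fields that must always be present, per the list above.
REQUIRED = ["sequence_id", "sequence", "sequence_alignment",
            "v_call", "j_call", "clone_id", "v_sequence_start"]
# At least one of these germline fields must be present.
GERMLINE_ALTERNATIVES = ["germline_alignment_d_mask", "germline_alignment"]


def missing_fields(handle):
    """Return the required fields absent from the header row of a TSV handle."""
    reader = csv.reader(handle, delimiter="\t")
    columns = set(next(reader, []))
    missing = [field for field in REQUIRED if field not in columns]
    if not any(field in columns for field in GERMLINE_ALTERNATIVES):
        missing.append(" or ".join(GERMLINE_ALTERNATIVES))
    return missing


# In-memory example header; with a real file, pass open(path, newline="") instead.
header = "\t".join(REQUIRED + ["germline_alignment"]) + "\n"
print(missing_fields(io.StringIO(header)))
```

An empty list means every required field was found in the header. The check does not validate the contents of any column.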