BuildTrees.py

Converts TSV files into IgPhyML input files

usage: BuildTrees.py [--version] [-h] -d DB_FILES [DB_FILES ...]
                     [--outdir OUT_DIR] [--outname OUT_NAME] [--log LOG_FILE]
                     [--failed] [--format {airr,changeo}] [--collapse]
                     [--ncdr3] [--nmask] [--md META_DATA [META_DATA ...]]
                     [--clones TARGET_CLONES [TARGET_CLONES ...]]
                     [--minseq MIN_SEQ] [--sample SAMPLE_DEPTH]
                     [--append APPEND [APPEND ...]] [--igphyml]
                     [--nproc NPROC] [--clean {none,all}]
                     [--optimize {n,r,l,lr,tl,tlr}] [--omega OMEGA] [-t KAPPA]
                     [--motifs MOTIFS] [--hotness HOTNESS]
                     [--oformat {tab,txt}] [--nohlp] [--asr ASR]

--version

show program’s version number and exit

-h, --help

show this help message and exit

-d <db_files>

A list of tab-delimited database files.

--outdir <out_dir>

Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--format {airr,changeo}

Output format. Also specifies the input format for tools accepting tab delimited AIRR Rearrangement or Change-O files.

--collapse

If specified, collapse identical sequences before exporting to FASTA.
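As a rough illustration of what collapsing means, the sketch below keeps the first record for each distinct sequence and counts how many records shared it. The record layout (dicts keyed by `sequence_id` and `sequence_alignment`), the choice of `sequence_alignment` as the comparison key, and the helper name `collapse_identical` are all assumptions for illustration; BuildTrees.py's own collapsing rules (for example, how it treats masked positions or differing metadata) may differ.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def collapse_identical(records: List[Record],
                       seq_field: str = "sequence_alignment") -> List[Tuple[Record, int]]:
    """Hypothetical helper: keep the first record per distinct sequence and count duplicates."""
    kept: Dict[str, Tuple[Record, int]] = {}
    for rec in records:
        seq = rec[seq_field]
        if seq in kept:
            first, count = kept[seq]
            kept[seq] = (first, count + 1)  # same sequence seen again: bump its count
        else:
            kept[seq] = (rec, 1)  # first occurrence becomes the representative
    # dicts preserve insertion order, so output follows first appearance
    return list(kept.values())


records = [
    {"sequence_id": "s1", "sequence_alignment": "CAGGTG"},
    {"sequence_id": "s2", "sequence_alignment": "CAGGTA"},
    {"sequence_id": "s3", "sequence_alignment": "CAGGTG"},
]
for rec, n in collapse_identical(records):
    print(rec["sequence_id"], n)  # s1 2, then s2 1
```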

--ncdr3

If specified, remove CDR3 from all sequences.

--nmask

If specified, do not attempt to mask split codons.

--md <meta_data>

List of fields containing metadata to include in output FASTA file sequence headers.

--clones <target_clones>

List of clone IDs to output, if specified.

--minseq <min_seq>

Minimum number of data sequences. Any clones with fewer than the specified number of sequences will be excluded.
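The clone-size filter, together with the `--clones` restriction, can be pictured as grouping records by `clone_id`, optionally keeping only the requested clone IDs, and dropping groups that are too small. A minimal sketch, assuming records are dicts with a string `clone_id` key; the helper `filter_clones` is hypothetical, and whether BuildTrees.py counts sequences before or after collapsing is not stated here, so treat this as illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

Record = Dict[str, str]


def filter_clones(records: Iterable[Record], min_seq: int,
                  target_clones: Optional[Iterable[str]] = None) -> Dict[str, List[Record]]:
    """Hypothetical: group by clone_id, optionally restrict to target clones,
    then drop clones with fewer than min_seq records."""
    wanted = set(target_clones) if target_clones is not None else None
    clones: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        cid = rec["clone_id"]
        if wanted is None or cid in wanted:
            clones[cid].append(rec)
    return {cid: recs for cid, recs in clones.items() if len(recs) >= min_seq}


# clone "1" has 3 records, clone "2" has 1, clone "3" has 2
records = [{"sequence_id": f"s{i}", "clone_id": cid}
           for i, cid in enumerate(["1", "1", "1", "2", "3", "3"])]
print(sorted(filter_clones(records, min_seq=2)))                       # ['1', '3']
print(sorted(filter_clones(records, min_seq=2, target_clones=["3"])))  # ['3']
```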

--sample <sample_depth>

Depth of reads to be subsampled (before deduplication).
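"Before deduplication" means duplicate reads still count toward the sampling depth. A sketch of that ordering, assuming the depth applies to a single group of reads (the option text does not say whether it is per clone or per file) and that groups at or below the depth are kept whole. The helper `subsample` is hypothetical; it uses `random.Random.sample` with an optional seed for reproducibility.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def subsample(records: List[T], depth: int, seed: Optional[int] = None) -> List[T]:
    """Hypothetical: draw up to `depth` records without replacement, preserving input order."""
    if len(records) <= depth:
        return list(records)
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(records)), depth))
    return [rec for i, rec in enumerate(records) if i in chosen]


reads = ["AAA", "AAA", "CCC", "GGG", "AAA", "TTT"]
sub = subsample(reads, depth=3, seed=1)  # duplicates are eligible to be drawn
deduplicated = list(dict.fromkeys(sub))  # deduplication happens only after sampling
print(len(sub), deduplicated)
```

Because sampling precedes deduplication, the number of unique sequences left can be smaller than the requested depth.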

--append <append>

List of columns to append to sequence ID to ensure uniqueness.
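For example, if reads from two samples share a `sequence_id`, appending a sample column makes the IDs distinct. A minimal sketch; the `_` separator, the `sample_id` column, and the helper name `unique_ids` are assumptions, not necessarily what BuildTrees.py writes.

```python
from typing import Dict, List

Record = Dict[str, str]


def unique_ids(records: List[Record], append: List[str], sep: str = "_") -> List[str]:
    """Hypothetical: suffix each sequence_id with the values of the `append` columns
    and fail loudly if the result is still not unique."""
    ids = [sep.join([rec["sequence_id"]] + [rec[col] for col in append]) for rec in records]
    if len(ids) != len(set(ids)):
        raise ValueError("appended columns do not make sequence IDs unique")
    return ids


records = [
    {"sequence_id": "read1", "sample_id": "A"},
    {"sequence_id": "read1", "sample_id": "B"},
]
print(unique_ids(records, ["sample_id"]))  # ['read1_A', 'read1_B']
```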

--igphyml

If specified, run IgPhyML on the output.

--nproc <nproc>

Number of threads to parallelize IgPhyML across.

--clean {none,all}

Whether to delete intermediate files. none: leave all intermediate files; all: delete all intermediate files.

--optimize {n,r,l,lr,tl,tlr}

Optimize a combination of topology (t), branch lengths (l), and parameters (r), or nothing (n), for IgPhyML.
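Each choice is a combination of the letters t, l, and r, with n meaning none. The sketch below decodes a choice into three flags; the `decode_optimize` name and the dict keys are hypothetical, and only the letter meanings come from the option text.

```python
from typing import Dict

CHOICES = {"n", "r", "l", "lr", "tl", "tlr"}


def decode_optimize(choice: str) -> Dict[str, bool]:
    """Hypothetical: map an --optimize choice to the components IgPhyML would optimize."""
    if choice not in CHOICES:
        raise ValueError(f"invalid --optimize choice: {choice!r}")
    return {
        "topology": "t" in choice,
        "branch_lengths": "l" in choice,
        "parameters": "r" in choice,
    }


print(decode_optimize("lr"))  # {'topology': False, 'branch_lengths': True, 'parameters': True}
print(decode_optimize("n"))   # every component False
```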

--omega <omega>

Omega parameters to estimate for FWR,CDR, respectively: e = estimate, ce = estimate + confidence interval, or a numeric value.
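The e / ce / numeric convention is shared by --omega, -t, and --hotness. A sketch of interpreting such a value, assuming entries are comma-separated with one entry per parameter (as the FWR,CDR wording suggests for --omega); `parse_param_spec` and the returned tuple shape are hypothetical and not part of BuildTrees.py.

```python
from typing import List, Optional, Tuple

Spec = Tuple[str, Optional[float]]


def parse_param_spec(value: str) -> List[Spec]:
    """Hypothetical: classify each comma-separated entry as estimate,
    estimate with confidence interval, or a fixed numeric value."""
    specs: List[Spec] = []
    for token in value.split(","):
        token = token.strip()
        if token == "e":
            specs.append(("estimate", None))
        elif token == "ce":
            specs.append(("estimate_ci", None))
        else:
            specs.append(("fixed", float(token)))  # raises ValueError if not numeric
    return specs


fwr, cdr = parse_param_spec("e,0.5")
print(fwr, cdr)  # ('estimate', None) ('fixed', 0.5)
```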

-t <kappa>

Kappa parameters to estimate: e = estimate, ce = estimate + confidence interval, or a numeric value.

--motifs <motifs>

Motifs for which to estimate mutability.

--hotness <hotness>

Mutability parameters to estimate: e = estimate, ce = estimate + confidence interval, or a numeric value.

--oformat {tab,txt}

IgPhyML output format.

--nohlp

If specified, do not run the HLP model.

--asr <asr>

Ancestral sequence reconstruction interval (0-1).

output files:
<folder>

folder containing FASTA and partition files for each clone.

lineages

successfully processed records.

lineages-fail

database records that failed processing.

igphyml-pass

parameter estimates and lineage trees from running IgPhyML, if specified.

required fields:

sequence_id, sequence, sequence_alignment, germline_alignment_d_mask or germline_alignment, v_call, j_call, clone_id, v_sequence_start
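Before running, it can help to confirm that an input file has these columns. A minimal sketch using the standard `csv` module, treating `germline_alignment_d_mask` and `germline_alignment` as interchangeable per the list above. It checks only the lower-case names listed here; a Change-O formatted file may use different column names, which this sketch does not handle. The helper `missing_fields` is hypothetical.

```python
import csv
import io
from typing import List, TextIO

REQUIRED = ["sequence_id", "sequence", "sequence_alignment", "v_call",
            "j_call", "clone_id", "v_sequence_start"]
GERMLINE_ALTERNATIVES = ["germline_alignment_d_mask", "germline_alignment"]


def missing_fields(handle: TextIO) -> List[str]:
    """Hypothetical: return required columns absent from a tab-delimited header row."""
    header = next(csv.reader(handle, delimiter="\t"))
    missing = [f for f in REQUIRED if f not in header]
    # either germline column satisfies the requirement
    if not any(g in header for g in GERMLINE_ALTERNATIVES):
        missing.append(" or ".join(GERMLINE_ALTERNATIVES))
    return missing


tsv = io.StringIO("sequence_id\tsequence\tsequence_alignment\tgermline_alignment\t"
                  "v_call\tj_call\tclone_id\n")
print(missing_fields(tsv))  # ['v_sequence_start']
```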