BuildTrees

Converts TSV files into IgPhyML input files.

usage: BuildTrees [--version] [-h] -d DB_FILES [DB_FILES ...]
                  [--outdir OUT_DIR] [--outname OUT_NAME] [--log LOG_FILE]
                  [--failed] [--format {changeo,airr}] [--collapse] [--ncdr3]
                  [--md META_DATA [META_DATA ...]]
                  [--clones TARGET_CLONES [TARGET_CLONES ...]]
                  [--minseq MIN_SEQ] [--sample SAMPLE_DEPTH]
                  [--append APPEND [APPEND ...]] [--igphyml] [--nproc NPROC]
                  [--clean {none,all}] [--optimize {n,r,l,lr,tl,tlr}]
                  [--omega {e | ce | e,e | ce,e | e,ce | ce,ce}] [-t {e,ce}]
                  [--motifs MOTIFS] [--hotness HOTNESS] [--oformat {tab,txt}]
                  [--nohlp]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-d <db_files>

A list of tab delimited database files.

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--format {changeo,airr}

Specify input and output format.

--collapse

If specified, collapse identical sequences before exporting to fasta.

--ncdr3

If specified, remove CDR3 from all sequences.

--md <meta_data>

List of fields containing metadata to include in the sequence headers of the output fasta file.

--clones <target_clones>

List of clone IDs to output, if specified.

--minseq <min_seq>

Minimum number of data sequences. Any clones with fewer than the specified number of sequences will be excluded.
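The exclusion rule described for --minseq can be illustrated with a small sketch. This is not the tool's own code: the function name clones_passing_minseq and the sample clone IDs are hypothetical, and the sketch simply counts one record per sequence. Whether BuildTrees counts sequences before or after other steps such as --collapse is not stated here.

```python
from collections import Counter

def clones_passing_minseq(clone_ids, min_seq):
    """Return the set of clone IDs that have at least min_seq sequences.

    clone_ids holds one clone ID per sequence record (hypothetical input,
    analogous to reading the CLONE column of a database file).
    """
    counts = Counter(clone_ids)
    # Clones with fewer than min_seq sequences are excluded.
    return {clone for clone, n in counts.items() if n >= min_seq}

# Made-up example data: clone "1" has 3 sequences, "2" has 1, "3" has 2.
clone_column = ["1", "1", "1", "2", "3", "3"]
kept = clones_passing_minseq(clone_column, min_seq=2)
print(sorted(kept))  # ['1', '3']
```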

--sample <sample_depth>

Depth of reads to be subsampled (before deduplication).

--append <append>

List of columns to append to sequence ID to ensure uniqueness.

--igphyml

If specified, run IgPhyML on the output.

--nproc <nproc>

Number of threads to parallelize IgPhyML across.

--clean {none,all}

Controls deletion of intermediate files. none: leave all intermediate files; all: delete all intermediate files.

--optimize {n,r,l,lr,tl,tlr}

Optimize a combination of topology (t), branch lengths (l), and parameters (r), or nothing (n), for IgPhyML.

--omega {e | ce | e,e | ce,e | e,ce | ce,ce}

Omega parameters to estimate for FWR,CDR respectively: e = estimate, ce = estimate + confidence interval. In a pair such as e,ce, the first value applies to FWR and the second to CDR.

-t {e,ce}

Kappa parameters to estimate: e = estimate, ce = estimate + confidence interval

--motifs <motifs>

Motifs for which to estimate mutability.

--hotness <hotness>

Mutability parameters to estimate: e = estimate, ce = estimate + confidence interval

--oformat {tab,txt}

IgPhyML output format.

--nohlp

If specified, do not run the HLP model.
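As a rough illustration of how the options above combine, the following sketch assembles an argument list from a few of the documented flags. The helper build_buildtrees_command and the file name example_db.tsv are hypothetical; the executable name is taken from the usage line, and only flags listed on this page are used. The sketch builds the command without running it.

```python
import shlex

def build_buildtrees_command(db_files, outname=None, collapse=False,
                             min_seq=None, run_igphyml=False, nproc=None):
    """Assemble an argument list for BuildTrees (hypothetical helper)."""
    cmd = ["BuildTrees", "-d", *db_files]
    if outname is not None:
        if len(db_files) > 1:
            # The --outname description says it may not be specified
            # with multiple input files.
            raise ValueError("--outname may not be specified with multiple input files")
        cmd += ["--outname", outname]
    if collapse:
        cmd.append("--collapse")
    if min_seq is not None:
        cmd += ["--minseq", str(min_seq)]
    if run_igphyml:
        cmd.append("--igphyml")
    if nproc is not None:
        cmd += ["--nproc", str(nproc)]
    return cmd

# example_db.tsv is a placeholder file name, not a file shipped with the tool.
cmd = build_buildtrees_command(["example_db.tsv"], outname="example",
                               collapse=True, min_seq=2,
                               run_igphyml=True, nproc=4)
print(shlex.join(cmd))
```

With the tool installed, the resulting list could be passed to subprocess.run(cmd, check=True).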

output files:

<folder>
    folder containing fasta and partition files for each clone.
lineages
    successfully processed records.
lineages-fail
    database records that failed processing.
igphyml-pass
    parameter estimates and lineage trees from running IgPhyML, if --igphyml is specified.

required fields:

SEQUENCE_ID, SEQUENCE_INPUT, SEQUENCE_IMGT, GERMLINE_IMGT_D_MASK, V_CALL, J_CALL, CLONE, V_SEQ_START
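Before running the tool, it can be useful to confirm that an input file carries the required fields. The sketch below is one possible pre-check, not part of BuildTrees: the function name missing_required_fields is hypothetical, and it assumes the file's header row uses exactly the column names listed above.

```python
import csv

# Column names copied from the "required fields" list on this page.
REQUIRED_FIELDS = [
    "SEQUENCE_ID", "SEQUENCE_INPUT", "SEQUENCE_IMGT", "GERMLINE_IMGT_D_MASK",
    "V_CALL", "J_CALL", "CLONE", "V_SEQ_START",
]

def missing_required_fields(tsv_path):
    """Return the required columns absent from the header row of a TSV file."""
    with open(tsv_path, newline="") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in REQUIRED_FIELDS if name not in header]
```

An empty list from missing_required_fields("example_db.tsv") (a placeholder path) would mean every listed column is present in the header.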