DefineClones.py
Assign Ig sequences into clones
usage: DefineClones.py [--version] [-h] -d DB_FILES [DB_FILES ...]
[-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
[--outname OUT_NAME] [--log LOG_FILE] [--failed]
[--format {airr,changeo}] [--nproc NPROC]
[--sf SEQ_FIELD] [--vf V_FIELD] [--jf J_FIELD]
[--gf GROUP_FIELDS [GROUP_FIELDS ...]]
[--mode {allele,gene}] [--act {first,set}]
[--model {ham,aa,hh_s1f,hh_s5f,mk_rs1nf,mk_rs5nf,hs1f_compat,m1n_compat}]
[--dist DISTANCE] [--norm {len,mut,none}]
[--sym {avg,min}] [--link {single,average,complete}]
[--maxmiss MAX_MISSING]
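As a rough illustration of how the flags in the usage line combine, the sketch below assembles an argument list in Python without running anything. The helper name `build_defineclones_command`, the input file name, the threshold value, and the keyword defaults are all hypothetical choices made for this example; they are not the tool's documented defaults.

```python
import shlex


def build_defineclones_command(db_file, dist, fmt="airr", model="ham",
                               norm="len", mode="gene", act="set"):
    """Assemble a DefineClones.py argument list from flags in the usage line.

    The keyword defaults are this sketch's own choices, not the tool's.
    """
    return [
        "DefineClones.py",
        "-d", db_file,
        "--format", fmt,
        "--model", model,
        "--dist", str(dist),
        "--norm", norm,
        "--mode", mode,
        "--act", act,
    ]


# Hypothetical input file and threshold, used only for illustration.
cmd = build_defineclones_command("sample_db-pass.tsv", dist=0.16)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a system where DefineClones.py is installed.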
-
--version
show program’s version number and exit
-
-h, --help
show this help message and exit
-
-d
<db_files>
A list of tab-delimited database files.
-
-o
<out_files>
Explicit output file name. Note that this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, the output filename will be based on the input filename(s).
-
--outdir
<out_dir>
Changes the output directory to the specified location. If not specified, the input file directory is used.
-
--outname
<out_name>
Changes the prefix of the successfully processed output file to the specified string. May not be specified with multiple input files.
-
--log
<log_file>
Writes verbose logging to the specified file. May not be specified with multiple input files.
-
--failed
If specified, creates files containing records that fail processing.
-
--format
{airr,changeo}
Output format. Also specifies the input format for tools accepting tab-delimited AIRR Rearrangement or Change-O files.
-
--nproc
<nproc>
The number of simultaneous computational processes to execute (CPU cores to utilize).
-
--sf
<seq_field>
Field to be used to calculate distance between records. Defaults to junction (airr) or JUNCTION (changeo).
-
--vf
<v_field>
Field containing the germline V segment call. Defaults to v_call (airr) or V_CALL (changeo).
-
--jf
<j_field>
Field containing the germline J segment call. Defaults to j_call (airr) or J_CALL (changeo).
-
--gf
<group_fields>
Additional fields to use for grouping clones aside from V, J and junction length.
-
--mode
{allele,gene}
Specifies whether to use the V(D)J allele or gene for initial grouping.
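To make the allele versus gene distinction concrete, here is a minimal sketch that derives a grouping key from a single call. It assumes the common IMGT-style naming in which the allele is written after an asterisk (for example `IGHV1-2*02`); the helper name `grouping_key` is hypothetical and is not taken from the tool.

```python
def grouping_key(call, mode="gene"):
    """Return the part of a V or J call used for grouping.

    Assumes IMGT-style names where the allele follows '*'.
    """
    if mode == "allele":
        return call
    if mode == "gene":
        # Drop the allele suffix, keeping only the gene name.
        return call.split("*", 1)[0]
    raise ValueError(f"unknown mode: {mode!r}")


allele_key = grouping_key("IGHV1-2*02", mode="allele")
gene_key = grouping_key("IGHV1-2*02", mode="gene")
print(allele_key, gene_key)
```

Under gene mode, two sequences carrying different alleles of the same gene would fall into the same initial group in this sketch.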
-
--act
{first,set}
Specifies how to handle multiple V(D)J assignments for initial grouping. The “first” action will use only the first gene listed. The “set” action will use all gene assignments and construct a larger gene grouping composed of any sequences sharing an assignment or linked to another sequence by a common assignment (similar to single-linkage).
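The transitive grouping described for the "set" action can be sketched with a small union-find structure. This is an illustration of the idea only, not the tool's implementation; the function name `group_by_shared_calls` and the example records are hypothetical, and every record is assumed to carry at least one gene call.

```python
def group_by_shared_calls(calls_by_record):
    """Group records that share a gene call directly or through a chain."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link all gene calls that co-occur on one record.
    for calls in calls_by_record.values():
        ordered = sorted(calls)
        for other in ordered[1:]:
            union(ordered[0], other)

    # Collect records under the root of any one of their calls.
    groups = {}
    for record, calls in calls_by_record.items():
        root = find(sorted(calls)[0])
        groups.setdefault(root, set()).add(record)
    return sorted(sorted(members) for members in groups.values())


records = {
    "seq1": {"IGHV1-2"},
    "seq2": {"IGHV1-2", "IGHV1-3"},
    "seq3": {"IGHV1-3"},
    "seq4": {"IGHV3-23"},
}
groups = group_by_shared_calls(records)
print(groups)
```

Here `seq1` and `seq3` share no call, yet they end up together because `seq2` links them, which is the single-linkage-like behavior the description refers to.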
-
--model
{ham,aa,hh_s1f,hh_s5f,mk_rs1nf,mk_rs5nf,hs1f_compat,m1n_compat}
Specifies which substitution model to use for calculating distance between sequences. The “ham” model is nucleotide Hamming distance and “aa” is amino acid Hamming distance. The “hh_s1f” and “hh_s5f” models are human-specific single nucleotide and 5-mer content models, respectively, from Yaari et al., 2013. The “mk_rs1nf” and “mk_rs5nf” models are mouse-specific single nucleotide and 5-mer content models, respectively, from Cui et al., 2016. The “m1n_compat” and “hs1f_compat” models are deprecated models provided for backwards compatibility with the “m1n” and “hs1f” models in Change-O v0.3.3 and SHazaM v0.1.4. Both 5-mer models should be considered experimental.
-
--dist
<distance>
The distance threshold for clonal grouping.
-
--norm
{len,mut,none}
Specifies how to normalize distances. One of none (do not normalize), len (normalize by length), or mut (normalize by number of mutations between sequences).
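As an illustration of how a distance, a normalization, and a threshold interact, the sketch below computes a nucleotide Hamming distance and optionally divides it by sequence length. It covers only the none and len options; mut is omitted because its exact definition is not spelled out here. The function names, the example junctions, and the threshold of 0.1 are hypothetical.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def normalized_distance(a, b, norm="len"):
    """Hamming distance, raw ('none') or divided by length ('len')."""
    d = hamming(a, b)
    if norm == "none":
        return float(d)
    if norm == "len":
        return d / len(a)
    raise ValueError(f"unsupported norm in this sketch: {norm!r}")


# Two hypothetical 12-nt junctions differing at one position.
junction_a = "TGTGCGAGAGAT"
junction_b = "TGTGCGAAAGAT"

raw = normalized_distance(junction_a, junction_b, norm="none")
by_len = normalized_distance(junction_a, junction_b, norm="len")
within_threshold = by_len <= 0.1  # hypothetical threshold
print(raw, by_len, within_threshold)
```

Because sequences are grouped by junction length before distances are computed, the equal-length requirement of Hamming distance is satisfied within a group.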
-
--sym
{avg,min}
Specifies how to combine asymmetric distances. One of avg (average of A->B and B->A) or min (minimum of A->B and B->A).
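The two ways of combining asymmetric distances amount to the following, shown as a minimal sketch. The function name `symmetrize` and the input values are hypothetical.

```python
def symmetrize(d_ab, d_ba, sym="avg"):
    """Combine the A->B and B->A distances into one symmetric value."""
    if sym == "avg":
        return (d_ab + d_ba) / 2
    if sym == "min":
        return min(d_ab, d_ba)
    raise ValueError(f"unknown sym: {sym!r}")


avg_value = symmetrize(0.2, 0.4, sym="avg")
min_value = symmetrize(0.2, 0.4, sym="min")
print(avg_value, min_value)
```

With a symmetric model such as Hamming distance, A->B equals B->A, so both options return the same value.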
-
--link
{single,average,complete}
Type of linkage to use for hierarchical clustering.
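To show how the linkage choice changes the result, here is a small pure-Python agglomerative clustering sketch that stops merging once the closest pair of clusters exceeds a threshold. It is not the tool's implementation: the function names, the toy sequences, the length-normalized Hamming distance, and the inclusive `<=` threshold comparison are all assumptions of this example.

```python
from itertools import combinations


def seq_distance(a, b):
    """Length-normalized Hamming distance between equal-length strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(seqs, threshold, link="single"):
    """Greedy agglomerative clustering cut at a distance threshold."""
    clusters = [[s] for s in seqs]

    def cluster_distance(c1, c2):
        pairwise = [seq_distance(a, b) for a in c1 for b in c2]
        if link == "single":
            return min(pairwise)
        if link == "complete":
            return max(pairwise)
        if link == "average":
            return sum(pairwise) / len(pairwise)
        raise ValueError(f"unknown link: {link!r}")

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        if cluster_distance(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return sorted(sorted(c) for c in clusters)


toy = ["AAAA", "AAAT", "AATT", "GGGG"]
single = cluster(toy, threshold=0.3, link="single")
complete = cluster(toy, threshold=0.3, link="complete")
average = cluster(toy, threshold=0.3, link="average")
print(single, complete, average)
```

In this toy data, single linkage chains `AAAA`, `AAAT`, and `AATT` together through their nearest neighbors, while complete and average linkage keep `AATT` apart because its distance to `AAAA` is too large.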
-
--maxmiss
<max_missing>
The maximum number of non-ACGT characters (gaps or Ns) to permit in the junction sequence before excluding the record from clonal assignment. Note, under single linkage non-informative positions can create artifactual links between unrelated sequences. Use with caution.
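The filter described above can be approximated as counting every character that is not A, C, G, or T and comparing that count to the limit. The helper names and the example junction are hypothetical, and treating lowercase bases as valid is a choice made for this sketch.

```python
def count_missing(junction):
    """Count characters other than A, C, G, T (case-insensitive)."""
    return sum(ch not in "ACGT" for ch in junction.upper())


def passes_maxmiss(junction, max_missing):
    """True if the junction has at most max_missing non-ACGT characters."""
    return count_missing(junction) <= max_missing


example = "TGTNNGCA-T"  # two Ns and one gap
n_missing = count_missing(example)
print(n_missing, passes_maxmiss(example, 3), passes_maxmiss(example, 2))
```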
- output files:
- clone-pass
database with assigned clonal group numbers.
- clone-fail
database with records failing clonal grouping.
- required fields:
sequence_id, v_call, j_call, junction
- output fields:
clone_id
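Once a clone-pass file exists, the clone_id column can be summarized with the standard library alone. The sketch below reads tab-delimited text using the AIRR-style field names listed above and counts records per clone. The sample rows are fabricated placeholders for illustration, and the function name `clone_sizes` is hypothetical.

```python
import csv
import io
from collections import Counter


def clone_sizes(handle):
    """Count records per clone_id in a tab-delimited clone-pass table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["clone_id"] for row in reader)


# Placeholder rows standing in for a real clone-pass file.
sample = (
    "sequence_id\tv_call\tj_call\tjunction\tclone_id\n"
    "seq1\tIGHV1-2*02\tIGHJ4*02\tTGTGCGAGAGAT\t1\n"
    "seq2\tIGHV1-2*02\tIGHJ4*02\tTGTGCGAAAGAT\t1\n"
    "seq3\tIGHV3-23*01\tIGHJ6*02\tTGTGCGAAAGGG\t2\n"
)

sizes = clone_sizes(io.StringIO(sample))
print(sizes.most_common())
```

For a file on disk, the same function could be called on `open(path, newline="")` in place of the `io.StringIO` object.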