CreateGermlines.py
Reconstructs germline sequences from alignment data
usage: CreateGermlines.py [--version] [-h] -d DB_FILES [DB_FILES ...]
                          [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                          [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                          [--format {airr,changeo}]
                          -r REFERENCES [REFERENCES ...]
                          [-g {full,dmask,vonly,regions} [{full,dmask,vonly,regions} ...]]
                          [--cloned] [--sf SEQ_FIELD] [--vf V_FIELD]
                          [--df D_FIELD] [--jf J_FIELD] [--cf CLONE_FIELD]
- --version
show program’s version number and exit
- -h, --help
show this help message and exit
- -d <db_files>
A list of tab-delimited database files.
- -o <out_files>
Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, the output filename will be based on the input filename(s).
- --outdir <out_dir>
Changes the output directory to the location specified. The input file directory is used if this is not specified.
- --outname <out_name>
Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
- --log <log_file>
Specify to write verbose logging to a file. May not be specified with multiple input files.
- --failed
If specified, create files containing records that fail processing.
- --format {airr,changeo}
Output format. Also specifies the input format for tools accepting tab-delimited AIRR Rearrangement or Change-O files.
- -r <references>
List of folders and/or fasta files (with .fasta, .fna or .fa extension) with germline sequences. When using the default Change-O sequence and coordinate fields, these reference sequences must contain IMGT-numbering spacers (gaps) in the V segment. Alternative numbering schemes, or no numbering, may work for alternative sequence and coordinate definitions that define a valid alignment, but a warning will be issued.
- -g {full,dmask,vonly,regions}
Specify the type(s) of germlines to include: full germline, germline with D segment masked, or germline for V segment only.
- --cloned
Specify to create only one germline per clone. Note, if allele calls are ambiguous within a clonal group, this will place the germline call used for the entire clone within the germline_v_call, germline_d_call and germline_j_call fields.
- --sf <seq_field>
Field containing the aligned sequence. Defaults to sequence_alignment (airr) or SEQUENCE_IMGT (changeo).
- --vf <v_field>
Field containing the germline V segment call. Defaults to v_call (airr) or V_CALL (changeo).
- --df <d_field>
Field containing the germline D segment call. Defaults to d_call (airr) or D_CALL (changeo).
- --jf <j_field>
Field containing the germline J segment call. Defaults to j_call (airr) or J_CALL (changeo).
- --cf <clone_field>
Field containing clone identifiers. Ignored if --cloned is not also specified. Defaults to clone_id (airr) or CLONE (changeo).
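The restrictions stated for the output options can be checked before the tool is launched. The following Python sketch assembles an argument list and rejects the combinations that the descriptions above rule out. `build_command` and its parameter names are hypothetical helpers, not part of Change-O; only flags that appear in the usage string are emitted.

```python
from typing import List, Optional, Sequence

# Choices copied from the usage string above.
FORMATS = {"airr", "changeo"}
GERM_TYPES = {"full", "dmask", "vonly", "regions"}


def build_command(
    db_files: Sequence[str],
    references: Sequence[str],
    out_files: Optional[Sequence[str]] = None,
    outdir: Optional[str] = None,
    outname: Optional[str] = None,
    log: Optional[str] = None,
    failed: bool = False,
    fmt: Optional[str] = None,
    germ_types: Optional[Sequence[str]] = None,
    cloned: bool = False,
) -> List[str]:
    """Hypothetical helper: build a CreateGermlines.py argument list."""
    if not db_files or not references:
        raise ValueError("-d and -r each need at least one value")
    # -o cannot be used with --failed, --outdir, or --outname.
    if out_files and (failed or outdir or outname):
        raise ValueError("-o cannot be combined with --failed, --outdir, or --outname")
    # --outname and --log may not be specified with multiple input files.
    if len(db_files) > 1 and (outname or log):
        raise ValueError("--outname and --log require a single input file")
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if germ_types and set(germ_types) - GERM_TYPES:
        raise ValueError(f"unknown germline type(s): {sorted(set(germ_types) - GERM_TYPES)}")

    cmd = ["CreateGermlines.py", "-d", *db_files, "-r", *references]
    if out_files:
        cmd += ["-o", *out_files]
    if outdir:
        cmd += ["--outdir", outdir]
    if outname:
        cmd += ["--outname", outname]
    if log:
        cmd += ["--log", log]
    if failed:
        cmd.append("--failed")
    if fmt:
        cmd += ["--format", fmt]
    if germ_types:
        cmd += ["-g", *germ_types]
    if cloned:
        cmd.append("--cloned")
    return cmd
```

With Change-O installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Options left as `None` are omitted so that the tool applies its own defaults.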
- output files:
- germ-pass
database with assigned germline sequences.
- germ-fail
database with records failing germline assignment.
- required fields:
sequence_id, sequence_alignment, v_call, d_call, j_call, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, np1_length, np2_length
- optional fields:
n1_length, n2_length, p3v_length, p5d_length, p3d_length, p5j_length, clone_id
- output fields:
germline_v_call, germline_d_call, germline_j_call, germline_alignment, germline_alignment_d_mask, germline_alignment_v_region, germline_regions
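Before running the tool, it can be useful to confirm that an input file carries every required field. The sketch below reads only the header row of a tab-delimited file and reports which required columns are absent. It assumes the AIRR column names listed under "required fields" above; `REQUIRED_FIELDS`, `missing_required_fields`, and `check_tsv` are hypothetical names introduced for illustration.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Required AIRR-format fields, copied from the list above.
REQUIRED_FIELDS = [
    "sequence_id", "sequence_alignment", "v_call", "d_call", "j_call",
    "v_sequence_start", "v_sequence_end", "v_germline_start", "v_germline_end",
    "d_sequence_start", "d_sequence_end", "d_germline_start", "d_germline_end",
    "j_sequence_start", "j_sequence_end", "j_germline_start", "j_germline_end",
    "np1_length", "np2_length",
]


def missing_required_fields(header: Iterable[str]) -> List[str]:
    """Return the required fields that do not appear in the header."""
    present = set(header)
    return [name for name in REQUIRED_FIELDS if name not in present]


def check_tsv(handle: TextIO) -> List[str]:
    """Read the header row of a tab-delimited file and list missing fields."""
    reader = csv.DictReader(handle, delimiter="\t")
    # fieldnames is None for an empty file; treat that as an empty header.
    return missing_required_fields(reader.fieldnames or [])


if __name__ == "__main__":
    # In-memory stand-in for a database file with an incomplete header.
    sample = io.StringIO("sequence_id\tsequence_alignment\tv_call\n")
    print(check_tsv(sample))
```

For a file on disk, the same check would be `check_tsv(open(path, newline=""))`. A Change-O format file uses different column names, so the list would need to be adapted for `--format changeo`.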