MakeDb.py

Create tab-delimited database file to store sequence alignment information

usage: MakeDb.py [--version] [-h]  ...
--version

show program’s version number and exit

-h, --help

show this help message and exit

output files:
db-pass

database of alignment records with functionality information, V and J calls, and a junction region.

db-fail

database of records that fail due to no productivity information, no V gene assignment, no J gene assignment, or no junction region.

universal output fields:

sequence_id, sequence, sequence_alignment, germline_alignment, rev_comp, productive, stop_codon, vj_in_frame, locus, v_call, d_call, j_call, c_call, junction, junction_length, junction_aa, v_sequence_start, v_sequence_end, v_germline_start, v_germline_end, d_sequence_start, d_sequence_end, d_germline_start, d_germline_end, j_sequence_start, j_sequence_end, j_germline_start, j_germline_end, np1_length, np2_length, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3

imgt specific output fields:

n1_length, n2_length, p3v_length, p5d_length, p3d_length, p5j_length, d_frame, v_score, v_identity, d_score, d_identity, j_score, j_identity

igblast specific output fields:

v_score, v_identity, v_support, v_cigar, d_score, d_identity, d_support, d_cigar, j_score, j_identity, j_support, j_cigar

ihmm specific output fields:

vdj_score

10x specific output fields:

cell_id, consensus_count, umi_count, v_call_10x, d_call_10x, j_call_10x, junction_10x, junction_10x_aa
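As a minimal sketch of how the tab-delimited output might be consumed downstream, the snippet below reads a db-pass style table with Python's standard csv module and keeps the productive records. The sample rows are made up for illustration, and the assumption that `productive` is encoded as `T`/`F` (or `true`/`false`) should be checked against the files your run actually produces.

```python
import csv
import io


def read_productive(handle):
    """Yield rows whose `productive` column reads as true.

    Assumes booleans are written as T/F or true/false (an assumption,
    not something stated in the help text above).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row.get("productive", "").strip().upper() in ("T", "TRUE"):
            yield row


# Tiny in-memory stand-in for a db-pass file (made-up records).
sample = io.StringIO(
    "sequence_id\tproductive\tv_call\tj_call\tjunction\n"
    "seq1\tT\tIGHV1-2*02\tIGHJ4*02\tTGTGCGAGATGG\n"
    "seq2\tF\tIGHV3-23*01\tIGHJ6*02\tTGTGCGAAATGG\n"
)
productive_ids = [row["sequence_id"] for row in read_productive(sample)]
print(productive_ids)
```

To read a real file, pass an open file handle (opened with `newline=""`) in place of the `io.StringIO` object.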

MakeDb.py igblast

Process igblastn output.

usage: MakeDb.py igblast [--version] [-h] [-o OUT_FILES [OUT_FILES ...]]
                         [--outdir OUT_DIR] [--outname OUT_NAME]
                         [--log LOG_FILE] [--failed] [--format {airr,changeo}]
                         -i ALIGNER_FILES [ALIGNER_FILES ...] -r REPO
                         [REPO ...] -s SEQ_FILES [SEQ_FILES ...]
                         [--10x CELLRANGER_FILES [CELLRANGER_FILES ...]]
                         [--asis-id] [--asis-calls] [--extended]
                         [--regions {default,rhesus-igl}] [--infer-junction]
                         [--partial]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-o <out_files>

Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--format {airr,changeo}

Output format. Also specifies the input format for tools accepting tab delimited AIRR Rearrangement or Change-O files.

-i <aligner_files>

IgBLAST output files in format 7 with query sequence (igblastn argument '-outfmt "7 std qseq sseq btop"').

-r <repo>

List of folders and/or FASTA files containing the same germline set used in the IgBLAST alignment. These reference sequences must contain IMGT-numbering spacers (gaps) in the V segment.

-s <seq_files>

List of input FASTA files (with .fasta, .fna or .fa extension), containing sequences.

--10x <cellranger_files>

Table file containing 10X annotations (with .csv or .tsv extension).

--asis-id

Specify to prevent input sequence headers from being parsed to add new columns to database. Parsing of sequence headers requires headers to be in the pRESTO annotation format, so this should be specified when sequence headers are incompatible with the pRESTO annotation scheme. Note, unrecognized header formats will default to this behavior.

--asis-calls

Specify to prevent gene calls from being parsed into standard allele names in both the IgBLAST output and reference database. Note, this requires the sequence identifiers in the reference sequence set and the IgBLAST database to be exact string matches.

--extended

Specify to include additional aligner specific fields in the output. Adds <vdj>_score, <vdj>_identity, <vdj>_support, <vdj>_cigar, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2 and cdr3.

--regions {default,rhesus-igl}

IMGT CDR and FWR boundary definition to use.

--infer-junction

Infer the junction sequence. For use with IgBLAST v1.6.0 or older, prior to the addition of IMGT-CDR3 inference.

--partial

If specified, include incomplete V(D)J alignments in the pass file instead of the fail file. An incomplete alignment is defined as a record that is missing a V gene assignment, J gene assignment, junction region, or productivity call.
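The following is a small sketch, not part of MakeDb.py, showing one way to assemble an `igblast` subcommand invocation from Python while enforcing the documented rule that -o cannot be combined with --failed, --outdir, or --outname. The helper name `build_igblast_command` and all file names are hypothetical; only flags listed above are used, and nothing is executed.

```python
import shlex


def build_igblast_command(aligner_files, repo, seq_files, out_files=None,
                          outdir=None, outname=None, failed=False,
                          extended=False, partial=False, fmt="airr"):
    """Return an argv list for `MakeDb.py igblast` (hypothetical helper)."""
    # Documented restriction: -o excludes --failed, --outdir and --outname.
    if out_files and (failed or outdir or outname):
        raise ValueError(
            "-o cannot be combined with --failed, --outdir, or --outname")
    cmd = ["MakeDb.py", "igblast", "-i", *aligner_files,
           "-r", *repo, "-s", *seq_files, "--format", fmt]
    if out_files:
        cmd += ["-o", *out_files]
    if outdir:
        cmd += ["--outdir", outdir]
    if outname:
        cmd += ["--outname", outname]
    if failed:
        cmd.append("--failed")
    if extended:
        cmd.append("--extended")
    if partial:
        cmd.append("--partial")
    return cmd


# Placeholder file names; substitute real paths before running anything.
cmd = build_igblast_command(["sample.fmt7"], ["germlines/"], ["sample.fasta"],
                            outname="sample", failed=True, extended=True)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.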

MakeDb.py igblast-aa

Process igblastp output.

usage: MakeDb.py igblast-aa [--version] [-h] [-o OUT_FILES [OUT_FILES ...]]
                            [--outdir OUT_DIR] [--outname OUT_NAME]
                            [--log LOG_FILE] [--failed]
                            [--format {airr,changeo}] -i ALIGNER_FILES
                            [ALIGNER_FILES ...] -r REPO [REPO ...] -s
                            SEQ_FILES [SEQ_FILES ...]
                            [--10x CELLRANGER_FILES [CELLRANGER_FILES ...]]
                            [--asis-id] [--asis-calls] [--extended]
                            [--regions {default,rhesus-igl}]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-o <out_files>

Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--format {airr,changeo}

Output format. Also specifies the input format for tools accepting tab delimited AIRR Rearrangement or Change-O files.

-i <aligner_files>

IgBLAST output files in format 7 with query sequence (igblastp argument '-outfmt "7 std qseq sseq btop"').

-r <repo>

List of folders and/or FASTA files containing the same germline set used in the IgBLAST alignment. These reference sequences must contain IMGT-numbering spacers (gaps) in the V segment.

-s <seq_files>

List of input FASTA files (with .fasta, .fna or .fa extension), containing sequences.

--10x <cellranger_files>

Table file containing 10X annotations (with .csv or .tsv extension).

--asis-id

Specify to prevent input sequence headers from being parsed to add new columns to database. Parsing of sequence headers requires headers to be in the pRESTO annotation format, so this should be specified when sequence headers are incompatible with the pRESTO annotation scheme. Note, unrecognized header formats will default to this behavior.

--asis-calls

Specify to prevent gene calls from being parsed into standard allele names in both the IgBLAST output and reference database. Note, this requires the sequence identifiers in the reference sequence set and the IgBLAST database to be exact string matches.

--extended

Specify to include additional aligner specific fields in the output. Adds v_score, v_identity, v_support, v_cigar, fwr1, fwr2, fwr3, cdr1 and cdr2.

--regions {default,rhesus-igl}

IMGT CDR and FWR boundary definition to use.

MakeDb.py ihmm

Process iHMMune-Align output.

usage: MakeDb.py ihmm [--version] [-h] [-o OUT_FILES [OUT_FILES ...]]
                      [--outdir OUT_DIR] [--outname OUT_NAME] [--log LOG_FILE]
                      [--failed] [--format {airr,changeo}] -i ALIGNER_FILES
                      [ALIGNER_FILES ...] -r REPO [REPO ...] -s SEQ_FILES
                      [SEQ_FILES ...]
                      [--10x CELLRANGER_FILES [CELLRANGER_FILES ...]]
                      [--asis-id] [--extended] [--partial]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-o <out_files>

Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--format {airr,changeo}

Output format. Also specifies the input format for tools accepting tab delimited AIRR Rearrangement or Change-O files.

-i <aligner_files>

iHMMune-Align output file.

-r <repo>

List of folders and/or FASTA files containing the set of germline sequences used by iHMMune-Align. These reference sequences must contain IMGT-numbering spacers (gaps) in the V segment.

-s <seq_files>

List of input FASTA files (with .fasta, .fna or .fa extension) containing sequences.

--10x <cellranger_files>

Table file containing 10X annotations (with .csv or .tsv extension).

--asis-id

Specify to prevent input sequence headers from being parsed to add new columns to database. Parsing of sequence headers requires headers to be in the pRESTO annotation format, so this should be specified when sequence headers are incompatible with the pRESTO annotation scheme. Note, unrecognized header formats will default to this behavior.

--extended

Specify to include additional aligner specific fields in the output. Adds the path score of the iHMMune-Align hidden Markov model as vdj_score; adds fwr1, fwr2, fwr3, fwr4, cdr1, cdr2 and cdr3.

--partial

If specified, include incomplete V(D)J alignments in the pass file instead of the fail file. An incomplete alignment is defined as a record that is missing a V gene assignment, J gene assignment, junction region, or productivity call.
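The definition of an incomplete alignment given for --partial can be illustrated with a short sketch. This is an illustration of the stated rule only, not the tool's actual code: the function names are hypothetical, treating an empty or absent value as "missing" is an assumption, and the real tool may fail records for other reasons as well.

```python
# Fields whose absence makes a record "incomplete" per the --partial text.
REQUIRED_FIELDS = ("v_call", "j_call", "junction", "productive")


def is_incomplete(record):
    """True if any required field is absent or empty (assumed encoding)."""
    return any(not (record.get(field) or "").strip()
               for field in REQUIRED_FIELDS)


def route(record, partial=False):
    """Return which output file a record would land in under this sketch."""
    if is_incomplete(record) and not partial:
        return "db-fail"
    return "db-pass"


# Made-up records for illustration.
complete = {"v_call": "IGHV1-2*02", "j_call": "IGHJ4*02",
            "junction": "TGTGCGAGATGG", "productive": "T"}
no_j = {"v_call": "IGHV1-2*02", "j_call": "",
        "junction": "TGTGCGAGATGG", "productive": "T"}

print(route(complete), route(no_j), route(no_j, partial=True))
```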

MakeDb.py imgt

Process IMGT/HighV-Quest output (does not work with V-QUEST).

usage: MakeDb.py imgt [--version] [-h] [-o OUT_FILES [OUT_FILES ...]]
                      [--outdir OUT_DIR] [--outname OUT_NAME] [--log LOG_FILE]
                      [--failed] [--format {airr,changeo}] -i ALIGNER_FILES
                      [ALIGNER_FILES ...] [-s [SEQ_FILES ...]]
                      [-r REPO [REPO ...]]
                      [--10x CELLRANGER_FILES [CELLRANGER_FILES ...]]
                      [--extended] [--asis-id] [--imgt-id-len IMGT_ID_LEN]
                      [--partial]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-o <out_files>

Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--failed

If specified, create files containing records that fail processing.

--format {airr,changeo}

Output format. Also specifies the input format for tools accepting tab delimited AIRR Rearrangement or Change-O files.

-i <aligner_files>

Either zipped IMGT output files (.zip or .txz) or a folder containing unzipped IMGT output files (which must include 1_Summary, 2_IMGT-gapped, 3_Nt-sequences, and 6_Junction).

-s <seq_files>

List of FASTA files (with .fasta, .fna or .fa extension) that were submitted to IMGT/HighV-QUEST. If unspecified, sequence identifiers truncated by IMGT/HighV-QUEST will not be corrected.

-r <repo>

List of folders and/or FASTA files containing the germline sequence set used by IMGT/HighV-QUEST. These reference sequences must contain IMGT-numbering spacers (gaps) in the V segment. If unspecified, the germline sequence reconstruction will not be included in the output.

--10x <cellranger_files>

Table file containing 10X annotations (with .csv or .tsv extension).

--extended

Specify to include additional aligner specific fields in the output. Adds <vdj>_score, <vdj>_identity, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, n1_length, n2_length, p3v_length, p5d_length, p3d_length, p5j_length and d_frame.

--asis-id

Specify to prevent input sequence headers from being parsed to add new columns to database. Parsing of sequence headers requires headers to be in the pRESTO annotation format, so this should be specified when sequence headers are incompatible with the pRESTO annotation scheme. Note, unrecognized header formats will default to this behavior.

--imgt-id-len <imgt_id_len>

The maximum character length of sequence identifiers reported by IMGT/HighV-QUEST. Specify 50 if the IMGT files (-i) were generated with an IMGT/HighV-QUEST version older than 1.8.3 (May 7, 2021).

--partial

If specified, include incomplete V(D)J alignments in the pass file instead of the fail file. An incomplete alignment is defined as a record that is missing a V gene assignment, J gene assignment, junction region, or productivity call.
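To make the interaction between -s and --imgt-id-len more concrete, the sketch below shows one plausible way truncated identifiers could be mapped back to the full identifiers from the submitted FASTA files. It is an assumption-laden illustration, not the logic MakeDb.py uses: the function name is hypothetical, truncation is modeled as a plain prefix cut at `id_len` characters, and any other changes IMGT/HighV-QUEST may make to identifiers are ignored.

```python
def map_truncated_ids(full_ids, id_len):
    """Map each truncated ID (first `id_len` chars) to its full ID.

    Prefixes shared by more than one distinct full ID are dropped,
    because they cannot be resolved unambiguously.
    """
    mapping = {}
    ambiguous = set()
    for full in full_ids:
        key = full[:id_len]
        if key in mapping and mapping[key] != full:
            ambiguous.add(key)
        mapping[key] = full
    for key in ambiguous:
        del mapping[key]
    return mapping


# Made-up identifiers and a short limit of 10 characters for readability.
ids = ["SAMPLE_A_READ_0001", "SAMPLE_B_READ_0002", "SAMPLE_B_READ_0003"]
lookup = map_truncated_ids(ids, id_len=10)
print(lookup)
```

With a limit this short, the two `SAMPLE_B_` identifiers collide on the same prefix and are left out of the lookup, which is the kind of case where a larger identifier length matters.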