ConvertDb.py

Parses tab delimited database files

usage: ConvertDb.py [--version] [-h]  ...
--version

show program’s version number and exit

-h, --help

show this help message and exit

output files:
airr

AIRR formatted database files.

changeo

Change-O formatted database files.

sequences

FASTA formatted sequences output from the subcommands fasta and clip.

genbank

Feature tables and FASTA files containing MiAIRR compliant input for tbl2asn.

required fields:

sequence_id, sequence, sequence_alignment, junction, v_call, d_call, j_call, v_germline_start, v_germline_end, v_sequence_start, v_sequence_end, d_sequence_start, d_sequence_end, j_sequence_start, j_sequence_end

optional fields:

germline_alignment, c_call, clone_id
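
As a rough illustration of the field lists above, the following sketch checks the header of a tab delimited file for the required columns before conversion. The function name missing_required_fields and the sample data are hypothetical, and this is not how ConvertDb.py itself validates input.

```python
import csv
import io

# Required columns, copied from the "required fields" list above.
REQUIRED_FIELDS = [
    "sequence_id", "sequence", "sequence_alignment", "junction",
    "v_call", "d_call", "j_call",
    "v_germline_start", "v_germline_end",
    "v_sequence_start", "v_sequence_end",
    "d_sequence_start", "d_sequence_end",
    "j_sequence_start", "j_sequence_end",
]


def missing_required_fields(handle):
    """Return the required columns absent from a tab delimited file's header."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    present = set(header)
    return [name for name in REQUIRED_FIELDS if name not in present]


# Hypothetical two-column input, used only for illustration.
sample = io.StringIO("sequence_id\tsequence\nseq1\tACGT\n")
missing = missing_required_fields(sample)
print(len(missing), "required fields missing")
```

A file that passes this check returns an empty list.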

ConvertDb.py airr

Converts input to an AIRR TSV file.

usage: ConvertDb.py airr [--version] [-h] -d DB_FILES [DB_FILES ...]
                         [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                         [--outname OUT_NAME]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-d <db_files>

A list of tab delimited database files.

-o <out_files>

Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
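
The option descriptions above state two constraints: -o cannot be combined with --outdir or --outname, and --outname may not be used with multiple input files. A minimal sketch of assembling an airr invocation that respects them might look like the following. The helper name build_airr_command and the file names are hypothetical, and the command is only built, not executed.

```python
def build_airr_command(db_files, out_files=None, outdir=None, outname=None):
    """Assemble an argument list for `ConvertDb.py airr` (not executed here)."""
    if not db_files:
        raise ValueError("-d requires at least one database file")
    if out_files and (outdir or outname):
        raise ValueError("-o cannot be combined with --outdir or --outname")
    if outname and len(db_files) > 1:
        raise ValueError("--outname cannot be used with multiple input files")

    command = ["ConvertDb.py", "airr", "-d", *db_files]
    if out_files:
        command += ["-o", *out_files]
    if outdir:
        command += ["--outdir", outdir]
    if outname:
        command += ["--outname", outname]
    return command


# Hypothetical file and directory names.
command = build_airr_command(
    ["sample_db-pass.tab"], outdir="converted", outname="sample"
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run on a system where ConvertDb.py is installed.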

ConvertDb.py baseline

Creates a BASELINe fasta file from database records.

usage: ConvertDb.py baseline [--version] [-h] -d DB_FILES [DB_FILES ...]
                             [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                             [--outname OUT_NAME] [--if ID_FIELD]
                             [--sf SEQ_FIELD] [--gf GERM_FIELD]
                             [--cf CLUSTER_FIELD]
                             [--mf META_FIELDS [META_FIELDS ...]]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-d <db_files>

A list of tab delimited database files.

-o <out_files>

Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--if <id_field>

The name of the field containing identifiers

--sf <seq_field>

The name of the field containing reads

--gf <germ_field>

The name of the field containing germline sequences

--cf <cluster_field>

The name of the field containing sorted clone IDs

--mf <meta_fields>

List of annotation fields to add to the sequence description

ConvertDb.py changeo

Converts input into a Change-O TSV file.

usage: ConvertDb.py changeo [--version] [-h] -d DB_FILES [DB_FILES ...]
                            [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                            [--outname OUT_NAME]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-d <db_files>

A list of tab delimited database files.

-o <out_files>

Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

ConvertDb.py fasta

Creates a fasta file from database records.

usage: ConvertDb.py fasta [--version] [-h] -d DB_FILES [DB_FILES ...]
                          [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                          [--outname OUT_NAME] [--if ID_FIELD]
                          [--sf SEQ_FIELD]
                          [--mf META_FIELDS [META_FIELDS ...]]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-d <db_files>

A list of tab delimited database files.

-o <out_files>

Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--if <id_field>

The name of the field containing identifiers

--sf <seq_field>

The name of the field containing sequences

--mf <meta_fields>

List of annotation fields to add to the sequence description
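
Conceptually, the fasta subcommand writes one FASTA record per database row, taking the identifier from the --if field, the sequence from the --sf field, and appending the --mf fields to the description. The sketch below illustrates that mapping only. The function name rows_to_fasta and the sample rows are hypothetical, and the pipe-delimited field=value header layout is an assumption made for illustration; the layout ConvertDb.py actually writes may differ.

```python
import csv
import io


def rows_to_fasta(handle, id_field, seq_field, meta_fields=()):
    """Yield FASTA records built from a tab delimited file.

    Assumption: annotations are appended as `|field=value`; this is an
    illustrative layout, not confirmed ConvertDb.py output.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        header = row[id_field]
        for name in meta_fields:
            header += "|{}={}".format(name, row[name])
        yield ">{}\n{}\n".format(header, row[seq_field])


# Hypothetical input with one annotation column.
table = io.StringIO(
    "sequence_id\tsequence\tc_call\n"
    "seq1\tACGTACGT\tIGHM\n"
    "seq2\tTTGACCGA\tIGHG1\n"
)
fasta = "".join(rows_to_fasta(table, "sequence_id", "sequence", ["c_call"]))
print(fasta, end="")
```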

ConvertDb.py genbank

Creates files for GenBank/TLS submissions.

usage: ConvertDb.py genbank [--version] [-h] -d DB_FILES [DB_FILES ...]
                            [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                            [--outname OUT_NAME] [--format {airr,changeo}]
                            [--mol MOLECULE] [--product PRODUCT]
                            [--db DB_XREF] [--inf INFERENCE]
                            [--organism ORGANISM] [--sex SEX]
                            [--isolate ISOLATE] [--tissue TISSUE]
                            [--cell-type CELL_TYPE] [-y YAML_CONFIG]
                            [--label LABEL] [--cf C_FIELD] [--nf COUNT_FIELD]
                            [--if INDEX_FIELD] [--allow-stop] [--asis-id]
                            [--asis-calls] [--allele-delim ALLELE_DELIM]
                            [--asn] [--sbt ASN_TEMPLATE] [--exec TBL2ASN_EXEC]
--version

show program’s version number and exit

-h, --help

show this help message and exit

-d <db_files>

A list of tab delimited database files.

-o <out_files>

Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--format {airr,changeo}

Output format. Also specifies the input format for tools accepting tab delimited AIRR Rearrangement or Change-O files.

--mol <molecule>

The source molecule type. Usually one of “mRNA” or “genomic DNA”.

--product <product>

The product name, such as “immunoglobulin heavy chain”.

--db <db_xref>

Name of the reference database used for alignment. Usually “IMGT/GENE-DB”.

--inf <inference>

Name and version of the inference tool used for reference alignment in the form tool:version.

--organism <organism>

The scientific name of the organism.

--sex <sex>

If specified, adds the given sex annotation to the fasta headers.

--isolate <isolate>

If specified, adds the given isolate annotation (sample label) to the fasta headers.

--tissue <tissue>

If specified, adds the given tissue-type annotation to the fasta headers.

--cell-type <cell_type>

If specified, adds the given cell-type annotation to the fasta headers.

-y <yaml_config>

A YAML file specifying sample features (BioSample attributes) in the form ‘variable: value’. If specified, any features provided in the YAML file will override those provided at the command line. Note, this config file applies to sample features only and cannot be used for required source features such as the --product or --mol arguments.

--label <label>

If specified, add a field name to the sequence identifier. Sequence identifiers will be output in the form <label>=<id>.

--cf <c_field>

Field containing the C region call. If unspecified, the C region gene call will be excluded from the feature table.

--nf <count_field>

If specified, use the provided column to add the AIRR_READ_COUNT note to the feature table.

--if <index_field>

If specified, use the provided column to add the AIRR_CELL_INDEX note to the feature table.

--allow-stop

If specified, retain records in the output with stop codons in the junction region. In such records the CDS will be removed and replaced with a similar misc_feature in the feature table.

--asis-id

If specified, use the existing sequence identifier for the output identifier. By default, only the row number will be used as the identifier to avoid the 50 character limit.

--asis-calls

Specify to prevent alleles from being parsed using the IMGT nomenclature. Note, this requires the gene assignments to be exact matches to valid records in the reference database specified by the --db argument.

--allele-delim <allele_delim>

The delimiter to use for splitting the gene name from the allele number. Note, this only applies when specifying --asis-calls. By default, this argument will be ignored and allele numbers extracted under the expectation of IMGT nomenclature consistency.

--asn

If specified, run tbl2asn to generate the .sqn submission file after making the .fsa and .tbl files.

--sbt <asn_template>

If provided along with --asn, use the specified file for the template file argument to tbl2asn.

--exec <tbl2asn_exec>

The name or location of the tbl2asn executable.
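
The -y description above says that sample features given in the YAML file take precedence over those given as command line arguments. A minimal sketch of that precedence rule follows. The function names parse_flat_yaml and merge_sample_features are hypothetical, the attribute names and values are illustrative, and the parser handles only flat ‘variable: value’ lines as a simplification; a real YAML parser would normally be used.

```python
def parse_flat_yaml(text):
    """Parse flat `variable: value` lines; a simplification, not full YAML."""
    features = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        features[key.strip()] = value.strip().strip("'\"")
    return features


def merge_sample_features(cli_features, yaml_features):
    """Combine sample features, letting YAML values win over command line ones."""
    merged = dict(cli_features)
    merged.update(yaml_features)
    return merged


# Hypothetical values: sex and tissue given on the command line, with a
# config file that redefines tissue and adds isolate.
cli = {"sex": "female", "tissue": "blood"}
config = "tissue: spleen\nisolate: S1\n"
merged = merge_sample_features(cli, parse_flat_yaml(config))
print(merged)
```

Here the tissue value from the config file replaces the command line value, while sex is kept because the file does not define it.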