Filtering records

The ParseDb tool provides a basic set of operations for manipulating Change-O database files from the command line, including removing or updating rows and columns.

Removing non-functional sequences

After building a Change-O database from either IMGT/HighV-QUEST or IgBLAST output, you may wish to subset your data to only functional sequences. This can be done in one of two roughly equivalent ways using the ParseDb tool:

ParseDb.py select -d S43_atleast-2_db-pass.tab -f FUNCTIONAL -u T
ParseDb.py split -d S43_atleast-2_db-pass.tab -f FUNCTIONAL

The first line above uses the select subcommand to output a single file labeled parse-select containing only records with the value of T (-u T) in the FUNCTIONAL column (-f FUNCTIONAL).

Alternatively, the second line above uses the split subcommand to output multiple files with each file containing records with one of the values found in the FUNCTIONAL column (-f FUNCTIONAL). This will generate two files labeled FUNCTIONAL-T and FUNCTIONAL-F.
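The difference between the two subcommands can be sketched in plain Python. This is a minimal illustration, not ParseDb's implementation. It assumes a tab-delimited file with a header row and uses a tiny hypothetical table in place of S43_atleast-2_db-pass.tab. The helper names select_rows and split_rows are invented for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy table standing in for a Change-O database file
TOY_DB = (
    "SEQUENCE_ID\tFUNCTIONAL\tV_CALL\n"
    "seq1\tT\tIGHV1-2*02\n"
    "seq2\tF\tIGHV3-23*01\n"
    "seq3\tT\tIGHV4-34*01\n"
)

def select_rows(rows, field, values):
    """Keep rows whose `field` exactly equals one of `values` (like select -f FIELD -u VALUE)."""
    return [row for row in rows if row[field] in values]

def split_rows(rows, field):
    """Partition rows by each distinct value of `field` (like split -f FIELD)."""
    groups = defaultdict(list)
    for row in rows:
        groups[f"{field}-{row[field]}"].append(row)
    return dict(groups)

rows = list(csv.DictReader(io.StringIO(TOY_DB), delimiter="\t"))

functional = select_rows(rows, "FUNCTIONAL", {"T"})
print([r["SEQUENCE_ID"] for r in functional])  # ['seq1', 'seq3']

parts = split_rows(rows, "FUNCTIONAL")
print(sorted(parts))  # ['FUNCTIONAL-F', 'FUNCTIONAL-T']
```

The select path keeps one subset and discards the rest, while the split path keeps every record and only decides which output each one goes to.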

Removing disagreements between the C-region primers and the reference alignment

If you have data that includes both heavy and light chains in the same library, the V-segment and J-segment alignments from IMGT/HighV-QUEST or IgBLAST may not always agree with the isotype assignments from the C-region primers. In these cases, you can filter out such reads with the select subcommand of ParseDb. An example function call using an imaginary file db.tab is provided below:

ParseDb.py select -d db.tab -f V_CALL J_CALL CPRIMER -u "IGH" \
    --logic all --regex --outname heavy
ParseDb.py select -d db.tab -f V_CALL J_CALL CPRIMER -u "IG[LK]" \
    --logic all --regex --outname light

These commands require that all of the V_CALL, J_CALL and CPRIMER fields (-f V_CALL J_CALL CPRIMER and --logic all) contain the string IGH (first command) or one of IGK or IGL (second command). The --regex argument enables partial matching and interprets the values as regular expressions. The output from these two commands is two files, one containing only heavy chains (heavy_parse-select.tab) and one containing only light chains (light_parse-select.tab).
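The combination of --logic all and --regex can be approximated as follows. This sketch assumes that partial matching means a pattern found anywhere in the field value (Python's re.search). The helper matches and the toy records, including the CPRIMER values, are hypothetical. Record r3, whose heavy-chain V and J calls disagree with a light-chain primer, ends up in neither output.

```python
import re

def matches(row, fields, patterns, logic="all", regex=True):
    """Return True if the row's fields match the patterns.

    Assumption: with regex=True a field matches if any pattern is found
    anywhere in it (re.search); with regex=False it must equal a pattern.
    logic="all" requires every field to match, logic="any" only one.
    """
    def field_ok(value):
        if regex:
            return any(re.search(p, value) for p in patterns)
        return value in patterns

    results = (field_ok(row[f]) for f in fields)
    return all(results) if logic == "all" else any(results)

# Hypothetical records; r3 has heavy-chain alignments but a light-chain primer
rows = [
    {"SEQUENCE_ID": "r1", "V_CALL": "IGHV1-2*02", "J_CALL": "IGHJ4*02", "CPRIMER": "IGHM"},
    {"SEQUENCE_ID": "r2", "V_CALL": "IGKV1-39*01", "J_CALL": "IGKJ1*01", "CPRIMER": "IGKC"},
    {"SEQUENCE_ID": "r3", "V_CALL": "IGHV3-23*01", "J_CALL": "IGHJ6*02", "CPRIMER": "IGKC"},
]
fields = ["V_CALL", "J_CALL", "CPRIMER"]

heavy = [r["SEQUENCE_ID"] for r in rows if matches(r, fields, ["IGH"])]
light = [r["SEQUENCE_ID"] for r in rows if matches(r, fields, ["IG[LK]"])]
print(heavy)  # ['r1']
print(light)  # ['r2']
```

Switching to logic="any" would let r3 into the heavy output, which is why the all-fields requirement is what removes the disagreements.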

Exporting records to FASTA files

You may want to use external tools, or tools from pRESTO, on your Change-O result files. The ParseDb tool provides two options for exporting data from tab-delimited files to FASTA format.

Standard FASTA

The fasta subcommand allows you to export sequences and annotations to FASTA formatted files in the pRESTO annotation scheme:

ParseDb.py fasta -d S43_atleast-2_db-pass.tab --if SEQUENCE_ID --sf SEQUENCE_IMGT --mf V_CALL DUPCOUNT

Where the column containing the sequence identifier is specified by --if SEQUENCE_ID, the nucleotide sequence column is specified by --sf SEQUENCE_IMGT, and additional annotations to be added to the sequence header are specified by --mf V_CALL DUPCOUNT.
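To make the pRESTO annotation scheme concrete, the following sketch builds one FASTA entry from a record, assuming the pRESTO header layout >ID|FIELD=value|FIELD=value. The helper to_presto_fasta and the example values are hypothetical, and ParseDb may apply extra handling (for example, of special characters) that is not reproduced here.

```python
def to_presto_fasta(row, id_field, seq_field, meta_fields):
    """Format one record as a FASTA entry with pRESTO-style annotations.

    Assumed header layout: >ID|FIELD1=value1|FIELD2=value2
    id_field, seq_field and meta_fields play the roles of --if, --sf and --mf.
    """
    header = "|".join([row[id_field]] + [f"{f}={row[f]}" for f in meta_fields])
    return f">{header}\n{row[seq_field]}\n"

# Hypothetical record
row = {"SEQUENCE_ID": "seq1", "SEQUENCE_IMGT": "CAGGTGCAGCTGGTG",
       "V_CALL": "IGHV1-2*02", "DUPCOUNT": "3"}

print(to_presto_fasta(row, "SEQUENCE_ID", "SEQUENCE_IMGT", ["V_CALL", "DUPCOUNT"]), end="")
# >seq1|V_CALL=IGHV1-2*02|DUPCOUNT=3
# CAGGTGCAGCTGGTG
```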

BASELINe FASTA

The baseline subcommand generates a FASTA derivative format required by the BASELINe web tool. Generating these files is similar to building standard FASTA files, but requires a few more options. An example function call using an imaginary file db.tab is provided below:

ParseDb.py baseline -d db.tab --if SEQUENCE_ID --sf SEQUENCE_IMGT --mf V_CALL DUPCOUNT \
    --cf CLONE --gf GERMLINE_IMGT_D_MASK

The additional arguments required by the baseline subcommand include the clonal grouping (--cf CLONE) and germline sequence (--gf GERMLINE_IMGT_D_MASK) columns added by the DefineClones and CreateGermlines tasks, respectively.
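One way to see why the clonal grouping and germline columns are needed together is to collect the records of each clone alongside that clone's germline. The sketch below is an illustration only: it does not reproduce the BASELINe file layout, the helper clone_groups and the sample records are hypothetical, and taking the germline from the first member assumes all members of a clone share one germline. Because itertools.groupby only merges adjacent rows, it also shows what goes wrong when the CLONE column is not sorted.

```python
from itertools import groupby
from operator import itemgetter

def clone_groups(rows, clone_field="CLONE", germ_field="GERMLINE_IMGT_D_MASK"):
    """Collect consecutive records that share a clone ID.

    groupby only merges *adjacent* rows, so the input must already be
    sorted by the clone column; otherwise one clone is split into several
    groups. Returns (clone_id, germline, [sequence_ids]) tuples, taking the
    germline from the first record of each clone.
    """
    out = []
    for clone_id, members in groupby(rows, key=itemgetter(clone_field)):
        members = list(members)
        out.append((clone_id, members[0][germ_field],
                    [m["SEQUENCE_ID"] for m in members]))
    return out

# Hypothetical records, already sorted by CLONE
sorted_rows = [
    {"SEQUENCE_ID": "a", "CLONE": "1", "GERMLINE_IMGT_D_MASK": "CAGGTG"},
    {"SEQUENCE_ID": "b", "CLONE": "1", "GERMLINE_IMGT_D_MASK": "CAGGTG"},
    {"SEQUENCE_ID": "c", "CLONE": "2", "GERMLINE_IMGT_D_MASK": "GAGGTG"},
]
print(clone_groups(sorted_rows))
# [('1', 'CAGGTG', ['a', 'b']), ('2', 'GAGGTG', ['c'])]

# Same records out of order: clone 1 is split into two groups
unsorted_rows = [sorted_rows[0], sorted_rows[2], sorted_rows[1]]
print(len(clone_groups(unsorted_rows)))  # 3
```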

Note

The baseline subcommand requires the CLONE column to be sorted. DefineClones generates a sorted CLONE column by default. However, if you have altered the order of the CLONE column at some point, you can re-sort the clonal assignments using the sort subcommand of ParseDb. An example function call using an imaginary file db.tab is provided below:

ParseDb.py sort -d db.tab -f CLONE

Which will sort records by the value in the CLONE column (-f CLONE).
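A quick way to check whether records are ready for the baseline subcommand is to test that each CLONE value forms a single contiguous run. This sketch uses hypothetical helpers sort_records and is_grouped and assumes clone IDs are compared as strings. String order puts "10" before "2", but identical IDs still end up adjacent, which is all contiguous grouping needs. It makes no claim about how ParseDb sort orders numeric values.

```python
from itertools import groupby

def sort_records(rows, field):
    """Stable sort of records by the string value of `field`."""
    return sorted(rows, key=lambda row: row[field])

def is_grouped(rows, field):
    """True if every value of `field` occupies exactly one contiguous run."""
    runs = [key for key, _ in groupby(rows, key=lambda row: row[field])]
    return len(runs) == len(set(runs))

# Hypothetical records with clone 2 split by clone 10
rows = [{"SEQUENCE_ID": "a", "CLONE": "2"},
        {"SEQUENCE_ID": "b", "CLONE": "10"},
        {"SEQUENCE_ID": "c", "CLONE": "2"}]

print(is_grouped(rows, "CLONE"))      # False
ordered = sort_records(rows, "CLONE")
print([r["CLONE"] for r in ordered])  # ['10', '2', '2']
print(is_grouped(ordered, "CLONE"))   # True
```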