Filtering records

The ParseDb tool provides a basic set of operations for manipulating Change-O database files from the command line, including removing or updating rows and columns.

Removing non-functional sequences

After building a Change-O database from either IMGT/HighV-QUEST or IgBLAST output, you may wish to subset your data to only functional sequences. This can be done in one of two roughly equivalent ways using the ParseDb tool:

ParseDb.py select -d -f FUNCTIONAL -u T
ParseDb.py split -d -f FUNCTIONAL

The first line above uses the select subcommand to output a single file labeled parse-select containing only records with the value of T (-u T) in the FUNCTIONAL column (-f FUNCTIONAL).

Alternatively, the second line above uses the split subcommand to output multiple files with each file containing records with one of the values found in the FUNCTIONAL column (-f FUNCTIONAL). This will generate two files labeled FUNCTIONAL-T and FUNCTIONAL-F.
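To make the difference between the two subcommands concrete, the following is a minimal Python sketch of the same row logic applied to a small in-memory table. It is not ParseDb's implementation. The sample rows and the helper names read_rows, select_rows and split_rows are hypothetical; only the FUNCTIONAL column name and its T and F values come from the text above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited sample; only the FUNCTIONAL column and
# its T/F values are taken from the documentation text.
SAMPLE = "SEQUENCE_ID\tFUNCTIONAL\nseq1\tT\nseq2\tF\nseq3\tT\n"


def read_rows(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def select_rows(rows, field, value):
    """Keep only rows whose `field` equals `value` (like select -f FIELD -u VALUE)."""
    return [row for row in rows if row[field] == value]


def split_rows(rows, field):
    """Group rows by each distinct value of `field` (like split -f FIELD)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    return dict(groups)


rows = read_rows(SAMPLE)
functional = select_rows(rows, "FUNCTIONAL", "T")   # one subset
by_value = split_rows(rows, "FUNCTIONAL")           # one group per value

print([r["SEQUENCE_ID"] for r in functional])
print(sorted(by_value))
```

The select-style helper discards every non-matching row, while the split-style helper keeps all rows and only partitions them, which is why split yields one output per distinct value.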

Removing disagreements between the C-region primers and the reference alignment

If you have data that includes both heavy and light chains in the same library, the V-segment and J-segment alignments from IMGT/HighV-QUEST or IgBLAST may not always agree with the isotype assignments from the C-region primers. In these cases, you can filter out such reads with the select subcommand of ParseDb. An example function call using an imaginary file is provided below:

ParseDb.py select -d -f V_CALL J_CALL CPRIMER -u "IGH" \
    --logic all --regex --outname heavy
ParseDb.py select -d -f V_CALL J_CALL CPRIMER -u "IG[LK]" \
    --logic all --regex --outname light

These commands require that all of the V_CALL, J_CALL and CPRIMER fields (-f V_CALL J_CALL CPRIMER and --logic all) contain the string IGH (lines 1-2) or one of IGK or IGL (lines 3-4). The --regex argument allows for partial matching and interpretation of regular expressions. The output from these two commands is two files, one containing only heavy chains and one containing only light chains. Records whose fields disagree, such as a heavy chain V_CALL paired with a light chain CPRIMER, match neither pattern and appear in neither file.
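The combined effect of --regex and --logic all can be sketched in Python as a pattern search that must succeed in every listed field. This is an illustration under stated assumptions, not ParseDb's code: the three records and the helper matches_all are hypothetical, while the field names and the two patterns are taken from the commands above.

```python
import re

# Hypothetical records; field names and patterns come from the text above.
records = [
    {"SEQUENCE_ID": "a", "V_CALL": "IGHV1-2*01", "J_CALL": "IGHJ4*02", "CPRIMER": "IGHM"},
    {"SEQUENCE_ID": "b", "V_CALL": "IGKV1-5*01", "J_CALL": "IGKJ1*01", "CPRIMER": "IGKC"},
    # Discordant record: heavy chain alignment but a light chain primer.
    {"SEQUENCE_ID": "c", "V_CALL": "IGHV3-23*01", "J_CALL": "IGHJ6*02", "CPRIMER": "IGLC"},
]
FIELDS = ["V_CALL", "J_CALL", "CPRIMER"]


def matches_all(record, fields, pattern):
    """Return True only if `pattern` is found somewhere in every listed field."""
    regex = re.compile(pattern)
    return all(regex.search(record[field]) for field in fields)


heavy = [r["SEQUENCE_ID"] for r in records if matches_all(r, FIELDS, "IGH")]
light = [r["SEQUENCE_ID"] for r in records if matches_all(r, FIELDS, "IG[LK]")]

print(heavy)
print(light)
```

Record c is dropped from both lists because its CPRIMER value disagrees with its V_CALL and J_CALL values, which is the kind of disagreement this filter is meant to remove.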

Exporting records to FASTA files

You may want to use external tools, or tools from pRESTO, on your Change-O result files. The ParseDb tool provides two options for exporting data from tab-delimited files to FASTA format.

Standard FASTA

The fasta subcommand allows you to export sequences and annotations to FASTA formatted files in the pRESTO annotation scheme:

ParseDb.py fasta -d --if SEQUENCE_ID --sf SEQUENCE_IMGT --mf V_CALL DUPCOUNT

Where the column containing the sequence identifier is specified by --if SEQUENCE_ID, the nucleotide sequence column is specified by --sf SEQUENCE_IMGT, and additional annotations to be added to the sequence header are specified by --mf V_CALL DUPCOUNT.
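As a rough illustration of how the three arguments map onto a FASTA record, the sketch below builds one header and sequence pair in Python. It assumes the pRESTO annotation scheme takes the form ID|FIELD=value|FIELD=value, and the exact header that ParseDb writes may differ. The sample record, its sequence, and the helper name to_fasta are hypothetical.

```python
def to_fasta(record, id_field, seq_field, meta_fields):
    """Format one record as FASTA text with a pRESTO-style header.

    Assumed header layout: >ID|FIELD1=value1|FIELD2=value2
    """
    header = record[id_field]
    for field in meta_fields:
        header += "|{}={}".format(field, record[field])
    return ">{}\n{}\n".format(header, record[seq_field])


# Hypothetical record; only the column names come from the text above.
record = {
    "SEQUENCE_ID": "seq1",
    "SEQUENCE_IMGT": "CAGGTGCAGCTG",
    "V_CALL": "IGHV1-2*01",
    "DUPCOUNT": "3",
}

fasta_text = to_fasta(
    record,
    id_field="SEQUENCE_ID",              # corresponds to --if
    seq_field="SEQUENCE_IMGT",           # corresponds to --sf
    meta_fields=["V_CALL", "DUPCOUNT"],  # corresponds to --mf
)
print(fasta_text, end="")
```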


BASELINe FASTA

The baseline subcommand generates a FASTA derivative format required by the BASELINe web tool. Generating these files is similar to building standard FASTA files, but requires a few more options. An example function call using an imaginary file is provided below:

ParseDb.py baseline -d --if SEQUENCE_ID --sf SEQUENCE_IMGT --mf V_CALL DUPCOUNT \
    --cf CLONE --gf GERMLINE_IMGT_D_MASK

The additional arguments required by the baseline subcommand include the clonal grouping (--cf CLONE) and germline sequence (--gf GERMLINE_IMGT_D_MASK) columns added by the DefineClones and CreateGermlines tasks, respectively.


The baseline subcommand requires the CLONE column to be sorted. DefineClones generates a sorted CLONE column by default. However, if you altered the order of the CLONE column at some point, you can re-sort the clonal assignments using the sort subcommand of ParseDb. An example function call using an imaginary file is provided below:

ParseDb.py sort -d -f CLONE

This will sort records by the value in the CLONE column (-f CLONE).
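The effect of sorting on clone membership can be sketched in a few lines of Python. This is only an illustration with hypothetical rows, not ParseDb's implementation. It compares CLONE values as plain strings, which is an assumption of the sketch, so numeric identifiers with different digit counts would need a numeric key instead.

```python
# Hypothetical rows; only the CLONE column name comes from the text above.
rows = [
    {"SEQUENCE_ID": "s1", "CLONE": "2"},
    {"SEQUENCE_ID": "s2", "CLONE": "1"},
    {"SEQUENCE_ID": "s3", "CLONE": "2"},
]

# sorted() is stable, so rows with equal CLONE values keep their relative order.
sorted_rows = sorted(rows, key=lambda row: row["CLONE"])

clone_order = [row["CLONE"] for row in sorted_rows]
id_order = [row["SEQUENCE_ID"] for row in sorted_rows]
print(clone_order)
print(id_order)
```

After sorting, all members of a clone sit on adjacent rows, which is the property a sorted CLONE column provides.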