Filtering records
The ParseDb tool provides a basic set of operations for manipulating Change-O database files from the command line, including removing or updating rows and columns.
Removing non-functional sequences
After building a Change-O database from either IMGT/HighV-QUEST or IgBLAST output, you may wish to subset your data to only functional sequences. This can be done in one of two roughly equivalent ways using the ParseDb tool:
ParseDb.py select -d S43_atleast-2_db-pass.tab -f FUNCTIONAL -u T
ParseDb.py split -d S43_atleast-2_db-pass.tab -f FUNCTIONAL
The first line above uses the select subcommand to output a single file labeled parse-select containing only records with the value of T (-u T) in the FUNCTIONAL column (-f FUNCTIONAL).
Alternatively, the second line above uses the split subcommand to output multiple files, each containing the records with one of the values found in the FUNCTIONAL column (-f FUNCTIONAL). This will generate two files labeled FUNCTIONAL-T and FUNCTIONAL-F.
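To make the difference between the two subcommands concrete, here is a minimal Python sketch of the same logic applied to tab-delimited text with the standard csv module. It is an illustration only, not ParseDb's implementation: the helper names read_rows, select_rows and split_rows and the sample records are hypothetical, and exact string matching is assumed for the selection.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the contents of a Change-O tab-delimited file.
SAMPLE = "SEQUENCE_ID\tFUNCTIONAL\nseq1\tT\nseq2\tF\nseq3\tT\n"

def read_rows(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def select_rows(rows, field, value):
    """Keep only rows whose `field` equals `value` (analogous to select -f/-u)."""
    return [row for row in rows if row[field] == value]

def split_rows(rows, field):
    """Group rows by each distinct value of `field` (analogous to split -f)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    return dict(groups)

rows = read_rows(SAMPLE)
functional = select_rows(rows, "FUNCTIONAL", "T")
by_value = split_rows(rows, "FUNCTIONAL")

print([row["SEQUENCE_ID"] for row in functional])
print(sorted(by_value))
```

The selection yields one subset, whereas the split yields one group per distinct value, which mirrors the single parse-select file versus the FUNCTIONAL-T and FUNCTIONAL-F files described above.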
Removing disagreements between the C-region primers and the reference alignment
If you have data that includes both heavy and light chains in the same library,
the V-segment and J-segment alignments from IMGT/HighV-QUEST or IgBLAST may not
always agree with the isotype assignments from the C-region primers. In these cases,
you can filter out such reads with the select subcommand of ParseDb.
An example command using a hypothetical file db.tab is provided below:
ParseDb.py select -d db.tab -f V_CALL J_CALL CPRIMER -u "IGH" \
    --logic all --regex --outname heavy
ParseDb.py select -d db.tab -f V_CALL J_CALL CPRIMER -u "IG[LK]" \
    --logic all --regex --outname light
These commands require that all of the V_CALL, J_CALL and CPRIMER fields (-f V_CALL J_CALL CPRIMER and --logic all) contain the string IGH (first command) or one of IGK or IGL (second command). The --regex argument allows for partial matching and interpretation of regular expressions. The output of these two commands is two files, one containing only heavy chains (heavy_parse-select.tab) and one containing only light chains (light_parse-select.tab).
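The effect of combining --regex with --logic all can be sketched in Python as a regular expression search that must succeed in every listed field. This is an assumption-laden illustration rather than ParseDb's code: the function matches_all and the three example records (including their CPRIMER values) are hypothetical.

```python
import re

def matches_all(record, fields, pattern):
    """Return True if `pattern` is found somewhere in every listed field."""
    regex = re.compile(pattern)
    return all(regex.search(record[field]) for field in fields)

# Hypothetical records; the third has light chain V/J calls but a heavy chain primer.
records = [
    {"SEQUENCE_ID": "a", "V_CALL": "IGHV3-23*01", "J_CALL": "IGHJ4*02", "CPRIMER": "IGHM"},
    {"SEQUENCE_ID": "b", "V_CALL": "IGKV1-5*03", "J_CALL": "IGKJ1*01", "CPRIMER": "IGKC"},
    {"SEQUENCE_ID": "c", "V_CALL": "IGLV2-14*01", "J_CALL": "IGLJ2*01", "CPRIMER": "IGHG"},
]
fields = ["V_CALL", "J_CALL", "CPRIMER"]

heavy = [r["SEQUENCE_ID"] for r in records if matches_all(r, fields, "IGH")]
light = [r["SEQUENCE_ID"] for r in records if matches_all(r, fields, "IG[LK]")]

print(heavy)
print(light)
```

Record c satisfies neither pattern in all three fields, so it appears in neither result. This is the kind of disagreement between primer and alignment that the filter is meant to remove.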
Exporting records to FASTA files
You may want to use external tools, or tools from pRESTO, on your Change-O result files. The ConvertDb tool provides two options for exporting data from tab-delimited files to FASTA format.
Standard FASTA
The fasta subcommand allows you to export sequences and annotations to FASTA formatted files in the pRESTO annotation scheme:
ConvertDb.py fasta -d S43_atleast-2_db-pass.tab --if SEQUENCE_ID --sf SEQUENCE_IMGT --mf V_CALL DUPCOUNT
Here, the column containing the sequence identifier is specified by --if SEQUENCE_ID, the nucleotide sequence column is specified by --sf SEQUENCE_IMGT, and additional annotations to be added to the sequence header are specified by --mf V_CALL DUPCOUNT.
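As a rough illustration of how the three options map onto a FASTA record, the sketch below builds a header in the pRESTO style of a sequence identifier followed by pipe-delimited FIELD=VALUE pairs. The function to_fasta and the example record are hypothetical, and the actual ConvertDb output may differ in details such as how alignment gaps in the sequence are handled.

```python
def to_fasta(record, id_field, seq_field, meta_fields):
    """Format one record as a FASTA entry with FIELD=VALUE header annotations."""
    header = record[id_field]
    for field in meta_fields:
        # Append each requested annotation as |FIELD=VALUE.
        header += "|{}={}".format(field, record[field])
    return ">{}\n{}\n".format(header, record[seq_field])

# Hypothetical record with a shortened sequence for readability.
record = {
    "SEQUENCE_ID": "seq1",
    "SEQUENCE_IMGT": "CAGGTGCAGCTG",
    "V_CALL": "IGHV1-2*02",
    "DUPCOUNT": "3",
}

entry = to_fasta(record, "SEQUENCE_ID", "SEQUENCE_IMGT", ["V_CALL", "DUPCOUNT"])
print(entry, end="")
```

In this sketch the identifier column plays the role of --if, the sequence column the role of --sf, and the list of annotation columns the role of --mf.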
BASELINe FASTA
The baseline subcommand generates a FASTA derivative format required by the BASELINe web tool. Generating these files is similar to building standard FASTA files, but requires a few more options.
An example command using a hypothetical file db.tab is provided below:
ConvertDb.py baseline -d db.tab --if SEQUENCE_ID --sf SEQUENCE_IMGT --mf V_CALL DUPCOUNT \
--cf CLONE --gf GERMLINE_IMGT_D_MASK
The additional arguments required by the baseline subcommand include the clonal grouping (--cf CLONE) and germline sequence (--gf GERMLINE_IMGT_D_MASK) columns added by the DefineClones and CreateGermlines tasks, respectively.
Note
The baseline subcommand requires the CLONE column to be sorted. DefineClones generates a sorted CLONE column by default. However, if you needed to alter the order of the CLONE column at some point, you can re-sort the clonal assignments using the sort subcommand of ParseDb. An example command using a hypothetical file db.tab is provided below:
ParseDb.py sort -d db.tab -f CLONE
This sorts records by the value in the CLONE column (-f CLONE).
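The sorting step can be sketched in Python as follows. The helper sort_by_field and the sample rows are hypothetical, and whether ParseDb compares CLONE values as text or as numbers by default is not stated here, so the sketch shows both. Either ordering places records of the same clone next to each other.

```python
def sort_by_field(rows, field, numeric=False):
    """Return rows ordered by `field`, compared as integers or as text."""
    if numeric:
        return sorted(rows, key=lambda row: int(row[field]))
    return sorted(rows, key=lambda row: row[field])

# Hypothetical records whose CLONE column is out of order.
rows = [
    {"SEQUENCE_ID": "a", "CLONE": "10"},
    {"SEQUENCE_ID": "b", "CLONE": "2"},
    {"SEQUENCE_ID": "c", "CLONE": "1"},
    {"SEQUENCE_ID": "d", "CLONE": "2"},
]

as_text = [row["CLONE"] for row in sort_by_field(rows, "CLONE")]
as_numbers = [row["CLONE"] for row in sort_by_field(rows, "CLONE", numeric=True)]

print(as_text)
print(as_numbers)
```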