Example data

We have hosted a small example data set resulting from the UMI barcoded MiSeq workflow described in the pRESTO documentation. In addition to the example FASTA files, we have included the standalone IgBLAST results. The files can be downloaded from here:

Change-O Example Files

Configuring IgBLAST

A collection of scripts for setting up the standalone IgBLAST database from the IMGT reference sequences is available in the Immcantation repository. To use these scripts, copy all the tools in the /scripts folder to a location in your PATH. At a minimum, you’ll need the following scripts:





Download and configure the IgBLAST and IMGT reference databases as follows, adjusting the version number to taste:

# Download and extract IgBLAST
tar -zxf ncbi-igblast-${VERSION}-x64-linux.tar.gz
cp ncbi-igblast-${VERSION}/bin/* ~/bin
# Download reference databases and setup IGDATA directory -o ~/share/igblast
cp -r ncbi-igblast-${VERSION}/internal_data ~/share/igblast
cp -r ncbi-igblast-${VERSION}/optional_file ~/share/igblast
# Build IgBLAST database from IMGT reference sequences -o ~/share/germlines/imgt -i ~/share/germlines/imgt -o ~/share/igblast


Several Immcantation tools require the observed V(D)J sequence (sequence_alignment) and associated germline fields (germline_alignment or germline_alignment_d_mask) to have gaps inserted to conform to the IMGT numbering scheme. Thus, when a tool requires a reference sequence set as input, it will require the IMGT-gapped reference set. That is, it needs the reference sequences that were downloaded using the provided script, or downloaded manually from the IMGT reference directory, rather than the final ungapped reference set required by IgBLAST.
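
The distinction between the gapped and ungapped reference sets can be checked directly on a FASTA file. The following is a minimal sketch, assuming that IMGT gaps are written as `.` characters in the sequence lines; the function names `is_imgt_gapped` and `ungap_fasta` are hypothetical helpers, not part of any Immcantation or IgBLAST tool. Note that `ungap_fasta` only strips gap characters and does not rewrite the headers.

```shell
# Hypothetical helpers for illustration only.

# Succeeds (exit 0) if any sequence line of the FASTA file contains a '.'.
# Header lines are skipped because IMGT headers may contain dots themselves.
is_imgt_gapped() {
    grep -v '^>' "$1" | grep -q '\.'
}

# Print the FASTA with '.' removed from sequence lines only.
# Header lines (starting with '>') are passed through unchanged.
ungap_fasta() {
    sed '/^>/!s/\.//g' "$1"
}
```

For example, `is_imgt_gapped IMGT_Human_IGHV.fasta && echo gapped` should report the downloaded IMGT V segment file as gapped, while a file written by `ungap_fasta` should not.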

See also

The provided scripts download only the mouse and human IMGT reference databases. See the IgBLAST documentation for instructions on how to build the database in a more general case. Shown below is an example of how to perform the same steps as the Immcantation scripts using a separately downloaded IMGT reference set and the scripts provided by IgBLAST. You must have all of the associated commands in your PATH and the appropriate directories created:

# V segment database IMGT_Human_IGHV.fasta > ~/share/igblast/fasta/imgt_human_ig_v.fasta
makeblastdb -parse_seqids -dbtype nucl -in ~/share/igblast/fasta/imgt_human_ig_v.fasta \
    -out ~/share/igblast/database/imgt_human_ig_v
# D segment database IMGT_Human_IGHD.fasta > ~/share/igblast/fasta/imgt_human_ig_d.fasta
makeblastdb -parse_seqids -dbtype nucl -in ~/share/igblast/fasta/imgt_human_ig_d.fasta \
    -out ~/share/igblast/database/imgt_human_ig_d
# J segment database IMGT_Human_IGHJ.fasta > ~/share/igblast/fasta/imgt_human_ig_j.fasta
makeblastdb -parse_seqids -dbtype nucl -in ~/share/igblast/fasta/imgt_human_ig_j.fasta \
    -out ~/share/igblast/database/imgt_human_ig_j

Once these databases are built for each segment, they can be referenced when running IgBLAST.
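
Because the three makeblastdb calls differ only in the segment letter, they can be generated in a loop. The sketch below only prints the commands rather than running them, and it assumes the cleaned FASTA files already exist under the fasta directory. The function name `makeblastdb_cmd` and the `IGBLAST_DIR` variable are hypothetical and introduced here for illustration.

```shell
# Print the makeblastdb command for one segment (v, d or j).
# $1 is the file prefix (e.g. imgt_human_ig), $2 is the segment letter.
# IGBLAST_DIR is a hypothetical override; it defaults to ~/share/igblast.
makeblastdb_cmd() {
    prefix=$1
    segment=$2
    base=${IGBLAST_DIR:-$HOME/share/igblast}
    printf 'makeblastdb -parse_seqids -dbtype nucl -in %s -out %s\n' \
        "$base/fasta/${prefix}_${segment}.fasta" \
        "$base/database/${prefix}_${segment}"
}

# Dry run: list the three commands for the human Ig databases.
for segment in v d j; do
    makeblastdb_cmd imgt_human_ig "$segment"
done
```

Reviewing the printed commands before piping them to a shell is a cheap way to catch a mistyped path.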

Running IgBLAST

Change-O provides a simple wrapper script to run IgBLAST with the required options, exposed as an igblast subcommand. This wrapper can be run as follows using the database built using the Immcantation scripts:

igblast -s HD13M.fasta -b ~/share/igblast \
    --organism human --loci ig --format blast

The optional --format blast argument defines the output format of IgBLAST. The default, blast, is the blocked tabular output provided by specifying the -outfmt '7 std qseq sseq btop' argument to IgBLAST. Specifying --format airr will output a tab-delimited file compliant with the AIRR Rearrangement schema defined by the AIRR Community. AIRR format support requires IgBLAST v1.9.0 or higher.
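
When output files from several runs are mixed together, it can help to tell the two formats apart before parsing. The following is a heuristic sketch only. It assumes that the blast-style tabular output begins with a `#` comment line and that the AIRR file begins with a tab-delimited header row containing the `sequence_id` column. The function name `guess_igblast_format` is hypothetical.

```shell
# Guess the IgBLAST output format from the first line of a file.
# Prints "blast", "airr" or "unknown". This is a heuristic, not a validator.
guess_igblast_format() {
    first_line=$(head -n 1 "$1")
    case $first_line in
        '#'*)          echo blast ;;    # comment line, as in -outfmt 7
        *sequence_id*) echo airr ;;     # AIRR header row (assumed)
        *)             echo unknown ;;
    esac
}
```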

The -b ~/share/igblast argument specifies the path containing the database, internal_data, and optional_file directories required by IgBLAST. This option sets the IGDATA environment variable that controls where IgBLAST looks for internal database files. See the IgBLAST documentation for more details regarding the IGDATA environment variable.
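
A missing subdirectory is an easy mistake to make when setting up this path, so a quick layout check before running IgBLAST can save time. The sketch below tests only for the three directories named above; `check_igdata` is a hypothetical helper name and the check says nothing about whether the contents are valid.

```shell
# Verify that a directory contains the three subdirectories IgBLAST expects.
# Prints each missing path to stderr and returns non-zero if any is absent.
check_igdata() {
    dir=$1
    status=0
    for sub in database internal_data optional_file; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub" >&2
            status=1
        fi
    done
    return $status
}
```

A typical call would be `check_igdata ~/share/igblast && echo "layout looks complete"`.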

See also

The IgBLAST wrapper provides limited functionality. For more control, IgBLAST should be run directly. The only strict requirement for compatibility with Change-O is that the output must either be an AIRR tab-delimited file (-outfmt 19) or a blast-style tabular output with the optional query sequence, subject sequence and BTOP fields (-outfmt '7 std qseq sseq btop'). An example of how to run IgBLAST directly is shown below:

export IGDATA=~/share/igblast
igblastn \
    -germline_db_V ~/share/igblast/database/imgt_human_ig_v \
    -germline_db_D ~/share/igblast/database/imgt_human_ig_d \
    -germline_db_J ~/share/igblast/database/imgt_human_ig_j \
    -auxiliary_data ~/share/igblast/optional_file/human_gl.aux \
    -domain_system imgt -ig_seqtype Ig -organism human \
    -outfmt '7 std qseq sseq btop' \
    -query HD13M.fasta \
    -out HD13M.fmt7
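
When several FASTA files need the same treatment, the direct igblastn call can be wrapped in a loop. This is a sketch under the assumption that the databases and auxiliary file are at the paths used in the example; the function names `fmt7_name` and `run_igblast_batch` are hypothetical. Nothing runs until `run_igblast_batch` is called, and it requires igblastn in your PATH.

```shell
# Derive the output name from the input name: HD13M.fasta -> HD13M.fmt7
fmt7_name() {
    printf '%s\n' "${1%.fasta}.fmt7"
}

# Run igblastn on every FASTA file passed as an argument, reusing the
# options from the single-file example. Stops at the first failure.
run_igblast_batch() {
    IGDATA="${IGDATA:-$HOME/share/igblast}"
    export IGDATA
    for query in "$@"; do
        igblastn \
            -germline_db_V "$IGDATA/database/imgt_human_ig_v" \
            -germline_db_D "$IGDATA/database/imgt_human_ig_d" \
            -germline_db_J "$IGDATA/database/imgt_human_ig_j" \
            -auxiliary_data "$IGDATA/optional_file/human_gl.aux" \
            -domain_system imgt -ig_seqtype Ig -organism human \
            -outfmt '7 std qseq sseq btop' \
            -query "$query" \
            -out "$(fmt7_name "$query")" || return 1
    done
}
```

A call such as `run_igblast_batch *.fasta` would then write one .fmt7 file next to each input.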

Processing the output of IgBLAST

Standalone IgBLAST blast-style tabular output is parsed by an igblast parsing subcommand to generate the standardized tab-delimited database file on which all subsequent Change-O modules operate. In addition to the IgBLAST output (-i HD13M.fmt7), both the FASTA files input to IgBLAST (-s HD13M.fasta) and the IMGT-gapped reference sequences (-r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta) must be provided:

igblast -i HD13M.fmt7 -s HD13M.fasta \
    -r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta \
    --extended

The optional --extended argument adds extra columns to the output database containing IMGT-gapped CDR/FWR regions and alignment metrics.


The reference sequences you provide to the parsing step must contain IMGT-gapped V segment references, and these references must be the same sequences used to build the IgBLAST reference database. If your IgBLAST germlines are not IMGT-gapped and/or they are not identical to those provided to the parsing step, then sequences which were assigned missing germlines will fail the parsing operation and the junction (CDR3) sequences will not be correct.
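
A partial check of this requirement is to compare the allele names in the IMGT-gapped reference file against those in the FASTA file used to build the IgBLAST database. The sketch below assumes IMGT headers are `|`-delimited with the allele name in the second field, and that headers without a `|` already hold just the allele name. The function names are hypothetical, and only names are compared, not the sequences themselves.

```shell
# Print the sorted, unique allele names found in a FASTA file.
# '|'-delimited headers yield field 2; other headers are taken whole.
allele_names() {
    grep '^>' "$1" | sed 's/^>//' \
        | awk -F'|' '{ print (NF > 1 ? $2 : $1) }' \
        | LC_ALL=C sort -u
}

# Compare the allele names of two FASTA files.
# Prints names found in only one file and returns 1; returns 0 on a match.
compare_alleles() {
    a=$(mktemp)
    b=$(mktemp)
    allele_names "$1" > "$a"
    allele_names "$2" > "$b"
    diff_out=$(LC_ALL=C comm -3 "$a" "$b")
    rm -f "$a" "$b"
    [ -z "$diff_out" ] && return 0
    printf '%s\n' "$diff_out"
    return 1
}
```

For instance, `compare_alleles IMGT_Human_IGHV.fasta ~/share/igblast/fasta/imgt_human_ig_v.fasta` prints nothing when both files list the same alleles.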