Important
Are you an Immcantation user and/or interested in adaptive immune receptor repertoire analysis?
Register now for the upcoming Immcantation Users Group Meeting! It will be held virtually on January 30th, 2025, from 10 to 1:30pm (ET). All talks will be from user-submitted abstracts.
Full information here: https://immcantation.github.io/users-meeting/
Using IgBLAST
Example data
We have hosted a small example data set resulting from the UMI barcoded MiSeq workflow described in the pRESTO documentation. In addition to the example FASTA files, we have included the standalone IgBLAST results. The files can be downloded from here:
Configuring IgBLAST
A collection of scripts for setting up the standalone IgBLAST database from the
IMGT reference sequences are available on the
Immcantation repository.
To use these scripts, copy all the tools in the /scripts
folder to a location
in your PATH
. At a minimum, you’ll need the following scripts:
fetch_igblastdb.sh
fetch_imgtdb.sh
clean_imgtdb.py
imgt2igblast.sh
Download and configure the IgBLAST and IMGT reference databases as follows, adjusting the version number to taste:
1# Download and extract IgBLAST
2VERSION="1.21.0"
3wget ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/${VERSION}/ncbi-igblast-${VERSION}-x64-linux.tar.gz
4tar -zxf ncbi-igblast-${VERSION}-x64-linux.tar.gz
5cp ncbi-igblast-${VERSION}/bin/* ~/bin
6# Download reference databases and setup IGDATA directory
7fetch_igblastdb.sh -o ~/share/igblast
8cp -r ncbi-igblast-${VERSION}/internal_data ~/share/igblast
9cp -r ncbi-igblast-${VERSION}/optional_file ~/share/igblast
10# Build IgBLAST database from IMGT reference sequences
11fetch_imgtdb.sh -o ~/share/germlines/imgt
12imgt2igblast.sh -i ~/share/germlines/imgt -o ~/share/igblast
Note
Several Immcantation tools require the observed V(D)J sequence
(sequence_alignment
) and associated germline fields (germline_alignment
or germline_alignment_d_mask
) to have gaps inserted to conform to the
IMGT numbering scheme. Thus, when a tool such as MakeDb.py or
CreateGermlines.py requires a reference sequence set as input,
it will required the IMGT-gapped reference set. Meaning,
the reference sequences that were downloaded using the fetch_imgtdb.sh
script, or downloaded manually from the
IMGT reference directory,
rather than the final upgapped reference set required by IgBLAST.
See also
The provided scripts download only the mouse and human IMGT reference databases.
See the IgBLAST documentation for instructions
on how to build the database in a more general case. Shown below is an example of how
to performed the same steps as the Immcantation scripts using a separately
downloaded IMGT reference set and the scripts provided by IgBLAST. You must have all of
the associated commands in your PATH
and the appropriate directories created:
1# V segment database
2edit_imgt_file.pl IMGT_Human_IGHV.fasta > ~/share/igblast/fasta/imgt_human_ig_v.fasta
3makeblastdb -parse_seqids -dbtype nucl -in ~/share/igblast/fasta/imgt_human_ig_v.fasta \
4 -out ~/share/igblast/database/imgt_human_ig_v
5# D segment database
6edit_imgt_file.pl IMGT_Human_IGHD.fasta > ~/share/igblast/fasta/imgt_human_ig_d.fasta
7makeblastdb -parse_seqids -dbtype nucl -in ~/share/igblast/fasta/imgt_human_ig_d.fasta \
8 -out ~/share/igblast/database/imgt_human_ig_d
9# J segment database
10edit_imgt_file.pl IMGT_Human_IGHJ.fasta > ~/share/igblast/fasta/imgt_human_ig_j.fasta
11makeblastdb -parse_seqids -dbtype nucl -in ~/share/igblast/fasta/imgt_human_ig_j.fasta \
12 -out ~/share/igblast/database/imgt_human_ig_j
Once these databases are built for each segment they can be referenced when running IgBLAST.
Running IgBLAST
Change-O provides a simple wrapper script to run IgBLAST with the required options as the igblast subcommand of AssignGenes.py. This wrapper can be run as follows using the database built using the Immcantation scripts:
AssignGenes.py igblast -s HD13M.fasta -b ~/share/igblast \
--organism human --loci ig --format blast
The optional --format blast
argument
defines the output format of IgBLAST. The default, blast
, is the
blocked tabular output provided by specifying the -outfmt '7 std qseq sseq btop'
argument to IgBLAST. Specifying --format airr
will output a tab-delimited file compliant with the
AIRR Rearrangement schema
defined by the AIRR Community.
AIRR format support requires IgBLAST v1.9.0 or higher.
The -b ~/share/igblast
argument specifies the
path containing the database
, internal_data
, and optional_file
directories required by IgBLAST. This option sets the IGDATA
environment variable
that controls where IgBLAST looks for internal database files. See the
IgBLAST documentation for more details
regarding the IGDATA
environment variable.
See also
The AssignGenes.py IgBLAST wrapper provides limited functionality.
For more control, IgBLAST should be run directly. The only strict
requirement for compatibility with Changeo-O is that the output must
either be an AIRR tab-delimited file (--outfmt 19
) or a blast-style
tabular output with the optional query sequence, subject sequence and BTOP fields
(-outfmt '7 std qseq sseq btop'
). An example of how to run IgBLAST
directly is shown below:
1export IGDATA=~/share/igblast
2igblastn \
3 -germline_db_V ~/share/igblast/database/imgt_human_ig_v\
4 -germline_db_D ~/share/igblast/database/imgt_human_ig_d \
5 -germline_db_J ~/share/igblast/database/imgt_human_ig_j \
6 -auxiliary_data ~/share/igblast/optional_file/human_gl.aux \
7 -domain_system imgt -ig_seqtype Ig -organism human \
8 -outfmt '7 std qseq sseq btop' \
9 -query HD13M.fasta \
10 -out HD13M.fmt7
Processing the output of IgBLAST
Standalone IgBLAST blast-style tabular output is parsed by the igblast
subcommand of MakeDb.py to generate the standardized tab-delimited database file
on which all subsequent Change-O modules operate. In addition to the IgBLAST output
(-i HD13M.fmt7
), both the FASTA files input to IgBLAST
(-s HD13M.fasta
) and the IMGT-gapped reference sequences
(-r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta
)
must be provided to MakeDb.py:
MakeDb.py igblast -i HD13M.fmt7 -s HD13M.fasta \
-r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta \
--extended
The optional --extended
argument adds extra
columns to the output database containing IMGT-gapped CDR/FWR regions and
alignment metrics.
Warning
The references sequences you provide to MakeDb.py must contain IMGT-gapped V segment references, and these reference must be the same sequences used to build the IgBLAST reference database. If your IgBLAST germlines are not IMGT-gapped and/or they are not identical to those provided to MakeDb.py, then sequences which were assigned missing germlines will fail the parsing operation and the junction (CDR3) sequences will not be correct.