Parsing IMGT output

Example data

We have hosted a small example data set produced by the UMI-barcoded MiSeq workflow described in the pRESTO documentation. In addition to the example FASTA files, we have included the IMGT/HighV-QUEST results. The files can be downloaded from here:

Change-O Example Files

Reducing file size for submission to IMGT/HighV-QUEST

IMGT/HighV-QUEST currently limits each uploaded file to 500,000 sequences. To accommodate this limit, you can use the count subcommand of the pRESTO tool SplitSeq to divide your files into smaller pieces:

SplitSeq.py count -s file.fastq -n 500000 --fasta

The -n 500000 argument sets the maximum number of sequences in each output file, and the --fasta argument tells the tool to write FASTA rather than FASTQ files, which is the format suitable for upload to IMGT/HighV-QUEST.
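To illustrate what this splitting step does, here is a minimal Python sketch that writes a FASTQ file out as FASTA chunks of at most max_count records each. It is not pRESTO's implementation. It assumes unwrapped four-line FASTQ records, and the function names and the part1.fasta, part2.fasta, ... output naming are hypothetical; SplitSeq.py may name its outputs differently.

```python
# Hypothetical sketch (not pRESTO's code): split a FASTQ file into FASTA
# files holding at most max_count records each.
from pathlib import Path
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fastq(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs; assumes unwrapped 4-line records."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:  # end of file (or a trailing blank line)
                break
            if not header.startswith("@"):
                raise ValueError(f"Malformed FASTQ header: {header!r}")
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            handle.readline()  # quality line, dropped for FASTA output
            yield header[1:], sequence


def split_to_fasta(path: Path, max_count: int, out_dir: Path) -> List[Path]:
    """Write part1.fasta, part2.fasta, ... (hypothetical names) to out_dir."""
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs: List[Path] = []
    handle: Optional[TextIO] = None
    try:
        for index, (name, sequence) in enumerate(read_fastq(path)):
            if index % max_count == 0:  # first record, or chunk full: new file
                if handle is not None:
                    handle.close()
                outputs.append(out_dir / f"part{len(outputs) + 1}.fasta")
                handle = open(outputs[-1], "w")
            handle.write(f">{name}\n{sequence}\n")
    finally:
        if handle is not None:
            handle.close()
    return outputs
```

Calling split_to_fasta(Path("file.fastq"), 500000, Path("split")) would mirror the command above under these assumptions; for real data, SplitSeq.py remains the tool to use.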

Processing the output of IMGT/HighV-QUEST

The output from IMGT/HighV-QUEST may be parsed via the imgt subcommand of MakeDb.py to generate the standardized tab-delimited database file on which all subsequent Change-O modules operate. Processing the IMGT output requires either the compressed output file (.zip or .txz) or an uncompressed folder containing the 1_Summary, 2_IMGT-gapped, 3_Nt-sequences and 6_Junction files, passed with the -i argument (-i HD13M.txz). It is also recommended that you provide the FASTA file that was submitted to HighV-QUEST (-s HD13M.fasta). This allows MakeDb.py to correct the changes HighV-QUEST makes to sequence identifiers and to add columns corresponding to any annotations generated by pRESTO:

MakeDb.py imgt -i HD13M.txz -s HD13M.fasta --extended

The optional --extended argument adds extra columns to the output database containing the IMGT-gapped CDR/FWR regions and alignment metrics.
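Before running MakeDb.py, it can be convenient to confirm that an archive or folder actually contains the four required files. The sketch below is a hypothetical helper, not part of Change-O. It assumes the required files can be recognized by the name prefixes listed above (for example, a file named 1_Summary.txt) and that a .txz file is an xz-compressed tar archive.

```python
# Hypothetical pre-flight check (not part of Change-O): does an IMGT/HighV-QUEST
# result (.zip, .txz, or uncompressed folder) contain the required files?
import tarfile
import zipfile
from pathlib import Path
from typing import List

# Assumed filename prefixes of the required IMGT/HighV-QUEST output files.
REQUIRED_PREFIXES = ("1_Summary", "2_IMGT-gapped", "3_Nt-sequences", "6_Junction")


def list_members(path: Path) -> List[str]:
    """Return the base names of the files in a .zip, .txz, or folder."""
    if path.is_dir():
        return [p.name for p in path.iterdir() if p.is_file()]
    if path.suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            # Skip directory entries, which end with "/" in a zip listing.
            return [Path(n).name for n in archive.namelist() if not n.endswith("/")]
    if path.suffix == ".txz":
        # Assumes .txz is a tar archive compressed with xz.
        with tarfile.open(path, "r:xz") as archive:
            return [Path(m.name).name for m in archive.getmembers() if m.isfile()]
    raise ValueError(f"Unsupported input: {path}")


def missing_imgt_files(path: Path) -> List[str]:
    """Return the required prefixes with no matching file; empty means complete."""
    names = list_members(path)
    return [p for p in REQUIRED_PREFIXES if not any(n.startswith(p) for n in names)]
```

An empty list from missing_imgt_files(Path("HD13M.txz")) suggests the input is complete; MakeDb.py performs its own validation regardless.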

Merging processed IMGT/HighV-QUEST output

If you previously split files for submission to IMGT/HighV-QUEST, you can run each partition through MakeDb.py individually and merge the resulting output files using the merge subcommand of ParseDb.py:

MakeDb.py imgt -i part1.txz -s part1.fasta -o part1.tsv
MakeDb.py imgt -i part2.txz -s part2.fasta -o part2.tsv
ParseDb.py merge -d part1.tsv part2.tsv -o merged.tsv
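Conceptually, merging here means concatenating the rows of several tab-delimited database files. The following Python sketch shows that idea; it is not ParseDb.py's implementation. It takes the union of the column headers in first-seen order and leaves a cell empty where a file lacks a column. Whether ParseDb.py merge treats mismatched columns the same way is an assumption, and the function name merge_tsv is hypothetical.

```python
# Hypothetical sketch (not ParseDb.py's code): merge tab-delimited database
# files by concatenating their rows under the union of their headers.
import csv
from pathlib import Path
from typing import List


def merge_tsv(inputs: List[Path], output: Path) -> int:
    """Write all rows from inputs to output; return the number of rows written."""
    fieldnames: List[str] = []
    for path in inputs:
        with open(path, newline="") as handle:
            header = next(csv.reader(handle, delimiter="\t"), [])
        # Keep columns in the order they are first seen across all files.
        fieldnames.extend(f for f in header if f not in fieldnames)

    count = 0
    with open(output, "w", newline="") as out:
        # restval="" fills columns that a given input file does not have.
        writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                                restval="", lineterminator="\n")
        writer.writeheader()
        for path in inputs:
            with open(path, newline="") as handle:
                for row in csv.DictReader(handle, delimiter="\t"):
                    writer.writerow(row)
                    count += 1
    return count
```

Under these assumptions, merge_tsv([Path("part1.tsv"), Path("part2.tsv")], Path("merged.tsv")) corresponds to the ParseDb.py merge command above.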