Parsing IMGT output

Example Data

We have hosted a small example data set resulting from the Roche 454 example workflow described in the pRESTO documentation. In addition to the example FASTA files, we have included the IMGT/HighV-QUEST results. The files can be downloaded from here:

Change-O Example Files

Reducing file size for submission to IMGT/HighV-QUEST

IMGT/HighV-QUEST currently limits the size of uploaded files to 500,000 sequences. To accommodate this limit, you can use the count subcommand of the pRESTO tool SplitSeq to divide your files into smaller pieces:

SplitSeq.py count -s file.fastq -n 500000 --fasta

The -n 500000 argument sets the maximum number of sequences in each file and the --fasta argument tells the tool to output a FASTA, rather than FASTQ, formatted file suitable for upload to IMGT/HighV-QUEST.
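
To illustrate what a count-based split does conceptually, here is a minimal Python sketch. It is not SplitSeq's actual implementation: it assumes simple four-line FASTQ records, and the function names (read_fastq, split_to_fasta, write_parts) and the output file naming scheme are hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (identifier, sequence) pairs, assuming 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header = record[0].rstrip("\n")
        sequence = record[1].rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        # Drop the leading '@'; quality lines are discarded for FASTA output
        yield header[1:], sequence


def split_to_fasta(lines, max_count):
    """Yield FASTA-formatted strings holding at most max_count records each."""
    records = read_fastq(lines)
    while True:
        chunk = list(islice(records, max_count))
        if not chunk:
            return
        yield "".join(f">{name}\n{seq}\n" for name, seq in chunk)


def write_parts(path, max_count, prefix="part"):
    """Write each chunk to <prefix>-<index>.fasta (hypothetical naming scheme)."""
    with open(path) as handle:
        for index, fasta in enumerate(split_to_fasta(handle, max_count), start=1):
            with open(f"{prefix}-{index}.fasta", "w") as out:
                out.write(fasta)
```

Under these assumptions, calling write_parts("file.fastq", 500000) would produce FASTA files that each stay within the upload limit. For real data, SplitSeq remains the recommended route.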

See also

For additional details, see the corresponding example in the pRESTO documentation.

Processing the output of IMGT/HighV-QUEST

The output from IMGT/HighV-QUEST may be parsed via the imgt subcommand of MakeDb to generate the standardized tab-delimited database file on which all subsequent Change-O modules operate. Processing the IMGT output requires either the compressed output file (.zip or .txz) or an uncompressed folder containing the 1_Summary, 2_IMGT-gapped, 3_Nt-sequences and 6_Junction files, passed with the -i argument (here, -i S43_atleast-2.txz). Additionally, it is recommended that you provide the FASTA file that was submitted to HighV-QUEST (-s S43_atleast-2.fasta), as this allows MakeDb to correct the changes HighV-QUEST makes to the sequence identifiers and to add columns corresponding to any annotations generated by pRESTO:

MakeDb.py imgt -i S43_atleast-2.txz -s S43_atleast-2.fasta --regions --scores

The optional --regions and --scores arguments add extra columns to the output database containing IMGT-gapped CDR/FWR regions and alignment metrics, respectively.
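
Before running MakeDb, it can be useful to confirm that the IMGT/HighV-QUEST output actually contains the four required files. The following is a minimal Python sketch of such a check, not part of Change-O. It matches files by the prefixes named above; the exact file names and extensions inside an archive are assumptions, and the function names (missing_files, list_output) are hypothetical.

```python
import os
import tarfile
import zipfile

# Prefixes taken from the text above; the full file names and extensions
# inside a given IMGT/HighV-QUEST result are assumptions.
REQUIRED_PREFIXES = ("1_Summary", "2_IMGT-gapped", "3_Nt-sequences", "6_Junction")


def missing_files(names):
    """Return the required prefixes that match none of the given file names."""
    basenames = [os.path.basename(name) for name in names]
    return [
        prefix
        for prefix in REQUIRED_PREFIXES
        if not any(base.startswith(prefix) for base in basenames)
    ]


def list_output(path):
    """List file names in a folder, a .zip archive, or a .txz archive."""
    if os.path.isdir(path):
        return os.listdir(path)
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    # Mode "r:*" lets tarfile detect the compression (including xz) itself
    with tarfile.open(path, "r:*") as archive:
        return archive.getnames()
```

For example, missing_files(list_output("S43_atleast-2.txz")) returns an empty list when every required prefix is present, and otherwise lists the prefixes that were not found.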