Parsing IMGT output
We have hosted a small example data set resulting from the Roche 454 example workflow described in the pRESTO documentation. In addition to the example FASTA files, we have included the IMGT/HighV-QUEST results. The files can be downloaded from here:
Reducing file size for submission to IMGT/HighV-QUEST
IMGT/HighV-QUEST currently limits uploaded files to 500,000 sequences. To accommodate this limit, you can use the count subcommand of the pRESTO tool SplitSeq to divide your files into smaller pieces:
SplitSeq.py count -s file.fastq -n 500000 --fasta
The -n 500000 argument sets the maximum number of sequences in each file, and the
--fasta argument tells the tool to output a FASTA-formatted file, rather than FASTQ,
suitable for upload to IMGT/HighV-QUEST.
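To estimate in advance how many pieces a given -n value will produce, you can count the records in the input and round up. The sketch below is a hypothetical helper, not part of pRESTO. It assumes an unwrapped FASTQ file with exactly 4 lines per record and a trailing newline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pRESTO): estimate how many output files
# SplitSeq.py count would write for a given -n value.
# Assumes an unwrapped FASTQ file (exactly 4 lines per record) that ends
# with a newline.
count_split_pieces() {
  local fastq="$1" max_per_file="$2"
  local lines records
  lines=$(wc -l < "$fastq")
  records=$(( lines / 4 ))
  # Ceiling division: a partial final chunk still produces its own file.
  echo $(( (records + max_per_file - 1) / max_per_file ))
}
```

For example, count_split_pieces file.fastq 500000 should report 3 for a file of 1,200,000 reads, because 1,200,000 / 500,000 rounds up to 3.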
For additional details, see the corresponding example in the pRESTO documentation.
Processing the output of IMGT/HighV-QUEST
The output from IMGT/HighV-QUEST may be
parsed via the imgt subcommand of MakeDb to generate the standardized
tab-delimited database file on which all subsequent Change-O modules operate.
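Because the database is a plain tab-delimited file, standard command-line tools are enough to inspect it. The helpers below are hypothetical conveniences, not Change-O commands. They assume the first line is a header row and that each following line is one sequence record.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Change-O) for a quick look at a
# tab-delimited database file. Assumes line 1 is the header row.

# Print the column names, one per line.
list_db_columns() {
  head -n 1 "$1" | tr '\t' '\n'
}

# Count the records, i.e. every line after the header.
count_db_rows() {
  echo $(( $(tail -n +2 "$1" | wc -l) ))
}
```

Comparing count_db_rows against the number of sequences you submitted is a simple way to check whether any records were dropped during parsing.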
Processing the IMGT output requires either the compressed output file (.zip or .txz)
or an uncompressed folder containing the extracted result files, including the
6_Junction file.
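Before running MakeDb, it can help to confirm that the result you downloaded actually contains the expected files. The sketch below is a hypothetical helper, not a Change-O command. It lists the contents of a .txz archive, a .zip archive, or an extracted folder, and it checks only for a file name containing 6_Junction.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Change-O): inspect an IMGT/HighV-QUEST
# result given as a .txz archive, a .zip archive, or an extracted folder.

list_imgt_files() {
  local src="$1"
  if [ -d "$src" ]; then
    ls -1 "$src"
  else
    case "$src" in
      *.txz) tar -tJf "$src" ;;   # xz-compressed tar archive
      *.zip) unzip -Z1 "$src" ;;  # zipinfo mode: one file name per line
      *) echo "unsupported input: $src" >&2; return 2 ;;
    esac
  fi
}

# Exit status 0 only if a file whose name contains 6_Junction is present.
has_junction_file() {
  list_imgt_files "$1" | grep -q '6_Junction'
}
```

For example, has_junction_file S43_atleast-2.txz && echo "6_Junction found" prints a confirmation only when the archive lists such a file.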
Additionally, it is recommended that you provide the FASTA file that was submitted to IMGT/HighV-QUEST
(-s S43_atleast-2.fasta), as this allows MakeDb to undo the
changes IMGT/HighV-QUEST makes to the sequence identifiers and to add columns corresponding to any
annotations generated by pRESTO:
MakeDb.py imgt -i S43_atleast-2.txz -s S43_atleast-2.fasta --regions --scores
The optional --regions and --scores arguments add extra columns to the output
database containing IMGT-gapped CDR/FWR regions and alignment metrics, respectively.
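If a data set was split with SplitSeq before submission, each piece will presumably come back as a separate IMGT/HighV-QUEST result that must be parsed on its own. The sketch below is a hypothetical helper that prints one MakeDb.py imgt command per result. It assumes each archive and its submitted FASTA file share a base name, as S43_atleast-2.txz and S43_atleast-2.fasta do, and that file names contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Change-O): print one MakeDb.py imgt
# command per IMGT/HighV-QUEST result, pairing each archive with the
# submitted FASTA file of the same base name. Assumes no spaces in names.
makedb_commands() {
  local archive base
  for archive in "$@"; do
    base="${archive%.*}"   # strip the .txz or .zip extension
    if [ -f "$base.fasta" ]; then
      printf 'MakeDb.py imgt -i %s -s %s --regions --scores\n' \
        "$archive" "$base.fasta"
    else
      echo "no FASTA found for $archive, skipping" >&2
    fi
  done
}
```

Running makedb_commands *.txz prints the commands so you can review them first. Piping that output to bash then executes them. Printing instead of executing directly keeps the pairing step easy to check before any parsing starts.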