Parsing IMGT output
Example data
We have hosted a small example data set resulting from the UMI barcoded MiSeq workflow described in the pRESTO documentation. In addition to the example FASTA files, we have included the IMGT/HighV-QUEST results. The files can be downloaded from here:
Reducing file size for submission to IMGT/HighV-QUEST
IMGT/HighV-QUEST currently limits the size of uploaded files to 500,000 sequences. To accommodate this limit, you can use the count subcommand of the pRESTO tool SplitSeq to divide your files into smaller pieces:
SplitSeq.py count -s file.fastq -n 500000 --fasta
The -n 500000 argument sets the maximum number of sequences in each file, and the --fasta argument tells the tool to output a FASTA, rather than FASTQ, formatted file suitable for upload to IMGT/HighV-QUEST.
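Before splitting, it can help to estimate how many files to expect. The following is a minimal sketch and not part of pRESTO: the helper names count_fastq and parts_needed are hypothetical, the record count assumes a standard FASTQ file with exactly four lines per record, and the estimate assumes each output file is filled up to the limit before a new one is started.

```shell
# Hypothetical helper: count records in a FASTQ file,
# assuming exactly four lines per record.
count_fastq() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Hypothetical helper: number of files needed to hold a given
# number of sequences, using ceiling division.
# $1 = total sequences, $2 = per-file limit (defaults to 500000).
parts_needed() {
    total=$1
    limit=${2:-500000}
    echo $(( (total + limit - 1) / limit ))
}

# Example usage:
#   parts_needed "$(count_fastq file.fastq)"
```

If the result is 1, the file is already within the upload limit and only a FASTQ to FASTA conversion is needed.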
See also
For additional details, see the corresponding example in the pRESTO documentation.
Processing the output of IMGT/HighV-QUEST
The output from IMGT/HighV-QUEST may be parsed via the imgt subcommand of MakeDb.py to generate the standardized tab-delimited database file on which all subsequent Change-O modules operate.
Processing the IMGT output requires either the compressed output file (.zip or .txz) or an uncompressed folder containing the 1_Summary, 2_IMGT-gapped, 3_Nt-sequences and 6_Junction files (-i HD13M.txz).
Additionally, it is recommended that you provide the FASTA file that was submitted to HighV-QUEST (-s HD13M.fasta), as this will allow MakeDb.py to correct the changes HighV-QUEST makes to the sequence identifier and add additional columns corresponding to any annotations generated by pRESTO:
MakeDb.py imgt -i HD13M.txz -s HD13M.fasta --extended
The optional --extended argument adds extra columns to the output database containing IMGT-gapped CDR/FWR regions and alignment metrics.
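When working from an uncompressed folder, a quick check that the four required files are present can save a failed run. This is a minimal sketch, not a Change-O feature: the function name check_imgt_dir is hypothetical, and it matches files by name prefix only, since the exact file names and extensions are an assumption beyond the stems listed above.

```shell
# Hypothetical helper: verify that a folder contains a file for each
# of the four IMGT/HighV-QUEST outputs required by MakeDb.py imgt.
# Files are matched by prefix; extensions are not checked.
check_imgt_dir() {
    dir=$1
    status=0
    for prefix in 1_Summary 2_IMGT-gapped 3_Nt-sequences 6_Junction; do
        # Expand the glob; if nothing matches, $1 stays a literal
        # pattern and the existence test below fails.
        set -- "$dir"/"$prefix"*
        if [ ! -e "$1" ]; then
            echo "missing: $prefix" >&2
            status=1
        fi
    done
    return $status
}

# Example usage:
#   check_imgt_dir HD13M && MakeDb.py imgt -i HD13M -s HD13M.fasta --extended
```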
Merging processed IMGT/HighV-QUEST output
If you previously split files for submission to IMGT/HighV-QUEST, you can run each partition through MakeDb.py individually and merge the resulting output files using the merge subcommand of ParseDb.py:
MakeDb.py imgt -i part1.txz -s part1.fasta -o part1.tsv
MakeDb.py imgt -i part2.txz -s part2.fasta -o part2.tsv
ParseDb.py merge -d part1.tsv part2.tsv -o merged.tsv
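With more than a couple of partitions, the same steps can be wrapped in a loop. The sketch below is one possible arrangement rather than a prescribed workflow: the function name process_parts and the RUN variable are hypothetical, and it assumes each .txz archive sits next to a FASTA file with the same base name, as in the part1/part2 example above. Only the MakeDb.py and ParseDb.py arguments already shown are used.

```shell
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN=${RUN:-}

# Hypothetical helper: run MakeDb.py on each partition, then merge.
# Arguments: one or more .txz archives (part1.txz part2.txz ...).
process_parts() {
    for txz in "$@"; do
        base=${txz%.txz}
        $RUN MakeDb.py imgt -i "$txz" -s "$base.fasta" -o "$base.tsv" || return 1
        # Rotate the argument list: append this partition's .tsv output
        # and drop the oldest entry, so that after the loop "$@" holds
        # only the .tsv files, in order.
        set -- "$@" "$base.tsv"
        shift
    done
    $RUN ParseDb.py merge -d "$@" -o merged.tsv
}

# Example usage:
#   process_parts part1.txz part2.txz
```

Running with RUN=echo first prints the exact commands, which makes it easy to confirm the file pairing before any processing starts.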