.. _IMGT:

Parsing IMGT output
================================================================================

Example data
--------------------------------------------------------------------------------

We have hosted a small example data set resulting from the
`UMI barcoded MiSeq workflow `__ described in the `pRESTO `__
documentation. In addition to the example FASTA files, we have included the
`IMGT/HighV-QUEST `__ results. The files can be downloaded from here:

`Change-O Example Files `__

Reducing file size for submission to IMGT/HighV-QUEST
--------------------------------------------------------------------------------

`IMGT/HighV-QUEST `__ currently limits the size of uploaded files to 500,000
sequences. To accommodate this limit, you can use the :program:`count`
subcommand of the `pRESTO `__ tool `SplitSeq `__ to divide your files into
smaller pieces::

    SplitSeq.py count -s file.fastq -n 500000 --fasta

The ``-n 500000`` argument sets the maximum number of sequences in each file,
and the ``--fasta`` argument tells the tool to output a FASTA, rather than
FASTQ, formatted file suitable for upload to `IMGT/HighV-QUEST `__.

.. seealso::

    For additional details see the corresponding example in the
    `pRESTO documentation `__.

Processing the output of IMGT/HighV-QUEST
--------------------------------------------------------------------------------

The output from `IMGT/HighV-QUEST `__ may be parsed via the :program:`imgt`
subcommand of :ref:`MakeDb` to generate the standardized tab-delimited database
file on which all subsequent Change-O modules operate. Processing the IMGT
output requires either the compressed output file (.zip or .txz) or an
uncompressed folder containing the ``1_Summary``, ``2_IMGT-gapped``,
``3_Nt-sequences`` and ``6_Junction`` files (:option:`-i HD13M.txz `).
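
If you supply an uncompressed folder instead of an archive, a quick check that
all four files are present can save a failed run. The function below is a
minimal shell sketch, not part of Change-O; ``check_imgt_folder`` is a
hypothetical name, and it assumes each required file name begins with the
prefix listed above (for example ``1_Summary.txt``)::

    # Hypothetical helper (not part of Change-O): verify that a directory holds
    # the IMGT/HighV-QUEST files required by MakeDb.py imgt.
    # Assumes each file name begins with one of the prefixes below.
    check_imgt_folder() {
        dir=$1
        status=0
        for prefix in 1_Summary 2_IMGT-gapped 3_Nt-sequences 6_Junction; do
            # Expand the glob; if nothing matches, $1 is not an existing file
            set -- "$dir"/"$prefix"*
            if [ ! -e "$1" ]; then
                echo "missing: $prefix" >&2
                status=1
            fi
        done
        return "$status"
    }

    # Example (assumes the archive was extracted to a hypothetical folder HD13M):
    # check_imgt_folder HD13M && MakeDb.py imgt -i HD13M -s HD13M.fasta
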
Additionally, it is recommended that you provide the FASTA file that was
submitted to HighV-QUEST (:option:`-s HD13M.fasta `), as this will allow
:ref:`MakeDb` to correct the changes HighV-QUEST makes to the sequence
identifiers and to add columns corresponding to any annotations generated by
`pRESTO `__::

    MakeDb.py imgt -i HD13M.txz -s HD13M.fasta --extended

The optional :option:`--extended ` argument adds extra columns to the output
database containing IMGT-gapped CDR/FWR regions and alignment metrics.

Merging processed IMGT/HighV-QUEST output
--------------------------------------------------------------------------------

If you previously split files for submission to IMGT/HighV-QUEST, you can run
each partition through :ref:`MakeDb` individually and merge the resulting output
files using the :program:`merge` subcommand of :ref:`ParseDb`::

    MakeDb.py imgt -i part1.txz -s part1.fasta -o part1.tsv
    MakeDb.py imgt -i part2.txz -s part2.fasta -o part2.tsv
    ParseDb.py merge -d part1.tsv part2.tsv -o merged.tsv
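
With many partitions, typing each command by hand becomes tedious. The
following is a minimal shell sketch, not part of Change-O, that assumes the
partitions follow the ``partN.txz``/``partN.fasta`` naming used in the commands
above and that ``MakeDb.py`` and ``ParseDb.py`` are on your ``PATH``. The
``process_partitions`` and ``run`` functions and the ``DRY_RUN`` variable are
hypothetical conveniences; setting ``DRY_RUN=1`` prints the commands instead of
running them::

    # Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it
    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
    }

    # Hypothetical driver (not part of Change-O): parse every part*.txz archive
    # with its matching FASTA file, then merge all outputs into merged.tsv
    process_partitions() {
        outputs=""
        for archive in part*.txz; do
            [ -e "$archive" ] || continue    # glob matched nothing
            base=${archive%.txz}             # e.g. part1
            run MakeDb.py imgt -i "$archive" -s "$base.fasta" -o "$base.tsv"
            outputs="$outputs $base.tsv"
        done
        if [ -z "$outputs" ]; then
            echo "no part*.txz files found" >&2
            return 1
        fi
        # $outputs is left unquoted on purpose: one argument per output file
        run ParseDb.py merge -d $outputs -o merged.tsv
    }

Because the glob expands in lexical order, ``part10.txz`` sorts before
``part2.txz``; this should change only the row order of ``merged.tsv``, not
which records it contains.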