.. _Germlines:

Reconstructing germline sequences
================================================================================

Example data
--------------------------------------------------------------------------------

We have hosted a small example data set resulting from the
`UMI barcoded MiSeq workflow `__ described in the
`pRESTO `__ documentation. The files can be downloaded from here:

`Change-O Example Files `__

The following examples use the ``HD13M_db-pass.tsv`` AIRR Rearrangement file
provided in the example bundle, which has already undergone the
:ref:`IMGT `/:ref:`IgBLAST ` parsing and :ref:`filtering ` operations.

Adding germline sequences to the database
--------------------------------------------------------------------------------

The :ref:`CreateGermlines` tool reconstructs the germline V(D)J sequence,
from which the Ig lineage and mutations can be inferred. In addition to the
alignment information parsed by :ref:`MakeDb` to generate the initial
database, :ref:`CreateGermlines` requires the set of germline reference
sequences that was used for the alignment, passed to the :option:`-r `
argument. The V-segment reference sequences must be IMGT-gapped.

Because the D-segment call for B cell receptor alignments is often of low
confidence, the default germline format (:option:`-g dmask `) places Ns in
the N/P and D-segments of the junction region rather than using the D segment
assigned during reference alignment. If you wish, this can be changed to
generate a complete germline (:option:`-g full `) or a V-segment only
germline (:option:`-g vonly `).
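To make the difference between the ``-g`` modes concrete, the sketch below assembles a germline from its pieces under each mode. It is a simplified illustration, not the Change-O implementation: the helper ``assemble_germline`` and its inputs are hypothetical, the segment sequences are assumed to be already trimmed to their aligned regions (IMGT gaps and other bookkeeping are ignored), and the N/P regions are assumed to be filled with ``N`` in both the ``full`` and ``dmask`` modes because those nucleotides are non-templated.

```python
"""Simplified sketch of the three germline modes (hypothetical helper,
not the Change-O implementation)."""


def assemble_germline(v_germ: str, np1_len: int, d_germ: str,
                      np2_len: int, j_germ: str, mode: str = "dmask") -> str:
    """Concatenate germline segments according to a -g style mode.

    v_germ, d_germ and j_germ are germline segment sequences assumed to be
    already trimmed to the aligned region; np1_len and np2_len are the
    lengths of the non-templated N/P regions flanking the D segment.
    """
    if mode == "vonly":
        # V-segment only germline: nothing downstream of V is reconstructed.
        return v_germ
    if mode not in ("full", "dmask"):
        raise ValueError(f"unknown germline mode: {mode!r}")
    # Assumption: N/P nucleotides are non-templated, so they are always Ns.
    np1 = "N" * np1_len
    np2 = "N" * np2_len
    # dmask hides the (often low-confidence) D call behind Ns of equal length.
    d_part = "N" * len(d_germ) if mode == "dmask" else d_germ
    return v_germ + np1 + d_part + np2 + j_germ


if __name__ == "__main__":
    # Toy segment sequences, for illustration only.
    v, d, j = "CAGGTGCAG", "GGTAC", "TGGGGC"
    for m in ("full", "dmask", "vonly"):
        print(m, assemble_germline(v, 2, d, 3, j, mode=m))
```

Because the masked D segment is replaced by the same number of Ns, the ``dmask`` germline in this sketch has the same length as the ``full`` one, so positions downstream of the junction stay aligned with the observed sequence.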
The command below adds the germline sequence to the
``germline_alignment_d_mask`` column of the output database::

    CreateGermlines.py -d HD13M_db-pass.tsv -g dmask \
        -r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta

Alternatively, if you have run the :ref:`clonal assignment ` task before
invoking :ref:`CreateGermlines`, adding the :option:`--cloned ` argument is
recommended, as this will generate a single germline of consensus length for
each clone::

    CreateGermlines.py -d HD13M_db-pass_clone-pass.tsv -g dmask --cloned \
        -r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta

.. important::

    The germline set passed to :option:`-r ` **must** contain the complete
    set of germlines used by the reference alignment software
    (IMGT/HighV-QUEST or IgBLAST). If alleles called by the aligner are
    missing from the reference set, records assigned to those alleles will
    not be processed successfully. Additionally, the V-segment reference set
    **must** contain IMGT-gapped sequences to properly reconstruct germlines,
    even if the reference alignment was performed on ungapped sequences.

.. note::

    While :ref:`MakeDb` provides the :program:`ihmm` subcommand to parse
    alignment output generated by `iHMMuneAlign `__, the parsed output does
    not contain enough information to reconstruct germline sequences in all
    cases using :ref:`CreateGermlines`.

.. seealso::

    The `TIgGER `__ R package provides tools for identifying novel
    polymorphisms and building a personalized germline database. To use the
    germline corrections provided by `TIgGER `__, replace the V-segment
    germline file with the one generated by `genotypeFasta `__
    (:option:`-r IGHV_genotype.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta `)
    and specify the genotyped V-segment column
    (:option:`--vf v_call_genotyped `)::

        CreateGermlines.py -d genotyped.tsv -g dmask --vf v_call_genotyped \
            -r IGHV_genotype.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta
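The requirement that the ``-r`` reference set contain every allele called by the aligner can be checked before running CreateGermlines. The following sketch is not part of Change-O: the helpers ``fasta_alleles`` and ``missing_calls`` are hypothetical, and the code assumes IMGT-style FASTA headers in which the allele name is the second ``|``-delimited field, falling back to the first whitespace-delimited token for other headers. It reports every allele in an AIRR call column (including comma-separated ambiguous calls) that is absent from the reference set.

```python
import csv
import io
from typing import Iterable, Set


def fasta_alleles(lines: Iterable[str]) -> Set[str]:
    """Collect allele names from FASTA header lines.

    Assumption: IMGT-style headers such as '>X|IGHV1-2*02|Homo sapiens|F|...'
    carry the allele in the second '|' field; otherwise the first
    whitespace-delimited token of the header is used.
    """
    alleles = set()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        if not header:
            continue
        fields = header.split("|")
        if len(fields) > 1:
            alleles.add(fields[1].strip())
        else:
            alleles.add(header.split()[0])
    return alleles


def missing_calls(tsv_lines: Iterable[str], reference: Set[str],
                  column: str = "v_call") -> Set[str]:
    """Return allele calls in `column` of a tab-delimited file that are
    absent from `reference`; ambiguous calls are split on commas."""
    missing = set()
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        for call in (row.get(column) or "").split(","):
            call = call.strip()
            if call and call not in reference:
                missing.add(call)
    return missing


if __name__ == "__main__":
    # Tiny in-memory stand-ins for a reference FASTA and an AIRR TSV.
    fasta = io.StringIO(">X1|IGHV1-2*02|Homo sapiens|F|\nCAGGTG\n"
                        ">X2|IGHV3-23*01|Homo sapiens|F|\nGAGGTG\n")
    db = io.StringIO("sequence_id\tv_call\n"
                     "seq1\tIGHV1-2*02\n"
                     "seq2\tIGHV3-23*01,IGHV3-23*04\n")
    print(sorted(missing_calls(db, fasta_alleles(fasta))))
```

For real data, pass open file handles (for example, to ``IMGT_Human_IGHV.fasta`` and ``HD13M_db-pass.tsv``) in place of the in-memory examples; for a TIgGER-genotyped database, use ``column="v_call_genotyped"`` together with the genotyped V-segment FASTA.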