.. _10X: Parsing 10X Genomics V(D)J data ================================================================================ .. admonition:: New We have an updated `tutorial `__ covering the processing of 10x Genomics VDJ data with Change-O and `SCOPer `__. You can also follow the steps below to process 10x VDJ data using methods available in Change-O. Example data -------------------------------------------------------------------------------- 10X Genomics provides an example data set of Ig V(D)J processed by the Cell Ranger pipeline, which is available for download from their `Single Cell Immune Profiling `__ support site. Converting 10X V(D)J data into the AIRR Community standardized format -------------------------------------------------------------------------------- To process 10X V(D)J data, a combination of :ref:`AssignGenes` and :ref:`MakeDb` can be used to generate a TSV file compliant with the `AIRR Community Rearrangement `__ schema that incorporates annotation information provided by the Cell Ranger pipeline. The :option:`--10x filtered_contig_annotations.csv ` specifies the path of the contig annotations file generated by ``cellranger vdj``, which can be found in the ``outs`` directory. Generate AIRR Rearrangement data from the 10X V(D)J FASTA files using the steps below:: AssignGenes.py igblast -s filtered_contig.fasta -b ~/share/igblast \ --organism human --loci ig --format blast MakeDb.py igblast -i filtered_contig_igblast.fmt7 -s filtered_contig.fasta \ -r IMGT_Human_*.fasta --10x filtered_contig_annotations.csv --extended ``all_contig.fasta`` can be exchanged for ``filtered_contig.fasta``, and ``all_contig_annotations.csv`` can be exchanged for ``filtered_contig_annotations.csv``. .. warning:: The resulting table overwrites the V, D and J gene assignments generated by Cell Ranger and uses those generated by IgBLAST or IMGT/HighV-QUEST instead. .. seealso:: To process mouse data and/or TCR data alter the :option:`--organism ` and :option:`--loci ` arguments to :ref:`AssignGenes` accordingly (e.g., :option:`--organism mouse `, :option:`--loci tcr `) and use the appropriate V, D and J IMGT reference databases (e.g., ``IMGT_Mouse_TR*.fasta``) See the :ref:`IgBLAST usage guide ` for further details regarding the setup and use of IgBLAST with Change-O. Identifying clones from B cells in AIRR formatted 10X V(D)J data -------------------------------------------------------------------------------- Splitting into separate light and heavy chain files ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To group B cells into clones from AIRR Rearrangement data, the output from :ref:`MakeDb` must be parsed into a light chain file and a heavy chain file:: ParseDb.py select -d 10x_igblast_db-pass.tsv -f locus -u "IGH" \ --logic all --regex --outname heavy ParseDb.py select -d 10x_igblast_db-pass.tsv -f locus -u "IG[LK]" \ --logic all --regex --outname light Assign clonal groups to the heavy chain data ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The heavy chain file must then be clonally clustered separately. See :ref:`Cloning` for how to use :ref:`DefineClones` to assign clonal cluster annotations to the IGH file. Correct clonal groups based on light chain data ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :ref:`DefineClones` currently does not support light chain cloning. However, cloning can be performed after heavy chain cloning using `light_cluster.py `__ provided on the `Immcantation GitHub repository `__ in the ``scripts`` directory:: light_cluster.py -d heavy_select-pass_clone-pass.tsv -e light_select-pass.tsv \ -o 10X_clone-pass.tsv Here, ``heavy_select-pass_clone-pass.tsv`` refers to the cloned heavy chain AIRR Rearrangement file, ``light_select-pass.tsv`` refers to the light chain file, and ``10X_clone-pass.tsv`` is the resulting output file. The algorithm will (1) remove cells associated with more than one heavy chain and (2) correct heavy chain clone definitions based on an analysis of the light chain partners associated with the heavy chain clone. .. note:: By default, ``light_chain.py`` expects the `AIRR Rearrangement `__ columns: * ``v_call`` * ``j_call`` * ``junction_length`` * ``umi_count`` * ``cell_id`` * ``clone_id`` To process legacy Change-O formatted data add the ``--format changeo`` argument:: light_cluster.py -d heavy_select-pass_clone-pass.tab -e light_select-pass.tab \ -o 10X_clone-pass.tab --format changeo Which expects the following Change-O columns: * ``V_CALL`` * ``J_CALL`` * ``JUNCTION_LENGTH`` * ``UMICOUNT`` * ``CELL`` * ``CLONE``