Parsing 10X Genomics V(D)J data


We have an updated tutorial covering the processing of 10x Genomics VDJ data with Change-O and SCOPer. You can also follow the steps below to process 10x VDJ data using methods available in Change-O.

Example data

10X Genomics provides an example data set of Ig V(D)J processed by the Cell Ranger pipeline, which is available for download from their Single Cell Immune Profiling support site.

Converting 10X V(D)J data into the AIRR Community standardized format

To process 10X V(D)J data, a combination of and can be used to generate a TSV file compliant with the AIRR Community Rearrangement schema that incorporates annotation information provided by the Cell Ranger pipeline. The --10x filtered_contig_annotations.csv specifies the path of the contig annotations file generated by cellranger vdj, which can be found in the outs directory.

Generate AIRR Rearrangement data from the 10X V(D)J FASTA files using the steps below: igblast -s filtered_contig.fasta -b ~/share/igblast \
   --organism human --loci ig --format blast igblast -i filtered_contig_igblast.fmt7 -s filtered_contig.fasta \
   -r IMGT_Human_*.fasta --10x filtered_contig_annotations.csv --extended

all_contig.fasta can be exchanged for filtered_contig.fasta, and all_contig_annotations.csv can be exchanged for filtered_contig_annotations.csv.


The resulting table overwrites the V, D and J gene assignments generated by Cell Ranger and uses those generated by IgBLAST or IMGT/HighV-QUEST instead.

See also

To process mouse data and/or TCR data alter the --organism and --loci arguments to accordingly (e.g., --organism mouse, --loci tcr) and use the appropriate V, D and J IMGT reference databases (e.g., IMGT_Mouse_TR*.fasta)

See the IgBLAST usage guide for further details regarding the setup and use of IgBLAST with Change-O.

Identifying clones from B cells in AIRR formatted 10X V(D)J data

Splitting into separate light and heavy chain files

To group B cells into clones from AIRR Rearrangement data, the output from must be parsed into a light chain file and a heavy chain file: select -d 10x_igblast_db-pass.tsv -f locus -u "IGH" \
        --logic all --regex --outname heavy select -d 10x_igblast_db-pass.tsv -f locus -u "IG[LK]" \
        --logic all --regex --outname light

Assign clonal groups to the heavy chain data

The heavy chain file must then be clonally clustered separately. See Clustering sequences into clonal groups for how to use to assign clonal cluster annotations to the IGH file.

Correct clonal groups based on light chain data currently does not support light chain cloning. However, cloning can be performed after heavy chain cloning using provided on the Immcantation Bitbucket repository in the scripts directory: -d heavy_select-pass_clone-pass.tsv -e light_select-pass.tsv \
        -o 10X_clone-pass.tsv

Here, heavy_select-pass_clone-pass.tsv refers to the cloned heavy chain AIRR Rearrangement file, light_select-pass.tsv refers to the light chain file, and 10X_clone-pass.tsv is the resulting output file.

The algorithm will (1) remove cells associated with more than one heavy chain and (2) correct heavy chain clone definitions based on an analysis of the light chain partners associated with the heavy chain clone.


By default, expects the AIRR Rearrangement columns:

  • v_call

  • j_call

  • junction_length

  • umi_count

  • cell_id

  • clone_id

To process legacy Change-O formatted data add the --format changeo argument: -d -e \
    -o --format changeo

Which expects the following Change-O columns:

  • V_CALL

  • J_CALL



  • CELL