Parsing 10X Genomics V(D)J data

Example data

10X Genomics provides an example Ig V(D)J data set processed by the Cell Ranger pipeline, which is available for download from their Single Cell Immune Profiling support site.

Converting 10X V(D)J data into the AIRR Community standardized format

To process 10X V(D)J data, AssignGenes.py and MakeDb.py can be used together to generate a TSV file that complies with the AIRR Community Rearrangement schema and incorporates the annotation information provided by the Cell Ranger pipeline. The --10x argument specifies the path to the contig annotations file generated by cellranger vdj (here, filtered_contig_annotations.csv), which can be found in the outs directory.

Generate AIRR Rearrangement data from the 10X V(D)J FASTA files using the steps below:

AssignGenes.py igblast -s filtered_contig.fasta -b ~/share/igblast \
   --organism human --loci ig --format blast
MakeDb.py igblast -i filtered_contig_igblast.fmt7 -s filtered_contig.fasta \
   -r IMGT_Human_*.fasta --10x filtered_contig_annotations.csv --extended

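To use all contigs rather than only the filtered set, substitute all_contig.fasta for filtered_contig.fasta and all_contig_annotations.csv for filtered_contig_annotations.csv, and adjust the -i argument of MakeDb.py to match the resulting IgBLAST output file.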
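Conceptually, the --10x option attaches per-contig Cell Ranger annotations, such as the cell barcode and UMI count, to the matching IgBLAST records. The following is a minimal sketch of that kind of join in plain Python, not MakeDb.py's implementation. It assumes the annotations CSV has Cell Ranger's contig_id, barcode and umis columns; the mapping of barcode to cell_id and umis to umi_count, the choice to drop unannotated contigs, the function name attach_10x_annotations and the toy records are all hypothetical.

```python
import csv
import io


def attach_10x_annotations(records, annotations_csv):
    """Add cell_id and umi_count to rearrangement records, matched by contig ID.

    records: list of dicts with a 'sequence_id' key (one per contig).
    annotations_csv: text of a Cell Ranger contig annotations file, assumed
    to contain 'contig_id', 'barcode' and 'umis' columns.
    """
    # Index the Cell Ranger annotations by contig ID.
    reader = csv.DictReader(io.StringIO(annotations_csv))
    by_contig = {row["contig_id"]: row for row in reader}
    merged = []
    for rec in records:
        ann = by_contig.get(rec["sequence_id"])
        if ann is None:
            # Contig missing from the annotations: dropped (illustrative choice).
            continue
        out = dict(rec)
        out["cell_id"] = ann["barcode"]        # hypothetical field mapping
        out["umi_count"] = int(ann["umis"])    # hypothetical field mapping
        merged.append(out)
    return merged


# Toy input, not real Cell Ranger output.
example_csv = """barcode,contig_id,chain,umis
AAACCTG-1,AAACCTG-1_contig_1,IGH,12
AAACCTG-1,AAACCTG-1_contig_2,IGK,30
"""
records = [
    {"sequence_id": "AAACCTG-1_contig_1", "v_call": "IGHV3-23*01"},
    {"sequence_id": "AAACCTG-1_contig_2", "v_call": "IGKV1-39*01"},
    {"sequence_id": "unannotated_contig", "v_call": "IGHV1-2*02"},
]
merged = attach_10x_annotations(records, example_csv)
print([(r["sequence_id"], r["cell_id"], r["umi_count"]) for r in merged])
```

Joining on the contig identifier is what lets each IgBLAST record be tied back to a single cell barcode, which the downstream cloning steps rely on.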

Warning

In the resulting table, the V, D and J gene assignments generated by Cell Ranger are replaced by those generated by IgBLAST or IMGT/HighV-QUEST.

See also

To process mouse and/or TCR data, alter the --organism and --loci arguments to AssignGenes.py accordingly (e.g., --organism mouse, --loci tcr) and pass the appropriate V, D and J IMGT reference databases to the -r argument of MakeDb.py (e.g., IMGT_Mouse_TR*.fasta).

See the IgBLAST usage guide for further details regarding the setup and use of IgBLAST with Change-O.
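When these variations (filtered or all contigs, organism, loci, reference files) are scripted, the two commands can be built as argument lists. The sketch below uses only the flags shown in the commands above; the function name changeo_10x_commands and the example reference file names are hypothetical, and the IgBLAST output name is assumed to follow the filtered_contig_igblast.fmt7 pattern from the example. Because no shell is involved, a pattern such as IMGT_Mouse_TR*.fasta must be expanded in Python (for example with glob.glob) before it is passed in.

```python
import os
import shlex


def changeo_10x_commands(references, contigs="filtered", organism="human",
                         loci="ig", igblast_dir="~/share/igblast"):
    """Build AssignGenes.py and MakeDb.py argument lists for 10X V(D)J input.

    references: list of IMGT reference FASTA paths (e.g. from glob.glob).
    contigs: "filtered" or "all", selecting which Cell Ranger outputs to use.
    """
    if contigs not in ("filtered", "all"):
        raise ValueError("contigs must be 'filtered' or 'all'")
    fasta = f"{contigs}_contig.fasta"
    annotations = f"{contigs}_contig_annotations.csv"
    # Assumed to follow the naming pattern of the example commands.
    fmt7 = f"{contigs}_contig_igblast.fmt7"
    assign = ["AssignGenes.py", "igblast", "-s", fasta,
              "-b", os.path.expanduser(igblast_dir),
              "--organism", organism, "--loci", loci, "--format", "blast"]
    makedb = ["MakeDb.py", "igblast", "-i", fmt7, "-s", fasta,
              "-r", *references, "--10x", annotations, "--extended"]
    return assign, makedb


# Mouse TCR data using all contigs; the reference file names are placeholders.
assign, makedb = changeo_10x_commands(
    ["IMGT_Mouse_TRBV.fasta", "IMGT_Mouse_TRBJ.fasta"],
    contigs="all", organism="mouse", loci="tcr")
print(shlex.join(assign))
print(shlex.join(makedb))
```

Each list can then be run with subprocess.run(cmd, check=True), which stops the workflow if either step fails.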

Identifying clones from B cells in AIRR formatted 10X V(D)J data

Splitting into separate light and heavy chain files

To group B cells into clones from AIRR Rearrangement data, the output from MakeDb.py (here, 10x_igblast_db-pass.tsv) must first be split into a heavy chain file and a light chain file:

ParseDb.py select -d 10x_igblast_db-pass.tsv -f locus -u "IGH" \
        --logic all --regex --outname heavy
ParseDb.py select -d 10x_igblast_db-pass.tsv -f locus -u "IG[LK]" \
        --logic all --regex --outname light
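The two ParseDb.py select calls partition rows by the locus column. The sketch below reproduces that partition in plain Python for illustration, assuming a regular-expression search on the locus value; it is not ParseDb.py itself, and the function name split_by_locus and the toy rows are hypothetical. Rows whose locus matches neither pattern (e.g. a TCR locus) end up in neither file.

```python
import csv
import io
import re


def split_by_locus(tsv_text):
    """Split AIRR TSV text into heavy (IGH) and light (IGK/IGL) row lists."""
    heavy_re = re.compile("IGH")      # same pattern as the first select call
    light_re = re.compile("IG[LK]")   # same pattern as the second select call
    heavy, light = [], []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        locus = row["locus"]
        if heavy_re.search(locus):
            heavy.append(row)
        elif light_re.search(locus):
            light.append(row)
    return heavy, light


# Toy AIRR rows: one heavy, two light, one TCR chain.
sample = (
    "sequence_id\tlocus\tcell_id\n"
    "c1\tIGH\tA\n"
    "c2\tIGK\tA\n"
    "c3\tIGL\tB\n"
    "c4\tTRB\tC\n"
)
heavy, light = split_by_locus(sample)
print([r["sequence_id"] for r in heavy], [r["sequence_id"] for r in light])
```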

Assign clonal groups to the heavy chain data

Next, the heavy chain file must be clonally clustered on its own. See Clustering sequences into clonal groups for how to use DefineClones.py to assign clonal cluster annotations to the IGH file.
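For intuition only, the sketch below shows a simplified version of distance-based heavy chain clustering, roughly the idea behind DefineClones.py's distance mode: partition sequences by V gene, J gene and junction length, then link junctions whose length-normalized Hamming distance is at or below a threshold (single linkage, via union-find). It is not DefineClones.py; its models, handling of ambiguous gene calls and threshold selection differ. The threshold of 0.16, the helper names and the toy rows are hypothetical, and only the first gene in each call is used.

```python
from collections import defaultdict


def gene(call):
    """First gene of a call, without allele: 'IGHV3-23*01,IGHV3-23D*01' -> 'IGHV3-23'."""
    return call.split(",")[0].split("*")[0]


def hamming_norm(a, b):
    """Hamming distance divided by length; a and b must have equal length."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_clones(rows, threshold=0.16):
    """Return copies of rows (dicts with v_call, j_call, junction) with a clone_id."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key = (gene(row["v_call"]), gene(row["j_call"]), len(row["junction"]))
        groups[key].append(i)

    clone_of, next_id = {}, 1
    for members in groups.values():
        parent = {i: i for i in members}

        def find(i):
            # Union-find root lookup with path halving.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        # Single linkage: join any pair within the distance threshold.
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                if hamming_norm(rows[a]["junction"], rows[b]["junction"]) <= threshold:
                    parent[find(a)] = find(b)

        roots = {}
        for i in members:
            root = find(i)
            if root not in roots:
                roots[root] = next_id
                next_id += 1
            clone_of[i] = roots[root]
    return [dict(row, clone_id=str(clone_of[i])) for i, row in enumerate(rows)]


rows = [
    {"v_call": "IGHV3-23*01", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGATTGG"},
    {"v_call": "IGHV3-23*04", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGACTGG"},
    {"v_call": "IGHV1-2*02", "j_call": "IGHJ4*02", "junction": "TGTGCGAGAGATTGG"},
]
result = assign_clones(rows)
print([r["clone_id"] for r in result])
```

The first two junctions differ at one of 15 positions (distance about 0.067), so they share a clone; the third has a different V gene and is never compared with them.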

Correct clonal groups based on light chain data

DefineClones.py currently does not support light chain cloning. However, heavy chain clones can be corrected using light chain information after heavy chain cloning, with the light_cluster.py script provided in the scripts directory of the Immcantation Bitbucket repository:

light_cluster.py -d heavy_select-pass_clone-pass.tsv -e light_select-pass.tsv \
        -o 10X_clone-pass.tsv

Here, heavy_select-pass_clone-pass.tsv refers to the cloned heavy chain AIRR Rearrangement file, light_select-pass.tsv refers to the light chain file, and 10X_clone-pass.tsv is the resulting output file.

The algorithm (1) removes cells associated with more than one heavy chain and (2) corrects heavy chain clone definitions based on an analysis of the light chain partners associated with each heavy chain clone.
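The two steps can be illustrated with the simplified sketch below. It is not light_cluster.py's implementation. It assumes that each cell's light chain partner is its light chain record with the highest umi_count, that partners are compared only by V gene, J gene and junction length, and that heavy chains without a light chain partner are dropped; the clone_id suffix scheme (e.g. 1_1, 1_2), the function names and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict


def gene(call):
    """First gene of a call, without allele."""
    return call.split(",")[0].split("*")[0]


def correct_with_light(heavy, light):
    """Simplified light chain correction of heavy chain clones.

    heavy: dicts with cell_id and clone_id (already clonally clustered).
    light: dicts with cell_id, v_call, j_call, junction_length, umi_count.
    Returns heavy rows whose clone_id is rewritten as '<clone>_<n>'.
    """
    # Step 1: drop cells associated with more than one heavy chain.
    per_cell = Counter(row["cell_id"] for row in heavy)
    heavy = [row for row in heavy if per_cell[row["cell_id"]] == 1]

    # Choose one light chain per cell: the one with the most UMIs (assumption).
    partner = {}
    for row in light:
        cell = row["cell_id"]
        if cell not in partner or int(row["umi_count"]) > int(partner[cell]["umi_count"]):
            partner[cell] = row

    # Step 2: split each heavy clone by its cells' light chain V, J and length.
    labels = defaultdict(dict)
    corrected = []
    for row in heavy:
        lc = partner.get(row["cell_id"])
        if lc is None:
            continue  # no light chain partner: dropped in this sketch
        key = (gene(lc["v_call"]), gene(lc["j_call"]), int(lc["junction_length"]))
        clone_labels = labels[row["clone_id"]]
        if key not in clone_labels:
            clone_labels[key] = len(clone_labels) + 1
        corrected.append(dict(row, clone_id=f"{row['clone_id']}_{clone_labels[key]}"))
    return corrected


heavy = [
    {"cell_id": "A", "clone_id": "1"},
    {"cell_id": "B", "clone_id": "1"},
    {"cell_id": "C", "clone_id": "1"},
    {"cell_id": "D", "clone_id": "2"},
    {"cell_id": "D", "clone_id": "3"},  # two heavy chains: cell D is removed
]
light = [
    {"cell_id": "A", "v_call": "IGKV1-39*01", "j_call": "IGKJ1*01", "junction_length": "33", "umi_count": "20"},
    {"cell_id": "B", "v_call": "IGKV1-39*01", "j_call": "IGKJ1*01", "junction_length": "33", "umi_count": "15"},
    {"cell_id": "C", "v_call": "IGLV2-14*01", "j_call": "IGLJ2*01", "junction_length": "36", "umi_count": "9"},
    {"cell_id": "D", "v_call": "IGKV3-20*01", "j_call": "IGKJ2*01", "junction_length": "30", "umi_count": "5"},
]
result = correct_with_light(heavy, light)
print([(r["cell_id"], r["clone_id"]) for r in result])
```

Cells A and B keep a shared clone because their light chains agree, while cell C, which pairs with a different light chain, is split into its own sub-clone.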

Note

By default, light_cluster.py expects the AIRR Rearrangement columns:

  • v_call

  • j_call

  • junction_length

  • umi_count

  • cell_id

  • clone_id
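Before running light_cluster.py, it can help to confirm that an input file contains these columns. The sketch below reads only the header of a tab-delimited file; the names AIRR_REQUIRED and missing_columns are hypothetical. The column list is a parameter because the light chain file has not been clonally clustered and so may not carry clone_id.

```python
import csv
import io

AIRR_REQUIRED = ["v_call", "j_call", "junction_length", "umi_count", "cell_id", "clone_id"]


def missing_columns(handle, required=AIRR_REQUIRED):
    """Return the required columns absent from the header of a TSV file handle."""
    header = next(csv.reader(handle, delimiter="\t"), [])
    return [col for col in required if col not in header]


# Toy header lacking umi_count.
heavy_header = "sequence_id\tv_call\tj_call\tjunction_length\tcell_id\tclone_id\n"
print(missing_columns(io.StringIO(heavy_header)))
```

For a real file, pass an open handle, e.g. open("heavy_select-pass_clone-pass.tsv", newline="").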

To process legacy Change-O formatted data, add the --format changeo argument:

light_cluster.py -d heavy_select-pass_clone-pass.tab -e light_select-pass.tab \
    -o 10X_clone-pass.tab --format changeo

In this mode, the script expects the following Change-O columns:

  • V_CALL

  • J_CALL

  • JUNCTION_LENGTH

  • UMICOUNT

  • CELL

  • CLONE
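The two column lists correspond one-to-one, in the order given. The sketch below records that correspondence and uses it to guess which format a header follows, which indicates whether --format changeo is needed. The names AIRR_TO_CHANGEO and detect_format are hypothetical, and the mapping covers only the six columns listed here; it is not a general AIRR-to-Change-O converter.

```python
# AIRR column -> Change-O column, as listed for light_cluster.py.
AIRR_TO_CHANGEO = {
    "v_call": "V_CALL",
    "j_call": "J_CALL",
    "junction_length": "JUNCTION_LENGTH",
    "umi_count": "UMICOUNT",
    "cell_id": "CELL",
    "clone_id": "CLONE",
}


def detect_format(header):
    """Return 'airr', 'changeo', or None, depending on which column set is complete."""
    cols = set(header)
    if set(AIRR_TO_CHANGEO) <= cols:
        return "airr"
    if set(AIRR_TO_CHANGEO.values()) <= cols:
        return "changeo"
    return None


header = ["SEQUENCE_ID", "V_CALL", "J_CALL", "JUNCTION_LENGTH", "UMICOUNT", "CELL", "CLONE"]
fmt = detect_format(header)
extra_args = ["--format", "changeo"] if fmt == "changeo" else []
print(fmt, extra_args)
```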