Data Standards

All Change-O tools supports both the legacy Change-O standard and the new Adaptive Immune Receptor Repertoire (AIRR) standard developed by the AIRR Community (AIRR-C).

AIRR-C Format

As of v1.0.0, the default file format is the AIRR-C format as described by the Rearrangement Schema (v1.2). The AIRR-C Rearrangement format is a tab-delimited file format (.tsv) that defines the required and optional annotations for rearranged adaptive immune receptor sequences.

To learn more about this format, the valid field names and their expected values, visit the AIRR-C Rearrangement Schema documentation site.

An API for the input and output of the AIRR-C format is provided in the airr Python package. Wrappers for this package are provided in the API as changeo.IO.AIRRReader and changeo.IO.AIRRWriter.

Change-O Format

The legacy Change-O standard is a tab-delimited file format (.tab) with a set of predefined column names. The standardized column names used by the Change-O format are shown in the table below. Most tools do not require every column. The columns required by and added by each individual tool are described in the commandline usage documentation. If a column contains multiple entries, such as ambiguous V gene assignments, these nested entries are delimited by commas. The ordering of the columns does not matter.

An API for the input and output of the Change-O format is provided in changeo.IO.ChangeoReader and changeo.IO.ChangeoWriter respectively.

Change-O Field

AIRR Field

Type

Description

Standard Annotations

SEQUENCE_ID

sequence_id

string

Unique sequence identifier

SEQUENCE_INPUT

sequence

string

Input nucleotide sequence

SEQUENCE_VDJ

string

V(D)J nucleotide sequence

SEQUENCE_IMGT

sequence_alignment

string

IMGT-numbered V(D)J nucleotide sequence

FUNCTIONAL

productive

logical

T: V(D)J sequence is predicted to be productive

IN_FRAME

vj_in_frame

logical

T: junction region nucleotide sequence is in-frame

STOP

stop_codon

logical

T: stop codon is present in V(D)J nucleotide sequence

MUTATED_INVARIANT

logical

T: invariant amino acids properly encoded by V(D)J sequence

INDELS

logical

T: V(D)J nucleotide sequence contains insertions and/or deletions

LOCUS

locus

string

Locus of the receptor

V_CALL

v_call

string

V allele assignment(s)

D_CALL

d_call

string

D allele assignment(s)

J_CALL

j_call

string

J allele assignment(s)

C_CALL

c_call

string

C-region assignment

V_SEQ_START

v_sequence_start

integer

Position of first V nucleotide in SEQUENCE_INPUT

V_SEQ_LENGTH

integer

Number of V nucleotides in SEQUENCE_INPUT

V_GERM_START_IMGT

v_germline_start

integer

Position of V_SEQ_START in IMGT-numbered germline V(D)J sequence

V_GERM_LENGTH_IMGT

integer

Length of the IMGT numbered germline V alignment

NP1_LENGTH

np1_length

integer

Number of nucleotides between V and D segments

D_SEQ_START

d_sequence_start

integer

Position of first D nucleotide in SEQUENCE_INPUT

D_SEQ_LENGTH

integer

Number of D nucleotides in SEQUENCE_INPUT

D_GERM_START

d_germline_start

integer

Position of D_SEQ_START in germline V(D)J nucleotide sequence

D_GERM_LENGTH

integer

Length of the germline D alignment

NP2_LENGTH

np2_length

integer

Number of nucleotides between D and J segments

J_SEQ_START

j_sequence_start

integer

Position of first J nucleotide in SEQUENCE_INPUT

J_SEQ_LENGTH

j_sequence_end

integer

Number of J nucleotides in SEQUENCE_INPUT

J_GERM_START

j_germline_start

integer

Position of J_SEQ_START in germline V(D)J nucleotide sequence

J_GERM_LENGTH

integer

Length of the germline J alignment

JUNCTION_LENGTH

junction_length

integer

Number of junction nucleotides in SEQUENCE_VDJ

JUNCTION

junction

string

Junction region nucletide sequence

CELL

cell_id

string

Cell identifier

CLONE

clone_id

string

Clonal grouping identifier

Region Annotations

FWR1_IMGT

fwr1

string

IMGT-numbered FWR1 nucleotide sequence

FWR2_IMGT

fwr2

string

IMGT-numbered FWR2 nucleotide sequence

FWR3_IMGT

fwr3

string

IMGT-numbered FWR3 nucleotide sequence

FWR4_IMGT

fwr4

string

IMGT-numbered FWR4 nucleotide sequence

CDR1_IMGT

cdr1

string

IMGT-numbered CDR1 nucleotide sequence

CDR2_IMGT

cdr2

string

IMGT-numbered CDR2 nucleotide sequence

CDR3_IMGT

cdr3

string

IMGT-numbered CDR3 nucleotide sequence

N1_LENGTH

n1_length

integer

Untemplated nucleotides 5’ of the D segment

N2_LENGTH

n2_length

integer

Untemplated Nucleotides 3’ of the D segment

P3V_LENGTH

p3v_length

integer

Palindromic nucleotides 3’ of the V segment

P5D_LENGTH

p5d_length

integer

Palindromic nucleotides 5’ of the D segment

P3D_LENGTH

p3d_length

integer

Palindromic nucleotides 3’ of the D segment

P5J_LENGTH

p5j_length

integer

Palindromic nucleotides 5’ of the J segment

D_FRAME

integer

D segment reading frame

Germline Annotations

GERMLINE_VDJ

string

Full unaligned germline V(D)J nucleotide sequence

GERMLINE_VDJ_V_REGION

string

Unaligned germline V segment nucleotide sequence

GERMLINE_VDJ_D_MASK

string

Unaligned germline V(D)J nucleotides sequence with Ns masking the NP1-D-NP2 regions

GERMLINE_IMGT

germline_alignment

string

Full IMGT-numbered germline V(D)J nucleotide sequence

GERMLINE_IMGT_V_REGION

string

IMGT-numbered germline V segment nucleotide sequence

GERMLINE_IMGT_D_MASK

string

IMGT-numbered germline V(D)J nucleotide sequence with Ns masking the NP1-D-NP2 regions

GERMLINE_V_CALL

string

Clonal consensus germline V assignment

GERMLINE_D_CALL

string

Clonal consensus germline D assignment

GERMLINE_J_CALL

string

Clonal consensus germline J assignment

GERMLINE_REGIONS

string

String showing germline segments positions encoded as V, D, J, N, and P characters

Alignment Annotations

V_SCORE

v_score

float

Alignment score for the V

V_IDENTITY

v_identity

float

Alignment identity for the V

V_EVALUE

v_support

float

E-value for the alignment of the V

V_CIGAR

v_cigar

string

CIGAR string for the alignment of the V

D_SCORE

d_score

float

Alignment score for the D

D_IDENTITY

d_identity

float

Alignment identity for the D

D_EVALUE

d_support

float

E-value for the alignment of the D

D_CIGAR

d_cigar

string

CIGAR string for the alignment of the D

J_SCORE

j_score

float

Alignment score for the J

J_IDENTITY

j_identity

float

Alignment identity for the J

J_EVALUE

j_support

float

E-value for the alignment of the J

J_CIGAR

j_cigar

string

CIGAR string for the alignment of the J

VDJ_SCORE

float

Alignment score for the V(D)J

TIgGER Annotations

V_CALL_GENOTYPED

string

Adjusted V allele assignment(s) following TIgGER genotype inference

Preprocessing Annotations

PRCONS

string

pRESTO UMI consensus primer

PRIMER

string

pRESTO primers list

CONSCOUNT

consensus_count

integer

Number of reads contributing to the UMI consensus sequence

DUPCOUNT

duplicate_count

integer

Copy number of the sequence

UMICOUNT

integer

UMI count for the sequence