Data Standard

All Change-O tools supports both the classical Change-O standard and the new Adaptive Immune Receptor Repertoire (AIRR) standard.

The Change-O standard is a tab-delimited file format (.tab) with a set of predefine column names. The standardized column names used by the Change-O format are shown in the table below. Most tools do not require every column. The columns required by, and added by, each individual tool are described in the commandline usage documentation. If a columns contains multiple entries, such as ambiguous V segment assignments, these nested extries are delimited by commas. The ordering of the columns does not matter.

An API for input and output of the Change-O format is provide in changeo.IO.ChangeoReader and changeo.IO.ChangeoWriter, respectively. See changeo.Receptor.Receptor for a description of the data structure used by the reader and writer classes.

Column Name Type Description
MakeDb - Standard Annotations    
SEQUENCE_ID string Unique sequence identifier
SEQUENCE_INPUT string Input nucleotide sequence
SEQUENCE_VDJ string V(D)J nucleotide sequence
SEQUENCE_IMGT string IMGT-numbered V(D)J nucleotide sequence
FUNCTIONAL logical T: V(D)J sequence is predicted to be productive
IN_FRAME logical T: junction region nucleotide sequence is in-frame
STOP logical T: stop codon is present in V(D)J nucleotide sequence
MUTATED_INVARIANT logical T: invariant amino acids properly encoded by V(D)J sequence
INDELS logical T: V(D)J nucleotide sequence contains insertions and/or deletions
V_CALL string V allele assignment(s)
D_CALL string D allele assignment(s)
J_CALL string J allele assignment(s)
V_SEQ_START integer Position of first V nucleotide in SEQUENCE_INPUT
V_SEQ_LENGTH integer Number of V nucleotides in SEQUENCE_INPUT
V_GERM_START_IMGT integer Position of V_SEQ_START in IMGT-numbered germline V(D)J sequence
V_GERM_LENGTH_IMGT integer Length of the IMGT numbered germline V alignment
NP1_LENGTH integer Number of nucleotides between V and D segments
D_SEQ_START integer Position of first D nucleotide in SEQUENCE_INPUT
D_SEQ_LENGTH integer Number of D nucleotides in SEQUENCE_INPUT
D_GERM_START integer Position of D_SEQ_START in germline V(D)J nucleotide sequence
D_GERM_LENGTH integer Length of the germline D alignment
NP2_LENGTH integer Number of nucleotides between D and J segments
J_SEQ_START integer Position of first J nucleotide in SEQUENCE_INPUT
J_SEQ_LENGTH integer Number of J nucleotides in SEQUENCE_INPUT
J_GERM_START integer Position of J_SEQ_START in germline V(D)J nucleotide sequence
J_GERM_LENGTH integer Length of the germline J alignment
JUNCTION_LENGTH integer Number of junction nucleotides in SEQUENCE_VDJ
JUNCTION string Junction region nucletide sequence
MakeDb - Optional Region Annotations    
FWR1_IMGT string IMGT-numbered FWR1 nucleotide sequence
FWR2_IMGT string IMGT-numbered FWR2 nucleotide sequence
FWR3_IMGT string IMGT-numbered FWR3 nucleotide sequence
FWR4_IMGT string IMGT-numbered FWR4 nucleotide sequence
CDR1_IMGT string IMGT-numbered CDR1 nucleotide sequence
CDR2_IMGT string IMGT-numbered CDR2 nucleotide sequence
CDR3_IMGT string IMGT-numbered CDR3 nucleotide sequence
MakeDb - IMGT Optional Annotations    
N1_LENGTH integer Untemplated nucleotides 5’ of the D segment
N2_LENGTH integer Untemplated Nucleotides 3’ of the D segment
P3V_LENGTH integer Palindromic nucleotides 3’ of the V segment
P5D_LENGTH integer Palindromic nucleotides 5’ of the D segment
P3D_LENGTH integer Palindromic nucleotides 3’ of the D segment
P5J_LENGTH integer Palindromic nucleotides 5’ of the J segment
D_FRAME integer D segment reading frame
V_SCORE float Alignment score for the V
V_IDENTITY float Alignment identity for the V
D_SCORE float Alignment score for the D
D_IDENTITY float Alignment identity for the D
J_SCORE float Alignment score for the J
J_IDENTITY float Alignment identity for the J
MakeDb - IgBLAST Optional Annotations    
CDR3_IGBLAST string CDR3 nucleotide sequence assigned by IgBLAST
CDR3_IGBLAST_AA string CDR3 amino acid sequence assigned by IgBLAST
V_SCORE float Alignment score for the V
V_IDENTITY float Alignment identity for the V
V_EVALUE float E-value for the alignment of the V
V_CIGAR string CIGAR string for the alignment of the V
D_SCORE float Alignment score for the D
D_IDENTITY float Alignment identity for the D
D_EVALUE float E-value for the alignment of the D
D_CIGAR string CIGAR string for the alignment of the D
J_SCORE float Alignment score for the J
J_IDENTITY float Alignment identity for the J
J_EVALUE float E-value for the alignment of the J
J_CIGAR string CIGAR string for the alignment of the J
MakeDb - iHHMune-Align Optional Annotations    
VDJ_SCORE float Alignment score for the V(D)J
CreateGermlines Annotations    
GERMLINE_VDJ string Full unaligned germline V(D)J nucleotide sequence
GERMLINE_VDJ_V_REGION string Unaligned germline V segment nucleotide sequence
GERMLINE_VDJ_D_MASK string Unaligned germline V(D)J nucleotides sequence with Ns masking the NP1-D-NP2 regions
GERMLINE_IMGT string Full IMGT-numbered germline V(D)J nucleotide sequence
GERMLINE_IMGT_V_REGION string IMGT-numbered germline V segment nucleotide sequence
GERMLINE_IMGT_D_MASK string IMGT-numbered germline V(D)J nucleotide sequence with Ns masking the NP1-D-NP2 regions
GERMLINE_V_CALL string Clonal consensus germline V assignment
GERMLINE_D_CALL string Clonal consensus germline D assignment
GERMLINE_J_CALL string Clonal consensus germline J assignment
GERMLINE_REGIONS string String showing germline segments positions encoded as V, D, J, N, and P characters
DefineClones Annotations    
CLONE integer Clonal grouping identifier
TIgGER Annotations    
V_CALL_GENOTYPED string Adjusted V allele assignment(s) following TIgGER genotype inference
pRESTO Annotations    
PRCONS string UMI consensus primer
PRIMER string list of primers
CONSCOUNT integer number of reads contributing to the UMI consensus sequence
DUPCOUNT integer copy number of the sequence