changeo.IO

File I/O and parsers

class changeo.IO.AIRRReader(handle)

Bases: changeo.IO.TSVReader

An iterator to read and parse AIRR formatted data.

class changeo.IO.AIRRWriter(handle, fields=['sequence_id', 'sequence', 'sequence_alignment', 'germline_alignment', 'rev_comp', 'productive', 'stop_codon', 'vj_in_frame', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'junction', 'junction_length', 'junction_aa', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end'])

Bases: changeo.IO.TSVWriter

Writes AIRR formatted data.

writeReceptor(records)

Writes a row from a Receptor object

Parameters

records – a changeo.Receptor.Receptor object to write or an iterable of such objects.

Returns

None

class changeo.IO.ChangeoReader(handle)

Bases: changeo.IO.TSVReader

An iterator to read and parse Change-O formatted data.

class changeo.IO.ChangeoWriter(handle, fields=['SEQUENCE_ID', 'SEQUENCE_INPUT', 'FUNCTIONAL', 'IN_FRAME', 'STOP', 'MUTATED_INVARIANT', 'INDELS', 'LOCUS', 'V_CALL', 'D_CALL', 'J_CALL', 'SEQUENCE_VDJ', 'SEQUENCE_IMGT', 'V_SEQ_START', 'V_SEQ_LENGTH', 'V_GERM_START_VDJ', 'V_GERM_LENGTH_VDJ', 'V_GERM_START_IMGT', 'V_GERM_LENGTH_IMGT', 'NP1_LENGTH', 'D_SEQ_START', 'D_SEQ_LENGTH', 'D_GERM_START', 'D_GERM_LENGTH', 'NP2_LENGTH', 'J_SEQ_START', 'J_SEQ_LENGTH', 'J_GERM_START', 'J_GERM_LENGTH', 'JUNCTION', 'JUNCTION_LENGTH', 'GERMLINE_IMGT'], header=True)

Bases: changeo.IO.TSVWriter

Writes Change-O formatted data.

writeReceptor(records)

Writes a row from a Receptor object

Parameters

records – a changeo.Receptor.Receptor object to write or an iterable of such objects.

Returns

None

class changeo.IO.IHMMuneReader(ihmmune, sequences, references, receptor=True)

Bases: object

An iterator to read and parse iHMMune-Align output files.

__iter__()

Iterator initializer.

Returns

changeo.IO.IHMMuneReader

__next__()

Next method.

Returns

parsed iHMMune-Align result as a Receptor (receptor=True) or dictionary (receptor=False).

Return type

changeo.Receptor.Receptor

static customFields(scores=False, regions=False, cell=False, schema=None)

Returns non-standard Receptor attributes defined by the parser

Parameters
  • scores – if True include alignment scoring fields.

  • regions – if True include IMGT-gapped CDR and FWR region fields.

  • schema – schema class to pass field through for conversion. If None, return changeo.Receptor.Receptor attribute names.

Returns

list of field names.

Return type

list

ihmmune_fields = ['SEQUENCE_ID', 'V_CALL', 'D_CALL', 'J_CALL', 'V_SEQ', 'NP1_SEQ', 'D_SEQ', 'NP2_SEQ', 'J_SEQ', 'V_MUT', 'D_MUT', 'J_MUT', 'NX_COUNT', 'J_INFRAME', 'V_SEQ_START', 'STOP_COUNT', 'D_PROB', 'HMM_SCORE', 'RC', 'COMMON_MUT', 'COMMON_NX_COUNT', 'V_SEQ_START', 'V_SEQ_LENGTH', 'A_SCORE']

parseRecord(record)

Parses a single row from an iHMMune-Align output file.

Parameters

record – dictionary containing one row of the iHMMune-Align file.

Returns

database entry for the row.

Return type

dict

class changeo.IO.IMGTReader(summary, gapped, ntseq, junction, receptor=True)

Bases: object

An iterator to read and parse IMGT output files.

__iter__()

Iterator initializer.

Returns

changeo.IO.IMGTReader

__next__()

Next method.

Returns

parsed IMGT/HighV-QUEST result as a Receptor (receptor=True) or dictionary (receptor=False).

Return type

changeo.Receptor.Receptor

static customFields(scores=False, regions=False, junction=False, schema=None)

Returns non-standard fields defined by the parser

Parameters
  • scores – if True include alignment scoring fields.

  • regions – if True include IMGT-gapped CDR and FWR region fields.

  • junction – if True include detailed junction annotation fields.

  • schema – schema class to pass field through for conversion. If None, return changeo.Receptor.Receptor attribute names.

Returns

list of field names.

Return type

list

parseRecord(summary, gapped, ntseq, junction)

Parses a single row from each IMGT file.

Parameters
  • summary – dictionary containing one row of the ‘1_Summary’ file.

  • gapped – dictionary containing one row of the ‘2_IMGT-gapped-nt-sequences’ file.

  • ntseq – dictionary containing one row of the ‘3_Nt-sequences’ file.

  • junction – dictionary containing one row of the ‘6_Junction’ file.

Returns

database entry for the row.

Return type

dict

class changeo.IO.IgBLASTReader(igblast, sequences, references, asis_calls=False, regions='default', receptor=True, infer_junction=False)

Bases: object

An iterator to read and parse IgBLAST output files

__iter__()

Iterator initializer.

Returns

changeo.IO.IgBLASTReader

__next__()

Next method.

Returns

parsed IgBLAST result as a Receptor (receptor=True) or dictionary (receptor=False).

Return type

changeo.Receptor.Receptor

static customFields(schema=None)

Returns non-standard fields defined by the parser

Parameters

schema – schema class to pass field through for conversion. If None, return changeo.Receptor.Receptor attribute names.

Returns

list of field names.

Return type

list

parseBlock(block)

Parses an IgBLAST result into separate sections

Parameters

block (iter) – an iterator from itertools.groupby containing a single IgBLAST result.

Returns

a parsed results block with the keys ‘query’ (sequence identifier as a string), ‘summary’ (dictionary of the alignment summary), ‘subregion’ (dictionary of IgBLAST CDR3 sequences), and ‘hits’ (VDJ hit table as a list of dictionaries). Returns None if the block has no data that can be parsed.

Return type

dict
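
Example: the block argument is described as an iterator from itertools.groupby. The following is a minimal sketch of how groupby can split a stream of lines into one block per result. It is not changeo's code: the '# IGBLASTN' marker line, the sample rows, and the is_body helper are assumptions made for illustration.

```python
from itertools import groupby

# Illustrative lines only; real IgBLAST output carries many more sections.
lines = [
    '# IGBLASTN',
    '# Query: seq1',
    'V\tseq1\tIGHV1-2*01',
    '# IGBLASTN',
    '# Query: seq2',
    'V\tseq2\tIGHV3-23*01',
]


def is_body(line):
    # False for the marker line that opens a result, True for everything else.
    return not line.startswith('# IGBLASTN')


# groupby yields alternating runs of marker lines and body lines; keep the bodies.
blocks = [list(group) for key, group in groupby(lines, key=is_body) if key]

# Pull the query identifier out of the first line of each block.
queries = [block[0].split(': ', 1)[1] for block in blocks]
```

Each element of blocks then holds the lines of a single result, which is the shape of input that a block parser would consume.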

parseSections(sections)

Parses IgBLAST sections into a db dictionary

Parameters

sections – dictionary of parsed sections from parseBlock.

Returns

db entries.

Return type

dict

class changeo.IO.IgBLASTReaderAA(igblast, sequences, references, asis_calls=False, regions='default', receptor=True, infer_junction=False)

Bases: changeo.IO.IgBLASTReader

An iterator to read and parse IgBLAST amino acid alignment output files

static customFields(schema=None)

Returns non-standard fields defined by the parser

Parameters

schema – schema class to pass field through for conversion. If None, return changeo.Receptor.Receptor attribute names.

Returns

list of field names.

Return type

list

parseSections(sections)

Parses IgBLAST sections into a db dictionary

Parameters

sections – dictionary of parsed sections from parseBlock.

Returns

db entries.

Return type

dict

class changeo.IO.TSVReader(handle)

Bases: object

Simple csv.DictReader wrapper to read format agnostic TSV files.

reader

reader object.

Type

iter

fields

field names.

Type

list

__iter__()

Iterator initializer

Returns

changeo.IO.TSVReader

__next__()

Next method

Returns

row as a dictionary of field:value pairs.

Return type

dict
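
Example: a minimal sketch of the wrapper pattern described above, built only on the standard library. SimpleTSVReader is a hypothetical stand-in and not the changeo class; the choice of the 'excel-tab' dialect is an assumption.

```python
import csv
import io


class SimpleTSVReader:
    """Hypothetical stand-in: wraps csv.DictReader for tab-delimited input."""

    def __init__(self, handle):
        # 'excel-tab' is the csv module's built-in tab-delimited dialect.
        self.reader = csv.DictReader(handle, dialect='excel-tab')
        # Field names are read from the first line of the file.
        self.fields = list(self.reader.fieldnames or [])

    def __iter__(self):
        return self

    def __next__(self):
        # Each row is a dictionary of field:value pairs; StopIteration
        # propagates from the underlying reader at end of file.
        return dict(next(self.reader))


handle = io.StringIO("sequence_id\tv_call\nseq1\tIGHV1-2*01\n")
tsv_reader = SimpleTSVReader(handle)
rows = list(tsv_reader)
```

Because the object is its own iterator, it can be consumed once with a for loop or list(), like an open file handle.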

class changeo.IO.TSVWriter(handle, fields, header=True)

Bases: object

Simple csv.DictWriter wrapper to write format agnostic TSV files.

writeDict(records)

Writes a row from a dictionary

Parameters

records – dictionary of row data or an iterable of such objects.

Returns

None

writeHeader()

Writes the header

Returns

None
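
Example: a minimal standard-library sketch of a csv.DictWriter wrapper whose write method accepts either one dictionary or an iterable of dictionaries. SimpleTSVWriter is a hypothetical stand-in, not the changeo class; ignoring unknown keys and using '\n' line endings are assumptions of this sketch.

```python
import csv
import io


class SimpleTSVWriter:
    """Hypothetical stand-in: wraps csv.DictWriter for tab-delimited output."""

    def __init__(self, handle, fields, header=True):
        self.fields = fields
        # extrasaction='ignore' drops keys not listed in fields (an assumption
        # made for this sketch).
        self.writer = csv.DictWriter(handle, fieldnames=fields,
                                     dialect='excel-tab',
                                     extrasaction='ignore',
                                     lineterminator='\n')
        if header:
            self.writeHeader()

    def writeHeader(self):
        self.writer.writeheader()

    def writeDict(self, records):
        # Accept either one dictionary or an iterable of dictionaries.
        if isinstance(records, dict):
            self.writer.writerow(records)
        else:
            self.writer.writerows(records)


out = io.StringIO()
writer = SimpleTSVWriter(out, fields=['sequence_id', 'v_call'])
writer.writeDict({'sequence_id': 'seq1', 'v_call': 'IGHV1-2*01'})
writer.writeDict([{'sequence_id': 'seq2', 'v_call': 'IGHV3-23*01', 'extra': 1}])
text = out.getvalue()
```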

changeo.IO.checkFields(attributes, header, schema=<class 'changeo.Receptor.AIRRSchema'>)

Checks that a file header contains a required set of Receptor attributes

Parameters
  • attributes (list) – list of Receptor attributes to check for.

  • header (list) – list of field names in the file header.

  • schema (object) – schema object to convert field names to Receptor attributes.

Returns

True if fields mapping to all of the attributes are found.

Return type

bool

Raises

LookupError
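
Example: a minimal sketch of the check described above. check_fields is a hypothetical helper, and a plain dictionary stands in for the schema object; the real schema classes are not reproduced here.

```python
def check_fields(attributes, header, to_field):
    """Hypothetical sketch: verify that header holds a field for each attribute.

    to_field is a plain dict mapping attribute names to file column names; it
    stands in for the schema object.
    """
    missing = [to_field.get(a, a) for a in attributes
               if to_field.get(a, a) not in header]
    if missing:
        raise LookupError('Missing required fields: %s' % ', '.join(missing))
    return True


# Illustrative mapping only; not the real schema definition.
mapping = {'v_call': 'V_CALL', 'junction': 'JUNCTION'}
ok = check_fields(['v_call', 'junction'],
                  ['SEQUENCE_ID', 'V_CALL', 'JUNCTION'], mapping)

try:
    check_fields(['v_call', 'junction'], ['SEQUENCE_ID', 'V_CALL'], mapping)
    error = None
except LookupError as e:
    error = str(e)
```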

changeo.IO.countDbFile(file)

Counts the records in database files

Parameters

file – tab-delimited database file.

Returns

count of records in the database file.

Return type

int
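
Example: a minimal sketch of counting records in a tab-delimited file, assuming the first line is a header and every following row is one record. count_records is a hypothetical helper, not the changeo function.

```python
import csv
import os
import tempfile


def count_records(path):
    """Hypothetical sketch: count data rows in a tab-delimited file with a header."""
    with open(path, 'rt', newline='') as handle:
        reader = csv.reader(handle, dialect='excel-tab')
        next(reader, None)  # skip the header row, if any
        return sum(1 for _ in reader)


# Write a small two-record file to a temporary location and count it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'db.tsv')
    with open(path, 'wt', newline='') as handle:
        handle.write('sequence_id\tv_call\nseq1\tIGHV1-2*01\nseq2\tIGHV3-23*01\n')
    n_records = count_records(path)
```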

changeo.IO.extractIMGT(imgt_output)

Extract necessary files from IMGT/HighV-QUEST results.

Parameters

imgt_output – zipped file or unzipped folder output by IMGT/HighV-QUEST.

Returns

(temporary directory handle, dictionary with names of extracted IMGT files).

Return type

tuple
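
Example: a minimal sketch of the return shape described above, a temporary directory handle paired with a dictionary of file names. extract_imgt is a hypothetical helper that assumes a .zip archive or a plain folder. The file name stems come from the IMGTReader.parseRecord parameter descriptions; the dictionary keys and the matching rule are assumptions.

```python
import os
import tempfile
import zipfile

# File name stems taken from the parseRecord description; the matching rule
# below (stem contained in the file name) is an assumption of this sketch.
IMGT_STEMS = {
    'summary': '1_Summary',
    'gapped': '2_IMGT-gapped-nt-sequences',
    'ntseq': '3_Nt-sequences',
    'junction': '6_Junction',
}


def extract_imgt(imgt_output):
    """Hypothetical sketch: locate the four IMGT files in a zip file or folder."""
    temp_dir = tempfile.TemporaryDirectory()
    if zipfile.is_zipfile(imgt_output):
        # Unpack the archive into the temporary directory.
        with zipfile.ZipFile(imgt_output) as archive:
            archive.extractall(temp_dir.name)
        search_dir = temp_dir.name
    elif os.path.isdir(imgt_output):
        search_dir = imgt_output
    else:
        temp_dir.cleanup()
        raise ValueError('Unsupported IMGT output: %s' % imgt_output)

    files = {}
    for root, _, names in os.walk(search_dir):
        for name in names:
            for key, stem in IMGT_STEMS.items():
                if stem in name:
                    files[key] = os.path.join(root, name)
    return temp_dir, files


# Build a tiny archive to exercise the sketch.
with tempfile.TemporaryDirectory() as work:
    archive_path = os.path.join(work, 'imgt.zip')
    with zipfile.ZipFile(archive_path, 'w') as archive:
        for stem in IMGT_STEMS.values():
            archive.writestr(stem + '.txt', 'Sequence ID\n')
    temp_dir, imgt_files = extract_imgt(archive_path)
    found = sorted(imgt_files)
    all_exist = all(os.path.isfile(p) for p in imgt_files.values())
    temp_dir.cleanup()
```

Returning the directory handle alongside the file names lets the caller decide when the extracted files are removed.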

changeo.IO.getDbFields(file, add=None, exclude=None, reader=<class 'changeo.IO.TSVReader'>)

Get field names from a db file

Parameters
  • file – db file to pull base fields from.

  • add – fields to append to the field set.

  • exclude – fields to exclude from the field set.

  • reader – reader class.

Returns

list of field names

Return type

list
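
Example: a minimal sketch of building a field list from a file header with additions and exclusions. db_fields is a hypothetical helper that takes an open handle rather than a file name; skipping duplicate additions and preserving header order are assumptions of this sketch.

```python
import csv
import io


def db_fields(handle, add=None, exclude=None):
    """Hypothetical sketch: header fields plus additions, minus exclusions."""
    fields = list(csv.DictReader(handle, dialect='excel-tab').fieldnames or [])
    # Append new fields, skipping any that are already present.
    for name in (add or []):
        if name not in fields:
            fields.append(name)
    # Drop excluded fields while preserving the original order.
    if exclude:
        fields = [name for name in fields if name not in exclude]
    return fields


handle = io.StringIO("sequence_id\tv_call\tj_call\n")
fields = db_fields(handle, add=['clone_id', 'v_call'], exclude=['j_call'])
```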

changeo.IO.getFormatOperators(format)

Simple wrapper for fetching the set of operator classes for a data format

Parameters

format (str) – name of the data format.

Returns

a tuple with the reader class, writer class, and schema definition class.

Return type

tuple

changeo.IO.getOutputHandle(file, out_label=None, out_dir=None, out_name=None, out_type=None)

Opens an output file handle

Parameters
  • file – filename to base output file name on.

  • out_label – text to be inserted before the file extension; if None do not add a label.

  • out_type – the file extension of the output file; if None use input file extension.

  • out_dir – the output directory; if None use directory of input file

  • out_name – the short filename to use for the output file; if None use input file short name.

Returns

File handle

Return type

file

changeo.IO.getOutputName(file, out_label=None, out_dir=None, out_name=None, out_type=None)

Creates an output filename from an existing filename

Parameters
  • file – filename to base output file name on.

  • out_label – text to be inserted before the file extension; if None do not add a label.

  • out_type – the file extension of the output file; if None use input file extension.

  • out_dir – the output directory; if None use directory of input file

  • out_name – the short filename to use for the output file; if None use input file short name.

Returns

file name.

Return type

str
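
Example: a minimal sketch of how the None defaults described above could resolve to parts of the input file name. output_name is a hypothetical helper, not the changeo function; the underscore that joins the label to the short name is an assumption.

```python
import os


def output_name(file, out_label=None, out_dir=None, out_name=None, out_type=None):
    """Hypothetical sketch of deriving an output file name from an input one."""
    directory, filename = os.path.split(file)
    short_name, extension = os.path.splitext(filename)

    # Fall back to the input file's parts wherever an argument is None.
    if out_dir is None:
        out_dir = directory
    if out_name is None:
        out_name = short_name
    if out_type is None:
        out_type = extension.lstrip('.')

    # Assumption: the label is joined to the short name with an underscore.
    if out_label is not None:
        out_name = '%s_%s' % (out_name, out_label)
    return os.path.join(out_dir, '%s.%s' % (out_name, out_type))


name_a = output_name(os.path.join('data', 'sample.tsv'), out_label='db-pass')
name_b = output_name(os.path.join('data', 'sample.fasta'),
                     out_dir='results', out_name='run1', out_type='tsv')
```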

changeo.IO.readGermlines(references, asis=False, warn=False)

Parses germline repositories

Parameters
  • references (list) – list of strings specifying directories and/or files from which to read germline records.

  • asis (bool) – if True use sequence ID as record name and do not parse headers for allele names.

  • warn (bool) – print warning messages to standard error if True.

Returns

Dictionary of germlines in the form {allele: sequence}.

Return type

dict
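
Example: a minimal sketch of building an {allele: sequence} dictionary from FASTA text, including the effect of an asis-style flag. read_fasta_germlines is a hypothetical helper that reads from an open handle; the allele pattern and the sample records are assumptions, not changeo's parsing rules.

```python
import io
import re


def read_fasta_germlines(handle, asis=False):
    """Hypothetical sketch: build {allele: sequence} from FASTA text."""
    # Assumed allele pattern for this sketch, e.g. IGHV1-2*01 or TRBJ2-7*01.
    allele_regex = re.compile(r'((IG[HKL]|TR[ABDG])[VDJ][A-Z0-9-]*\*[0-9]+)')
    germlines, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            seq_id = line[1:].split()[0]
            match = allele_regex.search(seq_id)
            # With asis=True, or when no allele name is found, keep the full ID.
            name = seq_id if asis or match is None else match.group(1)
            germlines[name] = ''
        elif name is not None:
            # Sequences may span several lines; concatenate them.
            germlines[name] += line
    return germlines


# Illustrative records only.
fasta = (">M99641|IGHV1-18*01|Homo_sapiens\nCAGGTTCAG\nCTGGTGCAG\n"
         ">IGHJ4*02\nACTACTTTGAC\n")
parsed = read_fasta_germlines(io.StringIO(fasta))
parsed_asis = read_fasta_germlines(io.StringIO(fasta), asis=True)
```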

changeo.IO.splitName(file)

Extract the extension from a file name

Parameters

file (str) – file name.

Returns

tuple of the file directory, basename and extension.

Return type

tuple
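
Example: a minimal sketch of splitting a path into the three parts listed above using os.path. split_name is a hypothetical helper; returning the extension without its leading dot is an assumption.

```python
import os


def split_name(file):
    """Hypothetical sketch: split a path into directory, basename and extension."""
    directory, filename = os.path.split(file)
    basename, extension = os.path.splitext(filename)
    # Assumption: the extension is returned without its leading dot.
    return directory, basename, extension.lstrip('.')


parts = split_name(os.path.join('results', 'sample_db-pass.tsv'))
```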

changeo.IO.yamlDict(file)

Returns a dictionary from a yaml file

Parameters

file (str) – simple yaml file with rows in the form ‘argument: value’.

Returns

dictionary of key:value pairs in the file.

Return type

dict
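
Example: a minimal sketch of reading rows in the form ‘argument: value’ into a dictionary without a YAML library. simple_yaml_dict is a hypothetical helper that reads from an open handle and keeps every value as a string, which a full YAML parser would not necessarily do.

```python
import io


def simple_yaml_dict(handle):
    """Hypothetical sketch: read 'argument: value' rows into a dictionary."""
    result = {}
    for line in handle:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith('#'):
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(':')
        result[key.strip()] = value.strip()
    return result


config = simple_yaml_dict(io.StringIO("# defaults\nformat: airr\nout_dir: results\n"))
```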