changeo.IO

File I/O and parsers

class changeo.IO.AIRRReader(handle)

Bases: changeo.IO.TSVReader

An iterator to read and parse AIRR formatted data.

class changeo.IO.AIRRWriter(handle, fields=['sequence_id', 'sequence', 'sequence_alignment', 'germline_alignment', 'rev_comp', 'productive', 'stop_codon', 'vj_in_frame', 'v_call', 'd_call', 'j_call', 'junction', 'junction_length', 'junction_aa', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end'])

Bases: changeo.IO.TSVWriter

Writes AIRR formatted data.

writeReceptor(records)

Writes a row from a Receptor object

Parameters:records – a changeo.Receptor.Receptor object to write or an iterable of such objects.
Returns:None
class changeo.IO.ChangeoReader(handle)

Bases: changeo.IO.TSVReader

An iterator to read and parse Change-O formatted data.

class changeo.IO.ChangeoWriter(handle, fields=['SEQUENCE_ID', 'SEQUENCE_INPUT', 'FUNCTIONAL', 'IN_FRAME', 'STOP', 'MUTATED_INVARIANT', 'INDELS', 'V_CALL', 'D_CALL', 'J_CALL', 'SEQUENCE_VDJ', 'SEQUENCE_IMGT', 'V_SEQ_START', 'V_SEQ_LENGTH', 'V_GERM_START_VDJ', 'V_GERM_LENGTH_VDJ', 'V_GERM_START_IMGT', 'V_GERM_LENGTH_IMGT', 'NP1_LENGTH', 'D_SEQ_START', 'D_SEQ_LENGTH', 'D_GERM_START', 'D_GERM_LENGTH', 'NP2_LENGTH', 'J_SEQ_START', 'J_SEQ_LENGTH', 'J_GERM_START', 'J_GERM_LENGTH', 'JUNCTION', 'JUNCTION_LENGTH', 'GERMLINE_IMGT'], header=True)

Bases: changeo.IO.TSVWriter

Writes Change-O formatted data.

writeReceptor(records)

Writes a row from a Receptor object

Parameters:records – a changeo.Receptor.Receptor object to write or an iterable of such objects.
Returns:None
class changeo.IO.IHMMuneReader(ihmmune, sequences, references, receptor=True)

Bases: object

An iterator to read and parse iHMMune-Align output files.

__iter__()

Iterator initializer.

Returns:changeo.IO.IHMMuneReader
__next__()

Next method.

Returns:parsed iHMMune-Align result as a Receptor (receptor=True) or dictionary (receptor=False).
Return type:changeo.Receptor.Receptor
static customFields(scores=False, regions=False, schema=None)

Returns non-standard Receptor attributes defined by the parser

Parameters:
  • scores – if True include alignment scoring fields.
  • regions – if True include IMGT-gapped CDR and FWR region fields.
  • schema – schema class to pass fields through for conversion. If None, return changeo.Receptor.Receptor attribute names.
Returns:

list of field names.

Return type:

list

ihmmune_fields = ['SEQUENCE_ID', 'V_CALL', 'D_CALL', 'J_CALL', 'V_SEQ', 'NP1_SEQ', 'D_SEQ', 'NP2_SEQ', 'J_SEQ', 'V_MUT', 'D_MUT', 'J_MUT', 'NX_COUNT', 'J_INFRAME', 'V_SEQ_START', 'STOP_COUNT', 'D_PROB', 'HMM_SCORE', 'RC', 'COMMON_MUT', 'COMMON_NX_COUNT', 'V_SEQ_START', 'V_SEQ_LENGTH', 'A_SCORE']
parseRecord(record)

Parses a single row from the iHMMune-Align file.

Parameters:record – dictionary containing one row of the iHMMune-Align file.
Returns:database entry for the row.
Return type:dict
class changeo.IO.IMGTReader(summary, gapped, ntseq, junction, receptor=True)

Bases: object

An iterator to read and parse IMGT output files.

__iter__()

Iterator initializer.

Returns:changeo.IO.IMGTReader
__next__()

Next method.

Returns:parsed IMGT/HighV-QUEST result as a Receptor (receptor=True) or dictionary (receptor=False).
Return type:changeo.Receptor.Receptor
static customFields(scores=False, regions=False, junction=False, schema=None)

Returns non-standard fields defined by the parser

Parameters:
  • scores – if True include alignment scoring fields.
  • regions – if True include IMGT-gapped CDR and FWR region fields.
  • junction – if True include detailed junction annotation fields.
  • schema – schema class to pass fields through for conversion. If None, return changeo.Receptor.Receptor attribute names.
Returns:

list of field names.

Return type:

list

parseRecord(summary, gapped, ntseq, junction)

Parses a single row from each IMGT file.

Parameters:
  • summary – dictionary containing one row of the ‘1_Summary’ file.
  • gapped – dictionary containing one row of the ‘2_IMGT-gapped-nt-sequences’ file.
  • ntseq – dictionary containing one row of the ‘3_Nt-sequences’ file.
  • junction – dictionary containing one row of the ‘6_Junction’ file.
Returns:

database entry for the row.

Return type:

dict

class changeo.IO.IgBLASTReader(igblast, sequences, references, asis_calls=False, receptor=True)

Bases: object

An iterator to read and parse IgBLAST output files.

__iter__()

Iterator initializer.

Returns:changeo.IO.IgBLASTReader
__next__()

Next method.

Returns:parsed IgBLAST result as a Receptor (receptor=True) or dictionary (receptor=False).
Return type:changeo.Receptor.Receptor
static customFields(scores=False, regions=False, cdr3=False, schema=None)

Returns non-standard fields defined by the parser

Parameters:
  • scores – if True include alignment scoring fields.
  • regions – if True include IMGT-gapped CDR and FWR region fields.
  • cdr3 – if True include IgBLAST CDR3 assignment fields.
  • schema – schema class to pass fields through for conversion. If None, return changeo.Receptor.Receptor attribute names.
Returns:

list of field names.

Return type:

list

parseBlock(block)

Parses an IgBLAST result into separate sections

Parameters:block (iter) – an iterator from itertools.groupby containing a single IgBLAST result.
Returns:
a parsed results block with the keys ‘query’ (sequence identifier as a string), ‘summary’ (dictionary of the alignment summary), ‘subregion’ (dictionary of IgBLAST CDR3 sequences), and ‘hits’ (VDJ hit table as a list of dictionaries). Returns None if the block has no data that can be parsed.
Return type:dict
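The block argument is described as an iterator from itertools.groupby. As a rough sketch of how such per-result groups can be produced, the example below splits a list of lines on a delimiter line. The '# IGBLASTN' delimiter, the '# Query: ' prefix, and the hit rows are assumptions made for illustration and are not taken from the parser itself.

```python
from itertools import groupby

# Made-up lines standing in for two consecutive results
lines = [
    '# IGBLASTN',
    '# Query: seq1',
    'V\tseq1\tIGHV1-2*02',
    '# IGBLASTN',
    '# Query: seq2',
    'V\tseq2\tIGHV3-23*01',
]


def is_body(line):
    # Assumption: a line starting with '# IGBLASTN' opens a new result
    return not line.startswith('# IGBLASTN')


# Keep only the groups of body lines; each group is one result block
blocks = [list(group) for key, group in groupby(lines, key=is_body) if key]

# Pull the sequence identifier from the first line of each block
queries = [block[0].split(': ', 1)[1] for block in blocks]
```

Each element of blocks then plays the role of a single result that a block parser would split into its sections.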
parseSections(sections)

Parses IgBLAST sections into a db dictionary

Parameters:sections – dictionary of parsed sections from parseBlock.
Returns:db entries.
Return type:dict
class changeo.IO.TSVReader(handle)

Bases: object

Simple csv.DictReader wrapper to read format agnostic TSV files.

reader

iter – reader object.

fields

list – field names.

__iter__()

Iterator initializer

Returns:changeo.IO.TSVReader
__next__()

Next method

Returns:row as a dictionary of field:value pairs.
Return type:dict
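To illustrate the idea of a csv.DictReader wrapper with reader and fields attributes, here is a minimal standard-library sketch. SimpleTSVReader is a hypothetical stand-in, not the changeo class, and the input table is made up.

```python
import csv
import io


class SimpleTSVReader:
    """Hypothetical csv.DictReader wrapper; not the changeo implementation."""

    def __init__(self, handle):
        # Tab-delimited reader; the first row is taken as the header
        self.reader = csv.DictReader(handle, dialect='excel-tab')
        self.fields = self.reader.fieldnames

    def __iter__(self):
        return self

    def __next__(self):
        # Each row comes back as a dictionary of field:value pairs
        return dict(next(self.reader))


# Made-up two-record table for illustration
handle = io.StringIO('SEQUENCE_ID\tV_CALL\nseq1\tIGHV1-2*02\nseq2\tIGHV3-23*01\n')
tsv_reader = SimpleTSVReader(handle)
tsv_fields = tsv_reader.fields
tsv_rows = list(tsv_reader)
```

Iteration stops when the underlying csv.DictReader is exhausted, because its StopIteration propagates through __next__.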
class changeo.IO.TSVWriter(handle, fields, header=True)

Bases: object

Simple csv.DictWriter wrapper to write format agnostic TSV files.

writeDict(records)

Writes a row from a dictionary

Parameters:records – dictionary of row data or an iterable of such objects.
Returns:None
writeHeader()

Writes the header

Returns:None
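The writer side can be sketched the same way with csv.DictWriter. SimpleTSVWriter below is a hypothetical stand-in, not the changeo class. Writing the header at construction when header=True, ignoring keys that are not in fields, and using '\n' line endings are all assumptions of this sketch.

```python
import csv
import io


class SimpleTSVWriter:
    """Hypothetical csv.DictWriter wrapper; not the changeo implementation."""

    def __init__(self, handle, fields, header=True):
        # Assumption: keys missing from `fields` are silently dropped
        self.writer = csv.DictWriter(handle, fieldnames=fields, dialect='excel-tab',
                                     extrasaction='ignore', lineterminator='\n')
        # Assumption: the header is written immediately when header=True
        if header:
            self.writeHeader()

    def writeHeader(self):
        self.writer.writeheader()

    def writeDict(self, records):
        # Accept either one dictionary or an iterable of dictionaries
        if isinstance(records, dict):
            self.writer.writerow(records)
        else:
            self.writer.writerows(records)


out = io.StringIO()
tsv_writer = SimpleTSVWriter(out, fields=['SEQUENCE_ID', 'V_CALL'])
tsv_writer.writeDict({'SEQUENCE_ID': 'seq1', 'V_CALL': 'IGHV1-2*02'})
tsv_writer.writeDict([{'SEQUENCE_ID': 'seq2', 'V_CALL': 'IGHV3-23*01'}])
tsv_text = out.getvalue()
```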
changeo.IO.checkFields(attributes, header, schema=<class 'changeo.Receptor.ChangeoSchema'>)

Checks that a file header contains a required set of Receptor attributes

Parameters:
  • attributes (list) – list of Receptor attributes to check for.
  • header (list) – list of field names in the file header.
  • schema (object) – schema object to convert field names to Receptor attributes.
Returns:

True if the fields mapping to all attributes are found.

Return type:

bool

Raises:

LookupError
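The check-and-raise pattern described above might look roughly like the following. check_fields_sketch is a hypothetical helper, and the default upper-casing of attribute names stands in for the schema conversion, which is not reproduced here.

```python
def check_fields_sketch(attributes, header, to_field=None):
    """Hypothetical helper: verify that every required attribute has a header field."""
    # Assumption: with no converter given, an attribute maps to its upper-case name
    to_field = to_field or (lambda attribute: attribute.upper())
    missing = [to_field(a) for a in attributes if to_field(a) not in header]
    if missing:
        raise LookupError('Missing required fields: %s' % ', '.join(missing))
    return True


header = ['SEQUENCE_ID', 'V_CALL', 'J_CALL']
fields_ok = check_fields_sketch(['sequence_id', 'v_call'], header)

# A missing field is reported through LookupError rather than a False return
try:
    check_fields_sketch(['junction'], header)
    lookup_raised = False
except LookupError:
    lookup_raised = True
```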

changeo.IO.countDbFile(file)

Counts the records in database files

Parameters:file – tab-delimited database file.
Returns:count of records in the database file.
Return type:int
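As a small illustration of counting records in a tab-delimited file, the sketch below counts data rows with csv.DictReader. count_records_sketch is a hypothetical helper, and treating the header row as not being a record is an assumption.

```python
import csv
import os
import tempfile


def count_records_sketch(file):
    """Hypothetical helper: count data rows, excluding the header row."""
    with open(file, newline='') as handle:
        return sum(1 for _ in csv.DictReader(handle, dialect='excel-tab'))


# Write a made-up two-record file to a temporary directory and count it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.tab')
    with open(path, 'w', newline='') as handle:
        handle.write('SEQUENCE_ID\tV_CALL\nseq1\tIGHV1-2*02\nseq2\tIGHV3-23*01\n')
    record_count = count_records_sketch(path)
```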
changeo.IO.extractIMGT(imgt_output)

Extract necessary files from IMGT/HighV-QUEST results.

Parameters:imgt_output – zipped file or unzipped folder output by IMGT/HighV-QUEST.
Returns:(temporary directory handle, dictionary with names of extracted IMGT files).
Return type:tuple
changeo.IO.getDbFields(file, add=None, exclude=None, reader=<class 'changeo.IO.TSVReader'>)

Get field names from a db file

Parameters:
  • file – db file to pull base fields from.
  • add – fields to append to the field set.
  • exclude – fields to exclude from the field set.
  • reader – reader class.
Returns:

list of field names

Return type:

list
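One way to picture how add and exclude combine is the sketch below. get_fields_sketch is a hypothetical helper that reads from an open handle rather than a file name. The order of operations (exclude first, then append) and the skipping of duplicate additions are assumptions.

```python
import csv
import io


def get_fields_sketch(handle, add=None, exclude=None):
    """Hypothetical helper: header fields with optional removals and additions."""
    fields = list(csv.DictReader(handle, dialect='excel-tab').fieldnames or [])
    # Assumption: exclusions are applied before additions
    if exclude is not None:
        fields = [f for f in fields if f not in exclude]
    # Assumption: fields already present are not appended a second time
    if add is not None:
        fields.extend([f for f in add if f not in fields])
    return fields


handle = io.StringIO('SEQUENCE_ID\tV_CALL\tJ_CALL\nseq1\tIGHV1-2*02\tIGHJ4*02\n')
db_fields = get_fields_sketch(handle, add=['CLONE', 'V_CALL'], exclude=['J_CALL'])
```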

changeo.IO.getFormatOperators(format)

Simple wrapper for fetching the set of operator classes for a data format

Parameters:format (str) – name of the data format.
Returns:a tuple with the reader class, writer class, and schema definition class.
Return type:tuple
changeo.IO.getOutputHandle(file, out_label=None, out_dir=None, out_name=None, out_type=None)

Opens an output file handle

Parameters:
  • file – filename to base output file name on.
  • out_label – text to be inserted before the file extension; if None do not add a label.
  • out_type – the file extension of the output file; if None use input file extension.
  • out_dir – the output directory; if None use directory of input file
  • out_name – the short filename to use for the output file; if None use input file short name.
Returns:

File handle

Return type:

file

changeo.IO.getOutputName(file, out_label=None, out_dir=None, out_name=None, out_type=None)

Creates an output filename from an existing filename

Parameters:
  • file – filename to base output file name on.
  • out_label – text to be inserted before the file extension; if None do not add a label.
  • out_type – the file extension of the output file; if None use input file extension.
  • out_dir – the output directory; if None use directory of input file
  • out_name – the short filename to use for the output file; if None use input file short name.
Returns:

file name.

Return type:

str
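The parameter defaults above can be sketched with os.path. output_name_sketch is a hypothetical helper, not the changeo function. Joining the label to the short name with an underscore is an assumption, and the sketch does not create the output directory.

```python
import os


def output_name_sketch(file, out_label=None, out_dir=None, out_name=None, out_type=None):
    """Hypothetical helper: derive an output file name from an input file name."""
    dir_name, file_name = os.path.split(file)
    short_name, ext = os.path.splitext(file_name)
    # Fall back to the input file's directory, short name, and extension
    out_dir = dir_name if out_dir is None else out_dir
    out_name = short_name if out_name is None else out_name
    out_type = ext.lstrip('.') if out_type is None else out_type
    # Assumption: the label is appended to the short name after an underscore
    stem = out_name if out_label is None else '%s_%s' % (out_name, out_label)
    return os.path.join(out_dir, '%s.%s' % (stem, out_type))


source = os.path.join('data', 'sample.tab')
labeled_name = output_name_sketch(source, out_label='db-pass')
moved_name = output_name_sketch(source, out_dir='results', out_type='tsv')
```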

changeo.IO.readGermlines(repo, asis=False)

Parses germline repositories

Parameters:
  • repo – list of strings specifying directories and/or files from which to read germline records.
  • asis – if True use sequence ID as record name and do not parse headers for allele names.
Returns:

Dictionary of {allele: sequence} germlines

Return type:

dict
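The effect of asis on record names can be illustrated with a small FASTA parser. read_fasta_sketch is a hypothetical helper that reads one open handle instead of a list of directories and files. Taking the allele name from the second '|'-delimited header field is an assumption, and the records are made up.

```python
import io


def read_fasta_sketch(handle, asis=False):
    """Hypothetical helper: build an {allele: sequence} dictionary from FASTA text."""
    germlines = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            header = line[1:]
            parts = header.split('|')
            # Assumption: the allele is the second pipe-delimited field;
            # with asis=True the whole header is used as the record name
            name = header if asis or len(parts) < 2 else parts[1]
            germlines[name] = ''
        elif name is not None:
            # Sequence lines are concatenated under the current record name
            germlines[name] += line
    return germlines


# Made-up records for illustration
fasta = '>ACC001|IGHV1-2*02|Homo sapiens\nCAGGTG\nCAGCTG\n>IGHJ4*02\nACTACT\n'
parsed_germlines = read_fasta_sketch(io.StringIO(fasta))
asis_germlines = read_fasta_sketch(io.StringIO(fasta), asis=True)
```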

changeo.IO.splitName(file)

Splits a file name into its directory, basename and extension

Parameters:file (str) – file name.
Returns:tuple of the file directory, basename and extension.
Return type:tuple
changeo.IO.yamlDict(file)

Returns a dictionary from a yaml file

Parameters:file (str) – simple yaml file with rows in the form ‘argument: value’.
Returns:dictionary of key:value pairs in the file.
Return type:dict
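For a file limited to rows of the form ‘argument: value’, the resulting dictionary can be pictured with the standard-library sketch below. simple_yaml_dict is a hypothetical helper that reads an open handle and keeps every value as a string, whereas a real YAML parser would also convert types. Skipping blank lines and '#' comment lines is an assumption.

```python
import io


def simple_yaml_dict(handle):
    """Hypothetical helper: parse 'argument: value' rows into a dictionary."""
    result = {}
    for line in handle:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are skipped
        if not line or line.startswith('#'):
            continue
        # Split on the first colon only, so values may contain colons
        key, _, value = line.partition(':')
        result[key.strip()] = value.strip()
    return result


handle = io.StringIO('format: changeo\n# a comment\nfailed: true\n')
yaml_args = simple_yaml_dict(handle)
```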