changeo.IO

File I/O and parsers

class changeo.IO.AIRRReader(handle)

Bases: TSVReader

An iterator to read and parse AIRR formatted data.

class changeo.IO.AIRRWriter(handle, fields=['sequence_id', 'sequence', 'sequence_alignment', 'germline_alignment', 'rev_comp', 'productive', 'stop_codon', 'vj_in_frame', 'locus', 'v_call', 'd_call', 'j_call', 'c_call', 'junction', 'junction_length', 'junction_aa', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end'])

Bases: TSVWriter

Writes AIRR formatted data.

writeReceptor(records)

Writes a row from a Receptor object

Parameters:

records – a changeo.Receptor.Receptor object to write or an iterable of such objects.

Returns:

None

class changeo.IO.ChangeoReader(handle)

Bases: TSVReader

An iterator to read and parse Change-O formatted data.

class changeo.IO.ChangeoWriter(handle, fields=['SEQUENCE_ID', 'SEQUENCE_INPUT', 'FUNCTIONAL', 'IN_FRAME', 'STOP', 'MUTATED_INVARIANT', 'INDELS', 'LOCUS', 'V_CALL', 'D_CALL', 'J_CALL', 'SEQUENCE_VDJ', 'SEQUENCE_IMGT', 'V_SEQ_START', 'V_SEQ_LENGTH', 'V_GERM_START_VDJ', 'V_GERM_LENGTH_VDJ', 'V_GERM_START_IMGT', 'V_GERM_LENGTH_IMGT', 'NP1_LENGTH', 'D_SEQ_START', 'D_SEQ_LENGTH', 'D_GERM_START', 'D_GERM_LENGTH', 'NP2_LENGTH', 'J_SEQ_START', 'J_SEQ_LENGTH', 'J_GERM_START', 'J_GERM_LENGTH', 'JUNCTION', 'JUNCTION_LENGTH', 'GERMLINE_IMGT'], header=True)

Bases: TSVWriter

Writes Change-O formatted data.

writeReceptor(records)

Writes a row from a Receptor object

Parameters:

records – a changeo.Receptor.Receptor object to write or an iterable of such objects.

Returns:

None

class changeo.IO.IHMMuneReader(ihmmune, sequences, references, receptor=True)

Bases: object

An iterator to read and parse iHMMune-Align output files.

__iter__()

Iterator initializer.

Returns:

changeo.IO.IHMMuneReader

__next__()

Next method.

Returns:

parsed iHMMune-Align result as a Receptor (receptor=True) or dictionary (receptor=False).

Return type:

changeo.Receptor.Receptor

static customFields(scores=False, regions=False, cell=False, schema=None)

Returns non-standard Receptor attributes defined by the parser

Parameters:
  • scores – if True include alignment scoring fields.

  • regions – if True include IMGT-gapped CDR and FWR region fields.

  • schema – schema class to pass field through for conversion. If None, return changeo.Receptor.Receptor attribute names.

Returns:

list of field names.

Return type:

list

ihmmune_fields = ['SEQUENCE_ID', 'V_CALL', 'D_CALL', 'J_CALL', 'V_SEQ', 'NP1_SEQ', 'D_SEQ', 'NP2_SEQ', 'J_SEQ', 'V_MUT', 'D_MUT', 'J_MUT', 'NX_COUNT', 'J_INFRAME', 'V_SEQ_START', 'STOP_COUNT', 'D_PROB', 'HMM_SCORE', 'RC', 'COMMON_MUT', 'COMMON_NX_COUNT', 'V_SEQ_START', 'V_SEQ_LENGTH', 'A_SCORE']

parseRecord(record)

Parses a single row from the iHMMune-Align file.

Parameters:

record – dictionary containing one row of iHMMune-Align file.

Returns:

database entry for the row.

Return type:

dict

class changeo.IO.IMGTReader(summary, gapped, ntseq, junction, receptor=True)

Bases: object

An iterator to read and parse IMGT output files.

__iter__()

Iterator initializer.

Returns:

changeo.IO.IMGTReader

__next__()

Next method.

Returns:

parsed IMGT/HighV-QUEST result as a Receptor (receptor=True) or dictionary (receptor=False).

Return type:

changeo.Receptor.Receptor

static customFields(scores=False, regions=False, junction=False, schema=None)

Returns non-standard fields defined by the parser

Parameters:
  • scores – if True include alignment scoring fields.

  • regions – if True include IMGT-gapped CDR and FWR region fields.

  • junction – if True include detailed junction annotation fields.

  • schema – schema class to pass field through for conversion. If None, return changeo.Receptor.Receptor attribute names.

Returns:

list of field names.

Return type:

list

parseRecord(summary, gapped, ntseq, junction)

Parses a single row from each IMGT file.

Parameters:
  • summary – dictionary containing one row of the ‘1_Summary’ file.

  • gapped – dictionary containing one row of the ‘2_IMGT-gapped-nt-sequences’ file.

  • ntseq – dictionary containing one row of the ‘3_Nt-sequences’ file.

  • junction – dictionary containing one row of the ‘6_Junction’ file.

Returns:

database entry for the row.

Return type:

dict

class changeo.IO.IgBLASTReader(igblast, sequences, references, asis_calls=False, regions='default', receptor=True, infer_junction=False)

Bases: object

An iterator to read and parse IgBLAST output files

__iter__()

Iterator initializer.

Returns:

changeo.IO.IgBLASTReader

__next__()

Next method.

Returns:

parsed IgBLAST result as a Receptor (receptor=True) or dictionary (receptor=False).

Return type:

changeo.Receptor.Receptor

static customFields(schema=None)

Returns non-standard fields defined by the parser

Parameters:

schema – schema class to pass field through for conversion. If None, return changeo.Receptor.Receptor attribute names.

Returns:

list of field names.

Return type:

list

parseBlock(block)

Parses an IgBLAST result into separate sections

Parameters:

block (iter) – an iterator from itertools.groupby containing a single IgBLAST result.

Returns:

a parsed results block with the keys ‘query’ (sequence identifier as a string), ‘summary’ (dictionary of the alignment summary), ‘subregion’ (dictionary of IgBLAST CDR3 sequences), and ‘hits’ (VDJ hit table as a list of dictionaries). Returns None if the block has no data that can be parsed.

Return type:

dict
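The block argument above is described as an iterator from itertools.groupby. As a rough illustration of how such blocks could be produced, the sketch below splits a stream of lines into one block per result. The function name iter_blocks, the '# IGBLAST' marker, and the toy input lines are all assumptions made for illustration; they are not taken from the changeo source or from real IgBLAST output.

```python
from itertools import groupby


def iter_blocks(lines, marker='# IGBLAST'):
    """Hypothetical sketch: yield one list of lines per result block.

    Assumes every result starts with a comment line beginning with `marker`.
    """
    # groupby alternates between runs of marker lines and runs of body lines;
    # only the body runs are kept.
    for is_body, group in groupby(lines, key=lambda x: not x.startswith(marker)):
        if is_body:
            yield [line.rstrip('\n') for line in group]


# Toy input, simplified for illustration only.
lines = [
    '# IGBLASTN\n',
    '# Query: seq1\n',
    'V\tseq1\tIGHV1-18*01\n',
    '# IGBLASTN\n',
    '# Query: seq2\n',
    'V\tseq2\tIGHV3-23*01\n',
]
blocks = list(iter_blocks(lines))
```

Each element of blocks holds the lines of a single result, which is the kind of unit a block parser would then break into sections.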

parseSections(sections)

Parses IgBLAST sections into a db dictionary

Parameters:

sections – dictionary of parsed sections from parseBlock.

Returns:

db entries.

Return type:

dict

class changeo.IO.IgBLASTReaderAA(igblast, sequences, references, asis_calls=False, regions='default', receptor=True, infer_junction=False)

Bases: IgBLASTReader

An iterator to read and parse IgBLAST amino acid alignment output files

static customFields(schema=None)

Returns non-standard fields defined by the parser

Parameters:

schema – schema class to pass field through for conversion. If None, return changeo.Receptor.Receptor attribute names.

Returns:

list of field names.

Return type:

list

parseSections(sections)

Parses IgBLAST sections into a db dictionary

Parameters:

sections – dictionary of parsed sections from parseBlock.

Returns:

db entries.

Return type:

dict

class changeo.IO.TSVReader(handle)

Bases: object

Simple csv.DictReader wrapper to read format agnostic TSV files.

reader

reader object.

Type:

iter

fields

field names.

Type:

list

__iter__()

Iterator initializer

Returns:

changeo.IO.TSVReader

__next__()

Next method

Returns:

row as a dictionary of field:value pairs.

Return type:

dict
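To make the wrapper pattern concrete, the following minimal sketch builds a similar iterator on csv.DictReader using only the standard library. The class name SimpleTSVReader and the choice of the excel-tab dialect are assumptions for illustration; this is not the changeo implementation.

```python
import csv
import io


class SimpleTSVReader:
    """Hypothetical stand-in for a csv.DictReader wrapper (not changeo code)."""

    def __init__(self, handle):
        # Assumes tab-delimited input with a header row.
        self.reader = csv.DictReader(handle, dialect='excel-tab')
        self.fields = self.reader.fieldnames

    def __iter__(self):
        return self

    def __next__(self):
        # Each row comes back as a dictionary of field:value pairs.
        return dict(next(self.reader))


handle = io.StringIO("sequence_id\tv_call\nseq1\tIGHV1-18*01\n")
reader = SimpleTSVReader(handle)
rows = list(reader)
```

The format-specific readers listed above would presumably add field-name conversion on top of this kind of row iteration.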

class changeo.IO.TSVWriter(handle, fields, header=True)

Bases: object

Simple csv.DictWriter wrapper to write format agnostic TSV files.

writeDict(records)

Writes a row from a dictionary

Parameters:

records – dictionary of row data or an iterable of such objects.

Returns:

None

writeHeader()

Writes the header

Returns:

None
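The writer side can be sketched in the same way. The example below wraps csv.DictWriter and accepts either one dictionary or an iterable of dictionaries in writeDict, as the parameter description states. The class name SimpleTSVWriter, the extrasaction='ignore' setting, and writing the header at construction time are assumptions, not confirmed details of changeo.

```python
import csv
import io


class SimpleTSVWriter:
    """Hypothetical stand-in for a csv.DictWriter wrapper (not changeo code)."""

    def __init__(self, handle, fields, header=True):
        # Assumption: keys not listed in `fields` are silently dropped.
        self.writer = csv.DictWriter(handle, fieldnames=fields,
                                     dialect='excel-tab',
                                     extrasaction='ignore',
                                     lineterminator='\n')
        # Assumption: the header is written when the writer is created.
        if header:
            self.writeHeader()

    def writeHeader(self):
        self.writer.writeheader()

    def writeDict(self, records):
        # Accept either a single dictionary or an iterable of dictionaries.
        if isinstance(records, dict):
            self.writer.writerow(records)
        else:
            self.writer.writerows(records)


out = io.StringIO()
writer = SimpleTSVWriter(out, fields=['sequence_id', 'v_call'])
writer.writeDict({'sequence_id': 'seq1', 'v_call': 'IGHV1-18*01'})
writer.writeDict([{'sequence_id': 'seq2', 'v_call': 'IGHV3-23*01'}])
text = out.getvalue()
```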

changeo.IO.checkFields(attributes, header, schema=<class 'changeo.Receptor.AIRRSchema'>)

Checks that a file header contains a required set of Receptor attributes

Parameters:
  • attributes (list) – list of Receptor attributes to check for.

  • header (list) – list of field names in the file header.

  • schema (object) – schema object to convert field names to Receptor attributes.

Returns:

True if fields mapping to all of the attributes are found in the header.

Return type:

bool

Raises:

LookupError
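The check described above can be sketched as follows. The function name check_fields and the to_field callable (standing in for the schema's attribute-to-column conversion) are hypothetical; the toy mapping simply upper-cases names and does not reflect the real schema classes.

```python
def check_fields(attributes, header, to_field):
    """Hypothetical sketch: verify the header has a column for each attribute.

    to_field is an assumed callable mapping an attribute name to a column name.
    """
    missing = [to_field(a) for a in attributes if to_field(a) not in header]
    if missing:
        raise LookupError('Missing required fields: %s' % ', '.join(missing))
    return True


# Toy mapping for illustration: upper-case the attribute name.
header = ['SEQUENCE_ID', 'V_CALL', 'J_CALL']
ok = check_fields(['sequence_id', 'v_call'], header, str.upper)

try:
    check_fields(['junction'], header, str.upper)
    raised = False
except LookupError:
    raised = True
```

Raising an exception rather than returning False lets a calling script stop early with a message that names the missing columns.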

changeo.IO.countDbFile(file)

Counts the records in database files

Parameters:

file – tab-delimited database file.

Returns:

count of records in the database file.

Return type:

int
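As a rough sketch of record counting for a tab-delimited file, the example below counts data rows after skipping one header row. The function name count_records is hypothetical, and it takes an open handle rather than a file path so that the example stays self-contained.

```python
import csv
import io


def count_records(handle):
    """Hypothetical sketch: count data rows in a tab-delimited file with a header."""
    reader = csv.reader(handle, dialect='excel-tab')
    next(reader, None)  # skip the header row, if present
    return sum(1 for _ in reader)


data = "sequence_id\tv_call\nseq1\tIGHV1-18*01\nseq2\tIGHV3-23*01\n"
n = count_records(io.StringIO(data))
```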

changeo.IO.extractIMGT(imgt_output)

Extract necessary files from IMGT/HighV-QUEST results.

Parameters:

imgt_output – zipped file or unzipped folder output by IMGT/HighV-QUEST.

Returns:

(temporary directory handle, dictionary with names of extracted IMGT files).

Return type:

tuple

changeo.IO.getDbFields(file, add=None, exclude=None, reader=<class 'changeo.IO.TSVReader'>)

Get field names from a db file

Parameters:
  • file – db file to pull base fields from.

  • add – fields to append to the field set.

  • exclude – fields to exclude from the field set.

  • reader – reader class.

Returns:

list of field names

Return type:

list

changeo.IO.getFormatOperators(format)

Simple wrapper for fetching the set of operator classes for a data format

Parameters:

format (str) – name of the data format.

Returns:

a tuple with the reader class, writer class, and schema definition class.

Return type:

tuple

changeo.IO.getOutputHandle(file, out_label=None, out_dir=None, out_name=None, out_type=None)

Opens an output file handle

Parameters:
  • file – filename to base output file name on.

  • out_label – text to be inserted before the file extension; if None do not add a label.

  • out_type – the file extension of the output file; if None use input file extension.

  • out_dir – the output directory; if None use directory of input file.

  • out_name – the short filename to use for the output file; if None use input file short name.

Returns:

File handle

Return type:

file

changeo.IO.getOutputName(file, out_label=None, out_dir=None, out_name=None, out_type=None)

Creates an output filename from an existing filename

Parameters:
  • file – filename to base output file name on.

  • out_label – text to be inserted before the file extension; if None do not add a label.

  • out_type – the file extension of the output file; if None use input file extension.

  • out_dir – the output directory; if None use directory of input file.

  • out_name – the short filename to use for the output file; if None use input file short name.

Returns:

file name.

Return type:

str
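The fallback rules in the parameter list can be sketched with os.path. The function name output_name is hypothetical, and joining the label to the base name with an underscore is an assumption; the documentation above only says the label is inserted before the file extension.

```python
import os


def output_name(file, out_label=None, out_dir=None, out_name=None, out_type=None):
    """Hypothetical sketch of deriving an output file name from an input name."""
    in_dir, in_file = os.path.split(file)
    in_base, in_ext = os.path.splitext(in_file)

    # Fall back to the input file's parts when an option is None.
    directory = in_dir if out_dir is None else out_dir
    base = in_base if out_name is None else out_name
    ext = in_ext.lstrip('.') if out_type is None else out_type

    # Assumption: the label is joined to the base name with an underscore.
    if out_label is not None:
        base = '%s_%s' % (base, out_label)
    name = '%s.%s' % (base, ext) if ext else base
    return os.path.join(directory, name)


labeled = output_name(os.path.join('data', 'sample.tsv'), out_label='db-pass')
renamed = output_name(os.path.join('data', 'sample.tsv'),
                      out_dir='out', out_name='x', out_type='tab')
```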

changeo.IO.readGermlines(references, asis=False, warn=False)

Parses germline repositories

Parameters:
  • references (list) – list of strings specifying directories and/or files from which to read germline records.

  • asis (bool) – if True use sequence ID as record name and do not parse headers for allele names.

  • warn (bool) – print warning messages to standard error if True.

Returns:

Dictionary of germlines in the form {allele: sequence}.

Return type:

dict
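To illustrate the {allele: sequence} result, the sketch below reads FASTA text into a dictionary using only the standard library. The function name read_fasta, the rule of taking the allele from the second pipe-delimited header field, the upper-casing of sequences, and the toy record are all assumptions for illustration; the actual header parsing in changeo may differ.

```python
import io


def read_fasta(handle, asis=False):
    """Hypothetical sketch: read FASTA records into {name: sequence}."""
    germlines = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            header = line[1:]
            if asis:
                # Use the sequence ID (text before the first space) unchanged.
                tokens = header.split()
                name = tokens[0] if tokens else ''
            else:
                # Assumption: IMGT-style headers carry the allele name in
                # the second pipe-delimited field.
                parts = header.split('|')
                name = parts[1] if len(parts) > 1 else parts[0]
            germlines[name] = ''
        elif name is not None:
            # Assumption: sequences are stored in upper case.
            germlines[name] += line.upper()
    return germlines


# Toy record: made-up accession and sequence, for illustration only.
fasta = ">X00000|IGHV1-18*01|Homo sapiens\ncaggttcag\nctggtgcag\n"
germlines = read_fasta(io.StringIO(fasta))
asis_germlines = read_fasta(io.StringIO(fasta), asis=True)
```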

changeo.IO.splitName(file)

Extract the extension from a file name

Parameters:

file (str) – file name.

Returns:

tuple of the file directory, basename and extension.

Return type:

tuple
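A minimal sketch of this kind of split, built on os.path, is shown below. The function name split_name is hypothetical, and returning the extension without its leading dot is an assumption.

```python
import os


def split_name(file):
    """Hypothetical sketch: split a path into directory, base name and extension."""
    directory, filename = os.path.split(file)
    base, ext = os.path.splitext(filename)
    # Assumption: the extension is returned without its leading dot.
    return directory, base, ext.lstrip('.')


parts = split_name(os.path.join('data', 'sample.tsv'))
```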

changeo.IO.yamlDict(file)

Returns a dictionary from a yaml file

Parameters:

file (str) – simple yaml file with rows in the form ‘argument: value’.

Returns:

dictionary of key:value pairs in the file.

Return type:

dict
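For a file made up only of ‘argument: value’ rows, the result can be approximated without a YAML library, as in the sketch below. The function name simple_yaml_dict is hypothetical, it reads from an open handle rather than a file name, and it keeps every value as a string, which a real YAML parser would not necessarily do.

```python
import io


def simple_yaml_dict(handle):
    """Hypothetical sketch: read 'argument: value' rows into a dictionary.

    Handles only flat key/value rows; values stay as strings.
    """
    result = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        key, _, value = line.partition(':')
        result[key.strip()] = value.strip()
    return result


config = simple_yaml_dict(io.StringIO("format: airr\n# comment\nfailed: true\n"))
```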