changeo.Distance

Distance calculations

changeo.Distance.calcDistances(sequences, n, dist_mat, sym='avg', norm=None)

Calculate pairwise distances between input sequences

Parameters:
  • sequences – List of sequences for which to calculate pairwise distances

  • n – Length of n-mers to be used in calculating distance

  • dist_mat – pandas.DataFrame of mutation distances

  • norm – Normalization method. One of None, ‘len’, or ‘mut’.

  • sym – Symmetry method; one of ‘avg’ or ‘min’.

Returns:

numpy matrix of pairwise distances between input sequences

Return type:

ndarray

changeo.Distance.formClusters(dists, link, distance)

Form clusters based on hierarchical clustering of input distance matrix with linkage type and cutoff distance

Parameters:
  • dists – numpy matrix of distances

  • link – Linkage type for hierarchical clustering

  • distance – Distance at which to cut into clusters

Returns:

List of cluster assignments

Return type:

list
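The clustering step described above can be approximated with SciPy's hierarchical clustering routines. This is a minimal sketch, not the changeo implementation: the name form_clusters_sketch is hypothetical, and it assumes dists is a square, symmetric matrix with a zero diagonal, that link is a SciPy linkage method name such as "single", and that the cutoff is applied with the "distance" criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def form_clusters_sketch(dists, link, distance):
    """Hypothetical stand-in for formClusters; not the library's code."""
    # Convert the square matrix to the condensed form that linkage() expects.
    # Assumption: the input is symmetric with zeros on the diagonal.
    condensed = squareform(np.asarray(dists, dtype=float))
    # Build the hierarchical clustering tree with the requested linkage type.
    tree = linkage(condensed, method=link)
    # Cut the tree at the given distance and return flat cluster labels.
    return fcluster(tree, t=distance, criterion="distance").tolist()


dists = [[0, 1, 9],
         [1, 0, 9],
         [9, 9, 0]]
labels = form_clusters_sketch(dists, "single", 2.0)
```

With this input, the first two sequences fall into one cluster and the third into another, because only the first pair is closer than the cutoff of 2.0.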

changeo.Distance.getAADistMatrix(mat=None, mask_dist=0, gap_dist=0)

Generates an amino acid distance matrix

Parameters:
  • mat – Input distance matrix to extend to full alphabet; if unspecified, creates Hamming distance matrix that incorporates IUPAC equivalencies

  • mask_dist – Score for all matches against an X character

  • gap_dist – Score for all matches against a gap (-, .) character

Returns:

pandas.DataFrame of distances

Return type:

DataFrame

changeo.Distance.getDNADistMatrix(mat=None, mask_dist=0, gap_dist=0)

Generates a DNA distance matrix

Parameters:
  • mat – Input distance matrix to extend to full alphabet; if unspecified, creates Hamming distance matrix that incorporates IUPAC equivalencies

  • mask_dist – Distance for all matches against an N character

  • gap_dist – Distance for all matches against a gap (-, .) character

Returns:

pandas.DataFrame of distances

Return type:

DataFrame
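To illustrate what a Hamming distance matrix with IUPAC equivalencies could look like, the sketch below builds one as a nested dictionary using only the standard library. It is an assumption-laden illustration rather than the changeo implementation: the name dna_hamming_matrix_sketch is hypothetical, it returns a dict instead of a pandas.DataFrame, and it assumes that gap characters take precedence over N when both appear in a pair.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one can represent.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}
MASK_CHAR = "N"
GAP_CHARS = "-."


def dna_hamming_matrix_sketch(mask_dist=0, gap_dist=0):
    """Hypothetical illustration; returns dist[a][b] as a nested dict."""
    alphabet = list(IUPAC_DNA) + [MASK_CHAR] + list(GAP_CHARS)
    dist = {a: {} for a in alphabet}
    for a, b in product(alphabet, repeat=2):
        if a in GAP_CHARS or b in GAP_CHARS:
            # Assumption: any pair involving a gap gets gap_dist.
            d = gap_dist
        elif a == MASK_CHAR or b == MASK_CHAR:
            d = mask_dist
        elif set(IUPAC_DNA[a]) & set(IUPAC_DNA[b]):
            # Codes that share at least one base count as a match.
            d = 0
        else:
            d = 1
        dist[a][b] = d
    return dist


matrix = dna_hamming_matrix_sketch(mask_dist=0, gap_dist=1)
```

Here matrix["A"]["R"] is 0 because R covers A or G, while matrix["C"]["R"] is 1 because the two codes share no base.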

changeo.Distance.getNmers(sequences, n)

Breaks input sequences down into n-mers

Parameters:
  • sequences – List of sequences to be broken into n-mers

  • n – Length of n-mers to return

Returns:

Dictionary mapping sequence to a list of n-mers

Return type:

dict
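The returned structure can be pictured with a simple sliding-window sketch. The function name get_nmers_sketch is hypothetical, and the sketch assumes plain overlapping windows with no padding at the sequence ends; the library's handling of sequence boundaries is not described here and may differ.

```python
def get_nmers_sketch(sequences, n):
    """Hypothetical illustration of a sequence-to-n-mers mapping."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n across each sequence.
    # Sequences shorter than n yield an empty list.
    return {
        seq: [seq[i:i + n] for i in range(len(seq) - n + 1)]
        for seq in sequences
    }


nmers = get_nmers_sketch(["ACGTA", "AC"], 3)
# nmers == {"ACGTA": ["ACG", "CGT", "GTA"], "AC": []}
```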

changeo.Distance.zip_equal(*iterables)

Zips iterables and raises exception if different lengths

Parameters:
  • iterables – Iterables to zip together

Returns:

A generator of tuples with combined elements from the iterables

Return type:

iter
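One common way to get this behavior is to zip with a unique fill value and fail as soon as that value shows up. The sketch below follows that pattern; the name zip_equal_sketch is hypothetical, and raising ValueError is an assumption, since the exception type is not stated above.

```python
from itertools import zip_longest


def zip_equal_sketch(*iterables):
    """Hypothetical illustration of a length-checking zip generator."""
    # A unique object that cannot appear in any caller-supplied iterable.
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        # The sentinel appears only when one iterable ran out early.
        if any(item is sentinel for item in combo):
            raise ValueError("Iterables have different lengths")
        yield combo


pairs = list(zip_equal_sketch([1, 2], "ab"))
# pairs == [(1, "a"), (2, "b")]
```

Because the function is a generator, a length mismatch is reported only when iteration reaches the point where the shorter iterable ends, not when the function is called.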