kinoml.features.ligand

Featurizers that mostly concern ligand-based models

Module Contents

class kinoml.features.ligand.SingleLigandFeaturizer(**kwargs)

Bases: kinoml.features.core.ParallelBaseFeaturizer

Provides a minimally useful ._supports() method for all Ligand-like featurizers.

_COMPATIBLE_LIGAND_TYPES = ()
_supports(system: Union[kinoml.core.systems.LigandSystem, kinoml.core.systems.ProteinLigandComplex]) -> bool

Check that exactly one ligand is present in the System.

class kinoml.features.ligand.MorganFingerprintFeaturizer(radius: int = 2, nbits: int = 512, **kwargs)

Bases: SingleLigandFeaturizer

Given a System containing one Ligand component, convert it to an RDKit molecule and generate its Morgan fingerprint as a bit vector.

Parameters
  • radius (int, optional=2) – Morgan fingerprint neighborhood radius

  • nbits (int, optional=512) – Length of the resulting bit vector

_featurize_one(system: Union[kinoml.core.systems.LigandSystem, kinoml.core.systems.ProteinLigandComplex]) -> Union[numpy.ndarray, None]

Return the Morgan fingerprint for the given system.

Parameters

system (LigandSystem or ProteinLigandComplex) – The System to be featurized.

Return type

np.ndarray or None

class kinoml.features.ligand.OneHotSMILESFeaturizer(smiles_type: str = 'canonical', **kwargs)

Bases: kinoml.features.core.BaseOneHotEncodingFeaturizer, SingleLigandFeaturizer

One-hot encodes a Ligand from a SMILES representation.

ALPHABET

Defines the character-integer mapping (as a sequence) of the one-hot encoding.

Type

str

ALPHABET = 'BCFHIKNOPSUVWYacegilnosru-=#1234567890.*()/+@:[]%\\LR$'
_retrieve_sequence(system: Union[kinoml.core.systems.LigandSystem, kinoml.core.systems.ProteinLigandComplex]) -> str

Gets the SMILES string from a Ligand-like component and postprocesses it.

Two-character symbols (such as Cl and Br for atoms, and @@ for chirality) are replaced with single-character symbols (L, R and $, respectively), so that every token maps to exactly one character of ALPHABET.
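The postprocessing and encoding steps can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names postprocess_smiles and one_hot_encode are hypothetical, and the (alphabet length, sequence length) orientation of the resulting matrix is an assumption. Only the ALPHABET value and the Cl/Br/@@ substitutions come from the documentation above.

```python
from typing import List

# Alphabet as documented for OneHotSMILESFeaturizer.ALPHABET
ALPHABET = "BCFHIKNOPSUVWYacegilnosru-=#1234567890.*()/+@:[]%\\LR$"

# Two-character symbols and their single-character stand-ins
REPLACEMENTS = {"Cl": "L", "Br": "R", "@@": "$"}


def postprocess_smiles(smiles: str) -> str:
    """Replace two-character symbols so each token is one character (hypothetical helper)."""
    for double, single in REPLACEMENTS.items():
        smiles = smiles.replace(double, single)
    return smiles


def one_hot_encode(sequence: str, alphabet: str = ALPHABET) -> List[List[int]]:
    """Return a matrix with one row per alphabet character and one column per position.

    The row/column orientation is an assumption made for this sketch.
    """
    index = {char: i for i, char in enumerate(alphabet)}
    matrix = [[0] * len(sequence) for _ in alphabet]
    for position, char in enumerate(sequence):
        matrix[index[char]][position] = 1
    return matrix


processed = postprocess_smiles("C[C@@H](Cl)Br")
encoded = one_hot_encode(processed)
```

Because every token is a single character after postprocessing, each column of the matrix contains exactly one 1.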

class kinoml.features.ligand.GraphLigandFeaturizer(max_in_ring_size: int = 10, **kwargs)

Bases: SingleLigandFeaturizer

Creates a graph representation of a Ligand-like component. Each node (atom) is decorated with several RDKit descriptors. Check `self._per_atom_features` for details.

Parameters

max_in_ring_size (int, optional=10) – Maximum ring size for testing whether an atom belongs to a ring or not. Currently unused.

ALL_ATOMIC_SYMBOLS = ['C', 'N', 'O', 'S', 'F', 'Si', 'P', 'Cl', 'Br', 'Mg', 'Na', 'Ca', 'Fe', 'As', 'Al', 'I', 'B',...
_featurize_one(system: Union[kinoml.core.systems.LigandSystem, kinoml.core.systems.ProteinLigandComplex]) -> Union[tuple, None]

Featurizes ligands contained in a System as a labeled graph.

Parameters

system (LigandSystem or ProteinLigandComplex) – The System being featurized.

Returns

A two-tuple with:
  • Graph connectivity of the molecule with shape (2, n_edges)

  • Feature matrix with shape (n_atoms, n_features)

Return type

tuple of np.ndarray or None

_per_atom_features(atom) -> numpy.ndarray

Computes desired features for each atom in the molecular graph.

Parameters

atom (rdkit.Chem.Atom) – Atom to extract features from

Returns

atomic_symbol (array) – The one-hot encoded atomic symbol from ALL_ATOMIC_SYMBOLS.

formal_charge (int) – The formal charge of the atom.

hybridization_type (array) – The one-hot encoded hybridization type from rdkit.Chem.rdchem.HybridizationType.

aromatic (bool) – Whether the atom is aromatic.

degree (array) – The one-hot encoded degree of the atom in the molecule.

total_h (int) – The total number of hydrogens on the atom (implicit and explicit).

implicit_h (int) – The number of implicit hydrogens on the atom.

radical_electrons (int) – The number of radical electrons.

Return type

tuple of atomic features.

Notes

The atomic features are the same as in PotentialNet [1].

[1] https://doi.org/10.1021/acscentsci.8b00507
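To make the layout of the per-atom feature vector concrete, here is a minimal sketch that concatenates the documented features in the order listed above. It is not the library's code: a plain dict stands in for rdkit.Chem.Atom, and SYMBOLS, HYBRIDIZATIONS and MAX_DEGREE are hypothetical, shortened stand-ins for the real category lists, so the vector length shown here will differ from the actual one.

```python
from typing import List, Sequence

# Hypothetical, shortened category lists used only for this illustration
SYMBOLS = ["C", "N", "O", "S", "F"]
HYBRIDIZATIONS = ["SP", "SP2", "SP3"]
MAX_DEGREE = 4


def one_hot(value, choices: Sequence) -> List[int]:
    """Return a 0/1 list with a 1 at the position where `value` matches a choice."""
    return [int(value == choice) for choice in choices]


def per_atom_features(atom: dict) -> List[int]:
    """Concatenate the documented features into one flat vector (sketch)."""
    return (
        one_hot(atom["symbol"], SYMBOLS)                  # atomic_symbol
        + [atom["formal_charge"]]                         # formal_charge
        + one_hot(atom["hybridization"], HYBRIDIZATIONS)  # hybridization_type
        + [int(atom["aromatic"])]                         # aromatic
        + one_hot(atom["degree"], range(MAX_DEGREE + 1))  # degree
        + [atom["total_h"], atom["implicit_h"], atom["radical_electrons"]]
    )


# Carbonyl-like oxygen described by hand instead of by an RDKit atom
oxygen = {
    "symbol": "O",
    "formal_charge": 0,
    "hybridization": "SP2",
    "aromatic": False,
    "degree": 1,
    "total_h": 0,
    "implicit_h": 0,
    "radical_electrons": 0,
}
features = per_atom_features(oxygen)
```

Stacking one such vector per atom gives a feature matrix of shape (n_atoms, n_features).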

static _connectivity_COO_format(mol: rdkit.Chem.Mol) -> numpy.ndarray

Returns the connectivity of the molecular graph in COO format.

Parameters

mol (rdkit.Chem.Mol) – RDKit molecule to extract bonds from

Returns

Graph connectivity in COO format with shape (2, n_edges).

Return type

np.ndarray
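The COO (coordinate) layout stores the graph as two parallel rows of atom indices, one column per edge. The sketch below builds it from a plain list of bonded atom-index pairs. The helper name connectivity_coo is hypothetical, and storing every bond in both directions is an assumption that follows common graph-learning practice; the documentation above does not state it. With RDKit, the index pairs could be read from bond.GetBeginAtomIdx() and bond.GetEndAtomIdx() while iterating over mol.GetBonds().

```python
from typing import List, Sequence, Tuple


def connectivity_coo(bonds: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Build a 2 x n_edges edge list from undirected bonds (sketch).

    Each bond (i, j) contributes two directed edges, i -> j and j -> i.
    This bidirectional convention is an assumption of the example.
    """
    sources: List[int] = []
    targets: List[int] = []
    for begin, end in bonds:
        sources += [begin, end]
        targets += [end, begin]
    return [sources, targets]


# Heavy-atom bonds of ethanol (C-C-O), atoms indexed 0, 1, 2
ethanol_bonds = [(0, 1), (1, 2)]
edges = connectivity_coo(ethanol_bonds)
```

Under this convention, column k of the result describes one directed edge from edges[0][k] to edges[1][k], so n_edges is twice the number of bonds.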