kinoml.features.ligand
Featurizers that mostly concern ligand-based models
Module Contents
- class kinoml.features.ligand.SingleLigandFeaturizer(**kwargs)
Bases: kinoml.features.core.ParallelBaseFeaturizer
Provides a minimally useful ._supports() method for all Ligand-like featurizers.
- _COMPATIBLE_LIGAND_TYPES = ()
- _supports(system: Union[kinoml.core.systems.LigandSystem, kinoml.core.systems.ProteinLigandComplex]) -> bool
Check that exactly one ligand is present in the System.
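The "exactly one ligand" check can be illustrated with a minimal sketch. The Ligand, Protein and System classes below are hypothetical stand-ins written for this example, not the kinoml classes, and the supports function only approximates what ._supports() is described as doing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ligand:
    """Hypothetical stand-in for a ligand component (not the kinoml class)."""
    name: str


@dataclass
class Protein:
    """Hypothetical stand-in for a protein component (not the kinoml class)."""
    name: str


@dataclass
class System:
    """Hypothetical stand-in for a System: a plain container of components."""
    components: List[object] = field(default_factory=list)


def supports(system: System, compatible_types=(Ligand,)) -> bool:
    """Return True only if exactly one compatible ligand is present."""
    ligands = [c for c in system.components if isinstance(c, compatible_types)]
    return len(ligands) == 1


complex_ok = System([Protein("kinase"), Ligand("inhibitor")])
no_ligand = System([Protein("kinase")])
two_ligands = System([Ligand("a"), Ligand("b")])
```

Here the compatible_types argument plays the role that _COMPATIBLE_LIGAND_TYPES appears to play in the class, which is an assumption based on the attribute name alone.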
- class kinoml.features.ligand.MorganFingerprintFeaturizer(radius: int = 2, nbits: int = 512, **kwargs)
Bases: SingleLigandFeaturizer
Given a System containing one Ligand component, convert it to an RDKit molecule and generate the Morgan fingerprint bit vector.
- Parameters
radius (int, optional=2) – Morgan fingerprint neighborhood radius
nbits (int, optional=512) – Length of the resulting bit vector
- _featurize_one(system: Union[kinoml.core.systems.LigandSystem, kinoml.core.systems.ProteinLigandComplex]) -> Union[numpy.ndarray, None]
Return the Morgan fingerprint for the given system.
- Parameters
system (LigandSystem or ProteinLigandComplex) – The System to be featurized.
- Return type
np.array or None
- class kinoml.features.ligand.OneHotSMILESFeaturizer(smiles_type: str = 'canonical', **kwargs)
Bases: kinoml.features.core.BaseOneHotEncodingFeaturizer, SingleLigandFeaturizer
One-hot encodes a Ligand from a SMILES representation.
- ALPHABET
Defines the character-integer mapping (as a sequence) of the one-hot encoding.
- Type
str
- ALPHABET = 'BCFHIKNOPSUVWYacegilnosru-=#1234567890.*()/+@:[]%\\LR$'
- _retrieve_sequence(system: Union[kinoml.core.systems.LigandSystem, kinoml.core.systems.ProteinLigandComplex]) -> str
Gets the SMILES string from a Ligand-like component and postprocesses it.
Two-character symbols (such as Cl and Br for atoms, and @@ for chirality) are replaced with single-character symbols (L, R and $, respectively).
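The replacement step and the subsequent one-hot encoding can be sketched in plain Python. The function names postprocess_smiles and one_hot_encode are illustrative, and the matrix orientation (one row per character) is an assumption; the documentation above does not state which axis kinoml uses.

```python
# Alphabet copied from the ALPHABET attribute documented above.
ALPHABET = 'BCFHIKNOPSUVWYacegilnosru-=#1234567890.*()/+@:[]%\\LR$'

# Two-character tokens and their single-character replacements.
REPLACEMENTS = (("Cl", "L"), ("Br", "R"), ("@@", "$"))


def postprocess_smiles(smiles: str) -> str:
    """Collapse two-character tokens so every symbol is one character."""
    for token, symbol in REPLACEMENTS:
        smiles = smiles.replace(token, symbol)
    return smiles


def one_hot_encode(sequence: str, alphabet: str = ALPHABET):
    """Return a len(sequence) x len(alphabet) matrix of 0/1 integers.

    The orientation is an assumption made for this sketch.
    """
    index = {char: i for i, char in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in sequence]
    for row, char in zip(matrix, sequence):
        row[index[char]] = 1  # raises KeyError for characters outside the alphabet
    return matrix


encoded_sequence = postprocess_smiles("C[C@@H](Cl)Br")
matrix = one_hot_encode(encoded_sequence)
```

Collapsing the two-character tokens first means each position in the processed string maps to exactly one alphabet entry, so no tokenizer is needed.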
- class kinoml.features.ligand.GraphLigandFeaturizer(max_in_ring_size: int = 10, **kwargs)
Bases: SingleLigandFeaturizer
Creates a graph representation of a Ligand-like component. Each node (atom) is decorated with several RDKit descriptors. Check `self._per_atom_features` for details.
- Parameters
max_in_ring_size (int, optional=10) – Maximum ring size for testing whether an atom belongs to a ring or not. Currently unused.
- ALL_ATOMIC_SYMBOLS = ['C', 'N', 'O', 'S', 'F', 'Si', 'P', 'Cl', 'Br', 'Mg', 'Na', 'Ca', 'Fe', 'As', 'Al', 'I', 'B',...
- _featurize_one(system: Union[kinoml.core.systems.LigandSystem, kinoml.core.systems.ProteinLigandComplex]) -> Union[tuple, None]
Featurizes ligands contained in a System as a labeled graph.
- Parameters
system (LigandSystem or ProteinLigandComplex) – The System being featurized.
- Returns
A two-tuple with:
- Graph connectivity of the molecule with shape (2, n_edges)
- Feature matrix with shape (n_atoms, n_features)
- Return type
tuple of np.array or None
- _per_atom_features(atom) -> numpy.ndarray
Computes desired features for each atom in the molecular graph.
- Parameters
atom (rdkit.Chem.Atom) – Atom to extract features from
- Returns
- atomic_symbol (array): the one-hot encoded atomic symbol from ALL_ATOMIC_SYMBOLS.
- formal_charge (int): the formal charge of the atom.
- hybridization_type (array): the one-hot encoded hybridization type from rdkit.Chem.rdchem.HybridizationType.
- aromatic (bool): whether the atom is aromatic.
- degree (array): the one-hot encoded degree of the atom in the molecule.
- total_h (int): the total number of hydrogens on the atom (implicit and explicit).
- implicit_h (int): the number of implicit hydrogens on the atom.
- radical_electrons (int): the number of radical electrons.
- Return type
tuple of atomic features.
Notes
The atomic features are the same as in PotentialNet [1].
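How the listed features might be concatenated into a single per-atom vector is sketched below. The atom is a plain dictionary standing in for an rdkit.Chem.Atom; SYMBOLS contains only the entries visible in the truncated ALL_ATOMIC_SYMBOLS listing, and HYBRIDIZATIONS and MAX_DEGREE are hypothetical choices made for the example, not values taken from kinoml.

```python
# Only the entries visible in the truncated listing above; NOT the full list.
SYMBOLS = ['C', 'N', 'O', 'S', 'F', 'Si', 'P', 'Cl', 'Br', 'Mg', 'Na',
           'Ca', 'Fe', 'As', 'Al', 'I', 'B']
HYBRIDIZATIONS = ["SP", "SP2", "SP3"]  # hypothetical subset for illustration
MAX_DEGREE = 6                         # hypothetical upper bound on degree


def one_hot(value, choices):
    """Return a 0/1 list with a single 1 at the position of value (if present)."""
    return [1 if value == choice else 0 for choice in choices]


def per_atom_features(atom: dict) -> list:
    """Concatenate the documented features in the documented order."""
    return (
        one_hot(atom["symbol"], SYMBOLS)
        + [atom["formal_charge"]]
        + one_hot(atom["hybridization"], HYBRIDIZATIONS)
        + [int(atom["aromatic"])]
        + one_hot(atom["degree"], range(MAX_DEGREE + 1))
        + [atom["total_h"], atom["implicit_h"], atom["radical_electrons"]]
    )


# A carbon of ethane, described by hand rather than read from RDKit.
methyl_carbon = {
    "symbol": "C", "formal_charge": 0, "hybridization": "SP3",
    "aromatic": False, "degree": 1,
    "total_h": 3, "implicit_h": 3, "radical_electrons": 0,
}
features = per_atom_features(methyl_carbon)
```

Stacking one such vector per atom would give a feature matrix of shape (n_atoms, n_features), matching the second element of the tuple returned by _featurize_one.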
- static _connectivity_COO_format(mol: rdkit.Chem.Mol) -> numpy.ndarray
Returns the connectivity of the molecular graph in COO format.
- Parameters
mol (rdkit.Chem.Mol) – RDKit molecule to extract bonds from
- Returns
graph connectivity in COO format with shape [2, num_edges]
- Return type
np.ndarray
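For readers unfamiliar with the COO layout, the sketch below builds a [2, num_edges] connectivity from a list of bonded atom-index pairs. It takes plain tuples instead of an rdkit.Chem.Mol; with RDKit, the pairs would typically come from bond.GetBeginAtomIdx() and bond.GetEndAtomIdx(). Storing each bond in both directions is an assumption common for undirected graphs, and the documentation above does not say whether kinoml does so.

```python
def connectivity_coo(bonds):
    """Return [rows, cols] where column k describes edge rows[k] -> cols[k].

    Each undirected bond is emitted twice, once per direction (an assumption
    of this sketch).
    """
    rows, cols = [], []
    for begin, end in bonds:
        rows += [begin, end]
        cols += [end, begin]
    return [rows, cols]


# Heavy-atom skeleton of ethanol: C0-C1 and C1-O2.
ethanol_bonds = [(0, 1), (1, 2)]
coo = connectivity_coo(ethanol_bonds)
```

The first row holds source atom indices and the second row holds target atom indices, so the two rows always have equal length.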