kinoml.modeling.OEModeling

Module Contents

kinoml.modeling.OEModeling.logger
kinoml.modeling.OEModeling.read_smiles(smiles: str, add_hydrogens: bool = True) openeye.oechem.OEGraphMol

Read molecule from a smiles string. Explicit hydrogens will be added by default.

Parameters:
  • smiles (str) – Smiles string.

  • add_hydrogens (bool) – If explicit hydrogens should be added.

Returns:

molecule – A molecule as OpenEye molecules.

Return type:

oechem.OEGraphMol

Raises:

ValueError – Could not interpret input SMILES.

kinoml.modeling.OEModeling.read_molecules(path: str | pathlib.Path, add_hydrogens: bool = False) List[openeye.oechem.OEGraphMol]

Read molecules from a file. Explicit hydrogens will not be added by default.

Parameters:
  • path (str, pathlib.Path) – Path to molecule file.

  • add_hydrogens (bool) – If explicit hydrogens should be added.

Returns:

molecules – A List of molecules as OpenEye molecules.

Return type:

list of oechem.OEGraphMol

Raises:

ValueError – Given file does not contain valid molecules.

kinoml.modeling.OEModeling.read_electron_density(path: str | pathlib.Path) openeye.oegrid.OESkewGrid

Read electron density from a file.

Parameters:

path (str, pathlib.Path) – Path to electron density file.

Returns:

electron_density – A List of molecules as OpenEye molecules.

Return type:

oegrid.OESkewGrid or None

Raises:

ValueError – Not a valid electron density file or wrong format. Only MTZ is currently supported.

kinoml.modeling.OEModeling.write_molecules(molecules: List[openeye.oechem.OEMolBase], path: str | pathlib.Path)

Save molecules to file.

Parameters:
  • molecules (list of oechem.OEMolBase) – A list of OpenEye molecules for writing.

  • path (str, pathlib.Path) – File path for saving molecules.

kinoml.modeling.OEModeling.select_chain(molecule: openeye.oechem.OEMolBase, chain_id: str) openeye.oechem.OEMolBase

Select a chain from an OpenEye molecule.

Parameters:
  • molecule (oechem.OEMolBase) – An OpenEye molecule holding a molecular structure.

  • chain_id (str) – Chain identifier.

Returns:

selection – An OpenEye molecule holding the selected chain.

Return type:

oechem.OEMolBase

Raises:

ValueError – No atoms were found with given chain id.

kinoml.modeling.OEModeling.select_altloc(molecule: openeye.oechem.OEMolBase, altloc_id: str, altloc_fallback: bool = True) openeye.oechem.OEMolBase

Select an alternate location from an OpenEye molecule.

Parameters:
  • molecule (oechem.OEMolBase) – An OpenEye molecule holding a molecular structure.

  • altloc_id (str) – Alternate location identifier.

  • altloc_fallback (bool) – If the alternate location with the highest occupancy should be used for residues that do not contain the given alternate location identifier.

Returns:

selection – An OpenEye molecule holding the selected alternate location.

Return type:

oechem.OEMolBase

Raises:

ValueError – No atoms were found with given altloc id.

kinoml.modeling.OEModeling.remove_non_protein(molecule: openeye.oechem.OEMolBase, exceptions: None | List[str] = None, remove_water: bool = False) openeye.oechem.OEMolBase

Remove non-protein atoms from an OpenEye molecule. Water will be kept by default.

Parameters:
  • molecule (oechem.OEMolBase) – An OpenEye molecule holding a molecular structure.

  • exceptions (None or list of str) – Exceptions that should not be removed.

  • remove_water (bool) – If water should be removed.

Returns:

selection – An OpenEye molecule holding the filtered structure.

Return type:

oechem.OEMolBase

kinoml.modeling.OEModeling.delete_residue(structure: openeye.oechem.OEMolBase, chain_id: str, residue_name: str, residue_id: int) openeye.oechem.OEGraphMol

Delete a residue from an OpenEye molecule.

Parameters:
  • structure (oechem.OEMolBase) – An OpenEye molecule with residue information.

  • chain_id (str) – The chain id of the residue

  • residue_name (str) – The residue name in three letter code.

  • residue_id (int) – The residue id.

Returns:

The OpenEye molecule without the residue.

Return type:

oechem.OEMolBase

Raises:

ValueError – Defined residue was not found in given structure.

kinoml.modeling.OEModeling.get_expression_tags(structure: openeye.oechem.OEMolBase, labels: Iterable[str] = ('EXPRESSION TAG', 'CLONING ARTIFACT')) List[Dict]

Get the chain id, residue name and residue id of residues in expression tags from a protein structure listed in the PDB header section “SEQADV”.

Parameters:
  • structure (oechem.OEMolBase) – An OpenEye molecule with associated PDB header section “SEQADV”.

  • labels (Iterable of str) – The ‘SEQADV’ labels defining expression tags. Default: (‘EXPRESSION TAG’, ‘CLONING ARTIFACT’).

Returns:

The chain id, residue name and residue id of residues in the expression tags.

Return type:

list of dict

kinoml.modeling.OEModeling.assign_caps(structure: openeye.oechem.OEMolBase, real_termini: Union[Iterable[int] or None] = None) openeye.oechem.OEMolBase

Cap N and C termini of the given input structure. Real termini can be protected from capping by providing the corresponding residue ids via the ‘real_termini’ argument.

Parameters:
  • structure (oechem.OEMolBase) – The OpenEye molecule holding the protein structure to cap.

  • real_termini (iterable of int or None) – The biologically relevant real termini that should be prevented from capping.

Returns:

structure – The OpenEye molecule holding the capped structure.

Return type:

oechem.OEMolBase

kinoml.modeling.OEModeling.prepare_structure(structure: openeye.oechem.OEMolBase, has_ligand: bool = False, electron_density: openeye.oegrid.OESkewGrid | None = None, loop_db: str | None = None, ligand_name: str | None = None, chain_id: str | None = None, alternate_location: str | None = None, cap_termini: bool = True, real_termini: List[int] | None = None) openeye.oechem.OEDesignUnit

Prepare an OpenEye molecule holding a protein ligand complex for docking.

Parameters:
  • structure (oechem.OEMolBase) – An OpenEye molecule holding a structure with protein and optionally a ligand.

  • has_ligand (bool) – If structure contains a ligand that should be used in design unit generation.

  • electron_density (oegrid.OESkewGrid) – An OpenEye grid holding the electron density.

  • loop_db (str or None) – Path to OpenEye Spruce loop database. You can request a copy at https://www.eyesopen.com/database-downloads. A testing subset (3TPP) is available at https://docs.eyesopen.com/toolkits/python/sprucetk/examples_make_design_units.html.

  • ligand_name (str or None) – The name of the ligand located in the binding pocket of interest.

  • chain_id (str or None) – The chain id of interest. If chain id is None, best chain will be selected according to OESpruce.

  • alternate_location (str or None) – The alternate location of interest. If alternate location is None, best alternate location will be selected according to OEChem.

  • cap_termini (bool) – If termini should be capped with ACE and NME.

  • real_termini (list of int or None) – Residue numbers of biologically real termini will not be capped with ACE and NME.

Returns:

design_unit – An OpenEye design unit holding the prepared structure with the highest quality among all identified design units.

Return type:

oechem.OEDesignUnit

Raises:

ValueError – No design unit found with given chain ID, ligand name and alternate location.

kinoml.modeling.OEModeling.prepare_complex(protein_ligand_complex: openeye.oechem.OEMolBase, electron_density: openeye.oegrid.OESkewGrid | None = None, loop_db: str | None = None, ligand_name: str | None = None, chain_id: str | None = None, alternate_location: str | None = None, cap_termini: bool = True, real_termini: List[int] | None = None) openeye.oechem.OEDesignUnit

Prepare an OpenEye molecule holding a protein ligand complex for docking.

Parameters:
  • protein_ligand_complex (oechem.OEMolBase) – An OpenEye molecule holding a structure with protein and ligand.

  • electron_density (oegrid.OESkewGrid) – An OpenEye grid holding the electron density.

  • loop_db (str or None) – Path to OpenEye Spruce loop database.

  • ligand_name (str or None) – The name of the ligand located in the binding pocket of interest.

  • chain_id (str or None) – The chain id of interest. If chain id is None, best chain will be selected according to OESpruce.

  • alternate_location (str or None) – The alternate location of interest. If alternate location is None, best alternate location will be selected according to OEChem.

  • cap_termini (bool) – If termini should be capped with ACE and NME.

  • real_termini (list of int or None) – Residue numbers of biologically real termini will not be capped with ACE and NME.

Returns:

design_unit – An OpenEye design unit holding the prepared structure with the highest quality among all identified design units.

Return type:

oechem.OEDesignUnit

Raises:

ValueError – No design unit found with given chain ID, ligand name and alternate location.

kinoml.modeling.OEModeling.prepare_protein(protein: openeye.oechem.OEMolBase, loop_db: str | None = None, chain_id: str | None = None, alternate_location: str | None = None, cap_termini: bool = True, real_termini: List[int] | None = None) openeye.oechem.OEDesignUnit

Prepare an OpenEye molecule holding a protein structure for docking.

Parameters:
  • protein (oechem.OEMolBase) – An OpenEye molecule holding a structure with protein.

  • loop_db (str) – Path to OpenEye Spruce loop database.

  • chain_id (str or None) – The chain id of interest. If chain id is None, best chain will be selected according to OESpruce.

  • alternate_location (str or None) – The alternate location of interest. If alternate location is None, best alternate location will be selected according to OEChem.

  • cap_termini (bool) – If termini should be capped with ACE and NME.

  • real_termini (list of int or None) – Residue numbers of biologically real termini will not be capped with ACE and NME.

Returns:

design_unit – An OpenEye design unit holding the prepared structure with the highest quality among all identified design units.

Return type:

oechem.OEDesignUnit or None

Raises:

ValueError – No design unit found with given chain ID, ligand name and alternate location.

kinoml.modeling.OEModeling.generate_tautomers(molecule: openeye.oechem.OEMolBase | openeye.oechem.OEMCMolBase, max_generate: int = 4096, max_return: int = 16, pKa_norm: bool = True) List[openeye.oechem.OEMolBase | openeye.oechem.OEMCMolBase]

Generate reasonable tautomers of a given molecule.

Parameters:
  • molecule (oechem.OEMolBase or oechem.OEMCMolBase) – An OpenEye molecule.

  • max_generate (int) – Maximal number of tautomers to generate.

  • max_return (int) – Maximal number of tautomers to return.

  • pKa_norm (bool) – Assign the predominant ionization state at pH ~7.4.

Returns:

tautomers – A list of OpenEye molecules holding the tautomers.

Return type:

list of oechem.OEMolBase or oechem.OEMCMolBase

kinoml.modeling.OEModeling.generate_enantiomers(molecule: openeye.oechem.OEMolBase, max_centers: int = 12, force_flip: bool = False, enumerate_nitrogens: bool = False) List[openeye.oechem.OEMolBase]

Generate enantiomers of a given molecule.

Parameters:
  • molecule (oechem.OEMolBase) – An OpenEye molecule.

  • max_centers (int) – The maximal number of stereo centers to enumerate.

  • force_flip (bool) – If specified stereo centers should be enumerated.

  • enumerate_nitrogens (bool) – If nitrogens with invertible pyramidal geometry should be enumerated.

Returns:

enantiomers – A list of OpenEye molecules holding the enantiomers.

Return type:

list of oechem.OEMolBase

kinoml.modeling.OEModeling.generate_conformations(molecule: openeye.oechem.OEMolBase, options: openeye.oeomega.OEOmegaOptions = oeomega.OEOmegaOptions(oeomega.OEOmegaSampling_Classic)) openeye.oechem.OEMCMolBase

Generate conformations of a given molecule.

Parameters:
  • molecule (oechem.OEMolBase) – An OpenEye molecule.

  • options (oeomega.OEOmegaOptions, default=oeomega.OEOmegaOptions(oeomega.OEOmegaSampling_Classic)) – Options for generating conformations. If the given molecule is a macrocycle only the maximal number of conformations will be changed from the defaults defined in oeomega.OEMacrocycleOmegaOptions().

Returns:

conformations – An OpenEye multi-conformer molecule holding the generated conformations.

Return type:

oechem.OEMCMolBase

kinoml.modeling.OEModeling.generate_reasonable_conformations(molecule: openeye.oechem.OEMolBase, options: openeye.oeomega.OEOmegaOptions = oeomega.OEOmegaOptions(oeomega.OEOmegaSampling_Classic), pKa_norm: bool = True) List[openeye.oechem.OEMCMolBase]

Generate conformations of reasonable enantiomers and tautomers of a given molecule.

Parameters:
  • molecule (oechem.OEMolBase) – An OpenEye molecule.

  • options (oeomega.OEOmegaOptions, default=oeomega.OEOmegaOptions(oeomega.OEOmegaSampling_Classic)) – Options for generating conformations. If the given molecule is a macrocycle only the maximal number of conformations will be changed from the defaults defined in oeomega.OEMacrocycleOmegaOptions().

  • pKa_norm (bool) – Assign the predominant ionization state at pH ~7.4.

Returns:

conformations_ensemble – A list of OpenEye multi-conformer molecules.

Return type:

list of oechem.OEMCMolBase

kinoml.modeling.OEModeling.overlay_molecules(reference_molecule: openeye.oechem.OEMolBase, fit_molecule: openeye.oechem.OEMCMolBase)

Overlay a multi-conformer molecule to a single-conformer molecule and calculate the TanimotoCombo score.

Parameters:
  • reference_molecule (oechem.OEMolBase) – An OpenEye molecule holding a single conformation of the reference molecule for overlay.

  • fit_molecule (oechem.OEMCMolBase) – An OpenEye multi-conformer molecule holding the conformations of a molecule to fit during overlay.

Returns:

The TanimotoCombo score and the OpenEye molecules of the best overlay

Return type:

float, list of oechem.OEGraphMol

kinoml.modeling.OEModeling.enumerate_isomeric_smiles(molecule: openeye.oechem.OEMolBase) Set[str]

Enumerate reasonable isomeric SMILES representations of a given OpenEye molecule.

Parameters:

molecule (oechem.OEMolBase) – An OpenEye molecule.

Returns:

smiles_set – A set of reasonable isomeric SMILES strings.

Return type:

set of str

kinoml.modeling.OEModeling.are_identical_molecules(molecule1: openeye.oechem.OEMolBase, molecule2: openeye.oechem.OEMolBase) bool

Check if two OpenEye molecules are identical.

Parameters:
  • molecule1 (oechem.OEMolBase) – The first OpenEye molecule.

  • molecule2 (oechem.OEMolBase) – The second OpenEye molecule.

Returns:

True if identical molecules, else False.

Return type:

bool

kinoml.modeling.OEModeling.get_sequence(structure: openeye.oechem.OEMolBase) str

Get the amino acid sequence with one letter characters of an OpenEye molecule. All residues not perceived as standard amino acid will receive the character ‘X’.

Parameters:

structure (oechem.OEMolBase) – An OpenEye molecule.

Returns:

sequence – The amino acid sequence with one letter characters.

Return type:

str

kinoml.modeling.OEModeling.get_structure_sequence_alignment(structure: openeye.oechem.OEMolBase, sequence: str) Tuple[str, str]

Generate an alignment between a protein structure and an amino acid sequence. The provided protein structure should only contain protein residues to prevent unexpected behavior. Also, this alignment was optimized for highly similar sequences, i.e. only few mutations, deletions and insertions. Non protein residues will be marked with “X”.

Parameters:
  • structure (oechem.OEMolBase) – An OpenEye molecule holding a protein structure.

  • sequence (str) – A one letter amino acid sequence.

Returns:

  • structure_sequence_aligned (str) – The aligned protein structure sequence with gaps denoted as “-“.

  • sequence_aligned (str) – The aligned amino acid sequence with gaps denoted as “-“.

kinoml.modeling.OEModeling.apply_deletions(target_structure: openeye.oechem.OEMolBase, template_sequence: str, delete_n_anchors: int = 2) openeye.oechem.OEMolBase

Apply deletions to a protein structure according to an amino acid sequence. The provided protein structure should only contain protein residues to prevent unexpected behavior.

Parameters:
  • target_structure (oechem.OEMolBase) – An OpenEye molecule holding a protein structure for which deletions should be applied.

  • template_sequence (str) – A template one letter amino acid sequence, which holds potential deletions when compared to the target structure sequence.

  • delete_n_anchors (int) – Specify how many anchoring residues should be deleted at each side of the deletion. Important if connecting anchoring residues after deletion is intended, e.g. via apply_insertion. Only affects deletions in the middle of a sequence, not at the end or the beginning.

Returns:

structure_with_deletions – An OpenEye molecule holding the protein structure with applied deletions.

Return type:

oechem.OEMolBase

Raises:

ValueError – Negative values are not allowed for ‘delete_n_anchors’.

kinoml.modeling.OEModeling.apply_insertions(target_structure: openeye.oechem.OEMolBase, template_sequence: str, loop_db: str | pathlib.Path, ligand: openeye.oechem.OEMolBase | None = None) openeye.oechem.OEMolBase

Apply insertions to a protein structure according to an amino acid sequence. The provided protein structure should only contain protein residues to prevent unexpected behavior.

Parameters:
  • target_structure (oechem.OEMolBase) – An OpenEye molecule holding a protein structure for which insertions should be applied.

  • template_sequence (str) – A template one letter amino acid sequence, which holds potential insertions when compared to the target structure sequence.

  • loop_db (str or Path) – The path to the loop database used by OESpruce to model missing loops.

  • ligand (oechem.OEMolBase or None, default=None) – An OpenEye molecule that should be checked for heavy atom clashes with built insertions.

Returns:

structure_with_insertions – An OpenEye molecule holding the protein structure with applied insertions.

Return type:

oechem.OEMolBase

kinoml.modeling.OEModeling.apply_mutations(target_structure: openeye.oechem.OEMolBase, template_sequence: str, fallback_delete: bool = True) openeye.oechem.OEMolBase

Mutate a protein structure according to an amino acid sequence. The provided protein structure should only contain protein residues to prevent unexpected behavior. Residues that could not be mutated will be deleted by default.

Parameters:
  • target_structure (oechem.OEMolBase) – An OpenEye molecule holding a protein structure to mutate.

  • template_sequence (str) – A template one letter amino acid sequence, which holds potential mutations when compared to the target structure sequence.

  • fallback_delete (bool) – If the residue should be deleted if it could not be mutated.

Returns:

An OpenEye molecule holding the mutated protein structure.

Return type:

oechem.OEMolBase

Raises:

ValueError – Mutation {oeresidue.GetName()}{oeresidue.GetResidueNumber()}{three_letter_code} failed! Only raised when fallback_delete is set False.

kinoml.modeling.OEModeling.delete_partial_residues(structure: openeye.oechem.OEMolBase) openeye.oechem.OEMolBase

Delete residues with missing sidechain or backbone atoms. The backbone is considered complete if atoms C, CA and N are present.

Parameters:

structure (oechem.OEMolBase) – An OpenEye molecule holding a protein structure.

Returns:

structure – An OpenEye molecule holding only residues with completely modeled side chains.

Return type:

oechem.OEMolBase

kinoml.modeling.OEModeling.delete_short_protein_segments(structure: openeye.oechem.OEMolBase) openeye.oechem.OEMolBase

Delete protein segments consisting of 3 or less residues.

Parameters:

structure (oechem.OEMolBase) – An OpenEye molecule holding a protein with possibly short segments.

Returns:

structure – An OpenEye molecule holding the protein without short segments.

Return type:

oechem.OEMolBase

kinoml.modeling.OEModeling.delete_clashing_sidechains(structure: openeye.oechem.OEMolBase, cutoff: float = 2.0) openeye.oechem.OEMolBase

Delete side chains that are clashing with other atoms of the given structure.

Note: Structures containing non-protein residues may lead to unexpected behavior, since also those residues will be deleted if clashing with other residues of the system. However, this behavior is important to be able to also check PTMs for clashes.

Parameters:
  • structure (oechem.OEMolBase) – An OpenEye molecule holding a protein structure.

  • cutoff (float) – The distance cutoff that is used for defining a heavy atom clash. Note: Going bigger than 2.3 A may lead to the deletion of residues involved in strong hydrogen bonds.

Returns:

processed_structure – An OpenEye molecule holding the protein structure without clashing sidechains.

Return type:

oechem.OEMolBase

kinoml.modeling.OEModeling.get_atom_coordinates(molecule: openeye.oechem.OEMolBase) List[Tuple[float, float, float]]

Retrieve the atom coordinates of an OpenEye molecule.

Parameters:

molecule (oechem.OEMolBase) – An OpenEye molecule for which the coordinates should be retrieved.

Returns:

coordinates – The coordinates of the given molecule atoms.

Return type:

list of tuple of float

kinoml.modeling.OEModeling.renumber_structure(target_structure: openeye.oechem.OEMolBase, residue_ids: Iterable[int]) openeye.oechem.OEGraphMol

Renumber the residues of a protein structure according to the given list of residue IDs.

Parameters:
  • target_structure (oechem.OEMolBase) – An OpenEye molecule holding the protein structure to renumber.

  • residue_ids (iterable of int) – An iterable of residue IDs matching the order of the target structure.

Returns:

renumbered_structure – An OpenEye molecule holding the cropped protein structure.

Return type:

oechem.OEMolBase

Raises:
  • ValueError – Number of given residue IDs does not match number of residues in the given structure.

  • ValueError – Given residue IDs contain wrong types, only int is allowed.

kinoml.modeling.OEModeling.superpose_proteins(reference_protein: openeye.oechem.OEMolBase, fit_protein: openeye.oechem.OEMolBase, residues: Iterable = tuple(), chain_id: str = ' ', insertion_code: str = ' ') openeye.oechem.OEMolBase

Superpose a protein structure onto a reference protein. The superposition can be customized to consider only the specified residues.

Parameters:
  • reference_protein (oechem.OEMolBase) – An OpenEye molecule holding a protein structure which will be used as reference during superposition.

  • fit_protein (oechem.OEMolBase) – An OpenEye molecule holding a protein structure which will be superposed onto the reference protein.

  • residues (Iterable of str) – Residues that should be used during superposition in format “GLY123”.

  • chain_id (str) – Chain identifier for residues that should be used during superposition.

  • insertion_code (str) – Insertion code for residues that should be used during superposition.

Returns:

superposed_protein – An OpenEye molecule holding the superposed protein structure.

Return type:

oechem.OEMolBase

kinoml.modeling.OEModeling.update_residue_identifiers(structure: openeye.oechem.OEMolBase, keep_protein_residue_ids: bool = True, keep_chain_ids: bool = False) openeye.oechem.OEMolBase

Update the atom, residue and chain IDs of the given molecular structure. All residues become part of chain A, unless ‘keep_chain_ids’ is set True. Atom IDs will start from 1. Residue IDs will start from 1, except ‘keep_protein_residue_ids’ is set True. This is especially useful, if molecules were merged, which can result in overlapping atom and residue IDs as well as separate chains.

Parameters:
  • structure (oechem.OEMolBase) – The OpenEye molecule structure for updating atom and residue ids.

  • keep_protein_residue_ids (bool) – If the protein residues should be kept.

  • keep_chain_ids (bool) – If the chain IDS should be kept.

Returns:

structure – The OpenEye molecule structure with updated atom and residue ids.

Return type:

oechem.OEMolBase

kinoml.modeling.OEModeling.split_molecule_components(molecule: openeye.oechem.OEMolBase) List[openeye.oechem.OEGraphMol]

Split an OpenEye molecule into its bonded components.

Parameters:

molecule (oechem.OEMolBase) – An OpenEye molecule holding multiple components.

Returns:

components – A list of OpenEye molecules holding the split components.

Return type:

list of oechem.OEGraphMol

kinoml.modeling.OEModeling.residue_ids_to_residue_names(structure: openeye.oechem.OEMolBase, residue_ids: List[int], chain_id: None | str = None) List[str]

Get the corresponding residue names for a list of residue IDs and a give OpenEye molecule with residue information.

Parameters:
  • structure (oechem.OEMolBase) – An OpenEye molecule with residue information.

  • residue_ids (list of int) – A list of residue IDs.

  • chain_id (None or str) – The chain ID to filter for.

Returns:

residue_names – The corresponding residue names as three letter codes.

Return type:

list of str

Raises:
  • ValueError – No residue found for residue ID {resid}.

  • ValueError – Found multiple residues for residue ID {resid}.