kinoml.features.protein
Featurizers that mostly concern protein-based models
Module Contents
- kinoml.features.protein.logger
- class kinoml.features.protein.SingleProteinFeaturizer(**kwargs)
Bases: kinoml.features.core.ParallelBaseFeaturizer
Provides a minimally useful ._supports() method for all Protein-like featurizers.
- _COMPATIBLE_PROTEIN_TYPES = ()
- _supports(system: Union[kinoml.core.systems.ProteinSystem, kinoml.core.systems.ProteinLigandComplex]) -> bool
Check that exactly one protein is present in the System
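The "exactly one protein" check can be illustrated with a minimal sketch. The `Protein`, `Ligand` and `System` classes and the `supports` function below are hypothetical stand-ins written for this example; they are not the kinoml classes, and the real `_supports` may be implemented differently.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Protein:
    """Hypothetical stand-in for a protein component."""
    name: str


@dataclass
class Ligand:
    """Hypothetical stand-in for a non-protein component."""
    name: str


@dataclass
class System:
    """Hypothetical stand-in for a system holding several components."""
    components: List[object] = field(default_factory=list)


def supports(system: System, compatible_types: Tuple[type, ...] = (Protein,)) -> bool:
    """Return True only if exactly one component has a compatible protein type."""
    proteins = [c for c in system.components if isinstance(c, compatible_types)]
    return len(proteins) == 1
```

A system with one protein and one ligand passes this check; a system with no protein, or with two proteins, does not.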
- class kinoml.features.protein.AminoAcidCompositionFeaturizer(**kwargs)
Bases: SingleProteinFeaturizer
Featurizes the protein using the composition of the residues in the binding site.
- _counter
- _featurize_one(system: Union[kinoml.core.systems.ProteinSystem, kinoml.core.systems.ProteinLigandComplex]) -> Union[numpy.array, None]
Featurizes a protein using the residue count in the sequence.
- Parameters
system (ProteinSystem or ProteinLigandComplex) – The System to be featurized.
- Returns
The count of amino acids in the binding site.
- Return type
np.array or None
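As a rough illustration of an amino acid composition feature, the sketch below counts each of the 20 standard residues in a one-letter sequence. The function name, the alphabetical ordering of `AMINO_ACIDS`, and the plain-list return value (instead of a NumPy array) are assumptions made for this example, not the library's implementation.

```python
from collections import Counter
from typing import List

# Assumption: the 20 standard amino acids in one-letter code, alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[int]:
    """Count occurrences of each standard amino acid in the given sequence."""
    counts = Counter(sequence.upper())
    # Residues absent from the sequence get a count of 0; symbols outside
    # AMINO_ACIDS are ignored in this sketch.
    return [counts[aa] for aa in AMINO_ACIDS]
```

The result always has 20 entries, so sequences of different lengths map to feature vectors of the same size.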
- class kinoml.features.protein.OneHotEncodedSequenceFeaturizer(sequence_type: str = 'full', **kwargs)
Bases: kinoml.features.core.BaseOneHotEncodingFeaturizer, SingleProteinFeaturizer
Featurizes the sequence of the protein to a one hot encoding.
- ALPHABET
- _retrieve_sequence(system: Union[kinoml.core.systems.ProteinSystem, kinoml.core.systems.ProteinLigandComplex]) -> str
Implement in your component-specific subclass!
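For readers unfamiliar with one-hot encoding of sequences, here is a minimal pure-Python sketch. The `ALPHABET` value, the `one_hot_encode` name, and the matrix orientation (one row per alphabet symbol, one column per sequence position) are assumptions chosen for illustration; the actual featurizer may use a different alphabet, orientation, and array type.

```python
from typing import List

# Assumption: the 20 standard amino acids; the library's ALPHABET may differ.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(sequence: str, alphabet: str = ALPHABET) -> List[List[int]]:
    """Encode a sequence as a len(alphabet) x len(sequence) matrix of 0s and 1s."""
    index = {symbol: row for row, symbol in enumerate(alphabet)}
    matrix = [[0] * len(sequence) for _ in alphabet]
    for position, residue in enumerate(sequence):
        if residue not in index:
            raise ValueError(f"Residue {residue!r} is not in the alphabet")
        matrix[index[residue]][position] = 1
    return matrix
```

Each column contains exactly one 1, located in the row of the residue found at that position.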
- class kinoml.features.protein.OEProteinStructureFeaturizer(**kwargs)
Bases: kinoml.features.core.OEBaseModelingFeaturizer, SingleProteinFeaturizer
Given systems with exactly one protein, prepare the protein structure by:
- modeling missing loops with OESpruce according to the PDB header, unless a custom sequence is specified via the uniprot_id or sequence attribute of the protein component (see below); missing sequence at the N- and C-termini is not modeled
- building missing side chains
- modeling substitutions, deletions and insertions with OESpruce if a uniprot_id or sequence attribute is provided for the protein component; if an alteration cannot be modeled, the corresponding mismatch in the structure is deleted
- removing everything but protein and water
- protonating at pH 7.4
The protein component of each system must be a core.proteins.Protein or a subclass thereof, must be initialized with toolkit='OpenEye' and give access to a molecular structure, e.g. via a pdb_id. Additionally, the protein component can have the following optional attributes to customize the protein modeling:
- name: A string specifying the name of the protein; it will be used for generating the output file name.
- chain_id: A string specifying which chain should be used.
- alternate_location: A string specifying which alternate location should be used.
- expo_id: A string specifying a ligand bound to the protein of interest. This is especially useful if multiple proteins are found in one PDB structure.
- uniprot_id: A string specifying the UniProt ID that will be used to fetch the amino acid sequence from UniProt, which will be used for modeling the protein. This will supersede the sequence information given in the PDB header.
- sequence: A string specifying the amino acid sequence in one-letter codes that should be used when modeling the protein. This will supersede a given uniprot_id and the sequence information given in the PDB header.
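The precedence described above (sequence over uniprot_id over the PDB header) can be sketched as a small helper. `choose_sequence_source` is a hypothetical function written for this example and is not part of kinoml; it only reports which source would win under the documented rule.

```python
from typing import Optional


def choose_sequence_source(
    uniprot_id: Optional[str] = None,
    sequence: Optional[str] = None,
) -> str:
    """Return which source of sequence information takes precedence."""
    if sequence is not None:
        # An explicit sequence supersedes both uniprot_id and the PDB header.
        return "sequence"
    if uniprot_id is not None:
        # A UniProt ID supersedes the sequence information in the PDB header.
        return "uniprot_id"
    # With neither attribute set, the PDB header is used.
    return "pdb_header"
```

For example, passing both a UniProt ID and a sequence selects the explicit sequence.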
- Parameters
loop_db (str) – The path to the loop database used by OESpruce to model missing loops.
cache_dir (str, Path or None, default=None) – Path to directory used for saving intermediate files. If None, the default location provided by appdirs.user_cache_dir() will be used.
output_dir (str, Path or None, default=None) – Path to directory used for saving output files. If None, output structures will not be saved.
use_multiprocessing (bool, default=True) – Whether to use multiprocessing.
n_processes (int or None, default=None) – How many processes to use in case of multiprocessing. Defaults to number of available CPUs.
- _featurize_one(system: kinoml.core.systems.ProteinSystem) -> Union[Universe, None]
Prepare a protein structure.
- Parameters
system (ProteinSystem) – A system object holding a protein component.
- Returns
An MDAnalysis universe of the featurized system. None if no design unit was found.
- Return type
Universe or None