kinoml.datasets.chembl
Creates DatasetProvider objects from ChEMBL activity data
Module Contents
- kinoml.datasets.chembl.logger
- class kinoml.datasets.chembl.ChEMBLDatasetProvider(measurements: Iterable[kinoml.core.measurements.BaseMeasurement], metadata: dict = None)
Bases: kinoml.datasets.core.MultiDatasetProvider
This provider relies heavily on openkinome/kinodata data ingestion pipelines. It will load ChEMBL activities from its releases page.
- classmethod from_source(path_or_url='https://github.com/openkinome/datascripts/releases/download/v0.2/activities-chembl28_v0.2.zip', measurement_types=('pIC50', 'pKi', 'pKd'), uniprot_ids=None, sample=None, protein_type: str = 'KLIFSKinase', toolkit: str = 'OpenEye')
Create a MultiDatasetProvider out of the raw data contained in the zip file.
- Parameters
path_or_url (str, optional) – Path or URL to a (zipped) CSV file containing activities from ChEMBL, using the schema detailed below.
measurement_types (tuple of str, optional) – Measurement types to import from the CSV. By default, all three (pIC50, pKi, pKd) are loaded, but you can choose a subset, e.g. ("pIC50",).
uniprot_ids (None or list of str, default=None) – Restrict measurements to the given UniProt IDs.
sample (int, optional, default=None) – If set to a value greater than zero, load only that many data points from the dataset.
protein_type (str, default=KLIFSKinase) – The protein object type to use (‘Protein’ or ‘KLIFSKinase’).
toolkit (str, default=OpenEye) – The toolkit to use for creating protein objects (e.g. ‘OpenEye’, ‘MDAnalysis’). Allowed values depend on the specified protein_type.
- Raises
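ValueError – Raised when the given protein_type is not valid. The message names the rejected value and the allowed protein types: "Given protein_type {protein_type} is not valid, only {allowed_protein_types} are allowed."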
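The ValueError documented above can be illustrated with a small standalone sketch. This is not kinoml's implementation: the helper name check_protein_type and the constant ALLOWED_PROTEIN_TYPES are hypothetical, and the allowed values are simply the two listed for the protein_type parameter.

```python
# Hypothetical constant: the two values listed in the protein_type description.
ALLOWED_PROTEIN_TYPES = ("Protein", "KLIFSKinase")


def check_protein_type(protein_type: str) -> str:
    """Return protein_type unchanged, or raise ValueError if it is not allowed.

    Illustrative helper only; it mirrors the documented error message and is
    not part of kinoml.
    """
    if protein_type not in ALLOWED_PROTEIN_TYPES:
        raise ValueError(
            f"Given protein_type {protein_type} is not valid, "
            f"only {ALLOWED_PROTEIN_TYPES} are allowed."
        )
    return protein_type
```

Validating the argument before any data is loaded means a misspelled protein_type fails immediately rather than after the activities file has been downloaded and parsed.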
Note
ChEMBL aggregates data from many sources, so experimental conditions are guaranteed to differ across experiments.
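A minimal usage sketch, based only on the from_source signature documented above, might look as follows. The helper name load_egfr_pic50 is hypothetical, and the UniProt ID P00533 (EGFR) is chosen purely for illustration. Running the helper requires kinoml to be installed and, with the default path_or_url, network access to download the release archive.

```python
def load_egfr_pic50(sample=100):
    """Load a small pIC50 subset for one kinase.

    Illustrative helper, not part of kinoml. It keeps the documented defaults
    for path_or_url, protein_type and toolkit.
    """
    # Imported lazily so that defining this helper does not require kinoml.
    from kinoml.datasets.chembl import ChEMBLDatasetProvider

    return ChEMBLDatasetProvider.from_source(
        measurement_types=("pIC50",),  # subset of the default ("pIC50", "pKi", "pKd")
        uniprot_ids=["P00533"],        # EGFR, used here only as an example
        sample=sample,                 # load only this many data points
    )
```

Restricting measurement_types and uniprot_ids keeps the loaded data to a single measurement type and target. Because of the note above, mixing values across sources should still be done with care.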