kinoml.datasets.chembl

Creates DatasetProvider objects from ChEMBL activity data

Module Contents

kinoml.datasets.chembl.logger
class kinoml.datasets.chembl.ChEMBLDatasetProvider(measurements: Iterable[kinoml.core.measurements.BaseMeasurement], metadata: dict = None)

Bases: kinoml.datasets.core.MultiDatasetProvider

This provider relies heavily on the openkinome/kinodata data ingestion pipelines. By default, it loads ChEMBL activities from a release archive published on GitHub (see the default value of path_or_url).

classmethod from_source(path_or_url='https://github.com/openkinome/datascripts/releases/download/v0.2/activities-chembl28_v0.2.zip', measurement_types=('pIC50', 'pKi', 'pKd'), uniprot_ids=None, sample=None, protein_type: str = 'KLIFSKinase', toolkit: str = 'OpenEye')

Create a MultiDatasetProvider from the raw activity data in the (zipped) CSV file given by path_or_url.
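The first step, pulling a CSV out of a zip archive, can be illustrated with the standard library alone. This is a minimal sketch, not the kinoml implementation. The helper read_zipped_csv is hypothetical, it assumes the archive holds at least one *.csv member, and the column names in the usage example are invented for illustration rather than taken from the real schema.

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_zipped_csv(data: bytes) -> List[Dict[str, str]]:
    """Return the rows of the first CSV member of a zip archive as dicts.

    Hypothetical helper: the real archive layout used by from_source is
    not documented here, so this simply picks the first *.csv member.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        csv_names = [n for n in archive.namelist() if n.endswith(".csv")]
        if not csv_names:
            raise ValueError("archive contains no CSV file")
        with archive.open(csv_names[0]) as raw:
            # Decode the binary member so csv.DictReader can parse it.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Usage: build a tiny archive in memory instead of downloading one.
# The column names below are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("activities.csv", "compound,target,value\nC1,P00533,7.2\n")
rows = read_zipped_csv(buffer.getvalue())
print(rows)  # [{'compound': 'C1', 'target': 'P00533', 'value': '7.2'}]
```

Reading from an in-memory byte string keeps the helper agnostic to where the archive came from. A local path or a downloaded URL body can both be passed in as bytes.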

Parameters
  • path_or_url (str, optional) – path or URL to a (zipped) CSV file containing activities from ChEMBL, using the schema detailed below.

  • measurement_types (tuple of str, optional) – Which measurement types to import from the CSV. By default, all three (pIC50, pKi, pKd) are loaded, but you can choose a subset (e.g. ("pIC50",)).

  • uniprot_ids (None or list of str, default=None) – Restrict measurements to the given UniProt IDs.

  • sample (int, default=None) – If set to a value larger than zero, load only that many data points from the dataset.

  • protein_type (str, default='KLIFSKinase') – The protein object type to use (‘Protein’ or ‘KLIFSKinase’).

  • toolkit (str, default='OpenEye') – The toolkit to use for creating protein objects (e.g. ‘OpenEye’, ‘MDAnalysis’); allowed values depend on the specified protein_type.
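How measurement_types, uniprot_ids and sample might narrow the loaded rows can be sketched as a plain filter. This is an illustrative sketch under stated assumptions, not kinoml's code: filter_activities is hypothetical, the column names measurement_type and uniprot_id are invented, and treating sample as "keep the first N matches" is an assumption (the real provider may select rows differently, for example at random).

```python
from typing import Dict, Iterable, List, Optional, Sequence


def filter_activities(
    rows: Iterable[Dict[str, str]],
    measurement_types: Sequence[str] = ("pIC50", "pKi", "pKd"),
    uniprot_ids: Optional[Sequence[str]] = None,
    sample: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Hypothetical filter mirroring the from_source parameters."""
    wanted_types = set(measurement_types)
    wanted_ids = set(uniprot_ids) if uniprot_ids is not None else None
    selected = []
    for row in rows:
        # Keep only the requested measurement types.
        if row["measurement_type"] not in wanted_types:
            continue
        # When uniprot_ids is given, keep only those targets.
        if wanted_ids is not None and row["uniprot_id"] not in wanted_ids:
            continue
        selected.append(row)
    # Assumption: "sample" keeps the first N matches.
    if sample is not None and sample > 0:
        selected = selected[:sample]
    return selected


rows = [
    {"uniprot_id": "P00533", "measurement_type": "pIC50", "value": "7.1"},
    {"uniprot_id": "P00533", "measurement_type": "pKd", "value": "6.4"},
    {"uniprot_id": "P06239", "measurement_type": "pIC50", "value": "8.0"},
]
only_ic50 = filter_activities(rows, measurement_types=("pIC50",))
print(len(only_ic50))  # 2
one_target = filter_activities(rows, uniprot_ids=["P00533"], sample=1)
print(one_target[0]["measurement_type"])  # pIC50
```

Passing None for uniprot_ids means "no restriction", which matches the documented default. A sample of zero or None leaves the selection untouched.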

Raises

ValueError – If the given protein_type is not valid; only {allowed_protein_types} are allowed.
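The documented ValueError amounts to a membership check on protein_type. The sketch below is hypothetical. check_protein_type and ALLOWED_PROTEIN_TYPES are illustrative names, and the allowed tuple is taken from the parameter description above rather than from kinoml's source. It does not attempt the toolkit check, because the mapping from protein_type to allowed toolkits is not documented here.

```python
from typing import Tuple

# Assumption: the two protein types named in the parameter description.
ALLOWED_PROTEIN_TYPES: Tuple[str, ...] = ("Protein", "KLIFSKinase")


def check_protein_type(protein_type: str) -> str:
    """Hypothetical validator raising ValueError for unknown protein types."""
    if protein_type not in ALLOWED_PROTEIN_TYPES:
        raise ValueError(
            f"Given protein_type {protein_type} is not valid, "
            f"only {ALLOWED_PROTEIN_TYPES} are allowed."
        )
    return protein_type


try:
    check_protein_type("Kinase")
except ValueError as error:
    print(error)
```

Validating up front like this fails fast, before any data is downloaded or parsed.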

Note

ChEMBL aggregates data from many sources, so experimental conditions are likely to differ across experiments.