pkis2

PKIS2DatasetProvider

Loads the PKIS2 dataset as provided in "Progress towards a public chemogenomic set for protein kinases and a call for contributions" [1].

It builds a dataframe in which the SMILES representations of the ligands are the index and the kinase names are the columns. To map between KINOMEscan kinase names and actual sequences, the helper object kinoml.datasets.kinomescan.utils.KINOMEScanMapper is instantiated as a class attribute.
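As a rough illustration of that layout, the sketch below builds a tiny table with pandas. The SMILES strings, kinase names, and values are made-up placeholders rather than PKIS2 data, and the real provider reads its table from the CSV file instead of constructing it by hand.

```python
import pandas as pd

# Hypothetical toy table: rows are ligands (SMILES), columns are kinase names.
# All names and numbers here are placeholders, not values from PKIS2.
toy = pd.DataFrame(
    {"KINASE_A": [12.0, 85.5], "KINASE_B": [47.0, 3.0]},
    index=pd.Index(["CCO", "c1ccccc1"], name="smiles"),
)

# One cell holds one measurement for one (ligand, kinase) pair.
value = toy.loc["c1ccccc1", "KINASE_A"]
print(value)
```

Each cell of such a table corresponds to one protein-ligand system in the provider.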

Examples

>>> from kinoml.datasets.kinomescan.pkis2 import PKIS2DatasetProvider
>>> provider = PKIS2DatasetProvider.from_source()
>>> system = provider.systems[0]
>>> print(f"% displacement for kinase={system.protein.name} and ligand={system.ligand.to_smiles()} is {system.measurement}")

from_source(filename=PosixPath('/home/runner/work/kinoml/kinoml/kinoml/data/kinomescan/journal.pone.0181585.s004.csv'), measurement_type=<class 'kinoml.core.measurements.PercentageDisplacementMeasurement'>, conditions=<AssayConditions pH=7.0>, **kwargs) (classmethod)

Show source code in kinomescan/pkis2.py
    @classmethod
    def from_source(  # pylint: disable=arguments-differ
        cls,
        filename: Union[AnyStr, Path] = datapath("kinomescan/journal.pone.0181585.s004.csv"),
        measurement_type: BaseMeasurement = PercentageDisplacementMeasurement,
        conditions: BaseConditions = AssayConditions(pH=7.0),
        **kwargs
    ):
        """
        Create a DatasetProvider out of the raw data in a file

        Parameters:
            filename: CSV file with the protein-ligand measurements
            measurement_type: which type of measurement was taken for each pair
            conditions: experimental conditions of the assay

        !!! todo
            - Investigate lazy access and object generation
            - Review accuracy of item access by indices (correlative order?)
        """
        df = cls._read_dataframe(filename)
        df = df[df.index.notna()]

        # Read in proteins
        mapper = KINOMEScanMapper()
        kinases = []
        for kin_name in df.columns:
            sequence = mapper.sequence_for_name(kin_name)
            accession = mapper.accession_for_name(kin_name)
            mutations = mapper.mutations_for_name(kin_name)
            if math.isnan(mutations):
                mutations = None
            start_stop = mapper.start_stop_for_name(kin_name)
            provenance = {"accession": accession, "mutations": mutations, "start_stop": start_stop}
            kinases.append(AminoAcidSequence(sequence, name=kin_name, _provenance=provenance))

        # Read in ligands
        ligands = []
        for smiles in df.index:
            ligand = Ligand.from_smiles(smiles, name=smiles, allow_undefined_stereo=True)
            ligands.append(ligand)

        lol = list(df.itertuples(index=False, name=None))  # FIXME: This might be dangerous
        # Build ProteinLigandComplex objects
        complexes = []
        for i, ligand in enumerate(ligands):
            for j, kinase in enumerate(kinases):
                measurement = measurement_type(
                    lol[i][j], conditions=conditions, components=[kinase, ligand]
                )
                comp = ProteinLigandComplex(components=[kinase, ligand], measurement=measurement)
                complexes.append(comp)

        return cls(systems=complexes, conditions=conditions, **kwargs)
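One detail of the listing above: `math.isnan` accepts only real numbers, so the `mutations` check works when the mapper returns a float NaN for kinases without mutations, but it would raise `TypeError` if the mapper returned a string. Whether that can happen depends on `KINOMEScanMapper`, which is not shown here. A defensive variant, using a hypothetical helper named `none_if_nan` that is not part of kinoml, might look like this:

```python
import math


def none_if_nan(value):
    """Return None for a float NaN, otherwise return the value unchanged.

    Hypothetical helper for illustration; not part of kinoml.
    """
    # Check the type first so that strings never reach math.isnan
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


print(none_if_nan(float("nan")))
print(none_if_nan("T315I"))
```

The type check comes first so that non-numeric values pass through untouched.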

Create a DatasetProvider out of the raw data in a file

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| filename | Union[~AnyStr, pathlib.Path] | CSV file with the protein-ligand measurements | PosixPath('/home/runner/work/kinoml/kinoml/kinoml/data/kinomescan/journal.pone.0181585.s004.csv') |
| measurement_type | BaseMeasurement | which type of measurement was taken for each pair | <class 'kinoml.core.measurements.PercentageDisplacementMeasurement'> |
| conditions | BaseConditions | experimental conditions of the assay | <AssayConditions pH=7.0> |

Todo

  • Investigate lazy access and object generation
  • Review accuracy of item access by indices (correlative order?)
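
The second item concerns the positional lookup `lol[i][j]` in `from_source`, which relies on rows and columns being visited in the same order as `df.index` and `df.columns`. A minimal sketch of a label-based traversal, compared against the positional one, is shown below on a toy table. The data and the names `toy`, `pairs`, and `positional` are placeholders, and this is not how kinoml currently builds its systems.

```python
import pandas as pd

# Hypothetical toy table; names and values are placeholders, not PKIS2 data.
toy = pd.DataFrame(
    {"KINASE_A": [12.0, 85.5], "KINASE_B": [47.0, 3.0]},
    index=["CCO", "c1ccccc1"],
)

# Label-based traversal: each value arrives with its own row and column label.
pairs = []
for smiles, row in toy.iterrows():
    for kinase_name, value in row.items():
        pairs.append((smiles, kinase_name, value))

# Positional traversal, mirroring the lol[i][j] pattern used in from_source.
lol = list(toy.itertuples(index=False, name=None))
positional = [
    (smiles, kinase_name, lol[i][j])
    for i, smiles in enumerate(toy.index)
    for j, kinase_name in enumerate(toy.columns)
]

# On this toy table both traversals agree.
assert pairs == positional
```

A check of this kind could serve as a regression test while the positional access is under review.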

  1. DOI: 10.1371/journal.pone.0181585 


Last update: April 24, 2020