core

BaseDatasetProvider

Base object for all DatasetProvider classes.

Parameters

Name | Type | Description | Default
systems | Iterable[kinoml.core.systems.System] | A DatasetProvider holds a list of kinoml.core.systems.System objects (or any of its subclasses). A System is a collection of MolecularComponent objects (e.g. protein or ligand-like entities), plus an optional Measurement. | required

featurize(self, *featurizers)

Show source code in datasets/core.py
    def featurize(self, *featurizers: BaseFeaturizer) -> None:
        """
        Given a collection of `kinoml.features.core.BaseFeaturizer` objects, apply them
        to the present systems.

        Parameters:
            featurizers: Featurization schemes that will be applied to each system,
                in a stacked way.

        !!! todo
            * Do we want to support parallel featurizing too or only stacked featurization?
            * Shall we modify the system in place (default now), return the modified copy or store it?
        """
        # Do we assume the dataset is homogeneous (single type of system and associated measurement)?
        # That would allow checking only once (e.g. testing support for the first system only)
        for system in tqdm(self.systems, desc="Featurizing systems..."):
            for featurizer in featurizers:
                featurizer.supports(system, raise_errors=True)
                # .supports() will test for system type, type of components, type of measurement, etc
                system.featurizations[featurizer.name] = featurizer.featurize(system, inplace=True)
                system.featurizations["last"] = system.featurizations[featurizer.name]

Given a collection of kinoml.features.core.BaseFeaturizer objects, apply them to the present systems.

Parameters

Name | Type | Description | Default
*featurizers | Iterable[kinoml.features.core.BaseFeaturizer] | Featurization schemes that will be applied to the system, in a stacked way. | ()

Todo

  • Do we want to support parallel featurizing too or only stacked featurization?
  • Shall we modify the system in place (default now), return the modified copy or store it?
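The loop in featurize can be illustrated with a minimal, self-contained sketch. ToySystem, ToyFeaturizer, CountComponents, NameLength and featurize_all are hypothetical stand-ins written for this example; they are not part of kinoml. Only the bookkeeping is taken from the source above: each result is stored under the featurizer's name, and the "last" key always points at the most recent result.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToySystem:
    """Hypothetical stand-in for kinoml.core.systems.System."""
    name: str
    components: List[str]
    featurizations: Dict[str, Any] = field(default_factory=dict)


class ToyFeaturizer:
    """Hypothetical stand-in for kinoml.features.core.BaseFeaturizer."""

    @property
    def name(self) -> str:
        return self.__class__.__name__

    def supports(self, system: ToySystem, raise_errors: bool = False) -> bool:
        # Assumption: a system is "supported" if it has at least one component.
        ok = len(system.components) > 0
        if not ok and raise_errors:
            raise ValueError(f"{self.name} does not support {system.name}")
        return ok

    def featurize(self, system: ToySystem, inplace: bool = True) -> Any:
        raise NotImplementedError


class CountComponents(ToyFeaturizer):
    def featurize(self, system, inplace=True):
        return len(system.components)


class NameLength(ToyFeaturizer):
    def featurize(self, system, inplace=True):
        return len(system.name)


def featurize_all(systems, *featurizers):
    """Same loop as featurize(): one entry per featurizer name, plus "last"."""
    for system in systems:
        for featurizer in featurizers:
            featurizer.supports(system, raise_errors=True)
            system.featurizations[featurizer.name] = featurizer.featurize(system, inplace=True)
            system.featurizations["last"] = system.featurizations[featurizer.name]


systems = [ToySystem("ABL1-imatinib", ["protein", "ligand"])]
featurize_all(systems, CountComponents(), NameLength())
print(systems[0].featurizations)
```

After the call, the system carries one entry per featurizer, and "last" holds the output of NameLength because it was applied last. Nothing is returned; the systems are modified in place, as the Todo notes.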

from_source(filename=None, **kwargs) (classmethod)

Show source code in datasets/core.py
    @classmethod
    def from_source(cls, filename=None, **kwargs):
        """
        Parse a CSV/raw file into the object model. This method is responsible for
        generating the objects for `self.data` and `self.measurements`, if relevant.
        Additional kwargs will be passed to `__init__`.
        """
        raise NotImplementedError

Parse a CSV/raw file into the object model. This method is responsible for generating the objects for self.data and self.measurements, if relevant. Additional kwargs will be passed to __init__.
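Because the base implementation raises NotImplementedError, subclasses are expected to override from_source. The sketch below shows one way such an override might look. BaseProvider and CsvProvider are hypothetical stand-ins, the CSV column names (protein, ligand, value) are invented for the example, and plain tuples are used in place of real System objects.

```python
import csv
import os
import tempfile


class BaseProvider:
    """Hypothetical stand-in for BaseDatasetProvider."""

    def __init__(self, systems, **kwargs):
        self.systems = list(systems)
        self.options = kwargs  # extra keyword arguments forwarded by from_source

    @classmethod
    def from_source(cls, filename=None, **kwargs):
        raise NotImplementedError


class CsvProvider(BaseProvider):
    @classmethod
    def from_source(cls, filename=None, **kwargs):
        # Parse the raw file into records; a real provider would build
        # System objects (with components and a measurement) here.
        with open(filename, newline="") as handle:
            rows = list(csv.DictReader(handle))
        systems = [(row["protein"], row["ligand"], float(row["value"])) for row in rows]
        # Additional kwargs are passed on to __init__, as the docstring states.
        return cls(systems, **kwargs)


# Write a small temporary CSV so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("protein,ligand,value\nABL1,imatinib,7.5\nEGFR,gefitinib,8.1\n")
    path = tmp.name

provider = CsvProvider.from_source(path, units="pIC50")
os.remove(path)
print(provider.systems)
```

Keeping the parsing in a classmethod means callers never construct an empty provider first; the object is returned already populated.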

to_dataframe(self, *args, **kwargs)

Show source code in datasets/core.py
    def to_dataframe(self, *args, **kwargs):
        """
        Generates a `pandas.DataFrame` containing information on the systems
        and their measurements

        Returns:
            pandas.DataFrame
        """
        if not self.systems:
            return pd.DataFrame()
        s = self.systems[0]
        records = [
            [s.__class__.__name__, "n_components", f"Avg {s.measurement.__class__.__name__}",]
        ]
        for system in self.systems:
            records.append([system.name, len(system.components), system.measurement.values.mean()])
        return pd.DataFrame.from_records(records[1:], columns=records[0])

Generates a pandas.DataFrame containing information on the systems and their measurements

Returns

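Type | Description
pandas.DataFrame | One row per system: its name, its number of components, and the mean of its measurement values. Empty if the provider holds no systems.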
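The shape of the resulting table can be seen in a small sketch that reproduces the logic of to_dataframe as a standalone function. ToySystem and ToyMeasurement are hypothetical stand-ins for the kinoml classes, and the example assumes that pandas and NumPy are installed and that measurement values are stored as a NumPy array.

```python
from dataclasses import dataclass
from typing import List

import numpy as np
import pandas as pd


@dataclass
class ToyMeasurement:
    """Hypothetical stand-in for a kinoml measurement."""
    values: np.ndarray


@dataclass
class ToySystem:
    """Hypothetical stand-in for kinoml.core.systems.System."""
    name: str
    components: List[str]
    measurement: ToyMeasurement


def to_dataframe(systems):
    """Standalone version of the method's logic, for illustration only."""
    if not systems:
        return pd.DataFrame()
    first = systems[0]
    # Column names are derived from the class names of the first system.
    columns = [
        first.__class__.__name__,
        "n_components",
        f"Avg {first.measurement.__class__.__name__}",
    ]
    records = [[s.name, len(s.components), s.measurement.values.mean()] for s in systems]
    return pd.DataFrame.from_records(records, columns=columns)


systems = [
    ToySystem("ABL1-imatinib", ["protein", "ligand"], ToyMeasurement(np.array([7.0, 8.0]))),
    ToySystem("EGFR-gefitinib", ["protein", "ligand"], ToyMeasurement(np.array([8.5]))),
]
df = to_dataframe(systems)
print(df)
```

Since the column names come from the first system only, the table is meaningful mainly for homogeneous datasets, which is the same assumption raised in the comments of featurize.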

Last update: April 24, 2020