kinoml.core.measurements
Measurements are the first-class citizens of a dataset.
A kinoml.datasets.DatasetProvider can essentially be considered a list of Measurement objects. These objects contain:
- One or more numeric values, stored as an array under the .values attribute.
- The set of MolecularComponent objects that were measured, collected under a System in the .system attribute.
- The conditions the measurement was taken under.
See the subclasses for more concrete information about observation models, loss functions, errors and other features.
Module Contents¶
- kinoml.core.measurements.LN10¶
- class kinoml.core.measurements.BaseMeasurement(values: Union[float, Iterable[float]], conditions: kinoml.core.conditions.AssayConditions, system: kinoml.core.systems.System, errors: Union[float, Iterable[float]] = np.nan, group: Union[int, str] = None, strict: bool = True, metadata: dict = None, **kwargs)¶
We will have several subclasses depending on the experiment. They will also provide observation models tailored to it.
Values of the measurement can have more than one replicate. In fact, a single replicate is treated as a special case of a multi-replicate measurement.
- Parameters
values (float or array-like of floats) – The numeric measurement(s). If float, it will be reshaped to a single-element array.
errors (float or array-like of floats, optional) – The errors associated with values. Must be the same shape as values. If float, it will be reshaped to a single-element array.
conditions (AssayConditions) – Experimental conditions of this measurement
system (System) – Molecular entities measured, contained in a System object
group (int or str, optional) – A label that identifies this measurement as part of a group. Useful to split datasets according to shared properties, like research group, measured molecule(s), etc.
metadata (dict, optional) – Provenance data for this measurement
strict (bool, optional=True) – Whether to perform safety checks at initialization.
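The handling of values and errors described in the parameter list can be sketched as follows. This is a minimal illustration, not kinoml's implementation: the helper names as_replicates and normalize are hypothetical, and plain Python lists stand in for the arrays the library stores.

```python
from typing import Iterable, List, Tuple, Union

Number = Union[int, float]


def as_replicates(x: Union[Number, Iterable[Number]]) -> List[float]:
    """Return x as a list of floats; a bare number becomes a one-element list."""
    if isinstance(x, (int, float)):
        return [float(x)]
    return [float(v) for v in x]


def normalize(values, errors) -> Tuple[List[float], List[float]]:
    """Normalize values and errors and require them to have the same length."""
    values = as_replicates(values)
    errors = as_replicates(errors)
    if len(values) != len(errors):
        raise ValueError(
            f"errors has {len(errors)} element(s) but values has {len(values)}"
        )
    return values, errors


# A single replicate is just the one-element case of the general form.
single = normalize(7.5, 0.1)
triple = normalize([7.4, 7.5, 7.6], [0.1, 0.1, 0.2])
```

Treating a scalar as a one-element sequence means downstream code only ever has to deal with the multi-replicate case.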
- RANGE¶
Acceptable range of measurement values, as stored in values
- Type
tuple of float
Note
TODO: Investigate possible uses for pint
- property values¶
- property errors¶
- RANGE = ()¶
- check()¶
Perform some checks for valid values
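One plausible form of such a check is a bounds test against RANGE. The sketch below assumes that the check compares every stored value with an inclusive (low, high) tuple; the function name check_range is hypothetical and the behavior of kinoml's own check() may differ.

```python
from typing import Sequence, Tuple


def check_range(values: Sequence[float], allowed: Tuple[float, float]) -> None:
    """Raise ValueError if any value lies outside the inclusive range."""
    low, high = allowed
    out_of_range = [v for v in values if not (low <= v <= high)]
    if out_of_range:
        raise ValueError(f"values outside [{low}, {high}]: {out_of_range}")


# Passes silently: all values fall within an inclusive (0, 100) range.
check_range([0.0, 42.0, 100.0], (0, 100))
```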
- __eq__(other)¶
Return self==value.
- __repr__() → str¶
Return repr(self).
- class kinoml.core.measurements.ObservationModelMeasurement(values: Union[float, Iterable[float]], conditions: kinoml.core.conditions.AssayConditions, system: kinoml.core.systems.System, errors: Union[float, Iterable[float]] = np.nan, group: Union[int, str] = None, strict: bool = True, metadata: dict = None, **kwargs)¶
Bases:
BaseMeasurement
A base class that implements the concept of observation models and loss function adapters.
See the .observation_model and .loss_adapter class methods for more information.
- classmethod observation_model(backend='pytorch')¶
The observation_model function must be defined per Measurement type, in the appropriate subclass. It dispatches to underlying static methods, suffixed by the backend (e.g. _observation_model_pytorch, _observation_model_tensorflow). These methods are static, so they do not have access to the class. This is done on purpose, so that modular observation_model functions remain composable. The signature is therefore not fixed.
There are some standardized keyword arguments we use by convention, though:
values
errors
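The suffix-based dispatch described above can be sketched in plain Python. ToyMeasurement and the purepython backend name are hypothetical and exist only for this illustration; the model body uses the pKd formula documented further down this page.

```python
import math


class ToyMeasurement:
    """Hypothetical stand-in showing dispatch to backend-suffixed static methods."""

    @classmethod
    def observation_model(cls, backend="purepython"):
        # Build the method name from the backend and look it up on the class.
        name = f"_observation_model_{backend}"
        try:
            return getattr(cls, name)
        except AttributeError:
            raise NotImplementedError(
                f"{cls.__name__} has no observation model for backend {backend!r}"
            ) from None

    @staticmethod
    def _observation_model_purepython(dG_over_KT, standard_conc=1.0, **kwargs):
        # Static on purpose: the function needs no access to the class,
        # so it can be passed around and composed freely.
        return -(dG_over_KT + math.log(standard_conc)) / math.log(10)


model = ToyMeasurement.observation_model(backend="purepython")
predicted_pkd = model(math.log(1e-9))  # approximately 9 for a 1 nM binder
```

Because the returned object is an ordinary function, the caller decides which keyword arguments to pass, which is why no single signature is imposed.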
- classmethod _observation_model(backend='pytorch', type_=None)¶
- static _observation_model_null(dG_over_KT)¶
- abstract static _observation_model_pytorch(*args, **kwargs)¶
- abstract static _observation_model_xgboost(*args, **kwargs)¶
- classmethod loss_adapter(backend='pytorch')¶
Some frameworks require objective functions to include the observation model transformation in the same callable. This method provides a factory of such methods.
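A factory of this kind can be sketched as a closure that applies the observation model before the loss. The names make_loss_adapter, pkd_model and mse are hypothetical, and the sketch only illustrates the composition; it does not reproduce the signatures of kinoml's private adapters.

```python
import math


def pkd_model(dG_over_KT, standard_conc=1.0):
    """Apply the pKd observation model documented on this page to each prediction."""
    return [-(dg + math.log(standard_conc)) / math.log(10) for dg in dG_over_KT]


def mse(predicted, observed):
    """Mean squared error over paired sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def make_loss_adapter(observation_model, loss_func):
    """Return one callable that transforms predictions, then computes the loss."""

    def adapter(predicted, observed, **model_kwargs):
        return loss_func(observation_model(predicted, **model_kwargs), observed)

    return adapter


pkd_mse = make_loss_adapter(pkd_model, mse)
loss = pkd_mse([math.log(1e-9)], [8.0])  # prediction maps to pKd ~ 9, label is 8
```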
- static _loss_adapter_generic(predicted, observed, loss_func, loss_kwargs=None, pre_loss_func=None, pre_loss_kwargs=None, post_loss_func=None, post_loss_kwargs=None)¶
- classmethod _loss_adapter_pytorch(predicted, observed, loss_func, **kwargs)¶
- static _post_loss_adapter(loss, **kwargs)¶
- class kinoml.core.measurements.PercentageDisplacementMeasurement(values: Union[float, Iterable[float]], conditions: kinoml.core.conditions.AssayConditions, system: kinoml.core.systems.System, errors: Union[float, Iterable[float]] = np.nan, group: Union[int, str] = None, strict: bool = True, metadata: dict = None, **kwargs)¶
Bases:
ObservationModelMeasurement
Measurement where the value(s) must be percentage(s) of displacement.
For the percent displacement measurements available from KinomeScan, we have the following:
\[D([I]) = \frac{1}{1 + \frac{K_d}{[I]}}\]
We therefore define the following function:
\[\mathbf{F}_{KinomeScan}(\Delta g, [I]) = 100 * \frac{1}{1 + \frac{exp[\Delta g] * C[M]}{[I]}}\]
where \(C\) is the standard concentration of 1 [M].
Note
The acceptable range for this measurement is [0, 100], inclusive.
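As a numerical illustration of the function above, the following sketch evaluates it with the standard library only. The function name percent_displacement is hypothetical; the keyword names and defaults follow the documented static-method signatures (inhibitor_conc=1, standard_conc=1).

```python
import math


def percent_displacement(dG_over_KT, inhibitor_conc=1.0, standard_conc=1.0):
    """Evaluate 100 / (1 + exp(dg) * C / [I]) for a scalar dg."""
    kd = math.exp(dG_over_KT) * standard_conc  # dissociation constant implied by dg
    return 100.0 / (1.0 + kd / inhibitor_conc)


half = percent_displacement(0.0)     # K_d equals [I], so displacement is 50.0
strong = percent_displacement(-20.0)  # very favorable binding, close to 100
weak = percent_displacement(20.0)     # very unfavorable binding, close to 0
```

The output always stays within the documented [0, 100] range, and it crosses 50 exactly where the implied \(K_d\) equals the inhibitor concentration.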
- RANGE = (0, 100)¶
- _observation_model_xgboost¶
- check()¶
Perform some checks for valid values
- static _observation_model_pytorch(dG_over_KT, inhibitor_conc=1, standard_conc=1, **kwargs)¶
- static _observation_model_numpy(dG_over_KT, inhibitor_conc=1, standard_conc=1, **kwargs)¶
Return the observation model.
\[F(\Delta g) = 100 * \frac{1}{1 + \frac{exp[\Delta g] * C[M]}{[I]}}\]
- static _post_loss_adapter(loss, **kwargs)¶
- static _loss_adapter_xgboost_mse(labels, dG_over_KT, inhibitor_conc=1, standard_conc=1, **kwargs)¶
Return the gradient and the hessian of the loss defined by
\[L(y, \hat y) = \frac{1}{2} * (y - F(\hat y)) ** 2\]
See theory notes for more details.
- class kinoml.core.measurements.pIC50Measurement(values: Union[float, Iterable[float]], conditions: kinoml.core.conditions.AssayConditions, system: kinoml.core.systems.System, errors: Union[float, Iterable[float]] = np.nan, group: Union[int, str] = None, strict: bool = True, metadata: dict = None, **kwargs)¶
Bases:
ObservationModelMeasurement
Measurement where the value(s) come from pIC50 experiments
We use the Cheng-Prusoff equation here.
The Cheng-Prusoff equation states the following relationship:
\[K_i = \frac{IC50}{1+\frac{[S]}{K_m}}\]
We make the following assumption (which will be relaxed in the future):
\[K_i \approx K_d\]
Under this assumption, the Cheng-Prusoff equation becomes:
\[IC50 \approx \Big(1+\frac{[S]}{K_m}\Big) * K_d\]
We define the following function:
\[\mathbf{F}_{IC_{50}}(\Delta g) = \Big({1+\frac{[S]}{K_m}}\Big) * \mathbf{F}_{K_d}(\Delta g) = \Big({1+\frac{[S]}{K_m}}\Big) * exp[\Delta g] * C[M]\]
Given IC50 values in molar units, we obtain pIC50 values in molar units using the transformation:
\[pIC50 [M] = -log_{10}(IC50[M])\]
Finally, the observation model for pIC50 values is:
\[\mathbf{F}_{pIC_{50}}(\Delta g) = - \frac{\Delta g + \ln\Big(\big(1+\frac{[S]}{K_m}\big)*C\Big)}{\ln(10)}\]
Note
The acceptable range for this measurement is [0, 15], inclusive.
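The derivation above can be checked numerically: applying \(-log_{10}\) to the IC50 function should give the same number as the closed-form pIC50 model. The function names below are hypothetical; the keyword names and defaults follow the documented static-method signature (substrate_conc=1e-06, michaelis_constant=1, standard_conc=1).

```python
import math


def ic50_model(dG_over_KT, substrate_conc=1e-6, michaelis_constant=1.0,
               standard_conc=1.0):
    """F_IC50 = (1 + [S]/K_m) * exp(dg) * C"""
    cheng_prusoff = 1.0 + substrate_conc / michaelis_constant
    return cheng_prusoff * math.exp(dG_over_KT) * standard_conc


def pic50_model(dG_over_KT, substrate_conc=1e-6, michaelis_constant=1.0,
                standard_conc=1.0):
    """F_pIC50 = -(dg + ln((1 + [S]/K_m) * C)) / ln(10)"""
    cheng_prusoff = 1.0 + substrate_conc / michaelis_constant
    return -(dG_over_KT + math.log(cheng_prusoff * standard_conc)) / math.log(10)


dg = math.log(1e-9)  # reduced free energy corresponding to K_d = 1 nM at C = 1 M
closed_form = pic50_model(dg)
via_ic50 = -math.log10(ic50_model(dg))
```

With the default substrate and Michaelis constants the Cheng-Prusoff factor is very close to 1, so the predicted pIC50 sits just below the pKd of 9.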
- RANGE = (0, 15)¶
- _observation_model_numpy¶
- _observation_model_xgboost¶
- static _observation_model_pytorch(dG_over_KT, substrate_conc=1e-06, michaelis_constant=1, standard_conc=1, **kwargs)¶
- static _loss_adapter_xgboost_mse(labels, dG_over_KT, substrate_conc=1e-06, michaelis_constant=1, standard_conc=1, **kwargs)¶
In XGBoost, observation models need to be applied within the loss function. In this specific case, MSE is applied and differentiated (twice) to provide the gradients and hessian matrices.
\[loss = \frac{1}{2} * (\mathrm{observation\_pIC50}(preds) - labels)^2\]
- Parameters
dmatrix (xgboost.DMatrix) – Passed automatically by the xgboost loop
- static _post_loss_adapter(loss, **kwargs)¶
- check()¶
Perform some checks for valid values
- class kinoml.core.measurements.pKiMeasurement(values: Union[float, Iterable[float]], conditions: kinoml.core.conditions.AssayConditions, system: kinoml.core.systems.System, errors: Union[float, Iterable[float]] = np.nan, group: Union[int, str] = None, strict: bool = True, metadata: dict = None, **kwargs)¶
Bases:
ObservationModelMeasurement
Measurement where the value(s) come from \(K_i\) experiments
We make the assumption that \(K_i \approx K_d\) and therefore
\[\mathbf{F}_{pK_i} = \mathbf{F}_{pK_d}\]
Note
The acceptable range for this measurement is [0, 15], inclusive.
- RANGE = (0, 15)¶
- _observation_model_numpy¶
- _observation_model_xgboost¶
- static _observation_model_pytorch(dG_over_KT, standard_conc=1, **kwargs)¶
- static _loss_adapter_xgboost_mse(labels, dG_over_KT, standard_conc=1, **kwargs)¶
Return the gradient and the hessian of the loss defined by
\[L(y, \hat y) = \frac{1}{2} * (y - F(\hat y)) ** 2\]
- static _post_loss_adapter(loss, **kwargs)¶
- check()¶
Perform some checks for valid values
- class kinoml.core.measurements.pKdMeasurement(values: Union[float, Iterable[float]], conditions: kinoml.core.conditions.AssayConditions, system: kinoml.core.systems.System, errors: Union[float, Iterable[float]] = np.nan, group: Union[int, str] = None, strict: bool = True, metadata: dict = None, **kwargs)¶
Bases:
ObservationModelMeasurement
Measurement where the value(s) come from \(K_d\) experiments
We define the following physics-based function
\[\mathbf{F}_{pK_d}(\Delta g) = - \frac{\Delta g + \ln(C)}{\ln(10)}\]
where C, given in molar units [M], can be adapted if measurements were undertaken at different concentrations.
Note
The acceptable range for this measurement is [0, 15], inclusive.
- RANGE = (0, 15)¶
- _observation_model_numpy¶
- _observation_model_xgboost¶
- static _observation_model_pytorch(dG_over_KT, standard_conc=1, **kwargs)¶
- static _loss_adapter_xgboost_mse(labels, dG_over_KT, standard_conc=1, **kwargs)¶
Return the gradient and the hessian of the loss defined by
\[L(y, \hat y) = \frac{1}{2} * (y - F(\hat y)) ** 2\]
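For the pKd model the gradient and Hessian of this loss follow directly from the chain rule, because \(F\) is linear in \(\hat y\) with slope \(-1/\ln(10)\). The sketch below is derived from the formulas on this page rather than taken from kinoml's source; the function names are hypothetical and LN10 is defined locally as \(\ln(10)\).

```python
import math

LN10 = math.log(10)


def pkd_model(dG_over_KT, standard_conc=1.0):
    """F(dg) = -(dg + ln(C)) / ln(10)"""
    return -(dG_over_KT + math.log(standard_conc)) / LN10


def mse_grad_hess(labels, dG_over_KT, standard_conc=1.0):
    """Per-sample gradient and Hessian of L = 1/2 * (y - F(dg))**2 w.r.t. dg."""
    slope = -1.0 / LN10  # dF/d(dg); constant, so the second derivative of F is 0
    grad, hess = [], []
    for y, dg in zip(labels, dG_over_KT):
        residual = pkd_model(dg, standard_conc) - y
        grad.append(residual * slope)  # dL/d(dg) = (F - y) * F'
        hess.append(slope * slope)     # d2L/d(dg)2 = F'**2 because F'' = 0
    return grad, hess


grad, hess = mse_grad_hess(labels=[7.0], dG_over_KT=[-18.0])
```

The Hessian is the same positive constant for every sample, which reflects that the loss is an exact quadratic in \(\Delta g\) for this model.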
- static _post_loss_adapter(loss, **kwargs)¶
- check()¶
Perform some checks for valid values
- kinoml.core.measurements.null_observation_model(arg)¶
A callable that returns arg directly. It works as an identity function when observation models need to be disabled for a particular experiment.