kinoml.datasets.groups

Splitting strategies for datasets

Module Contents

class kinoml.datasets.groups.BaseGrouper

Base class to assign groups to measurements in a DatasetProvider

assign(dataset, overwrite=False, **kwargs)

Given a DatasetProvider, assign a group key to each of its measurements, using the groups returned by .indices().

Parameters
  • dataset (DatasetProvider) – The dataset whose measurements will be assigned to groups.

  • overwrite (bool, optional=False) – If a measurement has already been assigned a group, it is not overwritten unless this option is set to True.

Returns

dataset – The same dataset passed as input, with its measurements modified in place.

Return type

DatasetProvider
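The assignment contract above (write each key from .indices() onto the matching measurements, and keep existing groups unless overwrite=True) can be sketched as follows. This is a minimal, self-contained illustration, not kinoml's implementation: the Measurement class, its `group` attribute, and the `assign_groups` helper are hypothetical stand-ins, since the reference above does not name the attribute that stores the group.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

GroupKey = Union[int, str]


@dataclass
class Measurement:
    # Hypothetical stand-in: only the fields this sketch needs
    value: float
    group: Optional[GroupKey] = None  # assumed name for the stored group key


def assign_groups(
    measurements: List[Measurement],
    groups: Dict[GroupKey, List[int]],
    overwrite: bool = False,
) -> List[Measurement]:
    """Write each group key onto the measurements at the listed indices."""
    for key, indices in groups.items():
        for index in indices:
            measurement = measurements[index]
            # Keep an existing assignment unless overwrite=True
            if measurement.group is None or overwrite:
                measurement.group = key
    return measurements  # the same list, modified in place


ms = [Measurement(1.0), Measurement(2.0), Measurement(3.0, group="held-out")]
assign_groups(ms, {"train": [0, 1], "test": [2]})
print([m.group for m in ms])  # ['train', 'train', 'held-out']
```

Returning the input object mirrors the documented behaviour of returning the same dataset, modified in place.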

abstract indices(dataset, **kwargs)

Given a dataset, create a dictionary that maps group keys (labels) to lists of numerical indices. The grouping strategy is defined by each subclass.

Parameters

dataset (DatasetProvider) – The dataset to split into groups.

Returns

Maps int or str keys to a list of int indices.

Return type

dict
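Because indices() is abstract, each subclass supplies its own strategy while the base class handles assignment. The sketch below shows that pattern with Python's abc module. SketchGrouper and ParityGrouper are hypothetical classes written for illustration; they only mirror the documented return shape (int or str keys mapped to lists of int indices) and do not subclass the real BaseGrouper.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence, Union

GroupKey = Union[int, str]


class SketchGrouper(ABC):
    """Hypothetical stand-in mirroring the indices() contract described above."""

    @abstractmethod
    def indices(self, dataset: Sequence[Any], **kwargs) -> Dict[GroupKey, List[int]]:
        """Return a mapping from group key to integer indices into dataset."""


class ParityGrouper(SketchGrouper):
    """Toy strategy: even positions go to group 0, odd positions to group 1."""

    def indices(self, dataset, **kwargs):
        groups: Dict[GroupKey, List[int]] = {0: [], 1: []}
        for i in range(len(dataset)):
            groups[i % 2].append(i)
        return groups


print(ParityGrouper().indices(["a", "b", "c", "d", "e"]))
# {0: [0, 2, 4], 1: [1, 3]}
```

Instantiating SketchGrouper directly raises TypeError, which is how abc enforces that every concrete grouper implements indices().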

class kinoml.datasets.groups.RandomGrouper(ratios)

Bases: BaseGrouper

Randomized groups following a split proportional to the provided ratios

Parameters

ratios (tuple or dict) – Fraction of the dataset assigned to each group, on a 0-to-1 scale; the fractions must sum to 1.0. If a dict is provided, its keys are used to label the resulting groups. Otherwise, the groups are numbered starting from 0.

indices(dataset, **kwargs)

Given a dataset, create a dictionary that maps group keys (labels) to lists of numerical indices. The indices are randomly partitioned into groups whose sizes follow the provided ratios.

Parameters

dataset (DatasetProvider) – The dataset to split into groups.

Returns

Maps int or str keys to a list of int indices.

Return type

dict
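A ratio-based random split like the one RandomGrouper describes can be sketched as below: shuffle the indices once, then cut the shuffled list at cumulative fractions. This is an assumed approach, not necessarily how RandomGrouper works internally; the function name `random_split_indices` and its `seed` parameter are hypothetical. Tuple ratios produce 0-numbered groups and dict ratios reuse the dict keys, matching the parameter description above.

```python
import random
from typing import Dict, List, Mapping, Sequence, Union

GroupKey = Union[int, str]


def random_split_indices(
    n: int,
    ratios: Union[Sequence[float], Mapping[GroupKey, float]],
    seed: int = 0,
) -> Dict[GroupKey, List[int]]:
    """Randomly partition range(n) into groups sized by ratios (hypothetical helper)."""
    if isinstance(ratios, Mapping):
        keys = list(ratios.keys())          # dict keys label the groups
        fractions = list(ratios.values())
    else:
        keys = list(range(len(ratios)))     # tuples give groups 0, 1, 2, ...
        fractions = list(ratios)
    if abs(sum(fractions) - 1.0) > 1e-8:
        raise ValueError("ratios must sum to 1.0")

    order = list(range(n))
    random.Random(seed).shuffle(order)      # reproducible shuffle

    groups: Dict[GroupKey, List[int]] = {}
    start, cumulative = 0, 0.0
    for i, (key, fraction) in enumerate(zip(keys, fractions)):
        cumulative += fraction
        # Cut at cumulative fractions; the last group takes the remainder
        # so rounding never drops or duplicates an index.
        end = n if i == len(keys) - 1 else round(cumulative * n)
        groups[key] = order[start:end]
        start = end
    return groups


split = random_split_indices(10, {"train": 0.8, "test": 0.2}, seed=42)
print({key: len(idx) for key, idx in split.items()})  # {'train': 8, 'test': 2}
```

Cutting at cumulative fractions, rather than rounding each group size separately, keeps the groups disjoint and guarantees that every index lands in exactly one group.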

class kinoml.datasets.groups.CallableGrouper(function)

Bases: BaseGrouper

A grouper that applies a user-provided function to each Measurement in the Dataset. The returned value is used as the name of that measurement's group.

Parameters

function (callable) – This function must be able to take a Measurement object and return a str or int.

indices(dataset, progress=True)

Given a dataset, create a dictionary that maps group keys (labels) to lists of numerical indices. Each key is the value returned by function for the measurements in that group.

Parameters

dataset (DatasetProvider) – The dataset to split into groups.

Returns

Maps int or str keys to a list of int indices.

Return type

dict
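The grouping rule of CallableGrouper (call the function on every measurement and collect indices under the returned key) can be sketched with collections.defaultdict. The Measurement dataclass with `target` and `value` fields and the `callable_indices` helper are hypothetical stand-ins for illustration; real kinoml Measurement objects carry different information, and this sketch omits the progress option.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Union

GroupKey = Union[int, str]


@dataclass
class Measurement:
    # Hypothetical stand-in with two illustrative fields
    target: str
    value: float


def callable_indices(
    measurements: Sequence[Measurement],
    function: Callable[[Measurement], GroupKey],
) -> Dict[GroupKey, List[int]]:
    """Group measurement indices by the key that function returns for each one."""
    groups: Dict[GroupKey, List[int]] = defaultdict(list)
    for index, measurement in enumerate(measurements):
        groups[function(measurement)].append(index)
    return dict(groups)  # plain dict, keys in first-seen order


ms = [Measurement("EGFR", 7.1), Measurement("ABL1", 6.3), Measurement("EGFR", 8.0)]
print(callable_indices(ms, lambda m: m.target))
# {'EGFR': [0, 2], 'ABL1': [1]}
```

Any callable returning a str or int works, for example `lambda m: int(m.value >= 7)` to separate high and low values.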

class kinoml.datasets.groups.BaseFilter

Bases: BaseGrouper

Base class to assign groups to measurements in a DatasetProvider