pyrelational.strategies¶

Abstract strategy¶

This module defines the interface for an abstract active learning strategy. A strategy is defined by its __call__ function, which suggests observations to be labelled. In the default case, __call__ is the composition of an informativeness function, which assigns a measure of informativeness to unlabelled observations, and a selection algorithm, which chooses which observations to present to the oracle.

class Strategy(*args: Any, **kwargs: Any)[source]¶

Bases: ABC

This class defines an abstract active learning strategy.

Any strategy should be a subclass of this class and override the __call__ method to suggest observations to be labelled. In the general case, __call__ is the composition of an informativeness function, which assigns a measure of informativeness to unlabelled observations, and a selection algorithm, which chooses which observations to present to the oracle.

The user-defined __call__ method must accept a "num_annotate" argument.
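The composition described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not pyrelational's implementation: `top_n_indices`, `compose_strategy`, and `closeness_to_boundary` are hypothetical names, and predictions are plain lists rather than tensors.

```python
from typing import Callable, List, Sequence


def top_n_indices(scores: Sequence[float], num_annotate: int) -> List[int]:
    """Selection step: positions of the num_annotate highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:num_annotate]


def compose_strategy(
    informativeness: Callable[[Sequence[float]], List[float]],
) -> Callable[[int, Sequence[float], Sequence[int]], List[int]]:
    """Build a strategy from an informativeness function plus top-n selection."""

    def strategy(
        num_annotate: int,
        predictions: Sequence[float],
        unlabelled_indices: Sequence[int],
    ) -> List[int]:
        scores = informativeness(predictions)
        picked = top_n_indices(scores, num_annotate)
        # Map positions in the unlabelled pool back to dataset indices.
        return [unlabelled_indices[i] for i in picked]

    return strategy


def closeness_to_boundary(probabilities: Sequence[float]) -> List[float]:
    """Toy informativeness: probabilities near 0.5 score highest."""
    return [-abs(p - 0.5) for p in probabilities]


strategy = compose_strategy(closeness_to_boundary)
suggested = strategy(2, [0.9, 0.55, 0.1, 0.48], [10, 11, 12, 13])
```

Here `suggested` holds the dataset indices of the two pool entries whose predicted probability lies closest to 0.5.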

suggest(num_annotate: int, **kwargs: Any) → List[int][source]¶

Filter kwargs and feed arguments to the __call__ method to return unlabelled observations to be labelled as a list of dataset indices.

Parameters:

num_annotate – number of samples to annotate
kwargs – any kwargs (filtered to match internal suggest inputs)

Returns:

list of indices of samples to query from oracle
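One way such kwargs filtering could work is to inspect the signature of __call__ and forward only the arguments it declares. The sketch below assumes that approach; `filter_kwargs` and `ToyStrategy` are hypothetical, and the library's actual filtering logic may differ.

```python
import inspect
from typing import Any, Callable, Dict, List


def filter_kwargs(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Keep only the keyword arguments that func declares as parameters."""
    accepted = inspect.signature(func).parameters
    return {name: value for name, value in kwargs.items() if name in accepted}


class ToyStrategy:
    """Stand-in strategy whose __call__ only needs a pool of indices."""

    def __call__(self, num_annotate: int, pool: List[int]) -> List[int]:
        return pool[:num_annotate]

    def suggest(self, num_annotate: int, **kwargs: Any) -> List[int]:
        # Drop anything __call__ does not accept before forwarding.
        usable = filter_kwargs(self.__call__, **kwargs)
        return self(num_annotate=num_annotate, **usable)


result = ToyStrategy().suggest(2, pool=[7, 8, 9], unused="ignored")
```

The extra `unused` argument is silently discarded, so callers can pass the same set of keyword arguments to strategies with different __call__ signatures.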

static train_and_infer(data_manager: DataManager, model_manager: ModelManager[Any, Any]) → Any[source]¶

Train the model on the currently labelled subset of the data and produce an output that can be used in model-uncertainty-based strategies.

Parameters:

data_manager – reference to data_manager which will supply data to train model and the unlabelled observations
model_manager – Model with generic model interface that will be trained and used to produce output of this method

Returns:

output of the model

Strategies for regression tasks¶

Abstract regression strategy¶

class RegressionStrategy[source]¶

Bases: Strategy, ABC

A base active learning strategy class for regression in which the top n indices, according to a user-specified scoring function, are queried at each iteration.

__call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) → List[int][source]¶

Call function which identifies samples which need to be labelled based on a user-defined scoring function.

Parameters:

num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user defined ML model to handle instantiation, training, testing, as well as uncertainty quantification

Returns:

list of indices to annotate

abstract scoring_function(predictions: Tensor) → Tensor[source]¶

Compute score of each sample.

Parameters:

predictions – model predictions for each sample

Returns:

scores for each sample

Expected improvement¶

class ExpectedImprovementStrategy[source]¶

Bases: Strategy

Implements the Expected Improvement strategy, whereby each unlabelled sample is scored using the expected improvement scoring function. The top-scoring samples are selected at each step.

__call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) → List[int][source]¶

Call function which identifies samples which need to be labelled

Parameters:

num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user defined ML model to handle instantiation, training, testing, as well as uncertainty quantification

Returns:

list of indices to annotate
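The expected improvement score is not spelled out here. The sketch below uses the standard closed form for a Gaussian predictive distribution when maximising the target; the function names, the `xi` margin, and the scalar inputs are assumptions for illustration, and pyrelational's own scorer may differ in detail.

```python
import math


def normal_pdf(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Cumulative distribution of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(
    mu: float, sigma: float, best_so_far: float, xi: float = 0.0
) -> float:
    """Expected gain over best_so_far for a N(mu, sigma^2) prediction."""
    gain = mu - best_so_far - xi
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(gain, 0.0)
    z = gain / sigma
    return gain * normal_cdf(z) + sigma * normal_pdf(z)


certain = expected_improvement(mu=1.0, sigma=0.0, best_so_far=0.5)
narrow = expected_improvement(mu=0.0, sigma=1.0, best_so_far=0.0)
wide = expected_improvement(mu=0.0, sigma=2.0, best_so_far=0.0)
```

With equal predicted means, the sample with the larger predictive standard deviation (`wide`) receives the higher score, which is how the criterion rewards exploration.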

Mean Prediction¶

class MeanPredictionStrategy[source]¶

Bases: RegressionStrategy

Implements the Mean Prediction strategy, whereby unlabelled samples are queried based on the mean value predicted by the model, i.e. the samples with the highest predicted mean values are queried.

scoring_function(predictions: Tensor) → Tensor[source]¶

Compute score of each sample.

Parameters:

predictions – model predictions for each sample

Returns:

scores for each sample
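As a rough illustration of mean-based scoring, the sketch below assumes the model output is a list of repeated predictions (for example from an ensemble), with `predictions[k][i]` the k-th prediction for sample i. The layout and the name `mean_prediction_scores` are assumptions; the library operates on tensors.

```python
from statistics import fmean
from typing import List, Sequence


def mean_prediction_scores(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average the repeated predictions of each sample."""
    # zip(*predictions) regroups the values by sample instead of by draw.
    return [fmean(per_sample) for per_sample in zip(*predictions)]


draws = [
    [1.0, 4.0, 2.0],  # first set of predictions for samples 0, 1, 2
    [3.0, 6.0, 2.0],  # second set of predictions for the same samples
]
scores = mean_prediction_scores(draws)
```

Sample 1 has the highest mean prediction here, so a top-n selection would query it first.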

Least confidence¶

class LeastConfidenceStrategy[source]¶

Bases: RegressionStrategy

Implements the Least Confidence strategy, whereby unlabelled samples are queried based on the variance predicted by the model.

scoring_function(predictions: Tensor) → Tensor[source]¶

Compute score of each sample.

Parameters:

predictions – model predictions for each sample

Returns:

scores for each sample
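A variance-based score can be sketched as follows, again assuming `predictions[k][i]` is the k-th repeated prediction for sample i. The use of the population variance and the name `variance_scores` are assumptions; the library's estimator may differ.

```python
from statistics import pvariance
from typing import List, Sequence


def variance_scores(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Spread of the repeated predictions of each sample."""
    return [pvariance(per_sample) for per_sample in zip(*predictions)]


draws = [
    [1.0, 4.0],  # first set of predictions for samples 0 and 1
    [3.0, 4.0],  # second set of predictions for the same samples
]
scores = variance_scores(draws)
```

The predictions disagree on sample 0 and agree on sample 1, so sample 0 is the one the model is least confident about.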

Thompson sampling¶

class ThompsonSamplingStrategy[source]¶

Bases: RegressionStrategy

Implements the Thompson Sampling strategy, whereby unlabelled samples are scored and queried based on the Thompson sampling scorer.

scoring_function(predictions: Tensor) → Tensor[source]¶

Compute score of each sample.

Parameters:

predictions – model predictions for each sample

Returns:

scores for each sample
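One common reading of Thompson sampling in this setting is to score each sample with a single randomly chosen draw from its predictive distribution. The sketch below assumes that reading and the same `predictions[k][i]` layout as a list of repeated predictions; `thompson_scores` is a hypothetical name and the library's scorer may be implemented differently.

```python
import random
from typing import List, Sequence


def thompson_scores(
    predictions: Sequence[Sequence[float]], rng: random.Random
) -> List[float]:
    """Score each sample with one randomly selected prediction for it."""
    num_draws = len(predictions)
    num_samples = len(predictions[0])
    # A separate draw is picked for every sample.
    return [predictions[rng.randrange(num_draws)][i] for i in range(num_samples)]


draws = [
    [1.0, 4.0, 2.0],
    [3.0, 6.0, 5.0],
]
scores = thompson_scores(draws, random.Random(0))
```

Because the score is a random draw rather than a summary statistic, samples with uncertain predictions occasionally receive high scores, which mixes exploration into the selection.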

Upper confidence bound¶

class UpperConfidenceBoundStrategy(kappa: float = 1.0)[source]¶

Bases: Strategy

Implements the Upper Confidence Bound strategy, whereby unlabelled samples are scored and queried based on the UCB scorer.

Parameters:

kappa – trade-off parameter between exploitation and exploration

__call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) → List[int][source]¶

Call function which identifies samples which need to be labelled

Parameters:

num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user defined ML model to handle instantiation, training, testing, as well as uncertainty quantification

Returns:

list of indices to annotate
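The usual upper confidence bound score adds kappa times the predictive standard deviation to the predictive mean. The sketch below assumes that form and a list-of-draws layout where `predictions[k][i]` is the k-th prediction for sample i; `ucb_scores` is a hypothetical name.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def ucb_scores(
    predictions: Sequence[Sequence[float]], kappa: float = 1.0
) -> List[float]:
    """Mean plus kappa times standard deviation, per sample."""
    return [
        fmean(per_sample) + kappa * pstdev(per_sample)
        for per_sample in zip(*predictions)
    ]


draws = [
    [1.0, 4.0],  # first set of predictions for samples 0 and 1
    [3.0, 4.0],  # second set of predictions for the same samples
]
exploit = ucb_scores(draws, kappa=1.0)
explore = ucb_scores(draws, kappa=3.0)
```

With kappa=1.0 the confidently predicted sample 1 ranks first, while kappa=3.0 promotes the uncertain sample 0, which illustrates the trade-off that kappa controls.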

Strategies for classification tasks¶

Abstract Classification strategy¶

class ClassificationStrategy[source]¶

Bases: Strategy, ABC

A base active learning strategy class for classification in which the top n indices, according to a user-specified scoring function, are queried at each iteration.

__call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) → List[int][source]¶

Call function which identifies samples which need to be labelled based on a user-defined scoring function.

Parameters:

num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user defined ML model to handle instantiation, training, testing, as well as uncertainty quantification

Returns:

list of indices to annotate

abstract scoring_function(predictions: Tensor) → Tensor[source]¶

Compute score of each sample.

Parameters:

predictions – model predictions for each sample

Returns:

scores for each sample

Entropy¶

Active learning using an entropy-based uncertainty measure over the classes of the posterior predictive distribution to choose which observations to propose to the oracle.

class EntropyClassificationStrategy[source]¶

Bases: ClassificationStrategy

Implements the Entropy Classification strategy, whereby unlabelled samples are scored and queried based on entropy.

scoring_function(predictions: Tensor) → Tensor[source]¶

Compute score of each sample.

Parameters:

predictions – model predictions for each sample

Returns:

scores for each sample
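For reference, the Shannon entropy of a predicted class distribution can be computed as below. This is a minimal sketch on plain lists of class probabilities; `entropy_scores` is a hypothetical name and the library's scorer works on tensors.

```python
import math
from typing import List, Sequence


def entropy_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Shannon entropy (in nats) of each row of class probabilities."""
    return [
        # Zero-probability classes contribute nothing to the sum.
        -sum(p * math.log(p) for p in row if p > 0.0)
        for row in probabilities
    ]


class_probs = [
    [0.5, 0.5],  # maximally uncertain between two classes
    [1.0, 0.0],  # fully confident
]
scores = entropy_scores(class_probs)
```

The uniform row attains the maximum entropy for two classes, ln 2, while the fully confident row scores zero.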

Least confidence¶

Active learning using the least confidence uncertainty measure over the classes of the posterior predictive distribution to choose which observations to propose to the oracle.

class LeastConfidenceStrategy[source]¶

Bases: ClassificationStrategy

Implements the Least Confidence strategy, whereby unlabelled samples are scored and queried based on the least confidence scorer for classification.

scoring_function(predictions: Tensor) → Tensor[source]¶

Compute score of each sample.

Parameters:

predictions – model predictions for each sample

Returns:

scores for each sample
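A common definition of least confidence is one minus the probability of the most likely class. The sketch below assumes that unnormalised definition; `least_confidence_scores` is a hypothetical name, and the library's scorer may apply an additional normalisation.

```python
from typing import List, Sequence


def least_confidence_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """One minus the top class probability, per sample."""
    return [1.0 - max(row) for row in probabilities]


class_probs = [
    [0.5, 0.5],  # the model cannot decide
    [0.9, 0.1],  # the model is fairly sure
]
scores = least_confidence_scores(class_probs)
```

The undecided row receives the higher score and would be queried first.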

Marginal confidence¶

Active learning using the marginal confidence uncertainty measure over the classes of the posterior predictive distribution to choose which observations to propose to the oracle.

class MarginalConfidenceStrategy[source]¶

Bases: ClassificationStrategy

Implements the Marginal Confidence strategy, whereby unlabelled samples are scored and queried based on the marginal confidence scorer for classification.

scoring_function(predictions: Tensor) → Tensor[source]¶

Compute score of each sample.

Parameters:

predictions – model predictions for each sample

Returns:

scores for each sample
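Margin-based uncertainty is usually defined from the gap between the two most probable classes. The sketch below assumes the form one minus that gap, so that a small margin yields a high score; `margin_confidence_scores` is a hypothetical name and the library's exact convention may differ.

```python
from typing import List, Sequence


def margin_confidence_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """One minus the gap between the two largest class probabilities."""
    scores = []
    for row in probabilities:
        first, second = sorted(row, reverse=True)[:2]
        scores.append(1.0 - (first - second))
    return scores


class_probs = [
    [0.5, 0.3, 0.2],    # narrow margin between the top two classes
    [0.9, 0.05, 0.05],  # wide margin
]
scores = margin_confidence_scores(class_probs)
```

The narrow-margin row scores higher, reflecting that the model struggles to separate its two leading candidates.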

Confidence ratio¶

Active learning using a ratio-based confidence uncertainty measure over the classes of the posterior predictive distribution to choose which observations to propose to the oracle.

class RatioConfidenceStrategy[source]¶

Bases: ClassificationStrategy

Implements the Ratio Confidence strategy, whereby unlabelled samples are scored and queried based on the ratio confidence scorer for classification.

scoring_function(predictions: Tensor) → Tensor[source]¶

Compute score of each sample.

Parameters:

predictions – model predictions for each sample

Returns:

scores for each sample
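A ratio-based score can be taken as the second-largest class probability divided by the largest, so that values near 1 indicate high uncertainty. That orientation and the name `ratio_confidence_scores` are assumptions made for this sketch; the library may define the ratio differently.

```python
from typing import List, Sequence


def ratio_confidence_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Second-largest over largest class probability, per sample."""
    scores = []
    for row in probabilities:
        first, second = sorted(row, reverse=True)[:2]
        scores.append(second / first)
    return scores


class_probs = [
    [0.5, 0.25, 0.25],  # runner-up is half as likely as the top class
    [0.8, 0.1, 0.1],    # runner-up is far behind
]
scores = ratio_confidence_scores(class_probs)
```

The first row scores higher because its runner-up class is comparatively close to the predicted class.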

Task-agnostic strategies¶

Random acquisition¶

Defines and implements a random acquisition active learning strategy.

class RandomAcquisitionStrategy[source]¶

Bases: Strategy

Implements random acquisition, whereby random samples from the unlabelled set are chosen at each step.

__call__(num_annotate: int, data_manager: DataManager) → List[int][source]¶

Call function which identifies samples which need to be labelled

Parameters:

num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning

Returns:

list of indices to annotate
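Random acquisition amounts to sampling without replacement from the unlabelled indices. The sketch below is a stand-alone illustration; `random_acquisition` and the explicit `rng` argument are assumptions, not the library's interface.

```python
import random
from typing import List, Sequence


def random_acquisition(
    unlabelled_indices: Sequence[int], num_annotate: int, rng: random.Random
) -> List[int]:
    """Pick up to num_annotate distinct indices uniformly at random."""
    # Never request more samples than the pool contains.
    k = min(num_annotate, len(unlabelled_indices))
    return rng.sample(list(unlabelled_indices), k)


pool = [3, 8, 15, 21, 34]
chosen = random_acquisition(pool, 2, random.Random(42))
```

This strategy ignores the model entirely, which makes it a useful baseline against which informed strategies can be compared.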

Relative distance¶

class RelativeDistanceStrategy(metric: str = 'euclidean')[source]¶

Bases: Strategy

Diversity sampling based active learning strategy.

Parameters:

metric – name of the distance metric to use. This should be supported by scikit-learn's pairwise_distances function.

__call__(num_annotate: int, data_manager: DataManager) → List[int][source]¶

Call function which identifies samples which need to be labelled

Parameters:

num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning

Returns:

list of indices to annotate
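The description above does not state the selection rule. A typical distance-based diversity criterion, assumed here, queries the unlabelled samples whose nearest labelled neighbour is farthest away. The sketch uses Euclidean distance from the standard library instead of scikit-learn, and `relative_distance_query` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def relative_distance_query(
    unlabelled: Dict[int, Sequence[float]],
    labelled: Sequence[Sequence[float]],
    num_annotate: int,
) -> List[int]:
    """Indices of unlabelled points farthest from the labelled set."""
    # Distance of each unlabelled point to its nearest labelled point.
    min_dist = {
        idx: min(math.dist(x, y) for y in labelled)
        for idx, x in unlabelled.items()
    }
    ranked = sorted(min_dist, key=min_dist.get, reverse=True)
    return ranked[:num_annotate]


labelled_points = [(0.0, 0.0)]
unlabelled_points = {5: (0.1, 0.0), 6: (3.0, 4.0), 7: (1.0, 1.0)}
queried = relative_distance_query(unlabelled_points, labelled_points, 2)
```

Point 5 sits almost on top of an already labelled point and is skipped, while the more distant points 6 and 7 are proposed for labelling.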

Representative sampling¶

Representative sampling based active learning strategy

class RepresentativeSamplingStrategy(clustering_method: str | ClusterMixin = 'KMeans', **clustering_kwargs: Any)[source]¶

Bases: Strategy

Representative sampling based active learning strategy

Parameters:

clustering_method – name, or instantiated class, of the clustering method to use
clustering_kwargs – arguments to be passed to instantiate clustering class if a string is passed to clustering_method

__call__(data_manager: DataManager, num_annotate: int) → List[int][source]¶

Call function which identifies samples which need to be labelled

Parameters:

data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
num_annotate – number of samples to annotate

Returns:

list of indices to annotate
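One way to pick representatives once the unlabelled data has been clustered is to take, from each cluster, the sample closest to the cluster centroid. The sketch below assumes that rule and takes the cluster assignments as given (in practice they would come from the configured clustering method); `representatives` is a hypothetical name.

```python
import math
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Sequence


def representatives(
    features: Dict[int, Sequence[float]], cluster_labels: Dict[int, int]
) -> List[int]:
    """Return, per cluster, the index of the sample nearest its centroid."""
    members: Dict[int, List[int]] = defaultdict(list)
    for idx, label in cluster_labels.items():
        members[label].append(idx)

    chosen = []
    for label in sorted(members):
        idxs = members[label]
        # Centroid: coordinate-wise mean of the cluster members.
        centroid = [fmean(dim) for dim in zip(*(features[i] for i in idxs))]
        chosen.append(min(idxs, key=lambda i: math.dist(features[i], centroid)))
    return chosen


features = {
    0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0),
    3: (10.0, 10.0), 4: (10.0, 12.0), 5: (10.0, 11.0),
}
cluster_labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
picked = representatives(features, cluster_labels)
```

Clustering into as many groups as there are samples to annotate yields one representative per group, so the queried batch covers distinct regions of the unlabelled pool.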