pyrelational.strategies
Abstract strategy
This module defines the interface for an abstract active learning strategy, which consists of a __call__ method that suggests observations to be labelled. In the default case, __call__ is the composition of an informativeness function, which assigns a measure of informativeness to unlabelled observations, and a selection algorithm, which chooses which observations to present to the oracle.
- class Strategy(*args: Any, **kwargs: Any)
Bases: ABC
This class defines an abstract active learning strategy.
Any strategy should subclass this class and override the __call__ method to suggest observations to be labelled. In the general case, __call__ is the composition of an informativeness function, which assigns a measure of informativeness to unlabelled observations, and a selection algorithm, which chooses which observations to present to the oracle.
The user-defined __call__ method must have a “num_annotate” argument.
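As a minimal sketch of this composition, the hypothetical classes below (plain Python, not pyrelational's own Strategy) split __call__ into an informativeness step and a greedy top-k selection step. The toy informativeness function, which treats predictions closer to 0.5 as more informative, and the predictions/unlabelled_indices arguments are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ToyStrategy(ABC):
    """Hypothetical stand-in for an abstract strategy (not pyrelational's class)."""

    @abstractmethod
    def __call__(self, num_annotate: int, **kwargs) -> List[int]:
        """Return dataset indices of observations to send to the oracle."""


class MaxScoreStrategy(ToyStrategy):
    """Informativeness function followed by a greedy top-k selection."""

    @staticmethod
    def informativeness(predictions: Sequence[float]) -> List[float]:
        # Toy score: predictions near 0.5 are the least certain, hence most informative.
        return [1.0 - abs(p - 0.5) for p in predictions]

    @staticmethod
    def select(scores: Sequence[float], num_annotate: int) -> List[int]:
        # Positions of the num_annotate highest scores (ties keep their original order).
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return order[:num_annotate]

    def __call__(self, num_annotate: int, predictions: Sequence[float],
                 unlabelled_indices: Sequence[int]) -> List[int]:
        scores = self.informativeness(predictions)
        positions = self.select(scores, num_annotate)
        # Map positions within the unlabelled subset back to dataset indices.
        return [unlabelled_indices[p] for p in positions]
```

Because ToyStrategy declares __call__ abstract, it cannot be instantiated directly; only concrete subclasses such as MaxScoreStrategy can.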
- suggest(num_annotate: int, **kwargs: Any) → List[int]
Filter kwargs and feed the matching arguments to the __call__ method, returning the unlabelled observations to be labelled as a list of dataset indices.
- Parameters:
num_annotate – number of samples to annotate
kwargs – any keyword arguments (filtered to match the inputs of __call__)
- Returns:
list of indices of samples to query from oracle
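The kwargs filtering described for suggest can be sketched with inspect.signature: a caller such as an active learning loop can pass every object it holds, and only the names that __call__ declares are forwarded. FilteringStrategy and its trivial __call__ are hypothetical; this illustrates the idea and is not pyrelational's implementation.

```python
import inspect
from typing import Any, Dict, List


class FilteringStrategy:
    """Hypothetical sketch of a suggest() that filters kwargs for __call__."""

    def __call__(self, num_annotate: int, unlabelled_indices: List[int]) -> List[int]:
        # Trivial strategy: take the first num_annotate unlabelled indices.
        return list(unlabelled_indices[:num_annotate])

    def suggest(self, num_annotate: int, **kwargs: Any) -> List[int]:
        # Parameters of the bound __call__ (self is already excluded).
        accepted = inspect.signature(self.__call__).parameters
        filtered: Dict[str, Any] = {k: v for k, v in kwargs.items() if k in accepted}
        return self(num_annotate=num_annotate, **filtered)
```

For example, FilteringStrategy().suggest(2, unlabelled_indices=[4, 7, 9], model_manager=None) returns [4, 7]; model_manager is dropped because __call__ does not accept it.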
- static train_and_infer(data_manager: DataManager, model_manager: ModelManager[Any, Any]) → Any
Train the model on the currently labelled subset of the data and produce an output that can be used by model-uncertainty-based strategies.
- Parameters:
data_manager – reference to the data manager, which supplies the data to train the model as well as the unlabelled observations
model_manager – model with a generic model interface that will be trained and used to produce the output of this method
- Returns:
output of the model
Strategies for regression tasks
Abstract regression strategy
- class RegressionStrategy
Bases: Strategy, ABC
A base active learning strategy class for regression in which the top n indices, according to a user-specified scoring function, are queried at each iteration.
- __call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) → List[int]
Identify the samples that need to be labelled, based on the user-defined scoring function.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user-defined ML model to handle instantiation, training, testing, as well as uncertainty quantification
- Returns:
list of indices to annotate
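A minimal sketch of the top-n pattern, assuming a hypothetical scoring_fn class attribute that subclasses set, and assuming the model output is a list of predictive samples per unlabelled observation. Scores are ranked in descending order and the winning positions are mapped back to dataset indices through the list of unlabelled indices.

```python
from typing import Callable, List, Sequence

Samples = Sequence[Sequence[float]]  # predictive samples, one list per observation


class TopNRegressionSketch:
    """Hypothetical base: subclasses set scoring_fn; __call__ ranks and selects."""

    scoring_fn: Callable[[Samples], List[float]]

    def __call__(self, num_annotate: int, samples: Samples,
                 unlabelled_indices: Sequence[int]) -> List[int]:
        scores = self.scoring_fn(samples)
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        # Positions in the unlabelled subset are mapped back to dataset indices.
        return [unlabelled_indices[i] for i in ranked[:num_annotate]]


def spread(samples: Samples) -> List[float]:
    # Toy scorer: range (max - min) of the samples for each observation.
    return [max(s) - min(s) for s in samples]


class SpreadStrategy(TopNRegressionSketch):
    scoring_fn = staticmethod(spread)
```

Keeping the ranking logic in the base class means a new strategy only has to supply a scorer.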
Expected improvement
- class ExpectedImprovementStrategy
Bases: Strategy
Implements Expected Improvement Strategy, whereby each unlabelled sample is scored with the expected improvement scoring function. The top samples according to this score are selected at each step.
- __call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) → List[int]
Identify the samples that need to be labelled.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user-defined ML model to handle instantiation, training, testing, as well as uncertainty quantification
- Returns:
list of indices to annotate
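One common formulation of expected improvement for maximisation, assuming a Gaussian predictive distribution with mean μ and standard deviation σ and a best observed label f*, is EI = (μ − f* − ξ)Φ(z) + σφ(z) with z = (μ − f* − ξ)/σ, where Φ and φ are the standard normal CDF and PDF. The sketch below computes it in closed form; the function name and the ξ offset (xi) are assumptions, and pyrelational's scorer may instead estimate EI from posterior samples by averaging max(y − f*, 0).

```python
from statistics import NormalDist
from typing import List, Sequence


def expected_improvement(means: Sequence[float], stds: Sequence[float],
                         best_observed: float, xi: float = 0.0) -> List[float]:
    """Closed-form EI for maximisation under a Gaussian predictive distribution."""
    std_normal = NormalDist()
    scores = []
    for mu, sigma in zip(means, stds):
        improvement = mu - best_observed - xi
        if sigma <= 0.0:
            # Deterministic prediction: EI reduces to the positive part of the improvement.
            scores.append(max(improvement, 0.0))
            continue
        z = improvement / sigma
        scores.append(improvement * std_normal.cdf(z) + sigma * std_normal.pdf(z))
    return scores
```

With equal means, the observation with the larger σ gets the larger EI, which is how the score rewards uncertainty as well as a high predicted value.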
Mean prediction
- class MeanPredictionStrategy
Bases: RegressionStrategy
Implements Mean Prediction Strategy, whereby unlabelled samples are queried based on the mean value predicted by the model, i.e., the samples with the highest predicted mean values are queried.
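A minimal sketch of mean-prediction scoring, assuming the model output is a list of predictive samples per observation (for example, from an ensemble or MC dropout). Both function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def mean_prediction_scores(samples: Sequence[Sequence[float]]) -> List[float]:
    # Score each observation by the mean of its predictive samples.
    return [fmean(s) for s in samples]


def top_n(scores: Sequence[float], n: int) -> List[int]:
    # Positions of the n highest scores (ties keep their original order).
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
```

For instance, top_n(mean_prediction_scores([[1.0, 3.0], [4.0, 4.0], [0.0, 2.0]]), 2) gives [1, 0].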
Least confidence
- class LeastConfidenceStrategy
Bases: RegressionStrategy
Implements Least Confidence Strategy, whereby unlabelled samples are queried based on the variance predicted by the model: the samples with the highest predicted variance are queried.
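A minimal sketch of variance-based (least-confidence) scoring, again assuming a list of predictive samples per observation; the function name is hypothetical. Population variance is used, but when every observation has the same number of samples the ranking is identical with sample variance, since the two differ by a constant factor.

```python
from statistics import pvariance
from typing import List, Sequence


def variance_scores(samples: Sequence[Sequence[float]]) -> List[float]:
    # Higher predictive variance means lower confidence, hence a higher score.
    return [pvariance(s) for s in samples]
```

For example, variance_scores([[1.0, 1.0], [0.0, 2.0]]) gives [0.0, 1.0], so the second observation is queried first.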
Thompson sampling
- class ThompsonSamplingStrategy
Bases: RegressionStrategy
Implements Thompson Sampling Strategy, whereby unlabelled samples are scored and queried based on the Thompson sampling scorer.
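A minimal sketch of Thompson sampling scoring, assuming each observation has the same number of posterior samples and that sample j of every observation comes from the same posterior draw (for example, the j-th ensemble member). One draw is picked at random and its predictions are used as scores, so ranking them maximises a single sampled function. The function name is hypothetical, and pyrelational's scorer may draw samples differently.

```python
import random
from typing import List, Optional, Sequence


def thompson_scores(samples: Sequence[Sequence[float]],
                    rng: Optional[random.Random] = None) -> List[float]:
    rng = rng or random.Random()
    # Pick one posterior draw, shared by all observations.
    j = rng.randrange(len(samples[0]))
    return [s[j] for s in samples]
```

Because the draw changes between calls, observations with wide predictive distributions are occasionally ranked highly, which supplies exploration.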
Upper confidence bound
- class UpperConfidenceBoundStrategy(kappa: float = 1.0)
Bases: Strategy
Implements Upper Confidence Bound Strategy, whereby unlabelled samples are scored and queried based on the UCB scorer.
- Parameters:
kappa – trade-off parameter between exploitation and exploration
- __call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) → List[int]
Identify the samples that need to be labelled.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user-defined ML model to handle instantiation, training, testing, as well as uncertainty quantification
- Returns:
list of indices to annotate
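A minimal sketch of the usual UCB score, μ + κσ, computed from a list of predictive samples per observation (an assumption about the model output); kappa plays the role of the documented parameter, and the function name is hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def ucb_scores(samples: Sequence[Sequence[float]], kappa: float = 1.0) -> List[float]:
    # Larger kappa rewards uncertainty (exploration);
    # smaller kappa rewards high predicted values (exploitation).
    return [fmean(s) + kappa * pstdev(s) for s in samples]
```

With samples [[1.0, 1.0], [0.0, 2.0]], both observations share a mean of 1.0, so kappa=0.0 ties them while kappa=1.0 prefers the second, more uncertain one.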
Strategies for classification tasks
Abstract classification strategy
- class ClassificationStrategy
Bases: Strategy, ABC
A base active learning strategy class for classification in which the top n indices, according to a user-specified scoring function, are queried at each iteration.
- __call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) → List[int]
Identify the samples that need to be labelled, based on the user-defined scoring function.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user-defined ML model to handle instantiation, training, testing, as well as uncertainty quantification
- Returns:
list of indices to annotate
Entropy
Active learning using an entropy-based uncertainty measure across the classes of the posterior predictive distribution to choose which observations to propose to the oracle.
- class EntropyClassificationStrategy
Bases: ClassificationStrategy
Implements Entropy Classification Strategy, whereby unlabelled samples are scored and queried based on entropy.
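A minimal sketch of entropy scoring, assuming the model output has been reduced to one row of class probabilities per observation (for example, the posterior predictive mean). The function name is hypothetical.

```python
import math
from typing import List, Sequence


def entropy_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    # Shannon entropy in nats; classes with p == 0 contribute 0 by convention.
    return [-sum(p * math.log(p) for p in row if p > 0.0) for row in probs]
```

A confident prediction such as [1.0, 0.0] scores 0, while a uniform two-class prediction scores log 2, the maximum for two classes.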
Least confidence
Active learning using the least-confidence uncertainty measure across the classes of the posterior predictive distribution to choose which observations to propose to the oracle.
- class LeastConfidenceStrategy
Bases: ClassificationStrategy
Implements Least Confidence Strategy, whereby unlabelled samples are scored and queried based on the least-confidence classification scorer.
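A minimal sketch of one common least-confidence score, 1 − max_c p(c), over rows of class probabilities (an assumed input format). The function name is hypothetical.

```python
from typing import List, Sequence


def least_confidence_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    # High when even the most likely class has a low probability.
    return [1.0 - max(row) for row in probs]
```

For example, [[0.9, 0.1], [0.4, 0.6]] scores roughly [0.1, 0.4], so the second observation is queried first.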
Marginal confidence
Active learning using the marginal confidence uncertainty measure across the classes of the posterior predictive distribution to choose which observations to propose to the oracle.
- class MarginalConfidenceStrategy
Bases: ClassificationStrategy
Implements Marginal Confidence Strategy, whereby unlabelled samples are scored and queried based on the marginal confidence classification scorer.
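A minimal sketch of one common margin-based score, 1 − (p₁ − p₂), where p₁ and p₂ are the two largest class probabilities; it assumes at least two classes per row, and the function name is hypothetical.

```python
from typing import List, Sequence


def margin_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    scores = []
    for row in probs:
        top, second = sorted(row, reverse=True)[:2]
        # A small gap between the two most likely classes means high uncertainty.
        scores.append(1.0 - (top - second))
    return scores
```

A tie between the top two classes gives the maximum score of 1.0.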
Confidence ratio
Active learning using a ratio-based confidence uncertainty measure across the classes of the posterior predictive distribution to choose which observations to propose to the oracle.
- class RatioConfidenceStrategy
Bases: ClassificationStrategy
Implements Ratio Confidence Strategy, whereby unlabelled samples are scored and queried based on the ratio confidence classification scorer.
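A minimal sketch of one common ratio-based score, p₂ / p₁, the second-largest class probability divided by the largest. It assumes at least two classes per row and probabilities that sum to 1 (so p₁ > 0); the function name is hypothetical.

```python
from typing import List, Sequence


def ratio_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    scores = []
    for row in probs:
        top, second = sorted(row, reverse=True)[:2]
        # Close to 1 when the runner-up class is almost as likely as the winner.
        scores.append(second / top)
    return scores
```

For example, [[0.8, 0.2], [0.5, 0.5]] scores [0.25, 1.0].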
Task-agnostic strategies
Random acquisition
Defines and implements a random acquisition active learning strategy.
- class RandomAcquisitionStrategy
Bases: Strategy
Implements random acquisition, whereby samples are chosen at random from the unlabelled set at each step.
- __call__(num_annotate: int, data_manager: DataManager) → List[int]
Identify the samples that need to be labelled.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
- Returns:
list of indices to annotate
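A minimal sketch of random acquisition as uniform sampling without replacement from the unlabelled indices. The function name is hypothetical, and capping the request at the pool size is a design choice of this sketch rather than documented behaviour.

```python
import random
from typing import List, Optional, Sequence


def random_acquisition(num_annotate: int, unlabelled_indices: Sequence[int],
                       rng: Optional[random.Random] = None) -> List[int]:
    rng = rng or random.Random()
    # Never ask for more samples than the unlabelled pool contains.
    k = min(num_annotate, len(unlabelled_indices))
    return rng.sample(list(unlabelled_indices), k)
```

Passing a seeded random.Random makes the selection reproducible, which helps when random acquisition serves as a baseline for the other strategies.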
Relative distance
- class RelativeDistanceStrategy(metric: str = 'euclidean')
Bases: Strategy
Diversity-sampling-based active learning strategy.
- Parameters:
metric – name of the distance metric to use. It should be supported by the scikit-learn pairwise_distances function.
- __call__(num_annotate: int, data_manager: DataManager) → List[int]
Identify the samples that need to be labelled.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
- Returns:
list of indices to annotate
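One plausible reading of this diversity strategy, sketched under assumptions: score each unlabelled point by its distance to the nearest labelled point and query the farthest ones. Only Euclidean distance (math.dist) is used here, whereas the documented metric argument refers to scikit-learn's pairwise_distances. The function name, and returning positions within the unlabelled list, are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def relative_distance_selection(num_annotate: int,
                                unlabelled: Sequence[Point],
                                labelled: Sequence[Point]) -> List[int]:
    """Pick the unlabelled points farthest from their nearest labelled point."""
    # Distance from each unlabelled point to its closest labelled point.
    nearest = [min(math.dist(u, l) for l in labelled) for u in unlabelled]
    ranked = sorted(range(len(unlabelled)), key=lambda i: nearest[i], reverse=True)
    return ranked[:num_annotate]
```

Scoring every point once can select several near-duplicates in the same batch; a greedy farthest-first variant, which adds each chosen point to the labelled set before picking the next, avoids that at extra cost.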
Representative sampling
Representative-sampling-based active learning strategy.
- class RepresentativeSamplingStrategy(clustering_method: str | ClusterMixin = 'KMeans', **clustering_kwargs: Any)
Bases: Strategy
Representative-sampling-based active learning strategy.
- Parameters:
clustering_method – name of the clustering method to use, or an instantiated clustering object
clustering_kwargs – arguments used to instantiate the clustering class when a string is passed to clustering_method
- __call__(data_manager: DataManager, num_annotate: int) → List[int]
Identify the samples that need to be labelled.
- Parameters:
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
num_annotate – number of samples to annotate
- Returns:
list of indices to annotate
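One common form of representative sampling, sketched under assumptions: cluster the unlabelled pool into num_annotate groups and query the point nearest each centroid. The documented default 'KMeans' suggests a k-means clusterer; a plain Lloyd's k-means stands in here to keep the sketch dependency-free. Both function names are hypothetical, and num_annotate is assumed not to exceed the pool size.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its closest centroid.
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids


def representative_sampling(num_annotate: int, unlabelled: Sequence[Point]) -> List[int]:
    """Cluster the pool into num_annotate groups; return the point nearest each centroid."""
    centroids = kmeans(unlabelled, num_annotate)
    chosen: List[int] = []
    for centre in centroids:
        # Skip points already chosen so the returned positions are distinct.
        candidates = [i for i in range(len(unlabelled)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: math.dist(unlabelled[i], centre)))
    return chosen
```

The returned values are positions within the unlabelled list; a real strategy would map them back to dataset indices as in the other strategies.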