pyrelational.strategies¶
Abstract strategy¶
This module defines the interface for an abstract active learning strategy.
A strategy is defined by its __call__ method, which suggests observations to be labelled. By default, __call__ is the composition of an informativeness function, which assigns a measure of informativeness to unlabelled observations, and a selection algorithm, which chooses which observations to present to the oracle.
- class Strategy(scorer: AbstractScorer | AbstractRegressionScorer | AbstractClassificationScorer, sampler: BatchModeSampler)[source]¶
Bases: ABC
This module defines an abstract active learning strategy.
Any strategy should be a subclass of this class and override the __call__ method to suggest observations to be labeled. In the general case __call__ would be the composition of an informativeness function, which assigns a measure of informativeness to unlabelled observations, and a selection algorithm which chooses what observations to present to the oracle.
The user-defined __call__ method must have a “num_annotate” argument.
Initialize the strategy with a scorer and a sampler.
- Parameters:
scorer – instance of a scorer class
sampler – instance of a sampler class
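The composition of an informativeness function and a selection algorithm described above can be sketched in plain Python. This is a minimal illustration, not the pyrelational implementation: ToyStrategy, top_k_sampler, and the predictions and unlabelled_indices arguments are hypothetical names introduced only for this example.

```python
from typing import Callable, List, Sequence


def top_k_sampler(scores: Sequence[float], indices: Sequence[int], num_annotate: int) -> List[int]:
    """Deterministic selection: return the indices with the highest scores."""
    ranked = sorted(zip(scores, indices), key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in ranked[:num_annotate]]


class ToyStrategy:
    """Hypothetical stand-in for a strategy (not the pyrelational class)."""

    def __init__(
        self,
        scorer: Callable[[Sequence[float]], List[float]],
        sampler: Callable[[Sequence[float], Sequence[int], int], List[int]] = top_k_sampler,
    ) -> None:
        self.scorer = scorer
        self.sampler = sampler

    def __call__(
        self, num_annotate: int, predictions: Sequence[float], unlabelled_indices: Sequence[int]
    ) -> List[int]:
        scores = self.scorer(predictions)  # informativeness of each observation
        return self.sampler(scores, unlabelled_indices, num_annotate)  # selection step


# Toy scorer: a predicted probability close to 0.5 is treated as more informative.
strategy = ToyStrategy(scorer=lambda preds: [-abs(p - 0.5) for p in preds])
chosen = strategy(num_annotate=2, predictions=[0.9, 0.55, 0.1, 0.48], unlabelled_indices=[10, 11, 12, 13])
```

Keeping the scorer and the sampler as separate objects means either one can be swapped without touching the other, which is the design the constructor signature suggests.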
- suggest(num_annotate: int, **kwargs: Any) List[int][source]¶
Filter kwargs and feed arguments to the __call__ method.
- Parameters:
num_annotate – number of samples to annotate
kwargs – any kwargs (filtered to match internal suggest inputs)
- Returns:
list of indices of samples to query from oracle
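One way the kwargs filtering could work is to inspect the signature of __call__ and drop anything it does not declare. The sketch below assumes that mechanism; filter_kwargs and FilteringStrategy are hypothetical names, and the actual filtering logic in pyrelational may differ.

```python
import inspect
from typing import Any, Callable, Dict, List


def filter_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that `func` declares in its signature."""
    accepted = inspect.signature(func).parameters
    return {name: value for name, value in kwargs.items() if name in accepted}


class FilteringStrategy:
    """Hypothetical strategy; only the kwargs handling is of interest here."""

    def __call__(self, num_annotate: int, candidates: List[int]) -> List[int]:
        return candidates[:num_annotate]

    def suggest(self, num_annotate: int, **kwargs: Any) -> List[int]:
        # Drop arguments that __call__ does not accept, then delegate to it.
        usable = filter_kwargs(self.__call__, kwargs)
        return self(num_annotate=num_annotate, **usable)


picked = FilteringStrategy().suggest(2, candidates=[4, 8, 15], unused_option="ignored")
```

With this pattern a caller can pass the same set of keyword arguments to every strategy, and each strategy receives only what its own __call__ needs.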
- static train_and_infer(data_manager: DataManager, model_manager: ModelManager[Any, Any]) Any[source]¶
Train the model on the currently labelled subset of the data.
Return an output that can be used in model uncertainty based strategies.
- Parameters:
data_manager – reference to data_manager which will supply data to train model and the unlabelled observations
model_manager – Model with generic model interface that will be trained and used to produce output of this method
- Returns:
output of the model
Strategies for regression tasks¶
Abstract regression strategy¶
Regression strategy class implementing __call__ logic.
- class RegressionStrategy(scorer: AbstractScorer | AbstractRegressionScorer | AbstractClassificationScorer, sampler: BatchModeSampler)[source]¶
Bases: Strategy
A base active learning strategy class for regression.
Initialize the strategy with a scorer and a sampler.
- Parameters:
scorer – instance of a scorer class
sampler – instance of a sampler class
- __call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) List[int][source]¶
Identify samples for labelling based on user defined scoring and sampling function.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user defined ML model to handle instantiation, training, testing, as well as uncertainty quantification
- Returns:
list of indices to annotate
Expected improvement¶
Implement Expected Improvement Strategy for regression tasks.
- class ExpectedImprovementStrategy(xi: float = 0.01, axis: int = 0)[source]¶
Bases: Strategy
Implement Expected Improvement Strategy.
Unlabelled samples are scored based on the expected improvement scoring function.
Initialize the strategy with the expected improvement scorer and a deterministic sampler for regression.
- __call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) List[int][source]¶
Identify samples which need to be labelled.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user defined ML model to handle instantiation, training, testing, as well as uncertainty quantification
- Returns:
list of indices to annotate
- scorer: ExpectedImprovement¶
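For orientation, the textbook expected improvement score for a maximisation problem can be written in a few lines. This sketch assumes a Gaussian predictive distribution per candidate, summarised by a mean and a standard deviation, and reuses the name xi from the constructor above. Whether pyrelational computes the score in exactly this form is an assumption; the function names and the best_so_far argument are illustrative.

```python
import math
from typing import List, Sequence


def normal_pdf(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(
    means: Sequence[float], stds: Sequence[float], best_so_far: float, xi: float = 0.01
) -> List[float]:
    """Expected improvement over `best_so_far` for each candidate (maximisation)."""
    scores = []
    for mu, sigma in zip(means, stds):
        gain = mu - best_so_far - xi
        if sigma <= 0.0:
            # No predictive uncertainty: the improvement is deterministic.
            scores.append(max(gain, 0.0))
            continue
        z = gain / sigma
        scores.append(gain * normal_cdf(z) + sigma * normal_pdf(z))
    return scores


scores = expected_improvement(means=[1.0, 1.2, 0.8], stds=[0.1, 0.5, 0.0], best_so_far=1.0)
```

A larger xi demands a bigger margin over the current best before a candidate scores well, which shifts the balance towards exploration.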
Mean Prediction¶
Least confidence¶
Thompson sampling¶
Thompson Sampling Strategy for Regression.
- class ThompsonSamplingStrategy(axis: int = 0)[source]¶
Bases: RegressionStrategy
Implements Thompson Sampling Strategy.
Unlabelled samples are scored and queried based on the Thompson sampling scorer.
Initialize the strategy with the Thompson sampling scorer and a deterministic sampler for regression.
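As a rough illustration of the idea behind Thompson sampling, the sketch below draws one set of predictions at random from a collection of posterior draws (for example, ensemble members) and picks the candidate that draw ranks highest, repeating until the batch is full. This is one common formulation and only an assumption about how the scorer behaves; thompson_select and posterior_draws are hypothetical names.

```python
import random
from typing import List, Sequence


def thompson_select(
    posterior_draws: Sequence[Sequence[float]], num_annotate: int, seed: int = 0
) -> List[int]:
    """Pick candidates by repeatedly maximising a randomly chosen posterior draw.

    posterior_draws[m][i] is the prediction of draw m for candidate i.
    """
    rng = random.Random(seed)
    n_candidates = len(posterior_draws[0])
    chosen: List[int] = []
    while len(chosen) < min(num_annotate, n_candidates):
        draw = rng.choice(posterior_draws)  # one plausible model of the data
        remaining = [i for i in range(n_candidates) if i not in chosen]
        chosen.append(max(remaining, key=lambda i: draw[i]))
    return chosen


draws = [[0.1, 0.9, 0.5], [0.8, 0.2, 0.4]]
picks = thompson_select(draws, num_annotate=2)
```

Because each pick follows a single random draw rather than the posterior mean, candidates that only some draws rate highly still get selected from time to time.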
Upper confidence bound¶
Upper Confidence Bound Strategy.
- class UpperConfidenceBoundStrategy(kappa: float = 1.0, axis: int = 0)[source]¶
Bases: RegressionStrategy
Implements Upper Confidence Bound Strategy.
Unlabelled samples are scored and queried based on the UCB scorer.
Initialize the strategy with the UCB scorer and a deterministic sampler for regression.
- Parameters:
kappa – trade-off parameter between exploitation and exploration
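The role of kappa is easiest to see in the usual form of the upper confidence bound score, mean plus kappa times standard deviation. The sketch below assumes that form; the function name and inputs are illustrative and not taken from pyrelational.

```python
from typing import List, Sequence


def upper_confidence_bound(
    means: Sequence[float], stds: Sequence[float], kappa: float = 1.0
) -> List[float]:
    """Score each candidate as predictive mean plus kappa times predictive std."""
    return [mu + kappa * sigma for mu, sigma in zip(means, stds)]


means = [1.0, 0.8]
stds = [0.1, 0.5]
exploit = upper_confidence_bound(means, stds, kappa=0.0)  # ranks by mean only
explore = upper_confidence_bound(means, stds, kappa=1.0)  # rewards uncertainty
```

With kappa set to 0 the first candidate wins on its mean alone; with kappa set to 1 the second, more uncertain candidate scores higher.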
Strategies for classification tasks¶
Abstract Classification strategy¶
ClassificationStrategy class for active learning in classification tasks.
- class ClassificationStrategy(scorer: AbstractScorer | AbstractRegressionScorer | AbstractClassificationScorer, sampler: BatchModeSampler)[source]¶
Bases: ClassificationStrategy's parent is listed below.
Initialize the strategy with a scorer and a sampler.
- Parameters:
scorer – instance of a scorer class
sampler – instance of a sampler class
- __call__(num_annotate: int, data_manager: DataManager, model_manager: ModelManager[Any, Any]) List[int][source]¶
Identify samples for labelling based on user defined scoring and sampling function.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
model_manager – A pyrelational model manager which wraps a user defined ML model to handle instantiation, training, testing, as well as uncertainty quantification
- Returns:
list of indices to annotate
- softmax(scores: Tensor, base: float = 2.718281828459045, axis: int = -1) Tensor[source]¶
Return softmax array for array of scores.
Converts a set of raw scores from a model (logits) into a probability distribution via softmax.
The probability distribution will be a set of real numbers such that each is in the range 0-1.0 and the sum is 1.0.
Assumes input is a pytorch tensor: tensor([1.0, 4.0, 2.0, 3.0])
- Parameters:
scores – (pytorch tensor) a pytorch tensor of any positive/negative real numbers.
base – the base for the exponential (default e)
axis – axis to apply softmax on scores
- Returns:
tensor of softmaxed scores
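A softmax with a configurable base can be sketched as follows. To stay dependency-free, this version works on a plain one-dimensional Python list rather than a pytorch tensor and has no axis argument. It illustrates the computation and is not the library's code.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], base: float = math.e) -> List[float]:
    """Turn raw scores into probabilities using `base` for the exponential.

    Assumes base > 1. Subtracting the maximum keeps the powers from overflowing
    and does not change the result.
    """
    shift = max(scores)
    weights = [base ** (s - shift) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


probs = softmax([1.0, 4.0, 2.0, 3.0])
flatter = softmax([1.0, 4.0, 2.0, 3.0], base=2.0)
```

A smaller base gives a flatter distribution, so the largest score receives less of the probability mass.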
Entropy¶
Active learning using entropy based confidence uncertainty measure.
The score is computed between classes in the posterior predictive distribution to choose which observations to propose to the oracle.
- class EntropyClassificationStrategy(axis: int = -1)[source]¶
Bases: ClassificationStrategy
Implements Entropy Classification Strategy.
Initialise the strategy with entropy scorer and deterministic sampler.
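As a small illustration of entropy as an uncertainty score, the sketch below computes Shannon entropy over each observation's class probabilities and picks the most uncertain one. The use of log base 2 is an assumption, as are the names; the library's scorer may use a different base or normalisation.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


# One row of class probabilities per unlabelled observation.
class_probs = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
]
scores = [entropy(p) for p in class_probs]
most_uncertain = max(range(len(scores)), key=lambda i: scores[i])
```

The second row is closest to uniform, so it gets the highest entropy and would be proposed to the oracle first.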
Least confidence¶
Active learning using least confidence uncertainty measure.
- class LeastConfidenceStrategy(axis: int = -1)[source]¶
Bases: ClassificationStrategy
Implements Least Confidence Strategy.
Unlabelled samples are scored and queried based on the least confidence for classification scorer.
Initialize the strategy with the least confidence scorer and a deterministic sampler for classification.
Marginal confidence¶
Active learning using marginal confidence uncertainty measure.
- class MarginalConfidenceStrategy(axis: int = -1)[source]¶
Bases: ClassificationStrategy
Implements Marginal Confidence Strategy.
Unlabelled samples are scored and queried based on the marginal confidence for classification scorer.
Initialize the strategy with the marginal confidence scorer and a deterministic sampler for classification.
Confidence ratio¶
Active learning using ratio based confidence uncertainty measure.
- class RatioConfidenceStrategy(axis: int = -1)[source]¶
Bases: ClassificationStrategy
Implements Ratio Confidence Strategy.
Unlabelled samples are scored and queried based on the ratio confidence for classification scorer.
Initialize the strategy with the ratio confidence scorer and a deterministic sampler for classification.
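The least, marginal, and ratio confidence measures all derive from the top one or two class probabilities. The sketch below uses common textbook definitions, oriented so that a higher score means more uncertainty. The exact formulas and any normalisation applied by the pyrelational scorers are not stated on this page, so treat these as assumptions.

```python
from typing import Sequence, Tuple


def top_two(probs: Sequence[float]) -> Tuple[float, float]:
    """Return the largest and second-largest probability (needs >= 2 classes)."""
    first, second = sorted(probs, reverse=True)[:2]
    return first, second


def least_confidence(probs: Sequence[float]) -> float:
    """One minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin_confidence(probs: Sequence[float]) -> float:
    """One minus the gap between the two most likely classes."""
    first, second = top_two(probs)
    return 1.0 - (first - second)


def ratio_confidence(probs: Sequence[float]) -> float:
    """Ratio of the second most likely class to the most likely class."""
    first, second = top_two(probs)
    return second / first


confident = [0.90, 0.07, 0.03]
unsure = [0.40, 0.35, 0.25]
```

All three measures rank the unsure distribution above the confident one; they differ in how much weight the runner-up class receives.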
Task-agnostic strategies¶
Random acquisition¶
Defines and implements a random acquisition active learning strategy.
- class RandomAcquisitionStrategy[source]¶
Bases: Strategy
Implements RandomAcquisition whereby random samples from unlabelled set are chosen at each step.
Override init method to do nothing. This strategy does not require any initialization.
- __call__(num_annotate: int, data_manager: DataManager) List[int][source]¶
Identify samples for labelling based on random sampling.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
- Returns:
list of indices to annotate
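Random acquisition amounts to sampling without replacement from the unlabelled indices. A minimal sketch, with a hypothetical function name and a plain list standing in for what the data manager would supply:

```python
import random
from typing import List, Optional, Sequence


def random_acquisition(
    unlabelled_indices: Sequence[int], num_annotate: int, seed: Optional[int] = None
) -> List[int]:
    """Sample up to `num_annotate` distinct indices uniformly at random."""
    rng = random.Random(seed)
    k = min(num_annotate, len(unlabelled_indices))
    return rng.sample(list(unlabelled_indices), k)


picks = random_acquisition([3, 7, 9, 12], num_annotate=2, seed=0)
```

Because it needs no model output, this strategy is a common baseline against which the informed strategies are compared.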
Relative distance¶
Relative distance based active learning strategy.
- class RelativeDistanceStrategy(metric: str = 'euclidean')[source]¶
Bases: Strategy
Diversity sampling based active learning strategy.
Initialise the strategy with a distance metric.
- Parameters:
metric – Name of distance metric to use. This should be supported by scikit-learn pairwise_distances function.
- __call__(num_annotate: int, data_manager: DataManager) List[int][source]¶
Identify samples which need to be labelled.
- Parameters:
num_annotate – number of samples to annotate
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
- Returns:
list of indices to annotate
- scorer: RelativeDistanceScorer¶
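One plausible reading of distance-based diversity sampling is to score each unlabelled point by its distance to the nearest labelled point and query the farthest ones. The sketch below assumes that reading and a Euclidean metric; the function name is hypothetical and the returned values are positions within the unlabelled list, not dataset indices.

```python
import math
from typing import List, Sequence


def relative_distance_select(
    unlabelled: Sequence[Sequence[float]],
    labelled: Sequence[Sequence[float]],
    num_annotate: int,
) -> List[int]:
    """Return positions of the unlabelled points farthest from the labelled set."""
    # Score = Euclidean distance to the closest labelled point.
    scores = [min(math.dist(u, ref) for ref in labelled) for u in unlabelled]
    ranked = sorted(range(len(unlabelled)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_annotate]


labelled_points = [(0.0, 0.0), (1.0, 0.0)]
unlabelled_points = [(0.1, 0.0), (5.0, 5.0), (1.0, 1.0)]
picks = relative_distance_select(unlabelled_points, labelled_points, num_annotate=2)
```

The point at (0.1, 0.0) sits almost on top of a labelled point, so it is ranked last and left out of the batch.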
Representative sampling¶
Representative sampling based active learning strategy.
- class RepresentativeSamplingStrategy(clustering_method: str | ClusterMixin = 'KMeans', **clustering_kwargs: Any)[source]¶
Bases: Strategy
Representative sampling based active learning strategy.
Initialise the strategy with a clustering method and its arguments.
- Parameters:
clustering_method – name, or instantiated class, of the clustering method to use
clustering_kwargs – arguments to be passed to instantiate clustering class if a string is passed to clustering_method
- __call__(data_manager: DataManager, num_annotate: int) List[int][source]¶
Identify samples for labelling based on representative sampling informativeness measure.
- Parameters:
data_manager – A pyrelational data manager which keeps track of what has been labelled and creates data loaders for active learning
num_annotate – number of samples to annotate
- Returns:
list of indices to annotate
- representative_sampling(query_set: Tensor | ndarray[Any, dtype[float64]] | List[float] | DataLoader[Any], num_annotate: int, clustering_method: str | ClusterMixin = 'KMeans', **clustering_kwargs: Any) List[int][source]¶
Select representative samples from the query set using clustering algorithms from scikit-learn.
- Parameters:
query_set – The query set, either as an array-like object or a PyTorch DataLoader. If a DataLoader, the first element of each batch should be the features of the samples.
num_annotate – Number of representative samples to select.
clustering_method – The clustering method to use, either as a string (name of the clustering algorithm) or as an instantiated clustering class.
clustering_kwargs – Additional arguments for the clustering method, used if clustering_method is a string.
- Returns:
A list of indices representing the selected samples.
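The idea behind representative sampling can be sketched as clustering the query set into num_annotate groups and taking the point nearest each cluster centre. To keep the example dependency-free, a small hand-rolled k-means stands in for the scikit-learn clustering classes, and the choice of the sample closest to the centroid is an assumption about how representatives are picked. All names other than num_annotate are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, n_iter: int = 20) -> List[Point]:
    """Tiny Lloyd's algorithm; centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(n_iter):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids


def representative_sampling(points: Sequence[Point], num_annotate: int) -> List[int]:
    """Return the index of the point closest to each cluster centre."""
    k = min(num_annotate, len(points))
    chosen: List[int] = []
    for centre in kmeans(points, k):
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: math.dist(points[i], centre)))
    return chosen


query_set = [(0.0, 0.0), (0.3, 0.0), (0.1, 0.1), (5.0, 5.0), (5.3, 5.0), (5.1, 5.1)]
picked = representative_sampling(query_set, num_annotate=2)
```

Each selected index comes from a different cluster, so the batch covers both groups in the toy data instead of drawing twice from the same region.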