Creating your own active learning strategies with PyRelationAL

While PyRelationAL already implements several standard active learning strategies, the list is not exhaustive. Users can define their own strategies by subclassing pyrelational.strategies.abstract_strategy.Strategy and overriding its __call__() method. Let’s look at some examples.

Implementing a mixed strategy

In the first example, we show how to implement a mixed strategy for a regression task that combines the least confidence scorer for informativeness with representative sampling. The query is decomposed into two steps: 1) identifying a subset of the queryable set based on the least confidence score (this subset should be substantially larger than the number of annotations we want to query) and 2) selecting representative samples from this subset based on the Euclidean distance between input features.

import torch
from pyrelational.informativeness import regression_least_confidence
from pyrelational.informativeness.task_agnostic import representative_sampling
from pyrelational.strategies.abstract_strategy import Strategy

class MixedStrategy(Strategy):
    """
    Implements a strategy that combines the least_confidence scorer with representative sampling.
    To this end, 10 times more samples than requested are selected based on the least_confidence scorer;
    the list is then reduced based on representative_sampling.
    """

    def __init__(self):
        super(MixedStrategy, self).__init__()

    def __call__(self, num_annotate, data_manager, model_manager):
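        output = self.train_and_infer(data_manager=data_manager, model_manager=model_manager)
        scores = regression_least_confidence(x=output)
        # Step 1: keep the 10 * num_annotate highest-scoring unlabelled samples
        ixs = torch.argsort(scores, descending=True).tolist()
        ixs = [data_manager.u_indices[i] for i in ixs[: 10 * num_annotate]]
        # Step 2: reduce the candidates with representative sampling on their features
        subquery = torch.stack(data_manager.get_sample_feature_vectors(ixs))
        new_ixs = representative_sampling(subquery)
        # Map positions within the candidate list back to dataset indices
        return [ixs[i] for i in new_ixs]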
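
The two-step selection can also be sketched without any library dependency. The following is a minimal, illustrative sketch in plain Python and is not PyRelationAL code: the helper names (top_k_by_score, farthest_point_selection, mixed_query) and the oversample parameter are hypothetical, and greedy farthest-point selection is used as a simple stand-in for representative_sampling, whose internals are not described here.

```python
import math


def top_k_by_score(scores, k):
    """Return the positions of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def farthest_point_selection(points, num_select):
    """Greedily pick positions of points that are far apart (Euclidean distance).

    Assumption: this stands in for a representative sampling routine.
    """
    if not points or num_select <= 0:
        return []
    num_select = min(num_select, len(points))
    selected = [0]  # start from the first (highest-scoring) candidate
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < num_select:
        unselected = (i for i in range(len(points)) if i not in selected)
        nxt = max(unselected, key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)]
    return selected


def mixed_query(scores, features, num_annotate, oversample=10):
    """Step 1: shortlist by score. Step 2: keep well-spread samples from the shortlist."""
    candidates = top_k_by_score(scores, oversample * num_annotate)
    picked = farthest_point_selection([features[i] for i in candidates], num_annotate)
    # Map positions within the shortlist back to positions in the original list.
    return [candidates[i] for i in picked]
```

Calling mixed_query(scores, features, num_annotate=2, oversample=2) first shortlists the four highest-scoring samples and then keeps two of them that lie far apart in feature space. The oversampling factor trades informativeness against diversity: a larger shortlist gives the second step more room to spread out, at the cost of admitting lower-scoring samples.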

Implementing an \(\epsilon\)-greedy strategy

In the second example, we implement an \(\epsilon\)-greedy strategy: for \(N\) queries, \((1-\epsilon)N\) samples are selected greedily as those with the highest model predictions, and the remaining \(\epsilon N\) are selected uniformly at random from the rest of the queryable set.

import torch
import numpy as np
from pyrelational.informativeness import regression_mean_prediction
from pyrelational.strategies.abstract_strategy import Strategy

class EpsilonGreedyStrategy(Strategy):
    """
    Implements an epsilon-greedy strategy, whereby a fraction epsilon of the samples to annotate
    is selected randomly while the remaining samples are selected greedily.
    """

    def __init__(self):
        super(EpsilonGreedyStrategy, self).__init__()

    def __call__(self, num_annotate, data_manager, model_manager, eps=0.05):
        assert 0 <= eps <= 1, "epsilon should be a float between 0 and 1"
        output = self.train_and_infer(data_manager=data_manager, model_manager=model_manager)
        scores = regression_mean_prediction(x=output)
        # Greedy part: the (1 - eps) * num_annotate samples with the highest predictions
        ixs = torch.argsort(scores, descending=True).tolist()
        greedy_annotate = int((1-eps)*num_annotate)
        ixs = [data_manager.u_indices[i] for i in ixs[: greedy_annotate]]
        # Random part: draw the rest uniformly, without replacement, from the other unlabelled samples
        remaining_u_indices = list(set(data_manager.u_indices) - set(ixs))
        random_annotate = np.random.choice(remaining_u_indices, num_annotate-greedy_annotate, replace=False)
        return ixs + random_annotate.tolist()
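
To see the greedy/random split in isolation, here is a minimal sketch in plain Python that applies the same \((1-\epsilon)N\) / \(\epsilon N\) split to a list of scores. It is illustrative only and not part of PyRelationAL: the function name epsilon_greedy_query and the rng parameter are hypothetical, and the standard library random module replaces NumPy for the uniform draw.

```python
import random


def epsilon_greedy_query(scores, num_annotate, eps=0.05, rng=None):
    """Return num_annotate positions: mostly top-scoring, the rest uniformly random."""
    if not 0 <= eps <= 1:
        raise ValueError("eps should be a float between 0 and 1")
    if num_annotate > len(scores):
        raise ValueError("cannot query more samples than are available")
    rng = rng or random.Random()

    # Rank all positions by score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Greedy part: truncation means the random part absorbs any rounding remainder.
    num_greedy = int((1 - eps) * num_annotate)
    greedy = order[:num_greedy]

    # Random part: sample without replacement from everything not chosen greedily.
    remaining = order[num_greedy:]
    explore = rng.sample(remaining, num_annotate - num_greedy)
    return greedy + explore
```

For example, with num_annotate=8 and eps=0.25, six positions are the top-scoring ones and two are drawn at random from the rest. Passing a seeded random.Random instance as rng makes the random part reproducible.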

See the examples folder in the source repository for more examples.