alpbench.benchmark.ActiveLearningScenario

Functions

create_dataset_split(X, y, test_split_seed, ...)

Creates a split of the data into labeled, unlabeled and test data.

Classes

ActiveLearningScenario(scenario_id, ...[, ...])

Active Learning Scenario

class alpbench.benchmark.ActiveLearningScenario.ActiveLearningScenario(scenario_id, openml_id, test_split_seed, train_split_seed, seed, setting, labeled_indices=None, test_indices=None)

Bases: object

Active Learning Scenario

The active learning scenario defines the data and the setting of one active learning setup. It is initialized with the OpenML id of the dataset, the seeds for the test split and the train split, a seed for reproducibility, the setting, and optionally the indices of the labeled and test data.

Parameters:

scenario_id (int) – id of the scenario in the database
openml_id (int) – id of the dataset on OpenML
test_split_seed (int) – seed for the test split
train_split_seed (int) – seed for the train split
seed (int) – seed for reproducibility
setting (ActiveLearningSetting) – active learning setting
labeled_indices (list) – indices of the labeled data
test_indices (list) – indices of the test data

scenario_id

id of the scenario in the database

Type: int

openml_id

id of the dataset on OpenML

Type: int

test_split_seed

seed for the test split

Type: int

train_split_seed

seed for the train split

Type: int

seed

seed for reproducibility

Type: int

setting

active learning setting

Type: ActiveLearningSetting

labeled_indices

indices of the labeled data

Type: list

test_indices

indices of the test data

Type: list

get_data_split(): Get labeled, unlabeled and test data.

get_labeled_instances(): Get the labeled instances.

get_labeled_train_data(): Get the labeled training data.

get_openml_id(): Get the OpenML id.

get_scenario_id(): Get the scenario id.

get_seed(): Get the seed.

get_setting(): Get the setting.

get_test_data(): Get the test data.

get_test_indices(): Get the test indices.

get_unlabeled_train_data(): Get the unlabeled training data (X and y).
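The getters above expose three partitions, while the scenario stores only labeled_indices and test_indices. The following is a minimal sketch of how the three partitions could relate, assuming the unlabeled pool is simply every instance that is neither labeled nor held out for testing. The helper name partition_indices is hypothetical and not part of alpbench.

```python
def partition_indices(n_samples, labeled_indices, test_indices):
    """Return (labeled, unlabeled, test) index lists.

    Assumption: the unlabeled pool is the complement of the
    labeled and test indices within range(n_samples).
    """
    labeled = sorted(set(labeled_indices))
    test = sorted(set(test_indices))
    if set(labeled) & set(test):
        raise ValueError("labeled and test indices must be disjoint")
    taken = set(labeled) | set(test)
    unlabeled = [i for i in range(n_samples) if i not in taken]
    return labeled, unlabeled, test


labeled, unlabeled, test = partition_indices(10, [0, 3], [8, 9])
print(labeled, unlabeled, test)  # [0, 3] [1, 2, 4, 5, 6, 7] [8, 9]
```

With index lists like these, the feature matrix and label vector can be sliced by position to obtain the corresponding data.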

alpbench.benchmark.ActiveLearningScenario.create_dataset_split(X, y, test_split_seed, test_split_size, train_split_seed, train_split_size, train_split_type, factor)

This function creates a split of the data into labeled, unlabeled and test data. The size of the labeled set can be specified either as an absolute value (i.e., a fixed number of labeled data points) or as a relative value (i.e., a fixed share of the training data). The split is stratified according to the labels, and the labeled data is guaranteed to contain at least one instance of each class. If a factor is given, the number of labeled data points is instead determined by the number of classes times the factor.
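The three ways of sizing the labeled set can be illustrated with a small calculation. This is a sketch under assumptions, not the library's code: the helper labeled_set_size is hypothetical, the strings "absolute" and "relative" are assumed spellings of train_split_type, and the clamping to the range between the number of classes and the training set size is an added safeguard.

```python
def labeled_set_size(n_train, n_classes, size, split_type, factor=None):
    """Number of labeled data points for one split configuration."""
    if factor is not None:
        # Factor given: number of classes times the factor.
        n_labeled = n_classes * factor
    elif split_type == "absolute":
        # Fixed number of labeled data points.
        n_labeled = int(size)
    elif split_type == "relative":
        # Fixed share of the training data.
        n_labeled = int(round(size * n_train))
    else:
        raise ValueError(f"unknown split type: {split_type!r}")
    # Assumed safeguard: one instance per class needs at least n_classes
    # points, and the labeled set cannot exceed the training data.
    return max(n_classes, min(n_labeled, n_train))


print(labeled_set_size(200, 4, 30, "absolute"))             # 30
print(labeled_set_size(200, 4, 0.1, "relative"))            # 20
print(labeled_set_size(200, 4, 0.1, "relative", factor=10)) # 40
```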

Parameters:

X (numpy.ndarray) – data
y (numpy.ndarray) – labels
test_split_seed (int) – seed for the test split
test_split_size (float) – size of the test data
train_split_seed (int) – seed for the train split
train_split_size (float) – size of the labeled training data
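train_split_type (str) – how train_split_size is interpreted: as a number of data points (absolute) or as a share of the training data (relative)
factor (int) – task-dependent factor; if given, the number of labeled data points is the number of classes times this factor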

Returns:

labeled_indices (list) – indices of the labeled data
test_indices (list) – indices of the test data
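To make the role of the two seeds and the one-instance-per-class guarantee concrete, here is a simplified sketch using only the Python standard library. It is not the alpbench implementation: the function split_indices and its parameter n_labeled are hypothetical, and instead of a true stratified split it picks one instance per class and then fills the remaining slots uniformly at random, which matches class proportions only in expectation. A stratified sampler such as scikit-learn's train_test_split with the stratify argument would be closer to the behavior described above.

```python
import random
from collections import defaultdict


def split_indices(y, test_size, n_labeled, test_split_seed, train_split_seed):
    """Return (labeled_indices, test_indices) for a sequence of labels y."""
    n = len(y)
    # Test split: shuffle all indices with its own seed, cut off a share.
    order = list(range(n))
    random.Random(test_split_seed).shuffle(order)
    n_test = int(round(test_size * n))
    test_indices = sorted(order[:n_test])
    train = order[n_test:]

    # Group the remaining (training) indices by class label.
    rng = random.Random(train_split_seed)
    by_class = defaultdict(list)
    for i in train:
        by_class[y[i]].append(i)
    if n_labeled < len(by_class):
        raise ValueError("n_labeled must be at least the number of classes")

    # Guarantee: one random instance of every class in the training data.
    labeled = [rng.choice(pool) for pool in by_class.values()]
    # Fill the remaining slots uniformly at random (not exactly stratified).
    chosen = set(labeled)
    rest = [i for i in train if i not in chosen]
    rng.shuffle(rest)
    labeled += rest[: n_labeled - len(labeled)]
    return sorted(labeled), test_indices


y = [0] * 12 + [1] * 6 + [2] * 2
labeled_indices, test_indices = split_indices(y, 0.2, 6, 42, 7)
print(len(labeled_indices), len(test_indices))  # 6 4
```

Because each split uses its own seeded generator, repeating the call with the same seeds reproduces the same index lists, and changing only train_split_seed varies the labeled set while the test set stays fixed.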