alpbench.benchmark.ActiveLearningScenario
Functions
| create_dataset_split | Creates a split of the data into labeled, unlabeled and test data. |
Classes
| ActiveLearningScenario | Active Learning Scenario |
- class alpbench.benchmark.ActiveLearningScenario.ActiveLearningScenario(scenario_id, openml_id, test_split_seed, train_split_seed, seed, setting, labeled_indices=None, test_indices=None)
Bases: object
Active Learning Scenario
The active learning scenario defines the data and the setting of one active learning setup. It is initialized with the OpenML id of the dataset, the seeds for the test split and the train split, a seed for reproducibility, the setting, and optionally the labeled and test indices.
- Parameters:
scenario_id (int) – id of the scenario in the database
openml_id (int) – id of the dataset on OpenML
test_split_seed (int) – seed for the test split
train_split_seed (int) – seed for the train split
seed (int) – seed for reproducibility
setting (ActiveLearningSetting) – active learning setting
labeled_indices (list, optional) – indices of the labeled data; defaults to None
test_indices (list, optional) – indices of the test data; defaults to None
- setting
active learning setting
- Type:
ActiveLearningSetting
- alpbench.benchmark.ActiveLearningScenario.create_dataset_split(X, y, test_split_seed, test_split_size, train_split_seed, train_split_size, train_split_type, factor)
Creates a split of the data into labeled, unlabeled and test data. The split type is either absolute (a fixed number of labeled data points) or relative (a fixed share of the training data). The split is stratified according to the labels, and the labeled data is guaranteed to contain at least one instance of each class. If a factor is given, the number of labeled data points is determined by the number of classes times the factor.
- Parameters:
X (numpy.ndarray) – data
y (numpy.ndarray) – labels
test_split_seed (int) – seed for the test split
test_split_size (float) – size of the test data
train_split_seed (int) – seed for the train split
train_split_size (float) – size of the labeled training data
train_split_type (str) – type of the size parameter: number of data points or share of the (training) dataset
factor (int) – task-dependent factor
- Returns:
labeled_indices (list) – indices of the labeled data
test_indices (list) – indices of the test data
- Return type:
tuple (labeled_indices, test_indices)
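The splitting rules described for create_dataset_split can be illustrated with a small standard-library sketch. It is not alpbench's implementation: the name sketch_dataset_split, the string values "absolute" and "relative" for train_split_type, and the factor=None default are all assumptions made for illustration. It also simplifies the labeled split: the test split is stratified per class, but the labeled set only guarantees one instance per class and fills the remainder uniformly at random.

```python
import random
from collections import defaultdict


def sketch_dataset_split(y, test_split_seed, test_split_size,
                         train_split_seed, train_split_size,
                         train_split_type, factor=None):
    """Illustrative stand-in, NOT the alpbench implementation."""
    # Group instance indices by class label (works for lists and 1-D arrays).
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    # Stratified test split: draw the same share from every class,
    # always leaving at least one instance per class for training.
    test_rng = random.Random(test_split_seed)
    test_indices, pool = [], {}
    for label, idxs in by_class.items():
        idxs = idxs[:]
        test_rng.shuffle(idxs)
        n_test = min(int(round(test_split_size * len(idxs))), len(idxs) - 1)
        test_indices.extend(idxs[:n_test])
        pool[label] = idxs[n_test:]

    # Decide how many training points start out labeled.
    n_train = sum(len(idxs) for idxs in pool.values())
    if factor is not None:
        # Assumed convention: a given factor takes precedence.
        n_labeled = len(by_class) * factor
    elif train_split_type == "absolute":  # assumed string value
        n_labeled = int(train_split_size)
    else:  # treated as "relative": a share of the training data
        n_labeled = int(round(train_split_size * n_train))
    # Never fewer than one per class, never more than the training pool.
    n_labeled = max(len(by_class), min(n_labeled, n_train))

    # Pick one instance per class first, then fill up at random.
    train_rng = random.Random(train_split_seed)
    labeled, rest = [], []
    for idxs in pool.values():
        idxs = idxs[:]
        train_rng.shuffle(idxs)
        labeled.append(idxs[0])
        rest.extend(idxs[1:])
    train_rng.shuffle(rest)
    labeled.extend(rest[:n_labeled - len(labeled)])

    return sorted(labeled), sorted(test_indices)


# Usage: 3 classes with 10 instances each, factor 2 gives 6 labeled points.
y = [0] * 10 + [1] * 10 + [2] * 10
labeled_indices, test_indices = sketch_dataset_split(
    y, test_split_seed=0, test_split_size=0.3,
    train_split_seed=1, train_split_size=5,
    train_split_type="absolute", factor=2,
)
unlabeled_indices = sorted(set(range(len(y))) - set(labeled_indices) - set(test_indices))
```

Because each split uses its own seeded generator, repeating the call with the same seeds reproduces the same indices, while changing only train_split_seed keeps the test split fixed. The unlabeled pool is whatever remains after removing the labeled and test indices.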