alpbench.util.pytorch_tabnet.multiclass_utils

Multi-class / multi-label utility functions

Functions

assert_all_finite(X[, allow_nan])

Throw a ValueError if X contains NaN or infinity.

check_classification_targets(y)

Ensure that target y is of a non-regression type.

check_output_dim(labels, y)

check_unique_type(y)

infer_multitask_output(y_train)

Infer output_dim from targets. This is for multiple tasks.

infer_output_dim(y_train)

Infer output_dim from targets

is_multilabel(y)

Check if y is in a multilabel format.

type_of_target(y)

Determine the type of data indicated by the target.

unique_labels(*ys)

Extract an ordered array of unique labels

alpbench.util.pytorch_tabnet.multiclass_utils.assert_all_finite(X, allow_nan=False)

Throw a ValueError if X contains NaN or infinity.

Parameters:
  • X (array or sparse matrix) – Input to check.

  • allow_nan (bool) – If True, NaN values are permitted.
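
The contract above can be illustrated with a minimal pure-Python sketch. The helper name `check_all_finite` is hypothetical and works on a flat list of floats; it is not the module's implementation, which accepts arrays or sparse matrices. The sketch also assumes that infinity is rejected even when `allow_nan=True`.

```python
import math


def check_all_finite(values, allow_nan=False):
    """Hypothetical stand-in: raise ValueError on NaN or infinity in a flat list."""
    for v in values:
        # Assumption: infinity is always rejected, regardless of allow_nan.
        if math.isinf(v):
            raise ValueError("Input contains infinity.")
        if math.isnan(v) and not allow_nan:
            raise ValueError("Input contains NaN.")


check_all_finite([0.0, 1.5, -2.0])                      # finite input passes silently
check_all_finite([0.0, float("nan")], allow_nan=True)   # NaN tolerated on request

try:
    check_all_finite([0.0, float("inf")], allow_nan=True)
    raised = False
except ValueError:
    raised = True
```

Raising instead of returning a flag lets the check be dropped in front of a training call without extra branching.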

alpbench.util.pytorch_tabnet.multiclass_utils.check_classification_targets(y)

Ensure that target y is of a non-regression type.

Only the following target types (as defined in type_of_target) are allowed:

‘binary’, ‘multiclass’, ‘multiclass-multioutput’, ‘multilabel-indicator’, ‘multilabel-sequences’

Parameters:

y (array-like) – Target values.
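
As a rough illustration of the allow-list described above, the following sketch validates a target type string that has already been inferred. The helper `ensure_classification_type` and the use of `ValueError` are assumptions made for this example, not the module's API.

```python
# Target types listed above as valid for classification.
ALLOWED_TARGET_TYPES = {
    "binary",
    "multiclass",
    "multiclass-multioutput",
    "multilabel-indicator",
    "multilabel-sequences",
}


def ensure_classification_type(target_type):
    """Hypothetical helper: reject regression-style target types."""
    if target_type not in ALLOWED_TARGET_TYPES:
        # Assumption: the exception type is ValueError.
        raise ValueError(f"Unknown label type: {target_type!r}")
    return target_type


ensure_classification_type("multiclass")  # accepted

try:
    ensure_classification_type("continuous")
    rejected = False
except ValueError:
    rejected = True
```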

alpbench.util.pytorch_tabnet.multiclass_utils.check_output_dim(labels, y)

alpbench.util.pytorch_tabnet.multiclass_utils.check_unique_type(y)

alpbench.util.pytorch_tabnet.multiclass_utils.infer_multitask_output(y_train)

Infer output_dim from targets. This is for multiple tasks.

Parameters:

y_train (np.ndarray) – Training targets

Returns:

  • tasks_dims (list) – Number of classes for output

  • tasks_labels (list) – List of sorted list of initial classes
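
One plausible reading of the return values, sketched in pure Python: each column of the targets is treated as one task, and its sorted unique labels and their count are collected. The function `sketch_infer_multitask_output` is hypothetical, takes a list of rows rather than an np.ndarray, and skips any validation the real function may perform.

```python
def sketch_infer_multitask_output(y_train):
    """Hypothetical sketch: per-task class counts and sorted labels."""
    n_tasks = len(y_train[0])
    tasks_dims, tasks_labels = [], []
    for task in range(n_tasks):
        # Sorted unique labels seen in this task's column.
        labels = sorted({row[task] for row in y_train})
        tasks_labels.append(labels)
        tasks_dims.append(len(labels))
    return tasks_dims, tasks_labels


# Two tasks: the first has 2 classes, the second has 3.
y_train = [[0, 2], [1, 5], [0, 7], [1, 2]]
tasks_dims, tasks_labels = sketch_infer_multitask_output(y_train)
print(tasks_dims)    # [2, 3]
print(tasks_labels)  # [[0, 1], [2, 5, 7]]
```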

alpbench.util.pytorch_tabnet.multiclass_utils.infer_output_dim(y_train)

Infer output_dim from targets

Parameters:

y_train (np.array) – Training targets

Returns:

  • output_dim (int) – Number of classes for output

  • train_labels (list) – Sorted list of initial classes
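
For a single task, the two return values can be sketched as the count and the sorted list of distinct training labels. `sketch_infer_output_dim` is a hypothetical pure-Python stand-in operating on a flat list; it omits any target-type checks the real function may apply.

```python
def sketch_infer_output_dim(y_train):
    """Hypothetical sketch: number of classes and sorted class labels."""
    train_labels = sorted(set(y_train))
    output_dim = len(train_labels)
    return output_dim, train_labels


output_dim, train_labels = sketch_infer_output_dim([3, 1, 3, 7, 1])
print(output_dim)    # 3
print(train_labels)  # [1, 3, 7]
```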

alpbench.util.pytorch_tabnet.multiclass_utils.is_multilabel(y)

Check if y is in a multilabel format.

Parameters:

y (numpy array of shape [n_samples]) – Target values.

Returns:

out – True if y is in a multilabel format, else False.

Return type:

bool

Examples

>>> import numpy as np
>>> from alpbench.util.pytorch_tabnet.multiclass_utils import is_multilabel
>>> is_multilabel([0, 1, 0, 1])
False
>>> is_multilabel([[1], [0, 2], []])
False
>>> is_multilabel(np.array([[1, 0], [0, 0]]))
True
>>> is_multilabel(np.array([[1], [0], [0]]))
False
>>> is_multilabel(np.array([[1, 0, 0]]))
True

alpbench.util.pytorch_tabnet.multiclass_utils.type_of_target(y)

Determine the type of data indicated by the target.

Note that this type is the most specific type that can be inferred. For example:

  • binary is more specific but compatible with multiclass.

  • multiclass of integers is more specific but compatible with continuous.

  • multilabel-indicator is more specific but compatible with multiclass-multioutput.

Parameters:

y (array-like) – Target values.

Returns:

target_type – One of:

  • 'continuous': y is an array-like of floats that are not all integers, and is 1d or a column vector.

  • 'continuous-multioutput': y is a 2d array of floats that are not all integers, and both dimensions are of size > 1.

  • 'binary': y contains <= 2 discrete values and is 1d or a column vector.

  • 'multiclass': y contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector.

  • 'multiclass-multioutput': y is a 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.

  • 'multilabel-indicator': y is a label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values.

  • 'unknown': y is array-like but none of the above, such as a 3d array, sequence of sequences, or an array of non-sequence objects.

Return type:

string
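
The 1d rules listed above can be condensed into a small pure-Python sketch. `sketch_type_of_target_1d` is a hypothetical helper that covers only flat sequences ('continuous', 'binary', 'multiclass'); the 2d, multilabel, and 'unknown' cases are deliberately left out.

```python
def sketch_type_of_target_1d(y):
    """Hypothetical sketch of the 1d rules: continuous, binary, or multiclass."""
    values = list(y)
    # Floats that are not all integer-valued indicate a continuous target.
    if any(isinstance(v, float) and not v.is_integer() for v in values):
        return "continuous"
    # At most two discrete values: binary.
    if len(set(values)) <= 2:
        return "binary"
    # More than two discrete values: multiclass.
    return "multiclass"


print(sketch_type_of_target_1d([0.1, 0.6]))        # continuous
print(sketch_type_of_target_1d([1.0, 2.0]))        # binary
print(sketch_type_of_target_1d(["a", "b", "c"]))   # multiclass
```

Note that integer-valued floats such as `[1.0, 2.0]` are treated as discrete, which is why they land in 'binary' rather than 'continuous'.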

Examples

>>> import numpy as np
>>> from alpbench.util.pytorch_tabnet.multiclass_utils import type_of_target
>>> type_of_target([0.1, 0.6])
'continuous'
>>> type_of_target([1, -1, -1, 1])
'binary'
>>> type_of_target(['a', 'b', 'a'])
'binary'
>>> type_of_target([1.0, 2.0])
'binary'
>>> type_of_target([1, 0, 2])
'multiclass'
>>> type_of_target([1.0, 0.0, 3.0])
'multiclass'
>>> type_of_target(['a', 'b', 'c'])
'multiclass'
>>> type_of_target(np.array([[1, 2], [3, 1]]))
'multiclass-multioutput'
>>> type_of_target([[1, 2]])
'multiclass-multioutput'
>>> type_of_target(np.array([[1.5, 2.0], [3.0, 1.6]]))
'continuous-multioutput'
>>> type_of_target(np.array([[0, 1], [1, 1]]))
'multilabel-indicator'

alpbench.util.pytorch_tabnet.multiclass_utils.unique_labels(*ys)

Extract an ordered array of unique labels

We don’t allow:
  • mix of multilabel and multiclass (single label) targets

  • mix of label indicator matrix and anything else (because there are no explicit labels)

  • mix of label indicator matrices of different sizes

  • mix of string and integer labels

At the moment, we also don’t allow “multiclass-multioutput” input type.

Parameters:

*ys (array-likes) –

Returns:

out – An ordered array of unique labels.

Return type:

numpy array of shape [n_unique_labels]

Examples

>>> from alpbench.util.pytorch_tabnet.multiclass_utils import unique_labels
>>> unique_labels([3, 5, 5, 5, 7, 7])
array([3, 5, 7])
>>> unique_labels([1, 2, 3, 4], [2, 2, 3, 4])
array([1, 2, 3, 4])
>>> unique_labels([1, 2, 10], [5, 11])
array([ 1,  2,  5, 10, 11])
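
To make the "mix of string and integer labels" restriction concrete, here is a minimal pure-Python sketch for flat, single-label inputs. `sketch_unique_labels` is hypothetical, returns a plain list instead of a numpy array, and assumes `ValueError` as the exception type; the multilabel and indicator-matrix restrictions are not modelled.

```python
def sketch_unique_labels(*ys):
    """Hypothetical sketch: sorted union of labels, rejecting str/non-str mixes."""
    if not ys:
        # Assumption: calling with no arguments is an error.
        raise ValueError("No argument has been passed.")
    labels = set()
    for y in ys:
        labels.update(y)
    # Mixing string labels with non-string labels is not allowed.
    if len({isinstance(label, str) for label in labels}) > 1:
        raise ValueError("Mix of string and integer labels is not supported.")
    return sorted(labels)


print(sketch_unique_labels([1, 2, 10], [5, 11]))  # [1, 2, 5, 10, 11]

try:
    sketch_unique_labels([1, 2], ["a", "b"])
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```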