alpbench.util.pytorch_tabnet.utils¶

Functions

`check_embedding_parameters`(cat_dims, ...)	Check parameters related to embeddings and rearrange them in a unique manner.
`check_input`(X)	Raise a clear error if X is a pandas DataFrame and check the array according to scikit-learn rules.
`check_list_groups`(list_groups, input_dim)	Check that list_groups is a valid list of feature groups.
`check_warm_start`(warm_start, from_unsupervised)	Gives a warning about ambiguous usage of the two parameters.
`create_dataloaders`(X_train, y_train, ...)	Create dataloaders with or without subsampling depending on weights and balanced.
`create_explain_matrix`(input_dim, ...)	Build the matrix used to sum embedding importances back onto the initial feature indices.
`create_group_matrix`(list_groups, input_dim)	Create the group matrix corresponding to the given list_groups
`create_sampler`(weights, y_train)	This creates a sampler from the given weights
`define_device`(device_name)	Define the device to use during training and inference.
`filter_weights`(weights)	Make sure that weights are in the correct format for regression and multitask TabNet.
`validate_eval_set`(eval_set, eval_name, ...)	Check if the shapes of eval_set are compatible with (X_train, y_train).

Classes

`ComplexEncoder`(*[, skipkeys, ensure_ascii, ...])
`PredictDataset`(x)	Format for numpy array
`SparsePredictDataset`(x)	Format for csr_matrix
`SparseTorchDataset`(x, y)	Format for csr_matrix
`TorchDataset`(x, y)	Format for numpy array

class alpbench.util.pytorch_tabnet.utils.ComplexEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]¶

Bases: JSONEncoder

default(obj)[source]¶
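
This page gives no description of default. As a rough illustration of how a JSONEncoder subclass is generally used (not a statement of what ComplexEncoder itself handles), the sketch below defines a hypothetical SetEncoder whose default method converts sets to sorted lists and defers every other type to the base class. The class name and the choice of sets are assumptions made for the example.

```python
import json


class SetEncoder(json.JSONEncoder):
    """Hypothetical encoder, for illustration only."""

    def default(self, obj):
        # json calls default() only for objects it cannot serialize natively.
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


encoded = json.dumps({"cat_idxs": {2, 0, 5}}, cls=SetEncoder)
print(encoded)  # prints {"cat_idxs": [0, 2, 5]}
```

An encoder class is passed to json.dumps through the cls argument, so ComplexEncoder would presumably be used the same way.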

class alpbench.util.pytorch_tabnet.utils.PredictDataset(x)[source]¶

Bases: Dataset

Format for numpy array

Parameters: x (2D array) – The input matrix

class alpbench.util.pytorch_tabnet.utils.SparsePredictDataset(x)[source]¶

Bases: Dataset

Format for csr_matrix

Parameters: x (CSR matrix) – The input matrix

class alpbench.util.pytorch_tabnet.utils.SparseTorchDataset(x, y)[source]¶

Bases: Dataset

Format for csr_matrix

Parameters:

x (CSR matrix) – The input matrix
y (2D array) – The one-hot encoded target

class alpbench.util.pytorch_tabnet.utils.TorchDataset(x, y)[source]¶

Bases: Dataset

Format for numpy array

Parameters:

x (2D array) – The input matrix
y (2D array) – The one-hot encoded target

alpbench.util.pytorch_tabnet.utils.check_embedding_parameters(cat_dims, cat_idxs, cat_emb_dim)[source]¶: Check parameters related to embeddings and rearrange them in a unique manner.

alpbench.util.pytorch_tabnet.utils.check_input(X)[source]¶: Raise a clear error if X is a pandas DataFrame and check the array according to scikit-learn rules.

alpbench.util.pytorch_tabnet.utils.check_list_groups(list_groups, input_dim)[source]¶

Check that list_groups:

is a list of lists
does not contain the same feature in two different groups
does not contain unknown features (index >= input_dim)
does not contain empty groups

Parameters:

list_groups (list of list of int) – Each element is a list of the features that belong to the same group. A feature should appear in at most one group. Features that are not assigned to a group are each placed in their own single-feature group.
input_dim (int) – Number of features in the initial dataset
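
The four checks listed above can be sketched in plain Python as follows. This is a minimal illustration, not the library's code: the function name validate_groups is hypothetical, and raising ValueError is an assumption, since this page does not say which exception check_list_groups uses.

```python
def validate_groups(list_groups, input_dim):
    """Hypothetical re-implementation of the four checks, for illustration."""
    if not isinstance(list_groups, list) or not all(
        isinstance(group, list) for group in list_groups
    ):
        raise ValueError("list_groups must be a list of lists")

    seen = set()
    for position, group in enumerate(list_groups):
        if len(group) == 0:
            raise ValueError(f"group {position} is empty")
        for feature in group:
            # Negative indices are also rejected here (an added assumption).
            if not 0 <= feature < input_dim:
                raise ValueError(f"unknown feature index {feature}")
            # Stricter than the documented rule: a repeat inside the same
            # group is rejected as well.
            if feature in seen:
                raise ValueError(f"feature {feature} appears more than once")
            seen.add(feature)


validate_groups([[0, 1], [3]], input_dim=4)  # passes silently
```

Feature 2 is left out of every group in the call above, which is allowed: it simply ends up in its own group.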

alpbench.util.pytorch_tabnet.utils.check_warm_start(warm_start, from_unsupervised)[source]¶: Gives a warning about ambiguous usage of the two parameters.

alpbench.util.pytorch_tabnet.utils.create_dataloaders(X_train, y_train, eval_set, weights, batch_size, num_workers, drop_last, pin_memory)[source]¶

Create dataloaders with or without subsampling depending on weights and balanced.

Parameters:

X_train (np.ndarray) – Training data
y_train (np.array) – Mapped Training targets
eval_set (list of tuple) – List of eval tuple set (X, y)
weights (either 0, 1, dict or iterable) –
if 0 (default): no weights will be applied
if 1: classification only, classes are balanced by inverse frequency
if dict: keys are class values, values are the corresponding sample weights
if iterable: list or np.array whose length equals the number of elements in the training set
batch_size (int) – how many samples per batch to load
num_workers (int) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process
drop_last (bool) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller
pin_memory (bool) – Whether to load batches into pinned (page-locked) host memory, which speeds up transfers to the GPU during training

Returns:

train_dataloader, valid_dataloader – Training and validation dataloaders

Return type:

torch.utils.data.DataLoader, torch.utils.data.DataLoader
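
To make the effect of batch_size and drop_last concrete, the arithmetic below computes how many batches one pass over the training set yields. It is a small sketch that assumes each epoch draws exactly as many samples as the training set contains; the helper name batches_per_epoch is hypothetical.

```python
import math


def batches_per_epoch(n_samples, batch_size, drop_last):
    """Number of batches produced by one pass over n_samples items."""
    if drop_last:
        # The trailing incomplete batch is discarded.
        return n_samples // batch_size
    # The trailing incomplete batch is kept as a smaller batch.
    return math.ceil(n_samples / batch_size)


print(batches_per_epoch(1000, 128, drop_last=True))   # prints 7
print(batches_per_epoch(1000, 128, drop_last=False))  # prints 8
```

With 1000 samples and a batch size of 128, the eighth batch holds the remaining 104 samples when drop_last is False.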

alpbench.util.pytorch_tabnet.utils.create_explain_matrix(input_dim, cat_emb_dim, cat_idxs, post_embed_dim)[source]¶

A computational trick: build a reducing matrix that rapidly sums the importances of all embedding dimensions belonging to the same feature back onto that feature's initial index.

Parameters:

input_dim (int) – Initial input dim
cat_emb_dim (int or list of int) –
if int: size of the embedding for all categorical features
if list of int: size of the embedding for each categorical feature
cat_idxs (list of int) – Initial position of categorical features
post_embed_dim (int) – Post embedding inputs dimension

Returns:

reducing_matrix – Matrix of dim (post_embed_dim, input_dim) used to perform the reduction

Return type:

np.array
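
The idea behind the reducing matrix can be sketched with plain nested lists. This is an illustration under stated assumptions, not the library's code: the names build_reducing_matrix and reduce_importances are hypothetical, and when cat_emb_dim is a list it is assumed that its k-th entry belongs to the k-th entry of cat_idxs.

```python
def build_reducing_matrix(input_dim, cat_emb_dim, cat_idxs, post_embed_dim):
    """Hypothetical sketch: 0/1 matrix of shape (post_embed_dim, input_dim)."""
    if isinstance(cat_emb_dim, int):
        emb_dims = {idx: cat_emb_dim for idx in cat_idxs}
    else:
        # Assumption: cat_emb_dim[k] is the embedding size of cat_idxs[k].
        emb_dims = dict(zip(cat_idxs, cat_emb_dim))

    # A non-categorical feature keeps a single post-embedding column.
    widths = [emb_dims.get(feature, 1) for feature in range(input_dim)]
    if sum(widths) != post_embed_dim:
        raise ValueError("post_embed_dim does not match the embedding sizes")

    matrix = [[0] * input_dim for _ in range(post_embed_dim)]
    row = 0
    for feature, width in enumerate(widths):
        for _ in range(width):
            matrix[row][feature] = 1  # this embedding column maps to `feature`
            row += 1
    return matrix


def reduce_importances(post_embed_importances, matrix):
    """Sum post-embedding importances back onto the initial features."""
    input_dim = len(matrix[0])
    return [
        sum(value * row[j] for value, row in zip(post_embed_importances, matrix))
        for j in range(input_dim)
    ]


# Three input features; feature 1 is categorical with a 2-dimensional embedding.
matrix = build_reducing_matrix(3, 2, [1], post_embed_dim=4)
print(reduce_importances([1, 2, 3, 4], matrix))  # prints [1, 5, 4]
```

The two embedding columns of feature 1 (importances 2 and 3) collapse into a single value of 5, while the other features pass through unchanged.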

alpbench.util.pytorch_tabnet.utils.create_group_matrix(list_groups, input_dim)[source]¶

Create the group matrix corresponding to the given list_groups

Parameters:

list_groups (list of list of int) – Each element is a list of the features that belong to the same group. A feature should appear in at most one group. Features that are not assigned to a group are each placed in their own single-feature group.
input_dim (int) – Number of features in the initial dataset

Returns:

group_matrix – A matrix of size (n_groups, input_dim) where m_ij represents the importance of feature j in group i. Each row must sum to 1, as each group is equally important a priori.

Return type:

torch matrix
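
A group matrix with the properties described above can be sketched with nested lists. The sketch makes two assumptions that this page does not confirm: features inside a group share the row weight equally, and ungrouped features are appended as single-feature groups in index order. The name build_group_matrix is hypothetical, and a list of lists stands in for the torch matrix.

```python
def build_group_matrix(list_groups, input_dim):
    """Hypothetical sketch: one row per group, each row summing to 1."""
    grouped = {feature for group in list_groups for feature in group}
    # Features without a group each become their own group of one feature.
    singletons = [[f] for f in range(input_dim) if f not in grouped]

    matrix = []
    for group in list_groups + singletons:
        row = [0.0] * input_dim
        for feature in group:
            row[feature] = 1.0 / len(group)  # assumed equal split within a group
        matrix.append(row)
    return matrix


for row in build_group_matrix([[0, 1]], input_dim=4):
    print(row)
# prints:
# [0.5, 0.5, 0.0, 0.0]
# [0.0, 0.0, 1.0, 0.0]
# [0.0, 0.0, 0.0, 1.0]
```

Here n_groups is 3: the explicit group of features 0 and 1, plus one singleton group each for features 2 and 3.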

alpbench.util.pytorch_tabnet.utils.create_sampler(weights, y_train)[source]¶

This creates a sampler from the given weights

Parameters:

weights (either 0, 1, dict or iterable) –
if 0 (default): no weights will be applied
if 1: classification only, classes are balanced by inverse frequency
if dict: keys are class values, values are the corresponding sample weights
if iterable: list or np.array whose length equals the number of elements in the training set
y_train (np.array) – Training targets
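
The four accepted forms of weights can be turned into one weight per training sample roughly as follows. This is a dependency-free sketch: the name per_sample_weights is hypothetical, and returning None for the unweighted case and raising ValueError on bad input are assumptions.

```python
from collections import Counter


def per_sample_weights(weights, y_train):
    """Hypothetical sketch: map `weights` to one weight per training sample."""
    if isinstance(weights, dict):
        # Keys are class values, values are the weight of each such sample.
        return [weights[label] for label in y_train]
    if isinstance(weights, int):
        if weights == 0:
            return None  # no weighting: ordinary shuffled sampling
        if weights == 1:
            counts = Counter(y_train)
            # Inverse class frequency: rare classes get larger weights.
            return [1.0 / counts[label] for label in y_train]
        raise ValueError("integer weights must be 0 or 1")
    sample_weights = list(weights)
    if len(sample_weights) != len(y_train):
        raise ValueError("need exactly one weight per training sample")
    return sample_weights


balanced = per_sample_weights(1, ["a", "a", "a", "b"])
print(balanced[-1])  # prints 1.0
```

Per-sample weights of this kind are what a weighted sampler such as torch.utils.data.WeightedRandomSampler consumes; whether create_sampler relies on that class is not stated on this page.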

alpbench.util.pytorch_tabnet.utils.define_device(device_name)[source]¶

Define the device to use during training and inference. If device_name is “auto”, the function detects automatically whether to use cuda or cpu.

Parameters: device_name (str) – Either “auto”, “cpu” or “cuda”
Returns: Either “cpu” or “cuda”
Return type: str
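
The “auto” behaviour can be sketched with torch.cuda.is_available(). The helper name pick_device is hypothetical, and both the fallback to “cpu” when PyTorch is not installed and the ValueError for unknown names are assumptions added for the example, not documented behaviour of define_device.

```python
def pick_device(device_name):
    """Hypothetical sketch of resolving 'auto', 'cpu' or 'cuda'."""
    if device_name == "auto":
        try:
            import torch
        except ImportError:
            # Assumption: without PyTorch there is nothing to probe.
            return "cpu"
        return "cuda" if torch.cuda.is_available() else "cpu"
    if device_name in ("cpu", "cuda"):
        # An explicit choice is returned unchanged in this sketch.
        return device_name
    raise ValueError(f"unsupported device name: {device_name!r}")


print(pick_device("cpu"))  # prints cpu
```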

alpbench.util.pytorch_tabnet.utils.filter_weights(weights)[source]¶

This function makes sure that weights are in the correct format for regression and multitask TabNet.

Parameters: weights (int, dict or list) – Initial weights parameter given by the user
Returns: None. This function only raises an error if the format is wrong.
Return type: None
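
This page does not spell out which formats are rejected. One plausible reading, given that weights=1 and dict weights are defined per class in the create_sampler documentation, is that those two class-based forms are refused for regression and multitask models. The sketch below encodes that assumption; the function name and the error messages are hypothetical.

```python
def reject_class_based_weights(weights):
    """Hypothetical check: returns None and only raises on a bad format."""
    # Assumption: 1 means "balance classes", which needs class labels.
    if isinstance(weights, int) and weights == 1:
        raise ValueError("weights=1 is only meaningful for classification")
    # Assumption: a dict maps class values to weights, so it is refused too.
    if isinstance(weights, dict):
        raise ValueError("dict weights are only meaningful for classification")
    return None


reject_class_based_weights(0)           # accepted: no weighting
reject_class_based_weights([0.3, 0.7])  # accepted: one weight per sample
```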

alpbench.util.pytorch_tabnet.utils.validate_eval_set(eval_set, eval_name, X_train, y_train)[source]¶

Check if the shapes of eval_set are compatible with (X_train, y_train).

Parameters:

eval_set (list of tuple) – List of eval tuple set (X, y). The last one is used for early stopping
eval_name (list of str) – List of eval set names.
X_train (np.ndarray) – Training data
y_train (np.array) – Training targets

Returns:

eval_names (list of str) – Validated list of eval_names.
eval_set (list of tuple) – Validated list of eval_set.
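
The kind of shape compatibility described above can be sketched on plain nested lists. This is an illustration, not the library's code: the names check_eval_sets and _width are hypothetical, the exact set of checks is an assumption, and real inputs would be NumPy arrays compared through their shape attribute.

```python
def _width(row):
    # Number of columns in one row; a scalar target counts as one column.
    return len(row) if isinstance(row, (list, tuple)) else 1


def check_eval_sets(eval_set, eval_name, X_train, y_train):
    """Hypothetical shape check on plain nested lists, for illustration."""
    if len(eval_set) != len(eval_name):
        raise ValueError("eval_set and eval_name must have the same length")

    n_features = _width(X_train[0])
    n_targets = _width(y_train[0])
    for name, pair in zip(eval_name, eval_set):
        if len(pair) != 2:
            raise ValueError(f"{name}: each entry must be a tuple (X, y)")
        X, y = pair
        if len(X) == 0 or len(X) != len(y):
            raise ValueError(f"{name}: X and y must be non-empty and equally long")
        if any(_width(row) != n_features for row in X):
            raise ValueError(f"{name}: expected {n_features} features per row")
        if any(_width(target) != n_targets for target in y):
            raise ValueError(f"{name}: expected {n_targets} target column(s)")
    return eval_name, eval_set


X_train, y_train = [[1.0, 2.0], [3.0, 4.0]], [0, 1]
names, sets = check_eval_sets([([[5.0, 6.0]], [1])], ["valid"], X_train, y_train)
print(names)  # prints ['valid']
```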