alpbench.util.pytorch_tabnet.utils
Functions
check_embedding_parameters | Check parameters related to embeddings and rearrange them in a unique manner. |
check_input | Raise a clear error if X is a pandas DataFrame and check the array according to scikit-learn rules. |
check_list_groups | Check that list_groups is a well-formed list of feature groups. |
check_warm_start | Gives a warning about ambiguous usage of the two parameters. |
create_dataloaders | Create dataloaders with or without subsampling depending on weights. |
create_explain_matrix | Build the matrix that sums embedding importances back onto the initial feature indices. |
create_group_matrix | Create the group matrix corresponding to the given list_groups. |
create_sampler | Create a sampler from the given weights. |
define_device | Define the device to use during training and inference. |
filter_weights | Make sure that weights are in the correct format for regression and multitask TabNet. |
validate_eval_set | Check if the shapes of eval_set are compatible with (X_train, y_train). |
Classes
ComplexEncoder | |
PredictDataset | Format for numpy array |
SparsePredictDataset | Format for csr_matrix |
SparseTorchDataset | Format for csr_matrix |
TorchDataset | Format for numpy array |
- class alpbench.util.pytorch_tabnet.utils.ComplexEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases:
JSONEncoder
- class alpbench.util.pytorch_tabnet.utils.PredictDataset(x)
Bases:
Dataset
Format for numpy array
- Parameters:
X (2D array) – The input matrix
- class alpbench.util.pytorch_tabnet.utils.SparsePredictDataset(x)
Bases:
Dataset
Format for csr_matrix
- Parameters:
X (CSR matrix) – The input matrix
- class alpbench.util.pytorch_tabnet.utils.SparseTorchDataset(x, y)
Bases:
Dataset
Format for csr_matrix
- Parameters:
X (CSR matrix) – The input matrix
y (2D array) – The one-hot encoded target
- class alpbench.util.pytorch_tabnet.utils.TorchDataset(x, y)
Bases:
Dataset
Format for numpy array
- Parameters:
X (2D array) – The input matrix
y (2D array) – The one-hot encoded target
- alpbench.util.pytorch_tabnet.utils.check_embedding_parameters(cat_dims, cat_idxs, cat_emb_dim)
Check parameters related to embeddings and rearrange them in a unique manner.
- alpbench.util.pytorch_tabnet.utils.check_input(X)
Raise a clear error if X is a pandas DataFrame, then check the array according to scikit-learn rules.
- alpbench.util.pytorch_tabnet.utils.check_list_groups(list_groups, input_dim)
- Check that list_groups:
is a list of lists
does not contain the same feature in more than one group
does not contain unknown features (>= input_dim)
does not contain empty groups
- Parameters:
list_groups (list of list of int) – Each element is a list representing features in the same group. Each feature should appear in at most one group. Features that are not assigned to a group will be in their own group of one feature.
input_dim (int) – Number of features in the initial dataset.
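The four checks listed above can be sketched in plain Python as follows. This is a minimal illustration rather than the library's implementation: the function name `validate_groups`, the exception types, and the error messages are all hypothetical.

```python
def validate_groups(list_groups, input_dim):
    """Hypothetical restatement of the checks described above."""
    if not isinstance(list_groups, list):
        raise TypeError("list_groups must be a list of lists")
    seen = set()
    for pos, group in enumerate(list_groups):
        if not isinstance(group, list):
            raise TypeError(f"group {pos} must be a list")
        if len(group) == 0:
            raise ValueError(f"group {pos} is empty")
        for feature in group:
            if feature >= input_dim:
                raise ValueError(f"feature {feature} is unknown (>= input_dim)")
            if feature in seen:
                raise ValueError(f"feature {feature} is listed more than once")
            seen.add(feature)


# Passes silently; features 2 and 4 are simply left ungrouped.
validate_groups([[0, 1], [3]], input_dim=5)
```

Tracking already-seen features in a set keeps the duplicate check linear in the total number of listed features.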
- alpbench.util.pytorch_tabnet.utils.check_warm_start(warm_start, from_unsupervised)
Gives a warning about ambiguous usage of the two parameters.
- alpbench.util.pytorch_tabnet.utils.create_dataloaders(X_train, y_train, eval_set, weights, batch_size, num_workers, drop_last, pin_memory)
Create dataloaders with or without subsampling, depending on weights.
- Parameters:
X_train (np.ndarray) – Training data
y_train (np.array) – Mapped Training targets
weights (either 0, 1, dict or iterable) –
if 0 (default): no weights will be applied
if 1: classification only, balances classes by inverse frequency
if dict: keys are class values, values are the corresponding sample weights
if iterable: list or np array whose length must equal the number of elements in the training set
batch_size (int) – how many samples per batch to load
num_workers (int) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process
drop_last (bool) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller
pin_memory (bool) – Whether to load batches into pinned (page-locked) host memory, which speeds up transfers to the GPU during training
- Returns:
train_dataloader, valid_dataloader – Training and validation dataloaders
- Return type:
torch.utils.data.DataLoader, torch.utils.data.DataLoader
- alpbench.util.pytorch_tabnet.utils.create_explain_matrix(input_dim, cat_emb_dim, cat_idxs, post_embed_dim)
A computational trick to rapidly sum the importances of the embedding dimensions that come from the same feature back onto that feature's initial index.
- Parameters:
input_dim (int) – Initial input dim
cat_emb_dim (int or list of int) –
if int: size of embedding for all categorical features
if list of int: size of embedding for each categorical feature
cat_idxs (list of int) – Initial position of categorical features
post_embed_dim (int) – Post embedding inputs dimension
- Returns:
reducing_matrix – Matrix of dim (post_embed_dim, input_dim) used to perform the reduction
- Return type:
np.array
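To make the trick concrete, the sketch below builds such a reducing matrix with plain Python lists and uses it to fold post-embedding importances back onto the original features. It is an illustration under assumptions, not the library's code: `build_reducing_matrix` and `reduce_importances` are hypothetical names, and the sketch assumes that each non-categorical feature occupies exactly one post-embedding column while each categorical feature occupies as many columns as its embedding size.

```python
def build_reducing_matrix(input_dim, cat_emb_dim, cat_idxs, post_embed_dim):
    """Hypothetical pure-Python sketch; returns the matrix as a list of rows."""
    if isinstance(cat_emb_dim, int):
        emb_dims = [cat_emb_dim] * len(cat_idxs)
    else:
        emb_dims = list(cat_emb_dim)
    # Number of post-embedding columns produced by each categorical feature.
    width = dict(zip(cat_idxs, emb_dims))

    expected = sum(width.get(j, 1) for j in range(input_dim))
    if expected != post_embed_dim:
        raise ValueError("post_embed_dim does not match the embedding sizes")

    matrix = [[0] * input_dim for _ in range(post_embed_dim)]
    row = 0
    for feature in range(input_dim):
        for _ in range(width.get(feature, 1)):
            matrix[row][feature] = 1  # this embedded column comes from `feature`
            row += 1
    return matrix


def reduce_importances(post_embed_importances, matrix):
    """Sum post-embedding importances back onto the initial feature indices."""
    n_features = len(matrix[0])
    return [
        sum(value * row[j] for value, row in zip(post_embed_importances, matrix))
        for j in range(n_features)
    ]


# Feature 1 is categorical with a 2-dimensional embedding: 3 inputs become 4 columns.
M = build_reducing_matrix(input_dim=3, cat_emb_dim=2, cat_idxs=[1], post_embed_dim=4)
reduced = reduce_importances([1, 2, 3, 4], M)  # columns 1 and 2 both map to feature 1
```

Here `reduced` is `[1, 5, 4]`: the two embedding columns of feature 1 are summed, and the other features pass through unchanged.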
- alpbench.util.pytorch_tabnet.utils.create_group_matrix(list_groups, input_dim)
Create the group matrix corresponding to the given list_groups
- Parameters:
list_groups (list of list of int) – Each element is a list representing features in the same group. Each feature should appear in at most one group. Features that are not assigned to a group will be in their own group of one feature.
input_dim (int) – Number of features in the initial dataset.
- Returns:
group_matrix – A matrix of size (n_groups, input_dim) where m_ij represents the importance of feature j in group i. Each row must sum to 1, as each group is equally important a priori.
- Return type:
torch matrix
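The structure of the group matrix can be illustrated without torch. The following is a minimal sketch using nested lists, whereas the documented function returns a torch matrix. The name `build_group_matrix` is hypothetical, and the sketch assumes list_groups has already passed the checks described for check_list_groups.

```python
def build_group_matrix(list_groups, input_dim):
    """Hypothetical sketch: one row per group, each row summing to 1."""
    remaining = list(range(input_dim))
    rows = []
    for group in list_groups:
        row = [0.0] * input_dim
        for feature in group:
            row[feature] = 1.0 / len(group)  # equal share inside the group
            remaining.remove(feature)
        rows.append(row)
    # Features not mentioned in any group each get a singleton group.
    for feature in remaining:
        row = [0.0] * input_dim
        row[feature] = 1.0
        rows.append(row)
    return rows


# Features 0 and 1 share a group; feature 2 falls back to its own group.
G = build_group_matrix([[0, 1]], input_dim=3)
```

With these inputs `G` has two rows, `[0.5, 0.5, 0.0]` and `[0.0, 0.0, 1.0]`, so n_groups is 2 for an input_dim of 3.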
- alpbench.util.pytorch_tabnet.utils.create_sampler(weights, y_train)
This creates a sampler from the given weights
- Parameters:
weights (either 0, 1, dict or iterable) –
if 0 (default): no weights will be applied
if 1: classification only, balances classes by inverse frequency
if dict: keys are class values, values are the corresponding sample weights
if iterable: list or np array whose length must equal the number of elements in the training set
y_train (np.array) – Training targets
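The four accepted forms of weights can be read as four ways of arriving at one weight per training sample. The sketch below shows that mapping in plain Python. It is not the library's implementation: `per_sample_weights` is a hypothetical helper, and using 1 / class count as the inverse-frequency weight is an assumption about the exact formula.

```python
from collections import Counter


def per_sample_weights(weights, y_train):
    """Hypothetical sketch: turn the `weights` argument into one weight per sample."""
    if isinstance(weights, int):
        if weights == 0:
            return None  # no weighting requested
        if weights == 1:
            counts = Counter(y_train)
            # Assumed formula: rarer classes get proportionally larger weights.
            return [1.0 / counts[label] for label in y_train]
        raise ValueError("weights should be 0, 1, a dict or an iterable")
    if isinstance(weights, dict):
        # Keys are class values, values are the weight given to each sample of that class.
        return [weights[label] for label in y_train]
    weights = list(weights)
    if len(weights) != len(y_train):
        raise ValueError("custom weights must have one entry per training sample")
    return weights


balanced = per_sample_weights(1, [0, 0, 0, 1])
```

In this example the single sample of class 1 receives weight 1.0 and each sample of class 0 receives 1/3, so both classes carry the same total weight. Per-sample weights of this kind are what a weighted sampler such as `torch.utils.data.WeightedRandomSampler` expects.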
- alpbench.util.pytorch_tabnet.utils.define_device(device_name)
Define the device to use during training and inference. If device_name is "auto", automatically detect whether to use cuda or cpu.
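The "auto" behaviour described above amounts to a small decision rule, sketched below. `resolve_device` is a hypothetical name, and CUDA availability is passed in as a plain boolean so that the example runs without torch; in real code that flag would typically come from `torch.cuda.is_available()`.

```python
def resolve_device(device_name, cuda_available):
    """Hypothetical sketch of the device selection rule."""
    if device_name == "auto":
        # Prefer the GPU when one is usable, otherwise fall back to the CPU.
        return "cuda" if cuda_available else "cpu"
    # Any explicit choice is returned unchanged in this sketch.
    return device_name


device = resolve_device("auto", cuda_available=False)
```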
- alpbench.util.pytorch_tabnet.utils.filter_weights(weights)
Make sure that weights are in the correct format for regression and multitask TabNet.
- alpbench.util.pytorch_tabnet.utils.validate_eval_set(eval_set, eval_name, X_train, y_train)
Check if the shapes of eval_set are compatible with (X_train, y_train).
- Parameters:
- Returns:
eval_names (list of str) – Validated list of eval_names.
eval_set (list of tuple) – Validated list of eval_set.
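A shape-compatibility check of this kind might look like the sketch below, which works on nested lists so that it runs without numpy. Everything in it is an assumption made for illustration: the name `check_eval_shapes`, the default names `val_0`, `val_1`, and so on, and the specific checks performed. It only compares row and column counts and omits any comparison against y_train.

```python
def check_eval_shapes(eval_set, eval_name, X_train):
    """Hypothetical sketch: validate (X, y) pairs against the training matrix."""
    # Assumption: generate default names when none are supplied.
    eval_name = eval_name or [f"val_{i}" for i in range(len(eval_set))]
    if len(eval_set) != len(eval_name):
        raise ValueError("eval_set and eval_name must have the same length")

    n_features = len(X_train[0])
    for name, pair in zip(eval_name, eval_set):
        if len(pair) != 2:
            raise ValueError(f"{name}: each eval set must be an (X, y) tuple")
        X, y = pair
        if len(X) != len(y):
            raise ValueError(f"{name}: X and y have different numbers of rows")
        if any(len(row) != n_features for row in X):
            raise ValueError(f"{name}: expected {n_features} columns, as in X_train")
    return eval_name, eval_set


X_train = [[1.0, 2.0], [3.0, 4.0]]
names, sets = check_eval_shapes([([[5.0, 6.0]], [1])], None, X_train)
```

Returning the names together with the sets mirrors the documented return value, so callers can rely on both lists having the same length.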