API Reference

Group Lasso regularised estimators

class group_lasso.GroupLasso(groups=None, group_reg=0.05, l1_reg=0.05, n_iter=100, tol=1e-05, scale_reg='group_size', subsampling_scheme=None, fit_intercept=True, frobenius_lipschitz=False, random_state=None, warm_start=False, old_regularisation=False, supress_warning=False)[source]

Sparse group lasso regularised least squares linear regression.

This class implements the Sparse Group Lasso [1] regularisation for linear regression with the mean squared error loss.

This class is implemented as both a regressor and a transformer. If the transform method is called, then the columns of the input that correspond to zero-valued regression coefficients are dropped.

The loss is optimised using the FISTA algorithm proposed in [2] with the generalised gradient-based restarting scheme proposed in [3]. This algorithm is not as accurate as some other optimisation algorithms, but it is extremely efficient and does recover the sparsity patterns. We therefore recommend using this class as a transformer to select the viable features and feeding its output into another regression algorithm, such as Ridge in scikit-learn.
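
For illustration, here is a minimal sketch of this transformer-plus-estimator workflow on synthetic data; the step names and regularisation values are placeholders, and the group specification matches the example given for the groups parameter below.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import Pipeline
    from group_lasso import GroupLasso

    # Synthetic data: columns 0-2 form group 1, columns 3-4 form group 2,
    # and the last column is left unregularised (group index -1).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    true_coef = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 2.0])
    y = (X @ true_coef + 0.1 * rng.standard_normal(200)).reshape(-1, 1)

    pipeline = Pipeline([
        # GroupLasso.transform drops the columns whose coefficients are zero...
        ("feature_selection", GroupLasso(
            groups=[1, 1, 1, 2, 2, -1],
            group_reg=0.05,
            l1_reg=0.05,
            supress_warning=True,
        )),
        # ...and the surviving columns are refitted with ridge regression.
        ("regression", Ridge()),
    ])
    pipeline.fit(X, y)
    print(pipeline.score(X, y))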

Parameters:
  • groups (Iterable) – Iterable that specifies which group each column corresponds to. For columns that should not be regularised, the corresponding group index should either be None or negative. For example, the list [1, 1, 1, 2, 2, -1] specifies that the first three columns of the data matrix belong to the first group, the next two columns belong to the second group and the last column should not be regularised.
  • group_reg (float or iterable [default=0.05]) – The regularisation coefficient(s) for the group sparsity penalty. If group_reg is an iterable, then its length should be equal to the number of groups.
  • l1_reg (float or iterable [default=0.05]) – The regularisation coefficient for the coefficient sparsity penalty.
  • n_iter (int [default=100]) – The maximum number of iterations to perform.
  • tol (float [default=1e-5]) – The convergence tolerance. The optimisation algorithm will stop once ||x_{n+1} - x_n|| < tol.
  • scale_reg (str [in {"group_size", "none", "inverse_group_size"}] or None) – How to scale the group-wise regularisation coefficients. The original group lasso paper scaled the regularisation of each group by the square root of the number of elements in that group, so that each variable has the same effect on the regularisation. This is not sensible for dummy-encoded variables, as these always have either unit or zero norm, so scale_reg should be "none" if all variables are dummy variables. Finally, if the group size shouldn't be considered when choosing variables, then "inverse_group_size" should be used instead, as it divides by the square root of the group size, removing the dependence of the regularisation strength on the group size.
  • subsampling_scheme (None, float, int or str [default=None]) – The subsampling rate used for the gradient and singular value computations. If it is a float, it specifies the fraction of rows to use in the computations. If it is an int, it specifies the number of rows to use. If it is a string, it must be 'sqrt', in which case the number of rows used in the computations is the square root of the number of rows in X.
  • frobenius_lipschitz (bool [default=False]) – Use the Frobenius norm to estimate the Lipschitz coefficient of the MSE loss. This works well for systems whose power iterations converge slowly. If False, then subsampled power iterations are used. Note that the Frobenius approximation of the Lipschitz coefficient may fail, yielding all-zero weights.
  • fit_intercept (bool [default=True]) – Whether to fit an intercept or not.
  • random_state (np.random.RandomState [default=None]) – The random state used for initialisation of parameters.
  • warm_start (bool [default=False]) – If True, then subsequent calls to fit will not re-initialise the model parameters. This can speed up hyperparameter searches.

References

[1] Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2013). A sparse-group lasso. Journal of Computational and Graphical Statistics, 22(2), 231-245.

[2] Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183-202.

[3] O'Donoghue, B., & Candès, E. (2015). Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, 15(3), 715-732.

Attributes:
chosen_groups_

A set of the chosen group ids.

sparsity_mask

A boolean mask indicating whether features are used in prediction.

sparsity_mask_

A boolean mask indicating whether features are used in prediction.

Methods

fit(X, y[, lipschitz]) Fit a group lasso regularised linear regression model.
fit_transform(X, y[, lipschitz]) Fit a group lasso model to X and y and remove unused columns from X
get_params([deep]) Get parameters for this estimator.
loss(X, y) The group-lasso regularised loss with the current coefficients
predict(X) Predict using the linear model.
score(X, y[, sample_weight]) Return the coefficient of determination R^2 of the prediction.
set_params(**params) Set the parameters of this estimator.
transform(X) Remove columns corresponding to zero-valued coefficients.
fit_predict  
chosen_groups_

A set of the chosen group ids.

fit(X, y, lipschitz=None)[source]

Fit a group lasso regularised linear regression model.

Parameters:
  • X (np.ndarray) – Data matrix
  • y (np.ndarray) – Target vector or matrix
  • lipschitz (float or None [default=None]) – A Lipschitz bound for the mean squared loss with the given data and target matrices. If None, this is estimated.
fit_transform(X, y, lipschitz=None)

Fit a group lasso model to X and y, and remove unused columns from X.

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
loss(X, y)

The group-lasso regularised loss with the current coefficients.

Parameters:
  • X (np.ndarray) – Data matrix, X.shape == (num_datapoints, num_features)
  • y (np.ndarray) – Target vector/matrix, y.shape == (num_datapoints, num_targets), or y.shape == (num_datapoints,)
predict(X)[source]

Predict using the linear model.

score(X, y, sample_weight=None)

Return the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
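
As a small worked instance of this definition (numbers chosen purely for illustration):

    import numpy as np

    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.0, 2.0, 4.0])

    u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares: 1.0
    v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares: 2.0
    print(1 - u / v)                            # R^2 = 0.5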

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns:

score – R^2 of self.predict(X) wrt. y.

Return type:

float

Notes

The R^2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 onwards, to stay consistent with the default value of r2_score(). This influences the score method of all multioutput regressors (except MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
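
A short sketch of the nested syntax, assuming the hypothetical step names from the pipeline example above:

    from sklearn.linear_model import Ridge
    from sklearn.pipeline import Pipeline
    from group_lasso import GroupLasso

    pipeline = Pipeline([
        ("feature_selection", GroupLasso(groups=[1, 1, 2], supress_warning=True)),
        ("regression", Ridge()),
    ])
    # Nested parameters use the <component>__<parameter> form...
    pipeline.set_params(feature_selection__group_reg=0.1)
    # ...while parameters on the estimator itself are set directly.
    pipeline.named_steps["feature_selection"].set_params(l1_reg=0.01)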

Parameters:**params (dict) – Estimator parameters.
Returns:self – Estimator instance.
Return type:object
sparsity_mask

A boolean mask indicating whether features are used in prediction.

sparsity_mask_

A boolean mask indicating whether features are used in prediction.

transform(X)

Remove columns corresponding to zero-valued coefficients.
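
A minimal sketch of direct (non-pipeline) use on synthetic data, assuming the fit converges to a sparse solution; the regularisation value is a placeholder. The retained column count matches sparsity_mask_:

    import numpy as np
    from group_lasso import GroupLasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 6))
    y = X[:, :3].sum(axis=1).reshape(-1, 1)  # only group 1 carries signal

    gl = GroupLasso(groups=[1, 1, 1, 2, 2, 3], group_reg=0.1, supress_warning=True)
    gl.fit(X, y)

    X_reduced = gl.transform(X)
    # transform keeps exactly the columns flagged by the sparsity mask.
    assert X_reduced.shape[1] == gl.sparsity_mask_.sum()
    print(gl.chosen_groups_)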

class group_lasso.LogisticGroupLasso(groups, group_reg=0.05, l1_reg=0.05, n_iter=100, tol=1e-05, scale_reg='group_size', subsampling_scheme=None, fit_intercept=True, random_state=None, warm_start=False, old_regularisation=False, supress_warning=False)[source]

Sparse group lasso regularised multi-class logistic regression.

This class implements the Sparse Group Lasso [1] regularisation for multi-class logistic regression with the cross-entropy loss.

This class is implemented as both a classifier and a transformer. If the transform method is called, then the columns of the input that correspond to zero-valued regression coefficients are dropped.

The loss is optimised using the FISTA algorithm proposed in [2] with the generalised gradient-based restarting scheme proposed in [3]. This algorithm is not as accurate as some other optimisation algorithms, but it is extremely efficient and does recover the sparsity patterns. We therefore recommend using this class as a transformer to select the viable features and feeding its output into another classification algorithm, such as LogisticRegression in scikit-learn.
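
For illustration, a minimal sketch of this classification workflow on synthetic data; the step names, labels and regularisation values are placeholders.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from group_lasso import LogisticGroupLasso

    # Synthetic binary labels that depend only on the first group of columns.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    pipeline = Pipeline([
        # LogisticGroupLasso selects the informative columns...
        ("feature_selection", LogisticGroupLasso(
            groups=[1, 1, 1, 2, 2, -1],
            group_reg=0.05,
            l1_reg=0.05,
            supress_warning=True,
        )),
        # ...and a plain logistic regression is fitted on what remains.
        ("classification", LogisticRegression()),
    ])
    pipeline.fit(X, y)
    print(pipeline.score(X, y))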

Parameters:
  • groups (Iterable) – Iterable that specifies which group each column corresponds to. For columns that should not be regularised, the corresponding group index should either be None or negative. For example, the list [1, 1, 1, 2, 2, -1] specifies that the first three columns of the data matrix belong to the first group, the next two columns belong to the second group and the last column should not be regularised.
  • group_reg (float or iterable [default=0.05]) – The regularisation coefficient(s) for the group sparsity penalty. If group_reg is an iterable, then its length should be equal to the number of groups.
  • l1_reg (float or iterable [default=0.05]) – The regularisation coefficient for the coefficient sparsity penalty.
  • n_iter (int [default=100]) – The maximum number of iterations to perform.
  • tol (float [default=1e-5]) – The convergence tolerance. The optimisation algorithm will stop once ||x_{n+1} - x_n|| < tol.
  • scale_reg (str [in {"group_size", "none", "inverse_group_size"}] or None) – How to scale the group-wise regularisation coefficients. The original group lasso paper scaled the regularisation of each group by the square root of the number of elements in that group, so that each variable has the same effect on the regularisation. This is not sensible for dummy-encoded variables, as these always have either unit or zero norm, so scale_reg should be "none" if all variables are dummy variables. Finally, if the group size shouldn't be considered when choosing variables, then "inverse_group_size" should be used instead, as it divides by the square root of the group size, removing the dependence of the regularisation strength on the group size.
  • subsampling_scheme (None, float, int or str [default=None]) – The subsampling rate used for the gradient and singular value computations. If it is a float, it specifies the fraction of rows to use in the computations. If it is an int, it specifies the number of rows to use. If it is a string, it must be 'sqrt', in which case the number of rows used in the computations is the square root of the number of rows in X.
  • frobenius_lipschitz (bool [default=False]) – Use the Frobenius norm to estimate the Lipschitz coefficient of the MSE loss. This works well for systems whose power iterations converge slowly. If False, then subsampled power iterations are used. Note that the Frobenius approximation of the Lipschitz coefficient may fail, yielding all-zero weights.
  • fit_intercept (bool [default=True]) – Whether to fit an intercept or not.
  • random_state (np.random.RandomState [default=None]) – The random state used for initialisation of parameters.
  • warm_start (bool [default=False]) – If True, then subsequent calls to fit will not re-initialise the model parameters. This can speed up hyperparameter searches.

References

[1] Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2013). A sparse-group lasso. Journal of Computational and Graphical Statistics, 22(2), 231-245.

[2] Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183-202.

[3] O'Donoghue, B., & Candès, E. (2015). Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, 15(3), 715-732.

Attributes:
chosen_groups_

A set of the chosen group ids.

sparsity_mask

A boolean mask indicating whether features are used in prediction.

sparsity_mask_

A boolean mask indicating whether features are used in prediction.

Methods

fit(X, y[, lipschitz]) Fit a group-lasso regularised linear model.
fit_transform(X, y[, lipschitz]) Fit a group lasso model to X and y and remove unused columns from X
get_params([deep]) Get parameters for this estimator.
loss(X, y) The group-lasso regularised loss with the current coefficients
predict(X) Predict using the linear model.
score(X, y[, sample_weight]) Return the mean accuracy on the given test data and labels.
set_params(**params) Set the parameters of this estimator.
transform(X) Remove columns corresponding to zero-valued coefficients.
fit_predict  
predict_proba  
chosen_groups_

A set of the chosen group ids.

fit(X, y, lipschitz=None)

Fit a group-lasso regularised linear model.

fit_transform(X, y, lipschitz=None)

Fit a group lasso model to X and y, and remove unused columns from X.

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
loss(X, y)

The group-lasso regularised loss with the current coefficients.

Parameters:
  • X (np.ndarray) – Data matrix, X.shape == (num_datapoints, num_features)
  • y (np.ndarray) – Target vector/matrix, y.shape == (num_datapoints, num_targets), or y.shape == (num_datapoints,)
predict(X)[source]

Predict using the linear model.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be predicted correctly.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns:

score – Mean accuracy of self.predict(X) wrt. y.

Return type:

float

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:**params (dict) – Estimator parameters.
Returns:self – Estimator instance.
Return type:object
sparsity_mask

A boolean mask indicating whether features are used in prediction.

sparsity_mask_

A boolean mask indicating whether features are used in prediction.

transform(X)

Remove columns corresponding to zero-valued coefficients.

Utilities for group lasso

group_lasso.utils.extract_ohe_groups(onehot_encoder)[source]

Extract a vector with group indices from a scikit-learn OneHotEncoder.

Parameters:onehot_encoder (sklearn.preprocessing.OneHotEncoder) – A fitted one-hot encoder.
Returns:A group vector that can be used with the group lasso regularised linear models.
Return type:np.ndarray
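
A minimal sketch of how this helper slots into the workflow, assuming a fitted encoder; the categorical data are illustrative:

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder
    from group_lasso.utils import extract_ohe_groups

    # Two categorical features with 2 and 3 levels, giving 5 encoded columns.
    X_cat = np.array([["a", "x"], ["b", "y"], ["a", "z"]])
    encoder = OneHotEncoder()
    X_enc = encoder.fit_transform(X_cat)

    # Each encoded column is tagged with the index of its source feature,
    # e.g. something like [0, 0, 1, 1, 1], ready for GroupLasso(groups=groups).
    groups = extract_ohe_groups(encoder)
    print(groups)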