celer.celer_path

celer.celer_path(X, y, pb, eps=0.001, n_alphas=100, alphas=None, l1_ratio=1.0, coef_init=None, max_iter=20, max_epochs=50000, p0=10, verbose=0, tol=1e-06, prune=0, weights=None, groups=None, return_thetas=False, use_PN=False, X_offset=None, X_scale=None, return_n_iter=False, positive=False)

Compute optimization path with Celer as inner solver.

With n = len(y) and p = len(w) the number of samples and features, the losses are:

Lasso:

\[\frac{\| y - X w \|_2^2}{2 n} + \alpha \sum_{j=1}^p weights_j |w_j|\]

ElasticNet:

\[\frac{\| y - X w \|_2^2}{2 n} + \alpha \sum_{j=1}^p weights_j (l1\_ratio |w_j| + (1-l1\_ratio) w_j^2)\]

Logreg:

\[\sum_{i=1}^n \text{log} \,(1 + e^{-y_i x_i^\top w}) + \alpha \sum_{j=1}^p weights_j |w_j|\]

GroupLasso, with G the number of groups and \(w_{[g]}\) the subvector corresponding to group g:

\[\frac{\| y - X w \|_2^2}{2 n} + \alpha \sum_{g=1}^G weights_g \| w_{[g]} \|_2\]
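As a reading aid, the Lasso and GroupLasso objectives above can be written out in plain NumPy. This is a minimal sketch and not the library's code: the function names lasso_objective and group_lasso_objective are hypothetical, groups is assumed to be a list of index lists, and weights are assumed finite.

```python
import numpy as np


def lasso_objective(X, y, w, alpha, weights=None):
    """Weighted Lasso objective: ||y - Xw||_2^2 / (2n) + alpha * sum_j weights_j |w_j|."""
    n_samples = len(y)
    if weights is None:
        weights = np.ones(len(w))  # unit weights by default (finite weights assumed)
    residual = y - X @ w
    datafit = residual @ residual / (2 * n_samples)
    penalty = alpha * np.sum(weights * np.abs(w))
    return datafit + penalty


def group_lasso_objective(X, y, w, alpha, groups, weights=None):
    """Weighted GroupLasso objective; `groups` is a list of index lists (an assumption)."""
    n_samples = len(y)
    if weights is None:
        weights = np.ones(len(groups))
    residual = y - X @ w
    datafit = residual @ residual / (2 * n_samples)
    # One Euclidean norm per group, each scaled by that group's weight.
    group_norms = np.array([np.linalg.norm(w[g]) for g in groups])
    penalty = alpha * np.sum(weights * group_norms)
    return datafit + penalty
```

With every feature in its own group, the group norm reduces to |w_j| and the two functions return the same value.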

Parameters:

X : {array-like, sparse matrix}, shape (n_samples, n_features): Training data. Pass directly as Fortran-contiguous data or in compressed sparse column (CSC) format to avoid unnecessary memory duplication.
y : ndarray, shape (n_samples,): Target values.
pb : "lasso" | "logreg" | "grouplasso": Optimization problem to solve.
eps : float, optional: Length of the path. eps=1e-3 means that alpha_min = 1e-3 * alpha_max.
n_alphas : int, optional: Number of alphas along the regularization path.
alphas : ndarray, optional: List of alphas at which to compute the models. If None, alphas are set automatically.
l1_ratio : float, optional: The ElasticNet mixing parameter, with 0 < l1_ratio <= 1. Defaults to 1.0, which corresponds to the L1 penalty (Lasso). l1_ratio = 0 (Ridge regression) is not supported.
coef_init : ndarray, shape (n_features,) | None, optional (default=None): Initial value of coefficients. If None, np.zeros(n_features) is used.
max_iter : int, optional: The maximum number of iterations (each iteration defines a working set and solves the problem restricted to the features in that working set).
max_epochs : int, optional: Maximum number of (block) CD epochs on each subproblem.
p0 : int, optional: First working set size.
verbose : bool or integer, optional: Amount of verbosity. 0 or False is silent.
tol : float, optional: The tolerance for the optimization: the solver runs until the duality gap is smaller than tol or the maximum number of iterations is reached.
prune : 0 | 1, optional: Whether to use pruning when growing working sets.
weights : ndarray, shape (n_features,) or (n_groups,), optional: Feature/group weights used in the penalty. Defaults to an array of ones. Features with weights equal to np.inf are ignored.
groups : int or list of ints or list of list of ints, optional: Used for the group Lasso only. See the documentation of the celer.GroupLasso class.
return_thetas : bool, optional: If True, dual variables along the path are returned.
use_PN : bool, optional: If pb == "logreg", use the ProxNewton solver instead of coordinate descent.
X_offset : np.array, shape (n_features,), optional: Used to center sparse X without breaking sparsity. Mean of each column. See sklearn.linear_model.base._preprocess_data().
X_scale : np.array, shape (n_features,), optional: Used to scale centered sparse X without breaking sparsity. Norm of each centered column. See sklearn.linear_model.base._preprocess_data().
return_n_iter : bool, optional: If True, the numbers of iterations along the path are returned.
positive : bool, optional (default=False): If True and pb == "lasso", forces the coefficients to be positive.
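To make the roles of eps and n_alphas concrete, here is a hedged sketch of how an automatic grid of alphas could be built for the unweighted Lasso. The helper name lasso_alpha_grid is hypothetical. It assumes alpha_max = max_j |x_j^T y| / n, the smallest alpha for which w = 0 solves the Lasso objective above with unit weights, and it assumes geometric spacing between alpha_max and eps * alpha_max. The spacing actually used by the library when alphas=None may differ.

```python
import numpy as np


def lasso_alpha_grid(X, y, eps=1e-3, n_alphas=100):
    """Hypothetical helper: decreasing grid from alpha_max down to eps * alpha_max."""
    n_samples = X.shape[0]
    # Smallest alpha for which the all-zero vector is a Lasso solution (unit weights).
    alpha_max = np.max(np.abs(X.T @ y)) / n_samples
    # Geometric spacing is an assumption, not something the documentation above states.
    return np.geomspace(alpha_max, eps * alpha_max, n_alphas)
```

The last value of the grid equals eps times the first, which matches the description of eps as the length of the path.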

Returns:

alphas : array, shape (n_alphas,): The alphas along the path where models are computed.
coefs : array, shape (n_features, n_alphas): Coefficients along the path.
dual_gaps : array, shape (n_alphas,): Duality gaps returned by the solver along the path.
thetas : array, shape (n_alphas, n_samples): The dual variables along the path. Returned only if return_thetas is True.
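A short usage sketch, assuming the celer package is installed. The wrapper lasso_path_demo and the synthetic data are illustrative only. The call uses only parameters from the signature above and unpacks the three default return values.

```python
import numpy as np


def lasso_path_demo(n_samples=50, n_features=20, n_alphas=5, seed=0):
    """Illustrative wrapper (hypothetical name): Lasso path on synthetic data."""
    # Imported here so that defining the function does not require celer.
    from celer import celer_path

    rng = np.random.default_rng(seed)
    # Fortran-contiguous X, as recommended for the X parameter.
    X = np.asfortranarray(rng.standard_normal((n_samples, n_features)))
    w_true = np.zeros(n_features)
    w_true[:3] = [1.0, -2.0, 0.5]
    y = X @ w_true + 0.01 * rng.standard_normal(n_samples)

    # With return_thetas and return_n_iter left at False, three arrays come back.
    alphas, coefs, dual_gaps = celer_path(
        X, y, pb="lasso", eps=1e-2, n_alphas=n_alphas
    )
    return alphas, coefs, dual_gaps
```

Calling lasso_path_demo() should give coefs with one column per alpha, so coefs[:, k] is the coefficient vector fitted at alphas[k].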