celer.celer_path

celer.celer_path(X, y, pb, eps=0.001, n_alphas=100, alphas=None, l1_ratio=1.0, coef_init=None, max_iter=20, max_epochs=50000, p0=10, verbose=0, tol=1e-06, prune=0, weights=None, groups=None, return_thetas=False, use_PN=False, X_offset=None, X_scale=None, return_n_iter=False, positive=False)
Compute optimization path with Celer as inner solver.
With n = len(y) and p = len(w) the number of samples and features, the losses are:

Lasso:

\[\frac{\| y - X w \|_2^2}{2 n} + \alpha \sum_{j=1}^p weights_j |w_j|\]

ElasticNet:

\[\frac{\| y - X w \|_2^2}{2 n} + \alpha \sum_{j=1}^p weights_j (l1\_ratio |w_j| + (1 - l1\_ratio) w_j^2)\]

Logreg:

\[\sum_{i=1}^n \log(1 + e^{-y_i x_i^\top w}) + \alpha \sum_{j=1}^p weights_j |w_j|\]

GroupLasso, with G the number of groups and \(w_{[g]}\) the subvector corresponding to group g:

\[\frac{\| y - X w \|_2^2}{2 n} + \alpha \sum_{g=1}^G weights_g \| w_{[g]} \|_2\]

- Parameters:
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
Training data. Pass directly as Fortran-contiguous data or column sparse format (CSC) to avoid unnecessary memory duplication.
- y : ndarray, shape (n_samples,)
Target values.
- pb : "lasso" | "logreg" | "grouplasso"
Optimization problem to solve.
- eps : float, optional
Length of the path. eps=1e-3 means that alpha_min = 1e-3 * alpha_max.
- n_alphas : int, optional
Number of alphas along the regularization path.
- alphas : ndarray, optional
List of alphas at which to compute the models. If None, alphas are set automatically.
- l1_ratio : float, optional
The ElasticNet mixing parameter, with 0 < l1_ratio <= 1. Defaults to 1.0, which corresponds to the L1 penalty (Lasso). l1_ratio = 0 (Ridge regression) is not supported.
- coef_init : ndarray, shape (n_features,) | None, optional (default=None)
Initial value of the coefficients. If None, np.zeros(n_features) is used.
- max_iter : int, optional
The maximum number of outer iterations (each iteration defines a working set and solves the problem restricted to its features).
- max_epochs : int, optional
Maximum number of (block) CD epochs on each subproblem.
- p0 : int, optional
First working set size.
- verbose : bool or int, optional
Amount of verbosity. 0 or False is silent.
- tol : float, optional
The tolerance for the optimization: the solver runs until the duality gap is smaller than tol or the maximum number of iterations is reached.
- prune : 0 | 1, optional
Whether or not to use pruning when growing working sets.
- weights : ndarray, shape (n_features,) or (n_groups,), optional
Feature/group weights used in the penalty. Defaults to an array of ones. Features with weights equal to np.inf are ignored.
- groups : int or list of ints or list of lists of ints, optional
Used for the group Lasso only. See the documentation of the celer.GroupLasso class.
- return_thetas : bool, optional
If True, the dual variables along the path are returned.
- use_PN : bool, optional
If pb == "logreg", use the ProxNewton solver instead of coordinate descent.
- X_offset : np.array, shape (n_features,), optional
Used to center sparse X without breaking sparsity. Mean of each column. See sklearn.linear_model.base._preprocess_data().
- X_scale : np.array, shape (n_features,), optional
Used to scale centered sparse X without breaking sparsity. Norm of each centered column. See sklearn.linear_model.base._preprocess_data().
- return_n_iter : bool, optional
If True, the number of iterations along the path is returned.
- positive : bool, optional (default=False)
If True and pb == "lasso", forces the coefficients to be positive.
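To make the eps parameter concrete: when alphas is None, path solvers in this family conventionally build a geometric grid of n_alphas values from alpha_max down to eps * alpha_max, where alpha_max = max|X^T y| / n_samples is the smallest alpha for which the unit-weight Lasso solution is identically zero. The sketch below illustrates that convention with plain numpy; it is not celer's internal code, and the function name default_alpha_grid is made up for illustration.

```python
import numpy as np

def default_alpha_grid(X, y, eps=1e-3, n_alphas=100):
    """Geometric grid from alpha_max down to eps * alpha_max.

    Illustrative sketch of the documented convention
    alpha_min = eps * alpha_max (unit weights, Lasso case).
    """
    n_samples = len(y)
    # Smallest alpha for which 0 solves the Lasso problem:
    alpha_max = np.max(np.abs(X.T @ y)) / n_samples
    return np.geomspace(alpha_max, eps * alpha_max, n_alphas)

rng = np.random.RandomState(0)
X = rng.randn(20, 30)
y = rng.randn(20)
alphas = default_alpha_grid(X, y, eps=1e-3, n_alphas=5)
print(alphas[0] / alphas[-1])  # ratio alpha_max / alpha_min, i.e. 1 / eps
```

Passing such a decreasing grid lets the solver warm-start each problem from the previous solution, which is why paths are computed from alpha_max downwards.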
- Returns:
- alphas : array, shape (n_alphas,)
The alphas along the path where models are computed.
- coefs : array, shape (n_features, n_alphas)
Coefficients along the path.
- dual_gaps : array, shape (n_alphas,)
Duality gaps returned by the solver along the path.
- thetas : array, shape (n_alphas, n_samples)
The dual variables along the path (thetas are returned only if return_thetas is set to True).
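As a consistency check on the Lasso loss defined at the top of this page, the objective can be evaluated directly in numpy. This is an illustrative sketch of the formula (the function lasso_objective is made up for illustration, not part of celer's API):

```python
import numpy as np

def lasso_objective(X, y, w, alpha, weights=None):
    """Evaluate ||y - X w||_2^2 / (2 n) + alpha * sum_j weights_j |w_j|,
    i.e. the weighted Lasso loss stated above. Sketch only."""
    n_samples = len(y)
    if weights is None:
        weights = np.ones(X.shape[1])  # default: array of ones, as documented
    datafit = np.sum((y - X @ w) ** 2) / (2 * n_samples)
    penalty = alpha * np.sum(weights * np.abs(w))
    return datafit + penalty

# At w = 0 the penalty vanishes and the objective is ||y||^2 / (2 n),
# which is why alpha >= alpha_max makes the all-zero vector optimal.
X = np.eye(4)
y = np.array([1.0, 2.0, 0.0, 0.0])
print(lasso_objective(X, y, np.zeros(4), alpha=0.1))  # 5 / 8 = 0.625
```

Checking the dual gap returned in dual_gaps against tol times this objective at w = 0 is a quick way to confirm a path solve actually converged at each alpha.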