celer.datasets.make_correlated_data
- celer.datasets.make_correlated_data(n_samples=100, n_features=50, corr=0.6, snr=3, density=0.2, w_true=None, random_state=None)
Generate a correlated design matrix X, whose correlation \(\rho^{|i-j|}\) decays with the distance between features, together with observations y according to
\[y = X w^* + \epsilon\]
such that \(\|X w^*\| / \|\epsilon\| = \text{snr}\).
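The SNR constraint can be met exactly by drawing Gaussian noise and rescaling it to the required norm. The minimal NumPy sketch below assumes standard Gaussian noise. The helper `add_noise_with_snr` is hypothetical and only illustrates the formula; it is not claimed to be celer's code.

```python
import numpy as np


def add_noise_with_snr(signal, snr, rng):
    """Hypothetical helper: return signal + eps with ||signal|| / ||eps|| == snr."""
    if snr == np.inf:
        # Infinite SNR: no noise is added.
        return signal.copy()
    noise = rng.standard_normal(signal.shape[0])
    # Rescale the noise so that its norm is exactly ||signal|| / snr.
    noise *= np.linalg.norm(signal) / (snr * np.linalg.norm(noise))
    return signal + noise


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_star = rng.standard_normal(20)
signal = X @ w_star
y = add_noise_with_snr(signal, snr=3.0, rng=rng)
ratio = np.linalg.norm(signal) / np.linalg.norm(y - signal)
print(round(ratio, 6))  # 3.0
```

Because the noise is rescaled rather than drawn with a fixed variance, the ratio holds for every draw, not only in expectation.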
The generated features have mean 0, variance 1 and the expected correlation structure:
\[\mathbb E[x_i] = 0~, \quad \mathbb E[x_i^2] = 1 \quad \text{and} \quad \mathbb E[x_i x_j] = \rho^{|i-j|}\]
- Parameters:
- n_samples: int
Number of samples in the design matrix.
- n_features: int
Number of features in the design matrix.
- corr: float
Correlation \(\rho\) between successive features. The element \(C_{i, j}\) of the correlation matrix is \(\rho^{|i-j|}\). This parameter should be chosen in \([0, 1)\).
- snr: float or np.inf
Signal-to-noise ratio \(\|X w^*\| / \|\epsilon\|\). If np.inf, no noise is added.
- density: float
Proportion of nonzero elements in w_true, used only when w_true is simulated (i.e. when w_true is None).
- w_true: np.array, shape (n_features,) | None
True regression coefficients. If None, a sparse vector is simulated: its nonzero entries are drawn from a standard Gaussian distribution, and their proportion is set by density.
- random_state: int | RandomState instance | None (default)
Controls the random number generation used to simulate the data. Pass an int for reproducible output across calls.
- Returns:
- X: ndarray, shape (n_samples, n_features)
A design matrix with Toeplitz covariance.
- y: ndarray, shape (n_samples,)
Observation vector.
- w_true: ndarray, shape (n_features,)
True regression vector of the model.
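To make the parameters and return values concrete, here is a self-contained NumPy sketch of a generator with the same inputs and outputs. It is an illustration under stated assumptions, not celer's implementation: the name `simulate_correlated_data` is hypothetical, the correlation is produced with a Gaussian AR(1) recursion \(x_j = \rho x_{j-1} + \sqrt{1-\rho^2}\, z_j\) (which keeps \(\mathbb E[x_j^2] = 1\) and gives \(\mathbb E[x_i x_j] = \rho^{|i-j|}\)), and the support size is assumed to be int(density * n_features).

```python
import numpy as np


def simulate_correlated_data(n_samples=100, n_features=50, corr=0.6, snr=3.0,
                             density=0.2, w_true=None, random_state=None):
    """Hypothetical sketch with the documented inputs and outputs (not celer's code)."""
    if not 0 <= corr < 1:
        raise ValueError("corr should lie in [0, 1)")
    # Accept an int, None, or an existing RandomState instance.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)

    # Gaussian AR(1) recursion along the feature axis: every column has unit
    # variance and E[x_i x_j] = corr ** |i - j| (Toeplitz covariance).
    X = np.empty((n_samples, n_features))
    X[:, 0] = rng.randn(n_samples)
    scale = np.sqrt(1.0 - corr ** 2)
    for j in range(1, n_features):
        X[:, j] = corr * X[:, j - 1] + scale * rng.randn(n_samples)

    if w_true is None:
        # Assumed support size: int(density * n_features) Gaussian entries.
        nnz = int(density * n_features)
        w_true = np.zeros(n_features)
        support = rng.choice(n_features, size=nnz, replace=False)
        w_true[support] = rng.randn(nnz)
    else:
        w_true = np.asarray(w_true, dtype=float)

    y = X @ w_true
    if snr != np.inf:
        # Rescale Gaussian noise so that ||X w_true|| / ||noise|| equals snr.
        noise = rng.randn(n_samples)
        y = y + noise * np.linalg.norm(y) / (snr * np.linalg.norm(noise))
    return X, y, w_true


X, y, w_true = simulate_correlated_data(n_samples=5000, n_features=10,
                                         corr=0.5, random_state=0)
print(X.shape, y.shape, w_true.shape)  # (5000, 10) (5000,) (10,)
print(np.count_nonzero(w_true))  # 2
```

The AR(1) recursion never forms the n_features x n_features covariance matrix. Sampling through a Cholesky factor of the Toeplitz matrix would give the same Gaussian distribution, but at cubic cost in n_features.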