# celer.datasets.make_correlated_data

celer.datasets.make_correlated_data(n_samples=100, n_features=50, corr=0.6, snr=3, density=0.2, w_true=None, random_state=None)

Generate a design matrix $X$ whose features have decaying correlation $\rho^{|i-j|}$, together with observations $y$ drawn according to

$y = X w^* + \epsilon$

such that $\|X w^*\| / \|\epsilon\| = \mathrm{snr}$.
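The ratio above can be met exactly by drawing Gaussian noise and rescaling it so that its norm equals $\|X w^*\| / \mathrm{snr}$. The NumPy sketch below illustrates that definition under this assumption; the helper `add_noise_with_snr` is hypothetical and is not part of celer's API, nor is it claimed to be celer's implementation.

```python
import numpy as np


def add_noise_with_snr(signal, snr, rng):
    """Hypothetical helper: return signal + eps with ||signal|| / ||eps|| == snr.

    If snr is np.inf, the signal is returned unchanged (no noise is added).
    """
    if snr == np.inf:
        return signal.copy()
    noise = rng.standard_normal(signal.shape[0])
    # Rescale the Gaussian draw so that its norm is exactly ||signal|| / snr.
    noise *= np.linalg.norm(signal) / (snr * np.linalg.norm(noise))
    return signal + noise


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_star = np.zeros(50)
w_star[:10] = rng.standard_normal(10)
signal = X @ w_star
y = add_noise_with_snr(signal, snr=3.0, rng=rng)

# Ratio of signal norm to noise norm: 3.0 up to floating-point rounding.
ratio = np.linalg.norm(signal) / np.linalg.norm(y - signal)
print(ratio)
```

Because the noise is rescaled after being drawn, the ratio holds for every sample rather than only in expectation.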

The generated features have mean 0, variance 1, and the following correlation structure:

$\mathbb{E}[x_i] = 0, \quad \mathbb{E}[x_i^2] = 1 \quad \text{and} \quad \mathbb{E}[x_i x_j] = \rho^{|i-j|}$

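These moments are exactly those of a stationary first-order autoregressive (AR(1)) recursion run across the columns: with $x_1 \sim \mathcal{N}(0, 1)$ and $x_{j+1} = \rho x_j + \sqrt{1 - \rho^2}\, z_j$ for independent standard Gaussian $z_j$, every column keeps unit variance and $\mathbb{E}[x_i x_j] = \rho^{|i-j|}$. The sketch below assumes this construction in order to check the formula empirically; `ar1_design` is a hypothetical name and the code is not presented as celer's own.

```python
import numpy as np


def ar1_design(n_samples, n_features, rho, rng):
    """Hypothetical sketch: x_{j+1} = rho * x_j + sqrt(1 - rho**2) * z_j.

    Each column has mean 0 and variance 1, and corr(x_i, x_j) = rho**|i - j|.
    """
    X = np.empty((n_samples, n_features))
    X[:, 0] = rng.standard_normal(n_samples)
    sigma = np.sqrt(1.0 - rho ** 2)  # innovation std that keeps the variance at 1
    for j in range(1, n_features):
        X[:, j] = rho * X[:, j - 1] + sigma * rng.standard_normal(n_samples)
    return X


rng = np.random.default_rng(0)
rho = 0.6
X = ar1_design(200_000, 5, rho, rng)

# With many samples, the empirical moments approach the theoretical ones.
print("means:", np.round(X.mean(axis=0), 3))
print("variances:", np.round(X.var(axis=0), 3))
print("corr with x_1:", np.round(np.corrcoef(X, rowvar=False)[0], 3))
print("rho ** |i - j|:", np.round(rho ** np.arange(5), 3))
```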
Parameters:
n_samples: int

Number of samples in the design matrix.

n_features: int

Number of features in the design matrix.

corr: float

Correlation $\rho$ between successive features. The element $C_{i, j}$ of the correlation matrix is $\rho^{|i-j|}$. This parameter should be chosen in $[0, 1)$.
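For reference, the correlation matrix $C$ can be written out with NumPy broadcasting (`scipy.linalg.toeplitz(rho ** np.arange(n))` builds the same symmetric Toeplitz matrix). The helper `toeplitz_correlation` below is hypothetical. It also hints at why $\rho = 1$ is excluded: every entry becomes 1, so all features are identical and $C$ is singular.

```python
import numpy as np


def toeplitz_correlation(n_features, rho):
    """Hypothetical helper: C[i, j] = rho ** |i - j|."""
    idx = np.arange(n_features)
    return rho ** np.abs(idx[:, None] - idx[None, :])


C = toeplitz_correlation(4, 0.5)
print(C)

# For 0 <= rho < 1 the matrix is positive definite, so a Cholesky factor exists.
L = np.linalg.cholesky(toeplitz_correlation(50, 0.6))

# At rho = 1 all entries equal 1: the matrix has rank 1.
print(np.linalg.matrix_rank(toeplitz_correlation(5, 1.0)))
```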

snr: float or np.inf

Signal-to-noise ratio $\|X w^*\| / \|\epsilon\|$. If np.inf, no noise is added.

density: float

Proportion of non-zero elements in w_true when it is simulated (that is, when w_true is None).

w_true: np.array, shape (n_features,) | None

True regression coefficients. If None, a sparse array is simulated, in which a proportion density of the entries are non-zero and drawn from a standard Gaussian distribution.
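A sparse coefficient vector of this kind can be simulated by picking a random support and filling it with standard Gaussian draws. The sketch below assumes the number of non-zeros is `int(density * n_features)`; that rounding rule and the helper name `sparse_gaussian_coef` are assumptions for illustration, not a description of celer's exact code.

```python
import numpy as np


def sparse_gaussian_coef(n_features, density, rng):
    """Hypothetical sketch: int(density * n_features) non-zero N(0, 1) entries
    placed at positions chosen uniformly at random without replacement."""
    nnz = int(density * n_features)  # assumed rounding rule (round down)
    w = np.zeros(n_features)
    support = rng.choice(n_features, size=nnz, replace=False)
    w[support] = rng.standard_normal(nnz)
    return w


rng = np.random.default_rng(0)
w_true = sparse_gaussian_coef(50, 0.2, rng)
print(np.count_nonzero(w_true))  # 10
```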

random_state: int | RandomState instance | None (default)

Controls the random number generation used to create the data. Pass an int to make the output deterministic across calls.

Returns:
X: ndarray, shape (n_samples, n_features)

A design matrix with Toeplitz covariance.

y: ndarray, shape (n_samples,)

Observation vector.

w_true: ndarray, shape (n_features,)

True regression vector of the model.