.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_leukemia_path.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_leukemia_path.py:


==========================================
Lasso path computation on Leukemia dataset
==========================================

This example runs the Celer algorithm for the Lasso on the Leukemia dataset,
which is dense, and compares its running time with that of the scikit-learn
implementation.

.. GENERATED FROM PYTHON SOURCE LINES 11-60

.. image-sg:: /auto_examples/images/sphx_glr_plot_leukemia_path_001.png
   :alt: plot leukemia path
   :srcset: /auto_examples/images/sphx_glr_plot_leukemia_path_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Loading data...
    /home/circleci/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py:311: UserWarning: Multiple active versions of the dataset matching the name leukemia exist. Versions may be fundamentally different, returning version 1.
      warn(
    /home/circleci/.local/lib/python3.8/site-packages/sklearn/datasets/_openml.py:1022: FutureWarning: The default value of `parser` will change from `'liac-arff'` to `'auto'` in 1.4. You can set `parser='auto'` to silence this warning. Therefore, an `ImportError` will be raised from 1.4 if the dataset is dense and pandas is not installed. Note that the pandas parser may return different data types. See the Notes Section in fetch_openml's API doc for details.
      warn(
    Starting path computation...
    Celer time: 0.03 s
    Celer time: 0.12 s
    Celer time: 0.31 s






|

.. code-block:: Python


    import time
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn.linear_model import lasso_path
    from sklearn.datasets import fetch_openml

    from celer import celer_path

    print(__doc__)

    print("Loading data...")
    dataset = fetch_openml("leukemia")
    # Fortran (column-major) order makes column access contiguous
    X = np.asfortranarray(dataset.data.astype(float))
    # encode the two classes as -1 / +1
    y = 2 * ((dataset.target != "AML") - 0.5)
    n_samples = len(y)

    # center and scale the target
    y -= np.mean(y)
    y /= np.std(y)

    print("Starting path computation...")
    # smallest alpha for which the Lasso solution is all zeros
    alpha_max = np.max(np.abs(X.T.dot(y))) / n_samples

    # geometric grid of 100 values from alpha_max down to alpha_max / 100
    n_alphas = 100
    alphas = alpha_max * np.geomspace(1, 0.01, n_alphas)

    tols = [1e-2, 1e-3, 1e-4]
    # row 0: Celer timings, row 1: scikit-learn timings
    results = np.zeros([2, len(tols)])
    for tol_ix, tol in enumerate(tols):
        t0 = time.time()
        _, coefs, gaps = celer_path(
            X, y, pb='lasso', alphas=alphas, tol=tol, prune=True)
        results[0, tol_ix] = time.time() - t0
        print('Celer time: %.2f s' % results[0, tol_ix])

        t0 = time.time()
        _, coefs, dual_gaps = lasso_path(
            X, y, tol=tol, alphas=alphas, max_iter=10_000)
        results[1, tol_ix] = time.time() - t0

    # one group of bars per tolerance, one bar per solver
    df = pd.DataFrame(results.T, columns=["Celer", "scikit-learn"])
    df.index = [str(tol) for tol in tols]
    df.plot.bar(rot=0)
    plt.xlabel("stopping tolerance")
    plt.ylabel("path computation time (s)")
    plt.tight_layout()
    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (1 minutes 24.256 seconds)


.. _sphx_glr_download_auto_examples_plot_leukemia_path.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_leukemia_path.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_leukemia_path.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_leukemia_path.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_