Issuing Corrections (Benchmarks are Hard)

Also, read docstrings carefully!

The correction

Original (incorrect) Dask runtimes
New (correct) Dask runtimes

What went wrong

# scikit-learn pipeline
from sklearn.pipeline import Pipeline
from sklearn.linear_model import ElasticNet
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

pipeline = Pipeline(steps=[
    ('preprocess', ColumnTransformer(transformers=[
        ('num', StandardScaler(), numeric_feat),
        ('cat', OneHotEncoder(handle_unknown='ignore',
                              sparse=False), categorical_feat),
    ])),
    ('clf', ElasticNet(normalize=False, max_iter=100)),
])

# Dask-ML pipeline as originally benchmarked
# (Pipeline and ElasticNet are the scikit-learn classes imported above)
from dask_ml.compose import ColumnTransformer
from dask_ml.preprocessing import StandardScaler, DummyEncoder, Categorizer

pipeline = Pipeline(steps=[
    ('categorize', Categorizer(columns=categorical_feat)),
    ('onehot', DummyEncoder(columns=categorical_feat)),
    ('scale', ColumnTransformer(
        transformers=[('num', StandardScaler(), numeric_feat)],
    )),
    ('clf', ElasticNet(normalize=False, max_iter=100)),
])

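# Original step: remainder defaults to 'drop', so every column not listed
# in numeric_feat (including the one-hot columns) is discarded
('scale', ColumnTransformer(
    transformers=[('num', StandardScaler(), numeric_feat)],
))
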
# Corrected step: pass the remaining columns through unchanged
('scale', ColumnTransformer(
    transformers=[('num', StandardScaler(), numeric_feat)],
    remainder='passthrough',
))
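
To see the difference between the two `'scale'` steps concretely, here is a minimal sketch on toy data. It uses scikit-learn's `ColumnTransformer` rather than the `dask_ml` one, on the assumption that both treat `remainder` the same way, as the snippets above suggest. The array `X` and the positional column indices are made up for illustration; the pipelines in this post use column names.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): columns 0-1 stand in for numeric features,
# columns 2-4 stand in for already one-hot-encoded categorical columns.
X = np.array([
    [1.0, 10.0, 1.0, 0.0, 0.0],
    [2.0, 20.0, 0.0, 1.0, 0.0],
    [3.0, 30.0, 0.0, 0.0, 1.0],
    [4.0, 40.0, 1.0, 0.0, 0.0],
])
numeric_feat = [0, 1]  # positional indices here, not column names

# Default remainder='drop': columns not listed in a transformer are discarded.
dropped = ColumnTransformer(
    transformers=[('num', StandardScaler(), numeric_feat)],
).fit_transform(X)

# remainder='passthrough': unlisted columns are appended unchanged
# after the transformed ones.
kept = ColumnTransformer(
    transformers=[('num', StandardScaler(), numeric_feat)],
    remainder='passthrough',
).fit_transform(X)

print(dropped.shape)  # (4, 2)
print(kept.shape)     # (4, 5)
```

With the default, the model downstream only ever sees the two scaled numeric columns, which is less work to fit. A quick shape check like this on the output of the preprocessing steps is a cheap way to catch the problem before timing anything.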

All benchmarks are wrong, but some are useful

Experimenter, content creator, data person💡📊💬 rikturr.com
