How to properly pickle sklearn pipeline when using custom transformer

I found a pretty straightforward solution. Assuming you are using Jupyter notebooks for training: create a .py file where the custom transformer is defined and import it into the Jupyter notebook. This is the file custom_transformer.py: from sklearn.pipeline import TransformerMixin class FilterOutBigValuesTransformer(TransformerMixin): def __init__(self): pass def fit(self, X, y=None): self.biggest_value = X.c1.max() return self def … Read more
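A minimal sketch of what that setup might look like. The excerpt stops inside fit, so the transform body, the pipeline, and the file/column names beyond c1 are assumptions for illustration; note also that TransformerMixin lives in sklearn.base.

```python
# custom_transformer.py -- the class must live in an importable module so that
# pickle/joblib can resolve it as custom_transformer.FilterOutBigValuesTransformer
# when the pipeline is loaded in another process or notebook.
from sklearn.base import BaseEstimator, TransformerMixin


class FilterOutBigValuesTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        self.biggest_value = X.c1.max()  # learn the max of column c1, as in the answer
        return self

    def transform(self, X):
        # Hypothetical transform: the original answer is truncated here,
        # so this body is a guess (drop rows at or above the fitted max).
        return X[X.c1 < self.biggest_value]


# --- training notebook / script (assumed filenames and data) ---
# import joblib
# from sklearn.pipeline import Pipeline
# from custom_transformer import FilterOutBigValuesTransformer
#
# pipe = Pipeline([("filter", FilterOutBigValuesTransformer())])
# pipe.fit(train_df)
# joblib.dump(pipe, "pipeline.joblib")
#
# --- loading elsewhere: the same module must be importable there too ---
# from custom_transformer import FilterOutBigValuesTransformer
# pipe = joblib.load("pipeline.joblib")
```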

Joblib UserWarning while trying to cache results

I don’t have an answer to the “why doesn’t this work?” portion of the question. However, to simply ignore the warning you can use warnings.catch_warnings with warnings.simplefilter, as seen here: import warnings with warnings.catch_warnings(): warnings.simplefilter("ignore") your_code() Obviously, I don’t recommend ignoring the warning unless you’re sure it’s harmless, but if you’re going to do it … Read more
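Filled out as a runnable sketch: the Memory cache directory and the expensive function below are made up to stand in for whatever joblib-cached call raises the warning, and the filter is narrowed to UserWarning so other warnings still surface.

```python
import warnings

from joblib import Memory

memory = Memory("cachedir", verbose=0)  # hypothetical cache location


@memory.cache
def expensive(x):
    # stand-in for whatever cached call triggers the UserWarning
    return x ** 2


with warnings.catch_warnings():
    # Ignore only UserWarning, and only inside this block.
    warnings.simplefilter("ignore", category=UserWarning)
    result = expensive(10)
```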

Tracking progress of joblib.Parallel execution

Yet another step ahead from dano’s and Connor’s answers is to wrap the whole thing as a context manager: import contextlib import joblib from tqdm import tqdm @contextlib.contextmanager def tqdm_joblib(tqdm_object): """Context manager to patch joblib to report into tqdm progress bar given as argument""" class TqdmBatchCompletionCallback(joblib.parallel.BatchCompletionCallBack): def __call__(self, *args, **kwargs): tqdm_object.update(n=self.batch_size) return super().__call__(*args, **kwargs) old_batch_callback … Read more
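The excerpt cuts off mid-function; a sketch of how the rest of the context manager typically goes, with a placeholder sqrt workload at the bottom. It monkey-patches joblib.parallel.BatchCompletionCallBack, an internal hook, so behavior may vary across joblib versions.

```python
import contextlib

import joblib
from joblib import Parallel, delayed
from tqdm import tqdm


@contextlib.contextmanager
def tqdm_joblib(tqdm_object):
    """Patch joblib so each completed batch advances the given tqdm bar."""

    class TqdmBatchCompletionCallback(joblib.parallel.BatchCompletionCallBack):
        def __call__(self, *args, **kwargs):
            tqdm_object.update(n=self.batch_size)
            return super().__call__(*args, **kwargs)

    old_batch_callback = joblib.parallel.BatchCompletionCallBack
    joblib.parallel.BatchCompletionCallBack = TqdmBatchCompletionCallback
    try:
        yield tqdm_object
    finally:
        # Restore the original callback and close the bar even on error.
        joblib.parallel.BatchCompletionCallBack = old_batch_callback
        tqdm_object.close()


# Usage: wrap the Parallel call; the bar advances as batches complete.
from math import sqrt

with tqdm_joblib(tqdm(desc="computing", total=100)):
    Parallel(n_jobs=4)(delayed(sqrt)(i) for i in range(100))
```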

What does the delayed() function do (when used with joblib in Python)

Perhaps things become clearer if we look at what would happen if instead we simply wrote Parallel(n_jobs=8)(getHog(i) for i in allImages), which, in this context, could be expressed more naturally as: (1) create a Parallel instance with n_jobs=8, (2) create a generator for the list [getHog(i) for i in allImages], (3) pass that generator to the Parallel instance … Read more
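To make the contrast concrete, a small sketch in which sqrt stands in for getHog (which isn't defined here): delayed(f)(i) does not call f, it records the call as a (function, args, kwargs) tuple so Parallel can execute it in a worker.

```python
from math import sqrt

from joblib import Parallel, delayed

# delayed(sqrt)(i) returns the tuple (sqrt, (i,), {}) instead of calling sqrt,
# so the generator below yields work descriptions, not results.
results = Parallel(n_jobs=2)(delayed(sqrt)(i) for i in range(10))

# Writing Parallel(n_jobs=2)(sqrt(i) for i in range(10)) would instead run
# sqrt eagerly in the parent process and hand Parallel plain floats, which
# are not valid (function, args, kwargs) work items.
```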
