I found a pretty straightforward solution. Assuming you are using Jupyter notebooks for training:
- Create a
.pyfile where the custom transformer is defined and import it to the Jupyter notebook.
This is the file custom_transformer.py
from sklearn.pipeline import TransformerMixin
class FilterOutBigValuesTransformer(TransformerMixin):
def __init__(self):
pass
def fit(self, X, y=None):
self.biggest_value = X.c1.max()
return self
def transform(self, X):
return X.loc[X.c1 <= self.biggest_value]
- Train your model importing this class from the
.pyfile and save it usingjoblib.
import joblib
from custom_transformer import FilterOutBigValuesTransformer
from sklearn.externals import joblib
from sklearn.preprocessing import MinMaxScaler
pipeline = Pipeline([
('filter', FilterOutBigValuesTransformer()),
('encode', MinMaxScaler()),
])
X=load_some_pandas_dataframe()
pipeline.fit(X)
joblib.dump(pipeline, 'pipeline.pkl')
- When loading the
.pklfile in a different python script, you will have to import the.pyfile in order to make it work:
import joblib
from utils import custom_transformer # decided to save it in a utils directory
pipeline = joblib.load('pipeline.pkl')