pipeline – Page 2 – Tarik Billa

GitLab pipeline – reports config contains unknown keys: cobertura

July 24, 2023 by Tarik

How to insert Keras model into scikit-learn pipeline?

July 20, 2023 by Tarik

You need to wrap your Keras model as a scikit-learn estimator first, then proceed as usual. Here’s a quick example (imports omitted for brevity). A full blog post covers this and many other examples: Scikit-learn Pipeline Examples.

    # create a function that returns a model, taking as parameters …

How can I fix “kex_exchange_identification: read: Connection reset by peer”?

July 10, 2023 by Tarik

What is the difference between pipeline and make_pipeline in scikit-learn?

June 29, 2023 by Tarik

The only difference is that make_pipeline generates step names automatically. Step names are needed, for example, when you use a pipeline with model selection utilities such as GridSearchCV. With grid search you specify parameters for individual steps of a pipeline, using keys of the form <step name>__<parameter name>:

    pipe = Pipeline([('vec', CountVectorizer()), ('clf', LogisticRegression())])
    param_grid = [{'clf__C': [1, 10, 100, …
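The automatic naming can be illustrated without scikit-learn installed. The following is a minimal sketch of the rule as I understand it (lowercased class name, with a numeric suffix when a class appears more than once), not scikit-learn's own code. The classes are empty stand-ins, and `auto_step_names` and `grid_key` are hypothetical helper names introduced only for this illustration.

```python
from collections import Counter


class StandardScaler:  # hypothetical stand-in, not the scikit-learn class
    pass


class LogisticRegression:  # hypothetical stand-in, not the scikit-learn class
    pass


def auto_step_names(estimators):
    """Name each step after its lowercased class name; number duplicates."""
    names = [type(est).__name__.lower() for est in estimators]
    counts = Counter(names)   # how often each base name occurs overall
    seen = Counter()          # how often each base name has occurred so far
    result = []
    for name in names:
        if counts[name] > 1:
            seen[name] += 1
            name = f"{name}-{seen[name]}"
        result.append(name)
    return result


def grid_key(step_name, param):
    """Build a grid-search key of the form <step name>__<parameter name>."""
    return f"{step_name}__{param}"


steps = [StandardScaler(), StandardScaler(), LogisticRegression()]
names = auto_step_names(steps)
print(names)
print(grid_key(names[-1], "C"))
```

With explicit names, as in Pipeline([('clf', ...)]), the grid key is simply 'clf__C'; with generated names the key has to use the generated step name instead.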

GitLab CI Pipeline on specific branch only

May 4, 2023 by Tarik

Sklearn How to Save a Model Created From a Pipeline and GridSearchCV Using Joblib or Pickle?

April 21, 2023 by Tarik

    import joblib
    joblib.dump(grid.best_estimator_, 'filename.pkl')

If you want to dump your object into one file, use:

    joblib.dump(grid.best_estimator_, 'filename.pkl', compress=1)
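The post title also mentions pickle. Below is a minimal sketch of the same save-and-restore round trip using only the standard library. The dictionary is a hypothetical stand-in for grid.best_estimator_ so that the example runs without scikit-learn; a fitted estimator would be dumped the same way.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for grid.best_estimator_; any picklable object works.
best_estimator = {"steps": ["vec", "clf"], "clf__C": 10}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "filename.pkl")

    # Save the object to disk.
    with open(path, "wb") as fh:
        pickle.dump(best_estimator, fh)

    # Load it back and check that the round trip preserved it.
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == best_estimator)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.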

How to extract tar archive from stdin?

April 5, 2023 by Tarik

Use - as the input file:

    cat largefile.tgz.aa largefile.tgz.ab | tar zxf -

Make sure you cat them in the same order they were split. If you’re using zsh you can use the multios feature and avoid invoking cat:

    < largefile.tgz.aa < largefile.tgz.ab tar zxf -

Or if they are in alphabetical order:

    <largefile.tgz.* | …
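The same idea (join the parts in order, then read the archive as a stream) can be sketched in Python with the standard tarfile module. This is an illustration under stated assumptions, not a replacement for the shell commands: the archive is built in memory, and hello.txt and its contents are made up for the example.

```python
import io
import tarfile

# Build a small .tgz in memory to stand in for largefile.tgz.
payload = b"hello from the archive\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
archive = buf.getvalue()

# Split it into two parts, like largefile.tgz.aa and largefile.tgz.ab.
half = len(archive) // 2
parts = [archive[:half], archive[half:]]

# Concatenate the parts in their original order and read them as a stream.
# Mode "r|gz" reads sequentially without seeking, as tar does on a pipe.
stream = io.BytesIO(b"".join(parts))
extracted = {}
with tarfile.open(fileobj=stream, mode="r|gz") as tar:
    for member in tar:
        if member.isfile():
            extracted[member.name] = tar.extractfile(member).read()

print(extracted)
```

To read from real standard input, sys.stdin.buffer can be passed as fileobj in place of the in-memory stream.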

Getting model attributes from pipeline

April 1, 2023 by Tarik

Did you look at the documentation? http://scikit-learn.org/dev/modules/pipeline.html I feel it is pretty clear.

Update: since scikit-learn 0.21 you can use square brackets directly, either with a step name, pipeline['pca'], or with an index, pipeline[1].

There are two ways to get to the steps in a pipeline, either using indices or using the string names you gave:

    pipeline.named_steps['pca']
    pipeline.steps[1][1]

This will give you the …

Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer

February 2, 2023 by Tarik

You can access the feature names using the following snippet:

    clf.named_steps['preprocessor'].transformers_[1][1]\
        .named_steps['onehot'].get_feature_names(categorical_features)

With sklearn >= 0.21, this can be made even simpler:

    clf['preprocessor'].transformers_[1][1]\
        ['onehot'].get_feature_names(categorical_features)

Note that get_feature_names was deprecated in scikit-learn 1.0 and removed in 1.2; on current versions use get_feature_names_out instead.

Reproducible example:

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklearn.pipeline import Pipeline
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LinearRegression …

Functional pipes in python like %>% from R’s magrittr

December 20, 2022 by Tarik

The DataFrame.pipe method was added in pandas 0.16.2. Example:

    import pandas as pd
    from sklearn.datasets import load_iris

    x = load_iris()
    x = pd.DataFrame(x.data, columns=x.feature_names)

    def remove_units(df):
        # strip the " (cm)" suffix from every column name
        df.columns = [name.replace(" (cm)", "") for name in df.columns]
        return df

    def length_times_width(df):
        df['sepal length*width'] = df['sepal length'] * df['sepal width']
        df['petal length*width'] = df['petal length'] * df['petal width']
        return df  # return the frame so further .pipe calls can chain

    x.pipe(remove_units).pipe(length_times_width) …
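For plain Python values that have no .pipe method, a similar left-to-right chain can be built with functools.reduce. This is a minimal sketch: `pipe` is a hypothetical helper written for this illustration, not a pandas or standard-library function, and the column names are borrowed from the iris example only as sample input.

```python
from functools import reduce


def pipe(value, *funcs):
    """Pass value through funcs left to right, like x %>% f %>% g in R."""
    return reduce(lambda acc, func: func(acc), funcs, value)


def remove_units(columns):
    """Strip the ' (cm)' suffix from each column name."""
    return [name.replace(" (cm)", "") for name in columns]


def upper_case(columns):
    """Upper-case each column name."""
    return [name.upper() for name in columns]


columns = ["sepal length (cm)", "sepal width (cm)"]
result = pipe(columns, remove_units, upper_case)
print(result)
```

Each function receives the previous function's return value, so every step must return something, which is the same requirement DataFrame.pipe places on the functions passed to it.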