grid-search – Tarik Billa

Using GridSearchCV with AdaBoost and DecisionTreeClassifier

December 25, 2023 by Tarik

There are several things wrong in the code you posted: the keys of the param_grid dictionary need to be strings; as written, you should be getting a NameError. The key “abc__n_estimators” should just be “n_estimators”: you are probably mixing this up with the pipeline syntax. Here nothing tells Python that the string “abc” represents your AdaBoostClassifier. None (and …

Early stopping with Keras and sklearn GridSearchCV cross-validation

December 5, 2023 by Tarik

[Answer after the question was edited & clarified:] Before rushing into implementation issues, it is always a good practice to take some time to think about the methodology and the task itself; arguably, intermingling early stopping with the cross-validation procedure is not a good idea. Let’s make up an example to highlight the argument. …

Is there an easy way to grid search without cross validation in python?

December 2, 2023 by Tarik

I would really advise against using OOB to evaluate a model, but it is useful to know how to run a grid search outside of GridSearchCV() (I frequently do this so I can save the CV predictions from the best grid for easy model stacking). I think the easiest way is to create your grid …
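A minimal sketch of that idea, using ParameterGrid to enumerate the combinations and scoring each fitted model on a single held-out validation split. The split, dataset and grid values are assumptions for illustration; a random forest's oob_score_ could be collected in the same loop, although the answer advises against relying on it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, train_test_split

X, y = load_breast_cancer(return_X_y=True)

# One fixed train/validation split instead of k-fold cross-validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

grid = ParameterGrid({"n_estimators": [50, 100], "max_depth": [3, None]})

results = []
for params in grid:
    model = RandomForestClassifier(random_state=0, **params)
    model.fit(X_train, y_train)
    # Accuracy on the held-out validation split.
    results.append((model.score(X_val, y_val), params))

# Keep the combination with the highest validation accuracy.
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params, round(best_score, 3))
```

Because the loop is explicit, anything else you need (fitted models, validation predictions) can be stored alongside each score.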

Using Smote with Gridsearchcv in Scikit-learn

September 20, 2023 by Tarik

Yes, it can be done, but with the imblearn Pipeline. You see, imblearn has its own Pipeline to handle the samplers correctly. I described this in a similar question here. When predict() is called on an imblearn.Pipeline object, it skips the sampling step and passes the data unchanged to the next transformer. …

Use sklearn’s GridSearchCV with a pipeline, preprocessing just once

August 7, 2023 by Tarik

Update: Ideally, the answer below should not be used, as it leads to data leakage, as discussed in the comments. In this answer, GridSearchCV tunes the hyperparameters on data already preprocessed by StandardScaler, which is not correct. In most cases that should not matter much, but algorithms that are too sensitive to scaling will …
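One leakage-free alternative, sketched here as an assumption rather than as the answer's own method: keep StandardScaler inside the Pipeline so it is refit on every training fold, and pass Pipeline(memory=...) so that the scaler fitted on a given fold is cached and reused across the classifier candidates instead of being recomputed. The dataset and grid are illustrative.

```python
import shutil
import tempfile

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Temporary directory used by the Pipeline to cache fitted transformers.
cache_dir = tempfile.mkdtemp()

pipe = Pipeline(
    [("scaler", StandardScaler()), ("svc", SVC())],
    memory=cache_dir,  # same fold + same scaler params -> cached scaler is reused
)

# Only the classifier is tuned, so the scaler is fitted once per fold.
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)

shutil.rmtree(cache_dir)  # clean up the cache when done
```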

How to graph grid scores from GridSearchCV?

July 29, 2023 by Tarik

The code shown by @sascha is correct. However, the grid_scores_ attribute has since been deprecated (and was removed in scikit-learn 0.20), so it is better to use the cv_results_ attribute. It can be implemented in a similar fashion to @sascha's method:
def plot_grid_search(cv_results, grid_param_1, grid_param_2, name_param_1, name_param_2):
    # Get Test Scores Mean and std for each grid search
    scores_mean = …
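The excerpt's plot_grid_search function is truncated, so the following is not that function but a separate minimal sketch (plot_cv_scores is a hypothetical name). It reads mean_test_score, std_test_score and params from cv_results_ and draws one error-bar line per value of one parameter against the other. It assumes matplotlib is available and uses an illustrative SVC grid on the iris data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def plot_cv_scores(cv_results, x_param, line_param, ax):
    """Hypothetical helper: mean +/- std test score, one line per line_param value."""
    params = cv_results["params"]
    means = cv_results["mean_test_score"]
    stds = cv_results["std_test_score"]
    for line_value in sorted({p[line_param] for p in params}):
        # Candidate indices for this line, ordered by the x-axis parameter.
        idx = [i for i, p in enumerate(params) if p[line_param] == line_value]
        idx.sort(key=lambda i: params[i][x_param])
        xs = [params[i][x_param] for i in idx]
        ax.errorbar(xs, means[idx], yerr=stds[idx], marker="o", capsize=3,
                    label=f"{line_param}={line_value}")
    ax.set_xlabel(x_param)
    ax.set_ylabel("Mean CV score")
    ax.legend()


X, y = load_iris(return_X_y=True)
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}, cv=3)
grid.fit(X, y)

fig, ax = plt.subplots()
plot_cv_scores(grid.cv_results_, x_param="C", line_param="gamma", ax=ax)
ax.set_xscale("log")
```

Call fig.savefig(...) or plt.show() afterwards, depending on your environment.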

Invalid parameter for sklearn estimator pipeline

July 29, 2023 by Tarik

There should be two underscores between the estimator name and its parameters in a Pipeline, e.g. logisticregression__C. Do the same for tfidfvectorizer. This is described in the user guide here: https://scikit-learn.org/stable/modules/compose.html#nested-parameters. See also the example at https://scikit-learn.org/stable/auto_examples/compose/plot_compare_reduction.html#sphx-glr-auto-examples-compose-plot-compare-reduction-py
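A small sketch of the naming rule, using a made-up eight-document corpus. make_pipeline names each step after its class in lowercase, which is where the prefixes tfidfvectorizer and logisticregression come from; a key with a single underscore is rejected by set_params with a ValueError.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus: 4 positive and 4 negative snippets.
docs = ["good movie", "great film", "loved it", "wonderful acting",
        "bad movie", "awful film", "hated it", "terrible acting"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
print(sorted(pipe.named_steps))  # ['logisticregression', 'tfidfvectorizer']

# A single underscore is not a valid nested parameter name.
rejected = False
try:
    pipe.set_params(logisticregression_C=1.0)
except ValueError as err:
    rejected = True
    print("Rejected:", err)

# Correct form: <step name>__<parameter>.
param_grid = {
    "tfidfvectorizer__ngram_range": [(1, 1), (1, 2)],
    "logisticregression__C": [0.1, 1, 10],
}
grid = GridSearchCV(pipe, param_grid, cv=2)
grid.fit(docs, labels)
print(grid.best_params_)
```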

Sklearn: How to Save a Model Created from a Pipeline and GridSearchCV Using Joblib or Pickle?

April 21, 2023 by Tarik

import joblib
joblib.dump(grid.best_estimator_, 'filename.pkl')

If you want to dump your object into one file, use:
joblib.dump(grid.best_estimator_, 'filename.pkl', compress=1)
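A round-trip sketch, assuming a fitted GridSearchCV named grid as in the answer; the pipeline, data and file names below are illustrative. It saves best_estimator_ with joblib and, alternatively, with the standard pickle module, then checks that both reloaded models predict the same labels as the original. Loading a pickle can execute arbitrary code, so only load files you trust, ideally with the same scikit-learn version that saved them.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=3).fit(X, y)

tmp_dir = tempfile.mkdtemp()

# Option 1: joblib (compress trades save/load speed for file size).
joblib_path = os.path.join(tmp_dir, "best_model.joblib")
joblib.dump(grid.best_estimator_, joblib_path, compress=3)
loaded_joblib = joblib.load(joblib_path)

# Option 2: the standard-library pickle module.
pickle_path = os.path.join(tmp_dir, "best_model.pkl")
with open(pickle_path, "wb") as f:
    pickle.dump(grid.best_estimator_, f)
with open(pickle_path, "rb") as f:
    loaded_pickle = pickle.load(f)

# Both reloaded pipelines (scaler + classifier) behave like the original.
print(np.array_equal(loaded_joblib.predict(X), grid.predict(X)))
```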

What is the difference between cross-validation and grid search?

April 11, 2023 by Tarik

Cross-validation is when you reserve part of your data to use in evaluating your model. There are different cross-validation methods. The conceptually simplest is to take 70% (just making up a number here; it doesn’t have to be 70%) of your data and use that for training, and then use the remaining 30% of …
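To make the distinction concrete, here is a minimal sketch with an illustrative k-nearest-neighbours model on the iris data. It shows a single 70/30 holdout split, k-fold cross-validation of one fixed configuration, and a grid search that uses cross-validation to score each configuration it tries. The model and the n_neighbors values are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1) Simplest validation: train on 70%, evaluate on the held-out 30%.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
holdout_score = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr).score(X_te, y_te)

# 2) k-fold cross-validation: evaluates ONE configuration on 5 different splits.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)

# 3) Grid search: tries SEVERAL configurations, scoring each with 5-fold CV.
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 5, 15]}, cv=5).fit(X, y)

print(round(holdout_score, 3), cv_scores.round(3), grid.best_params_)
```

Grid search is a search strategy over hyperparameters; cross-validation is the evaluation procedure it uses internally to compare them.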