cross-validation – Tarik Billa

Does TensorFlow have cross validation implemented?

December 28, 2023 by Tarik

As already discussed, TensorFlow doesn’t provide its own way to cross-validate a model. The recommended way is to use KFold from scikit-learn. It’s a bit tedious, but doable. Here’s a complete example of cross-validating an MNIST model with TensorFlow and KFold: from sklearn.model_selection import KFold import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data # Parameters learning_rate = 0.01 … Read more
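The manual KFold loop described above can be sketched without any TensorFlow code. This is only an outline of the pattern: the synthetic data and the train_and_evaluate helper are hypothetical stand-ins (a NumPy least-squares fit) for whatever training and evaluation step your TensorFlow model needs.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic regression data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)


def train_and_evaluate(X_train, y_train, X_test, y_test):
    """Hypothetical stand-in for a TensorFlow train/evaluate step.

    Fits a least-squares model on the training fold and returns the
    mean squared error on the held-out fold.
    """
    weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    residuals = X_test @ weights - y_test
    return float(np.mean(residuals ** 2))


# KFold only yields index arrays; the model is rebuilt for every fold.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mse = [
    train_and_evaluate(X[train_idx], y[train_idx], X[test_idx], y[test_idx])
    for train_idx, test_idx in kfold.split(X)
]
mean_mse = float(np.mean(fold_mse))
print(f"per-fold MSE: {fold_mse}")
print(f"mean MSE: {mean_mse:.4f}")
```

The important part is that the model is created and trained from scratch inside each fold, so no state leaks from one fold to the next.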

predict_proba for a cross-validated model

December 21, 2023 by Tarik

This has been available since scikit-learn version 0.18. You can pass a 'method' string parameter to the cross_val_predict function. Documentation is here. Example: proba = cross_val_predict(logreg, X, y, cv=cv, method='predict_proba') Also note that this is part of the sklearn.model_selection module, so you will need this import: from sklearn.model_selection import cross_val_predict
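For a runnable version of that call, here is a minimal sketch. The synthetic dataset, the LogisticRegression settings, and the StratifiedKFold splitter are assumptions picked for illustration; only the cross_val_predict call itself comes from the answer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic binary classification data (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

logreg = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each row holds the class probabilities predicted by the model that
# did NOT see that sample during training.
proba = cross_val_predict(logreg, X, y, cv=cv, method="predict_proba")
print(proba.shape)
```

The result has one row per sample and one column per class, in the order given by the fitted estimator's classes_ attribute.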

how to implement walk forward testing in sklearn?

December 17, 2023 by Tarik

I think you could use TimeSeriesSplit(), either instead of your own implementation or as the basis for a CV method that works exactly as you describe. After digging around a bit, it seems someone added a max_train_size parameter to TimeSeriesSplit() in this PR, which appears to do what you want.
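To see what max_train_size does, here is a small sketch on ten ordered samples. The data and the chosen n_splits and max_train_size values are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten time-ordered samples (assumption for illustration).
X = np.arange(10).reshape(-1, 1)

# max_train_size caps the training window, so it rolls forward
# instead of growing with every split.
tscv = TimeSeriesSplit(n_splits=4, max_train_size=3)
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Every test window comes strictly after its training window, and no training window holds more than three samples, which is the walk-forward behaviour the question asks about.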

Early stopping with Keras and sklearn GridSearchCV cross-validation

December 5, 2023 by Tarik

[Answer after the question was edited & clarified:] Before rushing into implementation issues, it is always good practice to take some time to think about the methodology and the task itself; arguably, intermingling early stopping with the cross-validation procedure is not a good idea. Let’s make up an example to highlight the argument. … Read more

How is scikit-learn cross_val_predict accuracy score calculated?

September 15, 2023 by Tarik

No, it does not! According to the cross-validation documentation page, cross_val_predict does not return any scores, only the labels, based on a certain strategy which is described here: The function cross_val_predict has a similar interface to cross_val_score, but returns, for each element in the input, the prediction that was obtained for that element when … Read more

What is OOF approach in machine learning?

September 11, 2023 by Tarik

OOF simply stands for “Out-of-fold” and refers to a step in the learning process when using k-fold validation in which the predictions from each set of folds are grouped together into one group of 1000 predictions. These predictions are now “out-of-the-folds” and thus error can be calculated on these to get a good measure of … Read more
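As a rough sketch of how out-of-fold predictions are collected: the synthetic data, the Ridge model, and the five folds below are assumptions for illustration, and the sample count of 1000 is chosen only to mirror the figure mentioned above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic regression data with 1000 samples (assumption for illustration).
X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)

# One slot per sample; each slot is filled by the model that did not
# train on that sample.
oof = np.zeros(len(y))

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in kfold.split(X):
    model = Ridge(alpha=1.0)
    model.fit(X[train_idx], y[train_idx])
    oof[val_idx] = model.predict(X[val_idx])

# The error is computed once, over all out-of-fold predictions together.
oof_mse = mean_squared_error(y, oof)
print(f"OOF MSE: {oof_mse:.2f}")
```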

Difference between cross_val_score and cross_val_predict

September 11, 2023 by Tarik

cross_val_score returns the score for each test fold, whereas cross_val_predict returns the predicted y values for each test fold. With cross_val_score(), you use the average of the per-fold scores, which is affected by the number of folds, because some folds may have a high error (not fit correctly). Whereas, cross_val_predict() returns, for … Read more
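The two functions can be compared side by side. This is a minimal sketch; the synthetic dataset and the LogisticRegression classifier are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, cross_val_score

# 203 samples so that the five folds are not all the same size
# (assumption for illustration).
X, y = make_classification(n_samples=203, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000)

# One accuracy value per test fold.
scores = cross_val_score(clf, X, y, cv=5)
mean_fold_accuracy = scores.mean()

# One predicted label per sample, each made by a model that did not
# train on that sample.
preds = cross_val_predict(clf, X, y, cv=5)
pooled_accuracy = accuracy_score(y, preds)

print("per-fold scores:", scores)
print("mean of fold scores:", mean_fold_accuracy)
print("accuracy on pooled predictions:", pooled_accuracy)
```

The two numbers are usually close but need not be identical: the first averages per-fold scores with equal weight, while the second scores all pooled predictions at once.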

Using statsmodels estimations with scikit-learn cross validation, is it possible?

August 30, 2023 by Tarik

Indeed, you cannot use cross_val_score directly on statsmodels objects, because of the different interface: in statsmodels, (1) training data is passed directly into the constructor, and (2) a separate object contains the result of model estimation. However, you can write a simple wrapper to make statsmodels objects look like sklearn estimators: import statsmodels.api as sm from sklearn.base import BaseEstimator, … Read more

How to extract model hyper-parameters from spark.ml in PySpark?

August 22, 2023 by Tarik

Ran into this problem as well. I found out that you need to call the Java property; I don’t know why. So just do this: from pyspark.ml.tuning import TrainValidationSplit, ParamGridBuilder, CrossValidator from pyspark.ml.regression import LinearRegression from pyspark.ml.evaluation import RegressionEvaluator evaluator = RegressionEvaluator(metricName="mae") lr = LinearRegression() grid = ParamGridBuilder().addGrid(lr.maxIter, [500]) \ .addGrid(lr.regParam, [0]) \ … Read more

return coefficients from Pipeline object in sklearn

August 1, 2023 by Tarik

You can always use the names you assigned to the steps when building the pipeline, via the named_steps dict. scaler = sgd_randomized_pipe.best_estimator_.named_steps['scl'] classifier = sgd_randomized_pipe.best_estimator_.named_steps['clf'] and then access all the attributes like coef_, intercept_ etc. which are available to the corresponding fitted estimator. This is the formal attribute exposed by the Pipeline as specified in the … Read more
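Here is a self-contained sketch of that lookup. The step names 'scl' and 'clf' follow the answer; the synthetic data, the SGDClassifier, the parameter grid, and the variable name search are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = Pipeline([
    ("scl", StandardScaler()),
    ("clf", SGDClassifier(random_state=0)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
search = RandomizedSearchCV(
    pipe,
    {"clf__alpha": [1e-4, 1e-3, 1e-2]},
    n_iter=3,
    cv=3,
    random_state=0,
)
search.fit(X, y)

# best_estimator_ is the refitted Pipeline; named_steps maps each step
# name to its fitted estimator.
scaler = search.best_estimator_.named_steps["scl"]
classifier = search.best_estimator_.named_steps["clf"]

coefficients = classifier.coef_
intercept = classifier.intercept_
print(coefficients.shape, intercept.shape)
```

Note that the coefficients apply to the scaled features, since the classifier was fitted on the output of the scaler step.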