What are different options for objective functions available in xgboost.XGBClassifier?

It’s true that binary:logistic is the default objective for XGBClassifier, but there is no reason you couldn’t use the other objectives offered by the XGBoost package. For example, you can see in the sklearn.py source code that multi:softprob is used explicitly in the multiclass case. Moreover, if it’s really necessary, you can provide a custom objective function … Read more
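As a quick illustration (the dataset and parameters here are made up, not from the answer), choosing a non-default objective is just a matter of passing it to the constructor:

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic 3-class problem, purely for demonstration.
X, y = make_classification(n_samples=200, n_classes=3, n_informative=5, random_state=0)

# "multi:softprob" is what the sklearn wrapper would pick for multiclass data
# anyway; passing it explicitly just makes the choice visible.
clf = xgb.XGBClassifier(objective="multi:softprob", n_estimators=50)
clf.fit(X, y)
print(clf.predict_proba(X[:3]))   # one probability per class and sample
```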

XGBoost for multilabel classification?

One possible approach, instead of using OneVsRestClassifier which is for multi-class tasks, is to use MultiOutputClassifier from the sklearn.multioutput module. Below is a small reproducible sample code with the number of input features and target outputs requested by the OP: import xgboost as xgb from sklearn.datasets import make_multilabel_classification from sklearn.model_selection import train_test_split from sklearn.multioutput import … Read more
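Here is a self-contained sketch along those lines; the feature and label counts are illustrative rather than the exact numbers the OP asked for:

```python
import xgboost as xgb
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic multilabel data: each row can carry several of the 4 labels at once.
X, y = make_multilabel_classification(n_samples=300, n_features=10, n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# MultiOutputClassifier fits one XGBClassifier per label column.
clf = MultiOutputClassifier(xgb.XGBClassifier(n_estimators=50))
clf.fit(X_train, y_train)

# Predictions come back with one column per label, same shape as y_test.
print(clf.predict(X_test).shape)
```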

Xgboost-How to use “mae” as objective function?

A little bit of theory first, sorry! You asked for the grad and hessian for MAE; however, the MAE is not continuously twice differentiable, so calculating the first and second derivatives becomes tricky. Below we can see the “kink” at x=0 which prevents the MAE from being continuously differentiable. Moreover, the second derivative … Read more
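As an aside, one common workaround (not necessarily the one the linked answer settles on) is to optimise a smooth surrogate of the MAE, such as log-cosh, whose gradient and Hessian exist everywhere. A minimal sketch of plugging that in as a custom objective:

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

def log_cosh_objective(preds, dtrain):
    # log(cosh(residual)) behaves like |residual| for large errors but is
    # twice differentiable at zero, unlike the MAE itself.
    residual = preds - dtrain.get_label()
    grad = np.tanh(residual)
    hess = 1.0 - np.tanh(residual) ** 2
    return grad, hess

# Illustrative data only.
X, y = make_regression(n_samples=500, n_features=8, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=50, obj=log_cosh_objective)
```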

ValueError: feature_names mismatch: in xgboost in the predict() function

This happens when the order of the column names at model-building time differs from their order at scoring time. I have used the following steps to overcome this error: first load the pickle file, model = pickle.load(open("saved_model_file", "rb")), then extract all the columns in the order in which they were used, cols_when_model_builds = model.get_booster().feature_names, and reorder … Read more
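Put together as a runnable sketch (the tiny DataFrames, labels, and file name below are stand-ins for the poster's actual model and data):

```python
import pickle
import pandas as pd
import xgboost as xgb

# Train on a DataFrame so the booster records column names, then pickle it
# (a stand-in for the saved model file in the question).
train_df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [0, 1, 0, 1]})
model = xgb.XGBClassifier(n_estimators=5).fit(train_df, [0, 1, 0, 1])
pickle.dump(model, open("saved_model_file", "wb"))

# Scoring time: load the model and recover the training column order.
model = pickle.load(open("saved_model_file", "rb"))
cols_when_model_builds = model.get_booster().feature_names

# Scoring data arrives with the columns shuffled; reorder before predicting.
score_df = pd.DataFrame({"c": [1, 0], "a": [2, 3], "b": [3, 2]})
score_df = score_df[cols_when_model_builds]
print(model.predict(score_df))
```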

What is the difference between xgb.train and xgb.XGBRegressor (or xgb.XGBClassifier)?

xgboost.train is the low-level API to train the model via the gradient boosting method. xgboost.XGBRegressor and xgboost.XGBClassifier are the wrappers (Scikit-Learn-like wrappers, as they call them) that prepare the DMatrix and pass in the corresponding objective function and parameters. In the end, the fit call simply boils down to: self._Booster = train(params, dmatrix, self.n_estimators, evals=evals, early_stopping_rounds=early_stopping_rounds, … Read more
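Roughly, and only as a sketch (defaults can differ slightly between the two interfaces and across versions), the two routes below build comparable models:

```python
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=6, random_state=0)

# Scikit-learn wrapper: builds the DMatrix and calls train() internally.
wrapper = xgb.XGBRegressor(n_estimators=20, max_depth=3, learning_rate=0.1)
wrapper.fit(X, y)

# Low-level API: we prepare the DMatrix and pass the parameters ourselves.
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train(
    {"objective": "reg:squarederror", "max_depth": 3, "eta": 0.1},
    dtrain,
    num_boost_round=20,
)

# The two predictions should be close, though not necessarily bit-identical.
print(wrapper.predict(X[:3]))
print(booster.predict(xgb.DMatrix(X[:3])))
```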

XGBoost plot_importance doesn’t show feature names

If you’re using the scikit-learn wrapper you’ll need to access the underlying XGBoost Booster and set the feature names on it, instead of on the scikit-learn model, like so: model = joblib.load("your_saved.model") model.get_booster().feature_names = ["your", "feature", "name", "list"] xgboost.plot_importance(model.get_booster())
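For context, a self-contained variant (the dataset and the feature-name list here are illustrative): numpy input is what drops the column names in the first place, so they are assigned on the underlying Booster before plotting.

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# numpy arrays carry no column names, so the plot would show f0, f1, ... by default.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = xgb.XGBClassifier(n_estimators=20).fit(X, y)

# Assign readable names on the Booster itself, then plot from the Booster.
model.get_booster().feature_names = ["age", "income", "tenure", "score"]
xgb.plot_importance(model.get_booster())
```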

GridSearchCV – XGBoost – Early Stopping

When using early_stopping_rounds you also have to give eval_metric and eval_set as input parameters for the fit method. Early stopping works by calculating the error on an evaluation set. The error has to decrease at least once every early_stopping_rounds rounds, otherwise the generation of additional trees is stopped early. See the documentation of XGBoost’s fit method for details. … Read more
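A minimal sketch of those fit-time parameters (synthetic data; note that in recent XGBoost releases eval_metric and early_stopping_rounds have moved to the XGBClassifier constructor, so adjust for your version):

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = xgb.XGBClassifier(n_estimators=500)
clf.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],      # evaluation data the metric is computed on
    eval_metric="logloss",          # metric that drives the stopping decision
    early_stopping_rounds=10,       # stop if no improvement for 10 rounds
    verbose=False,
)
print(clf.best_iteration)           # the round at which training actually stopped
```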

How do I use a TimeSeriesSplit with a GridSearchCV object to tune a model in scikit-learn?

It turns out the problem was that I was using GridSearchCV from sklearn.grid_search, which is deprecated. Importing GridSearchCV from sklearn.model_selection resolved the problem: import xgboost as xgb from sklearn.model_selection import TimeSeriesSplit, GridSearchCV import numpy as np X = np.array([[4, 5, 6, 1, 0, 2], [3.1, 3.5, 1.0, 2.1, 8.3, 1.1]]).T y = np.array([1, 6, 7, 1, … Read more
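A complete, hedged version of that idea (the data here is synthetic rather than the truncated arrays from the excerpt):

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import TimeSeriesSplit, GridSearchCV

# Illustrative time-ordered data.
rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = rng.rand(50)

# TimeSeriesSplit keeps the temporal order: each fold trains on the past
# and validates on the future, instead of shuffling rows.
tscv = TimeSeriesSplit(n_splits=4)

search = GridSearchCV(
    estimator=xgb.XGBRegressor(n_estimators=50),
    param_grid={"max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    cv=tscv,
)
search.fit(X, y)
print(search.best_params_)
```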
