xgboost
Multioutput regression with xgboost
My suggestion is to use sklearn.multioutput.MultiOutputRegressor as a wrapper around xgb.XGBRegressor. MultiOutputRegressor trains one regressor per target and only requires that the regressor implement fit and predict, which xgboost happens to support.

# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))

… Read more
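Completing that sketch under the same setup (the data lines are from the answer; the fit/predict calls and the n_estimators value are my additions):

import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

# get some noised linear data, as in the answer above
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))

# one XGBRegressor is trained per target column of y
model = MultiOutputRegressor(XGBRegressor(n_estimators=100))
model.fit(X, y)
pred = model.predict(X)
print(pred.shape)  # (1000, 3): one prediction column per target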
How to install the xgboost package in Python (Windows platform)?
In case anyone’s looking for a simpler solution that doesn’t require compiling it yourself:

1. Download the xgboost whl file from here (make sure to match your Python version and system architecture, e.g. “xgboost-0.6-cp35-cp35m-win_amd64.whl” for Python 3.5 on a 64-bit machine).
2. Open a command prompt.
3. cd to your Downloads folder (or wherever you saved the whl file).
4. pip install … Read more
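Once pip finishes, a quick sanity check from the Python prompt (a minimal sketch):

import xgboost as xgb

# a successful import means the wheel installed correctly;
# printing the module path shows which installation was picked up
print(xgb.__file__)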
XGBoost XGBClassifier Defaults in Python
That isn’t how you set parameters in xgboost. You would either want to pass your param grid into your training function, such as xgboost’s train or sklearn’s GridSearchCV, or you would want to use your XGBClassifier’s set_params method. Another thing to note is that if you’re using xgboost’s wrapper for sklearn (i.e. the XGBClassifier() or … Read more
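A sketch of both routes on the sklearn wrapper (the specific parameter names and grid values here are just illustrative):

from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

clf = XGBClassifier()

# route 1: set parameters directly on the wrapper
clf.set_params(max_depth=4, learning_rate=0.1)

# route 2: hand a param grid to GridSearchCV, which calls set_params internally
param_grid = {'max_depth': [3, 4, 5], 'n_estimators': [50, 100]}
search = GridSearchCV(clf, param_grid, cv=3)
# search.fit(X, y)  # with your own training data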
How does XGBoost do parallel computation?
Xgboost doesn’t run multiple trees in parallel, as you noted; it needs the predictions after each tree to update the gradients. Rather, it does the parallelization WITHIN a single tree, using OpenMP to create branches independently. To observe this, build a giant dataset and run with n_rounds=1. You will see all your cores firing on one tree. … Read more
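A sketch of that experiment (the dataset size is arbitrary, and in the xgb.train API the round count is the num_boost_round argument):

import numpy as np
import xgboost as xgb

# a "giant" dataset so a single tree takes long enough to watch in a CPU monitor
X = np.random.rand(1000000, 50)
y = np.random.randint(0, 2, size=1000000)
dtrain = xgb.DMatrix(X, label=y)

# one boosting round == one tree; xgboost uses all cores by default,
# so every core should fire while this single tree is built
bst = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=1)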
How can I implement incremental training for xgboost?
Try saving your model after you train on the first batch. Then, on successive runs, provide the xgb.train method with the filepath of the saved model. Here’s a small experiment that I ran to convince myself that it works: First, split the Boston dataset into training and testing sets. Then split the training set into … Read more
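A minimal sketch of that idea, using xgb.train's xgb_model argument to continue boosting from a saved model (the file name and toy data are made up):

import numpy as np
import xgboost as xgb

params = {'objective': 'reg:linear'}

# two batches of toy data standing in for the splits described in the answer
dtrain1 = xgb.DMatrix(np.random.rand(100, 5), label=np.random.rand(100))
dtrain2 = xgb.DMatrix(np.random.rand(100, 5), label=np.random.rand(100))

# train on the first batch and save the model to disk
bst = xgb.train(params, dtrain1, num_boost_round=10)
bst.save_model('model_batch1.model')

# on a later run, resume from the saved model and keep boosting on new data
bst = xgb.train(params, dtrain2, num_boost_round=10, xgb_model='model_batch1.model')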
XGBoost Categorical Variables: Dummification vs encoding
xgboost only deals with numeric columns. If you have a feature [a, b, b, c] which describes a categorical variable (i.e. no numeric relationship between the categories), then using LabelEncoder you will simply have this:

array([0, 1, 1, 2])

Xgboost will wrongly interpret this feature as having a numeric relationship! LabelEncoder just maps each string (‘a’, ‘b’, ‘c’) to an integer, nothing more. Proper … Read more
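A sketch contrasting the two encodings (pd.get_dummies is one of several ways to one-hot encode; output dtypes vary by pandas version):

import pandas as pd
from sklearn.preprocessing import LabelEncoder

feature = ['a', 'b', 'b', 'c']

# LabelEncoder imposes a spurious ordering 0 < 1 < 2 on the categories
print(LabelEncoder().fit_transform(feature))  # array([0, 1, 1, 2])

# one-hot encoding gives one binary column per category, with no false ordering
print(pd.get_dummies(feature))
#    a  b  c
# 0  1  0  0
# 1  0  1  0
# 2  0  1  0
# 3  0  0  1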
How to get feature importance in xgboost?
In your code you can get feature importance for each feature in dict form:

bst.get_score(importance_type='gain')
>> {'ftr_col1': 77.21064539577829,
    'ftr_col2': 10.28690566363971,
    'ftr_col3': 24.225014841466294,
    'ftr_col4': 11.234086283060112}

Explanation: The train() API’s method get_score() is defined as:

get_score(fmap='', importance_type='weight')

fmap (str (optional)) – The name of the feature map file.
importance_type
'weight' – the number of times a feature is used … Read more
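A self-contained sketch (the feature names and data are invented to mirror the dict above):

import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y, feature_names=['ftr_col1', 'ftr_col2', 'ftr_col3', 'ftr_col4'])

bst = xgb.train({'objective': 'reg:linear'}, dtrain, num_boost_round=10)

# 'gain' scores a feature by the loss reduction of the splits that use it;
# the default 'weight' just counts how often the feature is split on
print(bst.get_score(importance_type='gain'))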
How to install xgboost in Anaconda Python (Windows platform)?
The easiest way (it worked for me) is to do the following:

anaconda search -t conda xgboost

You will get a list of installable packages. For example, if you want to install the first one on the list, mndrake/xgboost (for Windows 64-bit):

conda install -c mndrake xgboost

If you’re on a Unix system you can … Read more
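After the conda install, a quick smoke test from Python (a minimal sketch):

import numpy as np
import xgboost as xgb

# a tiny training run just to confirm the package imports and runs end to end
dtrain = xgb.DMatrix(np.random.rand(10, 2), label=np.random.randint(0, 2, 10))
bst = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=2)
print(bst.predict(dtrain))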