How to use random forests in R with missing values?

My initial reaction to this question was that it didn’t show much research effort, since “everyone” knows that random forests don’t handle missing values in predictors. But upon checking ?randomForest I must confess that it could be much more explicit about this. (Although, Breiman’s PDF linked to in the documentation does explicitly say that missing … Read more

RandomForestClassifier.fit(): ValueError: could not convert string to float

You have to do some encoding before calling fit(). As already noted, fit() does not accept strings, but you can work around this. Several classes can be used: LabelEncoder, which turns your strings into incremental integer values, and OneHotEncoder, which uses the one-of-K scheme to turn each string into a set of binary columns. Personally, I have posted almost … Read more
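As a rough illustration of what those two encoders do, here is a hand-rolled sketch in plain Python. It is not scikit-learn's implementation: the `colors` column and both helper functions are made up for this example, and the integer codes simply follow the sorted order of the distinct strings.

```python
def label_encode(values):
    """Map each distinct string to an integer code (codes follow sorted order)."""
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot_encode(values):
    """Turn each string into a row with a single 1 in its category's column."""
    codes, classes = label_encode(values)
    rows = [[1 if code == col else 0 for col in range(len(classes))] for code in codes]
    return rows, classes


colors = ["red", "green", "blue", "green"]  # hypothetical string feature
codes, classes = label_encode(colors)
onehot, _ = one_hot_encode(colors)

print(classes)  # ['blue', 'green', 'red']
print(codes)    # [2, 1, 0, 1]
print(onehot)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Integer codes impose an arbitrary order on the categories (here blue < green < red), which is one reason the one-hot form is often preferred for nominal features.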

Do I need to normalize (or scale) data for randomForest (R package)? [closed]

No, scaling is not necessary for random forests. The nature of RF is such that convergence and numerical precision issues, which can sometimes trip up the algorithms used in logistic and linear regression, as well as neural networks, aren’t so important. Because of this, you don’t need to transform variables to a common scale like … Read more
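One way to see why scaling does not matter is that a tree split only depends on the ordering of a feature's values. The sketch below is a toy stump search written for this illustration (it scores splits by misclassification count rather than the impurity measure a real random forest uses); the `incomes` data and the `best_split` helper are assumptions, not part of any package.

```python
def best_split(xs, ys):
    """Return (threshold, left_indices) of the split with the fewest errors.

    Candidate thresholds are midpoints between consecutive sorted values.
    Labels are assumed to be binary (0 or 1).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no threshold can separate equal values
        left, right = order[:k], order[k:]
        err = 0
        for side in (left, right):
            ones = sum(ys[i] for i in side)
            err += min(ones, len(side) - ones)  # errors under majority vote
        if best is None or err < best[0]:
            best = (err, (lo + hi) / 2, sorted(left))
    return best[1], best[2]


incomes = [20_000, 35_000, 50_000, 80_000, 120_000]  # hypothetical feature
labels = [0, 0, 0, 1, 1]

raw_threshold, raw_left = best_split(incomes, labels)
scaled = [x / 1000 for x in incomes]  # same feature, expressed in thousands
scaled_threshold, scaled_left = best_split(scaled, labels)

print(raw_threshold, scaled_threshold)  # the thresholds differ
print(raw_left == scaled_left)          # but the same rows go left
```

The threshold moves with the scale, yet the partition of the rows is unchanged, so the resulting tree makes the same predictions.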

RandomForestClassifier vs ExtraTreesClassifier in scikit learn

Yes, both conclusions are correct, although the Random Forest implementation in scikit-learn makes it possible to enable or disable the bootstrap resampling. In practice, RFs are often more compact than ETs. ETs are generally cheaper to train from a computational point of view but can grow much bigger. ETs can sometimes generalize better than RFs … Read more
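For readers unfamiliar with the bootstrap resampling mentioned above, here is a minimal sketch of what turning it on or off means for the rows a single tree sees. It uses only Python's standard library; the helper name and the sample size are assumptions for illustration. In scikit-learn the corresponding switch is the `bootstrap` parameter of both classifiers.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 10

with_bootstrap = bootstrap_indices(n, rng)  # some rows repeat, some are missing
without_bootstrap = list(range(n))          # the tree sees every row exactly once

# Rows never drawn are "out of bag" for this tree.
out_of_bag = sorted(set(range(n)) - set(with_bootstrap))

print(with_bootstrap)
print(out_of_bag)
```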

How are feature_importances in RandomForestClassifier determined?

There are indeed several ways to get feature “importances”. As often, there is no strict consensus about what this word means. In scikit-learn, we implement the importance as described in [1] (often cited, but unfortunately rarely read…). It is sometimes called “gini importance” or “mean decrease impurity” and is defined as the total decrease in … Read more
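To make "mean decrease impurity" concrete, the following sketch computes it by hand for one small tree. It assumes the usual formulation (each split's Gini decrease is weighted by the share of samples reaching that node, then summed per feature and normalized); the class counts in `splits` are invented, and this is not scikit-learn's code.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_decrease(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the share of samples reaching it."""
    n_parent, n_left, n_right = sum(parent), sum(left), sum(right)
    decrease = (
        gini(parent)
        - (n_left / n_parent) * gini(left)
        - (n_right / n_parent) * gini(right)
    )
    return (n_parent / n_total) * decrease


# Hypothetical tree on 100 samples:
# (feature used, parent class counts, left child counts, right child counts)
splits = [
    ("x0", [50, 50], [40, 10], [10, 40]),
    ("x1", [40, 10], [40, 0], [0, 10]),
    ("x0", [10, 40], [10, 0], [0, 40]),
]

importance = {}
for feature, parent, left, right in splits:
    gain = weighted_decrease(parent, left, right, 100)
    importance[feature] = importance.get(feature, 0.0) + gain

total = sum(importance.values())
normalized = {f: v / total for f, v in importance.items()}  # values sum to 1
print(normalized)
```

In this toy tree x0 ends up with roughly 0.68 of the importance and x1 with roughly 0.32. For a forest, the per-tree values are averaged across trees.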

How to extract the decision rules from scikit-learn decision-tree?

I believe that this answer is more correct than the other answers here:

    from sklearn.tree import _tree

    def tree_to_code(tree, feature_names):
        tree_ = tree.tree_
        feature_name = [
            feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
            for i in tree_.feature
        ]
        print("def tree({}):".format(", ".join(feature_names)))

        def recurse(node, depth):
            indent = " " * depth
            if tree_.feature[node] != _tree.TREE_UNDEFINED:
                … Read more
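Since the excerpt above is cut off, here is a separate, self-contained sketch of the same general idea: walking a tree recursively and emitting nested if/else code. It does not reproduce the rest of that answer. The tree is a hand-built stand-in stored in parallel lists named after the arrays on scikit-learn's `tree_` object; the three-node tree, its values, and `tree_to_lines` are all hypothetical.

```python
TREE_LEAF = -1        # child index meaning "no child"
TREE_UNDEFINED = -2   # feature index stored at leaf nodes

# Hypothetical three-node tree in parallel lists, indexed by node id.
children_left = [1, TREE_LEAF, TREE_LEAF]
children_right = [2, TREE_LEAF, TREE_LEAF]
feature = [0, TREE_UNDEFINED, TREE_UNDEFINED]
threshold = [2.5, -2.0, -2.0]
value = [None, "setosa", "other"]  # prediction stored at each leaf
feature_names = ["petal_length"]


def tree_to_lines(node=0, depth=1):
    """Recursively render the subtree rooted at `node` as indented Python lines."""
    indent = "    " * depth
    if feature[node] != TREE_UNDEFINED:
        name = feature_names[feature[node]]
        lines = ["{}if {} <= {}:".format(indent, name, threshold[node])]
        lines += tree_to_lines(children_left[node], depth + 1)
        lines.append("{}else:".format(indent))
        lines += tree_to_lines(children_right[node], depth + 1)
        return lines
    # Leaf node: emit the stored prediction.
    return ["{}return {!r}".format(indent, value[node])]


header = "def tree({}):".format(", ".join(feature_names))
source = "\n".join([header] + tree_to_lines())
print(source)
```

The generated text is itself valid Python, so it can be pasted into a module or passed to `exec` to obtain a plain function that mirrors the tree's decisions.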
