How does the class_weight parameter in scikit-learn work?

Question

First off, recall alone is not a good criterion: you can trivially reach 100% recall by classifying everything as the positive class.
I usually suggest using AUC for selecting parameters, and then picking a threshold for the operating point you are interested in (say, a given precision level).
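
As a rough sketch of that two-step workflow (the synthetic dataset, the parameter grid, and the 0.5 precision target are all made-up assumptions for illustration, not values from the question), one could select parameters by cross-validated ROC AUC and then choose a threshold on held-out data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data (about 95% class 0), purely illustrative.
X, y = make_classification(n_samples=4000, weights=[0.95, 0.05], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0
)

# Step 1: select hyperparameters by cross-validated ROC AUC, not recall.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0], "class_weight": [None, "balanced"]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

# Step 2: pick an operating point on held-out data.
scores = search.predict_proba(X_val)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_val, scores)

target_precision = 0.5  # hypothetical requirement
# precision/recall have one more entry than thresholds; drop the final point.
ok = precision[:-1] >= target_precision
if ok.any():
    # Among thresholds meeting the precision target, keep the best recall.
    idx = int(np.argmax(np.where(ok, recall[:-1], -1.0)))
    threshold = thresholds[idx]
    print(search.best_params_, threshold, precision[idx], recall[idx])
else:
    threshold = None
    print("no threshold reaches the target precision")
```

Predictions at the chosen operating point would then be `scores >= threshold` rather than the default `predict`, which uses a cut-off of 0.5.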

As for how class_weight works: it penalizes mistakes on samples of class[i] with class_weight[i] instead of 1, so a higher class weight puts more emphasis on that class. From what you say, class 0 seems to be 19 times more frequent than class 1, so you should increase the class_weight of class 1 relative to class 0, say {0: .1, 1: .9}.
If the class weights don't sum to 1, that basically changes the regularization parameter: scaling all the weights by a common factor rescales the loss term relative to the penalty.
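
To make both points concrete, here is a small sketch using LogisticRegression on synthetic data (the dataset and the helper `fit_coef` are assumptions for illustration). It checks that a class weight acts like a per-sample weight on the loss, and that multiplying all class weights by 10 has roughly the same effect as dividing C by 10:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data, purely illustrative.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)


def fit_coef(C, class_weight=None, sample_weight=None):
    """Hypothetical helper: fit a tightly converged model, return its coefficients."""
    clf = LogisticRegression(
        C=C, class_weight=class_weight, tol=1e-8, max_iter=10000
    )
    clf.fit(X, y, sample_weight=sample_weight)
    return clf.coef_.ravel()


# Weighting class 1 nine times as heavily as class 0 ...
coef_class = fit_coef(C=1.0, class_weight={0: 1, 1: 9})
# ... matches giving each class-1 sample a weight of 9.
coef_sample = fit_coef(C=1.0, sample_weight=np.where(y == 1, 9.0, 1.0))
# Weights ten times smaller, compensated by a ten times larger C.
coef_scaled = fit_coef(C=10.0, class_weight={0: 0.1, 1: 0.9})

# Both differences should be close to zero (up to solver tolerance).
print(np.abs(coef_class - coef_sample).max())
print(np.abs(coef_class - coef_scaled).max())
```

So only the ratio between the class weights changes the balance between classes; their overall scale trades off against C.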

For how class_weight="auto" works, you can have a look at this discussion.
In the dev version you can use class_weight="balanced", which is easier to understand: it is basically equivalent to replicating the smaller class until it has as many samples as the larger one, but done implicitly through the weights.
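
As an illustration of what "balanced" computes, the sketch below uses a made-up label vector with the 19:1 ratio from the question. scikit-learn documents the balanced weights as n_samples / (n_classes * np.bincount(y)), and the helper `compute_class_weight` returns the same values:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up labels: class 0 is 19 times more frequent than class 1.
y = np.array([0] * 95 + [1] * 5)
classes = np.unique(y)

# Weights as computed by scikit-learn for class_weight="balanced".
weights = compute_class_weight("balanced", classes=classes, y=y)

# The documented formula, written out by hand.
manual = len(y) / (len(classes) * np.bincount(y))

# Total weight carried by each class: count times per-sample weight.
effective = weights * np.bincount(y)

print(dict(zip(classes.tolist(), weights.tolist())))
print(effective)
```

Each class ends up carrying the same total weight, which is the implicit "replication" of the smaller class: every class-1 sample counts 19 times as much as a class-0 sample.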
