From the documentation:
The main difference between the two is that min_samples_leaf guarantees a minimum number of samples in a leaf, while min_samples_split can create arbitrary small leaves, though min_samples_split is more common in the literature.
To make sense of this passage, it helps to distinguish between a leaf (also called an external node) and an internal node. An internal node is split further and therefore has children, while a leaf is by definition a node without any children (it is not split any further).
min_samples_split specifies the minimum number of samples required to split an internal node, while min_samples_leaf specifies the minimum number of samples required to be at a leaf node.
For instance, if min_samples_split = 5 and there are 7 samples at an internal node, then the split is allowed. But let's say the split results in two leaves, one with 1 sample and another with 6 samples. If min_samples_leaf = 2, then that split won't be allowed (even though the internal node has 7 samples), because one of the resulting leaves would have fewer than the minimum number of samples required to be at a leaf node.
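The two checks in that example can be written out as a small sketch. Note that split_allowed is a hypothetical helper made up for illustration, not a scikit-learn function, and it only tests a single candidate split; the real tree builder searches over many candidate splits and skips the ones that break these rules.

```python
def split_allowed(n_left, n_right, min_samples_split=2, min_samples_leaf=1):
    """Return True if a candidate split passes both constraints.

    Hypothetical helper for illustration only (not part of scikit-learn).
    n_left and n_right are the sample counts of the two children.
    """
    n_node = n_left + n_right  # samples at the node being split

    # min_samples_split looks at the parent node before splitting.
    if n_node < min_samples_split:
        return False

    # min_samples_leaf looks at each child produced by the split.
    if n_left < min_samples_leaf or n_right < min_samples_leaf:
        return False

    return True


# The example from the text: 7 samples split into 1 and 6.
print(split_allowed(1, 6, min_samples_split=5, min_samples_leaf=1))  # True
print(split_allowed(1, 6, min_samples_split=5, min_samples_leaf=2))  # False
# A more balanced split of the same node passes both checks.
print(split_allowed(3, 4, min_samples_split=5, min_samples_leaf=2))  # True
```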
As the documentation referenced above mentions, min_samples_leaf guarantees a minimum number of samples in every leaf, no matter the value of min_samples_split.
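One way to see this for yourself is to fit two trees and inspect the node sizes. The snippet below is a minimal sketch that assumes scikit-learn is installed; the iris dataset, the parameter values 5 and 10, and the node_sizes helper are my own choices for illustration and are not taken from the documentation.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)


def node_sizes(clf):
    """Return (leaf sizes, internal node sizes) of a fitted tree.

    Illustrative helper; in the fitted tree_ structure a leaf is a node
    whose children_left entry is -1.
    """
    tree = clf.tree_
    is_leaf = tree.children_left == -1
    return tree.n_node_samples[is_leaf], tree.n_node_samples[~is_leaf]


# Constrain the leaves: no leaf may hold fewer than 5 samples.
leaf_tree = DecisionTreeClassifier(min_samples_leaf=5, random_state=0).fit(X, y)
leaf_tree_leaves, _ = node_sizes(leaf_tree)
print("min_samples_leaf=5   -> smallest leaf:", leaf_tree_leaves.min())

# Constrain only the splits: a node needs at least 10 samples to be split,
# but nothing stops a resulting leaf from being very small.
split_tree = DecisionTreeClassifier(min_samples_split=10, random_state=0).fit(X, y)
split_tree_leaves, split_tree_internals = node_sizes(split_tree)
print("min_samples_split=10 -> smallest leaf:", split_tree_leaves.min())
print("min_samples_split=10 -> smallest internal node:", split_tree_internals.min())
```

With min_samples_leaf=5 the smallest leaf can never drop below 5 samples, while with only min_samples_split=10 the guarantee applies to the nodes that get split, not to the leaves they produce.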