How to split data on balanced training set and test set on sklearn

Although Christian’s suggestion is correct, technically train_test_split should give you stratified results by using the stratify param.

So you could do:

X_train, X_test, y_train, y_test = cross_validation.train_test_split(Data, Target, test_size=0.3, random_state=0, stratify=Target)

The trick here is that it starts from version 0.17 in sklearn.

From the documentation about the parameter stratify:

stratify : array-like or None (default is None)
If not None, data is split in a stratified fashion, using this as the labels array.
New in version 0.17: stratify splitting

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)