How to generate a train-test-split based on a group id?

December 11, 2023 by Tarik

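I figured out the answer. scikit-learn's GroupShuffleSplit splits by group rather than by row, so all rows that share a Group_Id end up on the same side of the split. This seems to work: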

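from sklearn.model_selection import GroupShuffleSplit

# df is an existing pandas DataFrame with a 'Group_Id' column.
# test_size is the share of groups (not rows) placed in the test set.
# Only the first split is used below, so n_splits=1 is enough.
splitter = GroupShuffleSplit(test_size=0.20, n_splits=1, random_state=7)
split = splitter.split(df, groups=df['Group_Id'])
train_inds, test_inds = next(split)

# split() yields positional indices, so select rows with iloc.
train = df.iloc[train_inds]
test = df.iloc[test_inds]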
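
To check that the split really keeps groups apart, here is a minimal self-contained sketch. The toy DataFrame (10 groups of 3 rows, with a made-up "value" column) is purely an assumption for illustration; swap in your own data and group column.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data: 10 groups with 3 rows each.
df = pd.DataFrame({
    "Group_Id": [g for g in range(10) for _ in range(3)],
    "value": range(30),
})

splitter = GroupShuffleSplit(test_size=0.20, n_splits=1, random_state=7)
train_inds, test_inds = next(splitter.split(df, groups=df["Group_Id"]))

train = df.iloc[train_inds]
test = df.iloc[test_inds]

train_groups = set(train["Group_Id"])
test_groups = set(test["Group_Id"])

# No group id should appear on both sides of the split.
assert train_groups.isdisjoint(test_groups)
# Every row lands in exactly one of the two subsets.
assert len(train) + len(test) == len(df)

print(f"train groups: {len(train_groups)}, test groups: {len(test_groups)}")
```

Because the 20% applies to groups, the share of rows in the test set can drift from 20% when group sizes are uneven.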
