How to generate a train-test-split based on a group id?
I figured out the answer. This seems to work:

    from sklearn.model_selection import GroupShuffleSplit

    # test_size is the fraction of groups (not rows) held out for testing
    splitter = GroupShuffleSplit(test_size=.20, n_splits=2, random_state=7)
    split = splitter.split(df, groups=df['Group_Id'])
    # take the first of the generated splits
    train_inds, test_inds = next(split)

    train = df.iloc[train_inds]
    test = df.iloc[test_inds]
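To check that the split really keeps each group on one side, here is a small self-contained sketch. The toy DataFrame (10 groups of 3 rows, with a made-up `value` column) is purely an assumption for illustration; only `Group_Id` comes from the snippet above. I also used `n_splits=1` since only the first split is consumed.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data: 10 groups with 3 rows each (not from the original question).
df = pd.DataFrame({
    "Group_Id": [g for g in range(10) for _ in range(3)],
    "value": range(30),
})

# One shuffled split; roughly 20% of the *groups* go to the test set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=7)
train_inds, test_inds = next(splitter.split(df, groups=df["Group_Id"]))

train = df.iloc[train_inds]
test = df.iloc[test_inds]

# Sanity check: no group id should appear in both train and test.
train_groups = set(train["Group_Id"])
test_groups = set(test["Group_Id"])
shared_groups = train_groups & test_groups
print("groups in both sets:", shared_groups)
```

If `shared_groups` comes back empty, every row of a given group landed entirely in train or entirely in test, which is the point of splitting on the group id instead of on rows.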