How to generate a train-test-split based on a group id?

I figured out the answer. This seems to work:

    from sklearn.model_selection import GroupShuffleSplit

    splitter = GroupShuffleSplit(test_size=.20, n_splits=2, random_state=7)
    split = splitter.split(df, groups=df['Group_Id'])
    train_inds, test_inds = next(split)

    train = df.iloc[train_inds]
    test = df.iloc[test_inds]
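One way to check that a group-based split behaves as intended is to run it on toy data and confirm that no group lands on both sides. The sketch below is only an illustration: the DataFrame contents and the `value` column are made up, and `Group_Id` is the column name taken from the answer.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy data (hypothetical): 10 groups with 3 rows each.
df = pd.DataFrame({
    "Group_Id": [g for g in range(10) for _ in range(3)],
    "value": range(30),
})

# A single split is enough here, since only the first one is used.
splitter = GroupShuffleSplit(test_size=0.20, n_splits=1, random_state=7)
train_inds, test_inds = next(splitter.split(df, groups=df["Group_Id"]))

train = df.iloc[train_inds]
test = df.iloc[test_inds]

# Every group should appear in exactly one of the two sets.
train_groups = set(train["Group_Id"])
test_groups = set(test["Group_Id"])
shared_groups = train_groups & test_groups
print("Groups present in both sets:", shared_groups)
```

Note that `test_size` here is a fraction of the groups, not of the rows, so the row-level proportion can drift from 20% when group sizes are uneven.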

Normalize data before or after split of training and testing data?

First split the data into a training set and a test set (a validation set can be useful too). Remember that the test data points stand in for real-world data. Feature normalization (or data standardization) of the explanatory (or predictor) variables is a technique used to center and normalize the data by subtracting the mean and dividing by …
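A minimal sketch of the split-then-normalize order, assuming scikit-learn's `StandardScaler` (which subtracts the per-feature mean and divides by the per-feature standard deviation). The random data is made up for illustration. The point is that the scaler's statistics are fitted on the training set only and then reused unchanged on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): 100 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

# 1. Split first.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2. Fit the scaler on the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. Apply the training-set statistics to the test data (no refitting).
X_test_scaled = scaler.transform(X_test)
```

After this, the training features have zero mean and unit variance, while the test features will generally be close to that but not exact, which is the expected result when test data is treated as unseen.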

Keras split train test set when using ImageDataGenerator

Keras has now added a train/validation split from a single directory using ImageDataGenerator:

    train_datagen = ImageDataGenerator(rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        validation_split=0.2)  # set validation split

    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="binary",
        subset="training")  # set as training data

    validation_generator = train_datagen.flow_from_directory(
        train_data_dir,  # same directory as training data
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="binary",
        subset="validation")  # …
