train-test-split – Tarik Billa

How to generate a train-test-split based on a group id?

December 11, 2023 by Tarik

I figured out the answer. This seems to work:

    from sklearn.model_selection import GroupShuffleSplit

    splitter = GroupShuffleSplit(test_size=0.20, n_splits=2, random_state=7)
    split = splitter.split(df, groups=df['Group_Id'])
    train_inds, test_inds = next(split)

    train = df.iloc[train_inds]
    test = df.iloc[test_inds]
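To make the goal of a group-based split concrete, here is a minimal standard-library sketch of the idea: shuffle the unique group ids, hold out a fraction of them, and assign every row to the side its group landed on. The helper name `group_train_test_split` and the sample ids are hypothetical, and this is an illustration of the property being asked for, not how scikit-learn implements `GroupShuffleSplit`.

```python
import random


def group_train_test_split(group_ids, test_size=0.2, seed=7):
    """Return (train_idx, test_idx) such that no group id appears on both sides.

    Hypothetical helper for illustration only.
    """
    # Work on the distinct groups, not on individual rows.
    unique_groups = sorted(set(group_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)

    # Hold out a fraction of the groups (at least one group).
    n_test = max(1, round(len(unique_groups) * test_size))
    held_out = set(unique_groups[:n_test])

    # Every row follows its group.
    train_idx = [i for i, g in enumerate(group_ids) if g not in held_out]
    test_idx = [i for i, g in enumerate(group_ids) if g in held_out]
    return train_idx, test_idx


# Made-up group ids: 10 rows belonging to 5 groups.
group_ids = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"]
train_idx, test_idx = group_train_test_split(group_ids)

train_groups = {group_ids[i] for i in train_idx}
test_groups = {group_ids[i] for i in test_idx}
print("train groups:", sorted(train_groups))
print("test groups:", sorted(test_groups))
```

Because the split is made over groups rather than rows, the actual share of rows in the test set depends on how large the held-out groups are, so it will usually only approximate `test_size`.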

Normalize data before or after split of training and testing data?

February 21, 2023 by Tarik

You first need to split the data into training and test sets (a validation set can be useful too). Don’t forget that the test data points stand in for real-world data. Feature normalization (or data standardization) of the explanatory (or predictor) variables is a technique used to center and normalize the data by subtracting the mean and dividing by …
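A small sketch of the split-first order, using only the standard library: the mean and standard deviation are computed from the training values alone, and the same two numbers are then applied to the test values. The data and the helper name `standardize` are made up for illustration, and the use of the population standard deviation is an assumption of this sketch.

```python
import statistics

# Made-up feature values, already split into train and test.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# Statistics come from the training set only.
train_mean = statistics.fmean(train)
train_std = statistics.pstdev(train)  # population standard deviation (assumption)


def standardize(values, mean, std):
    """Center and scale values using statistics computed elsewhere."""
    return [(v - mean) / std for v in values]


train_scaled = standardize(train, train_mean, train_std)
# The test set reuses the training statistics; it is never used to fit them.
test_scaled = standardize(test, train_mean, train_std)

print(train_scaled)
print(test_scaled)
```

The scaled test values are not guaranteed to have zero mean or unit variance, and that is expected: the test set is treated like data that was not available when the scaling parameters were chosen.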

Keras split train test set when using ImageDataGenerator

December 16, 2022 by Tarik

Keras has now added a train/validation split from a single directory using ImageDataGenerator:

    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        validation_split=0.2)  # set validation split

    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="binary",
        subset="training")  # set as training data

    validation_generator = train_datagen.flow_from_directory(
        train_data_dir,  # same directory as training data
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode="binary",
        subset="validation")  # …