Yes, that can be done. It's explained in the paper 'Striving for Simplicity: The All Convolutional Net'
(https://arxiv.org/pdf/1412.6806.pdf). Quote from the paper:
‘We find that max-pooling can simply be replaced by a convolutional
layer with increased stride without loss in accuracy on several image
recognition benchmarks’
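
To make the idea concrete, here is a minimal pure-Python sketch (not the paper's code) showing that a stride-2 convolution downsamples a feature map to the same spatial size as 2x2 max-pooling with stride 2. The function names `out_size`, `max_pool2d` and `strided_conv2d` are made up for this illustration, it handles a single channel only, and the kernel weights are fixed to 0.25 just to have numbers to look at; in a real network those weights would be learned.

```python
def out_size(n, kernel, stride, padding=0):
    """Output length along one spatial dimension (floor convention)."""
    return (n + 2 * padding - kernel) // stride + 1


def max_pool2d(x, kernel=2, stride=2):
    """Naive max-pooling over a 2D list of numbers, no padding."""
    oh = out_size(len(x), kernel, stride)
    ow = out_size(len(x[0]), kernel, stride)
    return [
        [
            max(
                x[i * stride + di][j * stride + dj]
                for di in range(kernel)
                for dj in range(kernel)
            )
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def strided_conv2d(x, weights, stride=2):
    """Naive single-channel strided convolution (cross-correlation), no padding."""
    k = len(weights)  # assumes a square k x k kernel
    oh = out_size(len(x), k, stride)
    ow = out_size(len(x[0]), k, stride)
    return [
        [
            sum(
                weights[di][dj] * x[i * stride + di][j * stride + dj]
                for di in range(k)
                for dj in range(k)
            )
            for j in range(ow)
        ]
        for i in range(oh)
    ]


# Hypothetical 4x4 feature map and a fixed 2x2 kernel (illustrative values only).
x = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
w = [[0.25, 0.25], [0.25, 0.25]]

pooled = max_pool2d(x)            # 2x2 output
convolved = strided_conv2d(x, w)  # also 2x2 output

print(pooled)
print(convolved)
```

Both outputs are 2x2, so the strided convolution can sit where the pooling layer was without changing the shapes downstream. The values differ, of course: pooling takes a fixed maximum, while the convolution computes a weighted sum whose weights the network trains.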