I think you could use scikit-learn's TimeSeriesSplit(), either in place of your own implementation or as the basis for a CV method that works exactly as you describe.
After digging around a bit, I found that a max_train_size parameter was added to TimeSeriesSplit() in this PR. It caps the number of samples in each training set, which seems to be what you want.
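
As a rough illustration of how that could look, here is a minimal sketch. It assumes a scikit-learn version recent enough to include max_train_size; the toy array X, n_splits=3, and max_train_size=3 are made-up values for the example, not anything from your setup.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data: 10 time-ordered samples (hypothetical, for illustration only)
X = np.arange(10).reshape(-1, 1)

# max_train_size caps each training window, so the window slides forward
# instead of growing with every split
tscv = TimeSeriesSplit(n_splits=3, max_train_size=3)
splits = list(tscv.split(X))

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each training set should contain at most 3 samples and always precede its test set in time. Without max_train_size, the training set would instead expand to include all earlier samples.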