I think it may indeed be due to branch prediction. If you count the number of swaps against the number of inner-loop iterations of the sort, you find:
Limit = 10
- A = 560M swaps / 1250M loops
- B = 1250M swaps / 1250M loops (0.02% fewer swaps than loops)
Limit = 50000
- A = 627M swaps / 1250M loops
- B = 850M swaps / 1250M loops
So in the Limit == 10 case the B sort performs the swap 99.98% of the time, which is obviously favourable for the branch predictor: the branch is almost always taken. In the Limit == 50000 case the swap is taken only 68% of the time (850M / 1250M), in an essentially random pattern, so the branch predictor is much less effective.
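If you want to reproduce this kind of count yourself, a rough sketch in Python could look like the one below. The code of the two sorts is not shown in this answer, so `strict` (swap on `>`) and `non_strict` (swap on `>=`) are only hypothetical stand-ins for A and B, and `count_swaps` is a name made up for this illustration.

```python
import random


def count_swaps(values, swap_if):
    """Bubble sort a copy of values; return (swaps, inner_loop_iterations)."""
    a = list(values)
    swaps = loops = 0
    n = len(a)
    for i in range(n - 1):
        # After pass i the last i elements are already in place.
        for j in range(n - 1 - i):
            loops += 1
            if swap_if(a[j], a[j + 1]):  # this is the branch being predicted
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return swaps, loops


def strict(x, y):
    # Hypothetical stand-in for sort A (assumption: swaps only on >).
    return x > y


def non_strict(x, y):
    # Hypothetical stand-in for sort B (assumption: swaps on >=).
    return x >= y


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000  # far smaller than the original run, to keep pure Python fast
    for limit in (10, 50000):
        data = [rng.randrange(limit) for _ in range(n)]
        for name, pred in (("A", strict), ("B", non_strict)):
            swaps, loops = count_swaps(data, pred)
            print(f"Limit={limit} {name}: {swaps} swaps / {loops} loops "
                  f"({swaps / loops:.2%})")
```

Because the array here is much smaller than in the original measurement, the percentages will not match the figures above. The point is only to show where the two counters sit relative to the swap branch.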