There is a good paper on the allocator: "The Foundations for Scalable Multi-core Software in Intel Threading Building Blocks".
My own experience is limited: I replaced the global new/delete with tbb::scalable_allocator in my AI application, but the time profile barely changed. I did not compare memory usage, though.
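For anyone who wants to try the same thing, here is a rough sketch of what replacing the global new/delete can look like. It is not my original code. It assumes you go through TBB's C entry points `scalable_malloc` / `scalable_free` (declared in `tbb/scalable_allocator.h`) rather than the `tbb::scalable_allocator` class template. The `USE_TBB_MALLOC` switch, the `raw_alloc` / `raw_free` helpers and the call counter are names I made up for illustration. Without the switch it falls back to `std::malloc`, so it builds even where TBB is not installed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical switch: define USE_TBB_MALLOC and link with -ltbbmalloc
// to route allocations through TBB; otherwise std::malloc is used.
#ifdef USE_TBB_MALLOC
#include <tbb/scalable_allocator.h>  // scalable_malloc / scalable_free
static void* raw_alloc(std::size_t n) { return scalable_malloc(n); }
static void raw_free(void* p) { scalable_free(p); }
#else
static void* raw_alloc(std::size_t n) { return std::malloc(n); }
static void raw_free(void* p) { std::free(p); }
#endif

// Illustrative counter so you can confirm the replacement is really in use.
static std::atomic<std::size_t> g_new_calls{0};
std::size_t replaced_new_calls() { return g_new_calls.load(); }

// Replace the ordinary (non-aligned) global allocation functions.
// The default nothrow forms forward to these, so they are covered as well.
void* operator new(std::size_t size) {
    g_new_calls.fetch_add(1);
    if (size == 0) size = 1;  // operator new must return a unique pointer
    if (void* p = raw_alloc(size)) return p;
    throw std::bad_alloc{};
}
void* operator new[](std::size_t size) { return ::operator new(size); }

void operator delete(void* p) noexcept {
    if (p) raw_free(p);
}
void operator delete[](void* p) noexcept { ::operator delete(p); }

// Sized forms (C++14) must be replaced too, so they do not fall back to
// a deallocator that does not match the allocator above.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }
void operator delete[](void* p, std::size_t) noexcept { ::operator delete(p); }

// Small self-check: one direct allocation bumps the counter exactly once.
void check_replacement_is_active() {
    const std::size_t before = replaced_new_calls();
    void* p = ::operator new(64);
    assert(p != nullptr);
    assert(replaced_new_calls() == before + 1);
    ::operator delete(p);
}
```

Put this in exactly one translation unit of the program. The over-aligned (C++17) overloads are deliberately left alone here, so they keep using the default implementation. TBB also ships a `tbbmalloc_proxy` library that performs the replacement without any code of your own, which may be the easier route if it is available on your platform.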