Relative performance of swap vs compare-and-swap locks on x86
I assume atomic_swap(lockaddr, 1) gets translated to a xchg reg,mem instruction and atomic_compare_and_swap(lockaddr, 0, val) gets translated to a cmpxchg[8b|16b]. Some linux kernel developers think cmpxchg ist faster, because the lock prefix isn’t implied as with xchg. So if you are on a uniprocessor, multithread or can otherwise make sure the lock isn’t needed, you … Read more