It’s about locking the memory bus for that address. The Intel 64 and IA-32 Architectures Software Developer’s Manual – Volume 3A: System Programming Guide, Part 1 tells us:
7.1.4 Effects of a LOCK Operation on Internal Processor Caches.
For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor.

For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow [its] cache coherency mechanism to insure that the operation is carried out atomically. This operation is called “cache locking.” The cache coherency mechanism automatically prevents two or more processors that have the same area of memory from simultaneously modifying data in that area. (emphasis added)
Here we learn that the P6 and newer chips are smart enough to determine whether they really have to assert LOCK# on the bus or can rely on the cache coherency mechanism instead. I think this is a neat optimization.
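To make this concrete, here is a minimal sketch in C of the kind of operation the LOCK prefix protects. It assumes a C11 compiler with POSIX threads; the names `counter`, `worker`, and `count_with_atomics` are hypothetical and exist only for illustration. On x86, compilers typically lower `atomic_fetch_add` to a LOCK-prefixed instruction such as `lock add` or `lock xadd`, but the exact instruction depends on the compiler and flags, so check the generated assembly rather than taking my word for it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared counter. A naturally aligned int cannot straddle a cache line,
 * which is one of the conditions the manual gives for cache locking. */
static atomic_int counter;

struct worker_args {
    int iterations;
};

/* Each thread performs an atomic read-modify-write on the shared counter.
 * Assumption: on x86 this becomes a LOCK-prefixed instruction. */
static void *worker(void *p)
{
    const struct worker_args *args = p;
    for (int i = 0; i < args->iterations; i++) {
        atomic_fetch_add(&counter, 1);
    }
    return NULL;
}

/* Hypothetical helper: runs num_threads workers and returns the final
 * count, or -1 if the arguments are invalid or a thread fails to start. */
int count_with_atomics(int num_threads, int iterations)
{
    enum { MAX_THREADS = 16 };
    pthread_t threads[MAX_THREADS];
    struct worker_args args = { iterations };
    int started = 0;

    if (num_threads < 1 || num_threads > MAX_THREADS) {
        return -1;
    }

    atomic_store(&counter, 0);

    for (; started < num_threads; started++) {
        if (pthread_create(&threads[started], NULL, worker, &args) != 0) {
            break;
        }
    }

    /* Join whatever was started, even on partial failure. */
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }

    return started == num_threads ? atomic_load(&counter) : -1;
}
```

Calling `count_with_atomics(4, 100000)` should return exactly 400000, because no increment is lost. With a plain `int` and `counter++` the increments would race and the total would usually come up short. Whether the processor gets there by asserting LOCK# or by cache locking is invisible to the program; the result is the same either way.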
I discussed this more in my blog post “How Do Locks Lock?”