Sentir barreras de memoria

ntento comprender las barreras de memoria en un nivel útil para los programadores sin bloqueo de Java. Este nivel, creo, está en algún lugar entre aprender sobre volátiles y aprender sobre el funcionamiento de las memorias intermedias de almacenamiento / carga de un manual x86.

Pasé algún tiempo leyendo un montón de blogs / libros de cocina y he elaborado el resumen a continuación. ¿Podría alguien más entendido mirar el resumen para ver si me perdí o enumeré algo incorrectamente?

LFENCE:

Name             : LFENCE/Load Barrier/Acquire Fence
Barriers         : LoadLoad + LoadStore
Details          : Given sequence {Load1, LFENCE, Load2, Store1}, the
                   barrier ensures that Load1 can't be moved south and
                   Load2 and Store1 can't be moved north of the
                   barrier. 
                   Note that Load2 and Store1 can still be reordered.

Buffer Effect    : Causes the contents of the LoadBuffer 
                   (pending loads) to be processed for that CPU.This
                   makes program state exposed from other CPUs visible
                   to this CPU before Load2 and Store1 are executed.

Cost on x86      : Either very cheap or a no-op.
Java instructions: Reading a volatile variable, Unsafe.loadFence()

SFENCE

Name             : SFENCE/Store Barrier/Release Fence
Barriers         : StoreStore + LoadStore
Details          : Given sequence {Load1, Store1, SFENCE, Store2,Load2}
                   the barrier ensures that Load1 and Store1 can't be
                   moved south and Store2 can't be moved north of the 
                   barrier.
                   Note that Load1 and Store1 can still be reordered AND 
                   Load2 can be moved north of the barrier.
Buffer Effect    : Causes the contents of the StoreBuffer flushed to 
                   cache for the CPU on which it is issued.
                   This will make program state visible to other CPUs
                   before Store2 and Load1 are executed.
Cost on x86      : Either very cheap or a no-op.
Java instructions: lazySet(), Unsafe.storeFence(), Unsafe.putOrdered*()

MFENCE

Name             : MFENCE/Full Barrier/Fence
Barriers         : StoreLoad
Details          : Obtains the effects of the other three barrier.
                   Given sequence {Load1, Store1, MFENCE, Store2,Load2}, 
                   the barrier ensures that Load1 and Store1 can't be
                   moved south and Store2 and Load2 can't be moved north
                   of the barrier.
                   Note that Load1 and Store1 can still be reordered AND
                   Store2 and Load2 can still be reordered.
 Buffer Effect   : Causes the contents of the LoadBuffer (pending loads) 
                   to be processed for that CPU.
                   AND
                   Causes the contents of the StoreBuffer flushed to
                   cache for the CPU on which it is issued.
 Cost on x86     : The most expensive kind.
Java instructions: Writing to a volatile, Unsafe.fullFence(), Locks

Finalmente, si SFENCE y MFENCE agotan el búfer de la tienda (invalida la línea de caché y espera los ataques de otros cpus), ¿por qué uno es un no-op y el otro es un op muy costoso?

Gracia

(Publicación cruzada del foro de simpatía mecánica de Google)

Respuestas a la pregunta(1)

Su respuesta a la pregunta