Memory Barriers (aka membars or memory fences) are constructs that control the reordering of memory reads and writes. Memory barriers are generally used to synchronize threads that communicate through shared memory. There are three main kinds of barriers: Read Barrier, Write Barrier and Read/Write Barrier. A Read Barrier prevents a read that appears before the barrier from being reordered to after the barrier (thereby crossing it). All reads before the barrier must therefore finish (in whatever order) before the instructions past the barrier execute. A Write Barrier prevents write reordering in a similar fashion: all writes before the barrier must finish before the instructions after the barrier get executed. A Read/Write Barrier does both.

Why does the CPU reorder memory reads/writes? If a program needs to read from a memory location and that location is not found in the CPU cache, its content has to be fetched from RAM. Since RAM access is much slower than CPU cache access (even with on-board memory controllers), and RAM is not getting faster at the rate the CPU is, reading memory locations strictly in the order in which they appear in the program (aka program order) is sub-optimal [1].

Memory read/write reordering generally remains invisible and inconsequential to software until memory contents become the basis of synchronization between multiple threads (for example in Lockless Programming [2] paradigms). Reordering causes side-effects in such programs, which is why they need Memory Barriers.
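To make the problem concrete, here is a minimal sketch of one thread handing a value to another through shared memory. It uses standard C++11 fences rather than any particular CPU instruction; the names payload, ready, producer, consumer and run_handoff are made up for this illustration. The release fence plays roughly the role of a Write Barrier and the acquire fence roughly that of a Read Barrier, although the C++ fences are somewhat stronger than the barriers described above.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plain shared data (hypothetical name)
std::atomic<bool> ready{false};  // flag the two threads synchronize on

void producer() {
    payload = 42;                                         // write the data
    std::atomic_thread_fence(std::memory_order_release);  // earlier writes may not move below this point
    ready.store(true, std::memory_order_relaxed);         // publish the flag
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {      // wait for the flag
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // later reads may not move above this point
    return payload;                                       // sees the value written before the flag
}

// Runs one hand-off and returns what the consumer observed.
int run_handoff() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Without the two fences, the compiler or the CPU would be free to make the flag visible before the payload, and the consumer could return a stale value.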

Different CPUs have different memory reordering behaviour. The type of memory being accessed also affects reordering. Here is a summary table of some current mainstream processors along with the legendary DEC Alpha.

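R = Read, W = Write, WB = WriteBack
Y = Reordered, N = Not reordered

| Reordering | x86 (P6+), Core2 Duo (WB) [9] | amd64 (WB) [4] | Alpha [5] | ppc (xbox) [6] |
|---|---|---|---|---|
| Reads ahead of reads (R1R2=>R2R1) | Y | Y | Y | Y |
| Reads ahead of writes (WR=>RW) | Y [3] | Y | Y | Y |
| Writes ahead of reads (RW=>WR) | N [8] | N | Y | Y |
| Writes ahead of writes (W1W2=>W2W1) | N | N [7] | Y | Y |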
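The "reads ahead of writes" row is the one that every CPU in the table allows, and it can be observed with the classic store-buffer test. The sketch below is an illustration in standard C++11, not something taken from the cited manuals; count_both_zero is a hypothetical helper name. Each thread writes one variable and then reads the other. With a full (read/write) fence between the write and the read, the C++ memory model rules out the outcome where both threads read 0.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffer test `iterations` times and returns how often
// both threads read 0, i.e. how often each read ran ahead of the other
// thread's write. With the fences in place this should never happen.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_relaxed);                // W
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
            r1 = y.load(std::memory_order_relaxed);               // R
        });
        std::thread b([&] {
            y.store(1, std::memory_order_relaxed);                // W
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
            r2 = x.load(std::memory_order_relaxed);               // R
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Removing the two fences makes the both-zero outcome legal. Whether it actually shows up in a given run depends on the hardware and on timing, so its absence in a short run proves nothing.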

To set up barriers on x86/amd64 CPUs, one uses the lfence (load fence), sfence (store fence) and mfence (memory fence) instructions to control CPU reordering. (On Alpha, one uses the mb (memory barrier) and wmb (write memory barrier) instructions.)
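From C or C++ these instructions are usually reached through the SSE/SSE2 intrinsics _mm_lfence, _mm_sfence and _mm_mfence. The following sketch only shows where such fences would be placed; the wrappers load_fence, store_fence and full_fence, the Mailbox type and the post/collect functions are hypothetical names, and on non-x86-64 targets the wrappers fall back to standard C++ fences, which is an assumption of this example rather than an exact equivalent.

```cpp
#include <atomic>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#define SKETCH_X86_FENCES 1
#else
#define SKETCH_X86_FENCES 0
#endif

inline void load_fence() {
#if SKETCH_X86_FENCES
    _mm_lfence();   // lfence
#else
    std::atomic_thread_fence(std::memory_order_acquire);
#endif
}

inline void store_fence() {
#if SKETCH_X86_FENCES
    _mm_sfence();   // sfence
#else
    std::atomic_thread_fence(std::memory_order_release);
#endif
}

inline void full_fence() {
#if SKETCH_X86_FENCES
    _mm_mfence();   // mfence
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

struct Mailbox {
    int data;
    int flag;
};

// Writer side: data first, then the flag, with a store fence in between.
void post(Mailbox& m, int value) {
    m.data = value;
    store_fence();
    m.flag = 1;
}

// Reader side: flag first, then the data, with a load fence in between.
// Returns -1 when nothing has been posted yet.
int collect(const Mailbox& m) {
    if (m.flag == 0) {
        return -1;
    }
    load_fence();
    return m.data;
}
```

As the table suggests, ordinary writes to WriteBack memory are not reordered with each other on x86, so sfence matters mostly for WriteCombine memory and similar cases. Real multi-threaded code would also need std::atomic or a compiler barrier so that the compiler itself does not reorder the accesses.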

With the Microsoft compiler, the _ReadBarrier, _WriteBarrier and _ReadWriteBarrier intrinsics can be used to prevent compiler reordering. The MemoryBarrier macro goes further and also emits a hardware barrier.
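A compiler-only barrier emits no instruction at all; it just stops the compiler from moving memory accesses across it. A rough sketch follows: compiler_barrier and write_in_order are hypothetical names, the _ReadWriteBarrier branch applies to Microsoft-compatible compilers only (newer versions of the compiler mark it as deprecated), and std::atomic_signal_fence is used here as the portable stand-in.

```cpp
#include <atomic>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Stops the compiler from reordering memory accesses across this point.
// It does NOT stop the CPU from reordering them.
inline void compiler_barrier() {
#if defined(_MSC_VER)
    _ReadWriteBarrier();                                  // Microsoft intrinsic
#else
    std::atomic_signal_fence(std::memory_order_seq_cst);  // portable compiler-only fence
#endif
}

// The compiler may not move the write to *second above the barrier,
// nor the write to *first below it.
int write_in_order(int* first, int* second) {
    *first = 1;
    compiler_barrier();
    *second = 2;
    return *first + *second;
}
```

On a CPU that reorders writes, this alone is not enough for cross-thread synchronization; a hardware barrier is still required.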

On Alpha, which is one of the most aggressive CPUs as far as memory reordering is concerned, there is also a Dependency Barrier. It is a special case of the Read Barrier that makes sure the members of a structure are not read before the pointer to the structure is.
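The situation a Dependency Barrier guards against is pointer publication: one thread fills in a structure and then publishes a pointer to it, while another thread reads the pointer and then the members behind it. Below is a minimal sketch in standard C++11; Config, g_config, publish, read_area and run_publish are hypothetical names. It uses an acquire load, which is stronger than a pure dependency barrier, as an assumption made for portability.

```cpp
#include <atomic>
#include <thread>

struct Config {
    int width;
    int height;
};

std::atomic<Config*> g_config{nullptr};  // published pointer

// Fill in the structure, then publish the pointer.
void publish(Config* c) {
    c->width = 640;
    c->height = 480;
    g_config.store(c, std::memory_order_release);  // members become visible before the pointer
}

// Read the pointer first, then the members it points to.
int read_area() {
    Config* c = nullptr;
    while ((c = g_config.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return c->width * c->height;  // may not be read ahead of the pointer load
}

// Runs one publication and returns the area the reader computed.
int run_publish() {
    static Config storage{0, 0};
    g_config.store(nullptr);
    int area = -1;
    std::thread reader([&] { area = read_area(); });
    publish(&storage);
    reader.join();
    return area;
}
```

On most CPUs the data dependency between the pointer and its members already keeps the two reads in order; Alpha is the exception, which is why it needs an explicit barrier on the reader side.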

Too bad Alpha is not around any more.

[1] Any discussion of memory read (load) and write (store) reordering is incomplete without also discussing instruction reordering. A program, which is basically a sequence of instructions, seldom executes on a CPU exactly the way it is written. The CPU can pre-fetch and execute later instructions while a previous instruction is still in the process of being executed (and is potentially stalled on data from RAM). CPU branch prediction logic can lead to executing a bunch of instructions on the assumption that the code will take that branch (if the code does not, all the results are thrown away). There are other CPU optimizations that work better by reordering reads and writes.

[2] Which is an oxymoron, because unless the world around us gets rebuilt on entirely different fundamental principles, there has to be locking of some kind for synchronization. Hardware locking is probably a more appropriate term for this, but it is less sensational than lockless programming.

[3] The addresses have to be different. If the write is to the same address as the read, the x86/x64 CPU does not let the read move ahead of the write, so the read sees the value that was written.

[4] AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Sections 7.1.1 and 7.1.2, Publication No. 24593, Revision 3.14, September 2007

[5] Memory Ordering in Modern Microprocessors by Paul E. McKenney, March 2006

[6] Lockless Programming Considerations for Xbox 360 and Microsoft Windows by Bruce Dawson, February 2007

[7] Certain instructions, such as CLFLUSH and the fast string writes (MOVS, STOS etc.), can be executed out of order even if the memory is WriteBack memory (which is what typical application memory is). For WriteCombine memory, which is used for video frame buffers, writes can be reordered with respect to each other (which is why the memory is set up that way in the first place: so that writes are faster).

[8] Memory Ordering in Modern Microprocessors by Paul E. McKenney, March 2006, states that this reordering is possible on x86, but I could not find any other reference confirming it.

[9] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1 (order number 253668), Section 7.2.2
