Crisp Reading Notes on Latest Technology Trends and Basics

MS-Word version of this posting


Multi-CPU Environment

The standard implementation of memory in a Multi-CPU environment

  1. CPU is connected through Local Caches, where local variables are kept
  2. The caches are  connected to memory


The code

To acquire a lock, the following is the code

  1. A variable “x” is declared, that is shared among the threads (and hence the CPUs )
  1. The following is the code executed for each “Spin-Lock”
int x;while(1) {if ( test_and_set(&x) ) {

// Lock is acquired



The Steps for Test and Set

The “Test and Set” instruction bounces the Cache line.

To understand why, let us look at the typical operation of a “Test and Set”

  1. Transfers the variable to its local cache in Exclusive (EX) mode
  2. Disables interrupts
  3. Verifies if the variable contains the value 0 = Unlocked
  4. If so, changes the value to 1 = Locked
  5. Enables the interrupts
  6. Changes the cache status to Modified (M) mode or to (S) mode

This involves at least one Cache coherency cycle, and upto two memory accesses (depending on the Cache coherency protocol).

Spin-Locks slow down the system

Note that, in the above,

  • Spin-Lock is executed by every CPU.
  • So, every CPU will try to periodically acquire the variable in Exclusive (E) mode, and there will be continuous Cache coherency messages  on the bus
  • This will slow the lock acquisition and release, as well as the throughput of the general system.

Test and Spin

Modified Code

Suppose we used a “Modified Code” that worked as follows

  1. Verify if the variable is unlocked.
  2. If the variable is unlocked, then acquire the same.
int x;while(1) {if ( x == Unlocked ) {

if ( test_and_set(&x) ) {

// Lock is acquired




The Modified Steps

The Steps in this case are

  1. Transfers the variable to its local cache in (S) mode.

From the second time onwards, the variable is already present in the cache

  1. If the variable is in the “Unlocked” state
    1. Acquire the variable in an Exclusive (EX) mode
    2. Disable the interrupts
    3. Verify if the variable value is still Unlocked
    4. If so, change the value to Locked
    5. Enables interrupts
    6. Change Cache-status to Modified (M)

Test and Spin is a little better

The Test and Spin performs a little better, as

  • There is no Cache contention, while the item is Locked. This is a big saving on traffic
  • The only contention is immediately after a lock is released.
  • But, even this is enough to create problems, if the number of waiting threads is huge.


The Approach

The “Test and Spin” approach has “Cache Contention”, when the lock is released, and when there are other threads waiting to acquire the Lock.

The following approach will avoid “Cache Contention”

  • Each thread spins on a separate variable.
  • The spinning variables are allocated from a queue (implemented as an array)
  • Whenever a lock is released, it will “Unlock” the variable from the head of the queue.

The Code

The code for the sections is defined as follows

int tail, head;Lock l[NL];int acquire_lock() {

// Increment the tail of the queue atomically

while( !fetch_and_increment(&tail) );

l[tail] = WAIT;

Lock *p = &l[tail];

while( *p != LOCKED );


int release_lock() {

// Increment the head of the queue atomically

l[head] = FREE;


l[head] = LOCKED;


MCS Locks

Works similar to the Locks above, except that the Queue is implemented as a Linked List, rather than as a Array.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

Tag Cloud

%d bloggers like this: