Crisp Reading Notes on Latest Technology Trends and Basics

Coroutines in C

This is a very readable and understandable article, that I have just rephrased a little. Enjoy reading it …

References

Motivating example

Decompressor

/* Decompression code */while (1) {

c = getchar();

if (c == EOF)

break;

if (c == 0xFF) {

len = getchar();

c = getchar();

while (len–)

emit(c);

} else

emit(c);

}

emit(EOF);

Parser

  • The second routine is a parser
    • This function scans continuous tokens of alphabets as a word
    • The rest of the characters are output individually as tokens.
/* Parser code */
while (1) {
    c = getchar();
    if (c == EOF)
       break;
    if (isalpha(c)) {
      do {
        add_to_token(c);
        c = getchar();
      } while (isalpha(c));
      got_token(WORD);
    }
    add_to_token(c);
    got_token(PUNCT);
}

Putting them together

  • These functions need to be put together. There are two approaches to doing this
  • In the first approach, we create two programs around each of these processes, and send the output of one to the input of the other.
  •  This involves two programs, and is typically a performance overhead.
  • Or one could create two threads, one thread running the decompressor, and the other thread running the parser
  • The decompressor produces a character, and places it into a buffer. The parser reads data from the buffer.
  • The threads co-ordinate by means of a lock or a semaphore.
  • This is still a performance overhead, especially if
  • An efficient way of putting the programs together within a single program is to rewrite one of the routines to output a character at a time, so that it can be used within the loop of another program.
  • The following is the routine to output the decompressed character
    • There are two static variables – replen and repchar, that represent repeated characters.

 UNIX Pipes

POSIX Threads

Rewriting the routines

Per-character Decompressor

int decompressor(void) {
  static int repchar;
  static int replen;
 
  // If repeated characters have not ended, send the character back
  if (replen > 0) {
    replen--;
    return repchar;
  }
 
  c = getchar();
  if (c == EOF)
    return EOF;
  if (c == 0xFF) {
    replen = getchar();
    repchar = getchar();
    replen--;
    return repchar;
  } else
    return c;
}
  • This function can be used to replace the getchar() method in the second function
/* Parser code */
while (1) {
  c = decompressor();
  if (c == EOF)
    break;
  if (isalpha(c)) 
    do {
      add_to_token(c);
      c = decompressor();
    } while (isalpha(c));
    got_token(WORD);
  }
  add_to_token(c);
  got_token(PUNCT);
}

Per-character Tokenizer

  • The following variant of the parser, returns token, if the next character forms a token
  • The actual implementation is more complex
    • The routine returns no token if the current character is alphabetic, and instead adds the token to its internal word.
    • If the next character is non-alphabetic, the internal word is output, as well as the next token.
void parser(int c) {
 
  // The state of whether there is a word being built or not
  static enum {
    START, IN_WORD
  } state;
 
  switch (state) {
 
    // If we are already in the word, and we get a token, we fall through
    case IN_WORD:
      if (isalpha(c)) {
        add_to_token(c);
        return;
      }
      got_token(WORD);
      state = START;
      /* fall through */
 
    // If the state is already Start, if we found a non-alphanumeric token, 
    //   we return that token as well
    //   else, we add to the internal array
    case START:
      add_to_token(c);
      if (isalpha(c))
         state = IN_WORD;
      else
         got_token(PUNCT);
      break;
    }
}

Knuth’s Co-routines

Knuth’s concept is to avoid the caller-callee relationship, and treat the processes as co-operating equals.

  1. Each co-routine gets its own co-location frame, where it gets to store its own local variables, and the return address
  2. When calling a co-routine, the function saves its location on the stack, and jumps to the address specified in the co-routine frame.
  3. Inside the co-routine, local variables are accessed from the co-routine frame, rather than from the stack.
  4. When the co-routine returns, the control returns to its old location, based on the return address saved on the stack as part of the co-routine call.

Co-routines in a stack-based Environment

Co-routines can also be implemented with the current chip architectures with the current compilers

  1. Every local variable needs to be static. This will ensure that all variables get allocated on a separate co-routine frame.
  2. One such variable – state denotes the location from where the co-routine previously returned.
  3. Based on this state, we go-to the previous point, where the computation left off.

A Toy Example

  • A routine to iterate through a list is specified as follows
  • The following is the original “C” function
int function(void) {
  int i;
  for (i = 0; i < 10; i++)
    return i;   /* won't work, but wouldn't it be nice */
}
  • When expressed as a co-routine, the following is the co-routine
int function(void) {static int state = 0;

static int i;

switch(state) {

case 0: goto BEGIN;  break;

case 1: goto LABEL1; break;

}

for( i = 0; i < 10; i++ ) {

state = 1;

return i;

LABEL1:;

}

}

Framework in C, and solution to the problem

Macros to achieve the same effect

  • The following are a good set of macros, that will achieve the same effect
#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(i,x) do { state=i; return x; case i:; } while (0)
#define crFinish }
  • The co-routine may therefore be coded as
int function(void) {
    static int i;
    crBegin;
    for (i = 0; i < 10; i++)
        crReturn(1, i);
    crFinish;
}
  • The macro crReturn could be simplified even further, as follows
#define crReturn(x) do { state=__LINE__; return x; \
                         case __LINE__:; } while (0)

The Decompressor code

  • The decompressor function may be modified to a co-routine as follows
int decompressor(void) {
  static int c, len;
  crBegin;
  while (1) {
    c = getchar();
    if (c == EOF)
      break;
    if (c == 0xFF) {
      len = getchar();
      c = getchar();
      while (len--)
        crReturn(c);
    } else
      crReturn(c);
  }
  crReturn(EOF);
  crFinish;
}
  • This could be used directly with the parser code.
  • The parser code could continue to be a direct subroutine call.
  • Alternatively, we could convert the Parser code into subroutine code as follows

The Parser Code

void parser(int c) {
  crBegin;
  while (1) {
    /* first char already in c */
    if (c == EOF)
      break;
    if (isalpha(c)) {
      do {
        add_to_token(c);
        crReturn( );
      } while (isalpha(c));
      got_token(WORD);
    }
    add_to_token(c);
    got_token(PUNCT);
    crReturn( );
  }
  crFinish;
}

Scalability of the Model

Multi-threading and Redundancy

In a multi-threaded environment, the above semantics are unlikely to be useful, because of the use of static variables.

The Use of Contexts

The way to solve this is to

  1. Have multiple contexts.
  2. Have an additional argument to each call – the Context itself. One variable in the Context State can store the last exit-point, or the expected entry-point.
  3.  Pass the Context into every call of the subroutine.
  4. All static variables need to dereference the context, instead of being static variables.

The use of Macros

The macros can make it easier to achieve this

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Tag Cloud

%d bloggers like this: