Crisp Reading Notes on Latest Technology Trends and Basics

Archive for July, 2012

Process Groups, Sessions and Terminals

Processes

Review of Concepts

Creation

pid_t p;
p = fork();
if (p == (pid_t) -1)
/* ERROR */
else if (p == 0)
/* CHILD */
else
/* PARENT */

Termination

  • A process terminates normally by calling the exit “C” library function, or the _exit system call
  • It also terminates normally by executing return(n) from the main function
  • The exit status of the process may be collected by the parent using the wait system call

Collecting Status

pid_t p;
int status;
p = wait(&status);
  • A process that has terminated, but has not yet been waited for, is called a “zombie”.
  • A zombie process stores a 2-byte status.
  • On the other hand, if the parent dies first, the init process inherits the child, and becomes its parent.
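The fork and wait calls above can be combined into one small round trip. The following is a minimal sketch, not a definitive implementation; the helper name collect_exit_status is hypothetical, and waitpid is used in place of wait so that only the child just created is reaped.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with `code`, and returns
 * the status the parent collects, or -1 on error or abnormal termination. */
int collect_exit_status(int code)
{
    int status;
    pid_t p = fork();

    if (p == (pid_t) -1)
        return -1;                  /* ERROR: fork failed */
    if (p == 0)
        _exit(code);                /* CHILD: terminate immediately */

    /* PARENT: until this call returns, the dead child is a zombie */
    if (waitpid(p, &status, 0) != p)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling collect_exit_status(7) from a main function should give back 7, since only the low 8 bits of the exit code reach the parent.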

Signal

  • Signals may force abnormal termination of a process.
  • The default action on receipt of a signal depends on the signal, and is one of the following
    • Terminate the process
    • Terminate the process with a core-dump
    • Ignore the signal
    • Stop the process, or continue it if it is stopped

SIGSTOP and SIGCONT

  • The default action on all signals except SIGKILL and SIGSTOP can be overridden
  • The SIGSTOP signal pauses the given process
  • The SIGCONT signal continues the process from where it left off.
  • SIGTTIN (input requested by background process) and SIGTTOU (output requested by background process) also cause a process to pause.
  • Several related processes may be configured as part of the same process group.
  • An example of this is when we pipe several processes together

Process Group

Process group

% cat paper | ideal | pic | tbl | eqn | ditroff > out
  • A process group is created with the setpgid(pid, pgid) call
  • Only the process or its parent can set the process group-id.
    • The parent must set the process group-id before the child does the exec, and may move it within the session only
  • A process may set itself as a process-group leader by giving itself as the process-group-id
  • A session leader cannot change its own process group-id
  • An example to set the process group

Example of setting a process group

p = fork();
if (p == (pid_t) -1) {
    /* ERROR */
} else if (p == 0) {    /* CHILD */
    setpgid(0, pgid);
} else {                /* PARENT */
    setpgid(p, pgid);
}

Signaling and Waiting

  • One can signal all members of a process group with the kill system call, by passing the negated process group-id
kill( -getpgrp(), signal );
  • One can also wait for any child that belongs to a specified process group, using waitpid
waitpid( 0, &status, 0 );     // Waits for any child in the process group of the current process
waitpid( -pgid, &status, 0 ); // Waits for any child in the process group pgid
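Putting group creation, group signaling and group waiting together gives the following minimal sketch. The function name signal_process_group is hypothetical, and the sketch assumes SIGTERM still has its default disposition in the calling process.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: puts two children into one new process group,
 * signals the group, and returns how many children were terminated by
 * SIGTERM (2 is expected), or -1 if fork fails. */
int signal_process_group(void)
{
    pid_t pgid = 0;                 /* 0 means "use the child's own pid" */
    int i, status, killed = 0;

    for (i = 0; i < 2; i++) {
        pid_t p = fork();
        if (p == (pid_t) -1) {
            if (pgid != 0)
                kill(-pgid, SIGKILL);
            return -1;
        }
        if (p == 0) {               /* CHILD */
            setpgid(0, pgid);       /* same call as the parent: no race */
            for (;;)
                pause();
        }
        setpgid(p, pgid);           /* PARENT */
        if (pgid == 0)
            pgid = p;               /* the first child leads the group */
    }

    kill(-pgid, SIGTERM);           /* negative pid: the whole group */

    /* waitpid(-pgid, ...) reaps any child that belongs to that group */
    while (waitpid(-pgid, &status, 0) > 0)
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            killed++;
    return killed;
}
```

Both the parent and the child call setpgid, so the group is in place no matter which of the two runs first after the fork.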

Foreground Process Groups

  • Among the several process-groups associated with a session, at most one is called a “foreground process group”.
  • The terminals associated with a “session” send their input and signals to the processes associated with the “foreground process group”.
  • A process can get / set the “foreground process group” associated with its session using
    • fd denotes the controlling terminal of the process
pid = tcgetpgrp(fd);
tcsetpgrp(fd, pgid);
  • The file descriptor is obtained by opening the device /dev/tty. This works irrespective of the redirects of standard inputs and standard outputs.
  • All process groups, except “foreground process group”, are called “background process groups”.
  • Since the foreground process group is interacting with the terminal, the background processes stay out of this
    • When they try to read from the terminal, they get a SIGTTIN signal. If this signal is blocked or ignored, then the read returns an error (EIO)
  • Similarly, when they try to write to the terminal, they may get a SIGTTOU signal.
    • This happens only if the tostop flag is set on the terminal, which is controlled by an stty setting
    • If the flag is cleared, background processes are allowed to write to the terminal.
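A process can check whether it is currently in the foreground by comparing its own process group with the one reported by tcgetpgrp. This is a minimal sketch; the function name in_foreground_group is hypothetical, and the terminal is reached through /dev/tty as described above.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the caller is in the foreground
 * process group of its controlling terminal, 0 if it is in a background
 * group, and -1 if there is no controlling terminal (as for a daemon). */
int in_foreground_group(void)
{
    pid_t fg;
    int fd = open("/dev/tty", O_RDONLY);

    if (fd == -1)
        return -1;                  /* no controlling terminal */
    fg = tcgetpgrp(fd);
    close(fd);
    if (fg == (pid_t) -1)
        return -1;
    return fg == getpgrp();
}
```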

Background Process Group

stty -tostop

Orphaned Process Group

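  • A process group is orphaned when none of its members has a parent in a different process group of the same session. This typically happens when the parent of the group, such as the shell, dies.
  • When a process group becomes orphaned and has stopped members, all the members of the process group are sent SIGHUP followed by SIGCONT.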
  • Each process group belongs to a unique session.
  • A session is often created by a user-login
  • The terminal on which one is logged in becomes the controlling tty of the session.
  • By convention, the session-id is the process-id of the first member of the session, also known as “Session Leader”.
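  • One can obtain the session-id with the getsid() call.
  • A process can start a new session with the setsid() call.
  • The setsid call takes no arguments. It creates a new session, and makes the calling process the session leader and the leader of a new process group. It fails if the caller is already a process-group leader.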

Sessions

What are sessions

Creation and Propagation of Sessions

p = fork();
if (p) exit(0);
pid = setsid();
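The effect of setsid can be verified from inside the child. The sketch below differs from the snippet above in one respect, which is an assumption made for illustration: the parent waits instead of exiting, so that the result can be inspected. The function name child_becomes_session_leader is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child became the leader of
 * a new session and of a new process group after setsid(), 0 otherwise. */
int child_becomes_session_leader(void)
{
    int status;
    pid_t p = fork();

    if (p == (pid_t) -1)
        return 0;
    if (p == 0) {
        /* CHILD: never a process-group leader right after fork,
         * so setsid() is permitted here. */
        pid_t sid = setsid();
        int ok = sid == getpid() && getsid(0) == sid && getpgrp() == sid;
        _exit(ok ? 0 : 1);
    }
    if (waitpid(p, &status, 0) != p)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```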

Controlling Terminal

Getting a Controlling terminal

  • System V specifies that the first tty that is opened by the process becomes its controlling tty.
  • As per BSD, the controlling tty is set explicitly by the process, using the TIOCSCTTY ioctl
ioctl(fd, TIOCSCTTY, …);
  • Linux uses a combination of these techniques.

Getting rid of the controlling terminal

  • When the controlling terminal is not needed, as for a daemon,
    • we need to detach the process from the controlling tty.
  • This is done using the function setsid(),
  • Or by the function ioctl with flag TIOCNOTTY
if ((fork()) != 0)
  exit(0);
setsid();
if ((fork()) != 0)
   exit(0);
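The reason for the second fork is that the final process is no longer a session leader, so under the System V rule it cannot acquire a controlling tty by opening one. The sketch below checks this property and reports it back through a pipe; the pipe and the function name double_fork_detaches are assumptions added for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the fork / setsid / fork sequence and returns
 * 1 if the grandchild is in the new session without being its leader. */
int double_fork_detaches(void)
{
    int fds[2], status;
    char ok = 0;
    pid_t p;

    if (pipe(fds) == -1)
        return 0;
    p = fork();
    if (p == (pid_t) -1) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    if (p == 0) {                       /* first child */
        pid_t sid;
        close(fds[0]);
        sid = setsid();                 /* new session, no controlling tty */
        if (sid == (pid_t) -1 || fork() != 0)
            _exit(0);                   /* the session leader exits */
        /* grandchild: belongs to session sid, but does not lead it */
        ok = getsid(0) == sid && getpid() != sid;
        if (write(fds[1], &ok, 1) != 1)
            _exit(1);
        _exit(0);
    }
    close(fds[1]);
    waitpid(p, &status, 0);             /* reap the first child */
    if (read(fds[0], &ok, 1) != 1)      /* EOF if the sequence failed */
        ok = 0;
    close(fds[0]);
    return ok;
}
```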

The daemon function

  • The daemon function detaches the process from its controlling terminal, and runs it in the background as a system daemon
int daemon( int nochdir, int noclose );
// nochdir: if zero, the current directory is changed to /
// noclose: if zero, stdin, stdout and stderr are redirected to /dev/null


HBase Summary

A really short write-up on HBase and its features. For a longer version, check the open source tutorial.


Regions

Like BigTable and many relational databases, HBase also follows a three-level hierarchy

HBase Data Hierarchy

Region Name    Relational Term        Functionality
ROOT           Catalog                Covers an entire installation
.meta          Database               Covers an entire database
.table         Tables / Partitions    Covers a partition of the table

Region

  • Each directory is also stored in the same format as the regular tables.
  • The common data structure is called a REGION
  • This makes it easy to store and access the meta-data the same way.
  • Each region is assigned to a unique Region Server, which handles both the scan and the update loads for that region.
  • Of course, secondary Region Servers will take over when the Primary Region Server goes down.
  • This approach ensures there is no data contention across machines

Region Servers

Reads

Finding the Region

To find the Region Server at which a particular key may be present, the following are the steps

  1. From Zookeeper, find the Server that contains the Root region
  2. From the Root Server, find out the Server that contains the Meta region
  3. From the Meta Server, find the Server that contains the Key Region
  4. From the Key Server, retrieve the value of the key

Note that the locations of the Root, Meta and Key Regions may be cached at the client. So, in many cases, it may contact the Key Region Server directly.
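Each lookup step above amounts to finding the region whose key range covers the requested key. The following is a minimal sketch of that search over a sorted table of region start keys, as a client cache might hold it. The struct region layout, the sample_regions table and the server names are hypothetical, and are not taken from HBase itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: a region covers the keys from start_key up
 * to the start_key of the next region. The first region starts at "". */
struct region {
    const char *start_key;
    const char *server;
};

/* Illustrative table, sorted by start_key. */
const struct region sample_regions[3] = {
    { "",  "server-a" },
    { "h", "server-b" },
    { "q", "server-c" },
};

/* Binary search for the last region whose start_key <= key.
 * `regions` must be sorted by start_key, and n must be at least 1. */
const char *find_region_server(const struct region *regions, size_t n,
                               const char *key)
{
    size_t lo = 0, hi = n;          /* the answer lies in [lo, hi) */

    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(regions[mid].start_key, key) <= 0)
            lo = mid;
        else
            hi = mid;
    }
    return regions[lo].server;
}
```

With the sample table, the key "hbase" resolves to "server-b", because "h" is the largest start key that does not exceed it.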

File Representation

The two points below help search effectively within a Key Region

  1. The records in the file are kept sorted, and hence can provide both sub-range queries and random queries efficiently.
  2. The Indexes are flat, and only on the key. This means at most two disk blocks are accessed before the data is read.

Finding the Value

Once a Region is located, the following steps help search for the value

  1. The Store-File contains the Index at the end of the file, which contains information on exactly which block stores the data.
  2. Within a block, binary search can be used to verify if the data exists or not.

Writes

Immediate Full Consistency

  • The database follows “Immediate Consistency” on all of its writes.
  • It also follows ACID semantics, at the level of a single row
  • This means after a Write is completed, any subsequent Read will read the latest value.
  • For each region, there are Transaction Logs maintained on the underlying file system.
  • This ensures the recovery in case the rest of the Write process fails.
  • The database is designed, not as a B-Tree, but as a “Log-Structured Merge Tree” (LSM tree)
  • A “Log-Structured Merge Tree” works as follows

Write-Ahead Logging

Log-Structured Merge Technique

  1. Write the transaction log, so that the “D” portion of ACID is satisfied
  2. Next change the memory structures. At this point, the transaction is considered done.
  3. Periodically, we run a Merge pass, that
    1. Does a sequential scan of the entire sorted Region from the disk,
    2. Merges the memory changes
    3. Writes back a new version of the Region to disk
    4. Updates the Region Server to use this new version of the Region, instead of the old one.
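The Merge pass in step 3 is essentially a merge of two sorted sequences, where the in-memory record wins when both sides hold the same key. The following is a minimal in-memory sketch of that idea; the struct kv type and the merge_pass function are hypothetical, and real store files, deletes and timestamps are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type: both inputs are sorted by key. */
struct kv {
    const char *key;
    const char *value;
};

/* Merges the sorted on-disk records with the sorted in-memory changes
 * into `out`, which must have room for n_disk + n_mem records. On equal
 * keys the in-memory record wins, because it is newer.
 * Returns the number of records written. */
size_t merge_pass(const struct kv *disk, size_t n_disk,
                  const struct kv *mem, size_t n_mem, struct kv *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < n_disk && j < n_mem) {
        int c = strcmp(disk[i].key, mem[j].key);
        if (c < 0) {
            out[n++] = disk[i++];   /* only on disk */
        } else if (c > 0) {
            out[n++] = mem[j++];    /* only in memory */
        } else {
            out[n++] = mem[j++];    /* same key: keep the newer value */
            i++;
        }
    }
    while (i < n_disk)
        out[n++] = disk[i++];
    while (j < n_mem)
        out[n++] = mem[j++];
    return n;
}
```

Both inputs are read front to back exactly once, which is what keeps the disk access sequential.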

Why this technique works better

The scenario we have is one of “High Workloads” and a “Large dataset”. In this situation, a sequential scan performs better than B-Tree writes

  1. B-Trees modify different sparse blocks, and the Disk seek time is wasted. At higher workloads, the disk arm may not be able to keep up with the pace of “Disk requests”.
  2. Even if we batch the Writes, because the “Data set” is vast, the disk blocks written are still sparse.

Storage

Cell Representations

Since the Cell values are sparse, the cells are stored as tuples

<row-key> <col-key> <value>

Enhancing this to include column names and timestamps, the representation is

<row-key>:<col-family-name>:<col-name>:<timestamp> <value>

The key of each key-value pair is therefore represented as

<row-key>:<col-family-name>:<col-name>:<timestamp>

Efficient Representation of Data

As a result of the above representation, the keys can get very long

  • One needs to be conscious of the size of the key, and keep it short
  • If there is only one column, part of the data may be used as part of the key.
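To see how quickly the key grows, the composite key can be assembled and measured. This is a minimal sketch that follows the colon-separated notation used above; the function name format_cell_key and the sample values are hypothetical, and the actual on-disk encoding of HBase is not shown here.

```c
#include <stdio.h>

/* Hypothetical helper: writes <row>:<family>:<column>:<timestamp> into
 * buf, and returns the length of the full key. The key was truncated if
 * the return value is not smaller than `size`. */
int format_cell_key(char *buf, size_t size, const char *row,
                    const char *family, const char *column, long long ts)
{
    return snprintf(buf, size, "%s:%s:%s:%lld", row, family, column, ts);
}
```

For the row "user42", the family "info", the column "email" and the timestamp 1343779200, the key comes to 28 characters, and this cost is paid for every stored cell.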

Load-balancing

HBase requires a good key design, if the load needs to be distributed across multiple machines

  • If the keys are timestamps or sequences, then consecutive writes get assigned to the same region, and this becomes a hot-spot
  • To avoid this, the keys may be hashed.
  • Alternatively, the leading dimension of the key should not be the timestamp.
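One common way to apply the hashing idea is to prefix each key with a bucket number derived from a hash of the key, so that consecutive keys land in different key ranges. This is a minimal sketch; the function names, the FNV-1a hash and the two-digit prefix format are assumptions chosen for illustration. The trade-off is that range scans in the original key order are no longer contiguous.

```c
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a hash of a string. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: writes "<bucket>-<key>" into buf and returns the
 * bucket, a number in [0, buckets). The same key always gets the same
 * bucket, so the salted key can be recomputed when reading. */
unsigned salt_key(char *buf, size_t size, const char *key, unsigned buckets)
{
    unsigned bucket = fnv1a(key) % buckets;

    snprintf(buf, size, "%02u-%s", bucket, key);
    return bucket;
}
```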
