Erlang -> Proposals -> "Out of Band" Messaging

The Erlang Messaging Problem

Erlang makes it very easy to spawn processes, send messages among them and selectively receive messages. Using selective receive simplifies code at the expense of allowing the message queue to grow. In pathological cases, this can lead to factorial complexity in the execution of the algorithm because of the many scans across the message queue.

The typical way to avoid the computational complexity is to split the message queue via a router process and to forward messages to other processes in a manner that ensures each queue contains only similar message types. The individual processes handling each message queue can then process messages in order of receipt, eliminating the queue scanning caused by selective receive. The drawback is that the code is much more complex and the number of processes quickly grows.

As an example, having a priority message type and a normal message type leads to polynomial complexity [N/2 * (N+1)] in the number of messages on the message queue. Fixing this by splitting to processes, leads to four separate processes and requires each message to be ACKed when sent from the router. To accomplish this, we need to unload the queue and store it in the process' local variable context.

The second aspect of this problem arises with the router solution. Because we want to ACK messages, and possibly even send pause and resume commands to manage the interaction of processes, the message queues must be kept empty so that control messages can be received without re-introducing the queue scanning we were trying to avoid in the first place.

A Solution - "Out of Band" Messages

The appearance of control messages suggests a possible solution. If there were a second message queue for control messages, we could receive control messages ahead of all other messages. Data messages could be received as normal using the existing message queue. By routing the control messages a different way, they avoid getting mixed in the heavy traffic of data messages. They are delivered via a channel that is "out of band" from the normal message delivery channel and thus both data streams avoid interfering with one another. In the case of a single priority level, this also brings us back to a single process with a simple, clear implementation.

Example Code

To remain backwards compatible, existing message send and receive syntax and semantics must be retained. Adding a control queue means adding an alternative send syntax and an alternative receive syntax. However, it should be a goal to be able to mix control patterns and data patterns in a single receive statement so that we avoid polling two queues, and the priority of message handling can be ordered among all messages on the two queues.

When multiple processes are communicating, one may need to cause the other to pause so it can catch up before allowing the peer to resume. This is a perfect case for using control messages on a channel independent of the data messages. In the sample code below, !@ is the operator used to send a message to the control channel of a process; ->> after a pattern is the operator to receive a message from the control queue rather than the data queue.

      loop(State) ->
          {pause, _From} ->>
          {DataMsg, From} ->
             NewState = send_response(From, DataMsg),

      wait_for_resume() ->
          {resume, _From} ->> ok


Solving the priority problem merely requires splitting the two messages to the two channels provided.

      send_message(Pid, Msg, low_priority) ->  Pid ! Msg;
      send_message(Pid, Msg, high_priority) -> Pid !@ Msg.

      loop(State) ->
          CtlMsg ->> NewCtl = handle_control_msg(CtlMsg), loop(NewCtl);
          DataMsg -> NewData = handle_data_msg(DataMsg), loop(NewData)


Implementation Details

A second message queue is added to a process, with identical behavior as the existing message queue. The existing message queue works as before; the new queue handles only messages sent using the send control message (i.e., !@) operator.

The receive statement must juggle both message queues at the same time. When the statement starts, the two message queues each have a pointer to the current message for evaluation against the set of patterns. The following cases may occur:

  1. Data message patterns only: the control pointer does not advance or reset
  2. Control message patterns only: the data pointer does not advance or reset
  3. Both message patterns: both pointers advance if there is no matching pattern
  4. Both message patterns: both pointers must be reset if either type matches
  5. No matches: a new message should only trigger re-evaluation of corresponding type patterns


This approach has simplicity of implementation as its goal and main benefit. Limiting the solution to only two message channels also makes it easier to augment the existing syntax cleanly.

This change does require the message sender to know the channel on which to send a message, and also only succeeds when both the sender and the receiver are in agreement as to the channel and protocol, but should not introduce a significant number of user errors over the current approach.

Other Possible Solutions

A more functional approach would be to allow any number of named or unnamed message channels. All would operate as the existing message queue does, however, the identification of a channel would automatically segregate the message types without the use of a router process.

The drawback of this approach is that the simplicity of spawn, send and receive would be compromised to introduce the new "channel" concept. An agreement as to the attributes of a channel would be necessary:


If these issues could be overcome, the user will likely run into a new class of errors caused by channel mismatches. This is especially likely to occur when various processes are maintained by different developers and changing requirements cause the messaging protocol to be modified periodically. The management of channels would be analogous to the current problem of managing registered processes and process ids, however, it would be a multiplier on the current problem causing an NxN explosion in the number of combinations of process plus channel to consider.

Join the Discussion

The merits of this proposal are being discussed on the erlang-questions mailing list.

DuoMark International, Inc.