spec_23.txt

The ZeroMQ Message Transport Protocol (ZMTP) is a transport layer protocol for exchanging messages between two peers over a connected transport layer such as TCP. This document describes ZMTP 3.0. The major change in this version is the addition of security mechanisms and the removal of hard-coded connection metadata (socket type and identity) from the greeting.

* Name: rfc.zeromq.org/spec:23/ZMTP
* Editor: Pieter Hintjens <ph@imatix.com>
* Contributors: Martin Hurton <mh@imatix.com>, Ian Barber <ian.barber@gmail.com>

++ Preamble

Copyright (c) 2009-2013 iMatix Corporation

This Specification is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This Specification is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, see <http://www.gnu.org/licenses>.

This Specification is a [http://www.digistan.org/open-standard:definition free and open standard] and is governed by the Digital Standards Organization's [http://www.digistan.org/spec:1/COSS Consensus-Oriented Specification System].

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [http://tools.ietf.org/html/rfc2119 RFC 2119].

++ Goals

The ZeroMQ Message Transport Protocol (ZMTP) is a transport layer protocol for exchanging messages between two peers over a connected transport layer such as TCP. This document describes version 3.0 of ZMTP. ZMTP solves a number of problems we face when using TCP carry messages:

* TCP carries a stream of octets with no delimiters, but we want to send and receive discrete messages. Thus, ZMTP reads and writes //frames// consisting of a size and a body.

* We need to carry metadata on each frame (such as, whether the frame is part of a multipart message, or not). ZMTP provides a //flags// field in each frame for metadata.

* We need to be able to talk to older implementations, so that our framing can evolve without breaking existing implementations. ZMTP defines a //greeting// that announces the implemented version number, and specifies a method for version negotiation.

* We need security so that peers can be sure of the identity of the peers they talk to, and so that messages cannot be tampered with, nor inspected, by third parties. ZMTP defines a //security handshake// that allows peers to create a secure connection.

* We need a range of security protocols, from clear text (no security, but fast) to fully authenticated and encrypted (secure, but slow). Further, we need the freedom to add new security protocols over time. ZMTP defines a way for peers to agree on an extensible //security mechanism//.

* We need a way to carry metadata about the connection, after the security handshake. ZMTP defines a standard set of //metadata properties// (socket type, identity, etc.) that peers exchange after the security mechanism.

* We want to allow multiple tasks to share a single external unique interface and port, to decrease system administration costs. ZMTP defines a //resource// metadata property that lets any number of tasks share a single interface and port.

* We need to write down these solutions in a way that is easy for teams to implement on any platform and in any language. ZMTP is thus specified as a formal protocol (this document) and made available to teams under a free license.

* We need guarantees that people will not create private forks of ZMTP, thus breaking interoperability. ZMTP is thus licensed under the GPLv3, so that any derived versions must also be made available to users of software that implements it.

+++ Changes from ZMTP 2.0

The changes in version 3.0 of the specification are a simpler version numbering scheme, the addition of security mechanisms, the removal of hard-coded connection metadata (socket type and identity) from the greeting, the addition of connection metadata, the addition of commands, and the addition of heartbeating.

+++ Recent Change History

22 June 2013:

* Allow commands to be mixed with messages at any point in the protocol.

* Added SUBSCRIBE and CANCEL commands instead of special messages. TODO: update libzmq.

* Added PING and PONG commands for heartbeating. TODO: update libzmq, including API for enabling these.

* Added PUB and SUB socket types so that the semantics for these could be explicitly documented. (PUB sockets cannot receive messages; SUB sockets cannot send them).

* Moved transport-independent socket semantics to separate RFCs (28, 29, 30, 31) since they are not specific to ZMTP.

+++ Related Specifications

* http://rfc.zeromq.org/spec:24/ZMTP-PLAIN defines the ZMTP-PLAIN security mechanism.
* http://rfc.zeromq.org/spec:25/ZMTP-CURVE defines the ZMTP-CURVE security mechanism.
* http://rfc.zeromq.org/spec:26/CURVEZMQ defines the CurveZMQ authentication and encryption protocol.
* http://rfc.zeromq.org/spec:27/ZAP defines the ZeroMQ Authentication Protocol.
* http://rfc.zeromq.org/spec:28/REQREP defines the semantics of REQ, REP, DEALER and ROUTER sockets. 
* http://rfc.zeromq.org/spec:29/PUBSUB defines the semantics of PUB, XPUB, SUB and XSUB sockets. 
* http://rfc.zeromq.org/spec:30/PIPELINE defines the semantics of PUSH and PULL sockets. 
* http://rfc.zeromq.org/spec:31/EXPAIR defines the semantics of exclusive PAIR sockets. 

++ Implementation

+++ Overall Behavior

A ZMTP connection goes through these main stages:

* The two peers agree on the version and security mechanism of the connection by sending each other data and either continuing the discussion, or closing the connection.

* The two peers handshake the security mechanism by exchanging zero or more commands. If the security handshake is successful, the peers continue the discussion, otherwise one or both peers closes the connection.

* Each peer then sends the other metadata about the connection as a final command. The peers may check the metadata and each peer decides either to continue, or to close the connection.

* Each peer is then able to send the other messages. Either peer may at any moment close the connection.

+++ Formal Grammar

The following ABNF grammar defines the protocol:

[[code]]
;   The protocol consists of zero or more connections
zmtp = *connection

;   A connection is a greeting, a handshake, and traffic
connection = greeting handshake traffic

;   The greeting announces the protocol details
greeting = signature version mechanism as-server filler

signature = %xFF padding %x7F
padding = 8OCTET        ; Not significant

version = version-major version-minor
version-major = %x03            
version-minor = %x00

;   The mechanism is a null padded string
mechanism = 20mechanism-char
mechanism-char = "A"-"Z" | DIGIT
    | "-" | "_" | "." | "+" | %x0

;   Is the peer acting as server?
as-server = %x00 | %x01

;   The filler extends the greeting to 64 octets
filler = 31%x00             ; 31 zero octets

;   The handshake consists of at least one command
;   The actual grammar depends on the security mechanism
handshake = 1*command

;   Traffic consists of commands and messages intermixed
traffic = *(command | message)

;   A command is a single long or short frame
command = command-size command-body
command-size = %x04 short-size | %x06 long-size
short-size = OCTET          ; Body is 0 to 255 octets
long-size = 8OCTET          ; Body is 0 to 2^63-1 octets
command-body = command-name command-data
command-name = OCTET 1*255command-name-char
command-name-char = ALPHA
command-data = *OCTET

;   A message is one or more frames
message = *message-more message-last
message-more = ( %x01 short-size | %x03 long-size ) message-body
message-last = ( %x00 short-size | %x02 long-size ) message-body
message-body = *OCTET
[[/code]]

+++ Version Negotiation

ZMTP provides asymmetric version negotiation. A ZMTP peer MAY attempt to detect and work with older versions of the protocol. It MAY also demand ZMTP capabilities from its peers.

In the first case, after making or receiving a connection, the peer SHALL send to the other a partial greeting sufficient to trigger version detection. This is the first 11 octets of the greeting (signature and major version number). The peer SHALL then read the first eleven octets of greeting sent by the other peer, and determine whether to downgrade or not. The specific heuristics for each older ZMTP version are explained in the section "Backwards Interoperability". In this case the peer MAY use the padding field for older protocol detection (we explain the specific known case below).

In the second case, after making or receiving a connection, the peer SHALL send its entire greeting (64 octets) and SHALL expect a matching 64 octet greeting. In this case the peer SHOULD set the padding field to binary zeros.

In either case, note that:

* A peer SHALL NOT assign any significance to the padding field and MUST NOT validate this nor interpret it in any way whatsoever.

* A peer MUST accept higher protocol versions as valid. That is, a ZMTP peer MUST accept protocol versions greater or equal to 3.0. This allows future implementations to safely interoperate with current implementations.

* A peer SHALL always use its own protocol (including framing) when talking to an equal or higher protocol peer.

* A peer MAY downgrade its protocol to talk to a lower protocol peer.

* If a peer cannot downgrade its protocol to match its peer, it MUST close the connection.

+++ Topology

ZMTP is by default a peer-to-peer protocol that makes no distinction between the clients and servers.

However, the security mechanisms (which are extension protocols, and explained below) MAY define distinct roles for client peers and server peers. This difference reflects the general model of concentrating authentication on servers.

Traditionally in TCP topologies, the "server" is the peer which binds, and the "client" is the peer that connects. ZMTP allows this but also the opposite topology in which the client binds, and the server connects. The actual server role is therefore not defined by bind/connect direction, but specified in the greeting {{as-server}} field, which is set to 1 for server peers, and 0 for client peers.

If the chosen security mechanism does not specify a client and server topology, the as-server field SHALL have no significance, and SHOULD be zero for all peers.

+++ Authentication and Confidentiality

ZMTP provides extensible authentication and confidentiality through the use of a negotiated security mechanism that is loosely based on the IETF [http://tools.ietf.org/html/rfc4422 Simple Authentication and Security Layer (SASL)]. The peer MAY support any or all of the following mechanisms:

* "NULL", specified later in this document, which implements no authentication, and no confidentiality.

* "PLAIN", specified by [http://rfc.zeromq.org/spec:24/ rfc.zeromq.org/spec:24/ZMTP-PLAIN], which implements simple user-name and password authentication, in clear text.

* "CURVE", specified by [http://rfc.zeromq.org/spec:25/ rfc.zeromq.org/spec:25/ZMTP-CURVE], which implements full authentication and confidentiality using the [http://curvezmq.org CurveZMQ security protocol].

The security mechanism is an ASCII string, null-padded as needed to fit 20 octets. Implementations MAY define their own mechanisms for experimentation and internal use. All mechanisms intended for public interoperability SHALL be defined as 0MQ RFCs. Mechanism names SHALL be assigned on a first-come first-served basis. Mechanism names SHALL consist only of uppercase letters A to Z, digits, and embedded hyphens or underscores.

A peer announces precisely one security mechanism, unlike SASL, which lets servers announce multiple security mechanisms. Security in ZMTP is //assertive// in that all peers on a given socket have the same, required level of security. This prevents downgrade attacks and simplifies implementations.

Each security mechanism defines a protocol consisting of zero or more commands sent by either peer to the other until the handshaking is complete or either of the peers refuses to continue, and closes the connection. 

Commands are single frames consisting of a size field, and a body. If the body is 0-255 octets, the command SHOULD use a short size field (%x04 followed by 1 octet). If the body is 256 or more octets, the command SHOULD use a long size field (%x06 followed by 8 octets).

The size field is always in clear text. The body may be partially or fully encrypted. ZMTP does not define the syntax nor semantics of commands. These are fully defined by the security mechanism protocol.

The supported mechanism is not considered sensitive information. A peer that reads a full greeting, including mechanism, MUST also send a full greeting including mechanism. This avoids deadlocks in which two peers each wait for the other to divulge the remainder of their greeting.

If the mechanism that the peer received does not exactly match the mechanism it sent, it MUST close the connection.

+++ Error Handling

ZMTP has no explicit error responses. The peer SHALL treat all errors as fatal and act by closing the connection. When a peer sees that the other peer has closed the connection before the security handshake was complete, it SHALL NOT reconnect. If the other peer closes the connection after the security handshake was complete, the peer SHOULD reconnect.

To avoid connection storms, peers should reconnect after a short and possibly randomized interval. Further, if a peer reconnects more than once, it should increase the delay between reconnects. Various strategies are possible.

If a peer closes its connection before the connection was completed, this SHOULD be treated as a permanent failure, i.e. the implementation SHOULD NOT reconnect automatically.

If the peer closes its side of the connection after the connection was made successfully, this SHOULD be treated as a temporary failure, and the implementation SHOULD reconnect after a suitable waiting period.

In some cases there is no visible difference between an error disconnect and a normal disconnect. The peer MAY in such cases use timing as an indicator (an immediate disconnect is an error, a later disconnect is "normal").

+++ Framing

Following the greeting, which has a fixed size of 64 octets, all further data is sent as //frames//, which carry commands or messages. Frames consist of a size field, a flags fields, and a body. The frame design is meant to be efficient for small frames but capable of handling extremely large data as well.

A frame consists of a flags field (1 octet), followed by a size field (one octet or eight octets) and a frame body of size octets. The size does not include the flags field, nor itself, so an empty frame has a size of zero.

A short frame has a body of 0 to 255 octets. A long frame has a body of 0 to 2^63-1 octets. 

The flags field consists of a single octet containing various control flags. Bit 0 is the least significant bit (rightmost bit):

* Bits 7-3: //Reserved//. Bits 7-3 are reserved for future use and MUST be zero.

* Bit 2 (COMMAND): //Command frame//. A value of 1 indicates that the frame is a command frame. A value of 0 indicates that the frame is a message frame.

* Bit 1 (LONG): //Long frame//. A value of 0 indicates that the frame size is encoded as a single octet. A value of 1 indicates that the frame size is encoded as a 64-bit unsigned integer in network byte order.

* Bit 0 (MORE): //More frames to follow//. A value of 0 indicates that there are no more frames to follow. A value of 1 indicates that more frames will follow. This bit SHALL be zero on command frames.

++++ Commands

Commands are used by the ZMTP implementation and not generally visible to the application except in some cases. Commands always consist of one frame, containing a printable command name, a null octet separator, and data. 

These are the commands that this specification defines:

* READY - implements the NULL security handshake, see "The NULL Security Mechanism" below.
* PING and PONG - heartbeat requests and replies, see "Connection Heartbeating" below.
* SUBSCRIBE and CANCEL - subscription management, see "The Publish-Subscribe Pattern", below.

ZMTP supports extensible security mechanisms and these may define their own commands. Security mechanisms MAY use any command names they need to.

++++ Messages

Messages carry application data and are not generally created, modified, or filtered by the ZMTP implementation except in some cases. Messages consist of one or more frames and an implementation SHALL always send and deliver messages atomically, that is, all the frames of a message, or none of them.

++ The NULL Security Mechanism

The NULL mechanism implements no authentication and no confidentiality. The NULL mechanism SHOULD NOT be used on public infrastructure without transport-level security (e.g. over a VPN).

To complete a NULL security handshake, both peers SHALL send each other a READY command. Both peers SHOULD then parse, and MAY validate the READY command. Either or both peer MAY then choose to close the connection if validation failed. A peer MAY start to send messages immediately after sending its metadata.

When a peer uses the NULL security mechanism, the as-server field MUST be zero. The following ABNF grammar defines the NULL security handshake:

[[code]]
null = ready *message
ready = command-size %d5 "READY" metadata
metadata = *property
property = name value
name = OCTET 1*255name-char
name-char = ALPHA | DIGIT | "-" | "_" | "." | "+"
value = 4OCTET *OCTET       ; Size in network byte order
[[/code]]

The message and command-size are defined by the ZMTP grammar previously explained.

The body of the READY command consists of a list of properties consisting of name and value as size-specified strings.

The name SHALL be 1 to 255 characters. Zero-sized names are not valid. The case (upper or lower) of names SHALL NOT be significant.

The value SHALL be 0 to 2,147,483,647 (2^31-1 or INT32_MAX in C/C++) octets of opaque binary data. Zero-sized values are allowed. The semantics of the value depend on the property. The value size field SHALL be four octets, in network byte order. Note that this size field will mostly not be aligned in memory.

Note that to avoid deadlocks, each peer MUST send its READY command before attempting to receive a READY from the other peer. In the NULL mechanism, peers are symmetric. Here is the command flow for two peers, P and R:

[[code]]
P:greeting          R:greeting
P:ready             R:ready
P:message...        R:message...
[[/code]]

+++ Connection Metadata

The security mechanism SHALL provide a way for peers to exchange metadata in the form of a key-value dictionary. The precise encoding of the metadata depends on the mechanism.

Metadata names SHALL be case-insensitive.

These metadata properties are defined:

* "Socket-Type", which specifies the sender's socket type. See the section "The Socket Type Property" below. The sender SHOULD specify the Socket-Type.

* "Identity", which specifies the sender's socket identity. See the section "The Identity Property" below. The sender MAY specify an Identity.

* "Resource", which specifies the a resource to connect to. See the section "The Resource Property" below. The sender MAY specify a Resource.

The implementation MAY provide other metadata properties such as implementation name, platform name, and so on. For interoperability, metadata names and semantics MAY be defined as RFCs.

Metadata names starting with "X-" SHALL be reserved for application use.

++++ The Socket-Type Property

The Socket-Type announces the ZeroMQ socket type of the sending peer. The Socket-Type SHALL match this grammar:

[[code]]
socket-type = "REQ" | "REP" 
            | "DEALER" | "ROUTER" 
            | "PUB" | "XPUB" 
            | "SUB" | "XSUB"
            | "PUSH" | "PULL"
            | "PAIR" 
[[/code]]

The peer SHOULD enforce that the other peer is using a valid socket type. This table shows the legal combinations (as "*"):

[[code]]
       | REQ | REP | DEALER | ROUTER | PUB | XPUB | SUB | XSUB | PUSH | PULL | PAIR |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
REQ    |     |  *  |        |   *    |     |      |     |      |      |      |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
REP    |  *  |     |   *    |        |     |      |     |      |      |      |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
DEALER |     |  *  |   *    |   *    |     |      |     |      |      |      |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
ROUTER |  *  |     |   *    |   *    |     |      |     |      |      |      |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
PUB    |     |     |        |        |     |      |  *  |  *   |      |      |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
XPUB   |     |     |        |        |     |      |  *  |  *   |      |      |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
SUB    |     |     |        |        |  *  |  *   |     |      |      |      |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
XSUB   |     |     |        |        |  *  |  *   |     |      |      |      |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
PUSH   |     |     |        |        |     |      |     |      |      |  *   |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
PULL   |     |     |        |        |     |      |     |      |  *   |      |      |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
PAIR   |     |     |        |        |     |      |     |      |      |      |  *   |
-------+-----+-----+--------+--------+-----+------+-----+------+------+------+------+
[[/code]]

When a peer validates the socket type, it MUST handle errors by silently disconnecting the other peer and possibly logging the error for debugging purposes.

++++ The Identity Property

A REQ, DEALER, or ROUTER peer connecting to a ROUTER MAY announce its identity, which is used as an addressing mechanism by the ROUTER socket. For all other socket types, the Identity property shall be ignored.

The Identity SHALL match this grammar:

[[code]]
identity = 0*255OCTET
[[/code]]

The first octet of the Identity SHALL NOT be zero: identities starting with a zero octet are reserved for implementations' internal use.

++++ The Resource Property

The Resource Property addresses the common problem of running multiple services (within a single process) on a single network endpoint. This is particularly desirable when using public-facing ports, which can be expensive to manage due to firewall issues. Without any protocol support, one service requires one port number to bind to. With protocol support, multiple services can share a single port.

ZMTP implements this sharing using the concept of "Resource". Resources are chosen by end-user applications and conceptually extend the zmq_bind or zmq_connect API calls. For example:

[[code]]
zmq_bind (server_socket, "tcp://eth0:6000/system/name-service/test");
[[/code]]

And:

[[code]]
zmq_connect (client_socket, "tcp://192.168.55.212:6000/system/name-service/test");
[[/code]]

Where the resource is then "system/name-service/test".

The Resource SHALL match this grammar:

[[code]]
resource = resource-name * ( "/" resource-name )
resource-name = 1*resource-char
resource-char = ALPHA | DIGIT | "$" | "-" | "_" | "@" | "." | "&" | "+"
[[/code]]

The implementation SHALL accept all resource requests, as a resource may become available at an arbitrary time after the connection has been established.

Since resource evaluation happens //after// security handshaking, all services that use the same endpoint SHALL use the same security credentials. One strategy for achieving this is:

* Create a listener for the first bind to a particular interface:port.
* Store security credentials on a per-listener basis.
* Handle new incoming connections on a per-listener basis.
* Resolve the resource name after security handshaking, and hand-off connections to the appropriate service.

++ Socket Semantics

In order to ensure interoperability, we aim to define both the socket-specific semantics of the API to calling applications, and the behavior of peers talking to each other over the network.

These rules apply to all sockets:

* All sockets SHALL accept connections (binding to an address) and make connections.

* All sockets SHALL establish connections opportunistically, that is: they connect to an endpoint asynchronously, and if the connection is broken, SHOULD reconnect after a suitable delay.

* A message SHALL be sent or received atomically; that is, all frames or none. On sending, the peer SHALL queue all frames of a message in memory until the final frame is sent.

* A message SHALL NOT be delivered more than once to any peer.

* All messages between two immediate peers SHALL be delivered in order.

+++ The Request-Reply Pattern

The implementation SHOULD follow http://rfc.zeromq.org/spec:28/REQREP for the semantics of REQ, REP, DEALER and ROUTER sockets. 

+++ The Publish-Subscribe Pattern

The implementation SHOULD follow http://rfc.zeromq.org/spec:29/PUBSUB for the semantics of PUB, XPUB, SUB and XSUB sockets. 

When using ZMTP, message filtering SHALL happen at the publisher side (the PUB or XPUB socket). To create a subscription, the SUB or XSUB peer SHALL send a SUBSCRIBE command, which has this grammar:

[[code]]
subscribe = command-size %d9 "SUBSCRIBE" subscription
subscription = *OCTET
[[/code]]

To cancel a subscription, the SUB or XSUB peer SHALL send a CANCEL command, which has this grammar:

[[code]]
cancel = command-size %d6 "CANCEL" subscription
[[/code]]

The subscription is a binary string that specifies what messages the subscriber wants. A subscription of "A" SHALL match all messages starting with "A". An empty subscription SHALL match all messages. 

Subscriptions SHALL be additive and SHALL NOT be idempotent. That is, subscribing to "A" and "" is the same as subscribing to "" alone. Subscribing to "A" and "A" counts as two subscriptions, and would require two CANCEL commands to undo.

+++ The Pipeline Pattern

The implementation SHOULD follow http://rfc.zeromq.org/spec:30/PIPELINE for the semantics of PUSH and PULL sockets. 

+++ The Exclusive Pair Pattern

The implementation SHOULD follow http://rfc.zeromq.org/spec:31/EXPAIR for the semantics of exclusive PAIR sockets.

++ Connection Heartbeating

ZMTP provides connection heartbeating to address some specific problems:

* Network connections can go stale and die without reporting TCP errors. By detecting the lack of incoming traffic, a peer can deduce that the connection has gone stale.

* Processes can become blocked, especially if they run out of memory. TCP will not report an error but as for a stale connection, the lack of traffic can be used to deduce a fatal error.

To invoke a single heartbeat, the peer MAY, at any point after the security handshake is complete, send a PING command:

[[code]]
ping = command-size %d4 "PING" ping-ttl ping-context
ping-ttl = 2OCTET
ping-context = *OCTET
[[/code]]

The ping-ttl is a 16-bit unsigned integer in network order that MAY be zero or MAY contain a time-to-live measured in tenths of seconds. The ping-ttl provides a strong hint to the other peer to disconnect if no further traffic is received after that time. The maximum TTL is 6553 seconds.

When a peer receives a PING command it SHALL respond with a PONG command that echoes the ping-context, which may be empty:

[[code]]
pong = command-size %d4 "PONG" ping-context
ping-context = *OCTET
[[/code]]

When a peer does not receive a reply after a reasonable interval, it MAY consider the connection dead, and close it. The interval SHOULD be selected to suit the relevant application use case. This time-out interval will usually be a small multiple of the PING interval.

Since PONG replies may be arbitrarily delayed behind already queued traffic, a peer SHOULD treat any incoming traffic (not just a PONG reply) as a sign of life.

To avoid PONG storms, a peer should not send more than a small number of PING commands without receiving PONG replies.

A peer SHOULD consider the connection dead in these cases:

* If it sends a PING command and does not receive any traffic within some timeout.
* If it receives a PING command with a non-zero TTL and then does not receive any further traffic within that TTL.

After a failed heartbeat, the peer SHOULD reconnect as normal - there is no specific strategy for recovery after such an event.

++ Backwards Interoperability

To detect and work with older versions of ZMTP, we define two strategies; one will detect only 2.0 peers, and the second will detect both 1.0 and 2.0 peers. 

When a peer does not need backwards interoperability, it SHOULD send its full greeting immediately upon establishing a new peer connection. 

+++ Detecting ZMTP 2.0 Peers

From [[http://rfc.zeromq.org/spec:13 ZMTP 2.0] onwards, the protocol contains the version number immediately after the signature. To detect and interoperate with ZMTP 2.0 peers, the implementation MAY use this strategy:

* Send the 10-octet signature followed by the major version number (the single octet %x03).

* Wait for the other peer to send its greeting.

* If the peer version number is 1 or 2, the peer is using ZMTP 2.0, so send the ZMTP 2.0 socket type and identity and continue with ZMTP 2.0.

* If the peer version number is 3 or higher, the peer is using ZMTP 3.0, so send the rest of the greeting and continue with ZMTP 3.0.

Here is the sequence of commands showing two ZMTP 3.0 peers using this algorithm:

[[code]]
C:signature + major-version             S:signature + major-version
C:rest of greeting                      S:rest of greeting
C:ready                                 S:ready 
C:message...                            S:message...
[[/code]]

+++ Detecting ZMTP 1.0 and 2.0 Peers

[http://rfc.zeromq.org/spec:15 ZMTP 1.0] did not have any version information. To detect and interoperate with a ZMTP 1.0 and 2.0 peers, an implementation MAY use this strategy:

* Send a 10-octet pseudo-signature consisting of "%xFF size %x7F" where 'size' is the number of octets in the sender's identity (0 or greater) plus 1. The size SHALL be 8 octets in network byte order and occupies the padding field.

* Read the first octet from the other peer.

* If this first octet is not %FF, then the other peer is using ZMTP 1.0, and has sent us a short frame length for its identity. We read that many octets.

* If this first octet is %FF, then we read nine further octets, and inspect the last octet (the 10th in total). If the least significant bit is 0, then the other peer is using ZMTP 1.0 and has sent us a long length for its identity. We read that many octets.

* If the least significant bit is 1, the peer is using ZMTP 2.0 or later and has sent us the ZMTP signature. We read a further octet, which indicates the ZMTP version. If this is 1 or 2, we have a ZMTP 2.0 peer. If this is 3, we have a ZMTP 3.0 peer.

* When we have detected a ZMTP 1.0 peer, we have already sent 10 octets, which the other peer interprets as the start of an identity frame. We continue by sending the body of the identity frame (zero or more octets). From then, we encode and decode all frames on that connection using the ZMTP 1.0 framing syntax.

* When we have detected a ZMTP 2.0 peer, we continue with version negotiation as explained above, by sending our version number and then socket type and identity as defined by the ZMTP 2.0 specification.

* When we have detected a ZMTP 3.0 peer, we continue by sending the remainder of the greeting (octets 10-64) and then continue with the security handshake as normal.

++ Worked Example

A DEALER client connects to a ROUTER server. Both client and server are running ZMTP and the implementation has backwards compatibility detection. The peers will use the NULL security mechanism to talk to each other.

* The client sends a partial greeting (11 octets) greeting to the server, and at the same time (before receiving anything from the client), the server also sends a partial greeting:

[[code]]
 signature                   major
+------+-------------+------+------+
| %xFF | %x00...%x00 | %x7F | %x03 |
+------+-------------+------+------+
   0        1 - 8       9      10
[[/code]]

* The client and server read the major version number (%x03) and send the rest of their greeting to each other:

[[code]]
 minor   mechanism  as-server  filler
+------+-----------+---------+-------------+
| %x00 |  "NULL"   |  %x00   | %x00...%x00 |
+------+-----------+---------+-------------+
   11     12 - 31      32        33 - 63
[[/code]]

* The client and server now perform the NULL security handshake. First the client sends a READY command to the server that specifies the "DEALER" Socket-Type and empty Identity properties:

[[code]]
+------+----+
| %x04 | 41 |
+------+----+
   0     1
 flags  size
     
+------+---+---+---+---+---+
| %x05 | R | E | A | D | Y |
+------+---+---+---+---+---+
   2     3   4   5   6   7 
 Command name "READY"
 
+----+---+---+---+---+---+---+---+---+---+---+---+
| 11 | S | o | c | k | e | t | - | T | y | p | e |
+----+---+---+---+---+---+---+---+---+---+---+---+
  8    9  10  11  12  13  14  15  16  17  18  19  
 Property name "Socket-Type"

+------+------+------+------+---+---+---+---+---+---+
| %x00 | %x00 | %x00 | %x06 | D | E | A | L | E | R |
+------+------+------+------+---+---+---+---+---+---+
   20     21     22     23    24  25  26  27  28  29  
 Property value "DEALER"
   
+----+---+---+---+---+---+---+---+---+
| 8  | I | d | e | n | t | i | t | y |
+----+---+---+---+---+---+---+---+---+
  30  31  32  33  34  35  36  37  38  
 Property name "Identity"
 
+------+------+------+------+
| %x00 | %x00 | %x00 | %x00 |
+------+------+------+------+
   39     40     41     42
 Property value ""
[[/code]]

* The server validates the socket type, accepts it, and replies with a READY command that contains only the Socket-Type property (ROUTER sockets do not send an identity):

[[code]]
+------+----+
| %x04 | 28 |
+------+----+
+------+---+---+---+---+---+
| %x05 | R | E | A | D | Y |
+------+---+---+---+---+---+
+----+---+---+---+---+---+---+---+---+---+---+---+
| 11 | S | o | c | k | e | t | - | T | y | p | e |
+----+---+---+---+---+---+---+---+---+---+---+---+
+------+------+------+------+---+---+---+---+---+---+
| %x00 | %x00 | %x00 | %x06 | R | O | U | T | E | R |
+------+------+------+------+---+---+---+---+---+---+
[[/code]]

As soon as the server has sent its own READY command, it may also send messages to the client. As soon as the client has received the READY command from the server, it may send messages to the server.

++ Wire Analysis of the Connection

A ZMTP connection is either clear text, or encrypted, according to the security mechanism. On a clear text connection, message data SHALL be sent as message frames. All clear text mechanisms SHALL use the message framing defined in this specification, so that wire analysis does not require knowledge of the specific mechanism.

On an encrypted connection, message data SHALL be encoded as commands so that wire analysis is not possible.

On all ZMTP connections, command names SHALL be visible and command frames SHALL be printable.

++ Security Considerations

* Attackers may overwhelm a peer by repeated connection attempts. Thus, a peer MAY log failed accesses, and MAY detect and block repeated failed connections from specific originating IP addresses.

* Attackers may try to cause memory exhaustion in a peer by holding open connections. Thus, a peer MAY allocate memory to a connection only after the security handshake is complete, and MAY limit the number and cost of in-progress handshakes.

* Attackers may try to make small requests that generate large responses back to faked originating addresses (the real target of the attack). This is called an "amplification attack". Thus, peers MAY limit the number of in-progress handshakes per originating IP address.

* Attackers may try to uncover vulnerable versions of an implementation from its metadata. Thus, ZMTP sends metadata //after// security handshaking.

* Attackers can use the size of messages (even without breaking encryption) to collect information about the meaning. Thus, on an encrypted connection, message data SHOULD be padded to a randomized minimum size.

* Attackers can use the simple presence or absence of traffic to collect information about the peer ("Person X has come on line"). Thus, on an encrypted connection, the peer SHOULD send random garbage data ("noise") when there is no other traffic. To fully mask traffic, a peer MAY throttle messages so there is no visible difference between noise and real data.

++ Topics for Discussion

This draft is open for discussion. The following topics at least are still on the table:

* If we move the socket type back into the greeting, we can formalize the grammar to include things like subscription messages for XSUB sockets. This is rather hard when socket types are stored as metadata.

* Do we need strategies to protect REQ sockets against blocking when a peer dies?

* Do we need strategies to protect REP sockets against malformed requests?

* Do we want DEALER and PUSH sockets to requeue undeliverable messages to other peers? This can improve reliability but results in out-of-order messages, and is not robust against messages lost in transit, or already delivered to the other peer.

* Credits, as a signaling mechanism for asynchronous flow control. This allows fine control over the amount of queued data. In this scenario, the receiving peer (DEALER, PULL, REP) would send credit which would be used up by messages routed to it. The minimal use-case is for PUSH/DEALER-to-PULL/DEALER round-robin routing. Credit could be octets, or messages.

* Explicit rejection message in case of failed security negotiation. This would give peers more help in deciding whether to retry or return a fatal error to their application. The value in not giving explicit errors on security failures is that it makes life harder for attackers.