Internet-Draft The R5N Distributed Hash Table January 2022
Schanzenbach, et al. Expires 20 July 2022 [Page]
Workgroup:
Independent Stream
Internet-Draft:
draft-schanzen-r5n-00
Published:
Intended Status:
Informational
Expires:
Authors:
M. Schanzenbach
GNUnet e.V.
C. Grothoff
Berner Fachhochschule
B. Fix
GNUnet e.V.

The R5N Distributed Hash Table

Abstract

This document contains the R5N DHT technical specification.

This document defines the normative wire format of resource records, resolution processes, cryptographic routines and security considerations for use by implementers.

This specification was developed outside the IETF and does not have IETF consensus. It is published here to guide implementation of R5N and to ensure interoperability among implementations.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 20 July 2022.

Table of Contents

1. Introduction

FIXME: Here we should also cite and discuss RELOAD (https://datatracker.ietf.org/doc/html/rfc6940) and establish why we need this spec and are not a "Topology plugin" in RELOAD. The argumentation revolves around the trust model (openness) and security aspects (path signatures).

Distributed Hash Tables (DHTs) are a key data structure for the construction of completely decentralized applications. DHTs are important because they generally provide a robust and efficient means to distribute the storage and retrieval of key-value pairs.

While [RFC6940] already provides a peer-to-peer (P2P) signaling protocol with extensible routing and topology mechanisms, it also relies on strict admission control through the use of either centralized enrollment servers or pre-shared keys. Modern decentralized applications require a more open system that enables ad-hoc participation and other means to prevent common attacks on P2P overlays.

This document contains the technical specification of the R5N DHT [R5N], a secure DHT routing algorithm and data structure for decentralized applications. R5N is an open P2P overlay routing mechanism which supports ad-hoc participation and security properties including support for topologies in restricted-route environments and path signatures.

This document defines the normative wire format of peer-to-peer messages, routing algorithms, cryptographic routines and security considerations for use by implementors.

1.1. Requirements Notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.2. Structure of This Document

  • Section X defines...

2. Terminology

Peer:
A host that is participating in the overlay. Peers are responsible for holding some portion of the data that has been stored in the overlay, and they are responsible for routing messages on behalf of other hosts as needed by the Routing Algorithm.
Peer Key:
The Peer Key is the identifier used on the Overlay to address a peer.
Peer ID:
The Peer ID is the identity which is used to authenticate a peer in the underlay. The Peer Address is derived from the Peer ID.
Neighbour:
A neighbour is a node which is directly connected to our node.
Block:
An object or group of objects stored in the DHT.
Block-Type:
A unique 32-bit value identifying a block type. Block-Types are either private or allocated by GANA (see Section 12).
Block Storage
The Block Storage component is used to persist and manage data by nodes. It includes logic for quotas, caching stragegies and data validation.
Responsible Peer:
The peer N that is responsible for a specific resource K, as defined by the SelectClosestNode(K, N) algorithm (see Section 7.
Applications
Applications are components which directly use the DHT overlay interfaces. Possible applications include the GNU Name System [I-D.draft-schanzen-gns] or the CADET transport system [cadet].
Overlay Interface
The Overlay Interface exposes the core operations of the DHT overlay to applications. This includes querying and retrieving data from the DHT.
Message Processing
The Message Processing component processes requests from and responses to applications as well as messages from the underlay network.
Routing
The Routing component includes the routing table as well as routing and node selection logic. It facilitates the R5N routing algorithm with required data structures and algorithms.
Underlay Interface
The DHT Underlay Interface is an abstraction layer on top of the supported links of a node. Nodes may be linked by a variety of different transports, including "classical" protocols such as TCP, UDP and TLS or advanced protocols such as GNUnet, L2P or Tor.

3. Architecture

R5N is an overlay network with a pluggable transport layer. The following figure shows the R5N architecture.

             |  +-----------------+  +-------+
Applications |  | GNU Name System |  | CADET |  ...
             |  +-----------------+  +-------+
-------------+------------------------------------ Overlay Interface
             |  ^
             |  |   +---------------+
             |  |   | Block Storage |
             |  |   +---------------+
             |  |    ^
R5N          |  v    v
             | +--------------------+    +---------+
             | | Message Processing |<-->| Routing |
             | +--------------------+    +---------+
             |  ^                          ^
             |  v                          v
-------------+------------------------------------ Underlay Interface
             | +--------+  +--------+
             | |GNUnet  |  |IP      |  ...
Connectivity | |Underlay|  |Underlay|
             | |Link    |  |Link    |
             | +--------+  +--------+

Figure 1

Other glossary

4. Overlay

In the DHT overlay, a node is addressable by its Node Address. The Node Address is a SHA-512 hash [RFC4634] of the Node ID. The Node ID is the public key of the corresponding Ed25519[ed25519] node private key.

An implementation of this specification commonly exposes the two API procedures "GET" and "PUT". The following are non-normative examples of such APIs and their behaviour are detailed in order to give implementers a fuller picture of the protocol.

4.1. The GET procedure

A basic GET procedure may be exposed as:

GET(Query-Key) -> Results as List

The procedure requires at least a Query-Key to initiate a lookup:

QueryKey:
the key to look for in the DHT.

The procedure may allow a set of optional parameters in order to control or modify the query:

Block-Type:
the type of block to look for.
Replication-Level:
An integer which controls how many nearest peers the request should reach.
Route-Options:
Flags that are used in order to indicate certain processing requirements for messages. Any combination of options as defined in Section 8.1 may be specified.
Extended-Query:
is extended query medatadata which may be required depending on the respective Block-Type. A Block-Type must define if the XQuery can or must be used and what the specific format of its contents should be. See also Section 9.3.
Result-Filter:
allows to indicate results which are not relevant anymore to the caller (see Section 8.2).

If the procedure is implemented synchronuously, it may return a list of results. If it is implemented asynchronuously, it may return individual results. A single result commonly consists of:

Block-Type:
the type of block in the result.
Block-Data:
the block payload. Contents are defined by the Block-Type.
Expiration:
the duration of validity of the result.
Key:
the key of the result. This may be different from the Query-Key, for example if a flag for approximate matches was set.
GET-Path:
is a signed path the query took through the network.
PUT-Path:
is a signed path the PUT-Request of this data took through the network.

4.2. The PUT procedure

A PUT procedure may be exposed as:

PUT(Key, Block)

The procedure takes at least two parameters:

Key:
the key under which to store the block.
Block:
the block to store.

The procedure may allow a set of optional parameters in order to control or modify the query:

Block-Type:
the type of the block to store.
Replication-Level:
An integer which controls how many nearest peers the request should reach.
Route-Options:
Flags that are used in order to indicate certain processing requirements for messages. Any combination of options as defined in Section 8.1 may be specified.
Block-Expiration
is the requested expiration date for the block payload.

The procedure does not necessarily output any information.

5. Underlay

In the network underlay, a node is addressable by traditional means out of scope of this document. For example, the node may have a TCP/IP address, or a HTTPS endpoint. While the specific addressing options and mechanisms are out of scope for this document, it is necessary to define a universal addressing format in order to facilitate the distribution of connectivity information to other nodes in the DHT overlay. This format is the "HELLO" message.

It is expected that there are basic mechanisms available to manage node connectivity and addressing. The required functionality are abstracted through the following procedures:

TRY_CONNECT(N, A)
A function which allows the local node to attempt the establishment of a connection to another node N using an address A. When the connection attempt is successful, information on the new peer is offered through the PEER_CONNECTED signal.
HOLD(P)
A function which tells the underlay to keep a hold on the connection to a peer P. FIXME what is this needed for?
DROP(P)
A function which tells the underlay to drop the connection to a peer P. FIXME what is this needed for?
M = RECEIVE(P)
A function or event that allows the local node to receive a protocol message M as defined in this document from a peer P.
SEND(P, M)
A function that allows the local node to send a protocol message M as defined in this document to a peer P. If call to SEND fails, the message has not been sent.
S = ESTIMATE_NETWORK_SIZE()
A procedure that provides estimates on the network size S for use in the DHT routing algorithms. FIXME: What is S and give an example.

In addition to the above procedures, which are meant to be actively executed by the implementation as part of the peer-to-peer protocol, the following callbacks or signals drive updates of the routing table:

PEER_CONNECTED -> P
is a signal that allows the DHT to react to a newly connected peer P. Such an event triggers, for example, updates in the routing table.
PEER_DISCONNECTED -> P
is a signal that allows the DHT to react to a recently disconnected peer. Such an event triggers, for example, updates in the routing table.
ADDRESS_ADDED -> A
The underlay signals us that an address A was added for our local peer. This information is used to advertise connectivity information to the local peer. A is a string suitable for inclusion in a HELLO payload Section 9.3.1.
ADDRESS_DELETED -> A
The underlay signals us that an address A was removed. This information is used, for example, to no longer advertise this address.

6. Bootstrapping

Initially, the implementation depends upon either the Underlay to provide at least one initial connection to a peer signalled through PEER_CONNECTED, or the application/end-user providing at least one working HELLO to the DHT or the Underlay for bootstrapping. While details on how the first connection is established MAY depend on the specific implementation, this SHOULD usually be done by an out-of-band exchange of the information from a HELLO block. For this, section TBD specifies a URL format for encoding HELLO blocks as text strings which SHOULD be supported by implementations.

Regardless of how the initial connections are established, the peers resulting from these initial connections are subsequently stored in the routing table component Section 7.3.

Further, the Underlay must provide the implementation with one or more addresses signalled through ADDRESS_ADDED. The implementation then proceeds to periodically advertise all active addresses in a HELLO block Section 9.3.1.

In order to find more close nodes in the network, an implementation MUST now periodically Section 7.2 messages.

In both cases the frequency of advertisements and peer discovery MAY be adapted according to network conditions, connected peers, workload of the system and other factors which are at the discretion of the developer.

Any implementation encountering a HELLO GET request initially sends its own node address if it.

7. Routing

7.1. Peer Storage

A R5N implementation must store the information on connected nodes and update changes accordingly in a local persistance component such as a database. Upon receiving a connection notification from the Underlay through NODE_CONNECTED, information on the new peer is added to the local peer storage. When disconnect is indicated by the Underlay through NODE_DISCONNECTED the peer MUST be removed from the local peer storage. In order to achieve O(log n) routing performance, the data structure for managing connected nodes and their metadata MUST be implemented using the k-buckets concept of [Kademlia]. In order to select nodes which are suitable destinations for routing messages, R5N uses a hybrid approach: Given an estimated network size N, the node selection for the first N hops is random. After the initial N hops, node selection follows an XOR-based node distance calculation.

7.2. Peer Discovery

To build its routing table, a peer will send out requests asking for blocks of type HELLO using its own location as the key, but filtering its own HELLO via the Bloom filter. These requests MUST use the FindApproximate and DemultiplexEverywhere options. FindApproximate will ensure that other nodes will reply with keys they merely consider close-enough, while DemultiplexEverywhere will cause each peer on the path to respond, which is likely to yield HELLOs of peers that are useful somewhere in the routing table.

To facilitate peer discovery, each peer MUST broadcast its own HELLO data to all nodes in the routing table periodically. Whenever a peer receives such a HELLO message from another node, it must cache it as long as that node is in its routing table (or until the HELLO expires) and serve it in response to Peer Discovery requests. Details about the format of the HELLO message are given in section p2p_hello_wire.

7.3. Routing Table

The routing table consists of an array of lists of connected nodes. The i-th list stores neighbours whose identifiers are between distance 2^i 2^(i+1) from the local node. An implementation MAY choose an upper limit on the length of the lists, but SHOULD try to keep at least 5 entries per bucket. For example, in case of system constraints with respect to active connections, an implementation SHOULD evict nodes in large buckets rather than nodes from shallow ones.

As the message traverses a random path through the network for the first N hops, it is essential that routing loops are avoided. In R5N, a bloomfilter is used as part of the routing metadata in messages. The bloomfilter is updates at each hop with the hops node identity. For the next hop selection in both the random and the deterministic case, any node which is in the bloomfilter for the respective message is not included in the node selection process.

The R5N routing component MUST implement the following functions:

GetDistance(A, B) -> Distance as Integer
this function calculates the binary XOR between A and B. The resulting distance is interpreted as an integer where the leftmost bit is the most significant bit.
SelectClosestNode(K, B) -> N
This function selects the connected node N from our routing table with the shortest XOR-distance to the key K. This means that for all other nodes N' in the routing table GetDistance(N, K) < GetDistance(N',K). Nodes in the bloomfilter B are not considered.
SelectRandomNode(B) -> N
This function selects a random node N from all connected nodes. Nodes in the bloomfilter B are not considered.
SelectNode(K, H, B) -> N
This function selects a node N depending on the number of hops H parameter. If H < NETWORK_SIZE_ESTIMATE this function MUST return SelectRandomNode() and SelectClosestnode(K) otherwise. Nodes in the bloomfilter B are not considered.
IsClosestNode(N, K, B) -> true | false
checks if N is the closest node for K (cf. SelectClosestNode(K)). Nodes in the bloomfilter B are not considered.

8. Message Processing

The implementation MUST listen for RECEIVE(P, M) signals from the Underlay and respond to the respective messages sent by the peer P. In the following, the wire formats of the messages and the required processing are detailed. The local node address is referred to as N.

8.1. Route Options

The RouteOptions consist of the following flags which are represented in an options field in the messages. Each flag is represented by a bit in the field starting from 0 as the rightmost bit to 15 as the leftmost bit. FIXME: Actually, we set those bits and then store the resulting value in NBO...

0: Demultiplex-Everywhere
indicates that each node along the way should process the request.
1: Record-Route
indicates to keep track of the route that the message takes in the P2P network.
2: Find-Approximate
This is a special flag which modifies the message processing to allow approximate results.
3-15: Reserved
For future use.

8.2. Bloomfilter

In order to prevent circular routes, GET and PUT messages contain a 128-bit Bloom filter (m=128). The Bloom filter is used to detect duplicate node addresses along the route. A Bloom filter "bf" is initially empty, consisting only of zeroes. There are two functions which can be invoked on the Bloom filter: BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element which is to be added to the Bloom filter or queried against the set. Any bloom filter uses k=16 different hash functions each of which is defined as follows:

8.3. Extended query

TODO: Talk about XQuery in the context of messages.

8.4. HelloMessage

8.4.1. Wire Format

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|  MSIZE    |   MTYPE   | RESERVED  | URL_CTR   |
+-----+-----+-----+-----+-----+-----+-----+-----+
+-----+-----+-----+-----+-----+-----+-----+-----+
|                    SIGNATURE                  /
/                   (64 byte)                   |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                    EXPIRATION                 |
+-----+-----+-----+-----+-----+-----+-----+-----+
/ ADDRESSES (variable length)                   /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 2

where:

MSIZE
denotes the size of this message in network byte order.
MTYPE
is the 16-bit message type. This type can be one of the DHT message types, but for HELLO messages it must be set to the value 157 in network byte order.
RESERVED
is a 16-bit field that must be zero.
URL_CTR
is a 16-bit number that gives the total number of addresses encoded in the ADDRESSES field. In network byte order.
SIGNATURE
is a 64 byte EdDSA signature using the sender's private key affirming the information contained in the message. The signature is signing exactly the same data that is being signed in a HELLO block as described in section XXX.
EXPIRATION
denotes the absolute 64-bit expiration date of the content. The value specified is microseconds since midnight (0 hour), January 1, 1970, but must be a multiple of one million (so that it can be represented in seconds in a HELLO URL). Stored in network byte order.
ADDRESSES
A sequence of exactly URL_CTR 0-terminated URIs in UTF-8 encoding representing addresses of this peer. Each URI must begin with a non-empty URI schema terminated by "://" and followed by some non-empty Underlay-specific address encoding.

8.4.2. Processing

Upon receiving a HelloMessage from a peer P. An implementation MUST process it step by step as follows:

  1. If P is not in its routing table, the message is discarded.
  2. The signature is verified, including a check that the expiration time is in the future. If the signature is invalid, the message is discarded.
  3. The HELLO information is cached in the routing table until it expires, the peer is removed from the routing table, or the information is replaced by another message from the peer.

8.5. PutMessage

8.5.1. Wire Format

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|  MSIZE    |   MTYPE   |         BTYPE         |
+-----+-----+-----+-----+-----+-----+-----+-----+
|  OPTIONS  | HOPCOUNT  | REPL_LVL  | PATH_LEN  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                    EXPIRATION                 |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   BLOOMFILTER                 /
/                 (128 byte)                    |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                      KEY                      /
/                 (64 byte)                     |
+-----+-----+-----+-----+-----+-----+-----+-----+
/              PUTPATH (variable length)        /
+-----+-----+-----+-----+-----+-----+-----+-----+
/              BLOCK (variable length)        /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 3

where:

MSIZE
denotes the size of this message in network byte order.
MTYPE
is the 16-bit message type. This type can be one of the DHT message types but for put messages it must be set to the value 146 in network byte order.
BTYPE
is a 32-bit block type field. The block type indicates the content type of the payload. In network byte order.
OPTIONS
is a 16-bit options field (see below).
HOPCOUNT
is a 16-bit number indicating how many hops this message has traversed to far. In network byte order.
REPL_LVL
is a 16-bit number indicating the desired replication level of the data. In network byte order.
PATH_LEN
is a 16-bit number indicating the length of the PUT path recorded in PUTPATH. As PUTPATH is optional, this value may be zero. In network byte order.
EXPIRATION
denotes the absolute 64-bit expiration date of the content. In microseconds since midnight (0 hour), January 1, 1970 in network byte order.
BLOOMFILTER
A bloomfilter (for node addresses) to stop circular routes.
KEY
The key under which the PUT request wants to store content under.
PUTPATH
the variable-length PUT path. The path consists of a list of PATH_LEN node addresses.
BLOCK
the variable-length block payload. The contents are determined by the BTYPE field.

8.5.2. Processing

Upon receiving a PutMessage from a peer P. An implementation MUST process it step by step as follows:

  1. The EXPIRATION field is evaluated. If the message is expired, it MUST be discarded.
  2. If the BTYPE is not supported by the implementation, no validation of the block payload is performed and processing continues at (4). Else, the block MUST be validated as defined in (3).
  3. The block payload of the message is evaluated using according to the BTYPE using the respective ValidateBlockStoreRequest procedure. If the block payload is invalid or does not match the key, it MUST be discarded.
  4. The node address of the sender peer P SHOULD be in BLOOMFILTER. If not, the implementation MAY log an error, but MUST continue.
  5. If the RecordRoute flag is set in OPTIONS, the local node address MUST be appended to the PUTPATH of the message.
  6. If the local node is the closest node (cf. IsClosestNode(N, Key)) or the DemultiplexEverywhere options flag ist set, the message MUST be stored locally in the block storage.
  7. Given the value in REPL_LVL, the number of peers to forward to MUST be calculated. If there is at least one peers to forward to, the implementation SHOULD select up to this number of peers to forward the message to. The implementation MAY forward to fewer or no peers in order to handle resource constraints such as bandwidth. Finally, the local node address MUST be added to the BLOOMFILTER of the forwarded message. For all peers with node address P chosen to forward the message to, SEND(P, PutMessage) is called.

8.6. GetMessage

8.6.1. Wire Format

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|  MSIZE    |   MTYPE   |         BTYPE         |
+-----+-----+-----+-----+-----+-----+-----+-----+
|  OPTIONS  |  HOPCOUNT | REPL_LVL  |  XQ_SIZE  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 BLOOMFILTER                   /
/                 (128 byte)                    |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                       KEY                     /
/                 (64 byte)                     |
+-----+-----+-----+-----+-----+-----+-----+-----+
/     BF_MUTATOR        |   XQUERY              /
+-----+-----+-----+-----+                       /
/                 (variable length)             /
+-----+-----+-----+-----+-----+-----+-----+-----+
/              BF_RESULT (variable length)      /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 4

where:

MSIZE
denotes the size of this message in network byte order.
MTYPE
is the 16-bit message type. It must be set to the value 147 in network byte order.
BTYPE
is a 32-bit block type field. The block type indicates the content type of the payload. In network byte order.
OPTIONS
is a 16-bit options field (see below).
HOPCOUNT
is a 16-bit number indicating how many hops this message has traversed to far. In network byte order.
REPL_LVL
is a 16-bit number indicating the desired replication level of the data. In network byte order.
XQ_SIZE
is a 32-bit number indicating the length of the optional extended query XQUERY. In network byte order.
BLOOMFILTER
A bloomfilter (for node identities) to stop circular routes.
KEY
The key under which the PUT request wants to store content under.
XQUERY
the variable-length extended query. Optional.
BF_MUTATOR
The 32-bit bloomfilter mutator for the result bloomfilter.
RESULT_BF
the variable-length result bloomfilter.

8.6.2. Processing

Upon receiving a GetMessage from a peer an implementation MUST process it step by step as follows:

  1. The KEY and XQUERY is validated against the requested BTYPE as defined by its respective ValidateBlockQuery procedure. If the BTYPE is not supported, or if the block key does not match or if the XQUERY is malformed, the message MUST be discarded.
  2. The node address of the sender peer P SHOULD be in the BLOOMFILTER. If not, the implementation MAY log an error, but MUST continue.
  3. If the local node is the closest node (cf. IsClosestNode (N, Key)) or the DemultiplexEverywhere options flag is set, a reply MUST be produced (if one is available) using the following steps:

    1. If TYPE indicates a request for a HELLO block, the peer MUST consult the HELLOs it has cached for the peers in its routing table instead of the local block storage (while continuing to respect options like DemultiplexEverywhere and FindApproximate).
    2. If OPTIONS indicate a FindApproximate request, the peer should respond with the closest block it has that is not filtered by the RESULT_BF.
    3. Else, the peer should only respond if it has a block that matches the key exactly and that is not filtered by the RESULT_BF.
    4. Any resulting block must be encapsulated in a ResultMessage and transmitted to the neighbor from which the request was received.
  4. FIXME: We only handle if not GNUNET_BLOCK_EVALUATION_OK_LAST. This means that we must evaluate the Reply produced in the previous step using ValidateBlockReply for this BTYPE
  5. Given the value in REPL_LVL, the number of nodes to forward to MUST be calculated (NUM-FORWARD-nodeS). If there is at least one node to forward to, the implementation SHOULD select up to this number of nodes to forward the message to. The implementation MAY forward to fewer or no nodes in order to handle resource constraints such as bandwidth. The message BLOOMFILTER MUST be updated with the local node address N. For all peers with node address P' chosen to forward the message to, SEND(P', PutMessage) is called.

8.7. ResultMessage

8.7.1. Wire Format

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|  MSIZE    |   MTYPE   |        BTYPE          |
+-----+-----+-----+-----+-----+-----+-----+-----+
|   //      | OPTIONS   | PUTPATH_L | GETPATH_L |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   EXPIRATION                  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                      KEY                      /
/                 (64 byte)                     |
+-----+-----+-----+-----+-----+-----+-----+-----+
/                    PUTPATH                    /
/                 (variable length)             /
+-----+-----+-----+-----+-----+-----+-----+-----+
/                    GETPATH                    /
/                 (variable length)             /
+-----+-----+-----+-----+-----+-----+-----+-----+
/                   BLOCK                       /
/              (variable length)                /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 5

where:

MSIZE
denotes the size of this message in network byte order.
MTYPE
is the 16-bit message type. This type can be one of the DHT message types but for put messages it must be set to the value 148 in network byte order.
OPTIONS
is a 16-bit options field (see below).
BTYPE
is a 32-bit block type field. The block type indicates the content type of the payload. In network byte order.
PUTPATH_L
is a 16-bit number indicating the length of the PUT path recorded in PUTPATH. As PUTPATH is optiona, this value may be zero. In network byte order.
GET_PATH_LEN
is a 16-bit number indicating the length of the GET path recorded in GETPATH. As PUTPATH is optiona, this value may be zero. In network byte order.
EXPIRATION
denotes the absolute 64-bit expiration date of the content. In microseconds since midnight (0 hour), January 1, 1970 in network byte order.
KEY
The key under which the PUT request wants to store content under.
PUTPATH
the variable-length PUT path. The path consists of a list of PATH_LEN node addresses.
GETPATH
the variable-length PUT path. The path consists of a list of PATH_LEN node addresses.
BLOCK
the variable-length resource record data payload. The contents are defined by the respective type of the resource record.

8.7.2. Processing

Upon receiving a ResultMessage from a connected node. An implementation MUST process it step by step as follows:

  1. The EXPIRATION field is evaluated. If the message is expired, it MUST be discarded.
  2. If the MTYPE of the message indicates a HELLO block, it must be validated according to Section 9.3.1. The payload MUST be considered for the local routing table by trying to establish a connection to the node using the information from the HELLO block. If a connection can be established, the node is added to the KBuckets routing table.
  3. If the sender node of the message is already found in the GETPATH, the path MUST be truncated at this position. The implementation MAY log a warning in such a case.
  4. If the KEY of this ResultMessage is found in the list of pending local queries, the KEY and XQUERY are validated against the requested BTYPE using the respective block type implementation of ValidateBlockReply. If the BTYPE is not supported, or if the block key does not match the BTYPE or if the XQUERY is malformed, the message MUST be discarded.
  5. The implementation MAY cache RESULT messages.
  6. If requests by other nodes for this KEY or BTYPE are known, the result block is validated against each request using the respective ValidateBlockReply function.

    If the request options include FindApproximate and the result message block type is HELLO the block validation must use the key derived using DeriveBlockKey as the key included in the request is only approximate. (FIXME: So we extract the key to then check it again against the block? This does not make sense...)

    If the result message block cannot be verified against the KEY of the result message or if BLOCK is malformed, the message MUST be discarded.

    For each pending request the reply is routed to the requesting node N'. FIXME routed to node or forwarded to peer?

9. Block Storage

9.1. Block Processing

RequestEvaluationResult

REQUEST_VALID
Query is valid, no reply given.
REQUEST_INVALID
Query format does not match block type. For example, XQuery not given or of size of XQuery is not appropriate for type.

ReplyEvaluationResult

OK_MORE
Valid result, and there may be more.
OK_LAST
Last possible valid result.
OK_DUPLICATE
Valid result, but duplicate.
RESULT_INVALID
Invalid result. Block does not match query. Value = 4.
RESULT_IRRELEVANT
Block does not match xquery. Valid result, but not relevant for the request.

9.2. Block Functions

Any block type implementation MUST implement the following functions.

ValidateBlockQuery(Key, XQuery) -> RequestEvaluationResult
is used to evaluate the request for a block. It is used as part of GetMessage processing, where the block payload is still unkown, but the block XQuery (FIXME: Undefined here) and Key can and MUST be verified, if possible.
ValidateBlockStoreRequest(Block, Key) -> RequestEvaluationResult
is used to evaluate a block including its key and payload. It is used as part of PutMessage processing. The validation MUST include a check of the block payload against the Key under which it is requested to be stored.
ValidateBlockReply(Block, XQuery, Key) -> ReplyEvaluationResult
is used to evaluate a block including its Key and payload. It is used as part ResultMessage processing. The validation of the respective Block requires a pending local query or a previously routed request of another node and its associated XQuery data and Key. The validation MUST include a check of the block payload against the key under which it is requested to be stored.
DeriveBlockKey(Block) -> Key
is used to synthesize the block key from the block payload and metadata. It is used as part of FIND-node message processing.
FilterResult(Block, XQuery, Key) -> ReplyEvaluationResult
is used to filter results stored in the local block storage for local queries. Locally stored blocks from previously observed ResultMessages and PutMessages MAY use this function instead of ValidateBlockReply in order to avoid revalidation of the block and only perform filtering based on request parameters.

9.3. Block Types

Applications can and should define their own block types. The block type determines the format and handling of the block payload by nodes in PUT and RESULT messages. Block types MUST be registered with GANA Section 12.

For bootstrapping and node discovery, the DHT implementation uses its own block type called "HELLO". A block with this block type contains the NodeID of the node initiating the GET request.

9.3.1. HELLO

The HELLO block type wire format is illustrated in Figure 6. A query for block of type HELLO MUST NOT include extended query data (XQuery). Any implementation encountering a HELLO block with XQuery data MUST consider the block invalid and ignore it.

0   8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|                     PEER-ID                   |
|                    (32 byte)                  |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                    SIGNATURE                  |
|                    (64 byte)                  |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   EXPIRATION                  |
+-----+-----+-----+-----+-----+-----+-----+-----+
/                   ADDRESSES                   /
/               (variable length)               /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 6
PEER-ID
is the Peer-ID of the node which has generated this HELLO.
SIGNATURE
is the signature of the HELLO.
EXPIRATION
denotes the absolute 64-bit expiration date of the HELLO. In microseconds since midnight (0 hour), January 1, 1970 in network byte order.
ADDRESSES
is a list of UTF-8 [RFC3629] URIs [RFC3986] which can be used as addresses to contact the peer. The strings MUST be 0-terminated.

The SIGNATURE covers a 64-bit pseudo header conceptually prefixed to the block. The pseudo header includes the expiration, signature purpose and a hash over the addresses. The wire format is illustrated in Figure 7.

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|         SIZE          |       PURPOSE         |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   EXPIRATION                  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   H_ADDRS                     |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 7

The Wire Format of the HELLO for Signing.

SIZE
A 32-bit value containing the length of the signed data in bytes in network byte order. The length of the signed data MUST be 80 bytes.
PURPOSE
A 32-bit signature purpose flag. This field MUST be 40 (in network byte order).
EXPIRATION
denotes the absolute 64-bit expiration date of the HELLO. In microseconds since midnight (0 hour), January 1, 1970 in network byte order.
H_ADDRS
a hash over the addresses in the HELLO.

H_ADDRS is generated over the ADDRESSES field as provided in the HELLO block using SHA-512 [RFC4634].

A HELLO reply block MAY be empty. Otherwise, it contains the HELLO of a node.

The ADDRESSES part of the HELLO indicate endpoints which can be used by the Underlay in order to establish a connection with the node identified by Peer-ID. An example of an addressing scheme used throughout this document is "ip+tcp", which refers to a standard TCP/IP socket connection. The "hier"-part of the URI must provide a suitable address for the given addressing scheme. The following is a non-normative example of address strings:

ip+udp://1.2.3.4:6789 \
gnunet+tcp://12.3.4.5/ \
Figure 8

10. Security Considerations

An implementation MAY limit the number the number of neighbours it stores is any k-bucket of the routing table. However, the lower bound MUST be adhered to. If there is a limit to the maximum number of neighbours in a k-bucket, the implementation must be careful when choosing the which nodes to replace: For example, a simple optimization of the peer selection algorithm may be to oder all nodes within the k-bucket by distance and evict the nodes which are the farthest away. However, this is not a good idea from a resilience perspective. It is much more important to preserve older connections than closer ones. Preferring long-term connections over close connections makes it more difficult for an attacker to execute a Sibyl attack as more resources need to be invested over time.

11. IANA Considerations

TODO: URI handler for "common" URI handler that Underlays may want to use as part of HELLOs.

12. GANA Considerations

GANA [GANA] is requested to create a "DHT Block Types" registry. The registry shall record for each entry:

The registration policy for this sub-registry is "First Come First Served", as described in [RFC8126]. GANA is requested to populate this registry as follows:

Number| Name   | Contact | References | Description
------+--------+---------+------------+-------------------------
0       ANY      N/A       [This.I-D]   Reserved
7       HELLO    N/A       [This.I-D]   Type of a block that contains
                                        a HELLO for a node
11      GNS      N/A       GNS          Block for storing record data
Figure 9

GANA is requested to amend the "GNUnet Signature Purpose" registry as follows:

Purpose | Name            | References | Description
--------+-----------------+------------+---------------
Figure 10

13. Local Storage

14. Test Vectors

15. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/info/rfc2119>.
[RFC3629]
Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, , <https://www.rfc-editor.org/info/rfc3629>.
[RFC3986]
Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, , <https://www.rfc-editor.org/info/rfc3986>.
[RFC4634]
Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms (SHA and HMAC-SHA)", RFC 4634, DOI 10.17487/RFC4634, , <https://www.rfc-editor.org/info/rfc4634>.
[RFC4648]
Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, , <https://www.rfc-editor.org/info/rfc4648>.
[RFC6940]
Jennings, C., Lowekamp, B., Ed., Rescorla, E., Baset, S., and H. Schulzrinne, "REsource LOcation And Discovery (RELOAD) Base Protocol", RFC 6940, DOI 10.17487/RFC6940, , <https://www.rfc-editor.org/info/rfc6940>.
[RFC8126]
Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, , <https://www.rfc-editor.org/info/rfc8126>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/info/rfc8174>.
[ed25519]
Bernstein, D., Duif, N., Lange, T., Schwabe, P., and B. Yang, "High-Speed High-Security Signatures", , <http://link.springer.com/chapter/10.1007/978-3-642-23951-9_9>.
[CrockfordB32]
Douglas, D., "Base32", , <https://www.crockford.com/base32.html>.
[GANA]
GNUnet e.V., "GNUnet Assigned Numbers Authority (GANA)", , <https://gana.gnunet.org/>.

16. Informative References

[R5N]
Evans, N. S. and C. Grothoff, "R5N: Randomized recursive routing for restricted-route networks", , <https://doi.org/10.1109/ICNSS.2011.6060022>.
[Kademlia]
Maymounkov, P. and D. Mazieres, "Kademlia: A peer-to-peer information system based on the xor metric.", , <http://css.csail.mit.edu/6.824/2014/papers/kademlia.pdf>.
[cadet]
Polot, B. and C. Grothoff, "CADET: Confidential ad-hoc decentralized end-to-end transport", , <https://doi.org/10.1109/MedHocNet.2014.6849107>.
[I-D.draft-schanzen-gns]
Schanzenbach, M., Grothoff, C., and B. Fix, "The GNU Name System", , <https://datatracker.ietf.org/doc/draft-schanzen-gns/>.

Authors' Addresses

Martin Schanzenbach
GNUnet e.V.
Boltzmannstrasse 3
85748 Garching
Germany
Christian Grothoff
Berner Fachhochschule
Hoeheweg 80
CH-2501 Biel/Bienne
Switzerland
Bernd Fix
GNUnet e.V.
Boltzmannstrasse 3
85748 Garching
Germany