Friday, August 9, 2013

ADEN

update: This article was featured on java.net home page (you can look for it by date [08/15/2013] on https://home.java.net/archive/spotlight).


ADEN is a Java API that allows to exchange messages reliably between two Java applications over UDP/IP. It is implemented on top of Java's DatagramChannels, and provides an efficient alternative of TCP sockets for Java applications that exchange only small amount of data a time.

This article explains how ADEN works. It also demonstrates how secure transmission can be implemented on top of ADEN using asymmetric cryptography. Note that some details are omitted for clarity, ADEN protocol is explained in details in a  separate article. Note that ADEN's source code is shipped with complete examples, this article does not provide any code snippets.

  Download ADEN source code

1.Overview

Exchanging small messages between pairs of computers over TCP streams is not highly efficient as -for example- transferring large files.
TCP is really useful when you want your application to send as much as possible data in as short a time as possible. to do that TCP uses sophisticated congestion control mechanism to prevent your application from congesting the network or causing a congested network to become worst. This mechanism ensures optimal send rate at all time.
It is clear that TCP's congestion control has little benefit on the network when peers exchange only small amount of data a time.
Programmers has another choice, to use UDP,  a connection-less unreliable protocol that does not provide any congestion avoidance mechanism. Programmer has the freedom to implement a custom transmission mechanism. Fortunately, guidelines are available for designing efficient congestion control in systems where only small amount of data is exchanged a time (see RFC5405).
However, working with UDP directly via UDP sockets means programmer has to deal with all UDP limitations: unreliability and the lack of 'negotiated' connection are among the main issues. Because of that programmers may prefer to use TCP sockets to save time and effort, at the expense of the overall efficiency of the system.

ADEN's project basic goals are:
- To implement 'negotiated' connection/disconnection and reliable in-order delivery
on top of Java's DatagramChannels.
-To build an API that is as simple-to-use as TCP sockets.

2.ADEN connection

ADEN connection is established using a two-way handshake and closed the same way. So a total of 4 IP packets are exchanged instead of 7 packets as with TCP.
To establish an ADEN connection, client creates an ADEN socket, bounds it to a local address/port, initiates it with the server address/port and sends a special message called BOF. After sending BOF the ADEN socket is now connected and can be used to exchange messages with server. To close ADEN connection the client sends a special message called EOF. To accept a connection request, server creates a special ADEN socket, bounds it to a local address/port. And by blocking on 'accept' method, the server gets a an ADEN socket that is used to receive the BOF message and exchange further messages with the client.

The general criteria for BOF & EOF messages is:

 1-the first message the client send is BOF and  the last message the client send is EOF.
2- BOF should be small enough to be put in a single UDP datagram -the same for EOF (this way, ADEN's protocol will transmit each message in a two-way handshake only).
3- BOF can be used to pass useful data. however data in a BOF message should never be used to invoke a non-idempotent operation on the receiving end(3). Normally BOF would be used to pass application metadata (version...etc ).
4-EOF should not hold any data (besides the flag that indicates this is an EOF !). If attempt to send EOF failed the client can continue normal execution(4).

The server should not send to the client unless it receives the first message after BOF  and that message is not EOF (i.e. server should receive at least one request before sending data to the client). This way if a BOF is re-received on a new socket we will avoid sending to a previously connected socket on the other side.

 All in bold above is MANDATORY .
 After sending EOF the client should close the ADEN socket (Yes, ADEN socket can not be reused). Note that the client should be made to do 'its best' to send the EOF at the end of the session e.g. calling the EOF sending command in a 'finally' clause.
How connections are closed by the server?
-the server closes  ADEN connection by closing the ADEN socket only i.e server does not have to send EOF to the client.
The normal scenario is that the server will close the connection in response to one of these events:
-server receives an EOF.
-server does not receive any message for long time on that connection.For how long server should wait before disposing an idle connection is application-specific.

2.1 Connection lifetime

connection lifetime can only be controlled by application-level means:
-server uses read timeouts to determine when to abandon connections.
-client uses read timeouts to determine when to 'give up' and rise a 'server is not responding!'-like error message.
-client may need to 'ping' the server  periodically so the server application does not close the connection (keep-alive signals).
-client may need to 'ping' the server periodically to prevent connections through NATs from getting 'expired' .

2.2 Connection termination again

as we mentioned earlier, normal termination of an ADEN connection involves one-way EOF message passing before local resources associated with that connection are released on both sides i.e. closing the sockets. However, it's not always possible to perform normal termination because server or client may just crash!. ADEN does not provide special handling for such possible scenarios, and it is let to fix it self. If the server crashed the client will detect that the connection is lost when it tries to write a message, ADEN on the client side will fail due not receiving acknowledgments (or due to receiving a RESET signal from the server if the server restarted). If the client crashed server will dispose the socket normally due to read timeout (unless you use 0 timeout causing your server to block indefinitely!) or when the server tries to write a message causing ADEN to fail due not receiving acknowledgments. If the client recovers and try to re-connect then if the server has already closed the old socket a new connection/socket will be created normally, otherwise the connection request will cause the old socket to be closed automatically, if the server thread was reading or writing on that socket it will be interrupted with an appropriate error.

In short, the following should be kept in mind when using ADEN:
-Always use positive non-zero timeout for read operation on both client and server.
-Server should detect and close idle connections. (the same rules applies when using TCP sockets!)

2.3 Connection uniqueness

Without going into details of ADEN protocol (that is left to a separate article), ADEN  sockets can connect different hosts from the same local address/port. Incoming datagrams are always delivered to the correct socket (i.e thread) within a JVM. Datagrams that arrive after closing the associated socket are discarded, unless that datagram is a BOF and the local host is listening for connection requests on the local address/port on which the datagram was received. In that case a new socket will be created and passed to the application. It is possible to receive an old BOF retransmitted by the network, however, we pretend that never happens, to avoid further complexity in the internal structure of ADEN. Recall the rule that says server must not send over a new socket unless it receives further message (see 2. above), server will eventually close a socket when not receiving anything from it (except a BOF!).


3. Sending messages over ADEN

local application can send a message to the remote application by invoking write() method of ADEN socket. ADEN supports message segmentation, An application level message passed to ADEN is either transmitted in a single datagram  or fragmented into a number of datagrams sent independently , However that depends on the size of the message (message size is not restricted by ADEN ) and the maximum payload for UDP datagram that ADEN is configured to use.
ADEN performs reliable transmission,  each transmitted datagram has to be acknowledged by another datagram, that is the normal case but ADEN can be configured to allow multiple datagrams to be sent and acknowledged by one datagram, However, this feature is experimental and using it is not recommended, it should not be used at all if you are using ADEN on the internet (according to RFC5405 low data-volume applications should not send on average more than one UDP datagram per round-trip time) .
ADEN provides synchronous output operations, i.e. each send operation will cause the sending thread to block until the message is received on the other end by-at least- the ADEN protocol . However, asynchronous output can be built over ADEN if needed.

Send operation in ADEN have two modes:

1- asynchronous to remote application (default)
in this mode write() call returns when the message is received by the ADEN protocol on the other end but read() is not necessarily invoked by the remote application.

2-synchronous mode
this mode allows write() call at one end to synchronize with read() call on the other end. When write() is called read() should be called on the other end too, otherwise send will fail. This mode should be used to write the control messages (BOF and EOF message)  in order to synchronize connection initiation/termination (see ADEN connections in 2). Normally, application messages are sent in the asynchronous mode. Sending messages faster than they could be received at application-level will cause overflow at ADEN protocol level. Overflow is described in the following paragraph:

ADEN's receiver protocol allows incoming messages to be queued and received later by the application, if the maximum queue size is reached new messages will be ignored. The protocol will continue to receive new messages when the room become available (i.e. when the application continue to receive messages). Ignoring messages to prevent overflow will cause ADEN's sender protocol on the other side to perform re-transmissions. Note that overflow on one side can cause write() failure and unplanned connection termination on the other side.

If your application involves transmitting a file as a stream of messages then you need to ensure sender and receiver codes are well-orchestrated i.e. the overflow situation described above is unlikely to happen under normal execution. A  clever test would be running both sending and receiving modules on the same machine(5). When running the tests  overflow can be detected by observing ADEN's retransmissions log(6).

Alternatively you can use synchronous sending only and in this case overflow would not be a concern.

4. Receiving messages over ADEN

local application can receive a message from the remote application by invoking read() method of ADEN socket. Messages are received from the network at the ADEN protocol level, messages are then buffered until they are received by the application. The maximum number of  messages that can be queued per connection is configurable, ADEN deals with overflow by ignoring any new message, this will cause ADEN protocol -at the sending end- to initiate a retransmission after some time (it is forced to  assume datagrams are lost on the network).  At worst case the sender application will lose the connection with an error (the other end is not receiving!). Note that overflow is not possible when sending messages in the synchronous mode only.
Read is a blocking method that allows timeouts: if no messages are yet available in the underlying queue the caller thread will block  until a message is available or timeout occur whichever happen first.  Note that forgetting to use non-zero positive timeouts will put your application at the risk of blocking indefinitely.


5. Testing ADEN connection

ADEN provides  ping() method that can be called any time after establishing the connection. ping() helps the local application to check whether the remote application still have the connection open or not, by sending a special message that is not receivable by the remote application. This message is received by ADEN protocol, acknowledged as an ordinary message and then dropped.

Usage:
Although it can be used by both client and server, ping is intended to be used by clients only. Client may call ping from time to time while waiting for a server response (mainly when connections are expected to be of long duration ). Note that pings can not cause overflow because they are not receivable by the application.
Another use of ping() is to prevent UDP sessions across NATs from expiring when the communication is idle (ADEN is not tested yet for communication across NATs).  However, ping cannot prevent the server application from closing the connection due to read timeout, that can only be done via application-level 'pings' .

6. Exactly-once message delivery

Implementing exactly-once message delivery model over ADEN messages can be described in the light of two applications:
1-messages are passed in one direction
In this application sender needs confirmation that the message is received, but does not expect useful data to be returned. What we need here is to send messages in synchronous mode(see 3).  No need for application-level acknowledgements .
2-messages are exchanged in request-reply manner
here the client sends the message (request) and receives another message in return (reply)  which normally contains useful data needed by the client. No need to assign an id for the request if there is only one conversation a time . otherwise each request-reply should be uniquely identified. That can be done by assigning a sequence (0,1,2,3,....)  to the request, the server then uses the sequence provided with the request as an id for the corresponding reply. Note that this sequence is not used to tell the server about requests order. It is only used to map requests to their replies.

Constraints for implementing exactly-once model:

-Client should not be programmed to resend requests:
Client must not resend requests. After sending a request if no reply is received for long time the client should consider the request lost. Client may also rise an error indicating that the server is not responding. Because the problem could be that the connection is dead client can be programmed to ping the connection as a final step before throwing an error. Note that if the connection is not dead i.e. ping was successful then you probably have a bug in your server code causing valid requests to be thrown away. Client behaviour when a request is lost is application-specific and -normally- reflects the model essence. For example a client sends a request to transfer money from one bank account to another and the power went-off before a reply is received confirming the transaction was sucessful. Later when power is back the client may first query the source account to ensure that the transaction was not  carried out at the first time, if that is true then request the transaction again.

-Server should not be programmed to drop requests to prevent  overflow:
for example a service thread that reads messages from ADEN socket and then put them in a message queue to be handled by another thread, in order to avoid dropping messages when the queue is full the service thread should not read from ADEN socket unless the queue has room for more messages. However, server can still be made to ignore invalid requests silently i.e. without replying an error message to the client.

7. Implementing secure communication over ADEN

security can be implemented by a separate layer on top of ADEN, in the following I describe a technique by using asymmetric cryptography.
First it is required that each end of an ADEN connection to have his own pair of cryptographic keys known as public key and private key. Each end should also have the public key of the other end. How public keys are exchanged is out of our scope.
Before we go into details there is something we should know first:
Each ADEN's connection produces a 64-bit unique identifier on each side of the connection. They are used to identify the  ADEN connection (details about ADEN protocol will be discussed in a separate article ). These IDs are exchanged during connection setup . They are accessible by the application and can be used to implement the security mechanism described here.

Each message should be composed of two parts:
header and payload.  Header contains two fields the first is session_id which is a 64-bit value (Long)and the second is message_sequence which is a 32-bit value (Integer) . The payload is the application-level message passed to the security layer to be protected and then passed to the next layer for transmission.

 session_id: for message sender it must be the remote id of the ADEN socket over which the message is to be sent, and for receiver it must be the local id of the ADEN socket over which the message was received.

message_sequence: a counter that both ends keep during the session. initially 0 when ADEN connection is established and incremented by 1 for each message sent/received. Receiver should accept a message only if this field equals the current value of the counter.

The Algorithm:

Sender:
1- set the header fields and append the payload to the header.
2-Using own private key sign the message and then append the signature to the message payload, encrypt the resulting message using public key of the receiver and send it over the ADEN socket.

Receiver:
1-Receive the message from ADEN socket, decrypt the message using own private key, remove signature part from message body and verify signature using sender public key. If everything is OK continue to the next step otherwise ignore the message.
2- accept a message only if it has correct session_id (equals the local id of the ADEN socket the message was received on) AND correct message_sequence.

Note that using this security layer we will ensure that:
-messages are not readable except by the receiver (confidentiality provided by encryption) .
-messages cannot be faked (authentication provided by signature).
-messages cannot be reused e.g. an attacker can not fool receiver to repeat a computation by repeating an old request (uniqueness provided by the message header).

8. Building concurrent servers

similar to Java's classical I/O  ADEN's  I/O are blocking operations.
 the following technique is usually used with  classical Java sockets :
A single main thread is used to accept connections and each connection is then assigned to a separate thread. This technique can be used to build concurrent servers using ADEN sockets. Probably on of the best ways to implement this mechanism is to use thread pools so threads can be reused instead of creating a new thread every time we have a new connection. Another good strategy is to ensure the server application uses a 'maximum limit' for the number of threads it may use.  so instead of allowing the number of threads to grow blindly the server application should use a threshold that is when reached  the server simply closes new connections if all available threads are busy.

A full example of a concurrent sever is shipped with ADEN source code.

log:
(1) i.e an application-level message is either fits in a single UDP datagram or needs to be fragmented into a few datagrams only.
 
(2)ADEN follows specifications and recommendations provided by IETF regarding congestion control and retransmissions scheduling.
(3)It is possible to re-receive a BOF on a new socket  if the socket created by the server is closed before the connection is finished with the client (i.e. before BOF is acknowledged). Also if an old BOF is re-transmitted by the network.

(4)There are two reasons - a logical reason : failing to send EOF has no effect on the application.  And  technical reason:  EOF could be received and acknowledged but the acknowledgement could be lost on the network and the server could have already disposed the connection when EOF is retransmitted  by ADEN protocol. 

(5) messages are unlikely to be lost over network in this case.

(6)log should be enabled first.

2 comments: