Friday, August 9, 2013

ADEN

update: This article was featured on java.net home page (you can look for it by date [08/15/2013] on https://home.java.net/archive/spotlight)

ADEN is a lightweight Java networking framework for message-passing applications. This article describes ADEN in detail, except for the protocol it uses (which we refer to as the ADEN protocol); that will be covered later in a separate article.

UDP is a cheap way to pass small messages between processes running on separate machines. UDP sockets are relatively simple to use: with one socket you can exchange messages (i.e. datagrams) with any number of peers. You can also associate a socket with a specific remote address/port so that you only exchange datagrams with that address/port. UDP sockets are built around a peer-to-peer model, i.e. server and client semantics are not defined at the socket level. However, UDP has some serious limitations: transmitted datagrams can be lost, received multiple times, or received in an order different from the order they were sent in. Building 'real-life' applications directly on UDP sockets is therefore often difficult.

The alternative is to use TCP sockets. TCP provides one-to-one connections, i.e. a TCP socket can only be used to interact with one peer at a time. TCP sockets are built around a server-client model and provide reliable in-order delivery. Because TCP 'hides' most network complications from the programmer, it is possible to build networked applications with little knowledge of computer networks. Unlike the datagram model provided by UDP, TCP provides a stream-based model: data transmitted over TCP is seen by the receiving application as a continuous stream of bytes with no message boundaries (these can only be provided by application-level means).

However, TCP is most useful when you want your application to transmit the largest possible amount of data all the time. To make that possible, TCP uses sophisticated congestion control mechanisms. Using TCP has little benefit for the network when only a small chunk of bytes is transmitted at a time. In this article I introduce a network protocol that is optimized for transmitting small messages but still 'hides' network complications from the programmer.
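To make the datagram model concrete, here is a minimal sketch that uses only Java's standard DatagramSocket (not ADEN). The class and method names are made up for illustration. It sends one datagram between two sockets on the loopback interface; nothing in it protects against loss, duplication or reordering, which is exactly the gap a protocol on top of UDP has to fill.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of ADEN.
final class UdpLoopbackDemo {

    /** Sends one datagram from one local socket to another and returns the text received. */
    static String roundTrip(String text) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the operating system pick a free port for each socket.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket(0, loopback)) {
            receiver.setSoTimeout(2000); // do not block forever if the datagram is lost

            byte[] out = text.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(out, out.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1500];
            DatagramPacket in = new DatagramPacket(buffer, buffer.length);
            receiver.receive(in); // one receive() returns exactly one datagram
            return new String(in.getData(), in.getOffset(), in.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

Note that the message boundary is preserved for free: the receiver gets the datagram as one unit, unlike a TCP stream.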

ADEN is a personal effort to build a message-passing API over UDP/IP that provides reliable in-order delivery. The work involved designing a small, connection-oriented, reliable network protocol for applications whose messages are small (from a few bytes to a few kilobytes is considered 'small' in this context).

Note that ADEN provides TCP-like features but with a message-oriented rather than stream-oriented model, which can be seen as a higher-level representation of UDP's datagram model.
Which applications is ADEN useful for? Applications like online card games and chats: applications where messages are small and conversations are request-reply. ADEN is also suitable for transporting signalling protocols.


Three years ago I decided to do some research on UDP: first because I enjoy it, second because I had a lot of free time back then (an elegant way of saying jobless), third because I still enjoy it, and the fourth reason is... why not?
The idea of this project has been evolving since then, slowly perhaps, because that ugly thing called 'working' consumes all your time. Anyway, I started to get some results in the last few months, I changed the project name one week ago, and here we are! Thank God the world does not have to wait one more day!

ADEN's project goal is to provide 'negotiated' connections and reliable data transmission on top of Java's DatagramSockets. Programmers can build robust networking applications using the ADEN API. ADEN is programmed as a socket API that can be used on any network, including the internet(1), in almost exactly the same way you use Java's TCP sockets.

2. ADEN connection

An ADEN connection is established using a two-way handshake and closed the same way, so a total of 4 IP packets are exchanged instead of the 7 packets needed with TCP.
To establish an ADEN connection, the client creates an ADEN socket, binds it to a local address/port, initializes it with the server address/port and sends a special message called BOF. After sending BOF, the ADEN socket is connected and can be used to exchange messages with the server. To close the ADEN connection, the client sends a special message called EOF. To accept a connection request, the server creates a special ADEN socket and binds it to a local address/port. By blocking on the 'accept' method, the server gets an ADEN socket that is used to receive the BOF message and exchange further messages with the client.

The general criteria for BOF and EOF messages are:

1- The first message the client sends is BOF and the last message the client sends is EOF.
2- BOF should be small enough to be sent in a single datagram; the same goes for EOF.
3- BOF can be used to pass useful data. However, data in a BOF message should never be used to invoke a non-idempotent operation on the receiving end(2). Normally BOF would be used to pass application metadata (version, etc.).
4- EOF should not hold any data (besides the flag that indicates this is an EOF!). If an attempt to send EOF fails, the client can continue normal execution(3).

The server should not send to the client until it has received the first message after BOF, and that message is not EOF (i.e. the server should receive at least one request before sending data to the client).

Everything in bold above is MANDATORY.
After sending EOF the client should close the ADEN socket (yes, an ADEN socket cannot be reused). Note that the client should do 'its best' to send the EOF at the end of the session, e.g. by calling the EOF-sending command in a 'finally' clause.
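The ordering rules above can be sketched as a tiny state machine. This is not ADEN code: the class, the enum and the method names are hypothetical, and the check that BOF is sent only once is my reading of rule 1 rather than something the rules state explicitly.

```java
// Hypothetical illustration of the BOF/EOF ordering rules; not part of the ADEN API.
final class SessionRules {

    enum Kind { BOF, DATA, EOF }

    private enum State { NEW, OPENED, ACTIVE, CLOSED }

    private State state = State.NEW;

    /** Records a message sent by the client and rejects anything that breaks the ordering rules. */
    void onClientMessage(Kind kind) {
        switch (state) {
            case NEW:
                if (kind != Kind.BOF) {
                    throw new IllegalStateException("the first message must be BOF");
                }
                state = State.OPENED;
                break;
            case OPENED:
            case ACTIVE:
                if (kind == Kind.BOF) {
                    // Assumption: BOF is only ever the first message.
                    throw new IllegalStateException("BOF may only be sent once");
                }
                state = (kind == Kind.EOF) ? State.CLOSED : State.ACTIVE;
                break;
            default:
                throw new IllegalStateException("nothing may follow EOF");
        }
    }

    /** The server may send only after at least one non-EOF message has followed BOF. */
    boolean serverMaySend() {
        return state == State.ACTIVE;
    }
}
```

A client that goes BOF, request, EOF passes through OPENED, ACTIVE and CLOSED; the server is allowed to reply only while the state is ACTIVE.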
How are connections closed by the server?
- The server closes an ADEN connection simply by closing the ADEN socket, i.e. the server does not have to send EOF to the client.
The normal scenario is that the server closes the connection in response to one of these events:
- The server receives an EOF.
- The server does not receive any message on that connection for a long time.

2.1 Connection lifetime

Connection lifetime can only be controlled by application-level means:
- The server uses read timeouts to determine when to abandon connections.
- The client uses read timeouts to determine when to 'give up' and raise a 'server is not responding!'-like error message.
- The client may need to 'ping' the server periodically so that the server application does not close the connection (keep-alive signals).
- The client may need to 'ping' the server periodically to prevent connections through NATs from expiring.
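One way to keep track of these application-level deadlines is a small idle tracker. The sketch below is hypothetical (it is not part of ADEN) and takes the current time as a parameter so that it is easy to test. A server could use it to decide when to abandon a connection, and a client could use a second instance with a shorter limit to decide when a keep-alive is due.

```java
// Hypothetical helper, not part of ADEN: tracks how long a connection has been idle.
final class IdleTracker {

    private final long limitMillis;
    private long lastActivityMillis;

    IdleTracker(long limitMillis, long nowMillis) {
        this.limitMillis = limitMillis;
        this.lastActivityMillis = nowMillis;
    }

    /** Call whenever a message is sent or received on the connection. */
    void touch(long nowMillis) {
        lastActivityMillis = nowMillis;
    }

    /** True once the connection has been idle for at least the configured limit. */
    boolean idleTooLong(long nowMillis) {
        return nowMillis - lastActivityMillis >= limitMillis;
    }
}
```

In real code the time argument would typically come from System.currentTimeMillis() or System.nanoTime().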

2.2 Connection termination again

As mentioned earlier, normal termination of an ADEN connection involves a one-way EOF message before the local resources associated with that connection are released on both sides, i.e. before the sockets are closed. However, normal termination is not always possible because the server or client may simply crash. ADEN does not provide special handling for such scenarios. If the server crashes, the client will detect that the connection is lost when it realizes it can no longer send requests or receive replies. If the client crashes, the server will dispose of the socket normally because of a read timeout (unless you use a timeout of 0, which causes your server to block indefinitely). If the client recovers and tries to reconnect, there are two cases: if the server has already closed the old socket, a new connection/socket is created normally; otherwise the connection request causes the old socket to be closed automatically, and if a server thread was reading or writing on that socket it is interrupted with an appropriate error.

2.3 Connection uniqueness

Without going into the details of the ADEN protocol (left to a separate article), a datagram that belongs to a certain ADEN connection/socket will not be mistakenly delivered to another connection/socket as long as the original connection exists and has not yet been disposed of by the application. If the original connection does not exist or has already been disposed of, the datagram is discarded silently, unless it is a BOF and there is currently no socket identified by that datagram's destination/source combination; in that case the ADEN protocol creates a new socket if the user is accepting connections on that datagram's destination. Although that means sockets could be created in response to datagrams re-transmitted by the network, that is fine because the application will not be able to receive anything on a socket that was created in response to an old BOF (except that BOF!). Since re-receiving a BOF has no effect on the application (see the rules regarding BOF and EOF mentioned earlier), detecting old BOFs at the application level is unnecessary. If such a socket is ever created, the application will normally close it because of a read timeout.


3. Sending messages over ADEN

The local application can send a message to the remote application by invoking the write() method of an ADEN socket. ADEN supports message segmentation: application-level messages passed to ADEN are either transmitted in a single datagram or fragmented into a number of datagrams sent independently, depending on the size of the message (message size is not restricted by ADEN) and the maximum datagram payload ADEN is configured to use.
ADEN performs reliable transmission: each transmitted datagram has to be acknowledged by another datagram. That is the normal case, but ADEN can be configured to allow multiple datagrams to be sent and acknowledged by one datagram. However, using this facility is not recommended, and it should not be used at all if you are using ADEN on the internet (according to RFC 5405, low data-volume applications should not send on average more than one UDP datagram per round-trip time).
ADEN provides synchronous output operations, i.e. each send operation causes the sending thread to block until the message has been received on the other end by, at least, the ADEN protocol. However, asynchronous output can be built over ADEN if needed.
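To illustrate the segmentation idea only, here is a sketch of how a message could be cut into datagram-sized pieces. It is not ADEN's actual implementation: the class name, the method and the choice to send an empty message as one empty segment are all assumptions, and the real protocol would also need headers to reassemble the pieces in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of message segmentation; not ADEN's real code.
final class Segmenter {

    /** Splits a message into consecutive chunks of at most maxPayload bytes each. */
    static List<byte[]> split(byte[] message, int maxPayload) {
        if (maxPayload <= 0) {
            throw new IllegalArgumentException("maxPayload must be positive");
        }
        List<byte[]> segments = new ArrayList<>();
        for (int offset = 0; offset < message.length; offset += maxPayload) {
            int end = Math.min(message.length, offset + maxPayload);
            segments.add(Arrays.copyOfRange(message, offset, end));
        }
        if (segments.isEmpty()) {
            segments.add(new byte[0]); // assumption: an empty message still travels as one datagram
        }
        return segments;
    }
}
```

With a 1000-byte maximum payload, a 2500-byte message becomes three segments of 1000, 1000 and 500 bytes, while a 10-byte message stays in a single segment.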

Send operations in ADEN have two modes:
1- Asynchronous to the remote application (default)
In this mode the message is received by the ADEN protocol but may not yet have been delivered to the remote application.
2- Synchronous mode
This mode allows a write() call at one end to synchronize with a read() call at the other end. It should be used to write the first message (BOF) and the last message (EOF) of an ADEN connection in order to synchronize the connection initiation/termination processes of the local and remote applications (see ADEN connection in section 2). Normally, application-level messages are sent in asynchronous mode. Sending messages faster than they can be received at the application level will cause overflow at the ADEN protocol level, as described in the following paragraph:

ADEN allows messages to be queued and received later by the application. If the maximum queue size is reached, ADEN suspends its reception service until there is room for at least one more message (i.e. until the application receives at least one message). If another message is sent in the meantime, the ADEN protocol on the sending end will be forced to perform re-transmissions, and it may eventually close the connection at the sending end because the other end has not responded for a long time and the connection is therefore considered lost.
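The overflow behaviour can be pictured with a bounded queue from the Java standard library. This is only a model of what is described above, not ADEN's code; the class and method names are hypothetical. A message that does not fit is ignored, so in the real protocol it would go unacknowledged and the sender would retransmit it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical model of a per-connection receive queue; not ADEN's real code.
final class ReceiveQueue {

    private final BlockingQueue<byte[]> queue;

    ReceiveQueue(int maxMessages) {
        this.queue = new ArrayBlockingQueue<>(maxMessages);
    }

    /** Protocol side: returns false when the queue is full, meaning the message is ignored. */
    boolean onMessageArrived(byte[] message) {
        return queue.offer(message);
    }

    /** Application side: takes one queued message, or returns null if none is queued. */
    byte[] pollNow() {
        return queue.poll();
    }
}
```

As soon as the application takes one message out, the next arriving message is accepted again.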

If your application involves transmitting a file as a stream of messages, then you need to ensure that the sender and receiver code are well orchestrated, i.e. that the overflow situation described above is unlikely to happen under normal execution. A clever test would be to run both the sending and receiving modules on the same machine(4). When running the tests, overflow can be detected by observing ADEN's retransmissions log(5).

Alternatively, you can use synchronous sending only, in which case overflow is not a concern.

4. Receiving messages over ADEN

The local application can receive a message from the remote application by invoking the read() method of an ADEN socket. Messages are received from the network at the ADEN protocol level and then buffered until they are received by the application. The maximum number of messages that can be queued per connection is configurable. ADEN deals with overflow by ignoring any new message, which causes the ADEN protocol at the sending end to initiate a retransmission after some time (it is forced to assume the datagrams were lost on the network). In the worst case the sender application will lose the connection with an error (the other end is not receiving!). Note that overflow is not possible when sending messages in synchronous mode.
read() is a blocking method that allows timeouts: if no message is yet available in the underlying queue, the calling thread blocks until a message is available or the timeout expires, whichever happens first. Note that forgetting to use a positive timeout puts your application at risk of blocking indefinitely.
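The blocking-read-with-timeout behaviour is the same idea as a timed poll on a standard BlockingQueue. The sketch below models it with JDK classes only; the helper name is hypothetical and ADEN's actual read() signature may differ. It refuses non-positive timeouts to avoid the indefinite blocking mentioned above.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a timed read; not ADEN's real read() method.
final class TimedRead {

    /** Waits up to timeoutMillis for a message; returns empty if none arrived in time. */
    static <T> Optional<T> read(BlockingQueue<T> inbox, long timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("use a positive timeout");
        }
        try {
            return Optional.ofNullable(inbox.poll(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Optional.empty();
        }
    }
}
```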


5. Testing ADEN connection

ADEN provides a ping() method that can be called at any time after the connection is established. ping() helps the local application check whether the remote application still has the connection open. It does so by sending a special message that is received and acknowledged like an ordinary message and then dropped by the ADEN protocol. That message is sent in asynchronous mode.

Usage:
Although it can be used by both client and server, ping is intended to be used by clients only. A client may call ping from time to time while waiting for a server response (mainly when connections are expected to be long-lived).
Another use of ping() is to prevent UDP sessions across NATs from expiring when the communication is idle (ADEN has not yet been tested for communication across NATs). However, ping cannot prevent the server application from closing the connection because of a read timeout; that can only be done via application-level 'pings'.

6. Exactly-once message delivery

Implementing an exactly-once message delivery model over ADEN messages can be described in the light of two kinds of application:
1- Messages are passed in one direction
Here the sender needs confirmation that the message was received but does not expect useful data to be returned. All we need is to send messages in synchronous mode (see section 3). There is no need for application-level acknowledgements.
2- Messages are exchanged in a request-reply manner
Here the client sends a message (request) and receives another message in return (reply), which normally contains useful data needed by the client. There is no need to assign an id to the request if there is only one conversation at a time; otherwise each request-reply pair should be uniquely identified. That can be done by assigning a sequence number (0, 1, 2, 3, ...) to the request; the server then uses the sequence number provided with the request as the id of the corresponding reply. Note that this sequence number is not used to tell the server the order of requests. It is only used to map requests to their replies.
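The request-reply mapping in case 2 could be sketched on the client side as follows. This is an illustration under my own assumptions, not ADEN code: the class and its methods are hypothetical, and replies are modelled as CompletableFuture objects from the JDK.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side helper that maps replies to requests by sequence number.
final class ReplyCorrelator {

    /** A request that has been given a sequence number and is waiting for its reply. */
    static final class Pending {
        final int sequence;
        final CompletableFuture<byte[]> reply;

        Pending(int sequence, CompletableFuture<byte[]> reply) {
            this.sequence = sequence;
            this.reply = reply;
        }
    }

    private final AtomicInteger nextSequence = new AtomicInteger();
    private final Map<Integer, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();

    /** Allocates the next sequence number (0, 1, 2, ...) and registers a future for its reply. */
    Pending newRequest() {
        int sequence = nextSequence.getAndIncrement();
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        pending.put(sequence, future);
        return new Pending(sequence, future);
    }

    /** Completes the matching request; returns false for an unknown or already answered sequence. */
    boolean onReply(int sequence, byte[] body) {
        CompletableFuture<byte[]> future = pending.remove(sequence);
        return future != null && future.complete(body);
    }
}
```

Replies may arrive in any order; each one completes only the request that carries the same sequence number, and a second reply with the same number is ignored.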

Constraints for implementing the exactly-once model:

- The client should not be programmed to resend requests:
The client must not resend requests. If no reply is received for a long time after sending a request, the client should consider the request lost. The client may also raise an error indicating that the server is not responding. Because the problem could be that the connection is dead, the client can be programmed to ping the connection as a final step before throwing an error. Note that if the connection is not dead, i.e. the ping was successful, then you probably have a bug in your server code that causes valid requests to be thrown away. Client behaviour when a request is lost is application-specific and normally reflects the essence of the model. For example, a client sends a request to transfer money from one bank account to another and the power goes off before a reply confirming the transaction is received. Later, when the power is back, the client may first query the source account to check whether the transaction was carried out the first time, and request the transaction again only if it was not.

- The server should not be programmed to drop requests to prevent overflow:
For example, consider a service thread that reads messages from an ADEN socket and puts them in a message queue to be handled by another thread. To avoid dropping messages when the queue is full, the service thread should not read from the ADEN socket unless the queue has room for more messages. However, the server can still be made to ignore invalid requests silently, i.e. without replying with an error message to the client.
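The 'do not read unless there is room' rule can be sketched with a semaphore that counts free slots in the work queue. Everything here is hypothetical and independent of ADEN: the socket read is represented by a plain Supplier, and a real service thread would probably block on acquire() rather than use the non-blocking tryAcquire() shown here.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch: the service thread reads from the socket only when a queue slot is free.
final class BackpressurePump<T> {

    private final Semaphore freeSlots;
    private final Queue<T> work = new ConcurrentLinkedQueue<>();

    BackpressurePump(int capacity) {
        this.freeSlots = new Semaphore(capacity);
    }

    /** Service thread: reads one message only if a slot is free; returns false if it did not read. */
    boolean pumpOnce(Supplier<T> socketRead) {
        if (!freeSlots.tryAcquire()) {
            return false; // queue is full: leave the message unread instead of dropping it
        }
        work.add(socketRead.get());
        return true;
    }

    /** Worker thread: takes one message (or null if none) and frees its slot. */
    T take() {
        T message = work.poll();
        if (message != null) {
            freeSlots.release();
        }
        return message;
    }
}
```

Because the slot is reserved before the read, a message is never taken from the socket without a place to put it.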

7. Implementing secure communication over ADEN

Security can be implemented as a separate layer on top of ADEN. In the following I describe a technique that uses asymmetric cryptography.
First, each end of an ADEN connection must have its own pair of cryptographic keys, known as the public key and the private key. Each end should also have the public key of the other end. How public keys are exchanged is outside our scope.
Before we go into details, there is something we should know first:
Each ADEN connection produces a unique 64-bit identifier on each side of the connection. These identifiers are used to identify the ADEN connection (details of the ADEN protocol will be discussed in a separate article). The IDs are exchanged during connection setup. They are accessible to the application and can be used to implement the security mechanism described here.

Each message should be composed of two parts: a header and a payload. The header contains two fields: the first is session_id, a 64-bit value (Long), and the second is message_sequence, a 32-bit value (Integer). The payload is the application-level message passed to the security layer to be protected and then passed to the next layer for transmission.

session_id: for the message sender it must be the remote id of the ADEN socket over which the message is to be sent, and for the receiver it must be the local id of the ADEN socket over which the message was received.

message_sequence: a counter that both ends keep during the session. It is initially 0 when the ADEN connection is established and is incremented by 1 for each message sent/received. The receiver should accept a message only if this field equals the current value of the counter.

The algorithm:

Sender:
1- Set the header fields and append the payload to the header.
2- Using your own private key, sign the message and append the signature to the message payload; then encrypt the resulting message using the public key of the receiver and send it over the ADEN socket.

Receiver:
1- Receive the message from the ADEN socket, decrypt it using your own private key, remove the signature part from the message body and verify the signature using the sender's public key. If everything is OK, continue to the next step; otherwise ignore the message.
2- Accept the message only if it has the correct session_id (equal to the local id of the ADEN socket the message was received on) AND the correct message_sequence.

Note that this security layer ensures that:
- Messages are not readable except by the receiver (confidentiality provided by encryption).
- Messages cannot be faked (authentication provided by the signature).
- Messages cannot be reused, e.g. an attacker cannot fool the receiver into repeating a computation using an old message (uniqueness provided by the message header).
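Here is a partial sketch of the header, signature and replay checks using the standard java.security API. It is an illustration under stated assumptions, not the author's implementation: the class and method names are hypothetical, it assumes 2048-bit RSA keys with SHA256withRSA (so the signature is always 256 bytes), and it leaves out the encryption step entirely to stay short. The session_id and message_sequence values would come from the ADEN socket and the per-session counter described above.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical sketch of the signing and checking steps; encryption is deliberately omitted.
final class SignedFrames {

    static final int HEADER_LEN = Long.BYTES + Integer.BYTES; // session_id + message_sequence
    static final int SIG_LEN = 256; // assumption: 2048-bit RSA key

    /** Sender: builds header + payload and appends a signature over both. */
    static byte[] seal(long sessionId, int sequence, byte[] payload, PrivateKey signerKey) {
        try {
            byte[] body = ByteBuffer.allocate(HEADER_LEN + payload.length)
                    .putLong(sessionId).putInt(sequence).put(payload).array();
            Signature signature = Signature.getInstance("SHA256withRSA");
            signature.initSign(signerKey);
            signature.update(body);
            byte[] sig = signature.sign();
            return ByteBuffer.allocate(body.length + sig.length).put(body).put(sig).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver: returns the payload, or null if the signature, session_id or sequence is wrong. */
    static byte[] open(byte[] frame, long expectedSession, int expectedSequence, PublicKey senderKey) {
        if (frame.length < HEADER_LEN + SIG_LEN) {
            return null;
        }
        try {
            int bodyLen = frame.length - SIG_LEN;
            Signature signature = Signature.getInstance("SHA256withRSA");
            signature.initVerify(senderKey);
            signature.update(frame, 0, bodyLen);
            if (!signature.verify(frame, bodyLen, SIG_LEN)) {
                return null; // faked or corrupted message
            }
            ByteBuffer body = ByteBuffer.wrap(frame, 0, bodyLen);
            if (body.getLong() != expectedSession || body.getInt() != expectedSequence) {
                return null; // wrong connection or replayed message
            }
            byte[] payload = new byte[body.remaining()];
            body.get(payload);
            return payload;
        } catch (GeneralSecurityException e) {
            return null;
        }
    }

    /** Generates a 2048-bit RSA key pair for the demo. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After accepting a message, the receiver would increment its counter, so presenting the same frame again fails the message_sequence check.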

8. Building concurrent servers

Similar to Java's classical I/O, ADEN's I/O operations are blocking.
The following technique is commonly used with classical Java sockets:
A single main thread accepts connections and each connection is then assigned to a separate thread. This technique can be used to build concurrent servers using ADEN sockets. Probably one of the best ways to implement this mechanism is to use thread pools, so that threads can be reused instead of creating a new thread every time there is a new connection. Another good strategy is to ensure that the server application uses a 'maximum limit' for the number of threads it may use. Instead of allowing the number of threads to grow blindly, the server application should use a threshold; once it is reached, the server simply closes new connections if all available threads are busy.
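The bounded thread pool idea can be sketched with the JDK's ThreadPoolExecutor. This is not the example shipped with ADEN: the class and method names are hypothetical, and the connection handler and the 'close the connection' action are represented by plain Runnable objects. A SynchronousQueue is used so that a task is never queued; it either gets a thread immediately or is rejected.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: at most maxThreads connections are served at the same time.
final class BoundedDispatcher {

    private final ThreadPoolExecutor pool;

    BoundedDispatcher(int maxThreads) {
        // Idle threads are reused and retire after 30 seconds without work.
        this.pool = new ThreadPoolExecutor(0, maxThreads, 30, TimeUnit.SECONDS,
                new SynchronousQueue<>());
    }

    /**
     * Runs the handler on a pooled thread. If all threads are busy, runs onRejected
     * instead (for example, closing the new connection) and returns false.
     */
    boolean dispatch(Runnable handler, Runnable onRejected) {
        try {
            pool.execute(handler);
            return true;
        } catch (RejectedExecutionException e) {
            onRejected.run();
            return false;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

The accepting thread would call dispatch() once per accepted socket, passing a handler that serves the connection and a rejection action that closes the socket.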

A full example of a concurrent server is shipped with the ADEN source code.

log:
(1) ADEN follows the specifications and recommendations provided by the IETF regarding congestion control and retransmission scheduling.
(2) It is possible to re-receive a BOF on a new socket if the socket created by the server is closed before the connection setup with the client is finished (i.e. before BOF is acknowledged), and also if an old BOF is re-transmitted by the network.

(3) There are two reasons. A logical reason: failing to send EOF has no effect on the application. And a technical reason: EOF could be received and acknowledged, but the acknowledgement could be lost on the network and the server could already be closed when EOF is retransmitted by the ADEN protocol.

(4) Messages are unlikely to be lost over the network in this case.

(5) Logging should be enabled first.
