MFTP is a protocol for reliable data transport across the internet over the unreliable User Datagram Protocol (UDP). It is intended for embedded or IoT platforms with constrained resources, where TCP's overhead is undesirable or impractical. The CLIENT sends full 512-byte blocks of data with no header or other ancillary information. This is possible because each acknowledgement carries the CRC32 of the block the SERVER received, which the CLIENT compares with the CRC32 of the block it sent.
While UDP has a checksum for basic data integrity checking, the 16-bit checksum has limited error-detecting capability. It can miss some types of errors, such as those that invert bits in a way that results in the same checksum (e.g., two-bit errors that cancel each other out).
For applications that require strong data integrity guarantees, additional measures must be taken at the application level to verify data integrity more robustly.
MFTP provides a stronger guarantee of data integrity by implementing additional CRC32 checks, which are used for the acknowledgments.
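The difference between the two checks can be illustrated with a small sketch. The `internet_checksum` helper and the two payloads are hypothetical, written only for this illustration; the real UDP checksum also covers a pseudo-header, which is left out here because it does not change the outcome.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over `data` (pseudo-header omitted)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


# Hypothetical payloads: one bit is cleared in the first 16-bit word and the
# same bit is set in the second word, so the 16-bit sum is unchanged.
original = b"\x00\x01\x00\x00" + b"payload"
corrupted = b"\x00\x00\x00\x01" + b"payload"

checksum_detects = internet_checksum(original) != internet_checksum(corrupted)
crc32_detects = zlib.crc32(original) != zlib.crc32(corrupted)
print(checksum_detects, crc32_detects)
```

The two flipped bits cancel in the 16-bit sum, so the checksum comparison reports no difference, while the CRC32 values differ.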
FILE_SIZE: 32 bit unsigned value
CRC32: 32 bit unsigned value
BLOCK_NUMBER: 32 bit unsigned value
SHORT_MSG_SIZE: 8 bytes
FILENAME: variable size, null terminated, 1 to 256 bytes
HDR_MSG_SIZE: variable size, 5 to 512 bytes, zero padded such that HDR_MSG_SIZE != SHORT_MSG_SIZE
DUP_WARN: 0xFFFFFFFF (optional - see below)
WAIT_TIME: 2 seconds
MAX_WAIT_TIMES: 10
MAX_RESENDS: 10
SOCKET_TIMEOUT: 20 seconds (adjust as required)
HDR_MSG: A FILE_SIZE followed by a FILENAME for a total of HDR_MSG_SIZE bytes.
(There is plenty of room here for other header information if desired.)
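As a rough sketch, a header of this shape could be assembled as follows. The function name `build_hdr_msg`, the use of network byte order for FILE_SIZE, and the ASCII filename encoding are assumptions made for the example; the text above does not specify them. The 256-byte FILENAME limit is taken here to include the null terminator.

```python
import struct

SHORT_MSG_SIZE = 8  # from the definitions above


def build_hdr_msg(file_size: int, filename: str) -> bytes:
    """Pack FILE_SIZE and a null-terminated FILENAME into one HDR_MSG."""
    name = filename.encode("ascii") + b"\x00"
    if len(name) > 256:
        raise ValueError("FILENAME must fit in 256 bytes including the null")
    # Assumption: FILE_SIZE is sent in network (big-endian) byte order.
    msg = struct.pack("!I", file_size) + name
    if len(msg) == SHORT_MSG_SIZE:
        msg += b"\x00"  # zero pad so the header is never SHORT_MSG_SIZE bytes
    return msg
```

A three-character filename is the one case that needs the pad byte: 4 bytes of FILE_SIZE plus 3 characters plus the terminator would otherwise total exactly 8 bytes.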
The HDR_MSG must not be SHORT_MSG_SIZE bytes in total, so there is no chance of the SERVER mistaking it for a SHORT_MSG. This can be assured by simply appending a zero as necessary.
HDR_MSG, except...
Because the HDR_MSG is broadcast, the SERVER may receive more than one copy. If a copy arrives immediately after the first, it will be swallowed by the redundancy checking. However, it may arrive much later. Therefore the SERVER must save the CRC32 from the HDR_MSG and compare it against all blocks received to see whether a block is a repeat of that first HDR_MSG; if it is, it must be silently discarded. It is recommended that if a block arrives with that same CRC32, its first 32-bit value be compared with the FILE_SIZE value to confirm that it is indeed a repeat before discarding it.
SHORT_MSG: A CRC32 followed by a BLOCK_NUMBER for a total of SHORT_MSG_SIZE bytes. There are three types of SHORT_MSG:
ACK_MSG: a CRC32 followed by a BLOCK_NUMBER
REP_MSG: a CRC32 followed by a BLOCK_NUMBER
DUP_MSG: a CRC32 followed by a DUP_WARN (optional)
The type of message received can only be identified by the number of bytes received. Any message of SHORT_MSG_SIZE bytes must be one of these three types, but how do we tell an ACK_MSG from a REP_MSG from a DUP_MSG when one is received?
The CLIENT only ever receives an ACK_MSG, so any message of SHORT_MSG_SIZE bytes it receives must be that. The SERVER can only receive either a REP_MSG or a DUP_MSG, and can distinguish between them by checking whether the BLOCK_NUMBER field is DUP_WARN.
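The length-based identification on the SERVER side might look like the sketch below. The names `pack_short_msg` and `classify_for_server`, the returned tuple layout, and the big-endian field order are assumptions for illustration only.

```python
import struct
from typing import Optional, Tuple

SHORT_MSG_SIZE = 8
DUP_WARN = 0xFFFFFFFF


def pack_short_msg(crc32: int, block_number: int) -> bytes:
    """Two 32-bit unsigned fields (byte order is an assumption)."""
    return struct.pack("!II", crc32, block_number)


def classify_for_server(datagram: bytes) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (kind, crc32, block_number) for a datagram the SERVER received."""
    if len(datagram) != SHORT_MSG_SIZE:
        # Anything that is not exactly 8 bytes is not a SHORT_MSG.
        return ("NOT_SHORT_MSG", None, None)
    crc32, second = struct.unpack("!II", datagram)
    if second == DUP_WARN:
        return ("DUP_MSG", crc32, None)
    return ("REP_MSG", crc32, second)
```

The CLIENT needs no such test, since every 8-byte message it receives is an ACK_MSG.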
DATA_MSG: 1 to 512 bytes, excluding SHORT_MSG_SIZE bytes. A DATA_MSG cannot be mistaken for a SHORT_MSG because an extra zero pad byte is added when needed.
REP_MSG: A signal to the SERVER that the data in the next block is the same as the last, and that it is to simply copy the previous buffer to file again. The CLIENT will then expect an ACK with the same CRC32 as before, but with the block number incremented.
DUP_MSG: A signal to the SERVER that the next block has the same CRC32 as the previous block but the data is different. A DUP_MSG and the DATA_MSG that follows it are sent separately and sequentially, with no pause between them to wait for an ACK. The CLIENT will then expect an ACK for the DATA_MSG. If the ACK does not arrive, both the DUP_MSG and the DATA_MSG must be resent. Optional - see section 6. "Duplicate Blocks of Data and CRC Collisions" below.
EOF: 0 bytes
To begin, the SERVER is listening for UDP packets on a port known to the CLIENT. The CLIENT broadcasts a HDR_MSG once every WAIT_TIME period to the SERVER at the expected port.
The SERVER receives the HDR_MSG and prepares to create a file with the filename passed.
The SERVER sends an ACK_MSG to the CLIENT, with the CRC32 calculated from the HDR_MSG received, and the BLOCK_NUMBER field set to zero.
When the CLIENT receives the ACK_MSG, it verifies that it contains the CRC32 of the header it sent and a BLOCK_NUMBER of 0. The CLIENT then proceeds to send a DATA_MSG with the first 512-byte block of file data. No header or ancillary data is sent with these blocks. The handshake is thus established. The CLIENT always waits for an ACK_MSG from the SERVER before sending each next message.
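The SERVER's half of this handshake could be sketched roughly as below. `parse_hdr_msg` and `ack_for_header` are hypothetical helper names, and the big-endian layout and ASCII filename are assumptions; the CRC32 is taken over the whole HDR_MSG as received, including any pad byte.

```python
import struct
import zlib
from typing import Tuple


def parse_hdr_msg(msg: bytes) -> Tuple[int, str]:
    """Split a HDR_MSG into FILE_SIZE and FILENAME (up to the first null)."""
    (file_size,) = struct.unpack("!I", msg[:4])
    name, _, _ = msg[4:].partition(b"\x00")
    return file_size, name.decode("ascii")


def ack_for_header(msg: bytes) -> bytes:
    """ACK_MSG for the header: CRC32 of the HDR_MSG, BLOCK_NUMBER of zero."""
    return struct.pack("!II", zlib.crc32(msg), 0)


def client_accepts_header_ack(ack: bytes, hdr_sent: bytes) -> bool:
    """CLIENT-side check before the first DATA_MSG is sent."""
    return len(ack) == 8 and struct.unpack("!II", ack) == (zlib.crc32(hdr_sent), 0)
```

For example, `client_accepts_header_ack(ack_for_header(hdr), hdr)` holds for any header `hdr`, while an ACK computed from a different header is rejected.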
The SERVER sends an ACK_MSG after each message received, consisting of the CRC32 computed on that block plus the block number the SERVER assigned it.
When the CLIENT receives an ACK_MSG, it checks the CRC32 and block number in that message. If they match the values it has recorded for the last block sent, it proceeds to send the next block. If they correspond to the block sent before that, it assumes the latest block (or message) was not received, and resends it. If they correspond to neither, it displays an error and exits.
If the SERVER does not hear from the CLIENT within a WAIT_TIME period, it resends its last ACK_MSG. The SERVER will continue resending the ACK_MSG until the CLIENT responds, or it has reached MAX_RESENDS.
If the CLIENT does not receive an ACK_MSG within a WAIT_TIME period, it resends its last block or message. The CLIENT will continue resending until the SERVER responds, or it has reached MAX_RESENDS. If either the CLIENT or the SERVER reaches MAX_RESENDS, it displays an error and exits.
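A minimal sketch of the CLIENT's resend loop follows. To keep it independent of any socket code, the `send` and `wait_for_ack` callables are injected; both names, and the convention that `wait_for_ack` returns `None` on timeout, are assumptions for this example. Evaluating the contents of the ACK_MSG is left to the caller.

```python
from typing import Callable, Optional

WAIT_TIME = 2.0   # seconds
MAX_RESENDS = 10


def send_with_resends(
    send: Callable[[bytes], object],
    wait_for_ack: Callable[[float], Optional[bytes]],
    message: bytes,
    wait_time: float = WAIT_TIME,
    max_resends: int = MAX_RESENDS,
) -> bytes:
    """Send `message`, resending after each silent WAIT_TIME period."""
    send(message)
    resends = 0
    while True:
        ack = wait_for_ack(wait_time)  # None means nothing arrived in time
        if ack is not None:
            return ack
        if resends == max_resends:
            raise TimeoutError("no ACK_MSG after MAX_RESENDS resends")
        send(message)
        resends += 1
```

With a real UDP socket, `send` could wrap `sock.sendto` and `wait_for_ack` could set a socket timeout and return `None` when it expires.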
Since the CLIENT includes no headers or ancillary data with the buffers it sends, processing the acknowledgement from the SERVER is key to staying synchronized with the SERVER as the transfer progresses. The SERVER only ever sends one kind of message, the ACK, which contains the CRC32 of the last block it received and the block number the SERVER assigned to that block.
Of course, the very first message received by the SERVER is not a block of file data, so 0 is returned in the ACK as the block number. The first block of file data will be assigned a block number of zero, and block numbers will increment from there.
By analysing the ACK, the CLIENT learns how things stand with the SERVER. If the CRC32 and block number agree with the CLIENT's record of the last block it sent, the CLIENT can rest assured that the SERVER has received that buffer and that all is well on that end. If the ACK instead contains the CRC32 and block number for the block sent before that, the CLIENT knows that the SERVER never got the latest block and that it should send it again.
If the CRC32 or block number matches neither the values for the last block nor those for the block sent before it, the CLIENT concludes that the SERVER is hopelessly confused and terminates the session. There is nothing the CLIENT can do. Of course, a method could be developed to attempt a recovery, but in reality this never happens in bug-free builds. One could simply try sending the file again should the first attempt fail.
It is essential for the health of any connectionless network that unacknowledged packets be generated at exponentially decreasing rates. Depending on the application (e.g., in potentially high traffic networks), consider resending at exponentially decreasing rates, i.e., after 2 seconds, then again after 4 seconds, then again after 8 seconds, etc. The SOCKET_TIMEOUT value should be reconsidered in this case.
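One way to generate such a schedule is sketched below. The generator name `backoff_delays` and the optional `cap` parameter are assumptions added for the example.

```python
from typing import Iterator, Optional


def backoff_delays(
    initial: float = 2.0,
    max_resends: int = 10,
    cap: Optional[float] = None,
) -> Iterator[float]:
    """Yield the wait before each resend, doubling every time."""
    delay = initial
    for _ in range(max_resends):
        yield delay if cap is None else min(delay, cap)
        delay *= 2


delays = list(backoff_delays())
total_wait = sum(delays)  # 2 + 4 + ... + 1024 seconds
```

With the default WAIT_TIME of 2 seconds and MAX_RESENDS of 10, the waits add up to 2046 seconds, which shows why a 20 second SOCKET_TIMEOUT would need to be revisited, or the delay capped.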
If the CLIENT encounters a block of data in the file stream that computes to the same CRC32 as the previous block, it compares the data in that block with the data in the previous block:
If the data is the same, the CLIENT sends a REP_MSG to the SERVER with the CRC32 of that block and the next block number in the sequence. (Example: sending a large executable file that may have two or more blocks of zeros in sequence.)
If the data is different, the CLIENT sends a DUP_MSG, consisting of the CRC32 of that block followed by DUP_WARN, then sends the block itself as a DATA_MSG. This would be an extremely rare occurrence: there is a nearly infinitesimal probability that a file would contain two consecutive blocks with different data but the same CRC32. However, handling for this is presented to complete the logic of the protocol.
The SERVER employs alternating buffers to receive data so that the last block received is not overwritten by the next block or message to arrive.
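The CLIENT-side choice between DATA_MSG, REP_MSG and DUP_MSG described in this section can be sketched as a single function. The name `messages_for_block` and the big-endian packing are assumptions; the DUP_MSG layout follows the SHORT_MSG definitions (a CRC32 followed by DUP_WARN). The `crc_fn` parameter exists only so the collision branch can be exercised with a stand-in CRC function.

```python
import struct
import zlib
from typing import Callable, List, Optional, Tuple

DUP_WARN = 0xFFFFFFFF


def messages_for_block(
    block: bytes,
    previous: Optional[bytes],
    next_block_number: int,
    crc_fn: Callable[[bytes], int] = zlib.crc32,
) -> List[Tuple[str, bytes]]:
    """Return the message(s) the CLIENT sends for `block`, in order."""
    crc = crc_fn(block)
    if previous is None or crc != crc_fn(previous):
        return [("DATA_MSG", block)]  # the ordinary case
    if block == previous:
        # Same data as the previous block: ask the SERVER to repeat it.
        return [("REP_MSG", struct.pack("!II", crc, next_block_number))]
    # Same CRC32 but different data: warn first, then send the block.
    return [("DUP_MSG", struct.pack("!II", crc, DUP_WARN)), ("DATA_MSG", block)]
```

Two consecutive all-zero blocks, for instance, produce one DATA_MSG followed by one REP_MSG.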
When the SERVER receives a REP_MSG, it writes the previous DATA_MSG received to file (again) and responds with an ACK_MSG as usual.
The SERVER employs redundancy checking on each DATA_MSG or REP_MSG that arrives to verify whether it already has that block with that CRC32. If it is redundant, it silently discards the block, without reply.
When the SERVER receives a DUP_MSG, it disables its redundancy checking for the next block, awaits the next block and processes it as normal when it arrives. However, before processing the block, it checks its CRC32 against the last and the previous. If that CRC32 is the same as both the last and the one before the last, the block is silently discarded as an inadvertent duplicate. Finally, the SERVER re-enables redundancy checking.
As the SERVER receives each DATA_MSG or REP_MSG, it verifies that redundancy checking has not been inadvertently left disabled by the expectation set up by a DUP_MSG that was not followed by a DATA_MSG.
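The SERVER-side rules above can be gathered into a small state machine. This is only a sketch under several assumptions: the class and method names are invented, the output file is replaced by an in-memory `bytearray`, a REP_MSG is treated as redundant when its block number is not the next one expected, and a pending DUP_MSG is honoured only when its CRC32 matches the block that follows. Header handling is left out.

```python
import zlib
from typing import Callable, Optional, Tuple

Ack = Tuple[int, int]  # (CRC32, BLOCK_NUMBER)


class Receiver:
    def __init__(self) -> None:
        self.written = bytearray()            # stands in for the output file
        self.last_data: Optional[bytes] = None
        self.last_crc: Optional[int] = None   # CRC32 of the last block
        self.prev_crc: Optional[int] = None   # CRC32 of the one before it
        self.next_block = 0
        self.dup_crc: Optional[int] = None    # set while a DUP_MSG is pending

    def _accept(self, data: bytes, crc: int) -> Ack:
        self.written += data
        self.prev_crc, self.last_crc = self.last_crc, crc
        self.last_data = data
        ack = (crc, self.next_block)
        self.next_block += 1
        return ack

    def on_dup_msg(self, crc: int) -> None:
        self.dup_crc = crc  # disable redundancy checking for the next block

    def on_data_msg(
        self, data: bytes, crc_fn: Callable[[bytes], int] = zlib.crc32
    ) -> Optional[Ack]:
        crc = crc_fn(data)
        bypass = self.dup_crc == crc  # a stale DUP_MSG does not apply
        self.dup_crc = None           # checking is always re-enabled here
        if bypass:
            if crc == self.last_crc == self.prev_crc:
                return None           # inadvertent duplicate after a DUP_MSG
        elif crc == self.last_crc:
            return None               # redundant block: discard without reply
        return self._accept(data, crc)

    def on_rep_msg(self, crc: int, block_number: int) -> Optional[Ack]:
        self.dup_crc = None           # never leave checking disabled
        if self.last_data is None or crc != self.last_crc:
            return None
        if block_number != self.next_block:
            return None               # already handled: discard without reply
        return self._accept(self.last_data, crc)
```

Each handler returns the (CRC32, BLOCK_NUMBER) pair for the ACK_MSG, or `None` when the message is silently discarded.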
When the CLIENT has sent all full 512-byte blocks of file data, the last remaining data may be any number of bytes between 1 and 512. If by coincidence the number of bytes happens to be SHORT_MSG_SIZE (8 bytes), the CLIENT records the CRC32 for that block before zero padding it to SHORT_MSG_SIZE + 1 (9 bytes).
If the SERVER receives SHORT_MSG_SIZE + 1 bytes, it knows this must be the last block in the file, and that it might include a zero pad byte. It compares what the final file size would be with the last byte included against the file size it received in the HDR_MSG to decide whether to drop the last byte. If it drops the last byte, it must recalculate the CRC32 for the ACK excluding that byte.
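Both halves of the final-block rule fit in a few lines. The function names `pad_final_block` and `strip_final_pad`, and the `bytes_written` bookkeeping argument, are assumptions made for this sketch.

```python
import zlib
from typing import Tuple

SHORT_MSG_SIZE = 8


def pad_final_block(data: bytes) -> Tuple[bytes, int]:
    """CLIENT side: record the CRC32 first, then pad an 8-byte block to 9."""
    crc = zlib.crc32(data)  # recorded before any padding
    if len(data) == SHORT_MSG_SIZE:
        data += b"\x00"
    return data, crc


def strip_final_pad(datagram: bytes, bytes_written: int, file_size: int) -> Tuple[bytes, int]:
    """SERVER side: drop the pad byte if keeping it would overshoot FILE_SIZE."""
    if (
        len(datagram) == SHORT_MSG_SIZE + 1
        and bytes_written + len(datagram) == file_size + 1
    ):
        datagram = datagram[:-1]
    return datagram, zlib.crc32(datagram)  # CRC32 for the ACK
```

A genuine 9-byte final block is left alone, because with it the file reaches exactly the FILE_SIZE announced in the header.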
When the CLIENT has sent all file data, it may display a success message to the user. It then sends a sequence of (MAX_RESENDS + 1) EOF messages with a half-second pause between each and terminates, without looking for a response from the SERVER.
When the SERVER receives an EOF, it may display a success message to the user, closes the file, and terminates, sending no acknowledgment back to the CLIENT.
MFTP does not address security considerations.
While MFTP's throughput may initially appear limited compared to conventional reliable data protocols, multi-path transmission across multiple ports can increase its bandwidth considerably. This approach requires a straightforward protocol for port negotiation so that the data can be distributed across the ports.
In practice, the protocol would allocate file segments to ports sequentially, with the lowest port number handling the initial segment and the highest port number managing the final segment.
Once the port negotiation is complete, MFTP can operate concurrently on each port without modification, thus providing a reliable and scalable solution for data transfer over UDP.
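The sequential allocation of segments to ports might be planned as in the sketch below. The text does not define the negotiation protocol or how segment sizes are chosen, so the function name `allocate_segments` and the policy of splitting on whole 512-byte blocks as evenly as possible are assumptions.

```python
from typing import Iterable, List, Tuple

BLOCK_SIZE = 512


def allocate_segments(file_size: int, ports: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Return (port, offset, length) per port, lowest port first."""
    ordered = sorted(ports)
    total_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    base, extra = divmod(total_blocks, len(ordered))
    plan = []
    offset = 0
    for i, port in enumerate(ordered):
        blocks = base + (1 if i < extra else 0)
        # The last segment may end with a partial block.
        length = min(blocks * BLOCK_SIZE, file_size - offset)
        plan.append((port, offset, length))
        offset += length
    return plan
```

Each (offset, length) pair would then be transferred as an independent MFTP session on its port.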
The following RFCs and drafts outline best practices and methodologies relevant to a protocol like MFTP that aims to provide reliable data transport over UDP:
This document provides comprehensive guidelines for using UDP, emphasizing the necessity of implementing mechanisms to handle varying path conditions, such as transmission delays and congestion levels. It also covers congestion control and other essential features for reliable data transport over UDP.
This RFC details the features and primitives of UDP and UDP-Lite, including how these protocols provide datagram services. It includes information on the APIs for sending and receiving datagrams, as well as the importance of checksum protection for ensuring data integrity.
Although this is an expired draft, it discusses a reliable UDP protocol that builds on the principles of RFC 1151 and RFC 908. It outlines mechanisms for reliable, in-order delivery of packets and flexible design parameters suitable for various applications, including telecommunications signaling.