A checksum is a technique used to verify the integrity of received data, i.e., to detect whether an error occurred during transmission.
Along with the data that needs to be sent, the sender uses an algorithm to calculate a checksum of the data and sends it along. When the receiver gets the data, it calculates the checksum of the received data using the same algorithm and compares it with the transmitted checksum. If they match, the transmission is assumed to be error-free; if they differ, an error has been detected.
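The compute-send-recompute-compare pattern can be sketched in a few lines. This is a minimal illustration, not a specific protocol: it assumes CRC-32 (Python's `zlib.crc32`) as the agreed checksum algorithm, swapped in here only because it is in the standard library, and the function names `make_packet` and `check_packet` are hypothetical.

```python
import zlib

def make_packet(data: bytes) -> tuple[bytes, int]:
    """Sender: compute a checksum over the data and send both together."""
    # CRC-32 is an assumed choice of algorithm for this sketch.
    return data, zlib.crc32(data)

def check_packet(data: bytes, received_checksum: int) -> bool:
    """Receiver: recompute with the same algorithm and compare."""
    return zlib.crc32(data) == received_checksum

data, checksum = make_packet(b"hello, world")
print(check_packet(data, checksum))              # True: data arrived unchanged
print(check_packet(b"hello, w0rld", checksum))   # False: corruption detected
```

Any algorithm works for this pattern as long as sender and receiver agree on it; the sections that follow look at one concrete choice.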
UDP is a transport-layer protocol that enables applications to send and receive data and is commonly used for time-sensitive traffic. UDP uses a checksum to detect whether the received data has been altered.
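The data being sent is divided into 16-bit chunks (words). These words are added together, and any carry out of the most significant bit is wrapped around and added back to the sum. The 1’s complement of the resulting sum (every bit flipped) is then placed in the checksum field of the UDP segment. (In UDP, the checksum actually covers a pseudo-header taken from the IP layer, the UDP header, and the data; the example below uses only data words for simplicity.)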
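The sender's side of this procedure can be sketched as follows. This is a minimal sketch that assumes the data has already been split into 16-bit integers; it ignores the pseudo-header, the UDP header, and padding of odd-length data, and the names `ones_complement_sum` and `checksum16` are hypothetical.

```python
def ones_complement_sum(words: list[int]) -> int:
    """Add 16-bit words, wrapping any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        # End-around carry: fold anything above 16 bits back into the low bits.
        total = (total & 0xFFFF) + (total >> 16)
    return total

def checksum16(words: list[int]) -> int:
    """Checksum = 1's complement of the sum, kept to 16 bits."""
    return ~ones_complement_sum(words) & 0xFFFF

# The three words used in the worked example.
words = [0b0110011001100000, 0b0101010101010101, 0b1000111100001100]
print(format(checksum16(words), "016b"))  # 1011010100111101
```

Folding the carry after every addition keeps the running total within 16 bits, so it never needs more than one fold per word.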
Suppose the data we want to send consists of the following three words:
0110011001100000
0101010101010101
1000111100001100
Adding the first two words:
0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
_____________________________
1 0 1 1 1 0 1 1 1 0 1 1 0 1 0 1
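This first addition can be checked with ordinary Python integers. A small sketch, using binary literals for the two words and formatting the result as a 16-bit string:

```python
w1 = 0b0110011001100000
w2 = 0b0101010101010101

partial = w1 + w2
print(format(partial, "016b"))  # 1011101110110101
print(partial > 0xFFFF)         # False: no carry out of bit 15 yet
```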
Then, adding the third word to this sum:
1 0 1 1 1 0 1 1 1 0 1 1 0 1 0 1
1 0 0 0 1 1 1 1 0 0 0 0 1 1 0 0
____________________________
0 1 0 0 1 0 1 0 1 1 0 0 0 0 0 1
This time the addition produces a carry out of the most significant bit, so the carry is wrapped around and added to the sum:
0 1 0 0 1 0 1 0 1 1 0 0 0 0 0 1 + 1 =
0 1 0 0 1 0 1 0 1 1 0 0 0 0 1 0
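The wrap-around step can be made explicit by splitting the 17-bit result into its carry bit and its low 16 bits. A minimal sketch with hypothetical variable names:

```python
partial = 0b1011101110110101  # sum of the first two words
w3 = 0b1000111100001100

raw = partial + w3    # the result no longer fits in 16 bits
carry = raw >> 16     # the bit that overflowed past bit 15
low = raw & 0xFFFF    # the 16 bits that remain
wrapped = low + carry # end-around carry

print(carry, format(low, "016b"))  # 1 0100101011000001
print(format(wrapped, "016b"))     # 0100101011000010
```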
Finally, we take the 1’s complement of this sum (flipping every bit) to obtain the checksum:
1011010100111101
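In code, the 1's complement of a 16-bit value is an XOR with a mask of sixteen ones. A small sketch:

```python
total = 0b0100101011000010

# Flip all 16 bits. Equivalent to ~total & 0xFFFF; plain ~total would give a
# negative number because Python integers are not fixed-width.
checksum = total ^ 0xFFFF
print(format(checksum, "016b"))  # 1011010100111101
```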
At the receiver side, all the 16-bit data words are added again, with any overflow wrapped around, and the checksum is then added to this result. If the data arrived intact, the final sum consists of all ones. If any bit of it is zero, an error occurred during transmission.
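The receiver's check can be sketched as a single function that sums the data words and the checksum with end-around carry and tests for all ones. This is a minimal sketch; the name `verify16` is hypothetical and the inputs are assumed to be 16-bit integers.

```python
def verify16(words: list[int], checksum: int) -> bool:
    """Return True if data words plus checksum sum to all ones (0xFFFF)."""
    total = 0
    for word in words + [checksum]:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wrap any carry around
    return total == 0xFFFF

words = [0b0110011001100000, 0b0101010101010101, 0b1000111100001100]
print(verify16(words, 0b1011010100111101))  # True: no error detected
```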
In the example above, if the data was transferred correctly, the receiver’s sum of the data words is 0100101011000010. The receiver then adds the checksum, 1011010100111101:
0 1 0 0 1 0 1 0 1 1 0 0 0 0 1 0
1 0 1 1 0 1 0 1 0 0 1 1 1 1 0 1
____________________________
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
All ones in the final result indicate that no error was detected. If even a single bit of the data is altered during transmission, the sum changes, so the final result contains at least one zero and the error is detected.
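The single-bit claim can be checked exhaustively for the example data by flipping each of the 48 data bits in turn and running the receiver's check. A small sketch with hypothetical helper names:

```python
def ones_complement_sum(words: list[int]) -> int:
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

words = [0b0110011001100000, 0b0101010101010101, 0b1000111100001100]
checksum = ~ones_complement_sum(words) & 0xFFFF

missed = 0
for i in range(len(words)):
    for bit in range(16):
        corrupted = list(words)
        corrupted[i] ^= 1 << bit  # simulate one bit flipped in transit
        if ones_complement_sum(corrupted + [checksum]) == 0xFFFF:
            missed += 1           # all ones: this error would go unnoticed
print(missed)  # 0: every single-bit error is detected
```

A single flip changes the sum by a power of two, which can never cancel out. Multiple errors are a different story: for example, one bit changing from 0 to 1 in one word and the same bit changing from 1 to 0 in another leave the sum unchanged and go undetected.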
MD5 is an algorithm that computes a 128-bit hash value of the data being sent, which can also be used to check data integrity. Even a very slight change to the data produces a completely different hash value. For example:
The data to be transmitted:
The quick brown fox jumped over the lazy dog
MD5 hash value generated by the sender:
08a008a01d498c404b0c30852b39d3b8
Now suppose the data is altered slightly, e.g., by adding a period at the end:
The quick brown fox jumped over the lazy dog.
The new MD5 value would be:
5c6ffbdd40d9556b73a21e63c3e0e904
This is completely different from the original hash, even though only one character changed.
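The comparison can be reproduced with Python's standard `hashlib` module. A small sketch that assumes the sentences are encoded as UTF-8; it prints both digests rather than asserting particular values, so the output can be compared by eye:

```python
import hashlib

original = "The quick brown fox jumped over the lazy dog"
altered = original + "."  # one extra character

h1 = hashlib.md5(original.encode("utf-8"))
h2 = hashlib.md5(altered.encode("utf-8"))

print(h1.digest_size * 8)  # 128: an MD5 digest is 16 bytes
print(h1.hexdigest())
print(h2.hexdigest())

# Count positions where the two 32-character hex digests happen to agree.
same = sum(a == b for a, b in zip(h1.hexdigest(), h2.hexdigest()))
print(f"{same} of 32 hex digits match")
```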
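MD5 has been used to verify the distribution packages of Unix-based operating systems. File servers also publish MD5 values alongside downloads, so users can compute the hash of a downloaded file and compare it with the published value to check that the file was downloaded correctly. Because MD5 is no longer collision-resistant, such checks guard against accidental corruption rather than deliberate tampering.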
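Verifying a download against a published value can be sketched as below. This is a minimal sketch, not any particular tool's behavior: the function names `md5_of_file` and `download_ok` are hypothetical, and the file is read in chunks so large downloads need not fit in memory.

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file incrementally and return its hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def download_ok(path: str, published_md5: str) -> bool:
    """Compare the computed digest with the value published by the server."""
    # Normalize whitespace and case, since published values vary in format.
    return md5_of_file(path) == published_md5.strip().lower()
```

A call would look like `download_ok("package.tar.gz", published_value)`, where both the file name and `published_value` stand in for whatever the server provides.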
The main benefits of a checksum are its simplicity and the ease with which it detects data corruption. However, a checksum does not indicate where in the data the error occurred, nor does it provide any error correction.