What are Reed-Solomon erasure codes?

Reed-Solomon codes are used to detect and correct errors in the data introduced during data transmission or on the data storage devices, such as a scratch on CDs corrupting the data. We will answer for error correction, where we know the position of the erred data. Such errors are known as erasures.

Components

Reed-Solomon-based error correction works by adding some extra bits called parity at the end of the actual data and using those extra bits to recover the corrupted data. The code consists of two components –– an Encoder and a Decoder, as shown in the illustration below.

Architecture of Reed-solomon based error correction system
Architecture of Reed-solomon based error correction system

Encoder

The encoder takes data as data symbols, where each data symbol refers to a fixed number of bits b of the data. The encoder appends parity symbols to data symbols. The data and parity symbols together make a codeword. Given a symbol size (in bits) b, we can have maximum n=(2b1)n= (2^b - 1) symbols in a codeword.

Encoding data symbols by appending parity symbols
Encoding data symbols by appending parity symbols

If we have d data symbols in a codeword with a b-bit symbol size, the parity symbols that can be added are n-d, and the number of symbol errors we can correct in a codeword depends on the number of parity symbols added to that codeword.

Let’s say we have the data symbol of size 4 bits. The maximum number of symbols that we can have in a codeword with a 4-bit symbol size is 15, as calculated below:

n=241n=2^{4} - 1= 15

If there are 10 data symbols in the codeword with a 4-bit symbol size, we can add a maximum of 5 parity symbols, as calculated below:

nd=1510=5n-d=15-10=5

Decoder

With n-d parity bits, the decoder can correct up to ndn-d symbol erasures. If the total number of erasures is less than or equal to ndn-d then the decoder recovers the original transmitted or stored data. Otherwise, the decoder shows a message that it can’t recover the original data, or it may also be possible that it does incorrect decoding that doesn’t recover the original data but gives the incorrect data without any message shown to the end user.

Now, let’s see the mathematical method used to correct erasures and how it works through an example.

Lagrange interpolation

With Lagrange interpolation, we find a polynomial of one degree less than the number of data symbols in the codeword. The encoder generates the polynomial using data symbols and finds parity symbols to be appended to the data using the generated polynomial. The decoder generates the polynomial using the symbols (data and parity) that are received at the receiver's side to recover data symbol erasures.

Let’s take an example of three data symbols, 4, 5, and 3, and create a polynomial of degree two for these points, add the parity symbol and see how the parity symbol is used by the decoder to find erased data symbol in the following slides.

The sender wants to send three data symbols to a receiver over a network
1 of 37

Trade-off

If we have to deal with the worst case i.e. all data symbols lost, we need to add dd parity symbols (the number of data symbols in the codeword) per codeword to recover the original data. The amount of processing power required for encoding and decoding Reed-Solomon codes increases with the number of parity symbols appended to the codeword. While a large number of parity symbols allows for the correction of many errors, it also necessitates more computing power than a small number of parity symbols.

Copyright ©2024 Educative, Inc. All rights reserved