This blog discusses digital images and some compression techniques to represent them efficiently. We may create images by graphical methods or by capturing a natural scene, so an image can be synthetic or natural. A natural image captured with a camera can be analog or digital. A digital image is a discrete signal that can be imagined as a matrix of values. These values are called samples, and different cameras can capture different numbers of samples per unit of space. Images captured this way by digital cameras are called raster images, and the values that represent them are collectively called spatial data.
A single value of spatial data in a digital image is called a pixel. Images are normally classified as black-and-white, grayscale, or color based on the memory required to represent a pixel. Black-and-white images require only one bit per pixel, which means one byte can store eight pixels, whereas grayscale images normally use one byte (8 bits) per pixel. Color images usually use three bytes per pixel: one byte for each of the three primary colors, describing the mix of primaries that produces the pixel's color.
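To make the one-bit-per-pixel arithmetic concrete, here is a tiny Python sketch (the pixel values are made up for illustration) that packs eight black-and-white pixels into a single byte:

```python
# A hypothetical row of eight black-and-white pixels (1 bit each).
pixels = [1, 0, 1, 1, 0, 0, 1, 0]

# Pack all eight pixels into a single byte, most significant bit first.
byte = 0
for p in pixels:
    byte = (byte << 1) | p

print(f"{byte:08b}")  # -> 10110010: one byte storing eight pixels
```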
First, let's ask whether we really need to compress images, especially when digital memory is getting cheaper and cheaper. If we have a 48-megapixel camera on a mobile phone, how much storage space would one image require? At three bytes per pixel, 48 million pixels take 48,000,000 × 3 = 144,000,000 bytes, or roughly 144 MB, uncompressed.
Let's say we want to share this image with a friend on WhatsApp. How much time would it take? We have an internet connection with an upload speed of five Mbps. Assuming a good quality of service, 144 MB is 1,152 megabits, so the upload would take 1,152 ÷ 5 ≈ 230 seconds, almost four minutes.
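As a quick sanity check, here is a small Python sketch of the arithmetic above, using the sizes and speed assumed in this example:

```python
# Back-of-the-envelope numbers for an uncompressed 48 MP color image.
MEGAPIXELS = 48e6        # 48 million pixels
BYTES_PER_PIXEL = 3      # one byte per primary color (R, G, B)
UPLOAD_MBPS = 5          # assumed upload speed in megabits per second

size_bytes = MEGAPIXELS * BYTES_PER_PIXEL
print(f"Uncompressed size: {size_bytes / 1e6:.0f} MB")  # ~144 MB

size_megabits = size_bytes * 8 / 1e6
seconds = size_megabits / UPLOAD_MBPS
print(f"Upload time at {UPLOAD_MBPS} Mbps: "
      f"{seconds:.0f} s (~{seconds / 60:.1f} min)")     # ~230 s, ~3.8 min
# Doubling the speed to 10 Mbps still takes roughly 1.9 minutes.
```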
Even doubling the upload speed to 10 Mbps still keeps the transfer time at almost two minutes, which is quite high. There is an obvious need for compression here, and it can benefit us in the following three ways:
It can reduce communication time.
It can reduce the cost of communication.
It can reduce the required storage space.
Let's now look at the types of image compression.
There are mainly two types of compression techniques—lossless and lossy.
The lossless image compression techniques represent an image in a compact, efficient way without losing any of the information present in the original image. Examples of such algorithms include Huffman, arithmetic, differential, run-length, and dictionary-based coding techniques. The main idea is to exploit the different redundancies in the data, which provide the margin for a compact representation.
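As a taste of these techniques, here is a minimal run-length coding sketch in Python. Real codecs such as JPEG apply run-length coding to quantized coefficients rather than raw pixels, but the idea is the same:

```python
def rle_encode(data: bytes):
    """Run-length encode a byte string into (value, run_length) pairs."""
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((b, 1))              # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original bytes."""
    return bytes(b for b, n in runs for _ in range(n))

row = bytes([255] * 12 + [0] * 4)  # a row with long runs of identical pixels
encoded = rle_encode(row)
print(encoded)                     # [(255, 12), (0, 4)]
assert rle_decode(encoded) == row  # lossless: the original is recovered exactly
```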
The lossy image compression techniques essentially lose some information, with an acceptable compromise on image quality. An important part of such techniques is to separate the information that is highly useful or critical for representing the image from the information that is less useful. Which part is critical depends on the context and use of the image. For example, in a selfie, the information that the human visual system cannot perceive is not critical, but in an X-ray image that needs to be fed to software for analysis, there is no unimportant information. In an application that captures the license plates of vehicles, the characters and numerals on the plate are critical. A lossy image compression method normally consists of two steps. In the first step, the image data is transformed into the frequency domain; the purpose of this transformation is to decorrelate the image data. In the second step, part of the information is dropped to get a compact representation of the data. Discrete cosine, Walsh-Hadamard, or Karhunen–Loève transforms are used for the first step, and some kind of quantization technique is applied in the second. This two-step process is also called transform coding.
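To make the two steps concrete, here is a minimal transform-coding sketch using SciPy's discrete cosine transform on a single made-up 8×8 block, with simple uniform quantization standing in for the elaborate quantization tables a real codec would use:

```python
import numpy as np
from scipy.fft import dctn, idctn

# A hypothetical 8x8 grayscale block with a smooth horizontal gradient.
block = np.tile(np.linspace(50, 200, 8), (8, 1))

# Step 1: decorrelate with a 2-D discrete cosine transform.
coeffs = dctn(block, norm="ortho")

# Step 2: quantize -- divide by a step size and round, which discards
# small (mostly high-frequency) coefficients. This is the lossy step.
step = 16
quantized = np.round(coeffs / step)

# Reconstruction: dequantize and invert the transform.
reconstructed = idctn(quantized * step, norm="ortho")
print(np.count_nonzero(quantized), "of 64 coefficients survive")
print("max pixel error:", np.abs(block - reconstructed).max())
```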
There are two main parts of the image compression process. The first part compresses a raw image to produce a compressed file. This is called the encoding process, and the piece of software that accomplishes this task is called an encoder. The second part takes a compressed image file and produces the image in an uncompressed format, referred to as a reconstructed image. This part of the process is called decoding, and the piece of software that performs it is called a decoder. The software that includes both an encoder and a corresponding decoder is called a codec, short for coder-decoder.
The different compression techniques mentioned above can be combined to design a codec. If we design our own codec, compress images with it, and share those images with different people, we have to provide the decoder to everyone involved so that they can decompress the files produced by our encoder. Since there are many different compression algorithms, an enormous number of distinct codecs could be designed by combining them. This proliferation gives rise to the need for standardization of the codec: the sequence of algorithms, the parameters these algorithms take, and, most importantly, the format of the bits in the compressed files all need to be standardized. Standardization gives us interoperability and makes specialized hardware possible for enhanced performance.
The JPEG standard has been widely accepted since its creation in 1992 for natural images containing raster data, and it's the first choice for digital photography. The acronym JPEG stands for Joint Photographic Experts Group, the team that created this standard. The team consists of two subgroups: one from the International Organization for Standardization (ISO) and one from the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
The image files compressed with JPEG can have a .jpg, .jpeg, .jfif, or .pjpeg extension. The following table summarizes basic information regarding different image file types.
| File Format | File Extension | Compression Techniques |
|---|---|---|
| JPEG | .jpg, .jpeg, .jfif, .pjpeg | Lossless and lossy methods |
| GIF | .gif | Lossless methods |
| PNG | .png | Lossless methods |
| BMP | .bmp | Lossless methods |
In the JPEG standard, chroma subsampling and transform coding are lossy, while run-length encoding and entropy coding are lossless. Among the other formats in the table, GIF uses Lempel–Ziv–Welch (LZW), a dictionary-based lossless method; PNG uses DEFLATE, which combines LZ77 dictionary coding with Huffman entropy coding; and BMP stores raw pixel data, optionally compressed with run-length encoding.
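As an illustration of chroma subsampling, here is a small NumPy sketch of 4:2:0-style subsampling, which keeps one chroma sample per 2×2 block of pixels; the chroma plane here is made up for illustration:

```python
import numpy as np

def subsample_420(chroma: np.ndarray) -> np.ndarray:
    """Average each 2x2 neighborhood: 4:2:0 keeps one chroma
    sample per four pixels, a quarter of the original data."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# A hypothetical 4x4 chroma plane (e.g., the Cb channel after RGB -> YCbCr).
cb = np.arange(16, dtype=float).reshape(4, 4)
print(subsample_420(cb).shape)  # (2, 2): 16 samples reduced to 4
```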
The JPEG codec is also referred to as block-based image coding because it divides the image into blocks of 8×8 pixels and processes each block independently.
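As a sketch of this block-splitting step, the following NumPy snippet partitions a hypothetical, randomly generated image into 8×8 blocks; a JPEG-style encoder would then transform and quantize each block:

```python
import numpy as np

def to_blocks(image: np.ndarray, n: int = 8) -> np.ndarray:
    """Split an image (height and width divisible by n) into n x n blocks."""
    h, w = image.shape
    return (image.reshape(h // n, n, w // n, n)
                 .swapaxes(1, 2)
                 .reshape(-1, n, n))

image = np.random.randint(0, 256, size=(64, 64))  # a made-up 64x64 image
blocks = to_blocks(image)
print(blocks.shape)  # (64, 8, 8): sixty-four 8x8 blocks
```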
Each block in the diagram above deserves a separate blog post to cover the full details of how it works. The purpose of this blog post was to give an abstract, global view. To start programming your own custom image codec in Python, you might want to begin with some hands-on projects.