
Digital image compression: An overview

7 min read
Jun 13, 2023

Introduction to Digital Images#

This blog discusses digital images and some compression techniques to represent them efficiently. An image can be synthetic, created by graphical methods, or natural, captured from a real scene. A natural image captured with a camera can be analog or digital. A digital image is a discrete signal that can be imagined as a matrix of values. These values are called samples, and different cameras can capture different numbers of samples per unit of space. Images captured this way by digital cameras are called raster images, and the values that represent them are collectively called spatial data.

A single value of spatial data in a digital image is called a pixel. The memory required to represent a pixel normally classifies images as black-and-white, grayscale, or color. Black-and-white images require only one bit per pixel, which means one byte can store eight pixels. Grayscale images normally use one byte (8 bits) per pixel. Color images usually use three bytes per pixel: since there are three primary colors, one byte per primary color describes the color mix of a single pixel.

Examples of a pixel with different memory requirements
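To make these sizes concrete, here is a minimal sketch using NumPy; the 4×8 resolution is an arbitrary toy example:

```python
import numpy as np

# A 4x8 black-and-white image: one bit per pixel, so 8 pixels fit in one byte.
bw = np.random.randint(0, 2, size=(4, 8), dtype=np.uint8)
packed = np.packbits(bw)   # 32 pixels -> 4 bytes
print(packed.nbytes)       # 4

# The same 4x8 image in grayscale needs one byte per pixel.
gray = np.zeros((4, 8), dtype=np.uint8)
print(gray.nbytes)         # 32

# In color, three bytes per pixel: one per primary color.
rgb = np.zeros((4, 8, 3), dtype=np.uint8)
print(rgb.nbytes)          # 96
```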

Motivation and benefits of compression#

First, let's ask whether we really need to compress images, especially when digital memory is getting cheaper and cheaper. If we have a 48-megapixel camera on a mobile phone to capture an image, how much storage space would it require?

\begin{align*}
\text{Storage space} &= 48\times 10^6 \text{ pixels} \times 3 \text{ bytes per pixel}\\
&= 144\times 10^6 \text{ bytes}\\
&= 144 \text{ megabytes}
\end{align*}

Let's say we want to share this image with a friend on WhatsApp. How much time would it take? Suppose we have an internet connection with an upload speed of 5 Mbps. Assuming a good quality of service, let's calculate the required time.

\begin{align*}
\text{Upload time} &= \frac{144\times 10^6 \text{ bytes} \times 8 \text{ bits per byte}}{5\times 10^6 \text{ bits per second}}\\
&= \frac{144\times 8}{5} \text{ seconds}\\
&= 230.4 \text{ seconds}\\
&= 3.84 \text{ minutes}
\end{align*}
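If you prefer code to algebra, the same back-of-the-envelope arithmetic takes a few lines of Python (the numbers simply mirror the example above):

```python
pixels = 48e6            # 48-megapixel sensor
bytes_per_pixel = 3      # one byte per primary color
upload_bps = 5e6         # 5 Mbps upload speed

storage_bytes = pixels * bytes_per_pixel          # 144,000,000 bytes
upload_seconds = storage_bytes * 8 / upload_bps   # 230.4 seconds

print(f"{storage_bytes / 1e6:.0f} MB, {upload_seconds / 60:.2f} minutes")
# Output: 144 MB, 3.84 minutes
```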

Even doubling the upload speed would still leave the transfer time in the range of minutes, which is quite high. There is an obvious need for compression here, and it can benefit us in the following three ways:

  • It can reduce communication time.

  • It can reduce the cost of communication.

  • It can reduce the required storage space.

Let's now look at the types of image compression.

Types of compression#

There are mainly two types of compression techniques—lossless and lossy.

Lossless compression#

Lossless image compression techniques represent an image in a compact and efficient way without losing any of the information present in the original. Examples of such algorithms include Huffman, arithmetic, differential, run-length, and dictionary-based coding. The main idea is to exploit different redundancies in the data, which provide the margin for a compact representation.
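To make one of these ideas concrete, here is a minimal run-length encoder and decoder in Python. It is a sketch of the general technique, not the exact variant used by any particular standard:

```python
def rle_encode(pixels):
    """Collapse runs of repeated values into (value, count) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

row = [255, 255, 255, 255, 0, 0, 255, 255]
encoded = rle_encode(row)
assert rle_decode(encoded) == row     # lossless: reconstruction is exact
print(encoded)                        # [[255, 4], [0, 2], [255, 2]]
```

Long runs of identical pixels, common in graphics and document scans, are exactly the redundancy this method exploits.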

Lossy compression#

Lossy image compression techniques deliberately discard some information in exchange for an acceptable compromise on image quality. An important part of such techniques is to separate the information that is critical to representing the image from the information that is less useful. What counts as critical depends on the context and use of the image. For example, in a selfie, detail that the human visual system cannot perceive is not critical; in an X-ray image that will be fed to software for analysis, there is no unimportant information; and in an application that captures vehicle license plates, the characters and numerals on the plate are critical.

A lossy image compression method normally consists of two steps. In the first step, the image data is transformed into the frequency domain; the purpose of this transformation is to decorrelate the image data. In the second step, part of the information is dropped to obtain a compact representation. The discrete cosine, Walsh-Hadamard, or Karhunen–Loève transforms are typical choices for the first step, and some kind of quantization is performed in the second. This two-step process is also called transform coding.
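The sketch below shows transform coding on a single 8×8 block: a 2-D discrete cosine transform followed by quantization with the standard JPEG luminance table. It uses SciPy's DCT and random pixel values purely for illustration:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Standard JPEG luminance quantization table (quality ~50).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

block = np.random.randint(0, 256, size=(8, 8)).astype(float) - 128  # level shift

coeffs = dctn(block, norm="ortho")   # step 1: decorrelate via 2-D DCT
quantized = np.round(coeffs / Q)     # step 2: quantize (information is lost here)

# Decoding reverses the steps; the result is close to, but not exactly, the input.
reconstructed = idctn(quantized * Q, norm="ortho") + 128

# On natural images, most high-frequency coefficients quantize to zero,
# which is what later lossless stages compress so effectively.
print(np.count_nonzero(quantized == 0), "of 64 coefficients became zero")
```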

The digital image compression process#

There are two main parts of the image compression process. The first part is to compress a raw image, producing a compressed file. This is called the encoding process, and the piece of software that accomplishes this task is called an encoder. The second part is to take a compressed image file and produce the image in an uncompressed format, referred to as a reconstructed image. This part of the process is called decoding, and the piece of software that performs it is called a decoder. Software that includes both an encoder and a corresponding decoder is called a codec, short for coder-decoder.

Overall image compression process
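In practice, you rarely write an encoder or decoder from scratch; libraries ship complete codecs. As a small sketch using the Pillow library (the file names here are placeholders), the round trip looks like this:

```python
from PIL import Image

# Encoding: a raw image goes in, a compressed file comes out.
raw = Image.open("photo.bmp")        # placeholder: any uncompressed input
raw.save("photo.jpg", quality=75)    # the JPEG encoder runs here

# Decoding: the compressed file is turned back into pixel data.
reconstructed = Image.open("photo.jpg")
reconstructed.load()                 # forces the JPEG decoder to run
print(reconstructed.size, reconstructed.mode)
```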

The need for standardization#

The different compression techniques mentioned above can be combined to design a codec. If we design our own codec, compress images with it, and share those images with different people, we have to provide the decoder to everyone involved so that they can decompress the files produced by our encoder. Since there are many different compression algorithms, a huge number of different codecs can be designed by combining them. This gives rise to the need for standardization of the codec: the sequence of algorithms, the variations of their parameters, and, most importantly, the format of the bits in the compressed file all need to be standardized. Standardization gives us the advantage of interoperability and makes specialized hardware possible for enhanced performance.

The JPEG standard has been widely accepted since its creation in 1992 for natural images containing raster data. It's the first choice for digital photography. The acronym JPEG stands for Joint Photographic Experts Group, the team that created this standard. This team consists of two subgroups: one from the International Organization for Standardization (ISO) and one from the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).

Famous standard image formats#

The image files compressed with JPEG can have a .jpg, .jpeg, .jfif, or .pjpeg extension. The following table summarizes basic information regarding different image file types.

| File Format | File Extensions | Compression Techniques |
| ----------- | --------------- | ---------------------- |
| JPEG | `.jpg`, `.jpeg`, `.jfif`, `.pjpeg` | Lossless and lossy methods |
| GIF | `.gif` | Lossless methods |
| PNG | `.png` | Lossless methods |
| BMP | `.bmp` | Lossless methods |

In the JPEG standard, chroma subsampling and transform coding are lossy, while run-length encoding and entropy coding are lossless. Of the other formats in the table, GIF uses Lempel–Ziv–Welch (LZW), a dictionary-based lossless method; PNG uses DEFLATE, which combines the related LZ77 dictionary method with Huffman entropy coding; and BMP files are typically stored uncompressed, with optional lossless run-length encoding.
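As a sketch of the dictionary-based idea behind LZW, the encoder below builds its dictionary on the fly while emitting codes. Real implementations add details, such as variable code widths, that are omitted here:

```python
def lzw_encode(data: bytes):
    """Emit dictionary codes for the longest already-seen prefixes."""
    dictionary = {bytes([i]): i for i in range(256)}  # seed with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            codes.append(dictionary[current])        # emit code for known prefix
            dictionary[candidate] = len(dictionary)  # learn the new string
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

print(lzw_encode(b"ABABABA"))  # [65, 66, 256, 258]: repeats become single codes
```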

The JPEG standard codec#

The JPEG codec is also referred to as block-based image coding because it divides the image into blocks of 8×8 pixels, zero-padding the image dimensions first if required. The block-based approach makes the individual compression methods more efficient and provides the benefit of error localization. If the image is in color, the RGB color space is converted to the YCbCr space, with an option to subsample Cb and Cr. In the YCbCr format, Y represents the luminance or brightness component, whereas Cb and Cr are the chrominance components representing the color information. Each component plane, or channel, is encoded separately, treating one 8×8 block at a time. The overall block diagram of the JPEG codec is shown below to give an abstract view of its working mechanism:

Block diagram of JPEG codec
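To see the front half of this pipeline in code, the sketch below converts RGB to YCbCr using the BT.601 weights that JPEG builds on, zero-pads to block-aligned dimensions, and slices the luminance channel into 8×8 blocks. It is illustrative, not a faithful JPEG implementation:

```python
import numpy as np

def to_ycbcr(rgb):
    """RGB -> YCbCr using the BT.601 weights JPEG is based on."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return np.stack([y, cb, cr], axis=-1)

def pad_and_split(channel, n=8):
    """Zero-pad a channel to multiples of n, then cut it into n x n blocks."""
    h, w = channel.shape
    padded = np.zeros((-(-h // n) * n, -(-w // n) * n))  # ceil to block size
    padded[:h, :w] = channel
    H, W = padded.shape
    return padded.reshape(H // n, n, W // n, n).swapaxes(1, 2).reshape(-1, n, n)

rgb = np.random.randint(0, 256, size=(10, 13, 3)).astype(float)  # toy image
y = to_ycbcr(rgb)[..., 0]
blocks = pad_and_split(y)   # each block would now go through DCT + quantization
print(blocks.shape)         # (4, 8, 8): the 16x16 padded image yields four blocks
```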

Each block in the diagram shown above deserves a separate blog post to cover in full detail; the purpose of this post was to give an abstract, global view. If you want to go further, a good next step is to start programming your own custom image codec in Python through hands-on projects.


Written By:
Laeeq Aslam