What are floating-point binary numbers?

Computers store real numbers as fixed-width patterns of zeros and ones called floating-point representations. Most computer systems follow the IEEE 754 standard for these representations.

According to the IEEE 754 standard, a floating-point format is specified by three parameters:

Base: This is also known as radix. Its value is 2 for binary and 10 for decimal. Here, we'll restrict ourselves to binary.
Precision: This is the number of significant digits. These digits are bits in the case of binary.
Exponent range: This is specified by the number of exponent bits.

Half-precision (16 bits)

In half-precision, we have 16 bits, and they are divided in the following way:

Sign bit (1): The sign bit indicates whether a number is positive (0) or negative (1).
Exponent bits (5): The exponent bits encode the power of 2 by which the mantissa is multiplied.
Mantissa bits (10): The mantissa bits are the fractional part of the binary number expressed in scientific notation. Because the leading digit of a normalized binary number in scientific notation is always $1$, it does not need to be stored. That's why we only concern ourselves with the fractional part.

Note: Some authors use the term fraction bits instead of mantissa bits.
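The 1-5-10 split described above can be sketched in Python. This is a minimal illustration, not a full decoder: the function names split_half_precision and significand are hypothetical, and the significand calculation assumes a normal number, where the implicit leading 1 applies.

```python
def split_half_precision(bits):
    """Split a 16-character bit string into (sign, exponent, mantissa) fields."""
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 16 binary digits")
    # 1 sign bit, then 5 exponent bits, then 10 mantissa bits.
    return bits[0], bits[1:6], bits[6:]


def significand(mantissa_bits):
    """Restore the implicit leading 1 in front of the stored fraction.

    Assumes a normal number; zero and subnormals have no implicit 1.
    """
    fraction = int(mantissa_bits, 2) / 2 ** len(mantissa_bits)
    return 1 + fraction


sign, exponent, mantissa = split_half_precision("0011111000000000")
print(sign, exponent, mantissa)   # 0 01111 1000000000
print(significand(mantissa))      # 1.5
```

The stored fraction 1000000000 stands for binary 0.1, so the restored significand is binary 1.1, which is 1.5 in decimal.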

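Because exponents can be positive or negative, the exponent field is stored with a bias rather than with its own sign bit. With $k$ exponent bits, the bias is $2^{k-1} - 1$, and the exponents of normal numbers range from $-(2^{k-1} - 2)$ to $2^{k-1} - 1$. For half-precision ($k = 5$), the bias is $15$ and the exponent range is $-14$ to $15$. The all-zeros and all-ones exponent patterns are reserved for special values such as zero, infinity, and NaN.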
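As a rough sketch of the exponent-range calculation, the Python below assumes the usual IEEE 754 convention: a bias of 2^(k-1) - 1 for k exponent bits, with the all-zeros and all-ones exponent patterns set aside for special values. The function names exponent_range and unbiased_exponent are illustrative only.

```python
def exponent_range(exponent_bits):
    """Return (bias, min_exponent, max_exponent) for normal numbers."""
    bias = 2 ** (exponent_bits - 1) - 1
    # Stored value 0 and the all-ones stored value are reserved,
    # so normal numbers use stored values 1 .. 2**k - 2.
    min_exponent = 1 - bias
    max_exponent = (2 ** exponent_bits - 2) - bias
    return bias, min_exponent, max_exponent


def unbiased_exponent(exponent_field):
    """Convert a stored exponent bit string into the actual power of 2."""
    bias, _, _ = exponent_range(len(exponent_field))
    return int(exponent_field, 2) - bias


print(exponent_range(5))           # (15, -14, 15)
print(unbiased_exponent("01111"))  # 0
print(unbiased_exponent("10000"))  # 1
```

For half-precision, the stored pattern 01111 (decimal 15) therefore means an exponent of 0, so the number is scaled by 2 to the power 0.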

Explanation

Line 5: Here, we use the struct.pack() function. It packs the given values into a bytes object laid out according to a format string, which is the function's first argument. Here, we pack a single number rather than several values.
Line 6: The result of the function needs to be formatted for display. Here, we use f-strings, which allow us to format values in a compact manner. The format specifier is 0>8b, which loosely translates to "convert the number into binary and represent it in eight characters, padding with zeros at the front if needed."
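A minimal sketch of the approach these notes describe, assuming Python 3.6 or later (needed for the "e" half-precision format code). The function name float_to_bits and its fmt parameter are illustrative and not taken from the original listing, so its line numbers need not match the ones cited above.

```python
import struct


def float_to_bits(value, fmt=">e"):
    """Return the bit pattern of value as a string of 0s and 1s.

    fmt is a struct format string: ">e" is big-endian half-precision,
    ">f" is big-endian single precision.
    """
    # Pack the number into raw bytes in the chosen floating-point format.
    packed = struct.pack(fmt, value)
    # Render each byte as eight binary digits, zero-padded on the left.
    return "".join(f"{byte:0>8b}" for byte in packed)


print(float_to_bits(1.0))         # 0011110000000000
print(float_to_bits(-2.0))        # 1100000000000000
print(float_to_bits(1.0, ">f"))   # 00111111100000000000000000000000
```

The big-endian prefix ">" is used so that the sign bit comes first in the output, matching the sign, exponent, mantissa order used above.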
