Floating Point Rounding Problem — Part 01

IEEE 754 (Standard for Floating-Point Arithmetic)

It is a standard that specifies interchange and arithmetic formats and methods for binary and decimal floating-point arithmetic in programming. It describes three components used to represent a floating-point number: the sign bit, the exponent value and the mantissa value.

  • Sign bit: Indicates whether the value is positive or negative. 0 represents a positive number and 1 represents a negative number.
  • Exponent value: The exponent field must represent both positive and negative exponents. To obtain the stored exponent, a bias is added to the actual exponent.
  • Mantissa value: The mantissa holds the significant digits of a number in scientific notation or of a floating-point number. In binary there are only two digits, 0 and 1, so a normalized mantissa always has exactly one 1 to the left of the binary point.
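IEEE 754 standards for different types (single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits; double precision: 1 sign bit, 11 exponent bits, 52 mantissa bits)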
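As a rough sketch of how the field widths relate to each other, the snippet below derives the total size and the exponent bias from the exponent and mantissa widths of the two common formats. The names `FORMATS` and `describe` are made up for this illustration and are not part of any library.

```python
# (exponent bits, mantissa bits) per format; each format also has 1 sign bit.
# FORMATS and describe() are illustrative names, not a standard API.
FORMATS = {"single": (8, 23), "double": (11, 52)}

def describe(name):
    """Return the total width and exponent bias of a format."""
    exp_bits, man_bits = FORMATS[name]
    return {
        "total_bits": 1 + exp_bits + man_bits,  # sign + exponent + mantissa
        "bias": 2 ** (exp_bits - 1) - 1,         # 127 for single, 1023 for double
    }

for name in FORMATS:
    print(name, describe(name))
```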
Step 01: Converting 8 to binary format
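One way to do Step 01 by hand is repeated division by 2, collecting the remainders. Here is a small sketch of that idea; `int_to_binary` is a hypothetical helper name, and Python's built-in `bin()` would give the same digits.

```python
def int_to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(digits))  # reverse so the most significant bit comes first

print(int_to_binary(8))  # 1000
```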
Step 02: Converting 0.3 to binary format
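The fractional part can be converted by repeatedly multiplying by 2 and taking the integer part as the next bit. The sketch below uses `fractions.Fraction` so that 0.3 is held exactly (a Python `float` would already be rounded); `fraction_to_binary` is a name chosen just for this example. Notice that the digits never terminate, so we have to stop at some bit count.

```python
from fractions import Fraction

def fraction_to_binary(frac, max_bits):
    """Return the first max_bits binary digits of a fraction in [0, 1)."""
    bits = []
    for _ in range(max_bits):
        frac *= 2
        if frac >= 1:
            bits.append("1")  # the integer part is 1, so emit 1 and keep the remainder
            frac -= 1
        else:
            bits.append("0")
    return "".join(bits)

# 0.3 = 0.01001100110011... in binary; the pattern 0011 repeats forever
print(fraction_to_binary(Fraction(3, 10), 14))  # 01001100110011
```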
Step 03: Binary representation of 8.3 (1000.0100110011...)
Step 04: Converting to scientific representation
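Normalizing means moving the binary point until exactly one 1 is left in front of it, and counting how many places it moved. A minimal sketch, assuming the value is at least 1 so the integer bits start with a 1 (`normalize` is an illustrative name):

```python
def normalize(int_bits, frac_bits):
    """Move the binary point so that exactly one 1 sits to its left.

    Assumes int_bits starts with '1' (the value is at least 1).
    Returns the digits after the point and the power of two.
    """
    exponent = len(int_bits) - 1         # how many places the point moved to the left
    fraction = int_bits[1:] + frac_bits  # everything after the leading 1
    return fraction, exponent

fraction, exponent = normalize("1000", "01001100110011")
print(f"1.{fraction} x 2^{exponent}")  # 1.00001001100110011 x 2^3
```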
  • Representing the sign bit value: Check whether the value is positive or negative. If it is positive, assign 0; if it is negative, assign 1.
  • Representing the exponent value: Look up the exponent width for your precision in the table above (8 bits for single precision, 11 bits for double precision). Taking single precision, the 8-bit exponent field can hold 2⁸ = 256 different values. Because it has to cover both positive and negative exponents, the exponent is stored with a bias of 2⁷ - 1 = 127, so normal numbers can use actual exponents from -126 to +127 (the all-zeros and all-ones patterns are reserved for special values). Then take the exponent from the scientific representation and add the bias to it (for 8.3 the exponent is 3, so 3 + 127 = 130). This is the stored exponent value; convert it to binary format (130 is 10000010).
Step 05: Sign bit and exponent value
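The two bullets above can be sketched in a few lines for single precision. The constant and function names are assumptions made for this example, and the sign check ignores special cases such as negative zero.

```python
EXPONENT_BITS = 8                     # single precision
BIAS = 2 ** (EXPONENT_BITS - 1) - 1   # 127

def sign_bit(value):
    """0 for positive, 1 for negative (ignores special cases such as -0.0)."""
    return "1" if value < 0 else "0"

def stored_exponent(actual_exponent):
    """Add the bias and write the result as an 8-bit binary string."""
    return format(actual_exponent + BIAS, f"0{EXPONENT_BITS}b")

# 8.3 = 1.0000100110011... x 2^3, so the actual exponent is 3
print(sign_bit(8.3), stored_exponent(3))  # 0 10000010
```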
  • Representing the mantissa value: Look up the mantissa width for your precision in the table, take the binary number in scientific form, and write out its digits until you reach that width. For example, single precision needs 23 bits in the mantissa. (The leading 1 of the scientific representation is not counted; take only the digits after the binary point.)
Step 05: Mantissa Value
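Here is a sketch of the mantissa step that simply writes out the first 23 digits after the binary point, as described above. It works on an exact `Fraction`, and `mantissa_bits` is a hypothetical helper name.

```python
from fractions import Fraction

MANTISSA_BITS = 23  # single precision

def mantissa_bits(value, exponent, width=MANTISSA_BITS):
    """First `width` binary digits after the point of value / 2**exponent (cut off, not rounded)."""
    frac = value / Fraction(2) ** exponent - 1  # drop the implicit leading 1
    bits = []
    for _ in range(width):
        frac *= 2
        bit = int(frac >= 1)
        bits.append(str(bit))
        frac -= bit
    return "".join(bits)

print(mantissa_bits(Fraction(83, 10), 3))  # 00001001100110011001100
```

Note that this version just cuts the digits off after 23 bits. Real hardware normally rounds to the nearest representable value instead, so the last stored bit can differ from this truncated string.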
  • After collecting the three values above, you have the complete IEEE 754 representation of the chosen floating-point value (here, the IEEE 754 representation of 8.3).
Step 05: IEEE-754 Representation for 8.3
IEEE 754 value for 8.3 generated by an online calculator
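If you don't want to rely on an online calculator, the same check can be done with Python's standard `struct` module, which packs a number as a 32-bit float. This is only a sketch; `float32_bit_string` is a name made up for this example.

```python
import struct

def float32_bit_string(value):
    """Return 'sign exponent mantissa' for value stored as IEEE 754 single precision."""
    # Pack as a big-endian 32-bit float, then read the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    s = format(bits, "032b")
    return f"{s[0]} {s[1:9]} {s[9:]}"  # 1 sign bit, 8 exponent bits, 23 mantissa bits

print(float32_bit_string(8.3))  # 0 10000010 00001001100110011001101
```

The last mantissa bit is 1 here because the value is rounded to the nearest float when it is packed, rather than cut off after 23 bits.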



Thilini Weerasinghe


Currently working as an Associate Software Engineer at Virtusa. Has completed degree in B.Sc (Hons) Computing & Information Systems. After all I am a Human...