ieee-754 – Page 2 – Tarik Billa

Converting IEEE 754 floating point in Haskell Word32/64 to and from Haskell Float/Double

August 24, 2023 by Tarik

Simon Marlow mentions another approach in GHC bug 2209 (also linked to from Bryan O’Sullivan’s answer): “You can achieve the desired effect using castSTUArray, incidentally (this is the way we do it in GHC).” I’ve used this option in some of my libraries in order to avoid the unsafePerformIO required for the FFI marshalling method. … Read more

Ranges of floating point datatype in C?

August 22, 2023 by Tarik

A 32-bit floating point number has 23 + 1 bits of mantissa and an 8-bit exponent (though only -126 to 127 is used), so the largest number you can represent is: (1 + 1 / 2 + … 1 / (2 ^ 23)) * (2 ^ 127) = (2 ^ 23 + 2 ^ … Read more
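The sum above can be checked numerically. The following is a minimal sketch in Python, used here only as a neutral calculator; the helper names `largest_float32` and `float32_from_bits` are made up for this illustration. It evaluates (1 + 1/2 + … + 1/2^23) * 2^127 and compares the result with the largest finite single-precision bit pattern, 0x7F7FFFFF.

```python
import struct


def largest_float32() -> float:
    """Evaluate (1 + 1/2 + ... + 1/2**23) * 2**127 in double precision."""
    # 24 mantissa bits, all set: the implicit leading 1 plus 23 stored bits.
    mantissa = sum(2.0 ** -k for k in range(24))
    return mantissa * 2.0 ** 127


def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# 0x7F7FFFFF: sign 0, biased exponent 254 (i.e. 127), mantissa all ones.
MAX_FINITE_BITS = 0x7F7FFFFF

if __name__ == "__main__":
    print(largest_float32())
    print(float32_from_bits(MAX_FINITE_BITS))
```

Both lines should print the same value, which is what C exposes as FLT_MAX on platforms where float is IEEE 754 single precision.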

Does the C++ standard specify anything on the representation of floating point numbers?

August 1, 2023 by Tarik

From N3337: [basic.fundamental/8]: There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of … Read more

Do any real-world CPUs not use IEEE 754?

July 26, 2023 by Tarik

Other than flawed Pentiums, any x86 or x64-based CPU uses IEEE 754 as its floating-point arithmetic standard. Here is a brief overview of the FPA standards and their adoption. IEEE 754: Intel x86, and all RISC systems (IBM Power and PowerPC, Compaq/DEC Alpha, HP PA-RISC, Motorola 68xxx and 88xxx, SGI (MIPS) R-xxxx, Sun SPARC, … Read more

sign changes when going from int to float and back

July 24, 2023 by Tarik

Your program is invoking undefined behavior because of an overflow in the conversion from floating-point to integer. What you see is only the usual symptom on x86 processors. The float value nearest to 2147483584 is 2^31 exactly (the conversion from integer to floating-point usually rounds to the nearest, which can be up, and is up … Read more
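The rounding step can be reproduced outside C. This is a rough sketch in Python, assuming the default round-to-nearest mode; `to_float32` and `float_to_int32_checked` are hypothetical helper names, not part of the original answer. It rounds 2147483584 to single precision and then refuses the conversion back to a 32-bit integer when the value is out of range, which is the check a C program would need in order to avoid the undefined behavior.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float_to_int32_checked(x: float) -> int:
    """Convert to a 32-bit signed integer, rejecting out-of-range input.

    In C the out-of-range conversion is undefined behavior, so the range
    test has to happen before the cast.
    """
    if x != x or not (-2.0 ** 31 <= x < 2.0 ** 31):  # x != x catches NaN
        raise OverflowError(f"{x!r} does not fit in a 32-bit signed integer")
    return int(x)


if __name__ == "__main__":
    rounded = to_float32(2147483584.0)
    print(rounded)  # the single-precision neighbour the integer rounds to
    try:
        float_to_int32_checked(rounded)
    except OverflowError as exc:
        print("rejected:", exc)
```

Note that the upper bound is a strict comparison against 2^31, because 2^31 itself is one past the largest 32-bit signed integer.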

In binary notation, what is the meaning of the digits after the radix point “.”?

July 21, 2023 by Tarik

Simple place value. In base 10, you have these places: … 10^3 10^2 10^1 10^0 . 10^-1 10^-2 10^-3 … … thousands, hundreds, tens, ones . tenths, hundredths, thousandths … Similarly, in binary (base 2) you have: … 2^3 2^2 2^1 2^0 . 2^-1 2^-2 2^-3 … … eights, fours, twos, ones . halves, quarters, … Read more
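As a small worked example of these place values, here is a sketch in Python that sums the weight of each digit in a binary string. The function name `parse_binary` is made up for this illustration, and exact fractions are used so that no rounding hides the arithmetic.

```python
from fractions import Fraction


def parse_binary(text: str) -> Fraction:
    """Evaluate a binary numeral such as '101.011' by place value."""
    whole, _, frac = text.partition(".")
    if any(digit not in "01" for digit in whole + frac):
        raise ValueError(f"not a binary numeral: {text!r}")
    value = Fraction(0)
    # Digits left of the point weigh 2**0, 2**1, 2**2, ... from right to left.
    for i, digit in enumerate(reversed(whole)):
        value += int(digit) * Fraction(2) ** i
    # Digits right of the point weigh 2**-1, 2**-2, 2**-3, ... from left to right.
    for i, digit in enumerate(frac, start=1):
        value += int(digit) * Fraction(1, 2 ** i)
    return value


if __name__ == "__main__":
    # 101.011 = 4 + 1 + 1/4 + 1/8
    print(parse_binary("101.011"))
```

For "101.011" the digits contribute one four, one one, one quarter and one eighth, giving 43/8, or 5.375 in decimal.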

What would cause the C/C++

June 9, 2023 by Tarik

This behavior is due to the /fp:fast MSVC compiler option, which (among other things) permits the compiler to perform comparisons without regard to proper NaN behavior in an effort to generate faster code. Using /fp:precise or /fp:strict instead causes these comparisons to behave as expected when presented with NaN arguments.
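For reference, the "expected" behavior here is the IEEE 754 rule that every ordered comparison involving NaN is false and only != is true. The sketch below shows that rule in Python rather than with MSVC, so it illustrates the semantics that /fp:precise and /fp:strict preserve and does not demonstrate the compiler option itself. `comparison_results` and `has_nan` are hypothetical helper names.

```python
import math


def comparison_results(a: float, b: float) -> dict:
    """Return the result of each comparison operator applied to a and b."""
    return {
        "<": a < b,
        "<=": a <= b,
        "==": a == b,
        ">": a > b,
        ">=": a >= b,
        "!=": a != b,
    }


def has_nan(*values: float) -> bool:
    """Explicit NaN test that does not rely on comparison semantics."""
    return any(math.isnan(v) for v in values)


if __name__ == "__main__":
    nan = float("nan")
    print(comparison_results(nan, 1.0))
    print(has_nan(nan, 1.0))
```

Code that must work under a relaxed floating-point mode is usually safer testing for NaN explicitly, as `has_nan` does, than depending on comparisons coming out false.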

Portability of binary serialization of double/float type in C++

May 28, 2023 by Tarik

Brian “Beej Jorgensen” Hall gives in his Guide to Network Programming some code to pack a float (resp. double) into a uint32_t (resp. uint64_t) so that it can be transmitted safely over the network between two machines that may not agree on its representation. It has some limitations; mainly, it does not support NaN and infinity. … Read more
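To illustrate the general idea of sending a float as a fixed-width integer in a fixed byte order, here is a minimal sketch in Python. It is not Beej's packing code: it assumes both ends use IEEE 754 single precision and relies on the standard `struct` module, with `!` selecting network (big-endian) byte order. The function names are made up for this example.

```python
import struct


def pack_float32(value: float) -> bytes:
    """Encode a value as 4 bytes of IEEE 754 binary32, network byte order."""
    return struct.pack("!f", value)


def unpack_float32(data: bytes) -> float:
    """Decode 4 bytes produced by pack_float32."""
    return struct.unpack("!f", data)[0]


def float32_bits(value: float) -> int:
    """Return the binary32 encoding of a value as an unsigned 32-bit integer."""
    return struct.unpack("!I", struct.pack("!f", value))[0]


if __name__ == "__main__":
    wire = pack_float32(3.5)
    print(wire.hex(), unpack_float32(wire))
    print(hex(float32_bits(1.0)))
```

Because this approach copies the bit pattern rather than recomputing it, infinities pass through unchanged, but it is only portable between machines that already agree on IEEE 754.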

Coercing floating-point to be deterministic in .NET?

May 23, 2023 by Tarik

Just what is this “hint” to the runtime? As you conjecture, the compiler tracks whether a conversion to double or float was actually present in the source code, and if it was, it always inserts the appropriate conv opcode. Does the C# spec stipulate that an explicit cast to float causes the insertion of a … Read more

Double vs float on the iPhone

May 21, 2023 by Tarik

The iPhone can do both single and double precision arithmetic in hardware. On the ARM1176 (original iPhone and iPhone3G), the two operate at approximately the same speed, though you can fit more single-precision data in the caches. On the Cortex-A8 (iPhone3GS, iPhone4 and iPad), single-precision arithmetic is done on the NEON unit instead of VFP, and … Read more