Converting IEEE 754 floating point in Haskell Word32/64 to and from Haskell Float/Double

Simon Marlow mentions another approach in GHC bug 2209 (also linked to from Bryan O’Sullivan’s answer): “You can achieve the desired effect using castSTUArray, incidentally (this is the way we do it in GHC).” I’ve used this option in some of my libraries in order to avoid the unsafePerformIO required for the FFI marshalling method. … Read more

Does the C++ standard specify anything on the representation of floating point numbers?

From N3337: [basic.fundamental/8]: There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of … Read more
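As a small illustration of what can be checked from inside a program, the sketch below queries std::numeric_limits. The helper name uses_ieee754 is hypothetical, and comparing the digits members is only one assumed way of observing the precision ordering the quoted paragraph describes.

```cpp
#include <limits>

// Hypothetical helper: true when the implementation reports
// IEC 60559 (IEEE 754) conformance for the type T.
template <typename T>
constexpr bool uses_ieee754() {
    return std::numeric_limits<T>::is_iec559;
}

// The quoted paragraph orders the three types by precision. Here the
// mantissa digit counts are used as an observable proxy for that ordering.
static_assert(std::numeric_limits<float>::digits <=
                  std::numeric_limits<double>::digits,
              "double should be at least as precise as float");
static_assert(std::numeric_limits<double>::digits <=
                  std::numeric_limits<long double>::digits,
              "long double should be at least as precise as double");
```

Code that depends on the IEEE 754 bit layout could guard itself with a static_assert on uses_ieee754<double>() instead of assuming the representation.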

Do any real-world CPUs not use IEEE 754?

Other than flawed Pentiums, every x86 or x64-based CPU uses IEEE 754 as its floating-point arithmetic standard. Here is a brief overview of the floating-point arithmetic standards and their adoption. IEEE 754: Intel x86, and all RISC systems (IBM Power and PowerPC, Compaq/DEC Alpha, HP PA-RISC, Motorola 68xxx and 88xxx, SGI (MIPS) R-xxxx, Sun SPARC, … Read more

sign changes when going from int to float and back

Your program is invoking undefined behavior because of an overflow in the conversion from floating-point to integer. What you see is only the usual symptom on x86 processors. The float value nearest to 2147483584 is 2^31 exactly (the conversion from integer to floating-point usually rounds to the nearest, which can be up, and is up … Read more
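One way to avoid the overflow is to test the range before converting. The following is a minimal sketch, assuming a 32-bit int and IEEE 754 single precision; the function name float_to_int_checked is hypothetical and not taken from the answer.

```cpp
#include <limits>

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

// Hypothetical helper: stores the truncated value in `out` and returns true
// only when `f` is in range for int, so the cast below never overflows.
bool float_to_int_checked(float f, int& out) {
    const float upper = 2147483648.0f;   // 2^31: first value that does not fit
    const float lower = -2147483648.0f;  // -2^31: smallest value that fits
    // Written so that NaN fails the test, since comparisons with NaN are false.
    if (!(f >= lower && f < upper)) {
        return false;
    }
    out = static_cast<int>(f);
    return true;
}
```

Both bounds are powers of two and therefore exactly representable as float, which is why the comparison uses 2^31 rather than the largest int value, which a float cannot represent exactly.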

What would cause the C/C++

This behavior is due to the /fp:fast MSVC compiler option, which (among other things) permits the compiler to perform comparisons without regard to proper NaN behavior in an effort to generate faster code. Using /fp:precise or /fp:strict instead causes these comparisons to behave as expected when presented with NaN arguments.
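For reference, the NaN behavior that IEEE 754 semantics call for can be sketched as below. The function names are hypothetical. The sketch shows what a compiler is expected to preserve when it honors NaN semantics; under fast floating-point modes the self-comparison may be optimized away, and this sketch makes no claim about how std::isnan behaves in those modes.

```cpp
#include <cmath>
#include <limits>

// Under IEEE 754 semantics NaN compares unequal to everything, itself included,
// so this returns true only for NaN.
bool is_nan_by_comparison(double x) {
    return x != x;
}

// The same check through the standard library.
bool is_nan_by_library(double x) {
    return std::isnan(x);
}

// Every ordered comparison involving NaN is expected to be false.
bool any_ordered_comparison_true(double x, double y) {
    return (x < y) || (x > y) || (x == y);
}
```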

Portability of binary serialization of double/float type in C++

Brian “Beej Jorgensen” Hall gives in his Guide to Network Programming some code to pack a float (resp. double) into a uint32_t (resp. uint64_t) so that it can be safely transmitted over the network between two machines that may not agree on its representation. It has some limitations, mainly that it does not support NaN and infinity. … Read more
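A simpler alternative, which is not Beej's packing code, is to copy the bit pattern and fix the byte order explicitly. This is a minimal sketch that only applies when both machines use IEEE 754 binary64 for double and store floating-point values with the same byte order as integers; the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes IEEE 754 binary64 doubles");

// Writes the bit pattern of `value` to `out`, most significant byte first.
void encode_double_be(double value, unsigned char out[8]) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bits
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<unsigned char>(bits >> (56 - 8 * i));
    }
}

// Rebuilds a double from eight bytes written by encode_double_be.
double decode_double_be(const unsigned char in[8]) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits = (bits << 8) | in[i];
    }
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because the bit pattern is copied unchanged, infinities survive the round trip under these assumptions, at the cost of refusing to compile on platforms where double is not IEEE 754 binary64.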
