Question: Implicit conversion from long long to float yields unexpected result

Added at 2017-01-04 03:01
Question

In an attempt to verify (using VS2012) a book's claim (2nd sentence) that

When we assign an integral value to an object of floating-point type, the fractional part is zero. 
Precision may be lost if the integer has more bits than the floating-point object can accommodate.

I wrote the following wee prog:

#include <iostream>
#include <iomanip>

using std::cout;
using std::setprecision;

int main()
{
    long long i = 4611686018427387905; // 2^62 + 2^0

    float f = i; 

    std::streamsize prec = cout.precision();

    cout << i << " " << setprecision(20) << f << setprecision(prec) << std::endl;

    return 0;
}

The output is

4611686018427387905 4611686018427387900

I expected output of the form

4611686018427387905 4611690000000000000

How is a 4-byte float able to retain so much info about an 8-byte integer? Is there a value for i that actually demonstrates the claim?

Answers
Answer #1, added: 2017-01-04 03:01

Floats don't store their data in base 10; they store it in base 2. Thus, 4611690000000000000 isn't actually a very round number. Its binary representation is:

100000000000000000000111001111100001000001110001010000000000000. 

Recording that exactly takes about 50 significant bits, far more than a float can hold. The number that's actually printed, however, has the following binary representation:

11111111111111111111111111111111111111111111111111111111111100

As you can see, that's a much rounder number: it is just 4 short of 2^62. The float itself holds exactly 2^62 = 4611686018427387904, and the difference of 4 is most likely introduced by rounding in the convert-to-base-10 algorithm used for printing.

As an example of a number that won't fit in a float properly, try the number you expected:

4611690000000000000

You'll notice that it comes out very differently.
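
To see the difference without depending on how the runtime prints floats, one option is to convert the float back to an integer. The following is a minimal sketch, not part of the original answer: through_float is a hypothetical helper name, and the values quoted after the code assume an IEEE 754 single-precision float with round-to-nearest conversion.

```cpp
#include <cassert>

// Hypothetical helper: returns the integer value that a float actually
// holds after an integer has been converted to it.
long long through_float(long long value)
{
    float f = static_cast<float>(value);   // may round: a float has few significant bits

    // Converting back is undefined if f rounded up to 2^63,
    // which long long cannot represent, so rule that case out.
    assert(f < 9223372036854775808.0f);

    // Floats of this magnitude are always whole numbers, so this step is exact.
    return static_cast<long long>(f);
}
```

Under those assumptions, through_float(4611690000000000000LL) returns 4611689866718085120, which is off by 133281914880, while through_float(4611686018427387905LL) returns 4611686018427387904, which is exactly 2^62.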

Answer #2, added: 2017-01-04 03:01

The float retains so much information because you're working with a number that is so close to a power of 2.

The float format stores numbers in basically binary scientific notation. In your case, it gets stored as something like

1.0000000...[61 zeroes]...00000001 * 2^62.

The float format can't store 62 binary places (its significand holds only 24 bits), so the final 1 gets cut off... but we're left with 2^62, which is almost exactly equal to the number you're trying to store.
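
The cut-off described above can be checked without doing any conversion at all: an integer fits in a float exactly when, after its trailing zero bits are moved into the exponent, the remaining bits fit in the significand. The sketch below assumes a binary float whose significand width is reported by std::numeric_limits<float>::digits (24 for IEEE 754 single precision); fits_in_float is a hypothetical helper, not a standard function.

```cpp
#include <limits>

static_assert(std::numeric_limits<float>::radix == 2,
              "this sketch assumes a binary floating-point format");

// Hypothetical helper: can this non-negative integer be stored in a float exactly?
bool fits_in_float(unsigned long long value)
{
    if (value == 0) return true;

    // Trailing zero bits cost nothing: they are absorbed by the exponent.
    while ((value & 1ULL) == 0) value >>= 1;

    // What is left must fit in the significand (24 bits for IEEE 754 float).
    return value < (1ULL << std::numeric_limits<float>::digits);
}
```

With a 24-bit significand, fits_in_float(1ULL << 62) is true, while fits_in_float((1ULL << 62) + 1) is false: the 1 in the lowest position pushes the required width to 63 bits. The smallest positive integer that fails is 16777217, which is 2^24 + 1, so that value would demonstrate the book's claim.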

I'm bad at manufacturing examples, but CERT isn't; you can view an example of what happens with bungled number conversions here. The example is in Java, but C++ implementations generally use the same floating-point formats. Its first case converts a 4-byte int to a 4-byte float, which makes the point even more strongly: there is less integer information to store than in your example, yet the conversion still loses precision.
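
A 4-byte int to 4-byte float conversion of the kind mentioned above can be sketched in C++ as follows. This is not the linked example itself: the helper name float_round_trip_error and the sample value 1234567890 are illustrative choices, and the results quoted after the code assume a 32-bit int, an IEEE 754 float, and round-to-nearest conversion.

```cpp
// Hypothetical helper: difference between an int and its float approximation.
long long float_round_trip_error(int value)
{
    float approx = static_cast<float>(value);   // may round: only 24 significant bits

    // Compare in long long so that neither the conversion back nor the
    // subtraction can overflow, even when approx rounds up to 2^31.
    return static_cast<long long>(value) - static_cast<long long>(approx);
}
```

Under those assumptions, float_round_trip_error(1234567890) is -46, because the nearest float is 1234567936. Any int whose magnitude is at most 16777216 (2^24) converts exactly and gives 0.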
