Data inspector: round floats to precision/significant digits

Wishlists for new functionality and features.
Maël
Site Admin
Posts: 1214
Joined: 12 Mar 2005 14:15

Data inspector: round floats to precision/significant digits

Post by Maël »

The basis of this feature request was a mail sent to me:
One comment I had was that the single-precision floating point values are displayed with too many significant digits. For example, a value might be shown as "-0.0149999996647239", but really it only has 6 significant digits and should be shown as -0.015. (Of course, floats are not exact in many circumstances, so it would not be surprising to see "-0.0149998" or something for a different value).

Maybe you could implement a feature to shorten the representation of the single-precision float type, so that unnecessary digits are not shown?

Maël

Re: Data inspector: round floats to precision/significant digits

Post by Maël »

Floating point numbers are tricky. Interestingly I have been writing toy programs recently to convert floats to floating point binary strings (not floating point decimal) and back to their IEEE single or double float formats.

Doing this correctly requires more research; for now I'll rely on Delphi's built-in StrToFloat/FloatToStr functions.

Some notes follow.


When deciding how many digits to display, several factors have to be considered.

One major question is whether it is enough that a round-trip conversion introduces no errors (i.e., x==StrToFloat(FloatToStr(x)) is true), with all digits that do not affect this result truncated or rounded off.
Or should the exact decimal representation of the binary float be shown, even where the extra digits do not affect round-trip conversions?

Regarding the first option, Wikipedia (or the referenced paper) claims that:
"If an IEEE 754 single-precision number is converted to a decimal string with at least 9 significant digits, and then converted back to single-precision representation, the final result must match the original number.[5]"
This means: x==StrToFloat(FloatToStr(x)) is true if FloatToStr(x) has at least 9 significant digits.

This site states other values:
https://www.exploringbinary.com/decimal ... t-numbers/

But it also states that for the other round-trip direction (decimal string to float and back to a decimal string) only 6 significant digits are guaranteed to survive:
"If a decimal string with at most 6 significant digits is converted to IEEE 754 single-precision representation, and then converted back to a decimal string with the same number of digits, the final result should match the original string."
This means: x==FloatToStr(StrToFloat(x)) is true if the decimal string x has at most 6 significant digits and the result of FloatToStr (the right-hand side) is limited to the same number of digits.


An example to show the difference between accurate representation, and round-trip data retention:

Consider the value 0x3F800001 which is an IEEE 754 encoded single precision float.

It corresponds to this binary number:
0 01111111 00000000000000000000001

The sign bit is 0, the exponent field is 01111111 = 127. Since the exponent is biased by 127 in the single-precision float format, the actual exponent is 127-127 = 0.

Now on to the third field, the significand.
Only the rightmost bit (bit 0) of the stored significand is set.
bit 0 = 2^-23 * 1 = 0.00000011920928955078125 (exactly)

bit 23 = 2^0 * 1 (not stored; implicitly set to 1 for normalized representation)

So the exact value is
(1 + 0.00000011920928955078125) * 2^exponent =
1.00000011920928955078125 * 2^0 =
1.00000011920928955078125

Rounding to just 6 digits would make it indistinguishable from 1.0.

As the Wikipedia quote above says, 9 significant digits are enough for the float to string to float round trip.

Indeed, when you round "1.00000011920928955078125" to 9 significant digits (1.00000012), it converts to single float format and back to a decimal string correctly.

Some references:
https://www.exploringbinary.com/decimal ... t-numbers/
https://www.exploringbinary.com/maximum ... nt-numbers
https://en.wikipedia.org/wiki/Single-pr ... int_format
Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic (page 4)
https://stackoverflow.com/questions/509 ... -to-string
https://github.com/JackTrapper/Exact-Fl ... g-Routines
https://github.com/rkennedy/exact-float
John Herbster's ExactFloatToStr(x:Extended)
Other useful contributions by John Herbster: https://cc.embarcadero.com/Author/358

https://stackoverflow.com/questions/302 ... r-the-hood

Best explanation and summary:
Good explanation, nice research and literature review (4 papers) on printing floating point numbers, including the reference functions written by David Gay: http://www.ryanjuckett.com/programming/ ... t-numbers/

Two other relevant papers (apparently discussed in the link above):
https://www.cs.indiana.edu/~dyb/pubs/FP ... PLDI96.pdf
Most recent (2010):
https://www.cs.tufts.edu/~nr/cs257/arch ... printf.pdf

Another more recent option used in Swift:
https://github.com/google/double-conversion/issues/27

nneonneo
Posts: 3
Joined: 21 Jul 2018 03:37

Re: Data inspector: round floats to precision/significant digits

Post by nneonneo »

I'm the one who posted the original message.

I think 6 digits was too low in my initial message; indeed, 23 bits of precision may require ~8 decimal digits to display accurately (plus 1 digit for the implicit 1). But this is only an estimate.

I found a library that implements exact round-trip float<->string conversions with proper rounding and minimal representation length: https://github.com/jwiegley/gdtoa (mirrored from the original gdtoa at http://www.netlib.org/fp/). It's written by the same person (David M. Gay) who implemented the famous "dtoa" algorithm, which many systems use for printing floats and doubles (for example, the Python programming language uses it to print its double-precision floats in the shortest possible representation).

For your sample input, g_ffmt prints 1.0000001, which does indeed return 0x3f800001 when parsed with the provided "strtof" function (and also when using C's `sscanf` with "%f").

I hacked up a quick test program that also serves to demonstrate how to use g_ffmt and strtof:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include "gdtoa.h"

/* Reinterpret a 32-bit pattern as a float (memcpy avoids aliasing problems). */
float reinterpret_int(uint32_t val) {
    float res;
    memcpy(&res, &val, sizeof(res));
    return res;
}

/* Reinterpret a float as its 32-bit pattern. */
uint32_t reinterpret_float(float val) {
    uint32_t res;
    memcpy(&res, &val, sizeof(res));
    return res;
}

/* Format f with gdtoa's g_ffmt; passing 0 for ndig asks for the shortest string. */
int ffmt(char *buf, unsigned bufsize, float f) {
    if(g_ffmt(buf, &f, 0, bufsize) == NULL) {
        return -1;
    }
    return 0;
}

/* Print the input bits, the formatted string, and the bits parsed back
   by strtof (as declared in gdtoa.h) and by sscanf. */
void test(uint32_t rep1) {
    float f1 = reinterpret_int(rep1);
    char buf[32];
    if(ffmt(buf, sizeof(buf), f1) != 0) {
        printf("0x%08x: formatting failed\n", rep1);
        return;
    }
    float f2 = strtof(buf, NULL);
    float f3 = 0.0f;
    sscanf(buf, "%f", &f3);
    uint32_t rep2 = reinterpret_float(f2);
    uint32_t rep3 = reinterpret_float(f3);
    printf("0x%08x %s 0x%08x 0x%08x\n", rep1, buf, rep2, rep3);
}

int main() {
    test(0x3f800000u);
    test(0x3f800001u);
    test(0x3f800002u);
    test(0x58aaaaaau);
    return 0;
}

`gdtoa` can also be used with doubles (use g_dfmt/strtod) and produces "minimal" representations there as well.

Maël

Re: Data inspector: round floats to precision/significant digits

Post by Maël »

Thanks for your feedback, I added some additional references.

Maël

Re: Data inspector: round floats to precision/significant digits

Post by Maël »

I had a look at the current status of related libraries.

As discussed in https://github.com/google/double-conversion/issues/27, the most current one, which supersedes all previous ones and guarantees the shortest string in all cases, can be found here:
https://github.com/ulfjack/ryu

But as mentioned in this issue https://github.com/ulfjack/ryu/issues/111, there are no string-to-float functions yet.

A one-way conversion makes no sense, and since other implementations/algorithms do not always produce the shortest string representation (see the Ryu paper), I will wait until this has matured more.

There is a plugin framework now, if anybody wants to implement this with the currently available libs:
https://github.com/maelh/hxd-plugin-framework

Until then, this stays on hold.

Maël

Re: Data inspector: round floats to precision/significant digits

Post by Maël »

Further references and related discussions:

Reddit - Ryu: a new algorithm to quickly convert floating point numbers to decimal strings
Ryu GitHub issue - Losing the battle to convert a string to double
Python - Faster float / string conversion (Ryu)
Microsoft C++ standard library developers: the speedups are indeed massive due to algorithmic improvements of Ryu

Notes from MS talk on <charconv> using Ryu for to_chars(), and bignums for from_chars(): https://github.com/CppCon/CppCon2019/tr ... final_boss
Slides: https://github.com/CppCon/CppCon2019/ra ... n_2019.pdf

Using to_chars() and from_chars() seems to be the easiest option: it lets me benefit from future improvements and provides support for hexfloats. The remaining issue is that half floats are not supported yet. Ryu supports them at least in a to_chars()-like function, but its only parsing function ("from_chars") handles doubles and is, on top of that, experimental. So the best option is still MS <charconv>.

From https://en.cppreference.com/w/cpp/utility/to_chars :
The guarantee that std::from_chars can recover every floating-point value formatted by to_chars exactly is only provided if both functions are from the same implementation.
Since the C++ standard guarantees round-trip conversion (if to_chars() and from_chars() are from the same implementation) and MS's implementation uses Ryu, which creates the shortest representation, we should be fine.

Unfortunately, MS's charconv implementation does not support Extended/float128, since it handles long double by static-casting to double.
MS charconv on GitHub:
https://github.com/microsoft/STL/blob/m ... c/charconv

Also useful are the notes from MS's request to include their charconv changes in libc++:
https://reviews.llvm.org/D70631

These notes point out that float128 is a possibility in the future, as it becomes part of libc++. Using Clang instead of MSVC as the compiler for the obj files should allow for that (since Clang really supports long double, while MSVC assumes long double = double), once I decide to support float128 in HxD's data inspector.

Maël

Re: Data inspector: round floats to precision/significant digits

Post by Maël »

clang 10 supports float80 on x86-64 (not sure if Win64, too, or just Linux):
https://godbolt.org/z/_edvbT

Maël

Re: Data inspector: round floats to precision/significant digits

Post by Maël »

Thanks to Rick Regan from exploringbinary.com, who wrote a very useful article on converting decimal strings to floating point numbers (with the best possible accuracy), I have a better understanding of how this process works in principle, using BigIntegers.

He also kindly pointed me to an article by Jon Skeet, who wrote a converter for the reverse direction. It generates potentially very long decimal strings, but without losing any accuracy. I had Delphi code from John Herbster, but it was less easy to follow.

There is also an interesting paper called Ryū Revisited: Printf Floating Point Conversion; even though it does not provide a final implementation, the pseudocode might help in understanding other code.
