Data inspector: round floats to precision/significant digits

Post by **Maël** » 21 Jul 2018 02:29

The basis of this feature request was a mail sent to me:

One comment I had was that the single-precision floating point values are displayed with too many significant digits. For example, a value might be shown as "-0.0149999996647239", but really it only has 6 significant digits and should be shown as -0.015. (Of course, floats are not exact in many circumstances, so it would not be surprising to see "-0.0149998" or something for a different value).

Maybe you could implement a feature to shorten the representation of the single-precision float type, so that unnecessary digits are not shown?

Post by **Maël** » 21 Jul 2018 02:54

Floating point numbers are tricky. Interestingly I have been writing toy programs recently to convert floats to floating point binary strings (not floating point decimal) and back to their IEEE single or double float formats.

This requires more research to be done correctly; for now I'll rely on the Delphi internal StrToFloat/FloatToStr functions.

Some notes follow.

When considering how many digits to display, various factors have to be considered.

One major question is whether the displayed string merely needs to survive a round trip (i.e., x==StrToFloat(FloatToStr(x)) should be true), with any digits that do not affect this result truncated or rounded off.
Or should the exact decimal value of the binary float be shown, even where the extra digits make no difference to round-trip conversion?

Regarding the first option, Wikipedia (or the referenced paper) claims that:
"If an IEEE 754 single-precision number is converted to a decimal string with at least 9 significant digits, and then converted back to single-precision representation, the final result must match the original number.[5]"
This means: x==StrToFloat(FloatToStr(x)) is true if FloatToStr(x) has at least 9 significant digits.
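A minimal C sketch of this check, using the C library's snprintf/strtof as stand-ins for FloatToStr/StrToFloat. The helper names are made up for illustration, and the result assumes the C library rounds correctly in both directions (glibc does, as far as I know).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 single-precision float. */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as its 32-bit pattern. */
static uint32_t bits_from_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Format with `digits` significant digits, parse the string back and
   compare bit patterns. Returns 1 if the original bits are recovered. */
static int round_trips(uint32_t bits, int digits) {
    char buf[64];
    assert(digits >= 1 && digits <= 17);
    snprintf(buf, sizeof buf, "%.*g", digits, float_from_bits(bits));
    return bits_from_float(strtof(buf, NULL)) == bits;
}
```

With 9 digits the check should pass for every finite float. With only 6 digits, 0x3F800001 is printed as "1" and no longer parses back to the same bits.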

This site states other values:
https://www.exploringbinary.com/decimal ... t-numbers/

But it also states that for the other round-trip direction only 6 significant digits need to be considered:

If a decimal string with at most 6 significant digits is converted to IEEE 754 single-precision representation, and then converted back to a decimal string with the same number of digits, the final result should match the original string.

This means: x==FloatToStr(StrToFloat(x)) is true if x has at most 6 significant digits and FloatToStr (the right-hand side) formats its result with the same number of digits.
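The reverse direction can be sketched the same way in C. The function name is hypothetical, and the comparison is textual, so it only makes sense for input that is already written the way "%g" would print it (no trailing zeros, no leading '+').

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a decimal string to single precision, print it again with
   `digits` significant digits and report whether the text survived. */
static int decimal_survives(const char *text, int digits) {
    char buf[64];
    float f;
    assert(digits >= 1 && digits <= 17);
    f = strtof(text, NULL);
    snprintf(buf, sizeof buf, "%.*g", digits, f);
    return strcmp(buf, text) == 0;
}
```

For example, "-0.015" survives with 6 digits even though the stored float is not exactly -0.015, while the 9-digit string "1.00000001" does not survive because it parses to exactly 1.0.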

An example to show the difference between accurate representation, and round-trip data retention:

Consider the value 0x3F800001 which is an IEEE 754 encoded single precision float.

It corresponds to this binary number:
0 01111111 00000000000000000000001

The sign bit is 0, the exponent is 01111111 = 127. Since the exponent is biased by 127 in single precision float format the actual exponent is 127-127 = 0.

Now on to the third field, the significand.
Only the right most bit (bit 0) in the significand is set.
bit 0 = 2^-23 * 1 = 0.00000011920928955078125 (exactly)

bit 23 = 2^0 * 1 (implicitly set to 1 for normalized representation)

So the accurate number would be
(1 + 0.00000011920928955078125) * 2^exponent =
1.00000011920928955078125 * 2^0 =
1.00000011920928955078125
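The decoding steps above can be sketched in C as follows. The struct and function names are made up for illustration, and the sketch handles normalized values only (no zero, subnormals, infinities or NaNs).

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

typedef struct {
    unsigned sign;      /* 1 bit */
    int      exponent;  /* unbiased: stored exponent minus 127 */
    uint32_t fraction;  /* the 23 stored significand bits */
    double   value;     /* exact, since every float fits in a double */
} DecodedFloat;

/* Decode a normalized IEEE 754 single-precision bit pattern. */
static DecodedFloat decode_normal_float(uint32_t bits) {
    DecodedFloat d;
    uint32_t exp_field = (bits >> 23) & 0xFF;
    assert(exp_field != 0 && exp_field != 0xFF); /* normalized only */

    d.sign     = bits >> 31;
    d.exponent = (int)exp_field - 127;
    d.fraction = bits & 0x7FFFFF;

    /* Restore the implicit leading 1 (bit 23) to get a 24-bit integer
       significand, then scale it by 2^(exponent - 23). */
    d.value = ldexp((double)(d.fraction | 0x800000u), d.exponent - 23);
    if (d.sign) d.value = -d.value;
    return d;
}
```

For 0x3F800001 this gives sign 0, exponent 0, fraction 1 and the value 1 + 2^-23, matching the manual calculation.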

Rounding to just 6 digits would make it indistinguishable from 1.0.

Indeed, when you round "1.00000011920928955078125" to 9 significant digits (1.00000012), it converts to single-precision float format and back to a decimal string correctly, consistent with the 9-digit claim quoted above.

Some references:
https://www.exploringbinary.com/decimal ... t-numbers/
https://www.exploringbinary.com/maximum ... nt-numbers
https://en.wikipedia.org/wiki/Single-pr ... int_format
Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic (page 4)
https://stackoverflow.com/questions/509 ... -to-string
https://github.com/JackTrapper/Exact-Fl ... g-Routines
https://github.com/rkennedy/exact-float
John Herbster's ExactFloatToStr(x:Extended)
Other useful contributions by John Herbster: https://cc.embarcadero.com/Author/358

https://stackoverflow.com/questions/302 ... r-the-hood

Best explanation and summary:
Good explanation, nice research and literature review (4 papers) on printing floating point numbers, including the reference functions written by David Gay: http://www.ryanjuckett.com/programming/ ... t-numbers/

Two other relevant papers (apparently discussed in the link above):
https://www.cs.indiana.edu/~dyb/pubs/FP ... PLDI96.pdf
Most recent (2010):
https://www.cs.tufts.edu/~nr/cs257/arch ... printf.pdf

Another more recent option used in Swift:
https://github.com/google/double-conversion/issues/27

Post by **nneonneo** » 21 Jul 2018 03:46

I'm the one who posted the original message.

I think 6 digits was too low in my initial message; indeed, 23 bits of precision may require ~8 decimal digits to display accurately (then add +1 digit for the implicit 1). But this is only an estimate.
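The estimate can be made concrete with the formulas the C standard uses to define FLT_DECIMAL_DIG and FLT_DIG. The sketch below evaluates them with integer arithmetic; the function names are illustrative, and approximating log10(2) as 30103/100000 is an assumption that is good enough for the small precisions used here.

```c
#include <assert.h>
#include <float.h>

/* log10(2) approximated as 30103/100000 (assumption of this sketch). */
enum { LOG10_2_NUM = 30103, LOG10_2_DEN = 100000 };

/* Digits needed so that binary -> decimal -> binary is lossless:
   ceil(1 + p * log10(2)) for a p-bit significand. */
static int digits_to_round_trip(int p) {
    assert(p > 0 && p <= 64);
    return 1 + (p * LOG10_2_NUM + LOG10_2_DEN - 1) / LOG10_2_DEN;
}

/* Decimal digits that survive decimal -> binary -> decimal:
   floor((p - 1) * log10(2)). */
static int digits_preserved(int p) {
    assert(p > 0 && p <= 64);
    return (p - 1) * LOG10_2_NUM / LOG10_2_DEN;
}
```

For single precision (p = FLT_MANT_DIG = 24, counting the implicit bit) this gives 9 and 6; for double precision (p = 53) it gives 17 and 15.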

I found a library that implements exact round-trip float<->string conversions with proper rounding and minimal representation length: https://github.com/jwiegley/gdtoa (mirrored from the original gdtoa at http://www.netlib.org/fp/). It's written by the same guy (David M. Gay) who implemented the famous "dtoa" algorithm, which is used by many systems for printing floats and doubles (for example, the Python programming language uses it to print its double-precision floats using the minimal possible representation).

For your sample input, g_ffmt prints 1.0000001, which does indeed return 0x3f800001 when parsed with the provided "strtof" function (and also when using C's `sscanf` with "%f").

I hacked up a quick test program that also serves to demonstrate how to use g_ffmt and strtof:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include "gdtoa.h"

float reinterpret_int(uint32_t val) {
    float res;
    memcpy(&res, &val, 4);
    return res;
}

uint32_t reinterpret_float(float val) {
    uint32_t res;
    memcpy(&res, &val, 4);
    return res;
}

int ffmt(char *buf, unsigned bufsize, float f) {
    if(g_ffmt(buf, &f, 0, bufsize) == NULL) {
        return -1;
    }
    return 0;
}

void test(uint32_t rep1) {
    float f1 = reinterpret_int(rep1);
    char buf[32];
    ffmt(buf, sizeof(buf), f1);
    float f2 = strtof(buf, NULL);
    float f3;
    sscanf(buf, "%f", &f3);
    uint32_t rep2 = reinterpret_float(f2);
    uint32_t rep3 = reinterpret_float(f3);
    printf("0x%08x %s 0x%08x 0x%08x\n", rep1, buf, rep2, rep3);
}

int main() {
    test(0x3f800000u);
    test(0x3f800001u);
    test(0x3f800002u);
    test(0x58aaaaaau);
}

`gdtoa` can also be used with doubles (use g_dfmt/strtod), and produces "minimal" representations there as well.
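For anyone without gdtoa at hand, the idea of a minimal round-tripping representation can be approximated with the C standard library alone. This is a brute-force sketch, not the algorithm gdtoa uses: it tries increasing precisions until the string parses back to the same float. The function name is made up, and it assumes snprintf and strtof round correctly.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write into buf the first "%.*g" rendering (1 to 9 significant digits)
   that parses back to exactly the same float. Meant for finite values.
   Returns the number of significant digits that were requested. */
static int shortest_by_search(float f, char *buf, size_t bufsize) {
    int digits;
    assert(bufsize >= 16); /* enough for e.g. "-1.23456789e-38" */
    for (digits = 1; digits < 9; digits++) {
        float back;
        snprintf(buf, bufsize, "%.*g", digits, f);
        back = strtof(buf, NULL);
        if (memcmp(&back, &f, sizeof f) == 0)
            return digits;
    }
    /* 9 significant digits always suffice for single precision. */
    snprintf(buf, bufsize, "%.9g", f);
    return 9;
}
```

For 0x3f800001 the search stops at 8 digits with "1.0000001", the same string g_ffmt printed above, and for the float nearest to -0.015 it stops at "-0.015".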

Post by **Maël** » 21 Jul 2018 10:26

Thanks for your feedback, I added some additional references.

Post by **Maël** » 23 May 2019 18:57

I had a look at the current status of related libraries.

As discussed in https://github.com/google/double-conversion/issues/27 the most current one, that supersedes all previous ones and guarantees a shortest string in all cases, can be found here:
https://github.com/ulfjack/ryu

But as mentioned in this issue https://github.com/ulfjack/ryu/issues/111 there are no string2float functions yet.

Supporting only one direction makes little sense, and since other implementations/algorithms do not always provide the shortest string representation (see the Ryu paper), I will wait until this has matured more.

There is now a plugin framework, if anybody wants to implement this with the currently available libs:
https://github.com/maelh/hxd-plugin-framework

Until then, this stays on hold.

Post by **Maël** » 29 Mar 2020 01:32

Further references and related discussions:

Reddit - Ryu: a new algorithm to quickly convert floating point numbers to decimal strings
Ryu GitHub issue - Losing the battle to convert a string to double
Python - Faster float / string conversion (Ryu)
Microsoft C++ standard library developers: the speedups are indeed massive due to algorithmic improvements of Ryu

Notes from MS talk on <charconv> using Ryu for to_chars(), and bignums for from_chars(): https://github.com/CppCon/CppCon2019/tr ... final_boss
Slides: https://github.com/CppCon/CppCon2019/ra ... n_2019.pdf

Using to_chars() and from_chars() seems to be the easiest option: it allows relying on further improvements and provides support for hexfloats. The remaining issue is that half floats are not supported yet. Ryu supports them at least in a to_chars()-like function, but it only parses doubles (its "from_chars"), and that parser is only experimental. So the best option is still MS <charconv>.

From https://en.cppreference.com/w/cpp/utility/to_chars :

The guarantee that std::from_chars can recover every floating-point value formatted by to_chars exactly is only provided if both functions are from the same implementation.

Since the C++ standard guarantees round-trip conversion (if to_chars() and from_chars() are from the same implementation) and MS's implementation uses Ryu, which creates the shortest representation, we should be fine.

Unfortunately, MS's charconv implementation does not support Extended/float128 since long double maps to double using static casts.
MS charconv on GitHub:
https://github.com/microsoft/STL/blob/m ... c/charconv

Also useful: notes from MS's proposal to include their charconv changes in libc++:
https://reviews.llvm.org/D70631

These notes point out that float128 is a possible future addition, as it becomes part of libc++. Using Clang instead of MSVC as the compiler for the obj files should allow for that (since Clang really supports long doubles, while MSVC assumes long double = double), once I decide to support float128 in HxD's data inspector.

Post by **Maël** » 12 May 2020 08:39

clang 10 supports float80 on x86-64 (not sure if Win64, too, or just Linux):
https://godbolt.org/z/_edvbT

Post by **Maël** » 13 May 2020 09:46

Thanks to Rick Regan from exploringbinary.com, who wrote a very useful article on converting decimal strings to floating point numbers (with the best possible accuracy), I have a better understanding of how this process works in principle, using BigIntegers.

He also kindly pointed me to an article from Jon Skeet, who wrote a converter for the reverse direction. It generates potentially very long decimal strings, but without losing any accuracy. I had Delphi code from John Herbster, but it was less easy to follow.

An interesting paper is "Ryū Revisited: Printf Floating Point Conversion"; even though it does not provide a final implementation, its pseudocode might help in understanding other code.

Post by **Maël** » 13 Feb 2021 15:41

The next releases of GCC and LLVM have the necessary support for round-trip conversion, by rounding to the nearest (closest) value. At least the versions in development have it implemented now. GCC has it for float80 as well, and has float80 support under Windows, even on x86-64.
The only peculiarity with GCC is that float80 is padded to 16 bytes: the first 10 bytes are a float80 as usual and match Extended in Delphi, then 6 padding bytes follow. This has to be considered when passing arguments. But ideally I would pass a byte array in function calls anyway, since Delphi x86-64 does not support Extended anymore.
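To illustrate the byte-array idea, here is a small C sketch that reads only the first 10 bytes of such a value and never touches the padding. The function name is hypothetical, the little-endian x87 layout is assumed, and the result is rounded to double, so this is for illustration rather than exact display.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian x87 extended value (10 bytes) into a double.
   Bytes 0..7 hold the 64-bit significand with an explicit integer bit,
   bytes 8..9 hold the 15-bit exponent and the sign bit. */
static double float80_bytes_to_double(const unsigned char *bytes) {
    uint64_t significand = 0;
    unsigned sign_exp;
    int negative, exponent, i;
    double value;

    assert(bytes != NULL);
    for (i = 7; i >= 0; i--)
        significand = (significand << 8) | bytes[i];
    sign_exp = bytes[8] | ((unsigned)bytes[9] << 8);
    negative = (sign_exp >> 15) & 1;
    exponent = sign_exp & 0x7FFF;

    if (exponent == 0x7FFF) {
        /* Infinity if the fraction bits are zero, NaN otherwise. */
        value = (significand << 1) == 0 ? INFINITY : NAN;
    } else {
        /* Subnormals store 0 but use the same scale as exponent 1. */
        if (exponent == 0) exponent = 1;
        /* value = significand * 2^(exponent - 16383 - 63) */
        value = ldexp((double)significand, exponent - 16383 - 63);
    }
    return negative ? -value : value;
}
```

Passing a 16-byte buffer as GCC lays it out works unchanged, since bytes 10..15 are simply ignored.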

Judging from the commit messages / patch comments for GCC, they use Ryu's code for the round-trip conversion, so everything should look fine.

Waiting for the next releases of GCC (11) and LLVM (12), then using the conversion functions and linking to them with a bit of glue code from Delphi, should solve this issue. At least compiling code that uses __float80/long double and std::from_chars/std::to_chars with the trunk version of GCC and LLVM on https://godbolt.org/ works well, while GCC 10 and LLVM 11 still fail.

Hexfloats should be supported then as well.
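As a rough illustration of what hexfloat support means, plain C already offers it through "%a" and strtof. This is not the charconv route discussed above, just a sketch with made-up helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format a float as a C99 hexadecimal floating-point string. */
static void float_to_hex(float f, char *buf, size_t bufsize) {
    assert(bufsize >= 32);
    snprintf(buf, bufsize, "%a", f);
}

/* strtof accepts hexfloat input (C99), so parsing needs no extra code. */
static float hex_to_float(const char *text) {
    return strtof(text, NULL);
}
```

A hexfloat string is an exact image of the binary value, so it round-trips without any rounding questions. The precise text "%a" produces may differ between C libraries; for 0x3f800001 it should be something like 0x1.000002p+0.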

Edit (9.4.2021): LLVM/Clang is less well suited, as it relies on the platform libraries (libstdc++ on Linux, and MSVCRT on Windows). Since we want to support 80-bit floats, and MSVC does not support them for x86_64, we will use GCC instead, which is supposed to provide this support, also under Windows.

Post by **Maël** » 21 Apr 2021 19:18

Now that LLVM/Clang 12.0.0 is out, it is confirmed that there is still no full charconv support, since it relies on libstdc++ from GCC (which is still at 10.3.0), whose charconv does not yet support floats. Under Windows, LLVM relies on MSVCRT, which does not support float80.

Using -stdlib=libc++ to ensure LLVM's own C++ library is used still fails to compile under Linux, and has no effect under Windows.

Relying on GCC 11 to be ported to Windows is the only option unless LLVM starts implementing charconv properly on Windows.

Post by **Maël** » 28 Apr 2021 08:06

Finally, GCC 11.1 was released. It is not yet ported to Windows/MSYS2/Cygwin, but an experimental build is available at:
https://github.com/brechtsanders/winlib ... /releases/

Post by **Maël** » 29 Apr 2021 08:58

Understanding Ryu: https://www.youtube.com/watch?v=kw-U6smcLzk
https://dl.acm.org/doi/pdf/10.1145/3192366.3192369
Reverse direction: https://www.youtube.com/watch?v=8afbTaA-gOQ
https://www.youtube.com/watch?v=LXF-wcoeT0o

References to correctly and precisely implement %e, %f, %g, and str(float) from Python:
https://www.cplusplus.com/reference/cstdio/printf/
https://docs.python.org/3.6/library/str ... i-language
https://stackoverflow.com/questions/541 ... ifier-mean
https://docs.python.org/3/tutorial/floatingpoint.html

https://stackoverflow.com/questions/104 ... t-sequence

Linking obj files from gcc/g++, strip incompatible parts for Delphi/extract obj files from lib files:
https://gcc.gnu.org/onlinedocs/gccint/I ... y-routines
https://gcc.gnu.org/onlinedocs/gcc-6.2. ... Types.html
https://github.com/ulfjack/ryu
https://lifeinhex.com/linking-omf-objec ... th-delphi/
https://stackoverflow.com/questions/542 ... elphi-prog
https://www.agner.org/optimize/

Ryu will return a decimal floating point number in normalized exponential notation (a.k.a. traditional scientific notation):
D,dddd...*10^exp
where D is a digit from 1..9, and d a digit from 0..9.
So the leading digit is always larger than 0, and the decimal separator (the comma above) directly follows the first digit.
Fixed-point notation is obtained by moving the decimal separator by exp places, i.e., scaling by 10^exp so that the remaining exponential factor is 10^0=1.
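The scaling step can be sketched in C as pure string manipulation. The function and its parameters are hypothetical and not part of Ryu's API: `digits` is assumed to hold "Dddd..." without a separator, `exp10` is the exponent of the normalized form, and `sep` is the decimal separator to emit. Sign handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert D.ddd... * 10^exp10 (digits given without separator) into
   fixed notation. Returns 0 on success, -1 if `out` is too small. */
static int to_fixed(const char *digits, int exp10, char sep,
                    char *out, size_t outsize) {
    size_t n = strlen(digits), pos = 0, need, i;
    assert(n > 0);

    if (exp10 < 0)                      /* 0.000ddd */
        need = 2 + (size_t)(-exp10 - 1) + n;
    else if ((size_t)exp10 >= n - 1)    /* ddd000 */
        need = (size_t)exp10 + 1;
    else                                /* dd.ddd */
        need = n + 1;
    if (need + 1 > outsize) return -1;

    if (exp10 < 0) {
        out[pos++] = '0';
        out[pos++] = sep;
        for (i = 0; i < (size_t)(-exp10 - 1); i++) out[pos++] = '0';
        memcpy(out + pos, digits, n);
        pos += n;
    } else if ((size_t)exp10 >= n - 1) {
        memcpy(out, digits, n);
        pos = n;
        for (i = n - 1; i < (size_t)exp10; i++) out[pos++] = '0';
    } else {
        size_t int_len = (size_t)exp10 + 1;
        memcpy(out, digits, int_len);
        pos = int_len;
        out[pos++] = sep;
        memcpy(out + pos, digits + int_len, n - int_len);
        pos += n - int_len;
    }
    out[pos] = '\0';
    return 0;
}
```

For example, digits "15" with exp10 = -2 become "0.015", and digits "12345" with exp10 = 2 and a comma as separator become "123,45".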

Post by **Maël** » 01 May 2021 22:03

Important issue regarding printing floating point numbers with a precision that exceeds the one provided by the shortest round-tripping representation computed by Ryu:
https://github.com/ulfjack/ryu/issues/27

Post by **Maël** » 01 May 2021 22:13

I wrote a wrapper to use Ryu from Delphi, which works well, including printing shortest round-tripping float representation, using locale information (FormatSettings.DecimalSeparator).

The issue, however, is that I would need to rely on the other Delphi functions for printing arbitrary-precision floating point numbers, as mentioned earlier, and the Delphi implementation is not exact/precise and does not support float80 or float16.

Converting back from floating point string to binary representation would equally rely on the imprecise Delphi RTL functions.

I'll try using C++17 charconv for that, or a modified version. So far it links in too much unrelated code and is not customizable enough.

Post by **Maël** » 05 May 2021 11:04

ULP: https://stackoverflow.com/questions/439 ... -precision

https://matthew-brett.github.io/teachin ... error.html
https://matthew-brett.github.io/teachin ... ting-point

https://ciechanow.ski/exposing-floating-point/
