RFR: 4511638: Double.toString(double) sometimes produces incorrect results [v2]

Sat Oct 9 11:18:08 UTC 2021

On Fri, 16 Apr 2021 11:30:32 GMT, Raffaello Giulietti <github.com+70726043+rgiulietti at openjdk.org> wrote:

>> Hello,
>> 
>> here's a PR for a patch submitted on March 2020 [1](https://cr.openjdk.java.net/~bpb/4511638/webrev.04/) when Mercurial was a thing.
>> 
>> The patch has been edited to adhere to OpenJDK code conventions about multi-line (block) comments. Nothing in the code proper has changed, except for the addition of redundant but clarifying parentheses in some expressions.
>> 
>> 
>> Greetings
>> Raffaello
>
> Raffaello Giulietti has updated the pull request incrementally with one additional commit since the last revision:
> 
>   4511638: Double.toString(double) sometimes produces incorrect results

(comment by Guy Steele)

Hi, Raffaello,

I will try to compose this message in plain ASCII and hope it does not get garbled.

I have now re-read the Schubfach paper, and with the extra information in your previous email, I think I understood it a lot better this time. I am very enthusiastic about the approach. This is exactly what JonL White was trying to puzzle out some five decades ago: just how many digits do you need in your powers of ten to get accurate print-read round-trips?

I am especially appreciative of the attention paid in the Schubfach paper to parameter M, which ensures that two-digit possibilities are considered in cases where it is required to print two significant digits anyway (scientific notation).

I have also read the two papers I could find about Ryu. Schubfach is clearly an improvement over Ryu because it avoids the iteration, among other reasons. But the second paper, about Ryu for printf, addresses the specific difficulties of generating digits for printf format specifiers, and raises the interesting question of whether it would be worthwhile also to design a version of Schubfach that handles printf requirements.

So I think it would be well worth incorporating the Schubfach algorithm into the JDK codebase. I am reassured that Schubfach has undergone extensive testing on trillions of randomly chosen values. But I would be further reassured by a statement that two important classes of values have been _exhaustively_ tested:

(1) All positive powers of 2 representable in the double format, plus the maximum representable value. (These are the values that are surrounded by irregular spacing.) Furthermore, out of caution one should also test every fp value within Z ulps of such a value; perhaps Z should be 64 or even 1024.

(2) All representable fp values that are the result of rounding-to-nearest some decimal value of the form y•10^n, where y is a member of, say, either D_1, D_2, D_3, or D_4. (These are values that may be especially susceptible to generation of too many digits by an insufficiently careful algorithm, and one reason might be insufficient precision or other algorithmic error in connection with a table of powers of ten.) Furthermore, out of caution one should also test every fp value within Y ulps of such a value; perhaps Y should be 64 or even 1024.

In every case, the testing should include (a) ensuring round-trip print-read behavior; (b) comparing to what is produced by the current JDK algorithm, and investigating any cases that differ.

And if other similar “edge cases” can be identified from the structure of the code, those would be worth focusing on as well.

If this testing has been or could be done, I would say, yes, certainly adopt Schubfach for Java. I would also recommend submitting the Schubfach paper to an appropriate conference or journal to get the benefit of further peer review of the algorithm and its write-up.

—Guy

----

(my reply)

Hi Guy,

for some reason your comments still appear garbled on the GitHub PR page and don't make it to the core-libs-dev mailing list at all. Luckily, they appear intelligible in my mailbox, so I'll keep going with prepending your comments in my replies: not ideal but good enough.

Thanks so much for re-reading my "paper".

printf()

There are some issues to consider when trying to apply Schubfach to printf(), the main one being that printf() allows to specify an arbitrary length for the resulting decimal. This means, for example, that unlimited precision arithmetic is unavoidable. But it might be worthwhile to investigate using Schubfach for lengths up to H (9 and 17 for float and double, resp.) and fall back to unlimited precision beyond that.
Before that, however, I would prefer to finally push Schubfach in the OpenJDK codebase for the toString() cases and close this PR.

Tests

Below, by "extensive test" I mean not only that the outcomes convert back without loss of information, but that they fully meet the spec about minimal number of digits, closeness, correct formatting (normal viz scientific), character set, etc.

All currently available tests are in the contributed code of this PR and will be part of the OpenJDK once integrated.

- All powers of 2 and the extreme values are already extensively tested.
- All values closest to powers of 10 are extensively tested.
- All values proposed by Paxson [1] are extensively tested.
- A configurable number of random values are tested at each round (currently 4 * 1'000'000 random values). Should a value fail, there's enough diagnostic information for further investigation.

I'll add extensive tests for the values you propose in point (1) and (2), setting Z = Y = 1024.

As for comparison with the current JDK behavior, there are already a bunch of values for which extensive tests fail on the current JDK but pass with Schubfach.

It would be cumbersome, if possible at all, to have both the current JDK and Schubfach implementations in the same OpenJDK codebase to be able to compare the outcomes. I performed comparisons in a different constellation, with Schubfach as an external library, but this is hardly a possibility in the core-libs. Needless to say, Schubfach outcomes are either the same as in JDK or better (shorter or closest to the fp value).

Peer reviewed publication

Shortening my 27 pages writing and re-formating it to meet a journal standards for publication would require investing yet another substantial amount of time. I'm not sure I'm prepared for that, given that I've no personal interest in a journal publication: I'm not an academic, I'm not pursuing a career...
But I promise I'll think about reconsidering my position on this ;-)

Greetings
Raffaello

----

[1] Paxson V, "A Program for Testing IEEE Decimal-Binary Conversion"

-------------

PR: https://git.openjdk.java.net/jdk/pull/3402