RFR: 4511638: Double.toString(double) sometimes produces incorrect results [v2]
Raffaello Giulietti
duke at openjdk.java.net
Wed Oct 13 08:37:54 UTC 2021
On Fri, 16 Apr 2021 11:30:32 GMT, Raffaello Giulietti <duke at openjdk.java.net> wrote:
>> Hello,
>>
>> here's a PR for a patch submitted in March 2020 [1](https://cr.openjdk.java.net/~bpb/4511638/webrev.04/), back when Mercurial was a thing.
>>
>> The patch has been edited to adhere to OpenJDK code conventions about multi-line (block) comments. Nothing in the code proper has changed, except for the addition of redundant but clarifying parentheses in some expressions.
>>
>>
>> Greetings
>> Raffaello
>
> Raffaello Giulietti has updated the pull request incrementally with one additional commit since the last revision:
>
> 4511638: Double.toString(double) sometimes produces incorrect results
(Replies from Guy Steele)
> Hi Guy,
>
> for some reason your comments still appear garbled on the GitHub PR page and don't make it to the core-libs-dev mailing list at all. Luckily, they appear intelligible in my mailbox, so I'll keep going with prepending your comments in my replies: not ideal but good enough.
>
> Thanks so much for re-reading my "paper".
>
> printf()
>
> There are some issues to consider when trying to apply Schubfach to printf(), the main one being that printf() allows specifying an arbitrary length for the resulting decimal. This means, for example, that unlimited-precision arithmetic is unavoidable. But it might be worthwhile to investigate using Schubfach for lengths up to H (9 and 17 for float and double, resp.) and falling back to unlimited precision beyond that.
> Before that, however, I would prefer to finally push Schubfach in the OpenJDK codebase for the toString() cases and close this PR.
I completely agree that using Schubfach to solve only the toString() problems would be a _major_ improvement in the situation, and this should not wait for exploration of the printf problem. But I suspect that using Schubfach for lengths up to H would cover a very large fraction of actual usage, and would improve both quality and speed, and therefore would be worth exploring later.
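For readers wondering why H = 17 is a natural cutoff for double: assuming H denotes the number of significant decimal digits that always suffices to recover the original value, a fixed-length rendering with 17 digits should parse back to the same double. The following is a minimal sketch of that property using only the existing `String.format` and `Double.parseDouble`; it does not involve Schubfach, and the class and method names are made up for illustration.

```java
import java.util.Locale;
import java.util.SplittableRandom;

// Hypothetical helper, not part of the PR: checks that a printf-style
// rendering with 17 significant digits identifies a double uniquely.
final class FixedLengthRoundTrip {
    // "%.16e" prints one digit before the decimal point and 16 after it,
    // i.e. 17 significant digits. Locale.ROOT keeps '.' as the separator.
    static String seventeenDigits(double v) {
        return String.format(Locale.ROOT, "%.16e", v);
    }

    // True if the 17-digit rendering parses back to exactly v (v must be finite).
    static boolean roundTrips(double v) {
        return Double.parseDouble(seventeenDigits(v)) == v;
    }

    // Counts failures over `count` finite doubles drawn from random bit patterns.
    static int failures(long seed, int count) {
        SplittableRandom rnd = new SplittableRandom(seed);
        int failures = 0;
        for (int i = 0; i < count; ) {
            double v = Double.longBitsToDouble(rnd.nextLong());
            if (Double.isNaN(v) || Double.isInfinite(v)) continue; // skip non-finite
            i++;
            if (!roundTrips(v)) failures++;
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + failures(42L, 100_000));
    }
}
```

The exact digits printed may differ between JDK releases, so the sketch checks only the round-trip property and not any particular output string.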
> Tests
>
> Below, by "extensive test" I mean not only that the outcomes convert back without loss of information, but that they fully meet the spec about minimal number of digits, closeness, correct formatting (normal vs. scientific), character set, etc.
>
> All currently available tests are in the contributed code of this PR and will be part of the OpenJDK once integrated.
>
> * All powers of 2 and the extreme values are already extensively tested.
> * All values closest to powers of 10 are extensively tested.
> * All values proposed by Paxson [1] are extensively tested.
I have now read through the Paxson paper. Does this refer to the values listed in his Tables 3 and 4, or to other values instead or in addition?
> * A configurable number of random values are tested at each round (currently 4 * 1'000'000 random values). Should a value fail, there's enough diagnostic information for further investigation.
>
> I'll add extensive tests for the values you propose in point (1) and (2), setting Z = Y = 1024.
>
I do think that would lend further confidence.
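As an illustration of the weakest part of an "extensive test" as described above, here is a minimal sketch of the round-trip check alone, applied to all powers of 2 and to random bit patterns. It is not the test code contributed with the PR: the class and method names are hypothetical, and the checks for minimal length, closeness, and formatting are deliberately left out.

```java
import java.util.SplittableRandom;

// Hypothetical sketch of a round-trip check; not the PR's actual test suite.
final class RoundTripCheck {
    // True if parsing the rendered decimal yields exactly the same bits.
    // doubleToLongBits canonicalizes NaN, so NaN also compares equal here.
    static boolean roundTrips(double v) {
        String s = Double.toString(v);
        return Double.doubleToLongBits(Double.parseDouble(s))
                == Double.doubleToLongBits(v);
    }

    // Checks every power of 2 representable as a double, subnormals included.
    static int failuresOnPowersOfTwo() {
        int failures = 0;
        for (int e = -1074; e <= 1023; e++) {
            if (!roundTrips(Math.scalb(1.0, e))) failures++;
        }
        return failures;
    }

    // Checks `count` doubles built from random 64-bit patterns.
    static int failuresOnRandomBits(long seed, int count) {
        SplittableRandom rnd = new SplittableRandom(seed);
        int failures = 0;
        for (int i = 0; i < count; i++) {
            if (!roundTrips(Double.longBitsToDouble(rnd.nextLong()))) failures++;
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("powers of 2: " + failuresOnPowersOfTwo() + " failures");
        System.out.println("random:      " + failuresOnRandomBits(42L, 1_000_000) + " failures");
    }
}
```

Drawing random bit patterns rather than random values in [0, 1) is a design choice of this sketch: it spreads the samples over all exponents, including subnormals.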
> As for comparison with the current JDK behavior, there are already a bunch of values for which extensive tests fail on the current JDK but pass with Schubfach.
Yes, thanks for supplying some of those.
> It would be cumbersome, if possible at all, to have both the current JDK and Schubfach implementations in the same OpenJDK codebase to be able to compare the outcomes. I performed comparisons in a different constellation, with Schubfach as an external library, but this is hardly a possibility in the core-libs. Needless to say, Schubfach outcomes are either the same as in JDK or better (shorter or closest to the fp value).
Okay.
I will mention here, for the record, that there is one other sort of test that could be performed that I think I have not yet seen you mention: a monotonicity test of the kind used by David Hough’s Testbase (mentioned by Paxson). However, a little thought reveals that such a test is made unnecessary by the round-trip test. So a monotonicity test would be a good idea when testing the printf case, but is not needed for the toString case.
Therefore, if you add the few tests I have suggested, I think that we can say with the best certainty we can have, short of separately testing every possible double value, that Schubfach is extremely well tested and ready for adoption into Java.
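For reference, a monotonicity test of the kind mentioned above can be sketched as follows: walk through adjacent doubles and check that the decimals produced for them are strictly increasing. One way to see why the round-trip test subsumes it for toString is that parsing is itself monotone, so two outputs in the wrong order could not both parse back to their originals. The code below is only an illustration with hypothetical names; it is not taken from Testbase or from the PR.

```java
import java.math.BigDecimal;

// Hypothetical sketch of a monotonicity check over adjacent doubles.
final class MonotonicityCheck {
    // Exact decimal value denoted by Double.toString(v); v must be finite.
    static BigDecimal rendered(double v) {
        return new BigDecimal(Double.toString(v));
    }

    // True if the decimal for v is strictly below the decimal for the next
    // double above v. Non-finite inputs or successors are treated as vacuous.
    static boolean strictlyIncreasingAt(double v) {
        double next = Math.nextUp(v);
        if (Double.isNaN(v) || Double.isInfinite(v) || Double.isInfinite(next)) {
            return true;
        }
        return rendered(v).compareTo(rendered(next)) < 0;
    }

    // Walks `steps` adjacent doubles upward from `start`, counting violations.
    static int violations(double start, int steps) {
        int violations = 0;
        double v = start;
        for (int i = 0; i < steps; i++) {
            if (!strictlyIncreasingAt(v)) violations++;
            v = Math.nextUp(v);
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println("violations near 1.0:  " + violations(1.0, 10_000));
        System.out.println("violations near 1e23: " + violations(1e23, 10_000));
    }
}
```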
> Peer reviewed publication
>
> Shortening my 27-page writeup and reformatting it to meet a journal's standards for publication would require investing yet another substantial amount of time. I'm not sure I'm prepared for that, given that I've no personal interest in a journal publication: I'm not an academic, I'm not pursuing a career...
> But I promise I'll think about reconsidering my position on this ;-)
>
Please do think about reconsidering.
There are several reasons to publish an “academic” paper:
- Earning “merit badges” that lead to academic tenure
- Reputation more generally—which maybe you don’t care much about, but it’s one way to ensure that the contributions of Dmitry Nadezhin are not forgotten
- Making sure that the technical ideas are not lost: an academic journal provides a “permanent home” for documentation and a search engine
- Provides a place for others who build on the work to cite
- The publication process engages other minds and eyeballs that may improve the writeup, sometimes in surprisingly good ways (in particular, I am sure that good referees would insist that you include all the details about testing that I had to drag out of you over the course of several email exchanges; if this information had been in the original writeup, I should simply have said, “Great! All done! Go for it!”).
There are many different archival venues with different tradeoffs.
- Publishing at arxiv.org (http://arxiv.org) is free, takes very little time, imposes no special formatting restrictions, has no page limit (so you don’t have to shorten anything), and has no review process. I would recommend doing this much right away.
- Some academic journals have no page limit, or have page limits up around 50 pages; this takes time but would give you rigorous peer review (multiple rounds if necessary).
- Conferences do have page limits (anywhere from 5 to 30 pages, depending on the conference), but take less time and give you some peer review.
If this code goes into the Java codebase, then first and foremost I want to make sure that some version of your complete writeup, expanded to describe the testing procedures, remains available to those who will have to maintain the code decades into the future. Second, I would like to make it easy for implementors of other programming languages to adopt this solution also. This is too important a problem and EVERY programming language that supports floating-point values must solve it. I want to help make sure that from now on it will be solved well, and right now Schubfach is the best solution I have seen.
—Guy
-------------
PR: https://git.openjdk.java.net/jdk/pull/3402