RFR: 8282773: Refactor parsing of integer VM options [v2]

Thu Mar 10 02:23:48 UTC 2022

On Wed, 9 Mar 2022 22:16:43 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Background:
>> 
>> When a VM option is given an integer value (e.g., `-XX:ActiveProcessorCount=123`), we call the `set_numeric_flag` function in arguments.cpp. The old implementation always calls `strtoull()` to get an unsigned 64-bit number, and then tries to cast/truncate the result to the target type, performing sign conversion as needed.
>> 
>> This is cumbersome and buggy, because the target type (such as `size_t`) may have different bit-widths on different platforms. Also, manual conversion of unsigned to sign values can run into corner cases.
>> 
>> The fix:
>> 
>> - Use C++ templates and SFINAE to automatically choose the correct `parse_integer_impl()` function for parsing the input string. The VM should avoid doing cast and sign conversion as much as possible.
>> - Use a similar technique to write a comprehensive set of test cases to validate the implementation. (This has revealed several failures in the old implementation).
>> 
>> Notes on K/M/G/T suffix handling:
>> 
>> For input like `-XX:ActiveProcessorCount=2048m`, the old implementation checks for overflow with `(n * suffix / suffix) == n`. This works for unsigned types because the [C++ specification](https://en.cppreference.com/w/cpp/language/operator_arithmetic) allows unsigned arithmetic to overflow. However, overflowing a signed integer is UB. Therefore, I wrote a new function `multiply_by_1k()` to check for overflow without potentially undefined behavior. Thanks to @kimbarrett for helping me come up with the solution.
>> 
>> Tests:
>> 
>> tiers 1-2 passed. tiers 3-5 in progress.
>
> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision:
> 
>  - Merge branch 'master' into new-interger-argument-parser
>  - fixed handling of errno
>  - add #include <limits> for std::numeric_limits<T>
>  - avoid UB in multiply_by_1k(); more clean-up
>  - avoid UB operations in k/m/g/t muplication
>  - refactored test cases
>  - step3
>  - step2
>  - tmp1

Hi Ioi,

This looks good I do have a few comments below.

Thanks,
David

src/hotspot/share/runtime/arguments.cpp line 751:

> 749: static bool parse_integer_impl(const char *s, char **endptr, int base, T* result) {
> 750:   // Can't use strtol because it doesn't detect (at least on Linux/gcc) overflowing
> 751:   // input such as "0x123456789" or "-0x800000001"

This doesn't seem to be true in general. The Linux manpage indicates overflow and underflow are detected. And a simple test program confirms:

 > ./conversion
Overflow detected okay for 0x123456789 (val = 2147483647)
Overflow detected okay for -0x800000001 (val = -2147483648)

src/hotspot/share/runtime/arguments.cpp line 945:

> 943:   } else if (flag->is_double()) {
> 944:     jlong v;
> 945:     if (parse_integer(value, &v)) {

I don't see how this can parse floating-point values ???

test/hotspot/gtest/runtime/test_arguments.cpp line 292:

> 290: }
> 291: 
> 292: #define INTEGER_TEST_TABLE(f) \

This looks extensive but it is hard to tell all the necessary boundary cases are covered. It would be clearer to see things like min_x-1 and max_x+1.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7763