RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
Magnus Ihse Bursie
ihse at openjdk.java.net
Thu Sep 30 10:04:35 UTC 2021
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu <jiefu at openjdk.org> wrote:
> Hi all,
>
> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below:
>
> Compiling 100 properties into resource bundles for java.desktop
> Compiling 3038 files for java.base
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided
>
>
> The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform.
> In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3].
> And I also found an article showing that sscanf may fail with non-ASCII in the format string [4].
>
> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`).
> And I think it's safe to do so.
>
> This is because:
> 1) There are actually no non-ASCII chars for package/class/method/signature names.
> 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`.
>
> You may argue that the non-ASCII may be used by the parser itself.
> But I didn't find that usage at all. (Please let me know if I miss something.)
>
> So I suggest to remove these non-ASCII code to make HotSpot to be more portable.
> And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
>
> Testing:
> - Build tests on Windows
> - tier1~3 on Linux/x64
>
> Thanks.
> Best regards,
> Jie
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246
Some misc remarks from a build PoV:
* We count language and region settings as a build environment requirement, not a portability issue.
* It is really a shame that Microsoft is making changes to these so darned hard. On all other platforms, LC_ALL=C in the make file just fixes the problem. :-(
* But we do want the JDK to be easy to build, so this means that we might need to support building on more than en_US on Windows, at least until Microsoft get's their act together.
>From what I see in the discussion here there seems to be no clarity in what range of character the specification allows. This needs to be absolutely clear for any changes here -- we can't filter out legal characters just because they are problematic to build on non en_US platforms.
However, I'm thinking that you need to take a step back and see what you are really trying to solve. To me, it seems that sscanf is not the right tool for the job, and the fact that it has worked until now is more a lucky coincidence. It seems, from a quick glance, that you should consider the input a byte array, and process it like that, instead of a string, if the encoding is unclear, and the spec is talking about character values (like 0x7f) rather than what characters they are supposed to represent in a specific encoding.
-------------
PR: https://git.openjdk.java.net/jdk/pull/5704
More information about the hotspot-compiler-dev
mailing list