RFR: 8301971: Make JDK source code UTF-8 [v3]

Magnus Ihse Bursie ihse at openjdk.org
Wed Apr 16 09:50:49 UTC 2025


On Tue, 15 Apr 2025 23:20:45 GMT, Sergey Bylokhov <serb at openjdk.org> wrote:

> can we also force this rule by the jcheck?

Well, yes and no. First, we can verify that we do not have invalid UTF-8. That might be a signal that the encoding is wrong. But then this check needs to be able to distinguish between pure binary files that happen to look like improperly encoded UTF-8 files, and text files that actually are incorrectly encoded. In the end, this is likely to be more of a heuristic for a warning, rather than something we can block integration on.
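To illustrate the first point, here is a minimal sketch in Java of a strict UTF-8 validity check combined with a crude binary-file heuristic. This is not how jcheck is implemented. The class `Utf8Check`, its methods, and the "a NUL byte means binary" rule are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Strict decode: malformed or unmappable input throws instead of being replaced with U+FFFD.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Hypothetical heuristic: a NUL byte suggests a binary file rather than text.
    public static boolean looksBinary(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0) {
                return true;
            }
        }
        return false;
    }

    // Only flag content that looks like text yet is not valid UTF-8.
    public static String classify(byte[] bytes) {
        if (looksBinary(bytes)) {
            return "binary (skipped)";
        }
        return isValidUtf8(bytes) ? "ok" : "warning: not valid UTF-8";
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(classify(utf8));   // ok
        System.out.println(classify(latin1)); // warning: not valid UTF-8
    }
}
```

The NUL-byte rule is exactly the kind of heuristic that will sometimes misclassify a file, which is why a failure fits better as a warning than as a blocker. Validity alone also proves little: the Latin-1 bytes 0xC3 0xA9 ("Ã©") are valid UTF-8 for "é", so such a file passes this check.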

Secondly, files can have incorrect encodings but still pass as valid UTF-8. Only a human can tell that the content would be incorrect if we were to assume the encoding is UTF-8 instead of, e.g., Latin-1. This cannot be checked by jcheck, but must be caught by reviewers.

I have been thinking, though, about adding a jcheck warning when non-ASCII characters are added to known text file types. As a general rule, this is acceptable but should only be done judiciously, so it would be good to have jcheck point it out. That would also give you an extra chance to verify the encoding.
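Such a warning could look roughly like the following Java sketch. The extension list, the class `NonAsciiWarning`, and the idea of passing in the added lines of a change are assumptions for illustration only, not part of jcheck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NonAsciiWarning {
    // Hypothetical set of extensions treated as text; a real check would have its own configuration.
    static final Set<String> TEXT_EXTENSIONS = Set.of("java", "c", "cpp", "h", "hpp", "gmk", "md", "sh");

    public static boolean isKnownTextFile(String path) {
        int dot = path.lastIndexOf('.');
        return dot >= 0 && TEXT_EXTENSIONS.contains(path.substring(dot + 1));
    }

    // Returns 1-based positions within addedLines of lines containing any non-ASCII character.
    // Files with unknown extensions are skipped entirely.
    public static List<Integer> nonAsciiLines(String path, List<String> addedLines) {
        List<Integer> result = new ArrayList<>();
        if (!isKnownTextFile(path)) {
            return result;
        }
        for (int i = 0; i < addedLines.size(); i++) {
            // Any UTF-16 code unit above 0x7F means the line is not pure ASCII.
            if (addedLines.get(i).chars().anyMatch(c -> c > 0x7F)) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> added = List.of("int x = 1;", "// caf\u00e9", "return x;");
        System.out.println(nonAsciiLines("src/Foo.java", added));    // [2]
        System.out.println(nonAsciiLines("images/logo.png", added)); // []
    }
}
```

Because the result is only a list of positions to point out, it naturally fits a non-blocking warning that prompts the author to double-check the encoding.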

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2809028487

