RFC: Set UTF-8 as source file encoding on Windows

Yasumasa Suenaga suenaga at oss.nttdata.com
Tue Feb 7 11:25:42 UTC 2023


Hi all,

We are discussing source file encoding in PR #12436 [1].

I saw some C4819 warnings when I tried to build OpenJDK on Windows with the Japanese locale (CP932). C4819 means the source file contains characters that cl.exe cannot handle in the current code page (CP932 in my case).
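To illustrate why the code page matters, here is a minimal Python sketch (not taken from the PRs; the helper name `survives_code_page` is hypothetical). It encodes a piece of source text as UTF-8 and then reads the bytes back under another code page, which is roughly what happens when a UTF-8 file is read as CP932. It does not reproduce cl.exe's actual check.

```python
def survives_code_page(text: str, code_page: str) -> bool:
    """Return True if the UTF-8 bytes of `text` read back unchanged
    when they are decoded with `code_page` instead of UTF-8."""
    raw = text.encode("utf-8")
    try:
        return raw.decode(code_page) == text
    except UnicodeDecodeError:
        # The byte sequence is not valid in this code page at all.
        return False


# Pure ASCII source is the same in both encodings.
print(survives_code_page("int x = 0;", "cp932"))   # True
# A non-ASCII character in a comment is misread under CP932.
print(survives_code_page("// café", "cp932"))      # False
```

Pure ASCII sources are unaffected, so only files with non-ASCII characters (typically in comments or string literals) are at risk.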

I proposed suppressing C4819 in PR #12436, #12437 [2], and #12435 [3]. I heard that JDK folks have discussed source file encoding several times, and it looks like we expect UTF-8.
So I want to propose adding `-utf-8` to CFLAGS for Windows. What do you think?

The change is here: https://github.com/YaSuenag/jdk/commit/272678f8f0a74d893d98b507f2c0562bff900b9d


In GCC, the compiler expects UTF-8 as a source file encoding [4].
OTOH cl.exe uses the current user code page on Windows when the source file does not have a BOM [5]. So I think we should think about Linux (I guess we can ignore other platforms, e.g. macOS, because we haven't seen any locale-related reports there, and the locale can be set directly on them, which WSL cannot do).
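As a rough way to find the files this affects, the following Python sketch sorts a source file's bytes into the cases described above. The function name `classify_source` and its category labels are hypothetical, and this is not how cl.exe itself detects the encoding.

```python
import codecs


def classify_source(raw: bytes) -> str:
    """Classify the raw bytes of a source file by how their encoding
    would be identified."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"       # the BOM identifies the encoding
    if raw.isascii():
        return "ascii"           # same bytes under UTF-8 and CP932
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "other"           # not valid UTF-8, e.g. a legacy code page
    return "utf-8-no-bom"        # depends on the code page without -utf-8


print(classify_source(b"int x;"))
print(classify_source("// café".encode("utf-8")))
```

Files that fall into the "utf-8-no-bom" category are the ones whose interpretation depends on the user's code page unless the compiler is told to assume UTF-8.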

This proposal affects all native components in the JDK, so I want to discuss this topic before filing it in JBS and sending a PR for it.


I also think we should document the source file encoding somewhere. It could be "Operating System Requirements" in building.md. Let me know if there is a better place.


Thanks,

Yasumasa



[1] https://github.com/openjdk/jdk/pull/12436
[2] https://github.com/openjdk/jdk/pull/12437
[3] https://github.com/openjdk/jdk/pull/12435
[4] https://gcc.gnu.org/onlinedocs/gcc-12.2.0/cpp/Character-sets.html
[5] https://learn.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170


