RFR: 8303623: Compiler should disallow non-standard UTF-8 string encodings [v2]
Archie L. Cobbs
duke at openjdk.org
Sat Mar 18 18:04:20 UTC 2023
On Sat, 18 Mar 2023 04:05:12 GMT, Vicente Romero <vromero at openjdk.org> wrote:
> I think this one needs a CSR as this fix could provoke binary incompatibilities
Good point - thanks. I'll add it.
> src/jdk.compiler/share/classes/com/sun/tools/javac/jvm/ClassFile.java line 109:
>
>> 107: public enum Version {
>> 108: V45_3(45, 3), // base level for all attributes
>> 109: V48(48, 0), // JDK 1.4
>
> not sure why we are referring to a previous version
Here's why: Part of this change is to disallow encodings that are longer than necessary, for example, encoding the character `0x0100` as `e0 84 80` instead of `c4 80`. This is in accordance with the current JVMS.
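To make the example concrete, here is a small sketch (illustrative only, not code from the PR; the class and method names are made up) showing that both byte sequences carry the same payload bits, and that the three-byte form is longer than necessary because the value fits in the 11 bits of a two-byte sequence:

```java
// Illustrative only; these names are hypothetical and do not exist in javac.
final class OverlongDemo {

    // Decode a two-byte sequence 110xxxxx 10xxxxxx (11 payload bits).
    static int decode2(int b0, int b1) {
        return ((b0 & 0x1f) << 6) | (b1 & 0x3f);
    }

    // Decode a three-byte sequence 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits).
    static int decode3(int b0, int b1, int b2) {
        return ((b0 & 0x0f) << 12) | ((b1 & 0x3f) << 6) | (b2 & 0x3f);
    }

    // A three-byte form is longer than necessary when the decoded value
    // is below 0x800, i.e. it would have fit in two bytes (or one).
    static boolean isOverlong3(int b0, int b1, int b2) {
        return decode3(b0, b1, b2) < 0x800;
    }
}
```

Here `decode2(0xc4, 0x80)` and `decode3(0xe0, 0x84, 0x80)` both yield `0x0100`, and `isOverlong3(0xe0, 0x84, 0x80)` is true.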
However, classfiles prior to major version 48 were allowed to contain longer-than-necessary character encodings. My guess is that this was due to insufficiently strict validation in the early JVM implementations. So when we parse a UTF-8 sequence in a classfile, we need to know whether the classfile's major version is below 48, or 48 and above, to decide whether to allow such longer-than-necessary encodings. If we didn't do this, we could break a compilation where someone is compiling against some very old classfiles.
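Roughly, the version-dependent check looks like the sketch below. This is a simplified stand-in, not the code in the PR: the class, method, and constant names are hypothetical, and it only models the one-, two-, and three-byte forms used in classfile strings, treating `c0 80` as the standard two-byte encoding of the null character.

```java
// Simplified sketch; all names here are hypothetical, not javac's.
final class VersionGatedUtf8 {

    // Assumption: strict validation applies from major version 48 onward.
    static final int STRICT_FROM_MAJOR = 48;

    // Returns true if buf is acceptable for a classfile of the given major version.
    static boolean isAcceptable(byte[] buf, int majorVersion) {
        boolean lenient = majorVersion < STRICT_FROM_MAJOR;
        int i = 0;
        while (i < buf.length) {
            int b0 = buf[i] & 0xff;
            int len, value, min;   // min = smallest value that needs this length
            if (b0 >= 0x01 && b0 <= 0x7f) {
                len = 1; value = b0; min = 0x01;
            } else if ((b0 & 0xe0) == 0xc0) {
                len = 2; value = b0 & 0x1f; min = 0x80;
            } else if ((b0 & 0xf0) == 0xe0) {
                len = 3; value = b0 & 0x0f; min = 0x800;
            } else {
                return false;      // 0x00, stray continuation byte, or 0xf0..0xff
            }
            if (i + len > buf.length) {
                return false;      // truncated sequence
            }
            for (int k = 1; k < len; k++) {
                int b = buf[i + k] & 0xff;
                if ((b & 0xc0) != 0x80) {
                    return false;  // not a continuation byte
                }
                value = (value << 6) | (b & 0x3f);
            }
            // The two-byte form of U+0000 (c0 80) is the standard encoding, not an error.
            boolean overlong = value < min && !(len == 2 && value == 0);
            if (overlong && !lenient) {
                return false;      // longer than necessary, and the classfile is new enough
            }
            i += len;
        }
        return true;
    }
}
```

With this sketch, `e0 84 80` is accepted for major version 47 and rejected for 48, while `c4 80` is accepted for both.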
-------------
PR: https://git.openjdk.org/jdk/pull/12893