RFR: 8303623: Compiler should disallow non-standard UTF-8 string encodings [v2]
Vicente Romero
vromero at openjdk.org
Sat Mar 18 18:29:21 UTC 2023
On Sat, 18 Mar 2023 18:01:02 GMT, Archie L. Cobbs <duke at openjdk.org> wrote:
>> src/jdk.compiler/share/classes/com/sun/tools/javac/jvm/ClassFile.java line 109:
>>
>>> 107: public enum Version {
>>> 108: V45_3(45, 3), // base level for all attributes
>>> 109: V48(48, 0), // JDK 1.4
>>
>> not sure why we are referring to a previous version
>
> Here's why: Part of this change is to disallow encodings that are longer than necessary, for example, encoding the character `0x0100` as `e0 84 80` instead of `c4 80`. This is in accordance with the current JVMS.
>
> However, classfiles prior to major version 48 were allowed to contain longer-than-necessary character encodings - most likely, I'm guessing, because of insufficiently strict validation in the early JVM implementations. So when we are parsing a UTF-8 sequence in a classfile, we need to know whether the classfile's major version is before or after 48 to know whether or not we should allow such longer-than-necessary character encodings. If we didn't do this, then we might incorrectly break a compilation where someone was compiling against some very old classfiles.
I see. Your point is that such class files (version >= 48) won't be accepted by the JVM anyway.
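A minimal sketch of the rule discussed above. It decodes a JVMS "modified UTF-8" byte sequence and rejects longer-than-necessary encodings only when the classfile's major version is 48 or later. The class `ModifiedUtf8Check`, its methods, and its error messages are hypothetical and do not reproduce the code in the PR. The pre-48 leniency follows the explanation given in this thread rather than a quotation from the JVMS. Modified UTF-8 encodes U+0000 as the two bytes `c0 80`, so that form must stay legal even in strict mode.

```java
/**
 * Hypothetical sketch (not the PR's code): decode a modified UTF-8 byte
 * sequence, rejecting overlong encodings unless majorVersion < 48.
 */
public class ModifiedUtf8Check {

    // Major version 48 corresponds to JDK 1.4 (ClassFile.Version.V48).
    static final int STRICT_MAJOR_VERSION = 48;

    static String decode(byte[] buf, int majorVersion) {
        boolean allowOverlong = majorVersion < STRICT_MAJOR_VERSION;
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < buf.length) {
            int b0 = buf[i] & 0xff;
            // Modified UTF-8 never contains a zero byte or bytes 0xf0..0xff.
            if (b0 == 0 || b0 >= 0xf0) {
                throw new IllegalArgumentException("illegal byte at " + i);
            }
            if (b0 < 0x80) {                        // 0xxxxxxx
                sb.append((char) b0);
                i += 1;
            } else if ((b0 & 0xe0) == 0xc0) {       // 110xxxxx 10xxxxxx
                int c = ((b0 & 0x1f) << 6) | continuation(buf, i + 1);
                // c0 80 (U+0000) is the one legal two-byte form below 0x80.
                if (!allowOverlong && c != 0 && c < 0x80) {
                    throw new IllegalArgumentException("overlong encoding at " + i);
                }
                sb.append((char) c);
                i += 2;
            } else if ((b0 & 0xf0) == 0xe0) {       // 1110xxxx 10xxxxxx 10xxxxxx
                int c = ((b0 & 0x0f) << 12)
                        | (continuation(buf, i + 1) << 6)
                        | continuation(buf, i + 2);
                // Anything below U+0800 fits in one or two bytes.
                if (!allowOverlong && c < 0x800) {
                    throw new IllegalArgumentException("overlong encoding at " + i);
                }
                sb.append((char) c);
                i += 3;
            } else {                                // stray 10xxxxxx lead byte
                throw new IllegalArgumentException("unexpected continuation byte at " + i);
            }
        }
        return sb.toString();
    }

    // Returns the 6 payload bits of a 10xxxxxx byte, or throws.
    static int continuation(byte[] buf, int j) {
        if (j >= buf.length || (buf[j] & 0xc0) != 0x80) {
            throw new IllegalArgumentException("bad continuation byte at " + j);
        }
        return buf[j] & 0x3f;
    }

    public static void main(String[] args) {
        byte[] shortest = { (byte) 0xc4, (byte) 0x80 };              // U+0100
        byte[] overlong = { (byte) 0xe0, (byte) 0x84, (byte) 0x80 };  // U+0100, overlong
        System.out.println((int) decode(shortest, 52).charAt(0));    // 256
        System.out.println((int) decode(overlong, 45).charAt(0));    // 256
        try {
            decode(overlong, 52);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());       // rejected: overlong encoding at 0
        }
    }
}
```

Both `c4 80` and `e0 84 80` carry the same payload bits for U+0100. Only the classfile's major version decides whether the longer form is tolerated.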
-------------
PR: https://git.openjdk.org/jdk/pull/12893
More information about the compiler-dev mailing list