RFR: 8303623: Compiler should disallow non-standard UTF-8 string encodings [v2]

Sat Mar 18 18:29:21 UTC 2023

On Sat, 18 Mar 2023 18:01:02 GMT, Archie L. Cobbs <duke at openjdk.org> wrote:

>> src/jdk.compiler/share/classes/com/sun/tools/javac/jvm/ClassFile.java line 109:
>> 
>>> 107:     public enum Version {
>>> 108:         V45_3(45, 3), // base level for all attributes
>>> 109:         V48(48, 0),   // JDK 1.4
>> 
>> not sure why we are referring to a previous version
>
> Here's why: Part of this change is to disallow encodings that are longer than necessary, for example, encoding the character `0x0100` as `e0 84 80` instead of `c4 80`. This is in accordance with the current JVMS.
> 
> However, classfiles prior to major version 48 were allowed to contain longer-than-necessary character encodings, most likely (I'm guessing) because of insufficiently strict validation in the early JVM implementations. So when we are parsing a UTF-8 sequence in a classfile, we need to know whether the classfile's major version is below 48 to know whether or not we should allow such longer-than-necessary character encodings. If we didn't do this, then we might incorrectly break a compilation where someone was compiling against some very old classfiles.

I see, your point is that class files with version >= 48 that contain such longer-than-necessary encodings won't be accepted by the JVM anyway.
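
For readers following along, the version-dependent check described above could look roughly like the sketch below. This is not the code from the PR: the class name, method names, and the `STRICT_MAJOR_VERSION` constant are hypothetical, and the sketch only covers 2- and 3-byte sequences without validating continuation bytes. It treats `c0 80` as valid, since that is how class files encode U+0000.

```java
final class OverlongUtf8Sketch {

    // Assumption taken from the discussion: strict checking starts at major version 48 (JDK 1.4).
    static final int STRICT_MAJOR_VERSION = 48;

    /** Number of bytes in the sequence starting at off (1, 2 or 3 in this sketch). */
    static int length(byte[] buf, int off) {
        int b0 = buf[off] & 0xff;
        if ((b0 & 0xe0) == 0xc0) return 2;   // 110xxxxx 10xxxxxx
        if ((b0 & 0xf0) == 0xe0) return 3;   // 1110xxxx 10xxxxxx 10xxxxxx
        return 1;
    }

    /** Decodes the 1-, 2- or 3-byte sequence starting at off into a char value. */
    static int decode(byte[] buf, int off) {
        int b0 = buf[off] & 0xff;
        switch (length(buf, off)) {
            case 2:
                return ((b0 & 0x1f) << 6) | (buf[off + 1] & 0x3f);
            case 3:
                return ((b0 & 0x0f) << 12)
                        | ((buf[off + 1] & 0x3f) << 6)
                        | (buf[off + 2] & 0x3f);
            default:
                return b0;
        }
    }

    /** True if the sequence uses more bytes than the decoded value needs. */
    static boolean isOverlong(byte[] buf, int off) {
        int value = decode(buf, off);
        switch (length(buf, off)) {
            case 2:
                // Values below 0x80 fit in one byte; c0 80 (U+0000) is the class-file exception.
                return value < 0x80 && value != 0;
            case 3:
                // Values below 0x800 fit in two bytes.
                return value < 0x800;
            default:
                return false;
        }
    }

    /** Old classfiles (major < 48) are read leniently; newer ones must use the shortest form. */
    static boolean isAccepted(byte[] buf, int off, int majorVersion) {
        return majorVersion < STRICT_MAJOR_VERSION || !isOverlong(buf, off);
    }

    public static void main(String[] args) {
        byte[] overlong = {(byte) 0xe0, (byte) 0x84, (byte) 0x80}; // 0x0100 in three bytes
        byte[] shortest = {(byte) 0xc4, (byte) 0x80};              // 0x0100 in two bytes
        System.out.println(isAccepted(overlong, 0, 47));
        System.out.println(isAccepted(overlong, 0, 48));
        System.out.println(isAccepted(shortest, 0, 48));
    }
}
```

With the example from the quoted text, `e0 84 80` decodes to `0x0100` but is flagged as longer than necessary, so it would only be accepted when the major version is below 48, while `c4 80` is accepted for any version.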

-------------

PR: https://git.openjdk.org/jdk/pull/12893