RFR: 8328874: Class::forName0 should validate the class name length early [v13]

Thu Sep 4 18:32:44 UTC 2025

On Thu, 4 Sep 2025 18:19:40 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

>> src/java.base/share/classes/jdk/internal/util/ModifiedUtf.java line 37:
>> 
>>> 35: public abstract class ModifiedUtf {
>>> 36:     //Max length in Modified UTF-8 bytes for class names.(see max_symbol_length in symbol.hpp)
>>> 37:     public static final int JAVA_CLASSNAME_MAX_LEN = 65535;
>> 
>> max_symbol_length is not just class names - it is presumably the limit for modified UTF-8, as seen in `java.io.DataOutput::writeUTF`. We can just use a more generic name like `MAX_ENCODED_LENGTH`.
>
> There is no maximum length of an encoded UTF-8 string. The "modified UTF-8" is modified because it encodes a zero byte using the 2-byte form, so the result never contains a null byte. This allows some use cases to terminate the encoded UTF-8 bytes with a nul byte.
> In the DataOutput case, it was desirable to provide the length of the encoded bytes to make it easy to read or skip the encoded UTF-8. It improved some stream decoding but increased the cost of writing because the encoded length was needed before writing. It also prevented an exact size allocation before decoding.  In retrospect, it could have provided both the encoded and decoded lengths, saving some allocations.
> In ObjectOutputStream, the stream protocol had both long and short forms because Strings can be much longer.
> The method names and constants are specific to the encoding of **Class** names and that should be reflected in their names.

These also apply to the encoding of every UTF-8 constant in a class file, so they are not specific to class names.
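
For illustration only, here is a minimal sketch of the length rule under discussion. It is not the code in this PR: `ModifiedUtfSketch`, `encodedLength`, `fits` and `writeUtfAccepts` are hypothetical names, and `MAX_ENCODED_LENGTH` is the generic name suggested earlier in this thread. The sketch assumes the usual modified UTF-8 rules (U+0000 takes 2 bytes, surrogates are encoded individually as 3 bytes each) and uses `DataOutputStream::writeUTF` only to show the same 65535-byte ceiling imposed by a 2-byte length prefix.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

// Hypothetical helper for illustration; not the ModifiedUtf class under review.
final class ModifiedUtfSketch {
    // Largest value that fits in an unsigned 2-byte length field.
    static final int MAX_ENCODED_LENGTH = 65535;

    // Number of bytes the string occupies in modified UTF-8.
    static long encodedLength(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;           // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                len += 2;           // U+0000 and U+0080..U+07FF
            } else {
                len += 3;           // everything else, including each surrogate
            }
        }
        return len;
    }

    // True if the encoded form fits in a 2-byte length prefix.
    static boolean fits(String s) {
        return encodedLength(s) <= MAX_ENCODED_LENGTH;
    }

    // Cross-check against the JDK: writeUTF rejects strings that encode too long.
    static boolean writeUtfAccepts(String s) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String ascii = "a".repeat(MAX_ENCODED_LENGTH);  // 65535 chars, 65535 bytes
        String accented = "\u00e9".repeat(32768);       // 32768 chars, 65536 bytes
        System.out.println(fits(ascii) + " " + writeUtfAccepts(ascii));
        System.out.println(fits(accented) + " " + writeUtfAccepts(accented));
        System.out.println(encodedLength("\0"));        // NUL uses the 2-byte form
    }
}
```

The point of the sketch is that the limit is on encoded bytes, not on `String::length`, and that it comes from the 2-byte length field rather than from anything particular to class names.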

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26802#discussion_r2323100636