RFR: 8310901: Convert String::newStringNoRepl with Latin-1 to String::newStringLatin1NoRepl [v4]

Tue Jun 27 22:43:05 UTC 2023

On Tue, 27 Jun 2023 21:03:05 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

> The goal of removing the try/catch from HexFormat can be solved locally by creating a private method in HexFormat that swallows the exception. There is no need to add a method to Java Lang Access or change 3 other files. The method `newStringUTF8NoRepl` is different, it can/does throw runtime exceptions but there is no need for try/catch in that case, no change is needed there.

The different functions of `newStringNoRepl` from public APIs are: 1) It will try not to copy the byte array as much as possible; 2) It will throw an exception when there are  malformed or unmappable byte sequence.

Currently, `newStringNoRepl` has two use cases: 1) `HexFormat`; 2) `Files.readString`. If #14578 is merged, it will have a third use case.

If we review these use cases, we will realize that `newStringNoRepl` did not make any use cases happy:

* For `HexFormat` and [#14578](https://github.com/openjdk/jdk/pull/14578), `newStringNoRepl` is used to access `String(byte[],byte)` outside of `java.lang`.
  In this usage, `newStringNoRepl` is inconvenient to use. Users not only need to pass the charset, but also need to catch `CharacterCodingException`, even though it is actually impossible to occur.
  The purpose of this PR is to provide a better way of doing this.
* For `Files.readString`: Although `newStringNoRepl` was originally created for `Files.readString`, `Files.readString` has changed.
  Currently `Files.readString` is implemented as follows:

  https://github.com/openjdk/jdk/blob/7f094353673f5047643a2d7b512d0de8c665f215/src/java.base/share/classes/java/nio/file/Files.java#L3342-L3350

Obviously, for security reasons, additional check `path.getClass().getModule() != Object.class.getModule()` has been added here.

This makes `newStringNoRepl` awkward here. When reading a file containing non-ASCII characters from a non-default filesystem, an unnecessary copy of the byte array occurs here.

---

Since this method is so lame, I'm planning to do some retrofits:

1. Create a new method `newStringLatin1NoRepl`, use it to access `String(byte[],byte)` outside of `java.lang`;
2. Delete `newStringUTF8NoRepl` and add a new parameter `boolean noShare` to `newStringNoRepl`.

The new signature for `newStringNoRepl` looks like this:

String newStringNoRepl(byte[] bytes, Charset cs, boolean noShare) throws CharacterCodingException;

`newStringUTF8NoRepl(bytes)` can be replaced with `newStringNoRepl(bytes, UTF_8, true)`.

`Files.readString` can be optimized to:

return JLA.newStringNoRepl(readAllBytes(path), cs, path.getClass().getModule() != Object.class.getModule());

This is my idea. Can you take a look?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14655#issuecomment-1610315852