RFR: 8310901: Convert String::newStringNoRepl with Latin-1 to String::newStringLatin1NoRepl [v4]
Glavo
duke at openjdk.org
Tue Jun 27 22:43:05 UTC 2023
On Tue, 27 Jun 2023 21:03:05 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
> The goal of removing the try/catch from HexFormat can be solved locally by creating a private method in HexFormat that swallows the exception. There is no need to add a method to Java Lang Access or change 3 other files. The method `newStringUTF8NoRepl` is different, it can/does throw runtime exceptions but there is no need for try/catch in that case, no change is needed there.
The different functions of `newStringNoRepl` from public APIs are: 1) It will try not to copy the byte array as much as possible; 2) It will throw an exception when there are malformed or unmappable byte sequence.
Currently, `newStringNoRepl` has two use cases: 1) `HexFormat`; 2) `Files.readString`. If #14578 is merged, it will have a third use case.
If we review these use cases, we will realize that `newStringNoRepl` did not make any use cases happy:
* For `HexFormat` and [#14578](https://github.com/openjdk/jdk/pull/14578), `newStringNoRepl` is used to access `String(byte[],byte)` outside of `java.lang`.
In this usage, `newStringNoRepl` is inconvenient to use. Users not only need to pass the charset, but also need to catch `CharacterCodingException`, even though it is actually impossible to occur.
The purpose of this PR is to provide a better way of doing this.
* For `Files.readString`: Although `newStringNoRepl` was originally created for `Files.readString`, `Files.readString` has changed.
Currently `Files.readString` is implemented as follows:
https://github.com/openjdk/jdk/blob/7f094353673f5047643a2d7b512d0de8c665f215/src/java.base/share/classes/java/nio/file/Files.java#L3342-L3350
Obviously, for security reasons, additional check `path.getClass().getModule() != Object.class.getModule()` has been added here.
This makes `newStringNoRepl` awkward here. When reading a file containing non-ASCII characters from a non-default filesystem, an unnecessary copy of the byte array occurs here.
---
Since this method is so lame, I'm planning to do some retrofits:
1. Create a new method `newStringLatin1NoRepl`, use it to access `String(byte[],byte)` outside of `java.lang`;
2. Delete `newStringUTF8NoRepl` and add a new parameter `boolean noShare` to `newStringNoRepl`.
The new signature for `newStringNoRepl` looks like this:
String newStringNoRepl(byte[] bytes, Charset cs, boolean noShare) throws CharacterCodingException;
`newStringUTF8NoRepl(bytes)` can be replaced with `newStringNoRepl(bytes, UTF_8, true)`.
`Files.readString` can be optimized to:
return JLA.newStringNoRepl(readAllBytes(path), cs, path.getClass().getModule() != Object.class.getModule());
This is my idea. Can you take a look?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14655#issuecomment-1610315852
More information about the core-libs-dev
mailing list