From naoto at openjdk.org Fri Jan 2 19:03:04 2026 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 2 Jan 2026 19:03:04 GMT Subject: RFR: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests Message-ID: Removing `static` from the JUnit test cases so that they are executed ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/29021/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29021&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374433 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29021.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29021/head:pull/29021 PR: https://git.openjdk.org/jdk/pull/29021 From iris at openjdk.org Fri Jan 2 19:07:53 2026 From: iris at openjdk.org (Iris Clark) Date: Fri, 2 Jan 2026 19:07:53 GMT Subject: RFR: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests In-Reply-To: References: Message-ID: On Fri, 2 Jan 2026 18:55:33 GMT, Naoto Sato wrote: > Removing `static` from the JUnit test cases so that they are executed Marked as reviewed by iris (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29021#pullrequestreview-3623731349 From joehw at openjdk.org Fri Jan 2 19:37:02 2026 From: joehw at openjdk.org (Joe Wang) Date: Fri, 2 Jan 2026 19:37:02 GMT Subject: RFR: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests In-Reply-To: References: Message-ID: On Fri, 2 Jan 2026 18:55:33 GMT, Naoto Sato wrote: > Removing `static` from the JUnit test cases so that they are executed Marked as reviewed by joehw (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/29021#pullrequestreview-3623770093 From duke at openjdk.org Sun Jan 4 06:09:09 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Sun, 4 Jan 2026 06:09:09 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v3] In-Reply-To: References: Message-ID: On Sat, 26 Jul 2025 10:10:40 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with four additional commits since the last revision: > > - Update `@bug` in correct file > - Add default implementation on codePointCount in CharSequence > - Update `@bug` entries in test class doc comments > - Discard changes on code whose form is not `str.codePointCount(0, str.length())` I had Copilot (GPT-4.1) create a draft: > ## Summary > > Add a no-argument `codePointCount()` method to `CharSequence` and `String` to count the number of Unicode code points in the entire sequence or string. > > ## Problem > > Currently, `String.codePointCount` and `CharSequence.codePointCount` only provide an overload that requires start and end indices. Developers often expect an overload with no arguments that returns the code point count of the entire string or sequence. Without this, developers resort to verbose or less efficient workarounds, such as using `codePoints().count()` (which yields every code point, adding unnecessary overhead) or calling `codePointCount(0, str.length())` (which is more verbose, requires a temporary variable, and performs an extra boundary check). 
> > A common use case involves enforcing maximum character limits on user input, particularly for fields stored in databases such as MySQL or PostgreSQL. Both database systems can consider the declared length of `VARCHAR(n)` columns as the number of Unicode code points, not just the number of `char` units or bytes for character sets like UTF-8 or UTF8MB4. Correctly counting code points is essential for supporting internationalized input, emoji, and non-BMP characters. For example, the NIST SP 800-63B guideline specifies that passwords should be checked in terms of the number of Unicode code points. > > ## Solution > > Introduce default no-argument `codePointCount()` methods in both the `CharSequence` interface and the `String` class. The new method returns the number of Unicode code points in the entire character sequence, equivalent to invoking `codePointCount(0, length())`, but provides better readability and avoids unnecessary overhead. The implementation in `CharSequence` is a default method, while `String` provides an explicit override for potential performance optimization. > > ## Specification > > Add to `java.lang.CharSequence` interface: > ```java > /** > * Returns the number of Unicode code points in this character sequence. > * Equivalent to {@code codePointCount(0, length())}. > * > * @return the number of Unicode code points in this sequence > * @since N > */ > default int codePointCount() { > return codePointCount(0, length()); > } > ``` > > Add to `java.lang.String` class: > ```java > /** > * Returns the number of Unicode code points in this string. > * Equivalent to {@code codePointCount(0, length())}. > * > * @return the number of Unicode code points in this string > * @since N > */ > @Override > public int codePointCount() { > return codePointCount(0, length()); > } > ``` > > Here, `N` refers to the next Java platform version in which this change will be available. 
> > Informative Supplement: > > - Implementation: [GitHub PR 26461](https://github.com/openjdk/jdk/pull/26461) > - Example use cases: > ```java > // For user names stored in MySQL (or PostgreSQL) VARCHAR(20), which counts code points: > if (userName.codePointCount() > 20) { > IO.println("The user name is too long to store in VARCHAR(20) in utf8mb4 MySQL/PostgreSQL!"); > } > // Password policy: require at least 8 Unicode characters (code points) as per NIST SP 800-63B: > if (password.codePointCount() < 8) { > IO.println("Password is too short!"); > } > ``` > > References: > - [MySQL VARCHAR documentation](https://dev.mysql.com/doc/refman/8.0/en/char.html) > - [PostgreSQL Character Types](https://www.postgresql.org/docs/current/datatype-character.html) > - [NIST SP 800-63B §5.1.1.2](https://pages.nist.gov/800-63-4/sp800-63b.html#passwordver)
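The gap between `length()` and the code point count that the draft relies on can be sketched with the existing two-argument API. This is only an illustration under stated assumptions: the class name `CodePointCountSketch` and the helper `countCodePoints` are hypothetical and not part of the PR, and the proposed no-argument `codePointCount()` is deliberately not called because it does not exist in released JDKs.

```java
class CodePointCountSketch {
    // Hypothetical helper (not from the PR): whole-sequence count built on the
    // existing Character.codePointCount(CharSequence, int, int).
    static int countCodePoints(CharSequence cs) {
        return Character.codePointCount(cs, 0, cs.length());
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (encoded as a surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";

        System.out.println(s.length());             // 4 char units
        System.out.println(countCodePoints(s));     // 3 code points
        System.out.println(s.codePoints().count()); // 3, but walks a stream of every code point
    }
}
```

A length check written against `length()` would reject this three-character value one character too early, which appears to be the situation the draft's `VARCHAR(n)` example has in mind.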
Needs to be fixed: > for character sets like UTF-8 or UTF8MB4. → "for character sets like UTF-8 (utf8mb4 in MySQL)." ------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3707771006 From duke at openjdk.org Sun Jan 4 07:25:06 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Sun, 4 Jan 2026 07:25:06 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v3] In-Reply-To: <3JOprtfyEvBcaGwc36e-4qD7pXLFJ_bWYce2CHerALQ=.95148986-fa08-4ec7-93d9-0014d075abb0@github.com> References: <3JOprtfyEvBcaGwc36e-4qD7pXLFJ_bWYce2CHerALQ=.95148986-fa08-4ec7-93d9-0014d075abb0@github.com> Message-ID: On Sat, 8 Nov 2025 13:53:27 GMT, Chen Liang wrote: >> The CSR text is not modified from the boilerplate, but I have no authority to modify it. > > Hi @tats-u, if you can provide the text for the CSR, I can help upload your text to the JBS. Unfortunately you need a JBS account in order to update the CSR. @liach Fixed: > ## Summary > > Add a no-argument `codePointCount()` method to `CharSequence` and `String` to count the number of Unicode code points in the entire sequence or string. > > ## Problem > > Currently, `String.codePointCount` and `CharSequence.codePointCount` only provide an overload that requires start and end indices. Developers often expect an overload with no arguments that returns the code point count of the entire string or sequence. Without this, developers resort to verbose or less efficient workarounds, such as using `codePoints().count()` (which yields every code point, adding unnecessary overhead) or calling `codePointCount(0, str.length())` (which is more verbose, requires a temporary variable, and performs an extra boundary check). > > A common use case involves enforcing maximum character limits on user input, particularly for fields stored in databases such as MySQL or PostgreSQL.
Both database systems can consider the declared length of `VARCHAR(n)` columns as the number of Unicode code points, not just the number of `char` units or bytes for character sets like UTF-8 (utf8mb4 in MySQL). Correctly counting code points is essential for supporting internationalized input, emoji, and non-BMP characters. For example, the NIST SP 800-63B guideline specifies that passwords should be checked in terms of the number of Unicode code points. > > ## Solution > > Introduce default no-argument `codePointCount()` methods in both the `CharSequence` interface and the `String` class. The new method returns the number of Unicode code points in the entire character sequence, equivalent to invoking `codePointCount(0, length())`, but provides better readability and avoids unnecessary overhead. The implementation in `CharSequence` is a default method, while `String` provides an explicit override for potential performance optimization. > > ## Specification > > Add to `java.lang.CharSequence` interface: > ```java > /** > * Returns the number of Unicode code points in this character sequence. > * Equivalent to {@code codePointCount(0, length())}. > * > * @return the number of Unicode code points in this sequence > * @since N > */ > default int codePointCount() { > return codePointCount(0, length()); > } > ``` > > Add to `java.lang.String` class: > ```java > /** > * Returns the number of Unicode code points in this string. > * Equivalent to {@code codePointCount(0, length())}. > * > * @return the number of Unicode code points in this string > * @since N > */ > @Override > public int codePointCount() { > return codePointCount(0, length()); > } > ``` > > Here, `N` refers to the next Java platform version in which this change will be available. 
> > Informative Supplement: > > - Implementation: [GitHub PR 26461](https://github.com/openjdk/jdk/pull/26461) > - Example use cases: > ```java > // For user names stored in MySQL (or PostgreSQL) VARCHAR(20), which counts code points: > if (userName.codePointCount() > 20) { > IO.println("The user name is too long to store in VARCHAR(20) in utf8mb4 MySQL/PostgreSQL!"); > } > // Password policy: require at least 8 Unicode characters (code points) as per NIST SP 800-63B: > if (password.codePointCount() < 8) { > IO.println("Password is too short!"); > } > ``` > > References: > - [MySQL VARCHAR documentation](https://dev.mysql.com/doc/refman/8.4/en/char.html) > - [PostgreSQL Character Types](https://www.postgresql.org/docs/current/datatype-character.html) > - [NIST SP 800-63B §5.1.1.2](https://pages.nist.gov/800-63-4/sp800-63b.html#passwordver)
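The shape described in the Solution section (a default method on the interface, plus an explicit override where an implementation can do better) can be sketched roughly as below. All names here (`Text`, `PlainText`, `CachedText`, `countCodePoints`) are hypothetical stand-ins chosen for illustration; the actual change edits `java.lang.CharSequence` and `java.lang.String` directly, and caching is only one assumed reason an implementation might override the default.

```java
class DefaultMethodSketch {
    public static void main(String[] args) {
        Text plain = new PlainText("a\uD83D\uDE00");   // relies on the interface default
        Text cached = new CachedText("a\uD83D\uDE00"); // supplies its own override
        System.out.println(plain.countCodePoints());
        System.out.println(cached.countCodePoints());
    }
}

// Hypothetical interface standing in for CharSequence.
interface Text extends CharSequence {
    // Default implementation written against the existing three-argument static API.
    default int countCodePoints() {
        return Character.codePointCount(this, 0, length());
    }
}

// Hypothetical implementation that inherits the default unchanged.
class PlainText implements Text {
    final String value;

    PlainText(String value) { this.value = value; }

    @Override public int length() { return value.length(); }
    @Override public char charAt(int index) { return value.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) { return value.subSequence(start, end); }
    @Override public String toString() { return value; }
}

// Hypothetical implementation that overrides the default, here to cache the result.
final class CachedText extends PlainText {
    private int cached = -1; // -1 means "not computed yet"

    CachedText(String value) { super(value); }

    @Override
    public int countCodePoints() {
        if (cached < 0) {
            cached = value.codePointCount(0, value.length());
        }
        return cached;
    }
}
```

Existing implementers of the interface keep compiling because the default supplies the behaviour, while a class with more knowledge of its own storage can replace it.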
------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3707818947 From duke at openjdk.org Mon Jan 5 08:41:50 2026 From: duke at openjdk.org (Johny Jose) Date: Mon, 5 Jan 2026 08:41:50 GMT Subject: RFR: 8373476 : (tz) Update Timezone Data to 2025c Message-ID: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> tzdata changes for 2025c ------------- Commit messages: - 8373476: (tz) Update Timezone Data to 2025c Changes: https://git.openjdk.org/jdk/pull/29029/files Webrev:
https://webrevs.openjdk.org/?repo=jdk&pr=29029&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373476 Stats: 132 lines in 11 files changed: 67 ins; 8 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/29029.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29029/head:pull/29029 PR: https://git.openjdk.org/jdk/pull/29029 From coffeys at openjdk.org Mon Jan 5 09:18:22 2026 From: coffeys at openjdk.org (Sean Coffey) Date: Mon, 5 Jan 2026 09:18:22 GMT Subject: RFR: 8373476 : (tz) Update Timezone Data to 2025c In-Reply-To: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> References: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> Message-ID: On Mon, 5 Jan 2026 08:33:32 GMT, Johny Jose wrote: > tzdata changes for 2025c LGTM ------------- Marked as reviewed by coffeys (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29029#pullrequestreview-3626034314 From jlu at openjdk.org Mon Jan 5 17:21:47 2026 From: jlu at openjdk.org (Justin Lu) Date: Mon, 5 Jan 2026 17:21:47 GMT Subject: RFR: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests In-Reply-To: References: Message-ID: On Fri, 2 Jan 2026 18:55:33 GMT, Naoto Sato wrote: > Removing `static` from the JUnit test cases so that they are executed Marked as reviewed by jlu (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29021#pullrequestreview-3627735928 From naoto at openjdk.org Mon Jan 5 17:56:30 2026 From: naoto at openjdk.org (Naoto Sato) Date: Mon, 5 Jan 2026 17:56:30 GMT Subject: RFR: 8373476 : (tz) Update Timezone Data to 2025c In-Reply-To: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> References: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> Message-ID: On Mon, 5 Jan 2026 08:33:32 GMT, Johny Jose wrote: > tzdata changes for 2025c LGTM. 
For the JIRA issue, please fix the last paragraph in the problem text: Commentary now also uses characters from the set -''""?? as this can be useful and should work with current applications. This also affects data in iso3166.tab and zone1970.tab, which now contain strings like "Côte d'Ivoire" instead of "Côte d'Ivoire". which seems to have converted the non-ASCII quotes to ASCII equivalents, so the paragraph no longer makes sense. FWIW, the original was: Commentary now also uses characters from the set –‘’“”•≤ as this can be useful and should work with current applications. This also affects data in iso3166.tab and zone1970.tab, which now contain strings like “Côte d’Ivoire” instead of “Côte d'Ivoire” src/java.base/share/data/tzdata/iso3166.tab line 44: > 42: # “Czech Republic” and “Turkey” rather than “Czechia” and “Türkiye”), > 43: # and sometimes to omit needless detail or churn (e.g., “Netherlands” > 44: # rather than “Netherlands (the)” or “Netherlands (Kingdom of the)”). It's interesting that they started using non-ASCII quotes in comments, which is different from our policy, but we need to live with it as it is the upstream change ------------- Marked as reviewed by naoto (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29029#pullrequestreview-3627828729 PR Review Comment: https://git.openjdk.org/jdk/pull/29029#discussion_r2662309506 From vyazici at openjdk.org Tue Jan 6 08:23:16 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Tue, 6 Jan 2026 08:23:16 GMT Subject: RFR: 8374523: [BACKOUT] Move input validation checks to Java for java.lang.StringCoding intrinsics Message-ID: Back out `java.lang.StringCoding` changes delivered in [JDK-8361842] (655dc516c22), which causes regressions reported in [JDK-8374210].
[JDK-8361842]: https://bugs.openjdk.org/browse/JDK-8361842 [JDK-8374210]: https://bugs.openjdk.org/browse/JDK-8374210 ------------- Commit messages: - Revert "8361842: Move input validation checks to Java for java.lang.StringCoding intrinsics" Changes: https://git.openjdk.org/jdk/pull/29055/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29055&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374523 Stats: 437 lines in 23 files changed: 25 ins; 331 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/29055.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29055/head:pull/29055 PR: https://git.openjdk.org/jdk/pull/29055 From vyazici at openjdk.org Tue Jan 6 08:23:16 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Tue, 6 Jan 2026 08:23:16 GMT Subject: RFR: 8374523: [BACKOUT] Move input validation checks to Java for java.lang.StringCoding intrinsics In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 08:16:16 GMT, Volkan Yazici wrote: > Back out `java.lang.StringCoding` changes delivered in [JDK-8361842] (655dc516c22), which causes regressions reported in [JDK-8374210]. > > [JDK-8361842]: https://bugs.openjdk.org/browse/JDK-8361842 > [JDK-8374210]: https://bugs.openjdk.org/browse/JDK-8374210 Tier 1-3 are clear on c8acc80b8c6. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29055#issuecomment-3713645184 From duke at openjdk.org Tue Jan 6 10:09:58 2026 From: duke at openjdk.org (duke) Date: Tue, 6 Jan 2026 10:09:58 GMT Subject: RFR: 8373476 : (tz) Update Timezone Data to 2025c In-Reply-To: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> References: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> Message-ID: On Mon, 5 Jan 2026 08:33:32 GMT, Johny Jose wrote: > tzdata changes for 2025c @johnyjose30 Your change (at version 512f7a5dcb5340647720760b847a52f15bcfd98a) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29029#issuecomment-3714052685 From duke at openjdk.org Tue Jan 6 10:09:59 2026 From: duke at openjdk.org (Johny Jose) Date: Tue, 6 Jan 2026 10:09:59 GMT Subject: RFR: 8373476 : (tz) Update Timezone Data to 2025c In-Reply-To: References: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> Message-ID: On Mon, 5 Jan 2026 17:42:08 GMT, Naoto Sato wrote: >> tzdata changes for 2025c > > src/java.base/share/data/tzdata/iso3166.tab line 44: > >> 42: # “Czech Republic” and “Turkey” rather than “Czechia” and “Türkiye”), >> 43: # and sometimes to omit needless detail or churn (e.g., “Netherlands” >> 44: # rather than “Netherlands (the)” or “Netherlands (Kingdom of the)”). > > It's interesting that they started using non-ASCII quotes in comments, which is different from our policy, but we need to live with it as it is the upstream change Updated Jira ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29029#discussion_r2664372684 From duke at openjdk.org Tue Jan 6 10:42:21 2026 From: duke at openjdk.org (Johny Jose) Date: Tue, 6 Jan 2026 10:42:21 GMT Subject: Integrated: 8373476 : (tz) Update Timezone Data to 2025c In-Reply-To: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> References: <3O30FDn_YYGrefjF0MMHyuR9_jblBwjU3ZCHyhqJ5HU=.98a2a89b-a6e8-42e5-9894-3f2ac1727f04@github.com> Message-ID: On Mon, 5 Jan 2026 08:33:32 GMT, Johny Jose wrote: > tzdata changes for 2025c This pull request has now been integrated.
Changeset: 5df183be Author: Johny Jose Committer: Sean Coffey URL: https://git.openjdk.org/jdk/commit/5df183be6c484d8f9635fac149caf5e2079c5561 Stats: 132 lines in 11 files changed: 67 ins; 8 del; 57 mod 8373476: (tz) Update Timezone Data to 2025c Reviewed-by: coffeys, naoto ------------- PR: https://git.openjdk.org/jdk/pull/29029 From naoto at openjdk.org Tue Jan 6 16:31:38 2026 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 6 Jan 2026 16:31:38 GMT Subject: RFR: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests In-Reply-To: References: Message-ID: <6ELTvLimbKKHD_x-JQPvPLAbPRXj2H93S_-3u6DZ_8Y=.ffcbe1d2-5dce-465b-8f3e-3f3f969d3d26@github.com> On Fri, 2 Jan 2026 18:55:33 GMT, Naoto Sato wrote: > Removing `static` from the JUnit test cases so that they are executed Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29021#issuecomment-3715362758 From naoto at openjdk.org Tue Jan 6 16:31:39 2026 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 6 Jan 2026 16:31:39 GMT Subject: Integrated: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests In-Reply-To: References: Message-ID: On Fri, 2 Jan 2026 18:55:33 GMT, Naoto Sato wrote: > Removing `static` from the JUnit test cases so that they are executed This pull request has now been integrated. Changeset: 136ac0d1 Author: Naoto Sato URL: https://git.openjdk.org/jdk/commit/136ac0d10b92df8875f36c717e85595740b50ed2 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8374433: java/util/Locale/PreserveTagCase.java does not run any tests Reviewed-by: iris, joehw, jlu ------------- PR: https://git.openjdk.org/jdk/pull/29021 From jlu at openjdk.org Tue Jan 6 18:22:45 2026 From: jlu at openjdk.org (Justin Lu) Date: Tue, 6 Jan 2026 18:22:45 GMT Subject: RFR: 8373830: Refactor test/jdk/java/time/test tests to use JUnit over TestNG [v3] In-Reply-To: References: Message-ID: > Please review this PR which migrates the java.time tests from TestNG to JUnit. 
The java.time tests use TestNG based on the directory level settings configured by TEST.properties, so they are best migrated altogether. This is a large PR, so I have tried to make the changes clear by commit. > > First, the auto conversion tool is run in https://github.com/openjdk/jdk/commit/b1fd7dbdec85aac5a44cc875e57a36be8f1b6974. > https://github.com/openjdk/jdk/commit/3805cfd8765c0c76b61893dcf1670951402f98c3 and https://github.com/openjdk/jdk/commit/b697ca5d9a8067bcecea2dfb32f92f7699085dee are required so that the tests can actually compile and run. > https://github.com/openjdk/jdk/commit/d07c912c4c16d2b3307e489563f148f71cfdf4a4 addresses the timeout annotation which was not covered by the auto conversion tool. > The rest of the commits are aesthetic related. > > Before conversion stats > > > Test results: passed: 187 > Framework-based tests: 32,339 = 32,339 TestNG + 0 JUnit > > > After conversion stats > > > Test results: passed: 187 > Framework-based tests: 32,339 = 0 TestNG + 32,339 JUnit Justin Lu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Removing now outdated testNG comment - Merge branch 'master' into java.time-to-JUnit - Merge branch 'master' into java.time-to-JUnit - nontestng -> nonjunit - Fix comments - Fixing @timeout as well as unrelated stray spacing - Apply copyright years - Cleaning up some leftover whitespace from tool - Automated conversion created statement lambdas for exception tests. Modify to expression lambdas - Cleaning up unused JUnit imports - ... 
and 3 more: https://git.openjdk.org/jdk/compare/c6f3f85a...1cb313ea ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28911/files - new: https://git.openjdk.org/jdk/pull/28911/files/d90d23ed..1cb313ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28911&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28911&range=01-02 Stats: 8409 lines in 2284 files changed: 2778 ins; 1646 del; 3985 mod Patch: https://git.openjdk.org/jdk/pull/28911.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28911/head:pull/28911 PR: https://git.openjdk.org/jdk/pull/28911 From naoto at openjdk.org Tue Jan 6 18:57:51 2026 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 6 Jan 2026 18:57:51 GMT Subject: RFR: 8373830: Refactor test/jdk/java/time/test tests to use JUnit over TestNG [v3] In-Reply-To: References: Message-ID: On Tue, 6 Jan 2026 18:22:45 GMT, Justin Lu wrote: >> Please review this PR which migrates the java.time tests from TestNG to JUnit. The java.time tests use TestNG based on the directory level settings configured by TEST.properties, so they are best migrated altogether. This is a large PR, so I have tried to make the changes clear by commit. >> >> First, the auto conversion tool is run in https://github.com/openjdk/jdk/commit/b1fd7dbdec85aac5a44cc875e57a36be8f1b6974. >> https://github.com/openjdk/jdk/commit/3805cfd8765c0c76b61893dcf1670951402f98c3 and https://github.com/openjdk/jdk/commit/b697ca5d9a8067bcecea2dfb32f92f7699085dee are required so that the tests can actually compile and run. >> https://github.com/openjdk/jdk/commit/d07c912c4c16d2b3307e489563f148f71cfdf4a4 addresses the timeout annotation which was not covered by the auto conversion tool. >> The rest of the commits are aesthetic related. 
>> >> Before conversion stats >> >> >> Test results: passed: 187 >> Framework-based tests: 32,339 = 32,339 TestNG + 0 JUnit >> >> >> After conversion stats >> >> >> Test results: passed: 187 >> Framework-based tests: 32,339 = 0 TestNG + 32,339 JUnit > > Justin Lu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Removing now outdated testNG comment > - Merge branch 'master' into java.time-to-JUnit > - Merge branch 'master' into java.time-to-JUnit > - nontestng -> nonjunit > - Fix comments > - Fixing @timeout as well as unrelated stray spacing > - Apply copyright years > - Cleaning up some leftover whitespace from tool > - Automated conversion created statement lambdas for exception tests. Modify to expression lambdas > - Cleaning up unused JUnit imports > - ... and 3 more: https://git.openjdk.org/jdk/compare/23530f6a...1cb313ea Still looks good. I think the new commit warrants a copyright year increment ------------- PR Comment: https://git.openjdk.org/jdk/pull/28911#issuecomment-3715907068 From jlu at openjdk.org Tue Jan 6 19:24:35 2026 From: jlu at openjdk.org (Justin Lu) Date: Tue, 6 Jan 2026 19:24:35 GMT Subject: RFR: 8373830: Refactor test/jdk/java/time/test tests to use JUnit over TestNG [v4] In-Reply-To: References: Message-ID: > Please review this PR which migrates the java.time tests from TestNG to JUnit. The java.time tests use TestNG based on the directory level settings configured by TEST.properties, so they are best migrated altogether. This is a large PR, so I have tried to make the changes clear by commit. > > First, the auto conversion tool is run in https://github.com/openjdk/jdk/commit/b1fd7dbdec85aac5a44cc875e57a36be8f1b6974.
> https://github.com/openjdk/jdk/commit/3805cfd8765c0c76b61893dcf1670951402f98c3 and https://github.com/openjdk/jdk/commit/b697ca5d9a8067bcecea2dfb32f92f7699085dee are required so that the tests can actually compile and run. > https://github.com/openjdk/jdk/commit/d07c912c4c16d2b3307e489563f148f71cfdf4a4 addresses the timeout annotation which was not covered by the auto conversion tool. > The rest of the commits are aesthetic related. > > Before conversion stats > > > Test results: passed: 187 > Framework-based tests: 32,339 = 32,339 TestNG + 0 JUnit > > > After conversion stats > > > Test results: passed: 187 > Framework-based tests: 32,339 = 0 TestNG + 32,339 JUnit Justin Lu has updated the pull request incrementally with one additional commit since the last revision: Bumping copyright year for TCKDateTimeFormatter.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28911/files - new: https://git.openjdk.org/jdk/pull/28911/files/1cb313ea..7e1ae3c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28911&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28911&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28911.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28911/head:pull/28911 PR: https://git.openjdk.org/jdk/pull/28911 From naoto at openjdk.org Tue Jan 6 19:24:36 2026 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 6 Jan 2026 19:24:36 GMT Subject: RFR: 8373830: Refactor test/jdk/java/time/test tests to use JUnit over TestNG [v4] In-Reply-To: References: Message-ID: <463l2VqYZdLGc3Aw2YJd5gXNMHGbPoklAroJjs951-Y=.65f94b1b-85c2-4741-8cb5-f850f196ad65@github.com> On Tue, 6 Jan 2026 19:21:03 GMT, Justin Lu wrote: >> Please review this PR which migrates the java.time tests from TestNG to JUnit. The java.time tests use TestNG based on the directory level settings configured by TEST.properties, so they are best migrated altogether. 
This is a large PR, so I have tried to make the changes clear by commit. >> >> First, the auto conversion tool is run in https://github.com/openjdk/jdk/commit/b1fd7dbdec85aac5a44cc875e57a36be8f1b6974. >> https://github.com/openjdk/jdk/commit/3805cfd8765c0c76b61893dcf1670951402f98c3 and https://github.com/openjdk/jdk/commit/b697ca5d9a8067bcecea2dfb32f92f7699085dee are required so that the tests can actually compile and run. >> https://github.com/openjdk/jdk/commit/d07c912c4c16d2b3307e489563f148f71cfdf4a4 addresses the timeout annotation which was not covered by the auto conversion tool. >> The rest of the commits are aesthetic related. >> >> Before conversion stats >> >> >> Test results: passed: 187 >> Framework-based tests: 32,339 = 32,339 TestNG + 0 JUnit >> >> >> After conversion stats >> >> >> Test results: passed: 187 >> Framework-based tests: 32,339 = 0 TestNG + 32,339 JUnit > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Bumping copyright year for TCKDateTimeFormatter.java Marked as reviewed by naoto (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28911#pullrequestreview-3632176797 From jlu at openjdk.org Tue Jan 6 19:27:10 2026 From: jlu at openjdk.org (Justin Lu) Date: Tue, 6 Jan 2026 19:27:10 GMT Subject: Integrated: 8373830: Refactor test/jdk/java/time/test tests to use JUnit over TestNG In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 23:01:07 GMT, Justin Lu wrote: > Please review this PR which migrates the java.time tests from TestNG to JUnit. The java.time tests use TestNG based on the directory level settings configured by TEST.properties, so they are best migrated altogether. This is a large PR, so I have tried to make the changes clear by commit. > > First, the auto conversion tool is run in https://github.com/openjdk/jdk/commit/b1fd7dbdec85aac5a44cc875e57a36be8f1b6974. 
> https://github.com/openjdk/jdk/commit/3805cfd8765c0c76b61893dcf1670951402f98c3 and https://github.com/openjdk/jdk/commit/b697ca5d9a8067bcecea2dfb32f92f7699085dee are required so that the tests can actually compile and run. > https://github.com/openjdk/jdk/commit/d07c912c4c16d2b3307e489563f148f71cfdf4a4 addresses the timeout annotation which was not covered by the auto conversion tool. > The rest of the commits are aesthetic related. > > Before conversion stats > > > Test results: passed: 187 > Framework-based tests: 32,339 = 32,339 TestNG + 0 JUnit > > > After conversion stats > > > Test results: passed: 187 > Framework-based tests: 32,339 = 0 TestNG + 32,339 JUnit This pull request has now been integrated. Changeset: 53300b4a Author: Justin Lu URL: https://git.openjdk.org/jdk/commit/53300b4ac12240ea08227386412bfb90650c0aee Stats: 13724 lines in 186 files changed: 2264 ins; 691 del; 10769 mod 8373830: Refactor test/jdk/java/time/test tests to use JUnit over TestNG 8373829: Refactor test/jdk/java/time/tck tests to use JUnit over TestNG Reviewed-by: naoto ------------- PR: https://git.openjdk.org/jdk/pull/28911 From lkorinth at openjdk.org Wed Jan 7 12:35:42 2026 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 7 Jan 2026 12:35:42 GMT Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 10:02:41 GMT, Leo Korinth wrote: >> This change moves almost all of the ConcurrentMark initialisation from its constructor to the method `G1ConcurrentMark::fully_initialize()`. Thus, creation time of the VM can be slightly improved by postponing creation of ConcurrentMark. Most time is saved postponing creation of statistics buffers and threads. >> >> It is not obvious that this is the best solution. I have earlier experimented with lazily allocating statistics buffers _only_. 
One could also initialise a little bit more eagerly (for example the concurrent mark thread) and maybe get a slightly cleaner change. However IMO it seems better to not have ConcurrentMark "half initiated" with a created mark thread, but un-initialised worker threads. >> >> This change is depending on the integration of https://bugs.openjdk.org/browse/JDK-8373253. >> >> I will be out for vacation, and will be back after new year (and will not answer questions during that time), but I thought I get the pull request out now so that you can have a look. > > Leo Korinth has updated the pull request incrementally with 561 additional commits since the last revision: > > - Merge branch 'master' into _8367993 > - 8366058: Outdated comment in WinCAPISeedGenerator > > Reviewed-by: mullan > - 8357258: x86: Improve receiver type profiling reliability > > Reviewed-by: kvn, vlivanov > - 8373704: Improve "SocketException: Protocol family unavailable" message > > Reviewed-by: lucy, jpai > - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently > > Reviewed-by: jiefu, jbhateja, erfang, qamai > - 8343809: Add requires tag to mark tests that are incompatible with exploded image > > Reviewed-by: alanb, dholmes > - 8374465: Spurious dot in documentation for JVMTI ClassLoad > > Reviewed-by: kbarrett > - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket > > Reviewed-by: djelinski, mpowers, ascarpino > - 8374444: Fix simple -Wzero-as-null-pointer-constant warnings > > Reviewed-by: aboldtch > - 8373847: Test javax/swing/JMenuItem/MenuItemTest/bug6197830.java failed because The test case automatically fails when clicking any items in the "Nothing" menu in all four windows (Left-to-right)-Menu Item Test and (Right-to-left)-Menu Item Test > > Reviewed-by: serb, aivanov, dnguyen > - ... and 551 more: https://git.openjdk.org/jdk/compare/b907b295...0ece3767 I will redo the merge, I have done something strange.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28723#issuecomment-3718660595 From lkorinth at openjdk.org Wed Jan 7 12:58:43 2026 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 7 Jan 2026 12:58:43 GMT Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v3] In-Reply-To: References: Message-ID: > This change moves almost all of the ConcurrentMark initialisation from its constructor to the method `G1ConcurrentMark::fully_initialize()`. Thus, creation time of the VM can be slightly improved by postponing creation of ConcurrentMark. Most time is saved postponing creation of statistics buffers and threads. > > It is not obvious that this is the best solution. I have earlier experimented with lazily allocating statistics buffers _only_. One could also initialise a little bit more eagerly (for example the concurrent mark thread) and maybe get a slightly cleaner change. However IMO it seems better to not have ConcurrentMark "half initiated" with a created mark thread, but un-initialised worker threads. > > This change is depending on the integration of https://bugs.openjdk.org/browse/JDK-8373253. > > I will be out for vacation, and will be back after new year (and will not answer questions during that time), but I thought I get the pull request out now so that you can have a look. Leo Korinth has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 564 commits: - Merge branch '8373253' into 8367993 - Merge branch 'master' into _8373253 - Merge branch 'master' into _8367993 - 8366058: Outdated comment in WinCAPISeedGenerator Reviewed-by: mullan - 8357258: x86: Improve receiver type profiling reliability Reviewed-by: kvn, vlivanov - 8373704: Improve "SocketException: Protocol family unavailable" message Reviewed-by: lucy, jpai - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Reviewed-by: jiefu, jbhateja, erfang, qamai - 8343809: Add requires tag to mark tests that are incompatible with exploded image Reviewed-by: alanb, dholmes - 8374465: Spurious dot in documentation for JVMTI ClassLoad Reviewed-by: kbarrett - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket Reviewed-by: djelinski, mpowers, ascarpino - ... and 554 more: https://git.openjdk.org/jdk/compare/2aa8aa4b...28ccbb68 ------------- Changes: https://git.openjdk.org/jdk/pull/28723/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28723&range=02 Stats: 130308 lines in 3967 files changed: 83803 ins; 29735 del; 16770 mod Patch: https://git.openjdk.org/jdk/pull/28723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28723/head:pull/28723 PR: https://git.openjdk.org/jdk/pull/28723 From vyazici at openjdk.org Thu Jan 8 09:51:51 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Thu, 8 Jan 2026 09:51:51 GMT Subject: [jdk26] RFR: 8374700: [BACKOUT] Move input validation checks to Java for java.lang.StringCoding intrinsics Message-ID: Backport of [JDK-8374210] integrated in 7e18de13 by #29055. 
[JDK-8374210]: https://bugs.openjdk.org/browse/JDK-8374210 ------------- Commit messages: - Backport 7e18de137c3b5f08a479af2b64eb22923261900b Changes: https://git.openjdk.org/jdk/pull/29112/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29112&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374700 Stats: 437 lines in 23 files changed: 25 ins; 331 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/29112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29112/head:pull/29112 PR: https://git.openjdk.org/jdk/pull/29112 From sshivang at openjdk.org Fri Jan 9 03:35:13 2026 From: sshivang at openjdk.org (Shivangi Gupta) Date: Fri, 9 Jan 2026 03:35:13 GMT Subject: [jdk26] RFR: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests Message-ID: <4O1VuuFA0uZGCkWjzMMfVASBY24BDxOTs0EAAEaQnAY=.ffe671f8-8317-4ad0-8ea8-3e38688b1c63@github.com> Hi all, This pull request contains a backport of commit [136ac0d1](https://github.com/openjdk/jdk/commit/136ac0d10b92df8875f36c717e85595740b50ed2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Naoto Sato on 6 Jan 2026 and was reviewed by Iris Clark, Joe Wang and Justin Lu. Thanks! 
------------- Commit messages: - Backport 136ac0d10b92df8875f36c717e85595740b50ed2 Changes: https://git.openjdk.org/jdk/pull/29132/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29132&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374433 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29132.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29132/head:pull/29132 PR: https://git.openjdk.org/jdk/pull/29132 From djelinski at openjdk.org Fri Jan 9 06:58:14 2026 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Fri, 9 Jan 2026 06:58:14 GMT Subject: [jdk26] RFR: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests In-Reply-To: <4O1VuuFA0uZGCkWjzMMfVASBY24BDxOTs0EAAEaQnAY=.ffe671f8-8317-4ad0-8ea8-3e38688b1c63@github.com> References: <4O1VuuFA0uZGCkWjzMMfVASBY24BDxOTs0EAAEaQnAY=.ffe671f8-8317-4ad0-8ea8-3e38688b1c63@github.com> Message-ID: On Fri, 9 Jan 2026 03:28:32 GMT, Shivangi Gupta wrote: > Hi all, > > This pull request contains a backport of commit [136ac0d1](https://github.com/openjdk/jdk/commit/136ac0d10b92df8875f36c717e85595740b50ed2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Naoto Sato on 6 Jan 2026 and was reviewed by Iris Clark, Joe Wang and Justin Lu. > > Thanks! > > > Straight Backport . The test is passing in JDK26 CI. Marked as reviewed by djelinski (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/29132#pullrequestreview-3642636489 From duke at openjdk.org Fri Jan 9 07:06:53 2026 From: duke at openjdk.org (duke) Date: Fri, 9 Jan 2026 07:06:53 GMT Subject: [jdk26] RFR: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests In-Reply-To: <4O1VuuFA0uZGCkWjzMMfVASBY24BDxOTs0EAAEaQnAY=.ffe671f8-8317-4ad0-8ea8-3e38688b1c63@github.com> References: <4O1VuuFA0uZGCkWjzMMfVASBY24BDxOTs0EAAEaQnAY=.ffe671f8-8317-4ad0-8ea8-3e38688b1c63@github.com> Message-ID: On Fri, 9 Jan 2026 03:28:32 GMT, Shivangi Gupta wrote: > Hi all, > > This pull request contains a backport of commit [136ac0d1](https://github.com/openjdk/jdk/commit/136ac0d10b92df8875f36c717e85595740b50ed2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Naoto Sato on 6 Jan 2026 and was reviewed by Iris Clark, Joe Wang and Justin Lu. > > Thanks! > > > Straight Backport . The test is passing in JDK26 CI. @Shivangi-aa Your change (at version bac68a1580e1e1a3584b2e6108f93c05172d31f6) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29132#issuecomment-3727484964 From shade at openjdk.org Fri Jan 9 07:22:39 2026 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 Jan 2026 07:22:39 GMT Subject: [jdk26] RFR: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests In-Reply-To: <4O1VuuFA0uZGCkWjzMMfVASBY24BDxOTs0EAAEaQnAY=.ffe671f8-8317-4ad0-8ea8-3e38688b1c63@github.com> References: <4O1VuuFA0uZGCkWjzMMfVASBY24BDxOTs0EAAEaQnAY=.ffe671f8-8317-4ad0-8ea8-3e38688b1c63@github.com> Message-ID: <93ngjmdyG_BAuP0O7w1M94aIuYkBQ1uuMh23FftLqzw=.33f3ca05-8abf-45fd-95aa-04347a13e45e@github.com> On Fri, 9 Jan 2026 03:28:32 GMT, Shivangi Gupta wrote: > Hi all, > > This pull request contains a backport of commit [136ac0d1](https://github.com/openjdk/jdk/commit/136ac0d10b92df8875f36c717e85595740b50ed2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Naoto Sato on 6 Jan 2026 and was reviewed by Iris Clark, Joe Wang and Justin Lu. > > Thanks! > > > Straight Backport . The test is passing in JDK26 CI. Test-only change, passes the RDP1 bar. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29132#issuecomment-3727520792 From sshivang at openjdk.org Fri Jan 9 07:24:22 2026 From: sshivang at openjdk.org (Shivangi Gupta) Date: Fri, 9 Jan 2026 07:24:22 GMT Subject: [jdk26] Integrated: 8374433: java/util/Locale/PreserveTagCase.java does not run any tests In-Reply-To: <4O1VuuFA0uZGCkWjzMMfVASBY24BDxOTs0EAAEaQnAY=.ffe671f8-8317-4ad0-8ea8-3e38688b1c63@github.com> References: <4O1VuuFA0uZGCkWjzMMfVASBY24BDxOTs0EAAEaQnAY=.ffe671f8-8317-4ad0-8ea8-3e38688b1c63@github.com> Message-ID: On Fri, 9 Jan 2026 03:28:32 GMT, Shivangi Gupta wrote: > Hi all, > > This pull request contains a backport of commit [136ac0d1](https://github.com/openjdk/jdk/commit/136ac0d10b92df8875f36c717e85595740b50ed2) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. 
> > The commit being backported was authored by Naoto Sato on 6 Jan 2026 and was reviewed by Iris Clark, Joe Wang and Justin Lu. > > Thanks! > > > Straight Backport . The test is passing in JDK26 CI. This pull request has now been integrated. Changeset: 9ba5d6f8 Author: Shivangi Gupta Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/9ba5d6f8e7652754cfc91c76dc08ed1f10f2bc98 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8374433: java/util/Locale/PreserveTagCase.java does not run any tests Reviewed-by: djelinski Backport-of: 136ac0d10b92df8875f36c717e85595740b50ed2 ------------- PR: https://git.openjdk.org/jdk/pull/29132 From sjohanss at openjdk.org Fri Jan 9 08:47:01 2026 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 9 Jan 2026 08:47:01 GMT Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v3] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 12:58:43 GMT, Leo Korinth wrote: >> This change moves almost all of the ConcurrentMark initialisation from its constructor to the method `G1ConcurrentMark::fully_initialize()`. Thus, creation time of the VM can be slightly improved by postponing creation of ConcurrentMark. Most time is saved postponing creation of statistics buffers and threads. >> >> It is not obvious that this is the best solution. I have earlier experimented with lazily allocating statistics buffers _only_. One could also initialise a little bit more eagerly (for example the concurrent mark thread) and maybe get a slightly cleaner change. However IMO it seems better to not have ConcurrentMark "half initiated" with a created mark thread, but un-initialised worker threads. >> >> This change is depending on the integration of https://bugs.openjdk.org/browse/JDK-8373253. >> >> I will be out for vacation, and will be back after new year (and will not answer questions during that time), but I thought I get the pull request out now so that you can have a look. 
> > Leo Korinth has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 564 commits: > > - Merge branch '8373253' into 8367993 > - Merge branch 'master' into _8373253 > - Merge branch 'master' into _8367993 > - 8366058: Outdated comment in WinCAPISeedGenerator > > Reviewed-by: mullan > - 8357258: x86: Improve receiver type profiling reliability > > Reviewed-by: kvn, vlivanov > - 8373704: Improve "SocketException: Protocol family unavailable" message > > Reviewed-by: lucy, jpai > - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently > > Reviewed-by: jiefu, jbhateja, erfang, qamai > - 8343809: Add requires tag to mark tests that are incompatible with exploded image > > Reviewed-by: alanb, dholmes > - 8374465: Spurious dot in documentation for JVMTI ClassLoad > > Reviewed-by: kbarrett > - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket > > Reviewed-by: djelinski, mpowers, ascarpino > - ... and 554 more: https://git.openjdk.org/jdk/compare/2aa8aa4b...28ccbb68 Thanks for looking into this Leo. Overall I think it looks good, just some small questions and suggestions. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1637: > 1635: > 1636: bool G1CollectedHeap::concurrent_mark_is_terminating() const { > 1637: assert(_cm != nullptr, "thread must exist in order to check if mark is terminating"); I think it would make sense to add `&& _cm->is_fully_initialized()` to really make sure the thread has been created. 
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2427: > 2425: if (_cm->is_fully_initialized()) { > 2426: tc->do_thread(_cm->cm_thread()); > 2427: } Since the _cm_thread is now in `G1ConcurrentMark` this should be handled in `G1ConcurrentMark::threads_do()` src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2549: > 2547: void G1CollectedHeap::start_concurrent_cycle(bool concurrent_operation_is_full_mark) { > 2548: assert(!_cm->in_progress(), "Can not start concurrent operation while in progress"); > 2549: assert(_cm->is_fully_initialized(), "sanity"); Not sure this sanity assert is needed `_cm->in_progress()` will always return `false` if not fully initialized, so the above assert will cover this. If we still want it, I think it should be moved above the `in_progress()` assert. src/hotspot/share/gc/g1/g1PeriodicGCTask.cpp line 46: > 44: return false; > 45: } > 46: Why is this needed? The initial young collection will make sure concurrent marking gets initialized, right? src/hotspot/share/gc/g1/g1Policy.cpp line 744: > 742: if (!_g1h->concurrent_mark()->is_fully_initialized()) { > 743: return false; > 744: } Is this needed? The `in_progress()` check below makes sure to only check the cm_thread when fully initialized. src/hotspot/share/gc/g1/g1YoungCollector.cpp line 1127: > 1125: > 1126: void G1YoungCollector::collect() { > 1127: _g1h->_cm->fully_initialize(); I think it would make more sense to do this in `G1CollectedHeap::do_collection_pause_at_safepoint()`. There we check if we should start concurrent mark, so maybe the initialization could be done only if we are about to start concurrent mark. If we can do the initialization after the actual young collection, then we could maybe even move the initialization into `G1CollectedHeap::start_concurrent_cycle(...)` ------------- Changes requested by sjohanss (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28723#pullrequestreview-3639436840 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2672366755 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675276733 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675291347 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675313622 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675328503 PR Review Comment: https://git.openjdk.org/jdk/pull/28723#discussion_r2675249630 From stefank at openjdk.org Fri Jan 9 12:09:22 2026 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 9 Jan 2026 12:09:22 GMT Subject: RFR: 8367993: G1: Speed up ConcurrentMark initialization [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jan 2026 12:33:41 GMT, Leo Korinth wrote: >> Leo Korinth has updated the pull request incrementally with 561 additional commits since the last revision: >> >> - Merge branch 'master' into _8367993 >> - 8366058: Outdated comment in WinCAPISeedGenerator >> >> Reviewed-by: mullan >> - 8357258: x86: Improve receiver type profiling reliability >> >> Reviewed-by: kvn, vlivanov >> - 8373704: Improve "SocketException: Protocol family unavailable" message >> >> Reviewed-by: lucy, jpai >> - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently >> >> Reviewed-by: jiefu, jbhateja, erfang, qamai >> - 8343809: Add requires tag to mark tests that are incompatible with exploded image >> >> Reviewed-by: alanb, dholmes >> - 8374465: Spurious dot in documentation for JVMTI ClassLoad >> >> Reviewed-by: kbarrett >> - 8374317: Change GCM IV size to 12 bytes when encrypting/decrypting TLS session ticket >> >> Reviewed-by: djelinski, mpowers, ascarpino >> - 8374444: Fix simple -Wzero-as-null-pointer-constant warnings >> >> Reviewed-by: aboldtch >> - 8373847: Test javax/swing/JMenuItem/MenuItemTest/bug6197830.java failed because The test case 
automatically fails when clicking any items in the "Nothing" menu in all four windows (Left-to-right)-Menu Item Test and (Right-to-left)-Menu Item Test >> >> Reviewed-by: serb, aivanov, dnguyen >> - ... and 551 more: https://git.openjdk.org/jdk/compare/b907b295...0ece3767 > > I will redo the merge, I have done something strange. @lkorinth Something went wrong with your merge and now there's a bunch of unrelated labels, which results in updates being sent to misc mailing lists that have no interest in this PR. Could you remove all those labels? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28723#issuecomment-3728642315 From cushon at openjdk.org Tue Jan 13 08:21:31 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Tue, 13 Jan 2026 08:21:31 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset Message-ID: This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. ---

Benchmark                              (encoding)  (stringLength)   Mode  Cnt          Score          Error  Units
StringLoopJmhBenchmark.getBytes             ASCII              10  thrpt    5  406782650.595 ± 16960032.852  ops/s
StringLoopJmhBenchmark.getBytes             ASCII             100  thrpt    5  172936926.189 ±  4532029.201  ops/s
StringLoopJmhBenchmark.getBytes             ASCII            1000  thrpt    5   38830681.232 ±  2413274.766  ops/s
StringLoopJmhBenchmark.getBytes             ASCII          100000  thrpt    5     458881.155 ±    12818.317  ops/s
StringLoopJmhBenchmark.getBytes            LATIN1              10  thrpt    5   37193762.990 ±  3962947.391  ops/s
StringLoopJmhBenchmark.getBytes            LATIN1             100  thrpt    5   55400876.236 ±  1267331.434  ops/s
StringLoopJmhBenchmark.getBytes            LATIN1            1000  thrpt    5   11104514.001 ±    41718.545  ops/s
StringLoopJmhBenchmark.getBytes            LATIN1          100000  thrpt    5     182535.414 ±    10296.120  ops/s
StringLoopJmhBenchmark.getBytes             UTF16              10  thrpt    5  113474681.457 ±  8326589.199  ops/s
StringLoopJmhBenchmark.getBytes             UTF16             100  thrpt    5   37854103.127 ±  4808526.773  ops/s
StringLoopJmhBenchmark.getBytes             UTF16            1000  thrpt    5    4139833.009 ±    70636.784  ops/s
StringLoopJmhBenchmark.getBytes             UTF16          100000  thrpt    5      57644.637 ±     1887.112  ops/s
StringLoopJmhBenchmark.getBytesLength       ASCII              10  thrpt    5  946701647.247 ± 76938927.141  ops/s
StringLoopJmhBenchmark.getBytesLength       ASCII             100  thrpt    5  396615374.479 ± 15167234.884  ops/s
StringLoopJmhBenchmark.getBytesLength       ASCII            1000  thrpt    5  100464784.979 ±   794027.897  ops/s
StringLoopJmhBenchmark.getBytesLength       ASCII          100000  thrpt    5    1215487.689 ±     1916.468  ops/s
StringLoopJmhBenchmark.getBytesLength      LATIN1              10  thrpt    5  221265102.323 ± 17013983.056  ops/s
StringLoopJmhBenchmark.getBytesLength      LATIN1             100  thrpt    5  137617873.887 ±  5842185.781  ops/s
StringLoopJmhBenchmark.getBytesLength      LATIN1            1000  thrpt    5   92540259.130 ±  3839233.582  ops/s
StringLoopJmhBenchmark.getBytesLength      LATIN1          100000  thrpt    5    1136360.285 ±   426475.121  ops/s
StringLoopJmhBenchmark.getBytesLength       UTF16              10  thrpt    5  329508584.830 ±  6277534.933  ops/s
StringLoopJmhBenchmark.getBytesLength       UTF16             100  thrpt    5   86396600.366 ±  4287569.267  ops/s
StringLoopJmhBenchmark.getBytesLength       UTF16            1000  thrpt    5   10037994.564 ±   779239.446  ops/s
StringLoopJmhBenchmark.getBytesLength       UTF16          100000  thrpt    5      99218.929 ±     2854.843  ops/s
StringLoopJmhBenchmark.utf8LenByLoop        ASCII              10  thrpt    5  409066999.717 ± 25444799.130  ops/s
StringLoopJmhBenchmark.utf8LenByLoop        ASCII             100  thrpt    5   72126088.461 ± 42992009.452  ops/s
StringLoopJmhBenchmark.utf8LenByLoop        ASCII            1000  thrpt    5    8300806.448 ±   533912.423  ops/s
StringLoopJmhBenchmark.utf8LenByLoop        ASCII          100000  thrpt    5      87356.021 ±     7863.743  ops/s
StringLoopJmhBenchmark.utf8LenByLoop       LATIN1              10  thrpt    5  356802960.574 ± 24814016.238  ops/s
StringLoopJmhBenchmark.utf8LenByLoop       LATIN1             100  thrpt    5   85043539.617 ± 30538310.706  ops/s
StringLoopJmhBenchmark.utf8LenByLoop       LATIN1            1000  thrpt    5    9952675.100 ±  2922230.486  ops/s
StringLoopJmhBenchmark.utf8LenByLoop       LATIN1          100000  thrpt    5      79410.881 ±    50777.786  ops/s
StringLoopJmhBenchmark.utf8LenByLoop        UTF16              10  thrpt    5  304196311.102 ± 20381571.060  ops/s
StringLoopJmhBenchmark.utf8LenByLoop        UTF16             100  thrpt    5   84223829.681 ± 10787815.139  ops/s
StringLoopJmhBenchmark.utf8LenByLoop        UTF16            1000  thrpt    5   11046224.275 ±  1200731.406  ops/s
StringLoopJmhBenchmark.utf8LenByLoop        UTF16          100000  thrpt    5     112590.802 ±     3741.019  ops/s

------------- Commit messages: - Whitespace - Apply suggestions from code review - 8372353: API to compute the byte length of a String encoded in a given Charset Changes: https://git.openjdk.org/jdk/pull/28454/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372353 Stats: 213 lines in 4 files changed: 213 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From duke at openjdk.org Tue Jan 13 08:21:33 2026 From: duke at openjdk.org (ExE Boss) Date: Tue, 13 Jan 2026 08:21:33 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 14:58:55 GMT, Liam Miller-Cushon wrote: > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ± 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ± 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ± 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ± 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ± 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ±
1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ± 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ± 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ± 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ± 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ± 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ± 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ± 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ± 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ± 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ± 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ± 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ± 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ± 3839233.582 ops/s > StringLoopJmhBenchmark.ge... src/java.base/share/classes/java/lang/String.java line 2127: > 2125: * equivalent to this string, {@code false} otherwise > 2126: * > 2127: * @see #compareTo(String) For the **BOM**-less **UTF-16** charsets, this can simply return `value.length < 72: stringData += (char) (Math.random() * 26) + 'a'; > 73: } > 74: stringData += c; Maybe avoid creating intermediate strings in a loop to avoid excess GC pressure?
Suggestion: var stringDataBuilder = new StringBuilder(stringLength + 1); while (stringDataBuilder.length() < stringLength) { stringDataBuilder.append((char) (Math.random() * 26) + 'a'); } stringData = stringDataBuilder.append(c).toString(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2552768341 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2552801055 From cushon at openjdk.org Tue Jan 13 08:21:34 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Tue, 13 Jan 2026 08:21:34 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset In-Reply-To: References: Message-ID: On Sat, 22 Nov 2025 09:37:31 GMT, ExE Boss wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ± 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ± 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ± 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ± 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ± 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ± 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ± 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ± 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ± 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ± 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ±
70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > src/java.base/share/classes/java/lang/String.java line 2127: > >> 2125: * equivalent to this string, {@code false} otherwise >> 2126: * >> 2127: * @see #compareTo(String) > > For the **BOM**-less **UTF-16** charsets, this can simply return `value.length < > Suggestion: > > if (cs instanceof sun.nio.cs.UTF_16LE || > cs instanceof sun.nio.cs.UTF_16BE) { > return value.length << (1 - coder()); > } > return getBytes(cs).length; > > > [^1]: Lone surrogates get replaced with `U+FFFD` when encoding to **UTF-16** by [`String::getBytes(Charset)`], and all of **LATIN1** can be encoded in **UTF-16**. > > [`String::getBytes(Charset)`]: https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/lang/String.html#getBytes(java.nio.charset.Charset) Thanks! There is more work that could be done for other charsets here; I focused on UTF-8 and the `bytesCompatible` case as a proof of concept and as a way to start the discussion. It may or may not make sense to have optimized paths for all of the other standard charsets.
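For readers following the UTF-16 suggestion quoted above, here is a minimal sketch of why the shift works, written against the public API only. Inside `String`, the backing `value` array holds one byte per char under the LATIN1 coder (0) and two under the UTF16 coder (1), so `value.length << (1 - coder())` comes out to twice the char count in both cases. The class and method names below are hypothetical and are not part of the PR.

```java
import java.nio.charset.StandardCharsets;

class Utf16LengthSketch {
    // Hypothetical helper, not the PR's code: the number of bytes a String
    // occupies in a BOM-less UTF-16 charset (UTF-16LE or UTF-16BE).
    static int utf16EncodedLength(String s) {
        // Each char is written as exactly one 2-byte code unit, so the
        // encoded size is twice the char count. multiplyExact guards
        // against int overflow for extremely long strings.
        return Math.multiplyExact(s.length(), 2);
    }

    public static void main(String[] args) {
        String[] samples = { "", "abc", "h\u00e9llo", "\u65e5\u672c\u8a9e", "a\uD83D\uDE00b" };
        for (String s : samples) {
            // Compare the shortcut with the allocating baseline.
            int baseline = s.getBytes(StandardCharsets.UTF_16LE).length;
            System.out.println(utf16EncodedLength(s) + " vs " + baseline);
        }
    }
}
```

The baseline call allocates a full byte array just to read its length, which is the cost the proposed API is meant to avoid.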
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2556171650 From rriggs at openjdk.org Tue Jan 13 22:04:39 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 13 Jan 2026 22:04:39 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset In-Reply-To: References: Message-ID: <-qTE7Ac7lWOtINGFlc43ZX_fkW08IV8uHhqLX7q0K8I=.c8562893-8b3e-4940-909e-9dc14008dc74@github.com> On Fri, 21 Nov 2025 14:58:55 GMT, Liam Miller-Cushon wrote: > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 
15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ? 3839233.582 ops/s > StringLoopJmhBenchmark.ge... The test has an odd mix of throwing Exception and RuntimeException. It would be good to upgrade the test to use JUnit (though it could/should be a separate PR). src/java.base/share/classes/java/lang/String.java line 2112: > 2110: * > 2111: *

The result will be the same value as {@code getBytes(charset).length}. > 2112: * An @implNote or @apiNote maybe useful to indicate that this may allocate memory to compute the length for some Charsets. src/java.base/share/classes/java/lang/String.java line 2120: > 2118: return encodedLengthUTF8(coder, value); > 2119: } > 2120: if (bytesCompatible(cs, 0, value.length)) { BytesCompatible gives a non-optimal answer for a US_ASCII input that has chars > 0x7f. src/java.base/share/classes/java/lang/String.java line 2125: > 2123: if (cs instanceof sun.nio.cs.UTF_16LE || > 2124: cs instanceof sun.nio.cs.UTF_16BE) { > 2125: return value.length << (1 - coder()); Please encapsulate this computation `byteFor(int length, coder) {...}` to make it easier to re-use and document. ------------- PR Review: https://git.openjdk.org/jdk/pull/28454#pullrequestreview-3658097768 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2688260162 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2688257004 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2688253744 From cushon at openjdk.org Wed Jan 14 10:59:32 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 14 Jan 2026 10:59:32 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2] In-Reply-To: References: Message-ID: > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 
12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ? 3839233.582 ops/s > StringLoopJmhBenchmark.ge... 
Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/5c6c4f13..791954d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=00-01 Stats: 36 lines in 1 file changed: 29 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From cushon at openjdk.org Wed Jan 14 10:59:36 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 14 Jan 2026 10:59:36 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2] In-Reply-To: <-qTE7Ac7lWOtINGFlc43ZX_fkW08IV8uHhqLX7q0K8I=.c8562893-8b3e-4940-909e-9dc14008dc74@github.com> References: <-qTE7Ac7lWOtINGFlc43ZX_fkW08IV8uHhqLX7q0K8I=.c8562893-8b3e-4940-909e-9dc14008dc74@github.com> Message-ID: On Tue, 13 Jan 2026 22:02:13 GMT, Roger Riggs wrote: > The test has an odd mix of throwing Exception and RuntimeException. It would be good to upgrade the test to use JUnit (though it could/should be a separate PR). It seemed like `test/jdk/java/lang/String/Encodings.java` mostly uses `Exception`, and `test/jdk/sun/nio/cs/TestStringCoding.java` mostly uses `RuntimeException`; I was trying to be consistent with the existing code. I can take a look at migrating to JUnit as a separate cleanup. > src/java.base/share/classes/java/lang/String.java line 2112: > >> 2110: * >> 2111: *

The result will be the same value as {@code getBytes(charset).length}. >> 2112: * > > An @implNote or @apiNote maybe useful to indicate that this may allocate memory to compute the length for some Charsets. Done, thanks > src/java.base/share/classes/java/lang/String.java line 2120: > >> 2118: return encodedLengthUTF8(coder, value); >> 2119: } >> 2120: if (bytesCompatible(cs, 0, value.length)) { > > BytesCompatible gives a non-optimal answer for a US_ASCII input that has chars > 0x7f. I updated this to not use `bytesCompatible`, and optimized the US_ASCII case. > src/java.base/share/classes/java/lang/String.java line 2125: > >> 2123: if (cs instanceof sun.nio.cs.UTF_16LE || >> 2124: cs instanceof sun.nio.cs.UTF_16BE) { >> 2125: return value.length << (1 - coder()); > > Please encapsulate this computation `byteFor(int length, coder) {...}` to make it easier to re-use and document. Done ------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3748957104 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2689955692 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2689958009 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2689959343 From cushon at openjdk.org Wed Jan 14 15:39:43 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Wed, 14 Jan 2026 15:39:43 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 10:59:32 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 
4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... 
> > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback I drafted a CSR: https://bugs.openjdk.org/browse/JDK-8375318 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3750134777 From rriggs at openjdk.org Wed Jan 14 22:53:51 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 14 Jan 2026 22:53:51 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 10:59:32 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 
76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback src/java.base/share/classes/java/lang/String.java line 1080: > 1078: return value.length; > 1079: } > 1080: int len = value.length >> 1; I don't think I understand what's being done and what Charset encoder it is mimicking. It probably needs to document the assumptions about unmappable characters and malformed surrogates. (Likely it is correct since the test of US_ASCII passes, but could use an explanation). src/java.base/share/classes/java/lang/String.java line 2130: > 2128: > 2129: /** > 2130: * Returns the length in bytes of the given String encoded with the given {@link Charset}. You can use the javadoc tag `@return` and skip the duplication. This first sentence reads better than the @return below since it emphasizes the "encoded string" aspect. src/java.base/share/classes/java/lang/String.java line 2136: > 2134: * @implNote This method may allocate memory to compute the length for some charsets. > 2135: * > 2136: * @param cs the charset used to the compute the length Capitalize and perhaps link "Charset". src/java.base/share/classes/java/lang/String.java line 2155: > 2153: } > 2154: > 2155: private int byteFor(int length, int coder) { Add a `//` comment please. The method name should be plural. `bytesForCoder`.
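To make the question about unmappable characters and surrogates concrete, here is a rough sketch of a non-allocating US_ASCII length count, written against the public `String` API rather than the internal byte array. It is not the PR's `encodedLength*` code; the class and method names are hypothetical. It relies on `String.getBytes(Charset)` substituting the charset's replacement byte for unmappable input, and on a well-formed surrogate pair being treated as a single unmappable character.

```java
import java.nio.charset.StandardCharsets;

class AsciiLengthSketch {
    // Hypothetical helper: counts the bytes that encoding to US_ASCII would
    // produce, without allocating a destination array.
    static int asciiEncodedLength(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // A high surrogate followed by a low surrogate is one unmappable
            // character, so the pair is consumed together and contributes a
            // single replacement byte.
            if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++;
            }
            // ASCII chars map to themselves and anything else becomes one
            // replacement byte, so every step of the loop adds one byte.
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        String[] samples = { "", "abc", "h\u00e9llo", "a\uD83D\uDE00b" };
        for (String s : samples) {
            // Compare the count with the allocating baseline.
            int baseline = s.getBytes(StandardCharsets.US_ASCII).length;
            System.out.println(asciiEncodedLength(s) + " vs " + baseline);
        }
    }
}
```

Under these assumptions the count equals the number of code points in the string, which is one way to sanity-check an optimized path against `getBytes(cs).length`.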
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692343837 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692299982 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692301234 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2692304224 From cushon at openjdk.org Thu Jan 15 09:13:15 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 09:13:15 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v3] In-Reply-To: References: Message-ID: > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 
76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ? 3839233.582 ops/s > StringLoopJmhBenchmark.ge... Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/791954d6..972553f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=01-02 Stats: 15 lines in 1 file changed: 6 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From cushon at openjdk.org Thu Jan 15 09:13:18 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 09:13:18 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jan 2026 22:49:06 GMT, Roger Riggs wrote: >> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: >> >> Review feedback > > src/java.base/share/classes/java/lang/String.java line 1080: > >> 1078: return value.length; >> 1079: } >> 1080: int len = value.length >> 1; > > I don't think I understand what's being done and what Charset encoder it is mimicking. 
> It probably needs to document the assumptions about unmappable characters and malformed surrogates. > (Likely it is correct since the test of US_ASCII passes, but could use an explanation). I added some `//` comments documenting which methods the `encodedLength*` methods are mimicking. The logic here should be identical to `encodeASCII` (except that it isn't allocating and writing to a destination array). The handling of unmappable characters and malformed surrogates should match `encodeASCII`. > src/java.base/share/classes/java/lang/String.java line 2130: > >> 2128: >> 2129: /** >> 2130: * Returns the length in bytes of the given String encoded with the given {@link Charset}. > > You can use the javadoc tag `@return` and skip the duplication. This first sentence reads better than the @return below since it emphasizes the "encoded string" aspect. Sorry, I'm not sure I understand; can you clarify how that would work? The javadoc can't start with `@return`; it needs to be a non-tag sentence fragment (the build enables doclint to enforce this). > src/java.base/share/classes/java/lang/String.java line 2136: > >> 2134: * @implNote This method may allocate memory to compute the length for some charsets. >> 2135: * >> 2136: * @param cs the charset used to the compute the length > > Capitalize and perhaps link "Charset". Done > src/java.base/share/classes/java/lang/String.java line 2155: > >> 2153: } >> 2154: >> 2155: private int byteFor(int length, int coder) { > > Add a //comment please. > The method name should be plural. `bytesForCoder`.
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2693532105 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2693479863 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2693480259 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2693532859 From aloraini.omar at gmail.com Thu Jan 15 09:00:51 2026 From: aloraini.omar at gmail.com (Omar Aloraini) Date: Thu, 15 Jan 2026 12:00:51 +0300 Subject: Message Format 2.0 Message-ID: Are there any plans to incorporate Message Format 2.0 [0] into java.base? There also seems to be ongoing work on defining a 'container' for the translation files (an alternative to .properties) [1]. [0] https://www.unicode.org/reports/tr35/tr35-73/tr35-messageFormat.html [1] https://github.com/w3c/i18n-discuss/blob/gh-pages/explainers/message-resources.md From rriggs at openjdk.org Thu Jan 15 13:31:41 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 15 Jan 2026 13:31:41 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 08:47:47 GMT, Liam Miller-Cushon wrote: >> src/java.base/share/classes/java/lang/String.java line 2130: >> >>> 2128: >>> 2129: /** >>> 2130: * Returns the length in bytes of the given String encoded with the given {@link Charset}. >> >> You can use the javadoc tag `@return` and skip the duplication. This first sentence reads better than the @return below since it emphasizes the "encoded string" aspect. > > Sorry I'm not sure I understand, can you clarify how that would work? > > The javadoc can't start with `@return`, it needs to be a non-tag sentence fragment (the build enables doclint to enforce this). fyi, New javadoc functionality in JDK 22 enabled @return as an inline tag.
(see the [Javadoc tag specification](https://docs.oracle.com/en/java/javase/22/docs/specs/javadoc/doc-comment-spec.html)) `As an inline tag, provides content for the first sentence of a method's main description, and a "Returns" section, as if @return description were also present. In the default English locale, the first sentence is Returns description .` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2694390929 From cushon at openjdk.org Thu Jan 15 13:41:08 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 13:41:08 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v4] In-Reply-To: References: Message-ID: <7NQ2MCXYWVpXWXsGYGHR9DtPNxB1H9Wv9BqNqi7smdw=.d4d850cb-1dcb-4b69-a44d-9584bdb68ec8@github.com> > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 
70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ? 3839233.582 ops/s > StringLoopJmhBenchmark.ge... Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Use the @return inline tag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/972553f5..cf4e59f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From rriggs at openjdk.org Thu Jan 15 13:41:11 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 15 Jan 2026 13:41:11 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 09:04:09 GMT, Liam Miller-Cushon wrote: >> src/java.base/share/classes/java/lang/String.java line 1080: >> >>> 1078: return value.length; >>> 1079: } >>> 1080: int len = value.length >> 1; >> >> I don't think I understand what's being done and what Charset encoder it is mimicking. 
>> It probably needs to document the assumptions about unmappable characters and malformed surrogates. >> (Likely it is correct since the test of US_ASCII passes, but could use an explanation). > > I added some `//` comments documenting which methods the `encodedLength*` methods are mimicking. The logic here should be identical to `encodeASCII` (except that it isn't allocating and writing to a destination array). > > The handling of unmappable characters and malformed surrogates should match `encodeASCII`. Thanks for the doc update. Duplicating code (almost) is an unfortunate side effect. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2694424323 From cushon at openjdk.org Thu Jan 15 13:41:12 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 13:41:12 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2] In-Reply-To: References: Message-ID: <_aVd-ZtOFww9VJ5xUH1r_9iIdPPyIE-bXYFveiwQ1U4=.a03c9434-fe43-4d1b-b9b2-75548759253f@github.com> On Thu, 15 Jan 2026 13:28:23 GMT, Roger Riggs wrote: >> Sorry I'm not sure I understand, can you clarify how that would work? >> >> The javadoc can't start with `@return`, it needs to be a non-tag sentence fragment (the build enables doclint to enforce this). > > fyi, New javadoc functionality in JDK 22 enabled @return as an inline tag. (see the [Javadoc tag specification](https://docs.oracle.com/en/java/javase/22/docs/specs/javadoc/doc-comment-spec.html)) > `As an inline tag, provides content for the first sentence of a method's main description, and a "Returns" section, as if @return description were also present. In the default English locale, the first sentence is Returns description .` Thanks, that's good to know! I updated to use an `@return` inline tag.
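For anyone unfamiliar with the inline form discussed above, a minimal illustration follows. The doc comment opens with `{@return ...}`, which supplies both the first sentence of the description and the "Returns" section, so no separate block `@return` tag is needed. The class, method, and wording are hypothetical and do not reproduce the PR's javadoc; the method body is only a stand-in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class InlineReturnSketch {
    /**
     * {@return the length in bytes of {@code s} encoded with the given {@link Charset}}
     *
     * @param s  the string to measure
     * @param cs the {@link Charset} used to compute the length
     */
    static int encodedLength(String s, Charset cs) {
        // Stand-in body: the point of the sketch is the doc comment above,
        // not an efficient length computation.
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("h\u00e9llo", StandardCharsets.UTF_8));
    }
}
```

Because the comment starts with an inline tag rather than a block tag, it still has a main description, which is what the doclint check mentioned earlier in the thread requires.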
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2694423894 From rriggs at openjdk.org Thu Jan 15 13:54:04 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 15 Jan 2026 13:54:04 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v4] In-Reply-To: <7NQ2MCXYWVpXWXsGYGHR9DtPNxB1H9Wv9BqNqi7smdw=.d4d850cb-1dcb-4b69-a44d-9584bdb68ec8@github.com> References: <7NQ2MCXYWVpXWXsGYGHR9DtPNxB1H9Wv9BqNqi7smdw=.d4d850cb-1dcb-4b69-a44d-9584bdb68ec8@github.com> Message-ID: <0YR50Asq8EGZTHIhl0wUwPYLuTDgy4GSwED7oDI9rUk=.b792a4fd-b909-4875-a0e3-e49561a5e16c@github.com> On Thu, 15 Jan 2026 13:41:08 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 
1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Use the @return inline tag There should also be a test of the API basics in the test/jdk/java/lang/String directory, including NullPointerException. It can reference the encoding tests in sun/nio/cs/... There are also existing tests of encoding functions in test/jdk/java/lang/String/Encodings.java (though that may not be the right place). ------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3754969182 From cushon at openjdk.org Thu Jan 15 14:42:34 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 14:42:34 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5] In-Reply-To: References: Message-ID: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 
4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ? 3839233.582 ops/s > StringLoopJmhBenchmark.ge... 
Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Update tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/cf4e59f3..f9139b24 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=03-04 Stats: 10 lines in 2 files changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From cushon at openjdk.org Thu Jan 15 14:42:35 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 14:42:35 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v4] In-Reply-To: <0YR50Asq8EGZTHIhl0wUwPYLuTDgy4GSwED7oDI9rUk=.b792a4fd-b909-4875-a0e3-e49561a5e16c@github.com> References: <7NQ2MCXYWVpXWXsGYGHR9DtPNxB1H9Wv9BqNqi7smdw=.d4d850cb-1dcb-4b69-a44d-9584bdb68ec8@github.com> <0YR50Asq8EGZTHIhl0wUwPYLuTDgy4GSwED7oDI9rUk=.b792a4fd-b909-4875-a0e3-e49561a5e16c@github.com> Message-ID: On Thu, 15 Jan 2026 13:49:54 GMT, Roger Riggs wrote: > There should also be a test of the API basics in the test/jdk/java/lang/String directory including NullPointerException. Thanks, I added test coverage of `NullPointerException` to `test/jdk/java/lang/String/Exceptions.java`. There is also an assertion in `test/jdk/java/lang/String/Encodings.java` to ensure it matches `getBytes(...).length`. Is there additional API test coverage that should be added to `test/jdk/java/lang/String/`? 
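The two checks described above (a `NullPointerException` for a null charset, and agreement with `getBytes(...).length`) might be sketched as follows. This is not the test code from the PR: `EncodedLengthChecks` and `standIn` are hypothetical names, and the length function is passed in as a parameter so the same checks could be pointed at the proposed method once it exists.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Objects;
import java.util.function.ToIntBiFunction;

// Hypothetical harness; the real tests live in the JDK test tree.
final class EncodedLengthChecks {

    // Stand-in for the proposed method (an assumption): allocates via getBytes.
    static int standIn(String s, Charset cs) {
        Objects.requireNonNull(cs, "cs");
        return s.getBytes(cs).length;
    }

    // True if the function under test rejects a null charset with NullPointerException.
    static boolean rejectsNullCharset(ToIntBiFunction<String, Charset> lengthFn) {
        try {
            lengthFn.applyAsInt("abc", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // True if the function under test agrees with getBytes(cs).length on all samples.
    static boolean matchesGetBytes(ToIntBiFunction<String, Charset> lengthFn) {
        // Samples: empty, ASCII, Latin-1, BMP, supplementary, and an unpaired surrogate.
        List<String> samples = List.of("", "abc", "caf\u00e9", "\u20ac", "\ud83d\ude00", "\ud800x");
        List<Charset> charsets = List.of(StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
                StandardCharsets.UTF_8, StandardCharsets.UTF_16LE);
        for (Charset cs : charsets) {
            for (String s : samples) {
                if (lengthFn.applyAsInt(s, cs) != s.getBytes(cs).length) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToIntBiFunction<String, Charset> fn = EncodedLengthChecks::standIn;
        System.out.println("rejects null charset: " + rejectsNullCharset(fn));
        System.out.println("matches getBytes:     " + matchesGetBytes(fn));
    }
}
```

The sample strings deliberately include an unmappable character and an unpaired surrogate, since those are the cases where a length-only code path is most likely to drift from the encoding path.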
------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3755197805 From rriggs at openjdk.org Thu Jan 15 15:45:12 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 15 Jan 2026 15:45:12 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5] In-Reply-To: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> Message-ID: On Thu, 15 Jan 2026 14:42:34 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 
76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update tests src/java.base/share/classes/java/lang/String.java line 2135: > 2133: * {@return the length in bytes of the given String encoded with the given {@link Charset}} > 2134: * > 2135: *

The result will be the same value as {@code getBytes(charset).length}. Please @linkplain to the getBytes(cs) method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2694897294 From rriggs at openjdk.org Thu Jan 15 15:51:12 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 15 Jan 2026 15:51:12 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5] In-Reply-To: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> Message-ID: On Thu, 15 Jan 2026 14:42:34 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 
1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update tests This looks good. In the CSR, I'd move the paragraph starting "Computing the..." to the solution section and describe the possible optimizations without referring to specific implementation details. The javadoc in the specification section needs an update. Moving from Draft to Proposed and fixVersion = 27 will get it on the radar of the CSR reviewers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3755500610 From cushon at openjdk.org Thu Jan 15 16:27:58 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 16:27:58 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v6] In-Reply-To: References: Message-ID: > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 
2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ? 3839233.582 ops/s > StringLoopJmhBenchmark.ge... 
Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: @linkplain ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/f9139b24..818162de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From cushon at openjdk.org Thu Jan 15 16:28:01 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 16:28:01 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5] In-Reply-To: References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> Message-ID: On Thu, 15 Jan 2026 15:40:42 GMT, Roger Riggs wrote: >> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: >> >> Update tests > > src/java.base/share/classes/java/lang/String.java line 2135: > >> 2133: * {@return the length in bytes of the given String encoded with the given {@link Charset}} >> 2134: * >> 2135: *

The result will be the same value as {@code getBytes(charset).length}. > > Please @linkplain to the getBytes(cs) method. Done, thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695088558 From vyazici at openjdk.org Thu Jan 15 16:33:30 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Thu, 15 Jan 2026 16:33:30 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5] In-Reply-To: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> Message-ID: On Thu, 15 Jan 2026 14:42:34 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 
70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update tests src/java.base/share/classes/java/lang/String.java line 1097: > 1095: return dp; > 1096: } > 1097: Verified that this is indeed identical to `encodeASCII`, except that only the length is computed. src/java.base/share/classes/java/lang/String.java line 1512: > 1510: return dp; > 1511: } > 1512: Verified that this is indeed identical to `encodeUTF8`, except that only the length is computed. src/java.base/share/classes/java/lang/String.java line 1585: > 1583: > 1584: // This follows the implementation of encodeUTF8_UTF16 > 1585: private static int encodedLengthUTF8_UTF16(byte[] val) { Doesn't this duplicate `computeSizeUTF8_UTF16`? AFAICS, `computeSizeUTF8_UTF16` is missing the ASCII fast loop, but we can enhance it. FWIW, if we decide to reuse `computeSizeUTF8_UTF16`, it might be better to rename it to `encodedLengthUTF8_UTF16`, which will be in line with the introduced `encodedLength*` method family. 
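As a rough illustration of what "only the length is computed" means for UTF-8, the sketch below counts the bytes that `String.getBytes(StandardCharsets.UTF_8)` would produce without allocating a destination array. It is not the JDK's `computeSizeUTF8_UTF16` or `encodedLengthUTF8_UTF16`; `Utf8LengthSketch` and `utf8Length` are hypothetical names, and the code works on a `CharSequence` rather than on the internal byte array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the implementation in java.lang.String.
final class Utf8LengthSketch {

    // Counts UTF-8 bytes without encoding. The result is a long because a
    // very long string can need more than Integer.MAX_VALUE bytes.
    static long utf8Length(CharSequence s) {
        long bytes = 0;
        int len = s.length();
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // two-byte sequence
            } else if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                       // valid surrogate pair
                i++;                              // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1;                       // unpaired surrogate, replaced by '?'
            } else {
                bytes += 3;                       // rest of the BMP
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9 \u20ac \ud83d\ude00";
        System.out.println(utf8Length(sample) == sample.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The unpaired-surrogate branch mirrors the replacement behaviour of `getBytes`, which substitutes the charset's default replacement (a single `?` byte for UTF-8) for malformed input.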
src/java.base/share/classes/java/lang/String.java line 2143: > 2141: */ > 2142: public int getBytesLength(Charset cs) { > 2143: if (cs == UTF_8.INSTANCE) { It'd be nice to catch null values as early as possible. I suggest adding an `Objects.requireNonNull(cs)` along with `@throws NullPointerException If {@code cs} is null` in the docs. src/java.base/share/classes/java/lang/String.java line 2148: > 2146: if (isLatin1()) { > 2147: return value.length; > 2148: } Any particular reason you avoided introducing an `encodedLength8859_1` here? (There is an `encode8859_1` method.) src/java.base/share/classes/java/lang/String.java line 2151: > 2149: } else if (cs == US_ASCII.INSTANCE) { > 2150: return encodedLengthASCII(coder, value); > 2151: } else if (cs instanceof sun.nio.cs.UTF_16LE || cs instanceof sun.nio.cs.UTF_16BE) { I see that the `sun.nio.cs.UTF_16{LE,BE}` specialization was suggested by @ExE-Boss [here], though I'm not sure it is really needed. I cannot spot any other usage of these constants in `java.base`, except `jdk.internal.foreign.StringSupport`, which is irrelevant. 
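The shape being reviewed here (an early null check, a fast path per well-known charset, and a general fallback) can be sketched outside the JDK as below. All names are hypothetical. The sketch compares against the public `StandardCharsets` constants rather than the internal `INSTANCE` fields, and it only special-cases the two single-byte charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical sketch of the dispatch; not the code under review.
final class CharsetDispatchSketch {

    static int encodedLength(String s, Charset cs) {
        Objects.requireNonNull(cs, "cs");   // fail fast, before any dispatch
        if (cs.equals(StandardCharsets.US_ASCII) || cs.equals(StandardCharsets.ISO_8859_1)) {
            return singleByteLength(s);     // fast path, no allocation
        }
        return s.getBytes(cs).length;       // general fallback, allocates
    }

    // For US-ASCII and ISO-8859-1 every mappable char is one byte, every
    // unmappable char becomes one '?', and a valid surrogate pair collapses
    // into a single '?'. The result is therefore one byte per code point.
    static int singleByteLength(String s) {
        int count = 0;
        int len = s.length();
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++;                        // consume the low surrogate as well
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9 \ud83d\ude00";
        System.out.println(encodedLength(sample, StandardCharsets.US_ASCII)
                == sample.getBytes(StandardCharsets.US_ASCII).length);
    }
}
```

A string that is already stored as Latin-1 cannot contain surrogates, which is why the code under review can return `value.length` directly in that case; the loop above is only needed because the sketch has no access to the internal coder.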
[here]: https://github.com/openjdk/jdk/pull/28454/files#r2552768341 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695007692 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695024667 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695036430 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695065623 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695076034 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695105076 From vyazici at openjdk.org Thu Jan 15 16:33:33 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Thu, 15 Jan 2026 16:33:33 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5] In-Reply-To: References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> Message-ID: <94U314HJBN96ktT1MqD-GOo43rAbnQZAEFcJzdHGa6E=.e9b215c1-b1ec-4522-b257-d098d4663783@github.com> On Thu, 15 Jan 2026 16:22:05 GMT, Liam Miller-Cushon wrote: >> src/java.base/share/classes/java/lang/String.java line 2135: >> >>> 2133: * {@return the length in bytes of the given String encoded with the given {@link Charset}} >>> 2134: * >>> 2135: *

The result will be the same value as {@code getBytes(charset).length}. >> >> Please @linkplain to the getBytes(cs) method. > > Done, thanks The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length}. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695118671 From cushon at openjdk.org Thu Jan 15 16:52:33 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 16:52:33 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5] In-Reply-To: References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> Message-ID: On Thu, 15 Jan 2026 15:46:57 GMT, Roger Riggs wrote: > This looks good. Thanks for the review! > In the CSR, I'd move the paragraph starting "Computing the..." to the solution section and describe the possible optimizations without referring to specific implementation details. Done, thanks, I rephrased the 'Solution' section a bit to try to discuss the potential optimizations in a more general way. > The javadoc in the specification section needs an update. Moving from Draft to Proposed and fixVersion = 27 will get it on the radar of the CSR reviewers. 
Done ------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3755845597 From cushon at openjdk.org Thu Jan 15 17:05:57 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 17:05:57 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5] In-Reply-To: References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> Message-ID: On Thu, 15 Jan 2026 16:10:57 GMT, Volkan Yazici wrote: >> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: >> >> Update tests > > src/java.base/share/classes/java/lang/String.java line 1585: > >> 1583: >> 1584: // This follows the implementation of encodeUTF8_UTF16 >> 1585: private static int encodedLengthUTF8_UTF16(byte[] val) { > > Doesn't this duplicate the `computeSizeUTF8_UTF16`? > > AFAICS, `computeSizeUTF8_UTF16` is missing the ASCII fast loop, but we can enhance it. > > FWIW, if we decide reuse `computeSizeUTF8_UTF16`, it might be better to rename it to `encodedLengthUTF8_UTF16`, which will be in line with the introduced `encodedLength*` method family. Thanks for the catch. Good point, I will look at switching to `computeSizeUTF8_UTF16`. `computeSizeUTF8_UTF16` returns `long`, which raises the question of what to do when the result does not fit in an `int`. The return type of `getBytesLength` could potentially be `long` and allow computing the encoded length of strings that wouldn't fit into an array if they were encoded. 
Or it could throw an exception in that case, similar to `getBytes`, and have an `int` return type ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695225769 From naoto.sato at oracle.com Thu Jan 15 17:17:17 2026 From: naoto.sato at oracle.com (Naoto Sato) Date: Thu, 15 Jan 2026 09:17:17 -0800 Subject: Message Format 2.0 In-Reply-To: References: Message-ID: <61db8ae5-7a61-4ea0-9799-66b9a37d0b8d@oracle.com> Thanks for the suggestion. At the moment, we don't have any plan to support LDML MessageFormat in the JDK, considering the amount of required work. I've filed an RFE for future consideration: https://bugs.openjdk.org/browse/JDK-8375468 Naoto On 1/15/26 1:00 AM, Omar Aloraini wrote: > Are there any plans to incorporate Message Format 2.0 [0] into > java.base? There also seems to be ongoing work on defining a 'container' > for the translation files (an alternative to .properties) [1]. > > [0] https://www.unicode.org/reports/tr35/tr35-73/tr35-messageFormat.html > > [1] https://github.com/w3c/i18n-discuss/blob/gh-pages/explainers/message-resources.md From cushon at openjdk.org Thu Jan 15 17:20:09 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Thu, 15 Jan 2026 17:20:09 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v7] In-Reply-To: References: Message-ID: > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 
12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ? 3839233.582 ops/s > StringLoopJmhBenchmark.ge... 
Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Deduplicate with computeSizeUTF8_UTF16 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/818162de..855b1298 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=05-06 Stats: 60 lines in 1 file changed: 15 ins; 41 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From rriggs at openjdk.org Thu Jan 15 17:34:42 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 15 Jan 2026 17:34:42 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v7] In-Reply-To: References: Message-ID: On Thu, 15 Jan 2026 17:20:09 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 
10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Deduplicate with computeSizeUTF8_UTF16 src/java.base/share/classes/java/lang/String.java line 1498: > 1496: if (length > (long)Integer.MAX_VALUE) { > 1497: throw new IllegalStateException("Required length exceeds implementation limit"); > 1498: } This is more of a "should never reach here" case; the OOME thrown by encodedLengthUTF8_UTF16 should occur. IllegalStateException usually refers to a programming error. The other occurrence like this throws OOME. src/java.base/share/classes/java/lang/String.java line 2112: > 2110: * > 2111: * @param cs The {@link Charset} used to the compute the length > 2112: * @throws NullPointerException If {@code cs} is {@code null} @throws clauses for NPE are usually omitted; the class javadoc specifies the behavior for the whole class. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695311406 PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695316740 From vyazici at openjdk.org Thu Jan 15 17:34:52 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Thu, 15 Jan 2026 17:34:52 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5] In-Reply-To: References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> Message-ID: On Thu, 15 Jan 2026 17:02:01 GMT, Liam Miller-Cushon wrote: >> src/java.base/share/classes/java/lang/String.java line 1585: >> >>> 1583: >>> 1584: // This follows the implementation of encodeUTF8_UTF16 >>> 1585: private static int encodedLengthUTF8_UTF16(byte[] val) { >> >> Doesn't this duplicate the `computeSizeUTF8_UTF16`? >> >> AFAICS, `computeSizeUTF8_UTF16` is missing the ASCII fast loop, but we can enhance it. >> >> FWIW, if we decide reuse `computeSizeUTF8_UTF16`, it might be better to rename it to `encodedLengthUTF8_UTF16`, which will be in line with the introduced `encodedLength*` method family. > > Thanks for the catch, good point I will look at switching to `computeSizeUTF8_UTF16`. > > `computeSizeUTF8_UTF16` returns `long`, this raises a question of what to do in that case. The return type of `getBytesLength` could potentially be `long` and allow computing the encoded length of strings that wouldn't fit into an array if they were encoded. Or it could throw an exception in that case, similar to `getBytes`, and have an `int` return type `computeSizeUTF8_UTF16` is only used in `encodeUTF8_UTF16`: long allocLen = (sl * 3 < 0) ? computeSizeUTF8_UTF16(val, exClass) : sl * 3; if (allocLen > (long)Integer.MAX_VALUE) { throw new OutOfMemoryError("Required length exceeds implementation limit"); } I guess we can move `if (allocLen > (long)Integer.MAX_VALUE)` check to `computeSizeUTF8_UTF16` and make its return type `int`. 
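The suggestion above, moving the limit check next to the size computation so that callers only ever see an `int`, might look roughly like the following. This is a sketch under assumptions: `LengthLimitSketch`, `toArrayLength` and `worstCaseUtf8Length` are hypothetical names, and only the `OutOfMemoryError` message is taken from the snippet quoted in the thread.

```java
// Hypothetical sketch; not the refactoring that was committed.
final class LengthLimitSketch {

    // Narrows a computed byte length to int, throwing the same kind of error
    // as the quoted check when the length cannot be an array size.
    // Assumes the argument is non-negative.
    static int toArrayLength(long length) {
        if (length > (long) Integer.MAX_VALUE) {
            throw new OutOfMemoryError("Required length exceeds implementation limit");
        }
        return (int) length;
    }

    // Upper bound on the UTF-8 size of charCount UTF-16 chars, computed in
    // long arithmetic so the multiplication cannot overflow.
    static long worstCaseUtf8Length(int charCount) {
        return 3L * charCount;
    }

    public static void main(String[] args) {
        System.out.println(toArrayLength(worstCaseUtf8Length(1000)));
    }
}
```

Doing the multiplication in `long` avoids the need to detect overflow after the fact with a sign test such as `sl * 3 < 0`, at the cost of a widening conversion.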
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695315669

From cushon at openjdk.org Thu Jan 15 18:29:34 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Thu, 15 Jan 2026 18:29:34 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v8]
In-Reply-To:
References:
Message-ID:

> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>
> ---
>
> Benchmark                              (encoding)  (stringLength)   Mode  Cnt          Score          Error  Units
> StringLoopJmhBenchmark.getBytes        ASCII                   10  thrpt    5  406782650.595 ± 16960032.852  ops/s
> StringLoopJmhBenchmark.getBytes        ASCII                  100  thrpt    5  172936926.189 ±  4532029.201  ops/s
> StringLoopJmhBenchmark.getBytes        ASCII                 1000  thrpt    5   38830681.232 ±  2413274.766  ops/s
> StringLoopJmhBenchmark.getBytes        ASCII               100000  thrpt    5     458881.155 ±    12818.317  ops/s
> StringLoopJmhBenchmark.getBytes        LATIN1                  10  thrpt    5   37193762.990 ±  3962947.391  ops/s
> StringLoopJmhBenchmark.getBytes        LATIN1                 100  thrpt    5   55400876.236 ±  1267331.434  ops/s
> StringLoopJmhBenchmark.getBytes        LATIN1                1000  thrpt    5   11104514.001 ±    41718.545  ops/s
> StringLoopJmhBenchmark.getBytes        LATIN1              100000  thrpt    5     182535.414 ±    10296.120  ops/s
> StringLoopJmhBenchmark.getBytes        UTF16                   10  thrpt    5  113474681.457 ±  8326589.199  ops/s
> StringLoopJmhBenchmark.getBytes        UTF16                  100  thrpt    5   37854103.127 ±  4808526.773  ops/s
> StringLoopJmhBenchmark.getBytes        UTF16                 1000  thrpt    5    4139833.009 ±    70636.784  ops/s
> StringLoopJmhBenchmark.getBytes        UTF16               100000  thrpt    5      57644.637 ±     1887.112  ops/s
> StringLoopJmhBenchmark.getBytesLength  ASCII                   10  thrpt    5  946701647.247 ± 76938927.141  ops/s
> StringLoopJmhBenchmark.getBytesLength  ASCII                  100  thrpt    5  396615374.479 ± 15167234.884  ops/s
> StringLoopJmhBenchmark.getBytesLength  ASCII                 1000  thrpt    5  100464784.979 ±   794027.897  ops/s
> StringLoopJmhBenchmark.getBytesLength  ASCII               100000  thrpt    5    1215487.689 ±     1916.468  ops/s
> StringLoopJmhBenchmark.getBytesLength  LATIN1                  10  thrpt    5  221265102.323 ± 17013983.056  ops/s
> StringLoopJmhBenchmark.getBytesLength  LATIN1                 100  thrpt    5  137617873.887 ±  5842185.781  ops/s
> StringLoopJmhBenchmark.getBytesLength  LATIN1                1000  thrpt    5   92540259.130 ±  3839233.582  ops/s
> StringLoopJmhBenchmark.ge...

Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:

  Review feedback

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28454/files
  - new: https://git.openjdk.org/jdk/pull/28454/files/855b1298..d725c8b1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=06-07

  Stats: 42 lines in 1 file changed: 25 ins; 10 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/28454.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454

PR: https://git.openjdk.org/jdk/pull/28454

From cushon at openjdk.org Thu Jan 15 18:29:38 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Thu, 15 Jan 2026 18:29:38 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 15 Jan 2026 17:27:11 GMT, Roger Riggs wrote:

>> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Deduplicate with computeSizeUTF8_UTF16
>
> src/java.base/share/classes/java/lang/String.java line 1498:
>
>> 1496:         if (length > (long)Integer.MAX_VALUE) {
>> 1497:             throw new IllegalStateException("Required length exceeds implementation limit");
>> 1498:         }
>
> This is more like a "should never reach here" case; the OOME thrown by encodedLengthUTF8_UTF16 should occur.
> IllegalStateException usually refers to a programming error.
> The other occurrence like this throws OOME.

Thanks, what do you think about refactoring the OOME into `encodedLengthUTF8_UTF16` and having it return `int`?
> src/java.base/share/classes/java/lang/String.java line 2112:
>
>> 2110:      *
>> 2111:      * @param cs The {@link Charset} used to the compute the length
>> 2112:      * @throws NullPointerException If {@code cs} is {@code null}
>
> @throws clauses for NPE are usually omitted; the class javadoc specifies the behavior for the whole class.

Removed, thanks

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695393117
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695393824

From cushon at openjdk.org Thu Jan 15 18:29:42 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Thu, 15 Jan 2026 18:29:42 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]
In-Reply-To:
References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com>
Message-ID:

On Thu, 15 Jan 2026 16:17:22 GMT, Volkan Yazici wrote:

>> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Update tests
>
> src/java.base/share/classes/java/lang/String.java line 2143:
>
>> 2141:      */
>> 2142:     public int getBytesLength(Charset cs) {
>> 2143:         if (cs == UTF_8.INSTANCE) {
>
> It'd be nice to catch null values as early as possible. I suggest adding `Objects.requireNonNull(cs)` along with `@throws NullPointerException If {@code cs} is null` in docs.

I added the `requireNonNull`, omitting the `@throws` as suggested in https://github.com/openjdk/jdk/pull/28454/changes#r2695394410

> src/java.base/share/classes/java/lang/String.java line 2148:
>
>> 2146:             if (isLatin1()) {
>> 2147:                 return value.length;
>> 2148:             }
>
> Any particular reason you avoided introducing an `encodedLength8859_1` here? (There is an `encode8859_1` method.)

I have tentatively added `encodedLength8859_1`.

`encode8859_1` is implemented in terms of `implEncodeISOArray`, so it is less similar than the other examples. In general I figured there was a tradeoff between the performance benefit and the additional code to have fast paths for each charset, and UTF-8 may be more frequently used.

> src/java.base/share/classes/java/lang/String.java line 2151:
>
>> 2149:         } else if (cs == US_ASCII.INSTANCE) {
>> 2150:             return encodedLengthASCII(coder, value);
>> 2151:         } else if (cs instanceof sun.nio.cs.UTF_16LE || cs instanceof sun.nio.cs.UTF_16BE) {
>
> I see that the `sun.nio.cs.UTF_16{LE,BE}` specialization is suggested by @ExE-Boss [here]. Though I'm not really sure if this is really needed. I cannot spot any other usage of these constants in `java.base`, except `jdk.internal.foreign.StringSupport`, which is irrelevant.
>
> [here]: https://github.com/openjdk/jdk/pull/28454/files#r2552768341

I don't have a strong opinion about these charsets. It's nice that the encoded length for them can be calculated in constant time, but on the other hand, if they are less frequently used and there isn't precedent for special-casing them in `java.base`, then this part could be dropped.
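
To illustrate the constant-time remark, here is a minimal sketch, assuming UTF-16LE or UTF-16BE without a byte order mark and a well-formed string: every `char` is one 16-bit code unit and encodes to exactly two bytes. The class and method names are hypothetical, and the result is a `long` because twice the string length can exceed `Integer.MAX_VALUE`.

```java
final class Utf16Length {
    // Hypothetical helper (not JDK code): byte length of s in UTF-16LE or UTF-16BE.
    // No scan of the contents is needed, only the length.
    static long utf16ByteLength(String s) {
        return 2L * s.length();
    }
}
```

Note that `StandardCharsets.UTF_16` (as opposed to the LE/BE variants) writes a byte order mark when encoding, so this shortcut would not apply to it unchanged.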
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695394410
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695468134
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695471999

From rriggs at openjdk.org Thu Jan 15 19:27:10 2026
From: rriggs at openjdk.org (Roger Riggs)
Date: Thu, 15 Jan 2026 19:27:10 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]
In-Reply-To:
References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com>
Message-ID:

On Thu, 15 Jan 2026 18:18:31 GMT, Liam Miller-Cushon wrote:

>> src/java.base/share/classes/java/lang/String.java line 2151:
>>
>>> 2149:         } else if (cs == US_ASCII.INSTANCE) {
>>> 2150:             return encodedLengthASCII(coder, value);
>>> 2151:         } else if (cs instanceof sun.nio.cs.UTF_16LE || cs instanceof sun.nio.cs.UTF_16BE) {
>>
>> I see that the `sun.nio.cs.UTF_16{LE,BE}` specialization is suggested by @ExE-Boss [here]. Though I'm not really sure if this is really needed. I cannot spot any other usage of these constants in `java.base`, except `jdk.internal.foreign.StringSupport`, which is irrelevant.
>>
>> [here]: https://github.com/openjdk/jdk/pull/28454/files#r2552768341
>
> I don't have a strong opinion about these charsets. It's nice that the encoded length for them can be calculated in constant time, but on the other hand, if they are less frequently used and there isn't precedent for special-casing them in `java.base`, then this part could be dropped.

While it is convenient that those UTF16 charsets have an easy-to-compute size, I doubt those two are in sufficient use to justify a commitment to support them in the fast path.
If you are going to support charsets beyond the most common utf8, ascii, and ISO-8859-1, then
computing the encoded length should be delegated to the Charset itself and have separate code in different packages.
Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be useful for single-byte formats, but if `maxBytesPerChar` is equal to `averageBytesPerChar` that might be a useful shortcut.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695660230

From naoto at openjdk.org Thu Jan 15 19:42:19 2026
From: naoto at openjdk.org (Naoto Sato)
Date: Thu, 15 Jan 2026 19:42:19 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v8]
In-Reply-To:
References:
Message-ID:

On Thu, 15 Jan 2026 18:29:34 GMT, Liam Miller-Cushon wrote:

>> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>
> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review feedback

src/java.base/share/classes/java/lang/String.java line 2127:

> 2125:      * The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length}.
> 2126:      *
> 2127:      * @implNote This method may allocate memory to compute the length for some charsets.

Would it help if we describe the benefit of this method? I.e., for some charsets it won't allocate memory and is thus faster than getBytes(Charset)?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695707693

From cushon at openjdk.org Thu Jan 15 20:05:04 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Thu, 15 Jan 2026 20:05:04 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v8]
In-Reply-To:
References:
Message-ID:

On Thu, 15 Jan 2026 19:39:28 GMT, Naoto Sato wrote:

>> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Review feedback
>
> src/java.base/share/classes/java/lang/String.java line 2127:
>
>> 2125:      * The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length}.
>> 2126:      *
>> 2127:      * @implNote This method may allocate memory to compute the length for some charsets.
>
> Would it help if we describe the benefit of this method? I.e., for some charsets it won't allocate memory and is thus faster than getBytes(Charset)?

I think that makes sense, but I'm not sure what the best way to characterize it is. Probably we don't want to promise specific optimizations. What do you think about:

     * The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length},
     * and will have equivalent or better performance.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695775575

From cushon at openjdk.org Thu Jan 15 20:05:05 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Thu, 15 Jan 2026 20:05:05 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]
In-Reply-To:
References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com>
Message-ID: <1Lx3cEk1vZnCgoRMNbocsykbFMitu3LtLgAX3k8iAo8=.b8a22bac-246c-479c-8acf-175427e32e5b@github.com>

On Thu, 15 Jan 2026 19:23:43 GMT, Roger Riggs wrote:

> While it is convenient that those UTF16 charsets have an easy-to-compute size, I doubt those two are in sufficient use to justify a commitment to support them in the fast path. If you are going to support charsets beyond the most common utf8, ascii, and ISO-8859-1, then computing the encoded length should be delegated to the Charset itself and have separate code in different packages.

Thanks, that makes sense to me. My opinion is that a large amount of the value here is in optimizing UTF-8, and that there's an argument to optimize the other standard charsets that `String` has other fast paths for, but sharply diminishing returns beyond that. I would be inclined to stop at the standard charsets, but also happy to make changes if there's a preference for having more or fewer fast paths.

> Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be useful for single-byte formats, but if `maxBytesPerChar` is equal to `averageBytesPerChar` that might be a useful shortcut.
I had a quick look at that, and saw errors for `IBM-Thai`:

    CharsetEncoder encoder = cs.newEncoder();
    if (encoder.maxBytesPerChar() == 1f && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
        return value.length * (int) encoder.maxBytesPerChar();
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695769015

From cushon at openjdk.org Thu Jan 15 20:07:58 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Thu, 15 Jan 2026 20:07:58 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v9]
In-Reply-To:
References:
Message-ID:

> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.

Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:

  Mention performance in the docs, and drop UTF_16BE/LE fast paths

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28454/files
  - new: https://git.openjdk.org/jdk/pull/28454/files/d725c8b1..81d132f8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=07-08

  Stats: 11 lines in 1 file changed: 1 ins; 9 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28454.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454

PR: https://git.openjdk.org/jdk/pull/28454

From rriggs at openjdk.org Thu Jan 15 20:27:57 2026
From: rriggs at openjdk.org (Roger Riggs)
Date: Thu, 15 Jan 2026 20:27:57 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 15 Jan 2026 17:51:57 GMT, Liam Miller-Cushon wrote:

>> src/java.base/share/classes/java/lang/String.java line 1498:
>>
>>> 1496:         if (length > (long)Integer.MAX_VALUE) {
>>> 1497:             throw new IllegalStateException("Required length exceeds implementation limit");
>>> 1498:         }
>>
>> This is more like a "should never reach here" case; the OOME thrown by encodedLengthUTF8_UTF16 should occur.
>> IllegalStateException usually refers to a programming error.
>> The other occurrence like this throws OOME.
>
> Thanks, what do you think about refactoring the OOME into `encodedLengthUTF8_UTF16` and having it return `int`?

That's fine; the `long` return was to simplify handling of too-large returns.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695831889

From naoto at openjdk.org Thu Jan 15 20:27:59 2026
From: naoto at openjdk.org (Naoto Sato)
Date: Thu, 15 Jan 2026 20:27:59 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v8]
In-Reply-To:
References:
Message-ID:

On Thu, 15 Jan 2026 20:02:59 GMT, Liam Miller-Cushon wrote:

>> src/java.base/share/classes/java/lang/String.java line 2127:
>>
>>> 2125:      * The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length}.
>>> 2126:      *
>>> 2127:      * @implNote This method may allocate memory to compute the length for some charsets.
>>
>> Would it help if we describe the benefit of this method? I.e., for some charsets it won't allocate memory and is thus faster than getBytes(Charset)?
>
> I think that makes sense, but I'm not sure what the best way to characterize it is. Probably we don't want to promise specific optimizations. What do you think about:
>
>      * The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length},
>      * and will have equivalent or better performance.

I think this won't be a normative spec, so I'd move the performance description to `@apiNote`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695837680

From rriggs at openjdk.org Thu Jan 15 20:28:00 2026
From: rriggs at openjdk.org (Roger Riggs)
Date: Thu, 15 Jan 2026 20:28:00 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]
In-Reply-To: <1Lx3cEk1vZnCgoRMNbocsykbFMitu3LtLgAX3k8iAo8=.b8a22bac-246c-479c-8acf-175427e32e5b@github.com>
References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> <1Lx3cEk1vZnCgoRMNbocsykbFMitu3LtLgAX3k8iAo8=.b8a22bac-246c-479c-8acf-175427e32e5b@github.com>
Message-ID: <-_Cf-s2WvPvCPen3n-Nly4PPX8OZHOQmjKn3UJH95KU=.3afd762b-16a9-4327-9e0d-76d026d55779@github.com>

On Thu, 15 Jan 2026 20:00:43 GMT, Liam Miller-Cushon wrote:

>> While it is convenient that those UTF16 charsets have an easy-to-compute size, I doubt those two are in sufficient use to justify a commitment to support them in the fast path.
>> If you are going to support charsets beyond the most common utf8, ascii, and ISO-8859-1, then
>> computing the encoded length should be delegated to the Charset itself and have separate code in different packages.
>> Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be useful for single-byte formats, but if `maxBytesPerChar` is equal to `averageBytesPerChar` that might be a useful shortcut.
>
>> While it is convenient that those UTF16 charsets have an easy-to-compute size, I doubt those two are in sufficient use to justify a commitment to support them in the fast path. If you are going to support charsets beyond the most common utf8, ascii, and ISO-8859-1, then computing the encoded length should be delegated to the Charset itself and have separate code in different packages.
>
> Thanks, that makes sense to me. My opinion is that a large amount of the value here is in optimizing UTF-8, and that there's an argument to optimize the other standard charsets that `String` has other fast paths for, but sharply diminishing returns beyond that. I would be inclined to stop at the standard charsets, but also happy to make changes if there's a preference for having more or fewer fast paths.
>
>> Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be useful for single-byte formats, but if `maxBytesPerChar` is equal to `averageBytesPerChar` that might be a useful shortcut.
>
> I had a quick look at that, and saw errors for `IBM-Thai`:
>
>     CharsetEncoder encoder = cs.newEncoder();
>     if (encoder.maxBytesPerChar() == 1f && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
>         return value.length * (int) encoder.maxBytesPerChar();
>     }

It's good to start with only the most common Charsets, and see if the API is adopted and anyone comments on a performance problem with other Charsets.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695835410

From cushon at openjdk.org Thu Jan 15 22:17:14 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Thu, 15 Jan 2026 22:17:14 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v10]
In-Reply-To:
References:
Message-ID:

> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.

Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:

  implNote

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28454/files
  - new: https://git.openjdk.org/jdk/pull/28454/files/81d132f8..a6b37002

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=08-09

  Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28454.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454

PR: https://git.openjdk.org/jdk/pull/28454

From cushon at openjdk.org Thu Jan 15 22:17:16 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Thu, 15 Jan 2026 22:17:16 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v8]
In-Reply-To:
References:
Message-ID:

On Thu, 15 Jan 2026 20:23:55 GMT, Naoto Sato wrote:

>> I think that makes sense, but I'm not sure what the best way to characterize it is. Probably we don't want to promise specific optimizations. What do you think about:
>>
>>      * The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length},
>>      * and will have equivalent or better performance.
>
> I think this won't be a normative spec, so I'd move the performance description to `@apiNote`

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2696149599

From naoto at openjdk.org Fri Jan 16 00:23:30 2026
From: naoto at openjdk.org (Naoto Sato)
Date: Fri, 16 Jan 2026 00:23:30 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v8]
In-Reply-To:
References:
Message-ID: <7w_dZZlP0wD8At5LkqzovVSKb8UAFy6fN3kLDH-7lRk=.4c1a29bf-a10e-4a12-8f18-ab92369ed7c1@github.com>

On Thu, 15 Jan 2026 22:13:43 GMT, Liam Miller-Cushon wrote:

>> I think this won't be a normative spec, so I'd move the performance description to `@apiNote`
>
> Done

I think `@apiNote` is more appropriate here, as the message applies to the API users, not API implementors.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2696423836

From cushon at openjdk.org Fri Jan 16 09:46:54 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Fri, 16 Jan 2026 09:46:54 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11]
In-Reply-To:
References:
Message-ID: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com>

> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.

Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:

  Switch to @apiNote

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28454/files
  - new: https://git.openjdk.org/jdk/pull/28454/files/a6b37002..2614c356

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=09-10

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28454.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454

PR: https://git.openjdk.org/jdk/pull/28454

From cushon at openjdk.org Fri Jan 16 09:46:55 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Fri, 16 Jan 2026 09:46:55 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v8]
In-Reply-To: <7w_dZZlP0wD8At5LkqzovVSKb8UAFy6fN3kLDH-7lRk=.4c1a29bf-a10e-4a12-8f18-ab92369ed7c1@github.com>
References: <7w_dZZlP0wD8At5LkqzovVSKb8UAFy6fN3kLDH-7lRk=.4c1a29bf-a10e-4a12-8f18-ab92369ed7c1@github.com>
Message-ID:

On Fri, 16 Jan 2026 00:19:51 GMT, Naoto Sato wrote:

>> Done
>
> I think `@apiNote` is more appropriate here, as the message applies to the API users, not API implementors.

Thanks, I switched to `@apiNote`. Is https://openjdk.org/jeps/8068562 a good reference for the distinction between these tags?
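
As a rough illustration of the tag distinction being discussed, here is a sketch of how the two tags might be used on a method of this kind. The class, the method, and the wording of the notes are all hypothetical and are not the text in the PR; the only assumption taken from the thread is that `@apiNote` addresses callers while `@implNote` describes this particular implementation.

```java
import java.nio.charset.Charset;

final class EncodedLengths {
    /**
     * Returns the number of bytes needed to encode {@code s} in {@code cs}.
     * The result is the same value as {@code s.getBytes(cs).length}.
     *
     * @apiNote Informative note for callers: prefer this method when only the
     * length is needed and the encoded bytes themselves are not.
     *
     * @implNote Informative note about this implementation: this sketch simply
     * encodes the string and measures the result, so it allocates a byte array.
     */
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }
}
```

Neither note is part of the normative specification; only the main description is, which is why a performance remark would not go there.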
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2697737150

From cushon at openjdk.org Fri Jan 16 11:07:56 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Fri, 16 Jan 2026 11:07:56 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]
In-Reply-To: <94U314HJBN96ktT1MqD-GOo43rAbnQZAEFcJzdHGa6E=.e9b215c1-b1ec-4522-b257-d098d4663783@github.com>
References: <5zNvMwXRSYwIPJXv7E9WI7hFI4WKrAmYuOIkGr8LP6E=.191d9b99-851e-4b2f-b200-6a7ade231cb3@github.com> <94U314HJBN96ktT1MqD-GOo43rAbnQZAEFcJzdHGa6E=.e9b215c1-b1ec-4522-b257-d098d4663783@github.com>
Message-ID:

On Thu, 15 Jan 2026 16:30:26 GMT, Volkan Yazici wrote:

>> Done, thanks
>
> The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length}.

Thanks, I switched to just `@link` instead of `@linkplain`+`@code`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2698076823

From rriggs at openjdk.org Fri Jan 16 15:39:15 2026
From: rriggs at openjdk.org (Roger Riggs)
Date: Fri, 16 Jan 2026 15:39:15 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11]
In-Reply-To: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com>
References: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com>
Message-ID:

On Fri, 16 Jan 2026 09:46:54 GMT, Liam Miller-Cushon wrote:

>> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>
> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>
>   Switch to @apiNote

Looks good.
A second reviewer is a good idea for new APIs.

-------------

Marked as reviewed by rriggs (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28454#pullrequestreview-3671471698 PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3760625524 From naoto at openjdk.org Fri Jan 16 18:34:04 2026 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 16 Jan 2026 18:34:04 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11] In-Reply-To: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> References: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> Message-ID: On Fri, 16 Jan 2026 09:46:54 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 
1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Switch to @apiNote Marked as reviewed by naoto (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28454#pullrequestreview-3672192064 From naoto at openjdk.org Fri Jan 16 18:34:05 2026 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 16 Jan 2026 18:34:05 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v8] In-Reply-To: References: <7w_dZZlP0wD8At5LkqzovVSKb8UAFy6fN3kLDH-7lRk=.4c1a29bf-a10e-4a12-8f18-ab92369ed7c1@github.com> Message-ID: <3gBidrzlHSA7ev-2GRarh-WG5rjhK0uDoZfqcAVoO8c=.5ea2f86b-715d-4bea-8ab5-3a04db2a7459@github.com> On Fri, 16 Jan 2026 09:43:30 GMT, Liam Miller-Cushon wrote: > Is https://openjdk.org/jeps/8068562 a good reference for distinction between these tags? Yes. It does state that `@implNote` can state performance characteristics, but in this case I think `@apiNote` is more appropriate so that users will know the rationale behind the introduction of this method. 
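To make the distinction between the two tags concrete, here is a minimal sketch. `NoteTagsDemo` and `utf8Length` are hypothetical names, and the wording of both notes is invented for illustration; it is not the text from the PR. Outside the JDK build, `javadoc` may need these tags registered with `-tag` options.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, used only to illustrate the two Javadoc tags. */
final class NoteTagsDemo {
    /**
     * Returns the number of bytes needed to encode {@code s} in UTF-8.
     *
     * @apiNote Addressed to callers: explains why the method exists, e.g.
     *          sizing a buffer before writing several strings into it.
     * @implNote Addressed to maintainers: this sketch encodes the whole string
     *           and measures the result, so it allocates a temporary array.
     */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'h', 'l', 'l', 'o' take one byte each; U+00E9 takes two in UTF-8.
        System.out.println(utf8Length("h\u00e9llo")); // prints 6
    }
}
```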
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2699558630 From liach at openjdk.org Sun Jan 18 08:16:30 2026 From: liach at openjdk.org (Chen Liang) Date: Sun, 18 Jan 2026 08:16:30 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11] In-Reply-To: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> References: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> Message-ID: On Fri, 16 Jan 2026 09:46:54 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
> > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Switch to @apiNote Question: Have you considered the handling of replacement characters? They currently are counted into the returned length, but I wonder whether users actually want to print those characters as-is. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3765042910 From alanb at openjdk.org Sun Jan 18 09:09:52 2026 From: alanb at openjdk.org (Alan Bateman) Date: Sun, 18 Jan 2026 09:09:52 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11] In-Reply-To: References: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> Message-ID: On Sun, 18 Jan 2026 08:13:40 GMT, Chen Liang wrote: > Question: Have you considered the handling of replacement characters? They currently are counted into the returned length, but I wonder whether users actually want to print those characters as-is. That is a good point. As `getBytes(Charset)` is specified to replace malformed-input and unmappable-character sequences, and the proposed method is specified to return the equivalent of `getBytes(Charset).length` then the returned length has to include them.
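A small illustration of the point above, using only the existing `getBytes(Charset)` method. `ReplacementLengthDemo` and `lengthViaGetBytes` are hypothetical names; the sketch shows how replacement bytes show up in the length and says nothing about how the proposed method is implemented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReplacementLengthDemo {
    /** Length of {@code s} once encoded, replacement bytes included. */
    static int lengthViaGetBytes(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String unmappable = "caf\u00e9"; // U+00E9 has no US-ASCII mapping
        String unpaired = "a\uD800b";    // lone high surrogate: malformed input

        // The unmappable character becomes a single '?' byte.
        System.out.println(lengthViaGetBytes(unmappable, StandardCharsets.US_ASCII)); // prints 4
        // UTF-8 can encode U+00E9, using two bytes.
        System.out.println(lengthViaGetBytes(unmappable, StandardCharsets.UTF_8));    // prints 5
        // The lone surrogate is replaced by a single '?' byte.
        System.out.println(lengthViaGetBytes(unpaired, StandardCharsets.UTF_8));      // prints 3
    }
}
```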
------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3765082490 From cushon at openjdk.org Mon Jan 19 08:16:17 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Mon, 19 Jan 2026 08:16:17 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11] In-Reply-To: References: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> Message-ID: On Sun, 18 Jan 2026 09:06:31 GMT, Alan Bateman wrote: > > Question: Have you considered the handling of replacement characters? They currently are counted into the returned length, but I wonder whether users actually want to print those characters as-is. > > That is a good point. As `getBytes(Charset)` is specified to replace malformed-input and unmappable-character sequences, and the proposed method is specified to return the equivalent of `getBytes(Charset).length` then the returned length has to include them. The motivating use cases I've seen for this method are to compute the length of encoded data that contains strings, where the strings would be encoded with `getBytes`. The CSR gives the example of encoding multiple large strings into a single array. Specifying the output in terms of `getBytes(cs).length` is necessary for that use-case, and requires the handling of replacement characters and unpaired surrogates to be the same between the two methods. Do you see alternatives that should be considered? 
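The use case described above could look roughly like the following sketch. `PackStringsDemo` and `pack` are hypothetical names, and the sizing pass uses `getBytes(cs).length` as a stand-in for the proposed length-only method, so this version still allocates a temporary array per string in both passes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class PackStringsDemo {
    /** Encodes all parts back to back into one exactly-sized array. */
    static byte[] pack(List<String> parts, Charset cs) {
        // Pass 1: compute the total size. The count must follow the same
        // replacement rules as getBytes(cs), or the buffer would be mis-sized.
        int total = 0;
        for (String p : parts) {
            total = Math.addExact(total, p.getBytes(cs).length);
        }
        // Pass 2: encode each part into the shared buffer.
        ByteBuffer out = ByteBuffer.allocate(total);
        for (String p : parts) {
            out.put(p.getBytes(cs));
        }
        return out.array();
    }

    public static void main(String[] args) {
        byte[] packed = pack(List.of("ab", "\u00e9", "a\uD800"), StandardCharsets.UTF_8);
        System.out.println(packed.length); // prints 6
    }
}
```

If the two passes disagreed about replacement handling, `put` would throw a `BufferOverflowException` or leave unused trailing bytes, which is why the length is specified in terms of `getBytes`.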
------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3767013988 From cushon at openjdk.org Mon Jan 19 11:34:10 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Mon, 19 Jan 2026 11:34:10 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11] In-Reply-To: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> References: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> Message-ID: On Fri, 16 Jan 2026 09:46:54 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 
76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Switch to @apiNote src/java.base/share/classes/java/lang/String.java line 1169: > 1167: while (sp < sl) { > 1168: char c = StringUTF16.getChar(val, sp++); > 1169: if (c > 0x80) { The handling of `c == 0x80` isn't consistent with `encodedLengthASCII`, but also it doesn't matter because this is redundant with the `isHighSurrogate` test below. Thinking about this a little more, `encodedLength8859_1` and `encodedLengthASCII` are identical, and could be merged together. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2704380997 From cushon at openjdk.org Mon Jan 19 11:39:57 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Mon, 19 Jan 2026 11:39:57 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v12] In-Reply-To: References: Message-ID: <5KNXQmMrEmm3gY7F6idi3SYbA-QikCyVzwkuz3w6Clk=.a3c7039a-f554-436d-ae2d-25ce7cf09060@github.com> > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 
16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ? 3839233.582 ops/s > StringLoopJmhBenchmark.ge... 
Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Merge and optimize latin1 and ascii paths ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/2614c356..fd989e87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=10-11 Stats: 41 lines in 1 file changed: 8 ins; 26 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From cushon at openjdk.org Mon Jan 19 11:39:58 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Mon, 19 Jan 2026 11:39:58 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11] In-Reply-To: References: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> Message-ID: On Mon, 19 Jan 2026 11:30:51 GMT, Liam Miller-Cushon wrote: >> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch to @apiNote > > src/java.base/share/classes/java/lang/String.java line 1169: > >> 1167: while (sp < sl) { >> 1168: char c = StringUTF16.getChar(val, sp++); >> 1169: if (c > 0x80) { > > The handling of `c == 0x80` isn't consistent with `encodedLengthASCII`, but also it doesn't matter because this is redundant with the `isHighSurrogate` test below. > > Thinking about this a little more, `encodedLength8859_1` and `encodedLengthASCII` are identical, and could be merged together. 
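As an illustration of why the two helpers can coincide (a sketch only, not the code in the PR; `SingleByteLengthDemo` and `singleByteEncodedLength` are made-up names): for a single-byte charset every `char` produces exactly one output byte, either its encoding or the `?` replacement, except that a valid surrogate pair is replaced as a unit. Whether a character is mappable therefore does not affect the length.

```java
import java.nio.charset.StandardCharsets;

final class SingleByteLengthDemo {
    /** Encoded length shared by US-ASCII and ISO-8859-1 under replacement. */
    static int singleByteEncodedLength(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++; // a valid pair yields one replacement byte, not two
            }
            len++; // one byte per char (or per pair), mappable or replaced
        }
        return len;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b\u0100"; // surrogate pair plus an unmappable char
        System.out.println(singleByteEncodedLength(s));                 // prints 4
        System.out.println(s.getBytes(StandardCharsets.US_ASCII).length);   // prints 4
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // prints 4
    }
}
```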
I have tentatively merged `encodedLength8859_1` and `encodedLengthASCII` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2704396396 From vyazici at openjdk.org Mon Jan 19 19:22:57 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Mon, 19 Jan 2026 19:22:57 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v12] In-Reply-To: <5KNXQmMrEmm3gY7F6idi3SYbA-QikCyVzwkuz3w6Clk=.a3c7039a-f554-436d-ae2d-25ce7cf09060@github.com> References: <5KNXQmMrEmm3gY7F6idi3SYbA-QikCyVzwkuz3w6Clk=.a3c7039a-f554-436d-ae2d-25ce7cf09060@github.com> Message-ID: On Mon, 19 Jan 2026 11:39:57 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 
1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Merge and optimize latin1 and ascii paths fd989e87da4 LGTM and I've confirmed that the following tests pass on all supported major platforms: test/jdk/java/lang/String/Encodings.java test/jdk/java/lang/String/Exceptions.java test/jdk/sun/nio/cs/TestStringCoding.java ------------- Marked as reviewed by vyazici (Committer). PR Review: https://git.openjdk.org/jdk/pull/28454#pullrequestreview-3679326198 From alanb at openjdk.org Mon Jan 19 19:36:22 2026 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 19 Jan 2026 19:36:22 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11] In-Reply-To: References: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com> Message-ID: On Mon, 19 Jan 2026 08:14:11 GMT, Liam Miller-Cushon wrote: > The motivating use cases I've seen for this method are to compute the length of encoded data that contains strings, where the strings would be encoded with `getBytes`. The CSR gives the example of encoding multiple large strings into a single array. 
Specifying the output in terms of `getBytes(cs).length` is necessary for that use-case, and requires the handling of replacement characters and unpaired surrogates to be the same between the two methods. Do you see alternatives that should be considered?

The comment wasn't questioning the addition. Instead we are saying that there is no mention of coding-error actions. More specifically, I think we should insert a sentence before "The result will be the same ..." to say that the returned length takes account of the replacement of malformed-input and unmappable-character sequences with the charset's default replacement byte array.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3769823410

From cushon at openjdk.org Mon Jan 19 22:06:45 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Mon, 19 Jan 2026 22:06:45 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v13]
In-Reply-To:
References:
Message-ID:

> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>
> ---
>
> Benchmark                              (encoding)  (stringLength)   Mode  Cnt          Score          Error  Units
> StringLoopJmhBenchmark.getBytes             ASCII              10  thrpt    5  406782650.595 ± 16960032.852  ops/s
> StringLoopJmhBenchmark.getBytes             ASCII             100  thrpt    5  172936926.189 ±  4532029.201  ops/s
> StringLoopJmhBenchmark.getBytes             ASCII            1000  thrpt    5   38830681.232 ±  2413274.766  ops/s
> StringLoopJmhBenchmark.getBytes             ASCII          100000  thrpt    5     458881.155 ±    12818.317  ops/s
> StringLoopJmhBenchmark.getBytes            LATIN1              10  thrpt    5   37193762.990 ±  3962947.391  ops/s
> StringLoopJmhBenchmark.getBytes            LATIN1             100  thrpt    5   55400876.236 ±  1267331.434  ops/s
> StringLoopJmhBenchmark.getBytes            LATIN1            1000  thrpt    5   11104514.001 ±    41718.545  ops/s
> StringLoopJmhBenchmark.getBytes            LATIN1          100000  thrpt    5     182535.414 ±    10296.120  ops/s
> StringLoopJmhBenchmark.getBytes             UTF16              10  thrpt    5  113474681.457 ±  8326589.199  ops/s
> StringLoopJmhBenchmark.getBytes             UTF16             100  thrpt    5   37854103.127 ±  4808526.773  ops/s
> StringLoopJmhBenchmark.getBytes             UTF16            1000  thrpt    5    4139833.009 ±    70636.784  ops/s
> StringLoopJmhBenchmark.getBytes             UTF16          100000  thrpt    5      57644.637 ±     1887.112  ops/s
> StringLoopJmhBenchmark.getBytesLength       ASCII              10  thrpt    5  946701647.247 ± 76938927.141  ops/s
> StringLoopJmhBenchmark.getBytesLength       ASCII             100  thrpt    5  396615374.479 ± 15167234.884  ops/s
> StringLoopJmhBenchmark.getBytesLength       ASCII            1000  thrpt    5  100464784.979 ±   794027.897  ops/s
> StringLoopJmhBenchmark.getBytesLength       ASCII          100000  thrpt    5    1215487.689 ±     1916.468  ops/s
> StringLoopJmhBenchmark.getBytesLength      LATIN1              10  thrpt    5  221265102.323 ± 17013983.056  ops/s
> StringLoopJmhBenchmark.getBytesLength      LATIN1             100  thrpt    5  137617873.887 ±  5842185.781  ops/s
> StringLoopJmhBenchmark.getBytesLength      LATIN1            1000  thrpt    5   92540259.130 ±  3839233.582  ops/s
> StringLoopJmhBenchmark.ge...

Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:

  Add a note about malformed-input and unmappable-character sequences

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28454/files
  - new: https://git.openjdk.org/jdk/pull/28454/files/fd989e87..08929964

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=11-12

  Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28454.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454

PR: https://git.openjdk.org/jdk/pull/28454

From cushon at openjdk.org Mon Jan 19 22:06:45 2026
From: cushon at openjdk.org (Liam Miller-Cushon)
Date: Mon, 19 Jan 2026 22:06:45 GMT
Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11]
In-Reply-To:
References: <8XbPxy1-8hhVxCIC6T4SXyH7v48UjNry2qJ_MQnqyfg=.f68b328d-dd68-43fc-9846-6bccf9dbf9ca@github.com>
Message-ID: <86vTkfnNdJ6oX58wLLIQMRoEIGjXR71SU4EJu31lY18=.8c7a2c4d-4591-44e8-8dcc-cb5b3a33e434@github.com> On Mon, 19 Jan 2026 19:33:26 GMT, Alan Bateman wrote: > The comment wasn't questing the addition. Instead we are saying that there is no mention of coding-error actions. More specifically, I think we should insert a sentence before "The result will be the same ..." to say that the returned length takes account of the replacement of malformed-input and unmappable-character sequences with the charset's default replacement byte array. Got it, thanks! I added a sentence, suggestions welcome if you have a better phrasing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3770259967 From alanb at openjdk.org Tue Jan 20 10:13:15 2026 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 20 Jan 2026 10:13:15 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v13] In-Reply-To: References: Message-ID: On Mon, 19 Jan 2026 22:06:45 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 
10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Add a note about malformed-input and unmappable-character sequences src/java.base/share/classes/java/lang/String.java line 2112: > 2110: * sequences with the charset's default replacement byte array. > 2111: * > 2112: *
<p>
The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length}. The wording is good. As "The result will be the same ..." follows the previous sentence then I think you can drop the paragraph tag so that it goes into the same paragraph. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2707658422 From cushon at openjdk.org Tue Jan 20 10:20:10 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Tue, 20 Jan 2026 10:20:10 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v14] In-Reply-To: References: Message-ID: > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ? 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ? 
76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ? 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ? 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ? 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ? 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ? 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ? 3839233.582 ops/s > StringLoopJmhBenchmark.ge... Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Remove paragraph break ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/08929964..77bc5b9e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=12-13 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From cushon at openjdk.org Tue Jan 20 10:20:14 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Tue, 20 Jan 2026 10:20:14 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v13] In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 10:09:39 GMT, Alan Bateman wrote: >> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: >> >> Add a note about malformed-input and unmappable-character sequences > > src/java.base/share/classes/java/lang/String.java line 2112: > >> 2110: * sequences with the charset's default replacement byte array. >> 2111: * >> 2112: *

The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length}. > > The wording is good. As "The result will be the same ..." follows the previous sentence then I think you can drop the paragraph tag so that it goes into the same paragraph. Thanks, done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2707682613 From alanb at openjdk.org Tue Jan 20 14:43:12 2026 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 20 Jan 2026 14:43:12 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v13] In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 10:15:12 GMT, Liam Miller-Cushon wrote: >> src/java.base/share/classes/java/lang/String.java line 2112: >> >>> 2110: * sequences with the charset's default replacement byte array. >>> 2111: * >>> 2112: *

The result will be the same value as {@link #getBytes(Charset) getBytes(cs).length}. >> >> The wording is good. As "The result will be the same ..." follows the previous sentence then I think you can drop the paragraph tag so that it goes into the same paragraph. > > Thanks, done Update API docs looks okay to me. I cleaned up the CSR a bit, and added myself as Reviewer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2708647875 From dfenacci at openjdk.org Tue Jan 20 18:54:04 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 20 Jan 2026 18:54:04 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics Message-ID: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> ## Issue This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. ## Causes The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. 
## Fix A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. # Testing * Tier 1-3+ * 2 JTReg tests added * `TestRangeCheck.java` as regression test for the reported issue * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion ------------- Commit messages: - JDK-8374852: revert unchanged tests - JDK-8374852: shorten line lenght in test - JDK-8374852: revert comment change - JDK-8374852: correct comment and make more concise - Update test/hotspot/jtreg/compiler/intrinsics/string/TestRangeCheck.java - JDK-8374852: fix generate_limit_guard opaque handling and remove unneeded positive flag - JDK-8374852: remove compileonly - JDK-8374852: remove VerifyIntrinsicChecks and refactor opaque flag - JDK-8374852: add forgotten opaque guard node handling in clone_iff - JDK-8374852: 120 max char for comment - ... 
and 8 more: https://git.openjdk.org/jdk/compare/6d1bfdf7...ff228576 Changes: https://git.openjdk.org/jdk/pull/29164/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374582 Stats: 435 lines in 28 files changed: 328 ins; 20 del; 87 mod Patch: https://git.openjdk.org/jdk/pull/29164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29164/head:pull/29164 PR: https://git.openjdk.org/jdk/pull/29164 From vyazici at openjdk.org Tue Jan 20 18:54:09 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Tue, 20 Jan 2026 18:54:09 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> On Mon, 12 Jan 2026 10:29:39 GMT, Damon Fenacci wrote: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. > > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. 
to compute the array address) which makes it top. > > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. > > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion Marked as reviewed by vyazici (Committer). Verified that 3c466d372b7 is a clean revert of 7e18de137c3 delivered in [JDK-8374210]. [JDK-8374210]: https://bugs.openjdk.org/browse/JDK-8374210 src/hotspot/share/opto/c2_globals.hpp line 680: > 678: develop(bool, VerifyIntrinsicChecks, false, \ > 679: "Verify in intrinsic that Java level checks work as expected") \ > 680: \ I suggest removing the `VerifyIntrinsicChecks` flag. Given `OpaqueGuard` already verifies the value when `#ifdef ASSERT`, does `VerifyIntrinsicChecks` serve any purpose anymore? src/hotspot/share/opto/library_call.hpp line 170: > 168: Node* length, bool char_count, > 169: bool halt_on_oob = false, > 170: bool is_opaque = false); Do we really need to introduce two new toggles: `halt_on_oob` and `is_opaque`? At all call-sites either one of the following is used: 1. `halt_on_oob=true, is_opaque=!VerifyIntrinsicChecks` 2. defaults (i.e., `halt_on_oob=is_opaque=false`) Can we instead only settle one, e.g., `halt_on_oob=VerifyIntrinsicChecks`? 
src/hotspot/share/opto/loopopts.cpp line 1: > 1: /* What is the reason that the new `OpaqueGuard` is not taken into account in `PhaseIdealLoop::clone_iff`? src/hotspot/share/opto/macro.cpp line 2565: > 2563: // Tests with OpaqueGuard nodes are implicitly known to be true or false. Replace the node with appropriate value. In debug builds, > 2564: // we leave the test in the graph to have an additional sanity check at runtime. If the test fails (i.e. a bug), > 2565: // we will execute a Halt node. *Nit:* Can we adhere to the max. 120 (or even better, 80!) characters per line limit of the file? src/hotspot/share/opto/macro.cpp line 2569: > 2567: _igvn.replace_node(n, n->in(1)); > 2568: #else > 2569: _igvn.replace_node(n, _igvn.intcon(0)); Curious: why do we invoke `intcon(0)` for `OpaqueGuard`, whereas it was `intcon(1)` for `OpaqueNotNull` slightly above? src/hotspot/share/opto/opaquenode.hpp line 160: > 158: // we keep the actual checks as additional verification code (i.e. removing OpaqueGuardNode and use the BoolNode > 159: // inputs instead). > 160: class OpaqueGuardNode : public Node { With the `OpaqueGuardNode::is_positive` flag gone, `OpaqueGuardNode` looks pretty much identical to `OpaqueNotNullNode`. Is there a code reuse opportunity we can take advantage of? 
test/hotspot/jtreg/compiler/intrinsics/TestVerifyIntrinsicChecks.java line 1:

> 1: /*

Since the `VerifyIntrinsicChecks` flag is gone, AFAICT, all following changes can be reverted:

    git rm test/hotspot/jtreg/compiler/intrinsics/TestVerifyIntrinsicChecks.java
    git checkout upstream/HEAD -- \
        test/hotspot/jtreg/compiler/intrinsics/string/TestCountPositives.java \
        test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java \
        test/hotspot/jtreg/compiler/intrinsics/string/TestHasNegatives.java \
        test/hotspot/jtreg/compiler/patches/java.base/java/lang/Helper.java

test/hotspot/jtreg/compiler/intrinsics/string/TestRangeCheck.java line 32:

> 30: * -XX:CompileCommand=inline,java.lang.StringCoding::*
> 31: * -XX:CompileCommand=exclude,jdk.internal.util.Preconditions::checkFromIndexSize
> 32: * -XX:CompileCommand=compileonly,compiler.intrinsics.string.TestRangeCheck::test

Is this necessary? (This wasn't used in `TestStringConstruction`.)

test/hotspot/jtreg/compiler/intrinsics/string/TestRangeCheck.java line 58:

> 56: // cut off the dead code. As a result, -1 is fed as input into the
> 57: // StringCoding::countPositives0 intrinsic which is replaced by TOP and causes a
> 58: // failure in the matcher.

I'd appreciate it if we can be more elaborate for less C2-illiterate people like myself.

Suggestion:

    // Calling `StringCoding::countPositives`, which is a "front door"
    // to the `StringCoding::countPositives0` intrinsic.
    // `countPositives` validates its input using
    // `Preconditions::checkFromIndexSize`, which also maps to an
    // intrinsic. When `checkFromIndexSize` is not inlined, C2 does not
    // know about the explicit range checks, and does not cut off the
    // dead code. As a result, an invalid value (e.g., `-1`) can be fed
    // as input into the `countPositives0` intrinsic, got replaced
    // by TOP, and cause a failure in the matcher.
------------- PR Review: https://git.openjdk.org/jdk/pull/29164#pullrequestreview-3681112226 PR Comment: https://git.openjdk.org/jdk/pull/29164#issuecomment-3738455817 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2689568427 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2687948444 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2685859575 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2685838328 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2705884654 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2705885810 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2704760982 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2689735070 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2689780537 From dfenacci at openjdk.org Tue Jan 20 18:54:10 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 20 Jan 2026 18:54:10 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics In-Reply-To: <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> Message-ID: On Mon, 12 Jan 2026 13:03:58 GMT, Volkan Yazici wrote: >> ## Issue >> >> This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. 
>> >> This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. >> >> ## Causes >> >> The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. >> >> ## Fix >> >> A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: >> https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 >> This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. >> >> # Testing >> >> * Tier 1-3+ >> * 2 JTReg tests added >> * `TestRangeCheck.java` as regression test for the reported issue >> * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion > > Verified that 3c466d372b7 is a clean revert of 7e18de137c3 delivered in [JDK-8374210]. > > [JDK-8374210]: https://bugs.openjdk.org/browse/JDK-8374210 Thanks for your review @vy. In addition to the changes you suggested I also fixed the opaque node value in `LibraryCallKit::generate_limit_guard` which was wrong (I then removed the `is_positive` flag altogether since it was `false` in both cases) and added `TestOpaqueGuardNodes.java` to test that the opaque nodes are added and later removed. 
> src/hotspot/share/opto/c2_globals.hpp line 680:
>
>> 678: develop(bool, VerifyIntrinsicChecks, false, \
>> 679: "Verify in intrinsic that Java level checks work as expected") \
>> 680: \
>
> I suggest removing the `VerifyIntrinsicChecks` flag. Given `OpaqueGuard` already verifies the value when `#ifdef ASSERT`, does `VerifyIntrinsicChecks` serve any purpose anymore?

Done.

> src/hotspot/share/opto/loopopts.cpp line 1:
>
>> 1: /*
>
> What is the reason that the new `OpaqueGuard` is not taken into account in `PhaseIdealLoop::clone_iff`?

Oversight. Thanks! Fixed.

> src/hotspot/share/opto/macro.cpp line 2565:
>
>> 2563: // Tests with OpaqueGuard nodes are implicitly known to be true or false. Replace the node with appropriate value. In debug builds,
>> 2564: // we leave the test in the graph to have an additional sanity check at runtime. If the test fails (i.e. a bug),
>> 2565: // we will execute a Halt node.
>
> *Nit:* Can we adhere to the max. 120 (or even better, 80!) characters per line limit of the file?

Fair enough (good to know: I wasn't aware of such a limit).

> src/hotspot/share/opto/macro.cpp line 2569:
>
>> 2567: _igvn.replace_node(n, n->in(1));
>> 2568: #else
>> 2569: _igvn.replace_node(n, _igvn.intcon(0));
>
> Curious: why do we invoke `intcon(0)` for `OpaqueGuard`, whereas it was `intcon(1)` for `OpaqueNotNull` slightly above?

In `OpaqueGuard`'s case we know that the input is always "false" (so, we set 0 as its input). For `OpaqueNotNull` we know that the input is always "true" (so, we set 1 as its input).

> src/hotspot/share/opto/opaquenode.hpp line 160:
>
>> 158: // we keep the actual checks as additional verification code (i.e. removing OpaqueGuardNode and use the BoolNode
>> 159: // inputs instead).
>> 160: class OpaqueGuardNode : public Node {
>
> With the `OpaqueGuardNode::is_positive` flag gone, `OpaqueGuardNode` looks pretty much identical to `OpaqueNotNullNode`. Is there a code reuse opportunity we can take advantage of?

It is true that they do pretty much the same thing ("avoid" C2 optimisations for checks) but I'd argue they are semantically slightly different: one prevents optimisations where we know the value cannot be null, the other where we know the value is in range. We could actually have only one class (e.g. with a `positive` flag like before) but I'm not sure it would be a cleaner/nicer solution.

> test/hotspot/jtreg/compiler/intrinsics/TestVerifyIntrinsicChecks.java line 1:
>
>> 1: /*
>
> Since the `VerifyIntrinsicChecks` flag is gone, AFAICT, all following changes can be reverted:
>
>
>     git rm test/hotspot/jtreg/compiler/intrinsics/TestVerifyIntrinsicChecks.java
>     git checkout upstream/HEAD -- \
>         test/hotspot/jtreg/compiler/intrinsics/string/TestCountPositives.java \
>         test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java \
>         test/hotspot/jtreg/compiler/intrinsics/string/TestHasNegatives.java \
>         test/hotspot/jtreg/compiler/patches/java.base/java/lang/Helper.java

Totally. Done.

> test/hotspot/jtreg/compiler/intrinsics/string/TestRangeCheck.java line 32:
>
>> 30: * -XX:CompileCommand=inline,java.lang.StringCoding::*
>> 31: * -XX:CompileCommand=exclude,jdk.internal.util.Preconditions::checkFromIndexSize
>> 32: * -XX:CompileCommand=compileonly,compiler.intrinsics.string.TestRangeCheck::test
>
> Is this necessary? (This wasn't used in `TestStringConstruction`.)

Nope (leftover from debugging).
Removed ------------- PR Comment: https://git.openjdk.org/jdk/pull/29164#issuecomment-3755585217 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2694787876 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2694785777 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2694785429 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2707280040 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2707283139 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2707272331 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2694788432 From vyazici at openjdk.org Tue Jan 20 18:54:11 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Tue, 20 Jan 2026 18:54:11 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics In-Reply-To: <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> Message-ID: On Tue, 13 Jan 2026 20:01:31 GMT, Volkan Yazici wrote: >> ## Issue >> >> This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. >> >> This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. 
>> >> ## Causes >> >> The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. >> >> ## Fix >> >> A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: >> https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 >> This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. >> >> # Testing >> >> * Tier 1-3+ >> * 2 JTReg tests added >> * `TestRangeCheck.java` as regression test for the reported issue >> * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion > > src/hotspot/share/opto/library_call.hpp line 170: > >> 168: Node* length, bool char_count, >> 169: bool halt_on_oob = false, >> 170: bool is_opaque = false); > > Do we really need to introduce two new toggles: `halt_on_oob` and `is_opaque`? At all call-sites either one of the following is used: > > 1. `halt_on_oob=true, is_opaque=!VerifyIntrinsicChecks` > 2. defaults (i.e., `halt_on_oob=is_opaque=false`) > > Can we instead only settle one, e.g., `halt_on_oob=VerifyIntrinsicChecks`? Giving this a second thought, do we need these two flags anyway? That is, 1. We can remove `if (is_opaque)` add the `OpaqueGuard` anyway, since it is ineffective for `!ASSERT`. (This is what `must_be_not_null` does too.) 2. We can replace `if (halt_on_oob) { ... } else { ... }` with `#ifdef ASSERT ...`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2689690430 From dfenacci at openjdk.org Tue Jan 20 18:54:12 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 20 Jan 2026 18:54:12 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics In-Reply-To: References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> Message-ID: On Wed, 14 Jan 2026 09:38:40 GMT, Volkan Yazici wrote: >> src/hotspot/share/opto/library_call.hpp line 170: >> >>> 168: Node* length, bool char_count, >>> 169: bool halt_on_oob = false, >>> 170: bool is_opaque = false); >> >> Do we really need to introduce two new toggles: `halt_on_oob` and `is_opaque`? At all call-sites either one of the following is used: >> >> 1. `halt_on_oob=true, is_opaque=!VerifyIntrinsicChecks` >> 2. defaults (i.e., `halt_on_oob=is_opaque=false`) >> >> Can we instead only settle one, e.g., `halt_on_oob=VerifyIntrinsicChecks`? > > Giving this a second thought, do we need these two flags anyway? That is, > > 1. We can remove `if (is_opaque)` add the `OpaqueGuard` anyway, since it is ineffective for `!ASSERT`. (This is what `must_be_not_null` does too.) > 2. We can replace `if (halt_on_oob) { ... } else { ... }` with `#ifdef ASSERT ...`. 1. I'm not sure we can always do that: `LibraryCallKit::generate_string_range_check` is called from places that don't yet have Java range checks and we must not add an opaque node in those cases (or we end up without checks in prod builds). 2. For a similar reason I'd leave `if (halt_on_oob)` condition: for calls to `LibraryCallKit::generate_string_range_check` that don't yet have Java range checks the method behaves like it did before. For "new" calls it adds the `Halt` node (which will then be removed together with the guard in prod builds). 
So, on the one hand, we can keep `halt_on_oob` alone as the discriminant between "new" and "old" call sites. On the other hand, we can get rid of `VerifyIntrinsicChecks` because we implicitly add the additional range check in debug builds (always). I've modified the code accordingly. What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2694787303

From vyazici at openjdk.org Tue Jan 20 18:54:12 2026
From: vyazici at openjdk.org (Volkan Yazici)
Date: Tue, 20 Jan 2026 18:54:12 GMT
Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics
In-Reply-To:
References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com>
Message-ID:

On Tue, 20 Jan 2026 08:27:59 GMT, Damon Fenacci wrote:

>> src/hotspot/share/opto/opaquenode.hpp line 160:
>>
>>> 158: // we keep the actual checks as additional verification code (i.e. removing OpaqueGuardNode and use the BoolNode
>>> 159: // inputs instead).
>>> 160: class OpaqueGuardNode : public Node {
>>
>> With the `OpaqueGuardNode::is_positive` flag gone, `OpaqueGuardNode` looks pretty much identical to `OpaqueNotNullNode`. Is there a code reuse opportunity we can take advantage of?
>
> It is true that they do pretty much the same thing ("avoid" C2 optimisations for checks) but I'd argue they are semantically slightly different: one prevents optimisations where we know the value cannot be null, the other where we know the value is in range. We could actually have only one class (e.g. with a `positive` flag like before) but I'm not sure it would be a cleaner/nicer solution.

Fair enough. I was just curious.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2707418513 From duke at openjdk.org Tue Jan 20 18:54:14 2026 From: duke at openjdk.org (ExE Boss) Date: Tue, 20 Jan 2026 18:54:14 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics In-Reply-To: <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> Message-ID: On Wed, 14 Jan 2026 10:05:23 GMT, Volkan Yazici wrote: >> ## Issue >> >> This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. >> >> This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. >> >> ## Causes >> >> The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. 
>>
>> ## Fix
>>
>> A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null:
>> https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484
>> This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code.
>>
>> # Testing
>>
>> * Tier 1-3+
>> * 2 JTReg tests added
>>   * `TestRangeCheck.java` as regression test for the reported issue
>>   * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion
>
> test/hotspot/jtreg/compiler/intrinsics/string/TestRangeCheck.java line 58:
>
>> 56: // cut off the dead code. As a result, -1 is fed as input into the
>> 57: // StringCoding::countPositives0 intrinsic which is replaced by TOP and causes a
>> 58: // failure in the matcher.
>
> I'd appreciate it if we can be more elaborate for less C2-illiterate people like myself.
>
> Suggestion:
>
>     // Calling `StringCoding::countPositives`, which is a "front door"
>     // to the `StringCoding::countPositives0` intrinsic.
>     // `countPositives` validates its input using
>     // `Preconditions::checkFromIndexSize`, which also maps to an
>     // intrinsic. When `checkFromIndexSize` is not inlined, C2 does not
>     // know about the explicit range checks, and does not cut off the
>     // dead code. As a result, an invalid value (e.g., `-1`) can be fed
>     // as input into the `countPositives0` intrinsic, got replaced
>     // by TOP, and cause a failure in the matcher.

**Nit:** Using "get" here is grammatically better:

  // intrinsic.
When `checkFromIndexSize` is not inlined, C2 does not
  // know about the explicit range checks, and does not cut off the
- // as input into the `countPositives0` intrinsic, got replaced
+ // as input into the `countPositives0` intrinsic, get replaced
  // by TOP, and cause a failure in the matcher.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2690260434

From dfenacci at openjdk.org Tue Jan 20 18:54:14 2026
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 20 Jan 2026 18:54:14 GMT
Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics
In-Reply-To:
References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com>
Message-ID:

On Wed, 14 Jan 2026 12:34:51 GMT, ExE Boss wrote:

>> test/hotspot/jtreg/compiler/intrinsics/string/TestRangeCheck.java line 58:
>>
>>> 56: // cut off the dead code. As a result, -1 is fed as input into the
>>> 57: // StringCoding::countPositives0 intrinsic which is replaced by TOP and causes a
>>> 58: // failure in the matcher.
>>
>> I'd appreciate it if we can be more elaborate for less C2-illiterate people like myself.
>>
>> Suggestion:
>>
>>     // Calling `StringCoding::countPositives`, which is a "front door"
>>     // to the `StringCoding::countPositives0` intrinsic.
>>     // `countPositives` validates its input using
>>     // `Preconditions::checkFromIndexSize`, which also maps to an
>>     // intrinsic. When `checkFromIndexSize` is not inlined, C2 does not
>>     // know about the explicit range checks, and does not cut off the
>>     // dead code. As a result, an invalid value (e.g., `-1`) can be fed
>>     // as input into the `countPositives0` intrinsic, got replaced
>>     // by TOP, and cause a failure in the matcher.
>
> **Nit:** Using "get" here is grammatically better:
>
>   // intrinsic.
When `checkFromIndexSize` is not inlined, C2 does not > // know about the explicit range checks, and does not cut off the > - // as input into the `countPositives0` intrinsic, got replaced > + // as input into the `countPositives0` intrinsic, get replaced > // by TOP, and cause a failure in the matcher. Done. Thanks for the suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2694948915 From naoto at openjdk.org Tue Jan 20 21:45:57 2026 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 20 Jan 2026 21:45:57 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v14] In-Reply-To: References: Message-ID: On Tue, 20 Jan 2026 10:20:10 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ? 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ? 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ? 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ? 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ? 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ? 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ? 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ? 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ? 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ? 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ? 
70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ± 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ± 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ± 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ± 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ± 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ± 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ± 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1... > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Remove paragraph break Marked as reviewed by naoto (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28454#pullrequestreview-3684413844 From liach at openjdk.org Fri Jan 23 00:14:18 2026 From: liach at openjdk.org (Chen Liang) Date: Fri, 23 Jan 2026 00:14:18 GMT Subject: RFR: 8372460: Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance [v7] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 10:02:36 GMT, Shaojin Wen wrote: >> This PR optimizes the parsing performance of DateTimeFormatter by replacing HashMap with EnumMap in scenarios where the keys are exclusively ChronoField enum values. >> >> When parsing date/time strings, DateTimeFormatter creates HashMaps to store intermediate parsed values. HashMap has more overhead for operations compared to specialized map implementations. >> >> Since ChronoField is an enum and all keys in these maps are ChronoField instances, we can use EnumMap instead, which provides better performance for enum keys due to its optimized internal structure.
>> >> Parsing scenarios show improvements from 12% to 95% > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > remove redundant checkField Just noted the Map is passed to both: 1. `Chronology.resolveDate(Map, ResolverStyle)` 2. `TemporalField.resolve(Map, TemporalAccessor, ResolverStyle)` We need to ensure there is no custom `Chronology` for this optimization. ------------- Changes requested by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28471#pullrequestreview-3695184550 From liach at openjdk.org Fri Jan 23 00:17:56 2026 From: liach at openjdk.org (Chen Liang) Date: Fri, 23 Jan 2026 00:17:56 GMT Subject: RFR: 8372460: Use EnumMap instead of HashMap for DateTimeFormatter parsing to improve performance [v7] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 10:02:36 GMT, Shaojin Wen wrote: >> This PR optimizes the parsing performance of DateTimeFormatter by replacing HashMap with EnumMap in scenarios where the keys are exclusively ChronoField enum values. >> >> When parsing date/time strings, DateTimeFormatter creates HashMaps to store intermediate parsed values. HashMap has more overhead for operations compared to specialized map implementations. >> >> Since ChronoField is an enum and all keys in these maps are ChronoField instances, we can use EnumMap instead, which provides better performance for enum keys due to its optimized internal structure. >> >> Parsing scenarios show improvements from 12% to 95% > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > remove redundant checkField Given this wide exposure, I think we might still go back to a custom Map - `Parsed` has little control over Chronology. 
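To make the exposure concern concrete, here is a minimal sketch. It assumes, purely for illustration, that an `EnumMap<ChronoField, Long>` were handed out where a `Map<TemporalField, Long>` is expected; the class and method names (`EnumMapExposureSketch`, `chronoFieldOnlyMap`, `acceptsKey`) are hypothetical and this is not the PR's code. `IsoFields.QUARTER_OF_YEAR` stands in for any non-`ChronoField` field that a resolver might try to store.

```java
import java.time.temporal.ChronoField;
import java.time.temporal.IsoFields;
import java.time.temporal.TemporalField;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical illustration class, not part of the JDK or of the PR.
class EnumMapExposureSketch {

    // Assumption: an EnumMap keyed on ChronoField is exposed through the
    // wider Map<TemporalField, Long> type via an unchecked raw cast.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Map<TemporalField, Long> chronoFieldOnlyMap() {
        return (Map) new EnumMap<ChronoField, Long>(ChronoField.class);
    }

    // Returns true if the map accepts the key, false if EnumMap's
    // key type check rejects it with a ClassCastException.
    static boolean acceptsKey(Map<TemporalField, Long> map, TemporalField field) {
        try {
            map.put(field, 1L);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<TemporalField, Long> map = chronoFieldOnlyMap();
        System.out.println("ChronoField.YEAR accepted: "
                + acceptsKey(map, ChronoField.YEAR));
        System.out.println("IsoFields.QUARTER_OF_YEAR accepted: "
                + acceptsKey(map, IsoFields.QUARTER_OF_YEAR));
    }
}
```

Code that only ever stores `ChronoField` keys is unaffected, but any callee receiving the map through the general `Map<TemporalField, Long>` signature could hit the rejected-key path.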
------------- PR Comment: https://git.openjdk.org/jdk/pull/28471#issuecomment-3787511735 From vyazici at openjdk.org Fri Jan 23 13:50:39 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Fri, 23 Jan 2026 13:50:39 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: On Mon, 12 Jan 2026 10:29:39 GMT, Damon Fenacci wrote: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. > > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. > > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. 
Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. > > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion I'd like to provide some help for reviewers: 1. [JDK-8361842] (integrated in 655dc516c22) implemented changes for `java.lang.StringCoding` 2. [JDK-8374210] (integrated in 7e18de137c3) reported regressions against JDK-8361842, and was used as the BACKOUT issue. 3. [JDK-8374582] (this PR) is the REDO of JDK-8361842, plus the fix for regressions reported in JDK-8374210 That is, this PR starts with 3c466d372b7 (i.e., the revert of 7e18de137c3), and continues with the fix, which is **the interesting part and can be viewed by diffing 3c466d372b7...ff22857609d**. (ff22857609d is the last commit to date.) [JDK-8361842]: https://bugs.openjdk.org/browse/JDK-8361842 [JDK-8374210]: https://bugs.openjdk.org/browse/JDK-8374210 [JDK-8374582]: https://bugs.openjdk.org/browse/JDK-8374582 ------------- PR Comment: https://git.openjdk.org/jdk/pull/29164#issuecomment-3790314570 From cushon at openjdk.org Fri Jan 23 14:28:30 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Fri, 23 Jan 2026 14:28:30 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v15] In-Reply-To: References: Message-ID: <0zrTlZ3EhMyHLYXbqHOafsaywNA9TOJbf-2fSs3MPsI=.d60ca8e5-9816-4136-b65b-3cbb1dc1e6ff@github.com> > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
> > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ± 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ± 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ± 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ± 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ± 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ± 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ± 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ± 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ± 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ± 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ± 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ± 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ± 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ± 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ± 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ± 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ± 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ± 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ± 3839233.582 ops/s > StringLoopJmhBenchmark.ge...
Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Clarify that "It" in the javadoc means "This method" ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/77bc5b9e..51bf1510 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From naoto at openjdk.org Fri Jan 23 20:41:23 2026 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 23 Jan 2026 20:41:23 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets Message-ID: This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. 
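For readers unfamiliar with "short offsets", the following is a minimal sketch of a formatter that accepts an hour-only offset such as `+01`. It relies on the documented lenient-mode behaviour of `DateTimeFormatterBuilder.appendOffsetId()` on recent JDKs, where only the hours are mandatory when parsing. The class name `ShortOffsetSketch` and the builder chain are assumptions made for illustration and are not taken from the PR.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Hypothetical helper class, not part of the JDK or of the PR.
class ShortOffsetSketch {

    // ISO local date-time followed by an offset ID that is parsed leniently,
    // so an hour-only offset such as "+01" is accepted alongside "+01:00".
    static final DateTimeFormatter LENIENT_OFFSET_DATE_TIME =
            new DateTimeFormatterBuilder()
                    .parseCaseInsensitive()
                    .append(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
                    .parseLenient()      // lenient only for the offset that follows
                    .appendOffsetId()
                    .parseStrict()       // restore strict parsing afterwards
                    .toFormatter();

    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, LENIENT_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        // Both inputs are expected to resolve to the same one-hour offset.
        System.out.println(parse("2026-01-23T20:41:23+01").getOffset());
        System.out.println(parse("2026-01-23T20:41:23+01:00").getOffset());
    }
}
```

Scoping `parseLenient()` to the offset alone keeps the date and time portions strict, which is the design choice this sketch assumes.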
------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/29393/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29393&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8210336 Stats: 53 lines in 3 files changed: 49 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/29393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29393/head:pull/29393 PR: https://git.openjdk.org/jdk/pull/29393 From liach at openjdk.org Mon Jan 26 01:51:00 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 01:51:00 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v3] In-Reply-To: References: Message-ID: <0unjF0mIb3vR8aHBAq6Q_ioqtjwRerQl_slPxcDB5wY=.e9dea4be-2ac8-47f8-a975-d6452f8b4b61@github.com> On Sat, 26 Jul 2025 10:10:40 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with four additional commits since the last revision: > > - Update `@bug` in correct file > - Add default implementation on codePointCount in CharSequence > - Update `@bug` entries in test class doc comments > - Discard changes on code whose form is not `str.codePointCount(0, str.length())` Sorry for taking a million years but I finally uploaded your CSR to JBS. Was busy with Valhalla and other things. I proofread and thought it is fine, and asked other JDK engineers to review the CSR too. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3797589845 From liach at openjdk.org Mon Jan 26 02:04:53 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 02:04:53 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 20:34:24 GMT, Naoto Sato wrote: > This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. The time tck tests only have toString<>parse round trip and bad parse data. Do you think we need a third type of data for valid parsing but not recoverable in toString? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29393#issuecomment-3797609387 From naoto at openjdk.org Mon Jan 26 18:54:54 2026 From: naoto at openjdk.org (Naoto Sato) Date: Mon, 26 Jan 2026 18:54:54 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets [v2] In-Reply-To: References: Message-ID: > This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. 
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: Added a test that asserts parse with hour-only offset ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29393/files - new: https://git.openjdk.org/jdk/pull/29393/files/14c4beb1..1b253613 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29393&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29393&range=00-01 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29393/head:pull/29393 PR: https://git.openjdk.org/jdk/pull/29393 From liach at openjdk.org Mon Jan 26 19:34:06 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 19:34:06 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets [v2] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 18:54:54 GMT, Naoto Sato wrote: >> This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. > > Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: > > Added a test that asserts parse with hour-only offset Thanks, the new test looks good to me. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29393#issuecomment-3801338775 From liach at openjdk.org Mon Jan 26 19:39:51 2026 From: liach at openjdk.org (Chen Liang) Date: Mon, 26 Jan 2026 19:39:51 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v3] In-Reply-To: References: Message-ID: On Sat, 26 Jul 2025 10:10:40 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with four additional commits since the last revision: > > - Update `@bug` in correct file > - Add default implementation on codePointCount in CharSequence > - Update `@bug` entries in test class doc comments > - Discard changes on code whose form is not `str.codePointCount(0, str.length())` src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 539: > 537: * @return the number of Unicode code points in this String > 538: * @since 26 > 539: */ Suggestion: /** * @since 27 */ src/java.base/share/classes/java/lang/CharSequence.java line 262: > 260: * > 261: * @return the number of Unicode code points in this sequence > 262: * @since 26 Suggestion: * {@return the number of Unicode code points in this character sequence} * Unpaired surrogates count as one code point each. * * @since 27 src/java.base/share/classes/java/lang/Character.java line 9965: > 9963: * @since 26 > 9964: */ > 9965: public static int codePointCount(CharSequence seq) { Let's remove this method, given we have a method on CharSequence already. 
src/java.base/share/classes/java/lang/Character.java line 10011: > 10009: * @throws NullPointerException if {@code a} is null. > 10010: * @since 26 > 10011: */ Suggestion: /** * {@return the number of Unicode code points in the {@code char} array} * Unpaired surrogates count as one code point each. * * @param a the {@code char} array * @throws NullPointerException if {@code a} is null * @since 27 */ src/java.base/share/classes/java/lang/String.java line 1723: > 1721: * > 1722: * @return the number of Unicode code points in this String > 1723: * @since 26 Suggestion: * {@return the number of Unicode code points in this String} * Unpaired surrogates count as one code point each. * * @since 27 src/java.base/share/classes/java/lang/StringBuffer.java line 274: > 272: > 273: /** > 274: * @since 26 Suggestion: * @since 27 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2728958484 PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2728956859 PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2728961159 PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2728967673 PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2728954676 PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2728962198 From rriggs at openjdk.org Mon Jan 26 22:53:52 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 26 Jan 2026 22:53:52 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v3] In-Reply-To: References: Message-ID: On Sat, 26 Jul 2025 10:10:40 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. 
>> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with four additional commits since the last revision: > > - Update `@bug` in correct file > - Add default implementation on codePointCount in CharSequence > - Update `@bug` entries in test class doc comments > - Discard changes on code whose form is not `str.codePointCount(0, str.length())` src/java.base/share/classes/java/lang/Character.java line 10012: > 10010: * @since 26 > 10011: */ > 10012: public static int codePointCount(char[] a) { Regardless of the current usage, the parameter name `a` results in a confusing javadoc line. "a the". My preference would be one of `chars`, `seq`, or .... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2729580337 From duke at openjdk.org Mon Jan 26 23:44:34 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Mon, 26 Jan 2026 23:44:34 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v4] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? 
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Improve JavaDoc Co-authored-by: Chen Liang ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/0e55e35c..4744ee69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=02-03 Stats: 23 lines in 5 files changed: 0 ins; 11 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From naoto at openjdk.org Mon Jan 26 23:58:13 2026 From: naoto at openjdk.org (Naoto Sato) Date: Mon, 26 Jan 2026 23:58:13 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v4] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 23:44:34 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: > > Improve JavaDoc > > Co-authored-by: Chen Liang src/java.base/share/classes/java/lang/Character.java line 10004: > 10002: /** > 10003: * {@return the number of Unicode code points in the {@code char} array} > 10004: * Unpaired surrogates count as one code point each. It'd be better to replace "surrogates" with "surrogate code units." Applies to other method descriptions too. 
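The counting rule under discussion can be observed today with the existing two-argument overload. The sketch below uses only `String.codePointCount(int, int)`; the helper class `CodePointCountSketch` and its `countAll` method are hypothetical names introduced for illustration and are not the proposed API.

```java
// Hypothetical helper class for illustration only.
class CodePointCountSketch {

    // Existing two-argument overload applied to the whole string.
    static int countAll(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String paired = "a\uD83D\uDE00b";  // 'a', U+1F600 as a surrogate pair, 'b'
        String isolated = "a\uD83Db";      // 'a', an isolated high surrogate, 'b'

        // A surrogate pair contributes two chars but one code point.
        System.out.println(paired.length() + " chars, "
                + countAll(paired) + " code points");

        // An isolated surrogate code unit counts as one code point by itself.
        System.out.println(isolated.length() + " chars, "
                + countAll(isolated) + " code points");
    }
}
```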
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2729719433 From duke at openjdk.org Mon Jan 26 23:58:14 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Mon, 26 Jan 2026 23:58:14 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v4] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 23:53:06 GMT, Naoto Sato wrote: >> Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve JavaDoc >> >> Co-authored-by: Chen Liang > > src/java.base/share/classes/java/lang/Character.java line 10004: > >> 10002: /** >> 10003: * {@return the number of Unicode code points in the {@code char} array} >> 10004: * Unpaired surrogates count as one code point each. > > It'd be better to replace "surrogates" with "surrogate code units." Applies to other method descriptions too. Agree. I have personally disliked the expression "surrogate" for such code units. We will need separate tickets for the Javadoc of the other methods that are outside the scope of this ticket. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2729723123 From duke at openjdk.org Tue Jan 27 03:43:39 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Tue, 27 Jan 2026 03:43:39 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v4] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 23:55:12 GMT, Tatsunori Uchino wrote: >> src/java.base/share/classes/java/lang/Character.java line 10004: >> >>> 10002: /** >>> 10003: * {@return the number of Unicode code points in the {@code char} array} >>> 10004: * Unpaired surrogates count as one code point each. >> >> It'd be better to replace "surrogates" with "surrogate code units." Applies to other method descriptions too. > > Agree. I have personally disliked the expression "surrogate" for such code units.
We will need separate tickets for the Javadoc of the other methods that are outside the scope of this ticket. [Unicode seems to treat "_isolated_ surrogate code unit" as a first-class citizen.](https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G1654) https://www.unicode.org/charts/PDF/UDC00.pdf https://www.google.com/search?q=site:unicode.org+%22isolated+surrogate+code%22 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2730108791 From duke at openjdk.org Tue Jan 27 03:49:34 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Tue, 27 Jan 2026 03:49:34 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v5] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change?
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Rename parameter names from `a` to `seq` `chars` is too confusing with `char` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/4744ee69..6d2805d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Tue Jan 27 03:49:37 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Tue, 27 Jan 2026 03:49:37 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v3] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 22:51:26 GMT, Roger Riggs wrote: >> Tatsunori Uchino has updated the pull request incrementally with four additional commits since the last revision: >> >> - Update `@bug` in correct file >> - Add default implementation on codePointCount in CharSequence >> - Update `@bug` entries in test class doc comments >> - Discard changes on code whose form is not `str.codePointCount(0, str.length())` > > src/java.base/share/classes/java/lang/Character.java line 10012: > >> 10010: * @since 26 >> 10011: */ >> 10012: public static int codePointCount(char[] a) { > > Regardless of the current usage, the parameter name `a` results in a confusing javadoc line. "a the". > My preference would be one of `chars`, `seq`, or .... I found `chars` too easily confused with `char` in the Javadoc. I'll choose `seq`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2730114109 From chagedorn at openjdk.org Tue Jan 27 09:33:02 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 27 Jan 2026 09:33:02 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: On Mon, 12 Jan 2026 10:29:39 GMT, Damon Fenacci wrote: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. > > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. > > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. 
Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. > > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion Overall, the fix idea with `Opaque` nodes looks good to me! src/hotspot/share/opto/library_call.hpp line 161: > 159: Node* generate_negative_guard(Node* index, RegionNode* region, > 160: // resulting CastII of index: > 161: Node* *pos_index = nullptr, Suggestion: Node** pos_index = nullptr, ------------- PR Review: https://git.openjdk.org/jdk/pull/29164#pullrequestreview-3710064929 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2731064422 From chagedorn at openjdk.org Tue Jan 27 09:33:03 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 27 Jan 2026 09:33:03 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics In-Reply-To: References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> Message-ID: On Tue, 20 Jan 2026 09:06:27 GMT, Volkan Yazici wrote: >> It is true that they do pretty much the same thing ("avoid" C2 optimisations for checks) but I'd argue they are semantically slightly different: one prevents optimisations where we know the value cannot be null, the other where we know the value is in range. We could actually have only one class (e.g. with a `positive` flag like before) but I'm not sure it would be a cleaner/nicer solution. ? > > Fair enough ? I was just curious. I was about to ask the same question. 
It seems like both `OpaqueNotNullNode` and `OpaqueGuardNode` behave the same apart from eventually folding to a false or true constant. They might have slightly different reasons for adding them but AFAIU, they are both intended to keep control and data in sync. Apart from duplicating most of the logic and comments, an additional challenge with having both nodes is that we need to special case both nodes at various points in the code which makes it more complex and raises the question if we could really observe them both or not (would not be a problem when only having one node type). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2731057975 From duke at openjdk.org Tue Jan 27 11:01:29 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Tue, 27 Jan 2026 11:01:29 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v6] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? 
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Remove `Character.codePointCount` overload ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/6d2805d3..9af51fc7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=04-05 Stats: 22 lines in 1 file changed: 0 ins; 22 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Tue Jan 27 11:04:49 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Tue, 27 Jan 2026 11:04:49 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v6] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 11:01:29 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: > > Remove `Character.codePointCount` overload Do I need to merge master? It looks like the CI is failing for reasons unrelated to this PR. 
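For reference, the two workarounds that the quoted PR description wants to make unnecessary can be sketched with methods that already exist on `String`. This is only an illustration: the class name `CodePointCountSketch` and its helper names are hypothetical, and the proposed no-argument `codePointCount()` is deliberately not called, since it is still under review.

```java
// Minimal sketch of counting code points over a whole string with existing API only.
final class CodePointCountSketch {

    // Existing range overload: the caller has to pass the bounds explicitly.
    static int viaRange(String s) {
        return s.codePointCount(0, s.length());
    }

    // Existing stream API: visits every code point just to count them.
    static long viaStream(String s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (one surrogate pair, two chars), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());      // prints 4 (UTF-16 code units)
        System.out.println(viaRange(s));     // prints 3 (code points)
        System.out.println(viaStream(s));    // prints 3 (code points)
        // An isolated surrogate code unit is counted as one code point.
        System.out.println(viaRange("\uD83D")); // prints 1
    }
}
```

Both helpers return the same count; the difference the PR description points at is verbosity and overhead at the call site, especially when the string comes from a long expression that would otherwise need a temporary variable.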
------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3804516489 From duke at openjdk.org Tue Jan 27 11:33:21 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Tue, 27 Jan 2026 11:33:21 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v7] In-Reply-To: References: Message-ID: <3Mtp1GkFexGb5k5VDb8pUuSrrJmu2cZRaa9fZQTLMZI=.52182522-c1f5-40a1-8a2d-186b0c8ea587@github.com> > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Replace "unpaired surrogates" with "isolated surrogate code units" https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G1654 https://www.unicode.org/charts/PDF/UDC00.pdf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/9af51fc7..471b4308 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=05-06 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Tue Jan 27 13:58:31 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Tue, 27 Jan 2026 13:58:31 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v7] In-Reply-To: <3Mtp1GkFexGb5k5VDb8pUuSrrJmu2cZRaa9fZQTLMZI=.52182522-c1f5-40a1-8a2d-186b0c8ea587@github.com> References: 
<3Mtp1GkFexGb5k5VDb8pUuSrrJmu2cZRaa9fZQTLMZI=.52182522-c1f5-40a1-8a2d-186b0c8ea587@github.com> Message-ID: On Tue, 27 Jan 2026 11:33:21 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: > > Replace "unpaired surrogates" with "isolated surrogate code units" > > https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G1654 > https://www.unicode.org/charts/PDF/UDC00.pdf Do we need to remove or preserve `Character.codePointCount(char[])`? I think it's inconsistent to have codePointCount(char[]) and codePointCount(CharSequence, int, int) but not codePointCount(CharSequence). We may need to update the CSR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3805342031 From dfenacci at openjdk.org Tue Jan 27 16:22:52 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 27 Jan 2026 16:22:52 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v2] In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. 
The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. > > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. > > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. 
> > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - JDK-8374852: fix macro expansion for OpaqueCheck - JDK-8374852: use only one opaque node ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29164/files - new: https://git.openjdk.org/jdk/pull/29164/files/ff228576..b79738c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=00-01 Stats: 95 lines in 15 files changed: 11 ins; 42 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/29164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29164/head:pull/29164 PR: https://git.openjdk.org/jdk/pull/29164 From dfenacci at openjdk.org Tue Jan 27 16:22:52 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 27 Jan 2026 16:22:52 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v2] In-Reply-To: References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> <6E6brqNG-kkjvis3nNZrX5YFDNX5dRNTS2igk2BjVzs=.4d6ab39a-b6de-49f1-a51f-6723e8c59833@github.com> Message-ID: On Tue, 27 Jan 2026 09:24:34 GMT, Christian Hagedorn wrote: >> Fair enough, I was just curious. > > I was about to ask the same question. It seems like both `OpaqueNotNullNode` and `OpaqueGuardNode` behave the same apart from eventually folding to a false or true constant. They might have slightly different reasons for adding them but AFAIU, they are both intended to keep control and data in sync.
Apart from duplicating most of the logic and comments, an additional challenge with having both nodes is that we need to special case both nodes at various points in the code which makes it more complex and raises the question if we could really observe them both or not (would not be a problem when only having one node type). Thanks for reviewing @chhagedorn. > special case both nodes at various points Good point. I guess it is better to have only one after all. Changed. (I called it `OpaqueCheck` for lack of a better idea) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2732811990 From dfenacci at openjdk.org Tue Jan 27 16:26:19 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 27 Jan 2026 16:26:19 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v3] In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. > > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g.
-1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. > > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. > > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8374852: fix star layout Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29164/files - new: https://git.openjdk.org/jdk/pull/29164/files/b79738c3..0ef73ef9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29164/head:pull/29164 PR: https://git.openjdk.org/jdk/pull/29164 From naoto at openjdk.org Tue Jan 27 21:22:28 2026 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 27 Jan 2026 21:22:28 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets [v3] In-Reply-To: References: Message-ID: 
<2WlcRXVtwHGmCe0zREIeiY_RCmGPllR0but9viDs5l4=.f6ce512b-52bc-4ddb-ae43-065c658e64da@github.com> > This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: Modified the test to cover all ISO formatters (sans *LOCAL*), not only the changed ones ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29393/files - new: https://git.openjdk.org/jdk/pull/29393/files/1b253613..659e633a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29393&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29393&range=01-02 Stats: 7 lines in 1 file changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/29393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29393/head:pull/29393 PR: https://git.openjdk.org/jdk/pull/29393 From jlu at openjdk.org Tue Jan 27 23:31:06 2026 From: jlu at openjdk.org (Justin Lu) Date: Tue, 27 Jan 2026 23:31:06 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets [v3] In-Reply-To: <2WlcRXVtwHGmCe0zREIeiY_RCmGPllR0but9viDs5l4=.f6ce512b-52bc-4ddb-ae43-065c658e64da@github.com> References: <2WlcRXVtwHGmCe0zREIeiY_RCmGPllR0but9viDs5l4=.f6ce512b-52bc-4ddb-ae43-065c658e64da@github.com> Message-ID: <3bnefQnx_P6NVSmfgP_yUs0EQUWUgAH776MX-u3Q160=.44bfc19d-7bea-4cf1-8188-3fb5a1b5d58e@github.com> On Tue, 27 Jan 2026 21:22:28 GMT, Naoto Sato wrote: >> This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. 
> > Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: > > Modified the test to cover all ISO formatters (sans *LOCAL*), not only the changed ones The impl and spec changes look consistent with [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051). I see that `ISO_INSTANT` does not need the fix due to [JDK-8365182](https://bugs.openjdk.org/browse/JDK-8365182). This change looks good to me. test/jdk/java/time/test/java/time/format/TestDateTimeFormatter.java line 361: > 359: @ParameterizedTest > 360: @MethodSource("data_iso_short_offset_parse") > 361: public void test_iso_short_offset_parse(String text, DateTimeFormatter formatter) { Even though this is primarily a parsing test, since we are already adding a test, I think it would not hurt to also check the "+/-00" shorthand offset cases. (Since the formatted text would take a different form than a non zero hour only offset.) ------------- Marked as reviewed by jlu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29393#pullrequestreview-3713862331 PR Review Comment: https://git.openjdk.org/jdk/pull/29393#discussion_r2734178642 From naoto at openjdk.org Tue Jan 27 23:56:36 2026 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 27 Jan 2026 23:56:36 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets [v4] In-Reply-To: References: Message-ID: > This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. 
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: +00/-00 offsets tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29393/files - new: https://git.openjdk.org/jdk/pull/29393/files/659e633a..257272e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29393&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29393&range=02-03 Stats: 40 lines in 1 file changed: 22 ins; 5 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/29393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29393/head:pull/29393 PR: https://git.openjdk.org/jdk/pull/29393 From jlu at openjdk.org Wed Jan 28 00:13:00 2026 From: jlu at openjdk.org (Justin Lu) Date: Wed, 28 Jan 2026 00:13:00 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets [v4] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 23:56:36 GMT, Naoto Sato wrote: >> This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. > > Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: > > +00/-00 offsets tests Marked as reviewed by jlu (Reviewer). 
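To make concrete what an hour-only ("short") offset such as `+09` or `-00` looks like to a parser, here is a minimal sketch built from a custom formatter rather than from the predefined ISO formatters this PR changes. It assumes only the long-standing `DateTimeFormatterBuilder.appendOffset("+HH", "Z")` pattern; the class name `ShortOffsetSketch` and the sample timestamps are hypothetical, and whether the predefined formatters accept the same input depends on the JDK in use.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Sketch: a custom formatter that reads a local date-time followed by an hour-only offset.
final class ShortOffsetSketch {

    // "+HH" means sign plus two-digit hour only; "Z" is the text used for a zero offset.
    static final DateTimeFormatter HOUR_ONLY_OFFSET = new DateTimeFormatterBuilder()
            .append(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
            .appendOffset("+HH", "Z")
            .toFormatter();

    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, HOUR_ONLY_OFFSET);
    }

    public static void main(String[] args) {
        // Hour-only offsets are resolved to a full ZoneOffset.
        System.out.println(parse("2026-01-27T21:22:28+09").getOffset());
        System.out.println(parse("2026-01-27T21:22:28-05").getOffset());
        System.out.println(parse("2026-01-27T21:22:28Z").getOffset());
    }
}
```

The `+00` and `-00` inputs raised in the review are a separate case because a zero offset is formatted as `Z`, so the text a formatter prints differs from the text it was asked to parse; the sketch above does not try to cover that.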
------------- PR Review: https://git.openjdk.org/jdk/pull/29393#pullrequestreview-3714023791 From chagedorn at openjdk.org Wed Jan 28 08:23:31 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 28 Jan 2026 08:23:31 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v3] In-Reply-To: References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: On Tue, 27 Jan 2026 16:26:19 GMT, Damon Fenacci wrote: >> ## Issue >> >> This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. >> >> This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. >> >> ## Causes >> >> The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. >> >> ## Fix >> >> A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: >> https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 >> This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. 
Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. >> >> # Testing >> >> * Tier 1-3+ >> * 2 JTReg tests added >> * `TestRangeCheck.java` as regression test for the reported issue >> * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8374852: fix star layout > > Co-authored-by: Christian Hagedorn Thanks for unifying the two opaque nodes! I have some more comments. src/hotspot/share/opto/macro.cpp line 2559: > 2557: #else > 2558: bool is_positive = n->as_OpaqueCheck()->is_positive(); > 2559: _igvn.replace_node(n, _igvn.intcon(is_positive?1:0)); Suggestion: _igvn.replace_node(n, _igvn.intcon(is_positive ? 1 : 0)); src/hotspot/share/opto/opaquenode.hpp line 146: > 144: // builds, we keep the actual checks as additional verification code (i.e. removing OpaqueCheckNodes and use the > 145: // BoolNode inputs instead). > 146: class OpaqueCheckNode : public Node { I've also thought about the name. `OpaqueCheck` is already a good indication of what the node is about. Maybe we could go a step further and call it `OpaqueConstantBoolNode` to emphasize more that it belongs to a `BoolNode` whose result we already know. What do you think? Then we could also think about changing `_positive` to `_constant` (still can be a boolean to just pass true and false which seems more intuitive than passing in 1 and 0). src/hotspot/share/opto/opaquenode.hpp line 148: > 146: class OpaqueCheckNode : public Node { > 147: private: > 148: bool _positive; Now that we define a field, we also need to override `size_of()` (see for example `OpaqueMultiversioningNode`).
src/hotspot/share/opto/opaquenode.hpp line 150: > 148: bool _positive; > 149: public: > 150: OpaqueCheckNode(Compile* C, Node* tst, bool positive) : Node(nullptr, tst), _positive(positive) { `tst` is probably almost always a `BoolNode`. I'm wondering if it could also be a constant because we already folded the `BoolNode`? But then it's probably also useless to create the opaque node in the first place. src/hotspot/share/opto/opaquenode.hpp line 159: > 157: virtual const Type* Value(PhaseGVN* phase) const; > 158: virtual const Type* bottom_type() const { return TypeInt::BOOL; } > 159: bool is_positive() { return _positive; } When going with `_constant`, we could turn this into int constant() const { return _constant ? 1 : 0; } ------------- PR Review: https://git.openjdk.org/jdk/pull/29164#pullrequestreview-3715097474 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2735306919 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2735376625 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2735315675 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2735369034 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2735392835 From duke at openjdk.org Wed Jan 28 10:11:29 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Wed, 28 Jan 2026 10:11:29 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v7] In-Reply-To: <3Mtp1GkFexGb5k5VDb8pUuSrrJmu2cZRaa9fZQTLMZI=.52182522-c1f5-40a1-8a2d-186b0c8ea587@github.com> References: <3Mtp1GkFexGb5k5VDb8pUuSrrJmu2cZRaa9fZQTLMZI=.52182522-c1f5-40a1-8a2d-186b0c8ea587@github.com> Message-ID: On Tue, 27 Jan 2026 11:33:21 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. 
>> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: > > Replace "unpaired surrogates" with "isolated surrogate code units" > > https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G1654 > https://www.unicode.org/charts/PDF/UDC00.pdf /home/runner/work/jdk/jdk/test/jdk/java/lang/Character/Supplementary.java:352: error: incompatible types: String cannot be converted to char[] int n = Character.codePointCount(str); Seeing this error message, users will think that Java is clunky and unhelpful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3810291676 From rriggs at openjdk.org Wed Jan 28 14:12:10 2026 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 28 Jan 2026 14:12:10 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets [v4] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 23:56:36 GMT, Naoto Sato wrote: >> This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. > > Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: > > +00/-00 offsets tests Looks good. ------------- Marked as reviewed by rriggs (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/29393#pullrequestreview-3716922549 From dfenacci at openjdk.org Wed Jan 28 16:10:53 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 28 Jan 2026 16:10:53 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v4] In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. > > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. > > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. 
Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. > > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion Damon Fenacci has updated the pull request incrementally with five additional commits since the last revision: - JDK-8374582: fix indent - JDK-8374582: constant - JDK-8374582: add size_of - JDK-8374852: OpaqueCheck -> OpaqueConstantBool - JDK-8374852: fix number of OpaqueCheck nodes in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29164/files - new: https://git.openjdk.org/jdk/pull/29164/files/0ef73ef9..a3690526 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=02-03 Stats: 162 lines in 16 files changed: 60 ins; 60 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/29164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29164/head:pull/29164 PR: https://git.openjdk.org/jdk/pull/29164 From dfenacci at openjdk.org Wed Jan 28 16:10:57 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 28 Jan 2026 16:10:57 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v3] In-Reply-To: References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: On Wed, 28 Jan 2026 08:13:23 GMT, Christian Hagedorn wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8374852: fix star layout >> >> Co-authored-by: Christian Hagedorn > > src/hotspot/share/opto/opaquenode.hpp line 146: > >> 144: // builds, we keep the actual checks as additional 
verification code (i.e. removing OpaqueCheckNodes and use the >> 145: // BoolNode inputs instead). >> 146: class OpaqueCheckNode : public Node { > > I've also thought about the name. `OpaqueCheck` is already a good indication of what the node is about. Maybe we could go a step further and call it `OpaqueConstantBoolNode` to emphasize more that it belongs to a `BoolNode` whose result we already know. What do you think? > > Then we could also think about changing `_positive` to `_constant` (still can be a boolean to just pass true and false which seems more intuitive than passing in 1 and 0). I still had a doubt about what to put first (`Constant` or `Bool`) but I think `ConstantBool` is actually more correct. I suppose `_constant` is better than `_value` since we use it already. Done. > src/hotspot/share/opto/opaquenode.hpp line 148: > >> 146: class OpaqueCheckNode : public Node { >> 147: private: >> 148: bool _positive; > > Now that we define a field, we also need to override `size_of()` (see for example `OpaqueMultiversioningNode`). Good to know. Thanks! > src/hotspot/share/opto/opaquenode.hpp line 150: > >> 148: bool _positive; >> 149: public: >> 150: OpaqueCheckNode(Compile* C, Node* tst, bool positive) : Node(nullptr, tst), _positive(positive) { > > `tst` is probably almost always a `BoolNode`. I'm wondering if it could also be a constant because we already folded the `BoolNode`? But then it's probably also useless to create the opaque node in the first place. Hmmm... I find it hard to totally exclude a constant (e.g. if its inputs are constant...?). In that case we could skip all the opaque business (I guess in the few places where new `OpaqueConstantBool` nodes are created). On the other hand the opaque node should only really delay the folding...
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2737356877 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2737353309 PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2737355777 From dfenacci at openjdk.org Wed Jan 28 16:23:21 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 28 Jan 2026 16:23:21 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v5] In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. > > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. 
> > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. > > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - JDK-8374582: fix comment layout - JDK-8374582: fix constructor argument name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29164/files - new: https://git.openjdk.org/jdk/pull/29164/files/a3690526..bddec5b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=03-04 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/29164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29164/head:pull/29164 PR: https://git.openjdk.org/jdk/pull/29164 From naoto at openjdk.org Wed Jan 28 18:05:50 2026 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 28 Jan 2026 18:05:50 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v7] In-Reply-To: References: <3Mtp1GkFexGb5k5VDb8pUuSrrJmu2cZRaa9fZQTLMZI=.52182522-c1f5-40a1-8a2d-186b0c8ea587@github.com> Message-ID: On Tue, 27 Jan 2026 13:53:37 GMT, Tatsunori Uchino wrote: > Do we need to remove or 
preserve `Character.codePointCount(char[])`? I think it's inconsistent to have codePointCount(char[]) and codePointCount(CharSequence, int, int) but not codePointCount(CharSequence). We may need to update the CSR. I would rather focus on addressing the use case, not pursuing the consistency. I would not introduce any methods on Character, and just have CharSequence.codePointCount() (and its implementations) for this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3812935919 From vyazici at openjdk.org Wed Jan 28 19:06:27 2026 From: vyazici at openjdk.org (Volkan Yazici) Date: Wed, 28 Jan 2026 19:06:27 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v5] In-Reply-To: References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: On Wed, 28 Jan 2026 16:23:21 GMT, Damon Fenacci wrote: >> ## Issue >> >> This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. >> >> This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. >> >> ## Causes >> >> The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. 
>> >> ## Fix >> >> A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: >> https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 >> This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. >> >> # Testing >> >> * Tier 1-3+ >> * 2 JTReg tests added >> * `TestRangeCheck.java` as regression test for the reported issue >> * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - JDK-8374582: fix comment layout > - JDK-8374582: fix constructor argument name Copyright years don't point to 2026 for the following touched files: src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp src/hotspot/cpu/x86/macroAssembler_x86.cpp src/hotspot/share/classfile/vmIntrinsics.hpp src/hotspot/share/opto/classes.hpp src/hotspot/share/opto/escape.cpp src/hotspot/share/opto/library_call.cpp src/hotspot/share/opto/library_call.hpp src/hotspot/share/opto/loopTransform.cpp src/hotspot/share/opto/loopopts.cpp src/hotspot/share/opto/macro.cpp src/hotspot/share/opto/node.hpp src/hotspot/share/opto/opaquenode.cpp src/hotspot/share/opto/opaquenode.hpp src/hotspot/share/opto/split_if.cpp src/java.base/share/classes/java/lang/String.java src/java.base/share/classes/java/lang/StringCoding.java src/java.base/share/classes/java/lang/System.java src/java.base/share/classes/jdk/internal/access/JavaLangAccess.java 
src/java.base/share/classes/sun/nio/cs/CESU_8.java src/java.base/share/classes/sun/nio/cs/DoubleByte.java src/java.base/share/classes/sun/nio/cs/ISO_8859_1.java src/java.base/share/classes/sun/nio/cs/SingleByte.java src/java.base/share/classes/sun/nio/cs/US_ASCII.java src/java.base/share/classes/sun/nio/cs/UTF_8.java src/jdk.charsets/share/classes/sun/nio/cs/ext/EUC_JP.java.template test/hotspot/jtreg/compiler/escapeAnalysis/TestCanReduceCheckUsersDifferentIfs.java test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java test/hotspot/jtreg/compiler/unsafe/OpaqueAccesses.java ------------- PR Review: https://git.openjdk.org/jdk/pull/29164#pullrequestreview-3718529560 From liach at openjdk.org Wed Jan 28 20:19:40 2026 From: liach at openjdk.org (Chen Liang) Date: Wed, 28 Jan 2026 20:19:40 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v7] In-Reply-To: <3Mtp1GkFexGb5k5VDb8pUuSrrJmu2cZRaa9fZQTLMZI=.52182522-c1f5-40a1-8a2d-186b0c8ea587@github.com> References: <3Mtp1GkFexGb5k5VDb8pUuSrrJmu2cZRaa9fZQTLMZI=.52182522-c1f5-40a1-8a2d-186b0c8ea587@github.com> Message-ID: On Tue, 27 Jan 2026 11:33:21 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? 
> > Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: > > Replace "unpaired surrogates" with "isolated surrogate code units" > > https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G1654 > https://www.unicode.org/charts/PDF/UDC00.pdf I think we can remove `Character.codePointCount(char[])`: users can use `CharBuffer.wrap(char[]).codePointCount()` instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3813712436 From duke at openjdk.org Wed Jan 28 23:05:55 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Wed, 28 Jan 2026 23:05:55 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v8] In-Reply-To: References: Message-ID: <_MAwK_d-zDhdmZ7N7BAdUfvFktIp4wqWCB5qndtLiKI=.8adf2f88-8d6e-47fe-bfc6-60de6d82b069@github.com> > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? 
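For comparison, here is a minimal sketch of how the same whole-string count is spelled with methods that already exist in released JDKs. The class name `CodePointCountToday` and its helper names are illustrative only; the no-argument `codePointCount()` proposed in this PR is deliberately not called, since it is still under review.

```java
// Sketch only: relies on String.codePointCount(int, int) and CharSequence.codePoints(),
// both of which exist today. Class and method names here are hypothetical.
final class CodePointCountToday {
    // Range form: the caller has to repeat the receiver to pass its length.
    static int viaRange(String s) {
        return s.codePointCount(0, s.length());
    }

    // Stream form: produces every code point and then counts them.
    static long viaStream(CharSequence s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (encoded as one surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println("UTF-16 code units: " + s.length());
        System.out.println("code points (range): " + viaRange(s));
        System.out.println("code points (stream): " + viaStream(s));
    }
}
```

Both helpers treat a surrogate pair as a single code point, and an isolated surrogate code unit also counts as one.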
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Fix JavaDoc of `AbstractStringBuilder::codePointCount` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/471b4308..a26398c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=06-07 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Wed Jan 28 23:15:49 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Wed, 28 Jan 2026 23:15:49 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v3] In-Reply-To: References: Message-ID: On Mon, 26 Jan 2026 19:33:10 GMT, Chen Liang wrote: >> Tatsunori Uchino has updated the pull request incrementally with four additional commits since the last revision: >> >> - Update `@bug` in correct file >> - Add default implementation on codePointCount in CharSequence >> - Update `@bug` entries in test class doc comments >> - Discard changes on code whose form is not `str.codePointCount(0, str.length())` > > src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 539: > >> 537: * @return the number of Unicode code points in this String >> 538: * @since 26 >> 539: */ > > Suggestion: > > /** > * @since 27 > */ Why did you strip the JSDoc? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2738998166 From duke at openjdk.org Wed Jan 28 23:25:30 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Wed, 28 Jan 2026 23:25:30 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v9] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? Tatsunori Uchino has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Remove `Character.codePointCount()` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/a26398c9..c9719d4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=07-08 Stats: 26 lines in 3 files changed: 0 ins; 23 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Wed Jan 28 23:28:38 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Wed, 28 Jan 2026 23:28:38 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v9] In-Reply-To: References: Message-ID: <4gihcQXdppCsDHfrXcO21Iu8PfRKO9LgkbWGvOQoxa8=.2e1ae2f8-1fbe-4e2f-a48e-273cdcd48204@github.com> On Wed, 28 Jan 2026 23:25:30 GMT, Tatsunori Uchino wrote: >> Adds 
`codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Remove `Character.codePointCount()` I dropped the _latest_ commit, which contained only a junk change. Sorry for the force-push. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3814452281 From duke at openjdk.org Wed Jan 28 23:34:56 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Wed, 28 Jan 2026 23:34:56 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v10] In-Reply-To: References: Message-ID: <0x9Cl2x8qYLGAs5_rMBy4NYd8NhmXMPBjiK0foy5E14=.54b69e8d-49a6-4036-bcfe-1764c3e48c34@github.com> > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? 
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Fix double empty lines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/c9719d4e..4f80009e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Wed Jan 28 23:40:14 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Wed, 28 Jan 2026 23:40:14 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v11] In-Reply-To: References: Message-ID: <0Ix-aiMznxsuePv7iSAqOId1aV2zTFOsW96OeD2D5cg=.fbd08430-b244-4170-8736-517ce84e08d4@github.com> > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? 
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Update year in copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/4f80009e..2cf8f448 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=09-10 Stats: 7 lines in 7 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Wed Jan 28 23:58:55 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Wed, 28 Jan 2026 23:58:55 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v11] In-Reply-To: <0Ix-aiMznxsuePv7iSAqOId1aV2zTFOsW96OeD2D5cg=.fbd08430-b244-4170-8736-517ce84e08d4@github.com> References: <0Ix-aiMznxsuePv7iSAqOId1aV2zTFOsW96OeD2D5cg=.fbd08430-b244-4170-8736-517ce84e08d4@github.com> Message-ID: On Wed, 28 Jan 2026 23:40:14 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? 
> > Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: > > Update year in copyright "Create sysroot" fails: s390x: Run sudo debootstrap --no-merged-usr --arch=s390x --verbose --include=fakeroot,symlinks,build-essential,libx11-dev,libxext-dev,libxrender-dev,libxrandr-dev,libxtst-dev,libxt-dev,libcups2-dev,libfontconfig1-dev,libasound2-dev,libfreetype-dev,libpng-dev --resolve-deps --variant=minbase bullseye sysroot https://httpredir.debian.org/debian/ W: Cannot check Release signature; keyring file not available /usr/share/keyrings/debian-archive-keyring.gpg I: Retrieving InRelease E: Invalid Release file, no entry for main/binary-s390x/Packages Error: Process completed with exit code 1. ppc64le: Run sudo debootstrap --no-merged-usr --arch=ppc64el --verbose --include=fakeroot,symlinks,build-essential,libx11-dev,libxext-dev,libxrender-dev,libxrandr-dev,libxtst-dev,libxt-dev,libcups2-dev,libfontconfig1-dev,libasound2-dev,libfreetype-dev,libpng-dev --resolve-deps --variant=minbase bullseye sysroot https://httpredir.debian.org/debian/ W: Cannot check Release signature; keyring file not available /usr/share/keyrings/debian-archive-keyring.gpg I: Retrieving InRelease E: Invalid Release file, no entry for main/binary-ppc64el/Packages Error: Process completed with exit code 1. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3814551416 From liach at openjdk.org Thu Jan 29 00:54:11 2026 From: liach at openjdk.org (Chen Liang) Date: Thu, 29 Jan 2026 00:54:11 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jan 2026 23:13:04 GMT, Tatsunori Uchino wrote: >> src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 539: >> >>> 537: * @return the number of Unicode code points in this String >>> 538: * @since 26 >>> 539: */ >> >> Suggestion: >> >> /** >> * @since 27 >> */ > > Why did you strip the JSDoc? By default the docs will be inherited from `CharSequence` except the `@throws` tags. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2739264010 From duke at openjdk.org Thu Jan 29 03:43:24 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Thu, 29 Jan 2026 03:43:24 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v3] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 00:51:21 GMT, Chen Liang wrote: >> Why did you strip the JSDoc? > > By default the docs will be inherited from `CharSequence` except the `@throws` tags. I see. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2739776799 From duke at openjdk.org Thu Jan 29 03:51:15 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Thu, 29 Jan 2026 03:51:15 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v12] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. 
> > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Don't use removed `Character::codePointCount` overload ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/2cf8f448..5df93771 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=10-11 Stats: 6 lines in 2 files changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From dfenacci at openjdk.org Thu Jan 29 07:32:12 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 29 Jan 2026 07:32:12 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v6] In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. 
> > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. > > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. > > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 31 additional commits since the last revision: - JDK-8374582: merge and update copyright year - Merge branch 'master' into JDK-8374582 - Merge branch 'master' into JDK-8374582 - JDK-8374582: fix comment layout - JDK-8374582: fix constructor argument name - JDK-8374582: fix indent - JDK-8374582: constant - JDK-8374582: add size_of - JDK-8374852: OpaqueCheck -> OpaqueConstantBool - JDK-8374852: fix number of OpaqueCheck nodes in test - ... 
and 21 more: https://git.openjdk.org/jdk/compare/7e545381...a587a269 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29164/files - new: https://git.openjdk.org/jdk/pull/29164/files/bddec5b5..a587a269 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=04-05 Stats: 61562 lines in 1281 files changed: 34520 ins; 12116 del; 14926 mod Patch: https://git.openjdk.org/jdk/pull/29164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29164/head:pull/29164 PR: https://git.openjdk.org/jdk/pull/29164 From dfenacci at openjdk.org Thu Jan 29 07:32:12 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 29 Jan 2026 07:32:12 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v5] In-Reply-To: References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: On Wed, 28 Jan 2026 19:03:23 GMT, Volkan Yazici wrote: > Copyright years don't point to 2026 for the following touched files: Right! Fixed. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/29164#issuecomment-3815950950 From duke at openjdk.org Thu Jan 29 10:16:18 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Thu, 29 Jan 2026 10:16:18 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v13] In-Reply-To: References: Message-ID: <8LAUF2YRcQ2lzYfhuAf9CN91Yw02rwx89dh3aNSEw-Y=.a17a103e-82ec-430a-a0e8-008f352fa9c5@github.com> > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? 
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Fix comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/5df93771..8835ab3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=11-12 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Thu Jan 29 10:47:55 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Thu, 29 Jan 2026 10:47:55 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v14] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? Tatsunori Uchino has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 20 commits: - Merge remote-tracking branch 'origin/master' into codepoint-count - Fix comments - Don't use removed `Character::codePointCount` overload - Update year in copyright - Fix double empty lines - Remove `Character.codePointCount()` - Replace "unpaired surrogates" with "isolated surrogate code units" https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G1654 https://www.unicode.org/charts/PDF/UDC00.pdf - Remove `Character.codePointCount` overload - Rename parameter names from `a` to `seq` `chars` is too confusing with `char` - Improve JavaDoc Co-authored-by: Chen Liang - ... and 10 more: https://git.openjdk.org/jdk/compare/681e4ec8...198b3188 ------------- Changes: https://git.openjdk.org/jdk/pull/26461/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=13 Stats: 80 lines in 7 files changed: 67 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From dfenacci at openjdk.org Thu Jan 29 12:42:44 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 29 Jan 2026 12:42:44 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v7] In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). 
Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. > > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. > > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. 
> > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8374582: add flagless to test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29164/files - new: https://git.openjdk.org/jdk/pull/29164/files/a587a269..083d5698 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=05-06 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/29164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29164/head:pull/29164 PR: https://git.openjdk.org/jdk/pull/29164 From liach at openjdk.org Thu Jan 29 22:31:04 2026 From: liach at openjdk.org (Chen Liang) Date: Thu, 29 Jan 2026 22:31:04 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v14] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 10:47:55 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 20 commits: > > - Merge remote-tracking branch 'origin/master' into codepoint-count > - Fix comments > - Don't use removed `Character::codePointCount` overload > - Update year in copyright > - Fix double empty lines > - Remove `Character.codePointCount()` > - Replace "unpaired surrogates" with "isolated surrogate code units" > > https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G1654 > https://www.unicode.org/charts/PDF/UDC00.pdf > - Remove `Character.codePointCount` overload > - Rename parameter names from `a` to `seq` > > `chars` is too confusing with `char` > - Improve JavaDoc > > Co-authored-by: Chen Liang > - ... and 10 more: https://git.openjdk.org/jdk/compare/681e4ec8...198b3188 Please fix this identified thread safety issue. Submitted your patch to our CI for testing. I built the Javadoc locally; the since-only tags work like this, so they are fine: [image] src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 542: > 540: return count; > 541: } > 542: return StringUTF16.codePointCount(value, 0, count); Suggestion: return StringUTF16.codePointCountSB(value, 0, count); In a buggy user program that uses a StringBuilder from more than one thread, we can have `value.length < count`, so we must perform this call checked. I think we need a new entry for this method in `test/jdk/java/lang/StringBuilder/StressSBTest.java` too. ------------- Changes requested by liach (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26461#pullrequestreview-3725332918 PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2743790652 From liach at openjdk.org Thu Jan 29 23:20:12 2026 From: liach at openjdk.org (Chen Liang) Date: Thu, 29 Jan 2026 23:20:12 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v14] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 22:21:41 GMT, Chen Liang wrote: >> Tatsunori Uchino has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: >> >> - Merge remote-tracking branch 'origin/master' into codepoint-count >> - Fix comments >> - Don't use removed `Character::codePointCount` overload >> - Update year in copyright >> - Fix double empty lines >> - Remove `Character.codePointCount()` >> - Replace "unpaired surrogates" with "isolated surrogate code units" >> >> https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G1654 >> https://www.unicode.org/charts/PDF/UDC00.pdf >> - Remove `Character.codePointCount` overload >> - Rename parameter names from `a` to `seq` >> >> `chars` is too confusing with `char` >> - Improve JavaDoc >> >> Co-authored-by: Chen Liang >> - ... and 10 more: https://git.openjdk.org/jdk/compare/681e4ec8...198b3188 > > src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 542: > >> 540: return count; >> 541: } >> 542: return StringUTF16.codePointCount(value, 0, count); > > Suggestion: > > return StringUTF16.codePointCountSB(value, 0, count); > > In buggy user program that use StringBuilder from more than one threads, we can have `value.length < count`, so we must perform this call checked. > > I think we need a new entry for this method in `test/jdk/java/lang/StringBuilder/StressSBTest.java` too. 
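To illustrate the concern in the quoted comment, here is a minimal sketch. It is not the JDK code: it assumes a plain `char[]` buffer instead of the `byte[]` that `AbstractStringBuilder` stores, and the names `CheckedCountSketch` and `checkedCodePointCount` are hypothetical. It validates `count` against the array length before scanning, so a stale `count` observed under a data race fails with an exception up front.

```java
import java.util.Objects;

class CheckedCountSketch {
    // Hypothetical helper (not a JDK method): counts the code points in
    // value[0, count) after checking that the range fits inside the array.
    static int checkedCodePointCount(char[] value, int count) {
        // Throws IndexOutOfBoundsException if count < 0 or count > value.length.
        Objects.checkFromToIndex(0, count, value.length);
        int n = 0;
        for (int i = 0; i < count; i++) {
            n++;
            // A high surrogate followed by a low surrogate forms one code point.
            if (Character.isHighSurrogate(value[i])
                    && i + 1 < count
                    && Character.isLowSurrogate(value[i + 1])) {
                i++;
            }
        }
        return n;
    }
}
```

The up-front check means correctness does not depend on each element access being bounds-checked, which is presumably why the review asks for the checked variant.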
This should be sufficient for StressSBTest: diff --git a/test/jdk/java/lang/StringBuilder/StressSBTest.java b/test/jdk/java/lang/StringBuilder/StressSBTest.java index a5dc6672f07..6219851ee3b 100644 --- a/test/jdk/java/lang/StringBuilder/StressSBTest.java +++ b/test/jdk/java/lang/StringBuilder/StressSBTest.java @@ -1,5 +1,5 @@ /* - * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2025, 2026, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. * * This code is free software; you can redistribute it and/or modify it @@ -279,6 +279,7 @@ private boolean invokeByMethodNameAndType(String name, MethodType mt, StringBuil case "charAt(StringBuilder,int)char" -> sb.charAt(5); case "codePointAt(StringBuilder,int)int" -> sb.codePointAt(4); case "codePointBefore(StringBuilder,int)int" -> sb.codePointBefore(3); + case "codePointCount(StringBuilder)int" -> sb.codePointCount(); case "codePointCount(StringBuilder,int,int)int" -> sb.codePointCount(3, 9); case "offsetByCodePoints(StringBuilder,int,int)int" -> sb.offsetByCodePoints(3, 7); case "lastIndexOf(StringBuilder,String,int)int" -> sb.lastIndexOf("A", 45); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2743920523 From duke at openjdk.org Thu Jan 29 23:20:13 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Thu, 29 Jan 2026 23:20:13 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v14] In-Reply-To: References: Message-ID: <7ewFBUTLHTBhkQscDm-0kI_oZzZhBQHsNwWgyheRXf4=.41edf5e5-ae20-4d97-ab59-11bf3194d637@github.com> On Thu, 29 Jan 2026 23:14:28 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 542: >> >>> 540: return count; >>> 541: } >>> 542: return StringUTF16.codePointCount(value, 0, count); >> >> Suggestion: >> >> return StringUTF16.codePointCountSB(value, 0, count); >> >> In buggy user 
program that use StringBuilder from more than one threads, we can have `value.length < count`, so we must perform this call checked. >> >> I think we need a new entry for this method in `test/jdk/java/lang/StringBuilder/StressSBTest.java` too. > > This should be sufficient for StressSBTest: > > diff --git a/test/jdk/java/lang/StringBuilder/StressSBTest.java b/test/jdk/java/lang/StringBuilder/StressSBTest.java > index a5dc6672f07..6219851ee3b 100644 > --- a/test/jdk/java/lang/StringBuilder/StressSBTest.java > +++ b/test/jdk/java/lang/StringBuilder/StressSBTest.java > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. > + * Copyright (c) 2025, 2026, Oracle and/or its affiliates. All rights reserved. > * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > * > * This code is free software; you can redistribute it and/or modify it > @@ -279,6 +279,7 @@ private boolean invokeByMethodNameAndType(String name, MethodType mt, StringBuil > case "charAt(StringBuilder,int)char" -> sb.charAt(5); > case "codePointAt(StringBuilder,int)int" -> sb.codePointAt(4); > case "codePointBefore(StringBuilder,int)int" -> sb.codePointBefore(3); > + case "codePointCount(StringBuilder)int" -> sb.codePointCount(); > case "codePointCount(StringBuilder,int,int)int" -> sb.codePointCount(3, 9); > case "offsetByCodePoints(StringBuilder,int,int)int" -> sb.offsetByCodePoints(3, 7); > case "lastIndexOf(StringBuilder,String,int)int" -> sb.lastIndexOf("A", 45); Do we not need other additional test cases? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2743926620 From duke at openjdk.org Thu Jan 29 23:20:13 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Thu, 29 Jan 2026 23:20:13 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v14] In-Reply-To: <7ewFBUTLHTBhkQscDm-0kI_oZzZhBQHsNwWgyheRXf4=.41edf5e5-ae20-4d97-ab59-11bf3194d637@github.com> References: <7ewFBUTLHTBhkQscDm-0kI_oZzZhBQHsNwWgyheRXf4=.41edf5e5-ae20-4d97-ab59-11bf3194d637@github.com> Message-ID: On Thu, 29 Jan 2026 23:17:07 GMT, Tatsunori Uchino wrote: >> This should be sufficient for StressSBTest: >> >> diff --git a/test/jdk/java/lang/StringBuilder/StressSBTest.java b/test/jdk/java/lang/StringBuilder/StressSBTest.java >> index a5dc6672f07..6219851ee3b 100644 >> --- a/test/jdk/java/lang/StringBuilder/StressSBTest.java >> +++ b/test/jdk/java/lang/StringBuilder/StressSBTest.java >> @@ -1,5 +1,5 @@ >> /* >> - * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. >> + * Copyright (c) 2025, 2026, Oracle and/or its affiliates. All rights reserved. >> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> * >> * This code is free software; you can redistribute it and/or modify it >> @@ -279,6 +279,7 @@ private boolean invokeByMethodNameAndType(String name, MethodType mt, StringBuil >> case "charAt(StringBuilder,int)char" -> sb.charAt(5); >> case "codePointAt(StringBuilder,int)int" -> sb.codePointAt(4); >> case "codePointBefore(StringBuilder,int)int" -> sb.codePointBefore(3); >> + case "codePointCount(StringBuilder)int" -> sb.codePointCount(); >> case "codePointCount(StringBuilder,int,int)int" -> sb.codePointCount(3, 9); >> case "offsetByCodePoints(StringBuilder,int,int)int" -> sb.offsetByCodePoints(3, 7); >> case "lastIndexOf(StringBuilder,String,int)int" -> sb.lastIndexOf("A", 45); > > Do we not need other additional test cases? Do we not need other additional test cases? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2743926750 From naoto at openjdk.org Thu Jan 29 23:32:57 2026 From: naoto at openjdk.org (Naoto Sato) Date: Thu, 29 Jan 2026 23:32:57 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v14] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 10:47:55 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: > > - Merge remote-tracking branch 'origin/master' into codepoint-count > - Fix comments > - Don't use removed `Character::codePointCount` overload > - Update year in copyright > - Fix double empty lines > - Remove `Character.codePointCount()` > - Replace "unpaired surrogates" with "isolated surrogate code units" > > https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-3/#G1654 > https://www.unicode.org/charts/PDF/UDC00.pdf > - Remove `Character.codePointCount` overload > - Rename parameter names from `a` to `seq` > > `chars` is too confusing with `char` > - Improve JavaDoc > > Co-authored-by: Chen Liang > - ... and 10 more: https://git.openjdk.org/jdk/compare/681e4ec8...198b3188 src/java.base/share/classes/java/lang/String.java line 1886: > 1884: * {@return the number of Unicode code points in this String} > 1885: * Isolated surrogate code units count as one code point each. > 1886: * I think this can be deleted as well. 
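The counting rule in the quoted Javadoc can be observed today with the existing ranged overload. The sketch below assumes only `Character.codePointCount(CharSequence, int, int)`; the class `CodePointCountRule` and the helper `countAll` are hypothetical names, standing in for what the proposed no-argument method is described as returning.

```java
class CodePointCountRule {
    // Hypothetical helper: whole-sequence count via the existing ranged overload.
    static int countAll(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    public static void main(String[] args) {
        // 'a', one supplementary code point (a surrogate pair), 'b'.
        System.out.println(countAll("a\uD83D\uDE00b"));
        // A lone high surrogate counts as one code point.
        System.out.println(countAll("\uD83D"));
        // Low surrogate followed by high surrogate: not a pair, so two code points.
        System.out.println(countAll("\uDE00\uD83D"));
    }
}
```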
test/jdk/java/lang/StringBuilder/Supplementary.java line 218: > 216: > 217: /** > 218: * Test codePointCount(int, int) & codePointCount() I think a separate test method needs to be added solely for `codePointCount()` to follow the pattern in this test file. Bonus if you could rename each `testX` to `testMethodName` (This test seems to have a lot of room for refactoring, but they are for another time) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2743905940 PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2743956162 From chagedorn at openjdk.org Fri Jan 30 10:47:17 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 Jan 2026 10:47:17 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v7] In-Reply-To: References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: On Thu, 29 Jan 2026 12:42:44 GMT, Damon Fenacci wrote: >> ## Issue >> >> This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. >> >> This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. >> >> ## Causes >> >> The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. 
>> >> ## Fix >> >> A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: >> https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 >> This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. >> >> # Testing >> >> * Tier 1-3+ >> * 2 JTReg tests added >> * `TestRangeCheck.java` as regression test for the reported issue >> * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8374582: add flagless to test src/hotspot/share/opto/library_call.cpp line 894: > 892: > 893: inline Node* LibraryCallKit::generate_negative_guard(Node* index, RegionNode* region, > 894: Node* *pos_index, bool is_opaque) { Suggestion: Node** pos_index, bool is_opaque) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2745638291 From chagedorn at openjdk.org Fri Jan 30 10:47:19 2026 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 Jan 2026 10:47:19 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v3] In-Reply-To: References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: On Wed, 28 Jan 2026 16:02:56 GMT, Damon Fenacci wrote: >> src/hotspot/share/opto/opaquenode.hpp line 150: >> >>> 148: bool _positive; >>> 149: public: >>> 150: OpaqueCheckNode(Compile* C, Node* tst, bool positive) : Node(nullptr, tst), _positive(positive) { 
>> >> `tst` is probably almost always a `BoolNode`. I'm wondering if it could also be a constant because we already folded the `BoolNode`? But then it's probably also useless to create the opaque node in the first place. > > Hmmm... I find it hard to totally exclude a constant (e.g. if its inputs are constant...?). In that case we could skip all the opaque business (I guess in the few places where new `OpaqueConstantBool` nodes are created). On the other hand the opaque node should only really delay the folding... ? I think folding is fine since we implement `Value()` to take the input's `Value()`. My understanding is that we insert an additional check that is actually not needed because we already checked it in Java code. So, it should be true at that point but C2 does not know that. We still insert the check in order to make sure to also fold control away if data is dying. Once we know that data will not die anymore, we can remove the useless check again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29164#discussion_r2745545348 From dfenacci at openjdk.org Fri Jan 30 13:56:01 2026 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 30 Jan 2026 13:56:01 GMT Subject: RFR: 8374582: [REDO] Move input validation checks to Java for java.lang.StringCoding intrinsics [v8] In-Reply-To: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> References: <3ci9RXEra2BlQPhYl-M0Wnu3hRpWaDvxPnMRzFnJA_k=.67795fb3-95d1-449b-a7a9-44b3776aa626@github.com> Message-ID: > ## Issue > > This is a redo of [JDK-8361842](https://bugs.openjdk.org/browse/JDK-8361842) which was backed out by [JDK-8374210](https://bugs.openjdk.org/browse/JDK-8374210) due to C2-related regressions. The original change moved input validation checks for java.lang.StringCoding from the intrinsic to Java code (leaving the intrinsic check only with the `VerifyIntrinsicChecks` flag). 
Refer to the [original PR](https://github.com/openjdk/jdk/pull/25998) for details. > > This additional issue happens because, in some cases, for instance when the Java checking code is not inlined and we give an out-of-range constant as input, we fold the data path but not the control path and we crash in the backend. > > ## Causes > > The cause of this is that the out-of-range constant (e.g. -1) floats into the intrinsic and there (assuming the input is valid) we add a constraint to its type to positive integers (e.g. to compute the array address) which makes it top. > > ## Fix > > A possible fix is to introduce an opaque node (OpaqueGuardNode) similar to what we do in `must_be_not_null` for values that we know cannot be null: > https://github.com/openjdk/jdk/blob/ce721665cd61d9a319c667d50d9917c359d6c104/src/hotspot/share/opto/graphKit.cpp#L1484 > This will temporarily add the range check to ensure that C2 figures that out-of-range values cannot reach the intrinsic. Then, during macro expansion, we replace the opaque node with the corresponding constant (true/false) in product builds such that the actually unneeded guards are folded and do not end up in the emitted code. 
> > # Testing > > * Tier 1-3+ > * 2 JTReg tests added > * `TestRangeCheck.java` as regression test for the reported issue > * `TestOpaqueGuardNodes.java` to check that opaque guard nodes are added when parsing and removed at macro expansion Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8374582: fix star layout Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29164/files - new: https://git.openjdk.org/jdk/pull/29164/files/083d5698..c5390e4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29164&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/29164.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29164/head:pull/29164 PR: https://git.openjdk.org/jdk/pull/29164 From cushon at openjdk.org Fri Jan 30 14:32:19 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Fri, 30 Jan 2026 14:32:19 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v16] In-Reply-To: References: Message-ID: <3tyFN2PROU4kkRO5MICX8E01SajaBatOT7vK4Bv-h7Q=.bff53067-5443-4c3b-a1d4-e7708b9d8af1@github.com> > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ± 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ± 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ± 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ± 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ±
3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ± 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ± 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ± 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ± 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ± 4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ± 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ± 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ± 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ± 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ± 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ± 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ± 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ± 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ± 3839233.582 ops/s > StringLoopJmhBenchmark.ge...
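For context on what the benchmark rows compare: `getBytes(charset).length` materializes the encoded array only to read its length. The sketch below is an assumption about how an allocation-free count could look for UTF-8 only. It is not the implementation in the PR, and `Utf8LengthSketch`, `utf8Length`, and `viaGetBytes` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthSketch {
    // Hypothetical helper: number of bytes the string would occupy in UTF-8,
    // computed without allocating the encoded array. Uses int, so it can
    // overflow for extremely long strings.
    static int utf8Length(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                len += 1;
            } else if (c < 0x800) {
                len += 2;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                len += 4; // a surrogate pair encodes one supplementary code point
                i++;
            } else if (Character.isSurrogate(c)) {
                // Assumption: mirrors the single-byte replacement that
                // String.getBytes substitutes for malformed input.
                len += 1;
            } else {
                len += 3;
            }
        }
        return len;
    }

    // Baseline for comparison: encode, then read the array length.
    static int viaGetBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Both methods should agree on well-formed input; only the second allocates a `byte[]` proportional to the string length.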
Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Update javadoc to refer to 'this {@code String}', not 'the given String' ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/51bf1510..6acd1191 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From duke at openjdk.org Fri Jan 30 15:20:03 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Fri, 30 Jan 2026 15:20:03 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v15] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? 
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Use `codePointCountSB` and add its test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/198b3188..585ce36e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=13-14 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Fri Jan 30 15:24:54 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Fri, 30 Jan 2026 15:24:54 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v14] In-Reply-To: References: Message-ID: On Thu, 29 Jan 2026 23:29:40 GMT, Naoto Sato wrote: > Bonus if you could rename each testX to testMethodName "`testCodePointCount`" has already been occupied: https://github.com/tats-u/jdk/blob/585ce36e125ec2f79483311512d8789ff39b0df9/test/jdk/java/lang/StringBuilder/Supplementary.java#L365-L376 https://github.com/tats-u/jdk/blob/585ce36e125ec2f79483311512d8789ff39b0df9/test/jdk/java/lang/StringBuilder/Supplementary.java#L247-L251 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2746778790 From liach at openjdk.org Fri Jan 30 15:40:36 2026 From: liach at openjdk.org (Chen Liang) Date: Fri, 30 Jan 2026 15:40:36 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v15] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 15:20:03 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. 
>> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: > > Use `codePointCountSB` and add its test Since we are touching `CharSequence`, we might need to revise `CharBuffer` (through `X-Buffer.java.template`) to specify `codePointCount` is a relative operation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26461#issuecomment-3824360182 From cushon at openjdk.org Fri Jan 30 15:56:20 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Fri, 30 Jan 2026 15:56:20 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17] In-Reply-To: References: Message-ID: > This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. > > --- > > > Benchmark (encoding) (stringLength) Mode Cnt Score Error Units > StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ± 16960032.852 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ± 4532029.201 ops/s > StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ± 2413274.766 ops/s > StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ± 12818.317 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ± 3962947.391 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ± 1267331.434 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ± 41718.545 ops/s > StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ± 10296.120 ops/s > StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ± 8326589.199 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ±
4808526.773 ops/s > StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ± 70636.784 ops/s > StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ± 1887.112 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ± 76938927.141 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ± 15167234.884 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ± 794027.897 ops/s > StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ± 1916.468 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ± 17013983.056 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ± 5842185.781 ops/s > StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ± 3839233.582 ops/s > StringLoopJmhBenchmark.ge... Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: Rename getBytesLength to getByteLength ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28454/files - new: https://git.openjdk.org/jdk/pull/28454/files/6acd1191..f23b5c24 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28454&range=15-16 Stats: 15 lines in 5 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28454/head:pull/28454 PR: https://git.openjdk.org/jdk/pull/28454 From cushon at openjdk.org Fri Jan 30 15:56:21 2026 From: cushon at openjdk.org (Liam Miller-Cushon) Date: Fri, 30 Jan 2026 15:56:21 GMT Subject: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v16] In-Reply-To: <3tyFN2PROU4kkRO5MICX8E01SajaBatOT7vK4Bv-h7Q=.bff53067-5443-4c3b-a1d4-e7708b9d8af1@github.com> References: <3tyFN2PROU4kkRO5MICX8E01SajaBatOT7vK4Bv-h7Q=.bff53067-5443-4c3b-a1d4-e7708b9d8af1@github.com> Message-ID: On Fri, 30
Jan 2026 14:32:19 GMT, Liam Miller-Cushon wrote: >> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background. >> >> --- >> >> >> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units >> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ± 16960032.852 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ± 4532029.201 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ± 2413274.766 ops/s >> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ± 12818.317 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ± 3962947.391 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ± 1267331.434 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ± 41718.545 ops/s >> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ± 10296.120 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ± 8326589.199 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ± 4808526.773 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ± 70636.784 ops/s >> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ± 1887.112 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ± 76938927.141 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ± 15167234.884 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ± 794027.897 ops/s >> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ± 1916.468 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ± 17013983.056 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ± 5842185.781 ops/s >> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.1...
> > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Update javadoc to refer to 'this {@code String}', not 'the given String' I have made some updates including to the CSR [JDK-8375318](https://bugs.openjdk.org/browse/JDK-8375318) * a Javadoc oversight: "the given String" was updated to "this {@code String}" (the string being operated on is the current instance, not an input to the method). * the method name: I now prefer the name `getByteLength`, which was raised by @jddarcy in the CSR. I think `getBytesLength` is also OK. I updated the CSR to discuss some pros and cons * whether the method should return `int` or `long`. I think that `int` (as originally proposed) is better, but have updated the CSR to discuss that alternative in more detail. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3824421699 From naoto at openjdk.org Fri Jan 30 16:09:57 2026 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 30 Jan 2026 16:09:57 GMT Subject: RFR: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets [v4] In-Reply-To: References: Message-ID: On Tue, 27 Jan 2026 23:56:36 GMT, Naoto Sato wrote: >> This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. > > Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: > > +00/-00 offsets tests Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/29393#issuecomment-3824486290 From naoto at openjdk.org Fri Jan 30 16:13:38 2026 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 30 Jan 2026 16:13:38 GMT Subject: Integrated: 8210336: DateTimeFormatter predefined formatters should support short time zone offsets In-Reply-To: References: Message-ID: On Fri, 23 Jan 2026 20:34:24 GMT, Naoto Sato wrote: > This PR is a follow-on fix to [JDK-8032051](https://bugs.openjdk.org/browse/JDK-8032051), where it allowed short offsets only for ZonedDateTime parsing. This fix allows all predefined ISO formatters to accept short offsets. A corresponding CSR has been drafted. This pull request has now been integrated. Changeset: c1c543cc Author: Naoto Sato URL: https://git.openjdk.org/jdk/commit/c1c543cc81b4b73ebf228fb817227309b0cff990 Stats: 80 lines in 3 files changed: 76 ins; 1 del; 3 mod 8210336: DateTimeFormatter predefined formatters should support short time zone offsets Reviewed-by: jlu, rriggs ------------- PR: https://git.openjdk.org/jdk/pull/29393 From naoto at openjdk.org Fri Jan 30 16:23:15 2026 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 30 Jan 2026 16:23:15 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v14] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 15:21:39 GMT, Tatsunori Uchino wrote: >> test/jdk/java/lang/StringBuilder/Supplementary.java line 218: >> >>> 216: >>> 217: /** >>> 218: * Test codePointCount(int, int) & codePointCount() >> >> I think a separate test method needs to be added solely for `codePointCount()` to follow the pattern in this test file. 
Bonus if you could rename each `testX` to `testMethodName` >> (This test seems to have a lot of room for refactoring, but they are for another time) > >> Bonus if you could rename each testX to testMethodName > > "`testCodePointCount`" has already been occupied: > > https://github.com/tats-u/jdk/blob/585ce36e125ec2f79483311512d8789ff39b0df9/test/jdk/java/lang/StringBuilder/Supplementary.java#L365-L376 > > https://github.com/tats-u/jdk/blob/585ce36e125ec2f79483311512d8789ff39b0df9/test/jdk/java/lang/StringBuilder/Supplementary.java#L247-L251 We could distinguish them like `testCodePointCountNoArg` and `testCodePointCountTwoArgs` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2747019135 From duke at openjdk.org Sat Jan 31 08:15:55 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Sat, 31 Jan 2026 08:15:55 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v16] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? 
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Add `codePointCount` for `CharBuffer` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/585ce36e..a291e1bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=14-15 Stats: 19 lines in 1 file changed: 19 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From alanb at openjdk.org Sat Jan 31 08:51:10 2026 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 31 Jan 2026 08:51:10 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v16] In-Reply-To: References: Message-ID: On Sat, 31 Jan 2026 08:15:55 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: > > Add `codePointCount` for `CharBuffer` src/java.base/share/classes/java/nio/X-Buffer.java.template line 2062: > 2060: /** > 2061: * {@return the number of Unicode code points in this character sequence} > 2062: * Isolated surrogate code units count as one code point each. I agree the override needs to be specified but it will need to be specified to count the code points between the position (inclusive) and the limit (exclusive).
src/java.base/share/classes/java/nio/X-Buffer.java.template line 2070: > 2068: int lim = limit(); > 2069: int count = l; > 2070: for (int i = position(); i < lim;) { There will need to be a robustness pass done on this override as the CharBuffer may be backed by off-heap memory. Look at the existing overrides to see examples where it captures the limit and position once. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2749219147 PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2749219908 From duke at openjdk.org Sat Jan 31 12:51:41 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Sat, 31 Jan 2026 12:51:41 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v17] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change?
Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Fix incorrect logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/a291e1bb..77583bdc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=15-16 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Sat Jan 31 12:51:43 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Sat, 31 Jan 2026 12:51:43 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v16] In-Reply-To: References: Message-ID: On Sat, 31 Jan 2026 08:48:09 GMT, Alan Bateman wrote: >> Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: >> >> Add `codePointCount` for `CharBuffer` > > src/java.base/share/classes/java/nio/X-Buffer.java.template line 2070: > >> 2068: int lim = limit(); >> 2069: int count = l; >> 2070: for (int i = position(); i < lim;) { > > There will need to be a robustness pass done on this override as the CharBuffer may be backed by off-heap memory. Look at the existing overrides to see examples where it captures the limit and position once. In the first place the logic turned out to be wrong. Could you give me a more concrete comment based on the new change?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2749529891 From duke at openjdk.org Sat Jan 31 13:34:49 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Sat, 31 Jan 2026 13:34:49 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v18] In-Reply-To: References: Message-ID: <56CnxfQ7B7MlL4-tu2Cs-J1gmwAc0OGKM72ApDqIRmk=.72d806d0-4b0c-44c5-b310-b628c16b24fd@github.com> > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. > > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? Tatsunori Uchino has updated the pull request incrementally with three additional commits since the last revision: - Split testcases for `StringBuilder.codePointCount` - Update year - Improve JavaDoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/77583bdc..cabb3efe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=16-17 Stats: 31 lines in 3 files changed: 19 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From duke at openjdk.org Sat Jan 31 13:34:52 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Sat, 31 Jan 2026 13:34:52 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v16] In-Reply-To: References: Message-ID: On Sat, 31 Jan 2026 08:46:52 GMT, Alan Bateman wrote: >> Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: >> >> 
Add `codePointCount` for `CharBuffer` > > src/java.base/share/classes/java/nio/X-Buffer.java.template line 2062: > >> 2060: /** >> 2061: * {@return the number of Unicode code points in this character sequence} >> 2062: * Isolated surrogate code units count as one code point each. > > I agree the override needs to be specified but it will need to be specified to count the code points between the position (inclusive) and the limit (exclusive). JavaDoc updated ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2749563799 From duke at openjdk.org Sat Jan 31 13:34:54 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Sat, 31 Jan 2026 13:34:54 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v14] In-Reply-To: References: Message-ID: On Fri, 30 Jan 2026 16:20:13 GMT, Naoto Sato wrote: >>> Bonus if you could rename each testX to testMethodName >> >> "`testCodePointCount`" has already been occupied: >> >> https://github.com/tats-u/jdk/blob/585ce36e125ec2f79483311512d8789ff39b0df9/test/jdk/java/lang/StringBuilder/Supplementary.java#L365-L376 >> >> https://github.com/tats-u/jdk/blob/585ce36e125ec2f79483311512d8789ff39b0df9/test/jdk/java/lang/StringBuilder/Supplementary.java#L247-L251 > > We could distinguish them like `testCodePointCountNoArg` and `testCodePointCountTwoArgs` I see ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2749563375 From duke at openjdk.org Sat Jan 31 14:31:39 2026 From: duke at openjdk.org (Tatsunori Uchino) Date: Sat, 31 Jan 2026 14:31:39 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v19] In-Reply-To: References: Message-ID: > Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. 
> > > if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { > throw new Exception("exceeding length"); > } > > > Is a CSR required to this change? Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: Add missing semicolon ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26461/files - new: https://git.openjdk.org/jdk/pull/26461/files/cabb3efe..81245159 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26461&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/26461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26461/head:pull/26461 PR: https://git.openjdk.org/jdk/pull/26461 From alanb at openjdk.org Sat Jan 31 15:59:07 2026 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 31 Jan 2026 15:59:07 GMT Subject: RFR: 8364007: Add no-argument codePointCount method to CharSequence and String [v19] In-Reply-To: References: Message-ID: On Sat, 31 Jan 2026 14:31:39 GMT, Tatsunori Uchino wrote: >> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks. >> >> >> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) { >> throw new Exception("exceeding length"); >> } >> >> >> Is a CSR required to this change? > > Tatsunori Uchino has updated the pull request incrementally with one additional commit since the last revision: > > Add missing semicolon src/java.base/share/classes/java/nio/X-Buffer.java.template line 2061: > 2059: > 2060: /** > 2061: * {@return the number of Unicode code points in this character sequence Can you change the first sentence to : "{@return the number of Unicode code points in this character buffer}". 
The rest can go into a second paragraph that starts with "The number of Unicode code points in this character buffer is the number of Unicode code points between the position (inclusive) and the limit (exclusive).". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2749697267
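
The review comments above ask that the `CharBuffer` override count code points between the position (inclusive) and the limit (exclusive), and that it capture the position and limit once. A minimal sketch of that behaviour follows. It is not the code in the pull request: the class `CharBufferCodePointCountSketch` and the helper `countCodePoints` are hypothetical names chosen for illustration, and the sketch relies only on the existing public `CharBuffer.get(int)` and `Character.isHighSurrogate`/`isLowSurrogate` methods.

```java
import java.nio.CharBuffer;

// Illustrative only: a hypothetical helper, not the override proposed in the PR.
class CharBufferCodePointCountSketch {

    // Counts code points in the buffer's remaining content, that is, from
    // position (inclusive) to limit (exclusive). Isolated surrogates count
    // as one code point each.
    static int countCodePoints(CharBuffer buf) {
        // Capture position and limit once so the loop works on a fixed range.
        final int pos = buf.position();
        final int lim = buf.limit();
        int count = lim - pos; // start from the number of UTF-16 code units
        for (int i = pos; i < lim - 1; i++) {
            // The absolute get(int) reads by buffer index and does not move the position.
            if (Character.isHighSurrogate(buf.get(i))
                    && Character.isLowSurrogate(buf.get(i + 1))) {
                count--; // a well-formed surrogate pair is a single code point
                i++;     // skip the low surrogate of the pair
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', one supplementary character (a surrogate pair), 'b', one isolated high surrogate
        CharBuffer buf = CharBuffer.wrap("a\uD83D\uDE00b\uD800");
        System.out.println(countCodePoints(buf)); // prints 4
        buf.position(2); // now starts in the middle of the surrogate pair
        System.out.println(countCodePoints(buf)); // prints 3
    }
}
```

For a buffer whose position is 0, the result matches `String.codePointCount(0, length())` on the same characters; once the position moves, only the remaining characters are counted, which is the semantics the reviewer asks the Javadoc to spell out.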