From naoto at openjdk.org Tue Apr 1 16:26:22 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 1 Apr 2025 16:26:22 GMT Subject: Integrated: 8353118: Deprecate the use of `java.locale.useOldISOCodes` system property In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 20:17:30 GMT, Naoto Sato wrote: > Proposing to remove the `java.locale.useOldISOCodes` system property. This property is for backward compatibility introduced back in JDK17 and I believe it is now fine to remove it. In this PR targeting JDK25, it emits a deprecate-for-removal warning on startup if the system property is set to true (no behavioral change except the warning). The plan is eventually to remove it after JDK25. A corresponding CSR has been drafted. This pull request has now been integrated. Changeset: 564066d5 Author: Naoto Sato URL: https://git.openjdk.org/jdk/commit/564066d549cf4ec7608f57ea4910b5813f7353c3 Stats: 23 lines in 3 files changed: 11 ins; 1 del; 11 mod 8353118: Deprecate the use of `java.locale.useOldISOCodes` system property Reviewed-by: iris, jlu ------------- PR: https://git.openjdk.org/jdk/pull/24302 From naoto at openjdk.org Tue Apr 1 16:26:22 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 1 Apr 2025 16:26:22 GMT Subject: RFR: 8353118: Deprecate the use of `java.locale.useOldISOCodes` system property In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 20:17:30 GMT, Naoto Sato wrote: > Proposing to remove the `java.locale.useOldISOCodes` system property. This property is for backward compatibility introduced back in JDK17 and I believe it is now fine to remove it. In this PR targeting JDK25, it emits a deprecate-for-removal warning on startup if the system property is set to true (no behavioral change except the warning). The plan is eventually to remove it after JDK25. A corresponding CSR has been drafted. Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24302#issuecomment-2769911047 From jlu at openjdk.org Tue Apr 1 16:52:19 2025 From: jlu at openjdk.org (Justin Lu) Date: Tue, 1 Apr 2025 16:52:19 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate Message-ID: Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. ------------- Commit messages: - call 2-arg parse in example - init Changes: https://git.openjdk.org/jdk/pull/24361/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24361&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353322 Stats: 20 lines in 1 file changed: 17 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24361.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24361/head:pull/24361 PR: https://git.openjdk.org/jdk/pull/24361 From naoto at openjdk.org Tue Apr 1 18:22:15 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 1 Apr 2025 18:22:15 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:45:26 GMT, Justin Lu wrote: > Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. src/java.base/share/classes/java/text/ChoiceFormat.java line 571: > 569: * {@snippet lang=java : > 570: * var fmt = new ChoiceFormat("0#foo|1#bar|2#baz"); > 571: * fmt.parse("baz", new ParsePosition(0)); // returns 2 This returns `2.0`? 
src/java.base/share/classes/java/text/ChoiceFormat.java line 576: > 574: * > 575: * @implNote The {@code Number} subtype returned by the JDK reference > 576: * implementation of this method is always {@code Double}. Do we need to use `@implNote` here? Since choices are `double`s (as in the class description), I think we can safely say this returns a `Double` as in normative text. If some implementation returns an `Integer`, I think it is a bug. Returning a `Double.NaN` for no-match may be considered implNote though (one might throw an exception). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24361#discussion_r2023404649 PR Review Comment: https://git.openjdk.org/jdk/pull/24361#discussion_r2023462230 From jlu at openjdk.org Tue Apr 1 19:04:25 2025 From: jlu at openjdk.org (Justin Lu) Date: Tue, 1 Apr 2025 19:04:25 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v2] In-Reply-To: References: Message-ID: > Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. 
Justin Lu has updated the pull request incrementally with one additional commit since the last revision: reflect Naoto's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24361/files - new: https://git.openjdk.org/jdk/pull/24361/files/faaa9b9c..24d57bb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24361&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24361&range=00-01 Stats: 10 lines in 1 file changed: 0 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24361.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24361/head:pull/24361 PR: https://git.openjdk.org/jdk/pull/24361 From jlu at openjdk.org Tue Apr 1 19:04:25 2025 From: jlu at openjdk.org (Justin Lu) Date: Tue, 1 Apr 2025 19:04:25 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 18:17:12 GMT, Naoto Sato wrote: >> Justin Lu has updated the pull request incrementally with one additional commit since the last revision: >> >> reflect Naoto's review > > src/java.base/share/classes/java/text/ChoiceFormat.java line 576: > >> 574: * >> 575: * @implNote The {@code Number} subtype returned by the JDK reference >> 576: * implementation of this method is always {@code Double}. > > Do we need to use `@implNote` here? Since choices are `double`s (as in the class description), I think we can safely say this returns a `Double` as in normative text. If some implementation returns an `Integer`, I think it is a bug. Returning a `Double.NaN` for no-match may be considered implNote though (one might throw an exception). I was either way on the `implNote`, since I thought an implementation could decide to normalize a double limit to an integral type. However that's probably unlikely and I agree the wording can be fine as normative since ChoiceFormat is composed of doubles. I think it's best to make returning Double.NaN normative (i.e. 
not allow flexibility for throwing an exception). The `NumberFormat.parse(String, ParsePosition)` methods return a failure value instead of throwing like `parse(String)` does. (E.g. DecimalFormat returns null on failed parse for 2 arg parse.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24361#discussion_r2023523992 From naoto at openjdk.org Tue Apr 1 19:42:20 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 1 Apr 2025 19:42:20 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v2] In-Reply-To: References: Message-ID: <_wstnNEYpUPVZk5cU_nvJjetseFzPNBAJLohGEAawGA=.965b06ff-d215-440c-b3bd-489244947550@github.com> On Tue, 1 Apr 2025 19:04:25 GMT, Justin Lu wrote: >> Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > reflect Naoto's review I am OK with returning `Double.NaN` as normative. I believe the risk is quite low, and it would be only a conformance issue (no practical problem will arise) src/java.base/share/classes/java/text/ChoiceFormat.java line 564: > 562: * {@code Double}. The value returned is the {@code limit} corresponding > 563: * to the {@code format} that is the longest substring of the input text. > 564: * Matching is done in ascending order, when multiple {@code formats} match Nit: {@code format}s src/java.base/share/classes/java/text/ChoiceFormat.java line 584: > 582: * first index of the character that caused the parse to fail. > 583: * @return A Number which represents the {@code limit} corresponding to the {@code > 584: * format} parsed. 
We could clarify the no match case with `Double.NaN` here too ------------- PR Review: https://git.openjdk.org/jdk/pull/24361#pullrequestreview-2733870693 PR Review Comment: https://git.openjdk.org/jdk/pull/24361#discussion_r2023570395 PR Review Comment: https://git.openjdk.org/jdk/pull/24361#discussion_r2023571566 From jlu at openjdk.org Tue Apr 1 20:32:45 2025 From: jlu at openjdk.org (Justin Lu) Date: Tue, 1 Apr 2025 20:32:45 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v3] In-Reply-To: References: Message-ID: > Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. Justin Lu has updated the pull request incrementally with one additional commit since the last revision: Address further comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24361/files - new: https://git.openjdk.org/jdk/pull/24361/files/24d57bb2..d3864418 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24361&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24361&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24361.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24361/head:pull/24361 PR: https://git.openjdk.org/jdk/pull/24361 From jlu at openjdk.org Tue Apr 1 20:37:07 2025 From: jlu at openjdk.org (Justin Lu) Date: Tue, 1 Apr 2025 20:37:07 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v2] In-Reply-To: <_wstnNEYpUPVZk5cU_nvJjetseFzPNBAJLohGEAawGA=.965b06ff-d215-440c-b3bd-489244947550@github.com> References: <_wstnNEYpUPVZk5cU_nvJjetseFzPNBAJLohGEAawGA=.965b06ff-d215-440c-b3bd-489244947550@github.com> Message-ID: On Tue, 1 Apr 
2025 19:36:04 GMT, Naoto Sato wrote: >> Justin Lu has updated the pull request incrementally with one additional commit since the last revision: >> >> reflect Naoto's review > > src/java.base/share/classes/java/text/ChoiceFormat.java line 564: > >> 562: * {@code Double}. The value returned is the {@code limit} corresponding >> 563: * to the {@code format} that is the longest substring of the input text. >> 564: * Matching is done in ascending order, when multiple {@code formats} match > > Nit: {@code format}s Sounds good. Addressed the conformance issue possibility in the CSR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24361#discussion_r2023639927 From naoto at openjdk.org Tue Apr 1 22:56:12 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 1 Apr 2025 22:56:12 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v3] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 20:32:45 GMT, Justin Lu wrote: >> Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Address further comments LGTM ------------- Marked as reviewed by naoto (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24361#pullrequestreview-2734200809 From alanb at openjdk.org Wed Apr 2 10:07:19 2025 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 2 Apr 2025 10:07:19 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v3] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 20:32:45 GMT, Justin Lu wrote: >> Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. 
A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Address further comments src/java.base/share/classes/java/text/ChoiceFormat.java line 562: > 560: /** > 561: * Parses a {@code Number} from the input text, the subtype of which is always > 562: * {@code Double}. The value returned is the {@code limit} corresponding I wonder if we could improve the first sentence, e.g. "Parses the input text from the parse position as a Double" ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24361#discussion_r2024515574 From jlu at openjdk.org Wed Apr 2 19:46:12 2025 From: jlu at openjdk.org (Justin Lu) Date: Wed, 2 Apr 2025 19:46:12 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v4] In-Reply-To: References: Message-ID: > Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. 
Justin Lu has updated the pull request incrementally with one additional commit since the last revision: Alan's review - Improve first sentence ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24361/files - new: https://git.openjdk.org/jdk/pull/24361/files/d3864418..0ffdef97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24361&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24361&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24361.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24361/head:pull/24361 PR: https://git.openjdk.org/jdk/pull/24361 From jlu at openjdk.org Wed Apr 2 19:46:13 2025 From: jlu at openjdk.org (Justin Lu) Date: Wed, 2 Apr 2025 19:46:13 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v3] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:05:04 GMT, Alan Bateman wrote: >> Justin Lu has updated the pull request incrementally with one additional commit since the last revision: >> >> Address further comments > > src/java.base/share/classes/java/text/ChoiceFormat.java line 562: > >> 560: /** >> 561: * Parses a {@code Number} from the input text, the subtype of which is always >> 562: * {@code Double}. The value returned is the {@code limit} corresponding > > I wonder if we could improve the first sentence, e.g. "Parses the input text from the parse position as a Double" ? Right, I think we can make the sub-type wording simplification and should mention `ParsePosition`'s role in the method. 
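
For readers following the thread, the behaviour being specified can be sketched as below. This is an illustration only, assuming the JDK reference implementation; the class name `ChoiceParseSketch` and the helper `parseFromStart` are hypothetical and do not come from the PR.

```java
import java.text.ChoiceFormat;
import java.text.ParsePosition;

public class ChoiceParseSketch {
    // Hypothetical helper: parses `text` with the pattern quoted in the review,
    // starting at whatever index `pos` currently holds.
    static Number parseFromStart(String text, ParsePosition pos) {
        ChoiceFormat fmt = new ChoiceFormat("0#foo|1#bar|2#baz");
        return fmt.parse(text, pos);
    }

    public static void main(String[] args) {
        // Match: the limit for "baz" comes back as a Double and the index advances.
        ParsePosition hit = new ParsePosition(0);
        Number matched = parseFromStart("baz", hit);
        System.out.println(matched + " " + matched.getClass().getSimpleName()
                + " index=" + hit.getIndex());

        // No match: Double.NaN comes back, the index stays put and the
        // error index records where the parse failed.
        ParsePosition miss = new ParsePosition(0);
        Number unmatched = parseFromStart("qux", miss);
        System.out.println(unmatched + " index=" + miss.getIndex()
                + " errorIndex=" + miss.getErrorIndex());
    }
}
```

On the reference implementation this should print `2.0 Double index=3` followed by `NaN index=0 errorIndex=0`, which is the "returns `2.0`" point raised in review and the no-match case the thread agreed to make normative.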
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24361#discussion_r2025480425 From naoto at openjdk.org Wed Apr 2 19:50:53 2025 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 2 Apr 2025 19:50:53 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v4] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 19:46:12 GMT, Justin Lu wrote: >> Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Alan's review - Improve first sentence Marked as reviewed by naoto (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24361#pullrequestreview-2737401960 From alanb at openjdk.org Thu Apr 3 06:25:50 2025 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 3 Apr 2025 06:25:50 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v3] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 19:43:29 GMT, Justin Lu wrote: >> src/java.base/share/classes/java/text/ChoiceFormat.java line 562: >> >>> 560: /** >>> 561: * Parses a {@code Number} from the input text, the subtype of which is always >>> 562: * {@code Double}. The value returned is the {@code limit} corresponding >> >> I wonder if we could improve the first sentence, e.g. "Parses the input text from the parse position as a Double" ? > > Right, I think we can make the sub-type wording simplification and should mention `ParsePosition`'s role in the method. Thanks for the update, it reads much better now, no other comments from me. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24361#discussion_r2026287478 From jlu at openjdk.org Fri Apr 4 21:29:18 2025 From: jlu at openjdk.org (Justin Lu) Date: Fri, 4 Apr 2025 21:29:18 GMT Subject: RFR: 8353713: Improve Currency.getInstance exception handling Message-ID: Please review this PR which improves some Currency `IllegalArgumentException`s by including the input in the message. This could be a currency code, country code, or locale. This change also includes tests to check the messages for an invalid country via the region override as well as an invalid country code within a 3 length currency code. ------------- Commit messages: - init Changes: https://git.openjdk.org/jdk/pull/24459/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24459&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353713 Stats: 38 lines in 2 files changed: 13 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/24459.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24459/head:pull/24459 PR: https://git.openjdk.org/jdk/pull/24459 From naoto at openjdk.org Fri Apr 4 22:54:53 2025 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 4 Apr 2025 22:54:53 GMT Subject: RFR: 8353713: Improve Currency.getInstance exception handling In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 21:25:00 GMT, Justin Lu wrote: > Please review this PR which improves some Currency `IllegalArgumentException`s by including the input in the message. This could be a currency code, country code, or locale. This change also includes tests to check the messages for an invalid country via the region override as well as an invalid country code within a 3 length currency code. Looks good. 
test/jdk/java/util/Currency/CurrencyTest.java line 102: > 100: IllegalArgumentException ex = assertThrows(IllegalArgumentException.class, () -> > 101: Currency.getInstance(badCode), "getInstance() did not throw IAE"); > 102: assertEquals("The country code: \"%s\" is not a valid ISO 3166 code" Since the test is not parameterized, we can simply use ".." inside the expected string literal. ------------- PR Review: https://git.openjdk.org/jdk/pull/24459#pullrequestreview-2744244252 PR Review Comment: https://git.openjdk.org/jdk/pull/24459#discussion_r2029528590 From jlu at openjdk.org Fri Apr 4 23:03:23 2025 From: jlu at openjdk.org (Justin Lu) Date: Fri, 4 Apr 2025 23:03:23 GMT Subject: RFR: 8353713: Improve Currency.getInstance exception handling [v2] In-Reply-To: References: Message-ID: > Please review this PR which improves some Currency `IllegalArgumentException`s by including the input in the message. This could be a currency code, country code, or locale. This change also includes tests to check the messages for an invalid country via the region override as well as an invalid country code within a 3 length currency code. 
Justin Lu has updated the pull request incrementally with one additional commit since the last revision: Naoto's review -> use str literal since not param test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24459/files - new: https://git.openjdk.org/jdk/pull/24459/files/e79241d0..dab5091b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24459&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24459&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24459.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24459/head:pull/24459 PR: https://git.openjdk.org/jdk/pull/24459 From naoto at openjdk.org Mon Apr 7 16:30:49 2025 From: naoto at openjdk.org (Naoto Sato) Date: Mon, 7 Apr 2025 16:30:49 GMT Subject: RFR: 8353713: Improve Currency.getInstance exception handling [v2] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 23:03:23 GMT, Justin Lu wrote: >> Please review this PR which improves some Currency `IllegalArgumentException`s by including the input in the message. This could be a currency code, country code, or locale. This change also includes tests to check the messages for an invalid country via the region override as well as an invalid country code within a 3 length currency code. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Naoto's review -> use str literal since not param test Marked as reviewed by naoto (Reviewer). 
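
A small sketch of how a caller might observe the exception this PR touches. It assumes only that `Currency.getInstance(String)` throws `IllegalArgumentException` for a code that is not a supported ISO 4217 code; the exact message text is deliberately not asserted, and the class name `CurrencyMessageSketch` and helper `describe` are hypothetical.

```java
import java.util.Currency;

public class CurrencyMessageSketch {
    // Hypothetical helper: reports whether the lookup succeeded and, if not,
    // surfaces whatever message the IllegalArgumentException carries.
    static String describe(String code) {
        try {
            Currency.getInstance(code);
            return "no exception";
        } catch (IllegalArgumentException ex) {
            return "IAE: " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("USD"));   // a valid ISO 4217 code
        System.out.println(describe("NOPE"));  // not three letters, so rejected
    }
}
```

With the change in this PR the message is expected to include the offending input; on JDKs without the change the message part may be empty or `null`.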
------------- PR Review: https://git.openjdk.org/jdk/pull/24459#pullrequestreview-2747430868 From jlu at openjdk.org Mon Apr 7 20:48:17 2025 From: jlu at openjdk.org (Justin Lu) Date: Mon, 7 Apr 2025 20:48:17 GMT Subject: RFR: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate [v4] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 19:46:12 GMT, Justin Lu wrote: >> Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Alan's review - Improve first sentence Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24361#issuecomment-2784586618 From jlu at openjdk.org Mon Apr 7 20:48:17 2025 From: jlu at openjdk.org (Justin Lu) Date: Mon, 7 Apr 2025 20:48:17 GMT Subject: Integrated: 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:45:26 GMT, Justin Lu wrote: > Please review this PR which specifies the `ChoiceFormat#parse(String, ParsePosition)` method. A corresponding CSR is filed. The current specification is simply "Parses a Number from the input text" which does not indicate how the value is returned. The criteria for a match, as well as no match should be made clear. This pull request has now been integrated. 
Changeset: a8dfcf55 Author: Justin Lu URL: https://git.openjdk.org/jdk/commit/a8dfcf55849775a7ac4822a8b7661f20f1b33bb0 Stats: 17 lines in 1 file changed: 14 ins; 0 del; 3 mod 8353322: Specification of ChoiceFormat#parse(String, ParsePosition) is inadequate Reviewed-by: naoto ------------- PR: https://git.openjdk.org/jdk/pull/24361 From jlu at openjdk.org Tue Apr 8 17:40:24 2025 From: jlu at openjdk.org (Justin Lu) Date: Tue, 8 Apr 2025 17:40:24 GMT Subject: Integrated: 8353713: Improve Currency.getInstance exception handling In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 21:25:00 GMT, Justin Lu wrote: > Please review this PR which improves some Currency `IllegalArgumentException`s by including the input in the message. This could be a currency code, country code, or locale. This change also includes tests to check the messages for an invalid country via the region override as well as an invalid country code within a 3 length currency code. This pull request has now been integrated. Changeset: 5cac5796 Author: Justin Lu URL: https://git.openjdk.org/jdk/commit/5cac579619164b9a664327a4f71c4de7e7575276 Stats: 37 lines in 2 files changed: 12 ins; 0 del; 25 mod 8353713: Improve Currency.getInstance exception handling Reviewed-by: naoto ------------- PR: https://git.openjdk.org/jdk/pull/24459 From ihse at openjdk.org Wed Apr 9 15:09:58 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 9 Apr 2025 15:09:58 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> On Thu, 11 May 2023 20:21:57 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. 
>> >> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. > > Justin Lu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Convert the merged master changes to UTF-8 > - Merge master and fix conflicts > - Close streams when finished loading into props > - Adjust CF test to read in with UTF-8 to fix failing test > - Reconvert CS.properties to UTF-8 > - Revert all changes to CurrencySymbols.properties > - Bug6204853 should not be converted > - Copyright year for CompileProperties > - Redo translation for CS.properties > - Spot convert CurrencySymbols.properties > - ... and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties line 22: > 20: # Peter Smolik > 21: Cp1250 WINDOWS-1250 0x00FF > 22: # Patch attributed to havardw at underdusken.no (H?vard Wigtil) This does not seem to have been a correct conversion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2035582242 From jlu at openjdk.org Wed Apr 9 21:28:41 2025 From: jlu at openjdk.org (Justin Lu) Date: Wed, 9 Apr 2025 21:28:41 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> Message-ID: On Wed, 9 Apr 2025 15:06:32 GMT, Magnus Ihse Bursie wrote: >> Justin Lu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: >> >> - Convert the merged master changes to UTF-8 >> - Merge master and fix conflicts >> - Close streams when finished loading into props >> - Adjust CF test to read in with UTF-8 to fix failing test >> - Reconvert CS.properties to UTF-8 >> - Revert all changes to CurrencySymbols.properties >> - Bug6204853 should not be converted >> - Copyright year for CompileProperties >> - Redo translation for CS.properties >> - Spot convert CurrencySymbols.properties >> - ... and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a > > src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties line 22: > >> 20: # Peter Smolik >> 21: Cp1250 WINDOWS-1250 0x00FF >> 22: # Patch attributed to havardw at underdusken.no (H?vard Wigtil) > > This does not seem to have been a correct conversion. Right, that `?` looks to have been incorrectly converted during the ISO-8859-1 to UTF-8 conversion. (I can't find the script used for conversion as this change is from some time ago.) Since the change occurs in a comment (thankfully), it should be harmless and the next upstream update of this file would overwrite this incorrect change. However, this file does not seem to be updated that often, so I can also file an issue to correct this if you would prefer that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036165417 From ihse at openjdk.org Thu Apr 10 07:34:37 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 07:34:37 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> Message-ID: On Wed, 9 Apr 2025 21:26:15 GMT, Justin Lu wrote: >> src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties line 22: >> >>> 20: # Peter Smolik >>> 21: Cp1250 WINDOWS-1250 0x00FF >>> 22: # Patch attributed to havardw at underdusken.no (H?vard Wigtil) >> >> This does not seem to have been a correct conversion. > > Right, that `?` looks to have been incorrectly converted during the ISO-8859-1 to UTF-8 conversion. (I can't find the script used for conversion as this change is from some time ago.) > > Since the change occurs in a comment (thankfully), it should be harmless and the next upstream update of this file would overwrite this incorrect change. However, this file does not seem to be updated that often, so I can also file an issue to correct this if you would prefer that. You don't have to do that, I'm working on an omnibus UTF-8 fixing PR right now, where I will include a fix for this as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036695622 From ihse at openjdk.org Thu Apr 10 07:34:37 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 07:34:37 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> Message-ID: On Thu, 10 Apr 2025 07:31:37 GMT, Magnus Ihse Bursie wrote: >> Right, that `?` looks to have been incorrectly converted during the ISO-8859-1 to UTF-8 conversion. (I can't find the script used for conversion as this change is from some time ago.) >> >> Since the change occurs in a comment (thankfully), it should be harmless and the next upstream update of this file would overwrite this incorrect change. However, this file does not seem to be updated that often, so I can also file an issue to correct this if you would prefer that. > > You don't have to do that, I'm working on an omnibus UTF-8 fixing PR right now, where I will include a fix for this as well. If anything, I might be a bit worried that there are more incorrect conversions stemming from this PR, that my automated tools and manual scanning has not revealed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036696723 From eirbjo at openjdk.org Thu Apr 10 08:10:42 2025 From: eirbjo at openjdk.org (Eirik Bjørsnøs) Date: Thu, 10 Apr 2025 08:10:42 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> Message-ID: <6c6DqyCqyPonBZgUU8BpYJR3JQvMXjWm9ulq4SN25Do=.77775825-716d-4908-ae24-c4cf1ead78a5@github.com> On Thu, 10 Apr 2025 07:32:18 GMT, Magnus Ihse Bursie wrote:

>> You don't have to do that, I'm working on an omnibus UTF-8 fixing PR right now, where I will include a fix for this as well.
>
> If anything, I might be a bit worried that there are more incorrect conversions stemming from this PR, that my automated tools and manual scanning has not revealed.

Some observations:

1: This PR seems to have been abandoned, so perhaps this discussion belongs in #15694?

2: The `å` (Unicode 'Latin small letter a with ring above' U+00E5) was correctly encoded as 0xE5 in ISO-8859-1 prior to this change.

3: The conversion changed this `0xE5` to the three-byte sequence `ef bf bd`, which is the UTF-8 encoding of U+FFFD, the replacement character.

4: This is as if the file was incorrectly decoded using UTF-8, then encoded using UTF-8:

```java
// ISO-8859-1 encodes U+00E5 as the single byte 0xE5.
byte[] origBytes = "å".getBytes(StandardCharsets.ISO_8859_1);
// A lone 0xE5 is malformed UTF-8, so decoding substitutes U+FFFD.
String decoded = new String(origBytes, StandardCharsets.UTF_8);
// Re-encoding U+FFFD as UTF-8 yields the bytes EF BF BD.
byte[] encoded = decoded.getBytes(StandardCharsets.UTF_8);
String hex = HexFormat.of().formatHex(encoded);
assertEquals("efbfbd", hex);
```

Like @magicus I'm worried that similar incorrect decoding could have been introduced by the same script in other files.
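One way to look for similar damage in other files is sketched below. This is only an illustration, not the script used for the conversion and not code from either PR; the class name `Utf8Check` and its two methods are hypothetical. It assumes that a damaged file either fails strict UTF-8 decoding (it is still ISO-8859-1) or already contains the byte sequence `ef bf bd` left behind by a lossy decode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the JDK or of the PRs discussed here. */
final class Utf8Check {
    private Utf8Check() {}

    /** Returns true if the bytes decode as UTF-8 with no malformed input. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns true if the bytes contain EF BF BD, the UTF-8 encoding of U+FFFD. */
    static boolean containsReplacementChar(byte[] bytes) {
        for (int i = 0; i + 2 < bytes.length; i++) {
            if ((bytes[i] & 0xFF) == 0xEF
                    && (bytes[i + 1] & 0xFF) == 0xBF
                    && (bytes[i + 2] & 0xFF) == 0xBD) {
                return true;
            }
        }
        return false;
    }
}
```

A file that passes `isValidUtf8` but also triggers `containsReplacementChar` is a candidate for the kind of botched conversion described above, although U+FFFD can also appear legitimately, so a hit would still need manual inspection.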
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036767319 From ihse at openjdk.org Thu Apr 10 08:38:38 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 08:38:38 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: <6c6DqyCqyPonBZgUU8BpYJR3JQvMXjWm9ulq4SN25Do=.77775825-716d-4908-ae24-c4cf1ead78a5@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> <_YOUyzMbSEXFduCKVgyis37kwTlGSjBbP8VlFu3xQpU=.9b668e2a-8f91-476d-8914-13dc33a0b9e5@github.com> <6c6DqyCqyPonBZgUU8BpYJR3JQvMXjWm9ulq4SN25Do=.77775825-716d-4908-ae24-c4cf1ead78a5@github.com> Message-ID: On Thu, 10 Apr 2025 08:08:02 GMT, Eirik Bjørsnøs wrote: >> If anything, I might be a bit worried that there are more incorrect conversions stemming from this PR, that my automated tools and manual scanning has not revealed. > > Some observations: > > 1: This PR seems to have been abandoned, so perhaps this discussion belongs in #15694? > > 2: The `å` (Unicode 'Latin small letter a with ring above' U+00E5) was correctly encoded as 0xE5 in ISO-8859-1 prior to this change. > > 3: The conversion changed this `0xE5` to the three-byte sequence `ef bf bd` > > 4: This is as if the file was incorrectly decoded using UTF-8, then encoded using UTF-8: > > ``` > byte[] origBytes = "å".getBytes(StandardCharsets.ISO_8859_1); > String decoded = new String(origBytes, StandardCharsets.UTF_8); > byte[] encoded = decoded.getBytes(StandardCharsets.UTF_8); > String hex = HexFormat.of().formatHex(encoded); > assertEquals("efbfbd", hex); > ``` > > Like @magicus I'm worried that similar incorrect decoding could have been introduced by the same script in other files. > This PR seems to have been abandoned, so perhaps this discussion belongs in https://github.com/openjdk/jdk/pull/15694 ? Oh, I didn't notice this was supplanted by another PR.
It might be better to continue there, yes. Even if closed PRs seldom are the best places to conduct discussions, I think it might be a good idea to scrutinize all files modified by this script. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12726#discussion_r2036820765 From ihse at openjdk.org Thu Apr 10 08:41:45 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 08:41:45 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2] In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote: >> JDK .properties files still use ISO-8859-1 encoding with escape sequences. It would improve readability to see the native characters instead of escape sequences (especially for the L10n process). The majority of files changed are localized resource files. >> >> This change converts the Unicode escape sequences in the JDK .properties files (both in src and test) to UTF-8 native characters. Additionally, the build logic is adjusted to read the .properties files in UTF-8 while generating the ListResourceBundle files. >> >> The only escape sequence not converted was `\u0020` as this is used to denote intentional trailing white space. (E.g. `key=This is the value:\u0020`) >> >> The conversion was done using native2ascii with options `-reverse -encoding UTF-8`. >> >> If this PR is integrated, the IDE default encoding for .properties files need to be updated to UTF-8. (IntelliJ IDEA locks .properties files as ISO-8859-1 unless manually changed). > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Replace InputStreamReader with BufferedReader Continuing the discussion that was started at a predecessor to this PR, https://github.com/openjdk/jdk/pull/12726#discussion_r2035582242. At least one incorrect conversion has been found in this PR. It might be worthwhile to double- and triple-check all the other conversions as well. 
As part of https://bugs.openjdk.org/browse/JDK-8301971 I am trying various ways of detecting files without UTF-8 encoding, but it is still a bit of hit and miss, since there is no surefire way of telling which encoding a file has, only heuristics. So finding and following up on potential sources of error is important. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-2791991649 PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-2791997157 From eirbjo at openjdk.org Thu Apr 10 08:48:37 2025 From: eirbjo at openjdk.org (Eirik Bjørsnøs) Date: Thu, 10 Apr 2025 08:48:37 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2] In-Reply-To: References: Message-ID: <0q0gTsqIsYtmzAfNYbBXksUXKdZh2uzQ9yvSETKAP88=.137372e6-d63e-4539-b196-4bd9ef1ddd16@github.com> On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote: >> JDK .properties files still use ISO-8859-1 encoding with escape sequences. It would improve readability to see the native characters instead of escape sequences (especially for the L10n process). The majority of files changed are localized resource files. >> >> This change converts the Unicode escape sequences in the JDK .properties files (both in src and test) to UTF-8 native characters. Additionally, the build logic is adjusted to read the .properties files in UTF-8 while generating the ListResourceBundle files. >> >> The only escape sequence not converted was `\u0020` as this is used to denote intentional trailing white space. (E.g. `key=This is the value:\u0020`) >> >> The conversion was done using native2ascii with options `-reverse -encoding UTF-8`. >> >> If this PR is integrated, the IDE default encoding for .properties files need to be updated to UTF-8. (IntelliJ IDEA locks .properties files as ISO-8859-1 unless manually changed).
> > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Replace InputStreamReader with BufferedReader FWIW, I checked out the revision of the commit previous to this change and found the following: % git checkout b55e418a077791b39992042411cde97f68dc39fe^ % find src -name "*.properties" | xargs file | grep -v ASCII src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties: ISO-8859 text src/java.xml.crypto/share/classes/com/sun/org/apache/xml/internal/security/resource/xmlsecurity_de.properties: Unicode text, UTF-8 text, with very long lines (322) Which indicates that that this is the only non-ASCII, non-UTF-8 property file. So we may be lucky. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-2792014164 From ihse at openjdk.org Thu Apr 10 09:45:56 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 09:45:56 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2] In-Reply-To: References: Message-ID: On Wed, 13 Sep 2023 17:38:28 GMT, Justin Lu wrote: >> JDK .properties files still use ISO-8859-1 encoding with escape sequences. It would improve readability to see the native characters instead of escape sequences (especially for the L10n process). The majority of files changed are localized resource files. >> >> This change converts the Unicode escape sequences in the JDK .properties files (both in src and test) to UTF-8 native characters. Additionally, the build logic is adjusted to read the .properties files in UTF-8 while generating the ListResourceBundle files. >> >> The only escape sequence not converted was `\u0020` as this is used to denote intentional trailing white space. (E.g. `key=This is the value:\u0020`) >> >> The conversion was done using native2ascii with options `-reverse -encoding UTF-8`. 
>> >> If this PR is integrated, the IDE default encoding for .properties files need to be updated to UTF-8. (IntelliJ IDEA locks .properties files as ISO-8859-1 unless manually changed). > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Replace InputStreamReader with BufferedReader Thanks for checking! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-2792170460 From ihse at openjdk.org Thu Apr 10 10:18:13 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 10:18:13 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. > > Methodology used: > > I have run four different tools for using different heuristics for determining the encoding of a file: > * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) > * uchardet (a modern version by freedesktop, used by e.g. Firefox) > * enca (targeted towards obscure code pages) > * libmagic / `file --mime-encoding` > > They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. 
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: > * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` > > From the remaining list of non-ASCII, non-known-binary files I selected two overlapping and exhaustive subsets: > * All files where at least one tool claimed it to be UTF-8 > * All files where at least one tool claimed it to be *not* UTF-8 > > For the first subset, I checked every non-ASCII character (using `LC_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where Unicode was unnecessarily used instead of pure ASCII, and I treated those files separately. Other than that, my inspection revealed no obvious encoding errors. This list comprised about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. > > For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten... src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 497: > 495: /* > 496: The algorithm below is based on Intel publication: > 497: "Fast SHA-256 Implementations on Intel(R) Architecture Processors" by Jim Guilford, Kirk Yap and Vinodh Gopal. Note: There is of course a Unicode `®` symbol, which is what it was originally before it was botched here, but I found no reason to keep this, and in the spirit of JDK-8354213, I thought it better to use pure ASCII here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2037012318 From ihse at openjdk.org Thu Apr 10 10:18:13 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 10:18:13 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding Message-ID: I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. Methodology used: I have run four different tools for using different heuristics for determining the encoding of a file: * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) * uchardet (a modern version by freedesktop, used by e.g. Firefox) * enca (targeted towards obscure code pages) * libmagic / `file --mime-encoding` They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. 
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further:
* `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db`

From the remaining list of non-ASCII, non-known-binary files I selected two overlapping and exhaustive subsets:
* All files where at least one tool claimed it to be UTF-8
* All files where at least one tool claimed it to be *not* UTF-8

For the first subset, I checked every non-ASCII character (using `LC_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where Unicode was unnecessarily used instead of pure ASCII, and I treated those files separately. Other than that, my inspection revealed no obvious encoding errors. This list comprised about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay.

For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling extensions (most of these are in tests). The BOM files were only pointed out by chardetect; I did run an additional search for UTF-8 BOM markers over the code base to make sure I did not miss any others (since chardetect apart from this did a not-so-perfect job).

The files included in this PR are what I actually found that had encoding errors or issues.
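For readers who want to reproduce the two simplest checks from the methodology (pure ASCII, and a leading UTF-8 BOM), a minimal Java sketch follows. It is not the tooling used for this PR, which relied on the external detectors listed above; the class name `Utf8Bom` and its methods are hypothetical and exist only for illustration. The BOM bytes `EF BB BF` are the UTF-8 encoding of U+FEFF.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not the tooling used in the PR. */
final class Utf8Bom {
    // UTF-8 encoding of U+FEFF, the byte-order mark.
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    private Utf8Bom() {}

    /** Returns true if the content starts with the three-byte UTF-8 BOM. */
    static boolean hasBom(byte[] content) {
        return content.length >= BOM.length
                && Arrays.equals(content, 0, BOM.length, BOM, 0, BOM.length);
    }

    /** Returns the content without a leading BOM; unchanged if there is none. */
    static byte[] stripBom(byte[] content) {
        return hasBom(content)
                ? Arrays.copyOfRange(content, BOM.length, content.length)
                : content;
    }

    /** Returns true if every byte is in the range 0x00-0x7F. */
    static boolean isPureAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) { // bytes 0x80-0xFF are negative as a signed Java byte
                return false;
            }
        }
        return true;
    }
}
```

Applied to the bytes of each candidate file (for example from `java.nio.file.Files.readAllBytes`), `isPureAscii` mirrors the `[^\x00-\x7F]` filter in the grep command, and `hasBom` corresponds to the additional BOM search mentioned above.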
------------- Commit messages: - Remove UTF-8 BOM (byte-order mark) which is discouraged by the Unicode Consortium - Fix incorrect encoding Changes: https://git.openjdk.org/jdk/pull/24566/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24566&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354266 Stats: 32 lines in 13 files changed: 0 ins; 2 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24566.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24566/head:pull/24566 PR: https://git.openjdk.org/jdk/pull/24566 From ihse at openjdk.org Thu Apr 10 10:23:56 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 10:23:56 GMT Subject: RFR: 8354273: Restore even more pointless unicode characters to ASCII Message-ID: As a follow-up to [JDK-8354213](https://bugs.openjdk.org/browse/JDK-8354213), I found some additional places where unicode characters are unnecessarily used instead of pure ASCII. ------------- Commit messages: - 8354273: Restore even more pointless unicode characters to ASCII Changes: https://git.openjdk.org/jdk/pull/24567/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24567&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354273 Stats: 9 lines in 6 files changed: 0 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24567.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24567/head:pull/24567 PR: https://git.openjdk.org/jdk/pull/24567 From ihse at openjdk.org Thu Apr 10 10:36:31 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 10:36:31 GMT Subject: RFR: 8354273: Restore even more pointless unicode characters to ASCII [v2] In-Reply-To: References: Message-ID: > As a follow-up to [JDK-8354213](https://bugs.openjdk.org/browse/JDK-8354213), I found some additional places where unicode characters are unnecessarily used instead of pure ASCII. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Remove incorrectly copied "?anchor" ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24567/files - new: https://git.openjdk.org/jdk/pull/24567/files/d9527eb9..876708c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24567&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24567&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24567.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24567/head:pull/24567 PR: https://git.openjdk.org/jdk/pull/24567 From ihse at openjdk.org Thu Apr 10 10:39:32 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 10:39:32 GMT Subject: RFR: 8354273: Restore even more pointless unicode characters to ASCII [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:36:31 GMT, Magnus Ihse Bursie wrote: >> As a follow-up to [JDK-8354213](https://bugs.openjdk.org/browse/JDK-8354213), I found some additional places where unicode characters are unnecessarily used instead of pure ASCII. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Remove incorrectly copied "?anchor" src/java.xml/share/legal/xmlxsd.md line 29: > 27: https://www.w3.org/copyright/software-license-2023/" > 28: > 29: Disclaimers ?anchor This is an incorrectly copied piece of html; compare how the very same license is handled in e.g. `src/java.xml/share/legal/schema10part1.md`. The ? is the non-ascii character that triggered my detection of this, but the entire "anchor" string is incorrect here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24567#discussion_r2037047696 From rgiulietti at openjdk.org Thu Apr 10 11:49:30 2025 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Thu, 10 Apr 2025 11:49:30 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:14:40 GMT, Magnus Ihse Bursie wrote: >> I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. >> >> BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. >> >> Methodology used: >> >> I have run four different tools for using different heuristics for determining the encoding of a file: >> * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) >> * uchardet (a modern version by freedesktop, used by e.g. Firefox) >> * enca (targeted towards obscure code pages) >> * libmagic / `file --mime-encoding` >> >> They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. 
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: >> * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` >> >> From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: >> * All files where at least one tool claimed it to be UTF-8 >> * All files where at least one tool claimed it to be *not* UTF-8 >> >> For the first subset, I checked every non-ASCII character (using `C_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where unicode were unnecessarily used instead of pure ASCII, and I treated those files separately. Other from that, my inspection revealed no obvious encoding errors. This list comprised of about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. >> >> For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay far as I can tell; I can confirm encodings for European languages 100%, but JCK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure... > > src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 497: > >> 495: /* >> 496: The algorithm below is based on Intel publication: >> 497: "Fast SHA-256 Implementations on Intel(R) Architecture Processors" by Jim Guilford, Kirk Yap and Vinodh Gopal. > > Note: There is of course a unicode `?` symbol, which is what it was originally before it was botched here, but I found no reason to keep this, and in the spirit of JDK-8354213, I thought it better to use pure ASCII here. I guess the difference at L.1 in the various files is just the BOM? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2037161789 From ihse at openjdk.org Thu Apr 10 13:17:24 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 13:17:24 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 11:46:45 GMT, Raffaello Giulietti wrote: > I guess the difference at L.1 in the various files is just the BOM? Yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2037357899 From rgiulietti at openjdk.org Thu Apr 10 13:56:42 2025 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Thu, 10 Apr 2025 13:56:42 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. > > Methodology used: > > I have run four different tools for using different heuristics for determining the encoding of a file: > * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) > * uchardet (a modern version by freedesktop, used by e.g. Firefox) > * enca (targeted towards obscure code pages) > * libmagic / `file --mime-encoding` > > They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. 
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: > * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` > > From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: > * All files where at least one tool claimed it to be UTF-8 > * All files where at least one tool claimed it to be *not* UTF-8 > > For the first subset, I checked every non-ASCII character (using `C_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where unicode were unnecessarily used instead of pure ASCII, and I treated those files separately. Other from that, my inspection revealed no obvious encoding errors. This list comprised of about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. > > For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay far as I can tell; I can confirm encodings for European languages 100%, but JCK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten... I only checked these 13 files to be UTF-8 encoded and without BOM. ------------- Marked as reviewed by rgiulietti (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24566#pullrequestreview-2756936848 From naoto at openjdk.org Thu Apr 10 17:12:26 2025 From: naoto at openjdk.org (Naoto Sato) Date: Thu, 10 Apr 2025 17:12:26 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. 
> > BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. > > Methodology used: > > I have run four different tools for using different heuristics for determining the encoding of a file: > * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) > * uchardet (a modern version by freedesktop, used by e.g. Firefox) > * enca (targeted towards obscure code pages) > * libmagic / `file --mime-encoding` > > They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: > * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` > > From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: > * All files where at least one tool claimed it to be UTF-8 > * All files where at least one tool claimed it to be *not* UTF-8 > > For the first subset, I checked every non-ASCII character (using `C_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where unicode were unnecessarily used instead of pure ASCII, and I treated those files separately. Other from that, my inspection revealed no obvious encoding errors. This list comprised of about 2000 files, so I did not spend too much time on each file. 
The assumption, after all, was that these files are okay.
>
> For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten...

src/java.desktop/share/legal/lcms.md line 72:

> 70: Mateusz Jurczyk (Google)
> 71: Paul Miller
> 72: Sébastien Léon

I cannot comment on capitalization here, but if we wanted to lowercase them, should they be e-grave instead of e-acute?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2037895884

From rgiulietti at openjdk.org Thu Apr 10 17:26:30 2025
From: rgiulietti at openjdk.org (Raffaello Giulietti)
Date: Thu, 10 Apr 2025 17:26:30 GMT
Subject: RFR: 8354266: Fix non-UTF-8 text encoding
In-Reply-To:
References:
Message-ID:

On Thu, 10 Apr 2025 17:09:27 GMT, Naoto Sato wrote:

>> I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found.
>>
>> BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed.
>>
>> Methodology used:
>>
>> I have run four different tools using different heuristics for determining the encoding of a file:
>> * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5)
>> * uchardet (a modern version by freedesktop, used by e.g. Firefox)
>> * enca (targeted towards obscure code pages)
>> * libmagic / `file --mime-encoding`
>>
>> They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further:
>> * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db`
>>
>> From the remaining list of non-ASCII, non-known-binary files I selected two overlapping and exhaustive subsets:
>> * All files where at least one tool claimed it to be UTF-8
>> * All files where at least one tool claimed it to be *not* UTF-8
>>
>> For the first subset, I checked every non-ASCII character (using `LC_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where Unicode was unnecessarily used instead of pure ASCII, and I treated those files separately. Other than that, my inspection revealed no obvious encoding errors. This list comprised about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay.
>>
>> For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure...
>
> src/java.desktop/share/legal/lcms.md line 72:
>
>> 70: Mateusz Jurczyk (Google)
>> 71: Paul Miller
>> 72: Sébastien Léon
>
> I cannot comment on capitalization here, but if we wanted to lowercase them, should they be e-grave instead of e-acute?

If this is a French name, it's e acute: é.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2037917708

From erikj at openjdk.org Thu Apr 10 17:37:26 2025
From: erikj at openjdk.org (Erik Joelsson)
Date: Thu, 10 Apr 2025 17:37:26 GMT
Subject: RFR: 8354266: Fix non-UTF-8 text encoding
In-Reply-To:
References:
Message-ID: <4fRjwM-P0XuOWk9QjYl9zji51zLn7wwsFKlo7tJt3JM=.976560e0-39c6-4633-bc8d-279deb1ebea3@github.com>

On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote:

> I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found.
>
> BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed.
>
> Methodology used:
>
> I have run four different tools using different heuristics for determining the encoding of a file:
> * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5)
> * uchardet (a modern version by freedesktop, used by e.g. Firefox)
> * enca (targeted towards obscure code pages)
> * libmagic / `file --mime-encoding`
>
> They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files.
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: > * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` > > From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: > * All files where at least one tool claimed it to be UTF-8 > * All files where at least one tool claimed it to be *not* UTF-8 > > For the first subset, I checked every non-ASCII character (using `C_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where unicode were unnecessarily used instead of pure ASCII, and I treated those files separately. Other from that, my inspection revealed no obvious encoding errors. This list comprised of about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. > > For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay far as I can tell; I can confirm encodings for European languages 100%, but JCK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten... Marked as reviewed by erikj (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24566#pullrequestreview-2757703868 From naoto at openjdk.org Thu Apr 10 17:41:25 2025 From: naoto at openjdk.org (Naoto Sato) Date: Thu, 10 Apr 2025 17:41:25 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: <7hmmP0I0kH0UiF8cV-CkNnpdQFkddrt3TYEkFltoj8U=.3bf6bcbf-3771-4628-82e0-f678f7366d8a@github.com> On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. > > Methodology used: > > I have run four different tools for using different heuristics for determining the encoding of a file: > * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) > * uchardet (a modern version by freedesktop, used by e.g. Firefox) > * enca (targeted towards obscure code pages) > * libmagic / `file --mime-encoding` > > They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. 
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further:
> * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db`
>
> From the remaining list of non-ASCII, non-known-binary files I selected two overlapping and exhaustive subsets:
> * All files where at least one tool claimed it to be UTF-8
> * All files where at least one tool claimed it to be *not* UTF-8
>
> For the first subset, I checked every non-ASCII character (using `LC_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where Unicode was unnecessarily used instead of pure ASCII, and I treated those files separately. Other than that, my inspection revealed no obvious encoding errors. This list comprised about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay.
>
> For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten...

Marked as reviewed by naoto (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/24566#pullrequestreview-2757716905

From eirbjo at openjdk.org Thu Apr 10 18:33:26 2025
From: eirbjo at openjdk.org (Eirik Bjørsnøs)
Date: Thu, 10 Apr 2025 18:33:26 GMT
Subject: RFR: 8354266: Fix non-UTF-8 text encoding
In-Reply-To:
References:
Message-ID:

On Thu, 10 Apr 2025 17:23:37 GMT, Raffaello Giulietti wrote:

> If this is a French name, it's e acute: é.

Supported by this Wikipedia page listing S.L as an LCMS developer:

https://en.wikipedia.org/wiki/Little_CMS

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2038022994

From eirbjo at openjdk.org Thu Apr 10 18:45:28 2025
From: eirbjo at openjdk.org (Eirik Bjørsnøs)
Date: Thu, 10 Apr 2025 18:45:28 GMT
Subject: RFR: 8354266: Fix non-UTF-8 text encoding
In-Reply-To:
References:
Message-ID:

On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote:

> I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found.
>
> BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed.
>
> Methodology used:
>
> I have run four different tools using different heuristics for determining the encoding of a file:
> * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5)
> * uchardet (a modern version by freedesktop, used by e.g. Firefox)
> * enca (targeted towards obscure code pages)
> * libmagic / `file --mime-encoding`
>
> They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files.
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: > * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` > > From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: > * All files where at least one tool claimed it to be UTF-8 > * All files where at least one tool claimed it to be *not* UTF-8 > > For the first subset, I checked every non-ASCII character (using `C_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where unicode were unnecessarily used instead of pure ASCII, and I treated those files separately. Other from that, my inspection revealed no obvious encoding errors. This list comprised of about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. > > For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay far as I can tell; I can confirm encodings for European languages 100%, but JCK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten... src/java.desktop/share/legal/lcms.md line 103: > 101: Tim Zaman > 102: Amir Montazery and Open Source Technology Improvement Fund (ostif.org), Google, for fuzzer fundings. > 103: ``` This introduces an empty trailing line. I see you have removed trailing whitespace elsewhere. Was this intentional, to avoid the file ending with the three ticks? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2038071768

From jlu at openjdk.org Thu Apr 10 18:47:53 2025
From: jlu at openjdk.org (Justin Lu)
Date: Thu, 10 Apr 2025 18:47:53 GMT
Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
In-Reply-To: <0q0gTsqIsYtmzAfNYbBXksUXKdZh2uzQ9yvSETKAP88=.137372e6-d63e-4539-b196-4bd9ef1ddd16@github.com>
References: <0q0gTsqIsYtmzAfNYbBXksUXKdZh2uzQ9yvSETKAP88=.137372e6-d63e-4539-b196-4bd9ef1ddd16@github.com>
Message-ID: <9aQcWun5KNgHgELVwkc3478_RtqfhRL1Cxvyn2Yl0Nw=.07ee596f-e738-4796-8d27-14621ed8860c@github.com>

On Thu, 10 Apr 2025 08:44:28 GMT, Eirik Bjørsnøs wrote:

>> Justin Lu has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Replace InputStreamReader with BufferedReader
>
> FWIW, I checked out the revision of the commit previous to this change and found the following:
>
>
> % git checkout b55e418a077791b39992042411cde97f68dc39fe^
> % find src -name "*.properties" | xargs file | grep -v ASCII
> src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/Encodings.properties:
> ISO-8859 text
> src/java.xml.crypto/share/classes/com/sun/org/apache/xml/internal/security/resource/xmlsecurity_de.properties:
> Unicode text, UTF-8 text, with very long lines (322)
>
>
> Which indicates that this is the only non-ASCII, non-UTF-8 property file. So we may be lucky.

This conversion was performed under the assumption of an ASCII character set with Unicode escape sequences, which is the format we expect for the translation process for .properties files. That file should have been omitted from this change.

Thank you @eirbjo and @magicus for the analysis and checking!
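
For anyone who wants to repeat a check like the `file` run quoted above without the external tool, here is a minimal Java sketch. It only classifies a byte array as pure ASCII or as well-formed UTF-8; it does not try to guess which legacy encoding a non-UTF-8 file uses. The class name `EncodingProbe` and its method names are made up for illustration and are not part of the JDK or of this change.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, for illustration only. */
final class EncodingProbe {
    private EncodingProbe() {}

    /** True if every byte is in the range 0x00-0x7F. */
    static boolean isAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) {
                return false; // high bit set, so not 7-bit ASCII
            }
        }
        return true;
    }

    /** True if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] content) {
        try {
            // REPORT makes the decoder throw instead of silently
            // substituting U+FFFD for bad input.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A file whose bytes fail both checks is a candidate for the kind of manual inspection described in this thread. Note that passing `isValidUtf8` does not prove the author intended UTF-8; short ISO-8859-1 texts can occasionally be valid UTF-8 by accident.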
-------------

PR Comment: https://git.openjdk.org/jdk/pull/15694#issuecomment-2794828598

From eirbjo at openjdk.org Thu Apr 10 19:09:35 2025
From: eirbjo at openjdk.org (Eirik Bjørsnøs)
Date: Thu, 10 Apr 2025 19:09:35 GMT
Subject: RFR: 8354266: Fix non-UTF-8 text encoding
In-Reply-To:
References:
Message-ID:

On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote:

> I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found.
>
> BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed.
>
> Methodology used:
>
> I have run four different tools using different heuristics for determining the encoding of a file:
> * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5)
> * uchardet (a modern version by freedesktop, used by e.g. Firefox)
> * enca (targeted towards obscure code pages)
> * libmagic / `file --mime-encoding`
>
> They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further:
> * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db`
>
> From the remaining list of non-ASCII, non-known-binary files I selected two overlapping and exhaustive subsets:
> * All files where at least one tool claimed it to be UTF-8
> * All files where at least one tool claimed it to be *not* UTF-8
>
> For the first subset, I checked every non-ASCII character (using `LC_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where Unicode was unnecessarily used instead of pure ASCII, and I treated those files separately. Other than that, my inspection revealed no obvious encoding errors. This list comprised about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay.
>
> For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten...

LGTM.

There are some whitespace-related changes in this PR which seem okay, but have no mention in either the JBS issue or the PR description. Perhaps a short mention of this intention in either place would be good for future historians.

(BTW, I enjoyed seeing separate commits for the encoding and BOM changes; it makes it easier to verify each!)

-------------

Marked as reviewed by eirbjo (Committer).
PR Review: https://git.openjdk.org/jdk/pull/24566#pullrequestreview-2758055634 From ihse at openjdk.org Thu Apr 10 21:28:31 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 21:28:31 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. > > Methodology used: > > I have run four different tools for using different heuristics for determining the encoding of a file: > * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) > * uchardet (a modern version by freedesktop, used by e.g. Firefox) > * enca (targeted towards obscure code pages) > * libmagic / `file --mime-encoding` > > They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. 
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: > * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` > > From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: > * All files where at least one tool claimed it to be UTF-8 > * All files where at least one tool claimed it to be *not* UTF-8 > > For the first subset, I checked every non-ASCII character (using `C_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where unicode were unnecessarily used instead of pure ASCII, and I treated those files separately. Other from that, my inspection revealed no obvious encoding errors. This list comprised of about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. > > For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay far as I can tell; I can confirm encodings for European languages 100%, but JCK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten... The whitespace changes are my editor removing whitespaces at the end of a line. This is a thing we enforce for many files types, but the check does not yet formally include .txt files. I have been working from time to time with trying to extend the set of files covered by this check, so I have in general not tried to circumvent my editor when it strips trailing whitespaces even for files that we do not yet require no trailing whitespaces in jcheck. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24566#issuecomment-2795201480

From ihse at openjdk.org Thu Apr 10 21:28:32 2025
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Thu, 10 Apr 2025 21:28:32 GMT
Subject: RFR: 8354266: Fix non-UTF-8 text encoding
In-Reply-To:
References:
Message-ID: <1IvhgoM9LMGg7s2kq_N0V7F1GCh-xFBnauZ9Ajk2Txo=.672329ea-e4c9-437c-a8b7-0502a9fdd414@github.com>

On Thu, 10 Apr 2025 19:06:35 GMT, Eirik Bjørsnøs wrote:

> (BTW, I enjoyed seeing separate commits for the encoding and BOM changes, makes it easier to verify each!)

Thanks! I very much like reviewing PRs that have separate logical commits, so I try to produce such PRs myself. I'm glad to hear it was appreciated.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24566#issuecomment-2795203125

From ihse at openjdk.org Thu Apr 10 21:28:32 2025
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Thu, 10 Apr 2025 21:28:32 GMT
Subject: RFR: 8354266: Fix non-UTF-8 text encoding
In-Reply-To:
References:
Message-ID:

On Thu, 10 Apr 2025 18:30:22 GMT, Eirik Bjørsnøs wrote:

>> If this is a French name, it's e acute: é.
>
>> If this is a French name, it's e acute: é.
>
> Supported by this Wikipedia page listing S.L as an LCMS developer:
>
> https://en.wikipedia.org/wiki/Little_CMS

It's not a mistake in capitalization; it's a mix-up between two different characters in two different encodings. (Probably ISO-8859-1 mistaken for ANSI, IIRC.) I verified the developer's name in the original file in the LCMS repo.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2038362034

From rriggs at openjdk.org Thu Apr 10 22:10:43 2025
From: rriggs at openjdk.org (Roger Riggs)
Date: Thu, 10 Apr 2025 22:10:43 GMT
Subject: RFR: 8354335: No longer deprecate wrapper class constructors for removal
Message-ID:

Remove forRemoval = true from @Deprecated annotation of Boolean, Byte, Character, Double, Float, Integer, Long, Short.
And add `SuppressWarnings("deprecation") `where needed; and remove `SuppressWarnings("removal")` ------------- Commit messages: - 8354335: No longer deprecate wrapper class constructors for removal Changes: https://git.openjdk.org/jdk/pull/24586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24586&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354335 Stats: 23 lines in 9 files changed: 0 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24586/head:pull/24586 PR: https://git.openjdk.org/jdk/pull/24586 From liach at openjdk.org Thu Apr 10 23:43:23 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 10 Apr 2025 23:43:23 GMT Subject: RFR: 8354335: No longer deprecate wrapper class constructors for removal In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 22:05:04 GMT, Roger Riggs wrote: > Remove forRemoval = true from @Deprecated annotation of Boolean, Byte, Character, Double, Float, Integer, Long, Short. > And add `SuppressWarnings("deprecation") `where needed; and remove `SuppressWarnings("removal")` The wrapper classes and MemberName changes look good. ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24586#pullrequestreview-2758769422 From serb at openjdk.org Fri Apr 11 03:37:29 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Fri, 11 Apr 2025 03:37:29 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". 
We have UTF-8 BOMs in a handful of files. These should be removed. > > Methodology used: > > I have run four different tools for using different heuristics for determining the encoding of a file: > * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) > * uchardet (a modern version by freedesktop, used by e.g. Firefox) > * enca (targeted towards obscure code pages) > * libmagic / `file --mime-encoding` > > They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: > * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` > > From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: > * All files where at least one tool claimed it to be UTF-8 > * All files where at least one tool claimed it to be *not* UTF-8 > > For the first subset, I checked every non-ASCII character (using `C_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where unicode were unnecessarily used instead of pure ASCII, and I treated those files separately. Other from that, my inspection revealed no obvious encoding errors. This list comprised of about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. > > For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. 
Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten...

src/demo/share/java2d/J2DBench/resources/textdata/arabic.ut8.txt line 11:

> 9: [Arabic sample text; its non-ASCII characters were replaced by "?" in this archive]
> 10:
> 11: [Arabic sample text; its non-ASCII characters were replaced by "?" in this archive]

Looks like most of the changes in java2d/* are related to spaces at the end of the line?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2038746193

From ihse at openjdk.org Fri Apr 11 10:27:40 2025
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Fri, 11 Apr 2025 10:27:40 GMT
Subject: RFR: 8354266: Fix non-UTF-8 text encoding
In-Reply-To:
References:
Message-ID:

On Fri, 11 Apr 2025 03:35:11 GMT, Sergey Bylokhov wrote:

>> I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found.
>>
>> BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed.
>>
>> Methodology used:
>>
>> I have run four different tools using different heuristics for determining the encoding of a file:
>> * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5)
>> * uchardet (a modern version by freedesktop, used by e.g.
Firefox) >> * enca (targeted towards obscure code pages) >> * libmagic / `file --mime-encoding` >> >> They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: >> * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` >> >> From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: >> * All files where at least one tool claimed it to be UTF-8 >> * All files where at least one tool claimed it to be *not* UTF-8 >> >> For the first subset, I checked every non-ASCII character (using `C_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where unicode were unnecessarily used instead of pure ASCII, and I treated those files separately. Other from that, my inspection revealed no obvious encoding errors. This list comprised of about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. >> >> For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay far as I can tell; I can confirm encodings for European languages 100%, but JCK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure... > > src/demo/share/java2d/J2DBench/resources/textdata/arabic.ut8.txt line 11: > >> 9: ???????? ???????????? ?????????????? "??????????????" ???????? ?????????? ?????? ???????? ???? ???????? ???????????? ?????????????????? ???????? ?????? ?????????? ???? ?????? 
[remainder of the quoted Arabic sample text, rendered as "?" by the mail archive] >> 10: >> 11: [Arabic sample text, rendered as "?" by the mail archive]
> > Looks like most of the changes in java2d/* are related to spaces at the end of the line? No, those are just incidental changes (see https://github.com/openjdk/jdk/pull/24566#issuecomment-2795201480). The actual change for the java2d files is the removal of the initial UTF-8 BOM. GitHub has a hard time showing this though, since the BOM is not visible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2039258980 From eirbjo at openjdk.org Fri Apr 11 10:27:40 2025 From: eirbjo at openjdk.org (Eirik Bjørsnøs) Date: Fri, 11 Apr 2025 10:27:40 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 10:21:32 GMT, Magnus Ihse Bursie wrote: >> src/demo/share/java2d/J2DBench/resources/textdata/arabic.ut8.txt line 11: >> >>> 9: [Arabic sample text, rendered as "?" by the mail archive] >>> 10: >>> 11: [Arabic sample text, rendered as "?" by the mail archive]
>> >> Looks like most of the changes in java2d/* are related to spaces at the end of the line? > > No, those are just incidental changes (see https://github.com/openjdk/jdk/pull/24566#issuecomment-2795201480). The actual change for the java2d files is the removal of the initial UTF-8 BOM. GitHub has a hard time showing this though, since the BOM is not visible. I found the side-by-side diff in IntelliJ useful here, as it said "UTF-8 BOM" vs. "UTF-8". 
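For readers without an IDE at hand, the invisible difference can also be checked programmatically. The following is a minimal sketch, not anything taken from the PR: it tests whether a file's content starts with the three bytes EF BB BF that form the UTF-8 BOM. The class name `Utf8BomCheck` and its methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of the JDK build.
final class Utf8BomCheck {

    // True if the content starts with the UTF-8 encoding of U+FEFF (EF BB BF).
    static boolean hasUtf8Bom(byte[] content) {
        return content.length >= 3
                && (content[0] & 0xFF) == 0xEF
                && (content[1] & 0xFF) == 0xBB
                && (content[2] & 0xFF) == 0xBF;
    }

    // Returns the content without a leading BOM; input without a BOM is returned as-is.
    static byte[] stripUtf8Bom(byte[] content) {
        return hasUtf8Bom(content)
                ? Arrays.copyOfRange(content, 3, content.length)
                : content;
    }

    // Reports, for each file named on the command line, whether it starts with a BOM.
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] content = Files.readAllBytes(Path.of(name));
            System.out.println(name + ": " + (hasUtf8Bom(content) ? "UTF-8 BOM" : "no BOM"));
        }
    }
}
```

Run against the java2d text data before and after the change, a check like this should report the difference that the web diff view does not show.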
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2039263227 From ihse at openjdk.org Fri Apr 11 10:27:40 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 11 Apr 2025 10:27:40 GMT Subject: Integrated: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big- and little-endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. > > Methodology used: > > I have run four different tools, each using different heuristics for determining the encoding of a file: > * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) > * uchardet (a modern version by freedesktop, used by e.g. Firefox) > * enca (targeted towards obscure code pages) > * libmagic / `file --mime-encoding` > > They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. 
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: > * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` > > From the remaining list of non-ASCII, non-known-binary files I selected two overlapping and exhaustive subsets: > * All files where at least one tool claimed it to be UTF-8 > * All files where at least one tool claimed it to be *not* UTF-8 > > For the first subset, I checked every non-ASCII character (using `LC_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where Unicode was unnecessarily used instead of pure ASCII, and I treated those files separately. Other than that, my inspection revealed no obvious encoding errors. This list comprised about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. > > For the second subset, I checked every non-ASCII character (using the same method). This list was about 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane, but I cannot read them and so cannot confirm them fully. Several were in fact pure binary files, but without any telling exten... This pull request has now been integrated. 
Changeset: d4e194bc Author: Magnus Ihse Bursie URL: https://git.openjdk.org/jdk/commit/d4e194bc463ff3ad09e55cbb96bea00283679ce6 Stats: 32 lines in 13 files changed: 0 ins; 2 del; 30 mod 8354266: Fix non-UTF-8 text encoding Reviewed-by: rgiulietti, erikj, naoto, eirbjo ------------- PR: https://git.openjdk.org/jdk/pull/24566 From naoto at openjdk.org Fri Apr 11 17:08:26 2025 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 11 Apr 2025 17:08:26 GMT Subject: RFR: 8343157: Examine large files for character encoding/decoding Message-ID: Removing old charset test cases that verify new charset implementations (as of JDK7). Removed tests/files are actual charset implementations used in pre-JDK7, which have been used for comparing the results. Since those "new" implementations have been used since then, I believe it is OK to retire those old test cases. ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24597/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24597&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343157 Stats: 164679 lines in 55 files changed: 0 ins; 164677 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24597.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24597/head:pull/24597 PR: https://git.openjdk.org/jdk/pull/24597 From bchristi at openjdk.org Fri Apr 11 20:17:26 2025 From: bchristi at openjdk.org (Brent Christian) Date: Fri, 11 Apr 2025 20:17:26 GMT Subject: RFR: 8354335: No longer deprecate wrapper class constructors for removal In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 22:05:04 GMT, Roger Riggs wrote: > Remove forRemoval = true from @Deprecated annotation of Boolean, Byte, Character, Double, Float, Integer, Long, Short. > And add `SuppressWarnings("deprecation")` where needed; and remove `SuppressWarnings("removal")` LGTM ------------- Marked as reviewed by bchristi (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24586#pullrequestreview-2761490305 From iris at openjdk.org Fri Apr 11 20:24:25 2025 From: iris at openjdk.org (Iris Clark) Date: Fri, 11 Apr 2025 20:24:25 GMT Subject: RFR: 8354335: No longer deprecate wrapper class constructors for removal In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 22:05:04 GMT, Roger Riggs wrote: > Remove forRemoval = true from @Deprecated annotation of Boolean, Byte, Character, Double, Float, Integer, Long, Short. > And add `SuppressWarnings("deprecation")` where needed; and remove `SuppressWarnings("removal")` Marked as reviewed by iris (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24586#pullrequestreview-2761501284 From alanb at openjdk.org Sat Apr 12 05:51:33 2025 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 12 Apr 2025 05:51:33 GMT Subject: RFR: 8343157: Examine large files for character encoding/decoding In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 17:02:13 GMT, Naoto Sato wrote: > Removing old charset test cases that verify new charset implementations (as of JDK7). Removed tests/files are actual charset implementations used in pre-JDK7, which have been used for comparing the results. Since those "new" implementations have been used since then, I believe it is OK to retire those old test cases. Okay to delete, no real value keeping these. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24597#pullrequestreview-2762069633 From ihse at openjdk.org Sun Apr 13 22:50:37 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Sun, 13 Apr 2025 22:50:37 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 Message-ID: This is a WIP to move the JDK source code base to fully UTF-8, and to ensure tools know about this. ------------- Commit messages: - Fix flags for Windows - Mark java and native source code as utf-8 - Don't convert properties files to iso-8859-1. 
- Tell tools we use utf-8 - Replace iso-8859-1 encodings with utf-8 in source code - Explain reason for non-UTF-8 character in JDK_RCFLAGS Changes: https://git.openjdk.org/jdk/pull/24574/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24574&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8301971 Stats: 130 lines in 8 files changed: 17 ins; 103 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24574.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24574/head:pull/24574 PR: https://git.openjdk.org/jdk/pull/24574 From ihse at openjdk.org Sun Apr 13 22:58:26 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Sun, 13 Apr 2025 22:58:26 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 In-Reply-To: References: Message-ID: <0io2A_4xFMiR8rwbXPPyYyXar_fwE1jG4K81pY_heUU=.18d9f809-dafc-4900-82fa-6478eb50b8de@github.com> On Thu, 10 Apr 2025 14:28:02 GMT, Magnus Ihse Bursie wrote: > Most of the JDK code base has been transitioned to UTF-8, but not all. This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. > > The fix is basically simple, and includes the following steps: > * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already > * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). I would like to run proper tests to verify the changes in libjava, but I don't know which tests those would be. If anyone can enlighten me, please do. (I suspect that the code did not really work properly before, and that the specially encoded characters were not thoroughly tested, but I can be wrong.) 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2800165519 From ihse at openjdk.org Sun Apr 13 23:14:41 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Sun, 13 Apr 2025 23:14:41 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v2] In-Reply-To: References: Message-ID: > Most of the JDK code base has been transitioned to UTF-8, but not all. This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. > > The fix is basically simple, and includes the following steps: > * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already > * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Also tell javadoc that we have utf-8 now ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24574/files - new: https://git.openjdk.org/jdk/pull/24574/files/4fb897ef..38004164 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24574&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24574&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24574.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24574/head:pull/24574 PR: https://git.openjdk.org/jdk/pull/24574 From ihse at openjdk.org Mon Apr 14 12:53:35 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 14 Apr 2025 12:53:35 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: > Most of the JDK code base has been transitioned to UTF-8, but not all. 
This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. > > The fix is basically simple, and includes the following steps: > * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already > * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). Magnus Ihse Bursie has updated the pull request incrementally with three additional commits since the last revision: - Also document UTF-8 requirements (solves JDK-8338973) - Let configure only accept utf-8 locales - Address review comments from Kim ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24574/files - new: https://git.openjdk.org/jdk/pull/24574/files/38004164..452f42dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24574&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24574&range=01-02 Stats: 47 lines in 7 files changed: 27 ins; 2 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/24574.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24574/head:pull/24574 PR: https://git.openjdk.org/jdk/pull/24574 From kbarrett at openjdk.org Mon Apr 14 12:53:53 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 14 Apr 2025 12:53:53 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v2] In-Reply-To: References: Message-ID: On Sun, 13 Apr 2025 23:14:41 GMT, Magnus Ihse Bursie wrote: >> Most of the JDK code base has been transitioned to UTF-8, but not all. This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. 
>> >> The fix is basically simple, and includes the following steps: >> * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already >> * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also tell javadoc that we have utf-8 now A couple of drive-by comments. Don't count me as a Reviewer for this. make/autoconf/flags-cflags.m4 line 577: > 575: elif test "x$TOOLCHAIN_TYPE" = xmicrosoft; then > 576: # The -utf-8 option sets source and execution character sets to UTF-8 to enable correct > 577: # compilation of all source files regardless of the active code page on Windows. Seems like this comment should be updated and moved near the new code block for setting up `CHARSET_CFLAGS`. make/common/JavaCompilation.gmk line 83: > 81: # The sed expression does this: > 82: # 1. Add a backslash before any :, = or ! that do not have a backslash already. > 83: # 3. Delete all lines starting with #. There is no item 2 anymore, so following bullets are misnumbered. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24574#pullrequestreview-2762999364 PR Review Comment: https://git.openjdk.org/jdk/pull/24574#discussion_r2041326051 PR Review Comment: https://git.openjdk.org/jdk/pull/24574#discussion_r2041328098 From ihse at openjdk.org Mon Apr 14 12:53:56 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 14 Apr 2025 12:53:56 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v2] In-Reply-To: References: Message-ID: On Sun, 13 Apr 2025 23:14:41 GMT, Magnus Ihse Bursie wrote: >> Most of the JDK code base has been transitioned to UTF-8, but not all. 
This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. >> >> The fix is basically simple, and includes the following steps: >> * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already >> * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also tell javadoc that we have utf-8 now Inspired by [Phil's comment in JDK-8353948](https://bugs.openjdk.org/browse/JDK-8353948?focusedId=14769043&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14769043), I also modified configure to only allow utf-8 environments, but to also allow `en_US.UTF-8` as a valid locale. This also resolves [JDK-8333247](https://bugs.openjdk.org/browse/JDK-8333247) in a better way. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2800741990 From naoto at openjdk.org Mon Apr 14 16:12:47 2025 From: naoto at openjdk.org (Naoto Sato) Date: Mon, 14 Apr 2025 16:12:47 GMT Subject: RFR: 8343157: Examine large files for character encoding/decoding In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 17:02:13 GMT, Naoto Sato wrote: > Removing old charset test cases that verify new charset implementations (as of JDK7). Removed tests/files are actual charset implementations used in pre-JDK7, which have been used for comparing the results. Since those "new" implementations have been used since then, I believe it is OK to retire those old test cases. Thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24597#issuecomment-2802205991 From naoto at openjdk.org Mon Apr 14 16:12:47 2025 From: naoto at openjdk.org (Naoto Sato) Date: Mon, 14 Apr 2025 16:12:47 GMT Subject: Integrated: 8343157: Examine large files for character encoding/decoding In-Reply-To: References: Message-ID: On Fri, 11 Apr 2025 17:02:13 GMT, Naoto Sato wrote: > Removing old charset test cases that verify new charset implementations (as of JDK7). Removed tests/files are actual charset implementations used in pre-JDK7, which have been used for comparing the results. Since those "new" implementations have been used since then, I believe it is OK to retire those old test cases. This pull request has now been integrated. Changeset: d748bb5c Author: Naoto Sato URL: https://git.openjdk.org/jdk/commit/d748bb5cbb983fb07ae28e3a1c194058b73ef652 Stats: 164679 lines in 55 files changed: 0 ins; 164677 del; 2 mod 8343157: Examine large files for character encoding/decoding Reviewed-by: alanb ------------- PR: https://git.openjdk.org/jdk/pull/24597 From kbarrett at openjdk.org Mon Apr 14 17:36:47 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 14 Apr 2025 17:36:47 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 12:53:35 GMT, Magnus Ihse Bursie wrote: >> Most of the JDK code base has been transitioned to UTF-8, but not all. This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. >> >> The fix is basically simple, and includes the following steps: >> * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already >> * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). 
> > Magnus Ihse Bursie has updated the pull request incrementally with three additional commits since the last revision: > > - Also document UTF-8 requirements (solves JDK-8338973) > - Let configure only accept utf-8 locales > - Address review comments from Kim My comments have been addressed. Let's see if this is sufficient to clear my "request changes" state. ------------- PR Review: https://git.openjdk.org/jdk/pull/24574#pullrequestreview-2765099003 From serb at openjdk.org Tue Apr 15 23:23:46 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Tue, 15 Apr 2025 23:23:46 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: <7k8Vqbwnc5gQLdLWy6DMG3ReD0O68knX8T1OH4bdRZ8=.058d8240-f58f-4459-bd1e-e92981d6ae9b@github.com> On Mon, 14 Apr 2025 12:53:35 GMT, Magnus Ihse Bursie wrote: >> Most of the JDK code base has been transitioned to UTF-8, but not all. This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. >> >> The fix is basically simple, and includes the following steps: >> * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already >> * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). > > Magnus Ihse Bursie has updated the pull request incrementally with three additional commits since the last revision: > > - Also document UTF-8 requirements (solves JDK-8338973) > - Let configure only accept utf-8 locales > - Address review comments from Kim can we also force this rule by the jcheck? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2807748235 From prr at openjdk.org Wed Apr 16 04:43:42 2025 From: prr at openjdk.org (Phil Race) Date: Wed, 16 Apr 2025 04:43:42 GMT Subject: RFR: 8354273: Restore even more pointless unicode characters to ASCII [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:36:31 GMT, Magnus Ihse Bursie wrote: >> As a follow-up to [JDK-8354213](https://bugs.openjdk.org/browse/JDK-8354213), I found some additional places where unicode characters are unnecessarily used instead of pure ASCII. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Remove incorrectly copied "?anchor" src/java.xml/share/legal/xhtml11.md line 50: > 48: or derived from [title and URI of the W3C document]." > 49: > 50: Disclaimers ?anchor Did that come from an upstream file ? test/jdk/java/awt/geom/Path2D/GetBounds2DPrecisionTest.java line 193: > 191: if (str.length() >= DIGIT_COUNT) { > 192: str = str.substring(0,DIGIT_COUNT-1)+"..."; > 193: } How did you test this ? Please say more than tiers 1-3 .. because this test isn't run until tier4. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24567#discussion_r2046043831 PR Review Comment: https://git.openjdk.org/jdk/pull/24567#discussion_r2046047435 From mdoerr at openjdk.org Wed Apr 16 07:56:47 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 16 Apr 2025 07:56:47 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: <_OtXyj0LCymmSCQhXmO-Ak_z5ZEYd5-tvqPp16TmXos=.8da4aecf-3538-4303-9b5a-2a59811642e0@github.com> On Mon, 14 Apr 2025 12:53:35 GMT, Magnus Ihse Bursie wrote: >> Most of the JDK code base has been transitioned to UTF-8, but not all. This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. 
>> >> The fix is basically simple, and includes the following steps: >> * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already >> * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). > > Magnus Ihse Bursie has updated the pull request incrementally with three additional commits since the last revision: > > - Also document UTF-8 requirements (solves JDK-8338973) > - Let configure only accept utf-8 locales > - Address review comments from Kim We get the following problem on AIX: checking for locale to use... no UTF-8 locale found configure: error: No UTF-8 locale found. This is required for building successfully. configure exiting with result code 1 @varada1110, @JoKern65: Can you take a look, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2808717775 From ihse at openjdk.org Wed Apr 16 09:50:49 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 16 Apr 2025 09:50:49 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: <7k8Vqbwnc5gQLdLWy6DMG3ReD0O68knX8T1OH4bdRZ8=.058d8240-f58f-4459-bd1e-e92981d6ae9b@github.com> References: <7k8Vqbwnc5gQLdLWy6DMG3ReD0O68knX8T1OH4bdRZ8=.058d8240-f58f-4459-bd1e-e92981d6ae9b@github.com> Message-ID: On Tue, 15 Apr 2025 23:20:45 GMT, Sergey Bylokhov wrote: > can we also force this rule by the jcheck? Well, yes and no. First, we can verify that we do not have invalid UTF-8. That might be a signal that the encoding is wrong. But then this check needs to be able to distinguish between pure binary files that happen to look like improperly encoded UTF-8 files, and actually incorrectly encoded text files. In the end, this is likely to be more of a heuristic for a warning, rather than something we can block integration on. 
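The validity check mentioned above could look roughly like the following. This is only a sketch under stated assumptions: it does not show how jcheck is or would be implemented, and the class name `Utf8Validity` is hypothetical. It relies on the standard `CharsetDecoder` with `CodingErrorAction.REPORT`, which makes decoding fail on malformed input instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not jcheck code.
final class Utf8Validity {

    // True if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] content) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The same text, once encoded as ISO-8859-1 and once as UTF-8.
        byte[] latin1 = "na\u00EFve".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "na\u00EFve".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(latin1)); // false: a lone 0xEF is not valid UTF-8
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

A binary file would typically fail this check as well, which is why such a result can only serve as a hint and not as proof of a wrongly encoded text file.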
Secondly, files can have incorrect encodings but still pass as valid UTF-8. Only a human can tell that the content would be incorrect if we were to assume the encoding is UTF-8 instead of e.g. latin-1. This cannot be checked by jcheck, but must be caught by reviewers. I have been thinking, though, to add a warning to jcheck for adding non-ASCII characters to known text file types. As a general rule, this is acceptable but should only be done judiciously, so it would be good to have jcheck point it out. That would also give you an extra chance to verify the encoding. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2809028487 From ihse at openjdk.org Wed Apr 16 09:55:40 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 16 Apr 2025 09:55:40 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: <_OtXyj0LCymmSCQhXmO-Ak_z5ZEYd5-tvqPp16TmXos=.8da4aecf-3538-4303-9b5a-2a59811642e0@github.com> References: <_OtXyj0LCymmSCQhXmO-Ak_z5ZEYd5-tvqPp16TmXos=.8da4aecf-3538-4303-9b5a-2a59811642e0@github.com> Message-ID: <6Kyy5kYllWxxLc6k2u-dF9dqmPcEQS74vEJO8rWG-D0=.0adee9b2-334c-473c-b0cc-1cbeb2774df6@github.com> On Wed, 16 Apr 2025 07:54:13 GMT, Martin Doerr wrote: > We get the following problem on AIX: > > ``` > checking for locale to use... no UTF-8 locale found > configure: error: No UTF-8 locale found. This is required for building successfully. > configure exiting with result code 1 > ``` This is (hopefully) more of a configuration issue than an issue with AIX per se. You can run `locale -a` to see all available locales, and see if there are any UTF-8 locales at all. It might be that the naming scheme does not match `*.UTF-8`. Otherwise, you'd have to install the `C.UTF-8` or `en_US.UTF-8` locale. If no UTF-8 locales are available at all on AIX, then we might have to add some kind of exception. But beware that you will be building on an unsupported configuration in that case. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2809035830 From mdoerr at openjdk.org Wed Apr 16 10:11:46 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 16 Apr 2025 10:11:46 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: <6Kyy5kYllWxxLc6k2u-dF9dqmPcEQS74vEJO8rWG-D0=.0adee9b2-334c-473c-b0cc-1cbeb2774df6@github.com> References: <_OtXyj0LCymmSCQhXmO-Ak_z5ZEYd5-tvqPp16TmXos=.8da4aecf-3538-4303-9b5a-2a59811642e0@github.com> <6Kyy5kYllWxxLc6k2u-dF9dqmPcEQS74vEJO8rWG-D0=.0adee9b2-334c-473c-b0cc-1cbeb2774df6@github.com> Message-ID: On Wed, 16 Apr 2025 09:51:49 GMT, Magnus Ihse Bursie wrote: > `locale -a` C POSIX en_US.8859-15 en_US.IBM-858 en_US.ISO8859-1 en_US I don't know if UTF-8 can be installed. If so, we should also document that as a requirement for AIX build machines. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2809046398 From ihse at openjdk.org Wed Apr 16 10:11:52 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 16 Apr 2025 10:11:52 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 12:53:35 GMT, Magnus Ihse Bursie wrote: >> Most of the JDK code base has been transitioned to UTF-8, but not all. This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. >> >> The fix is basically simple, and includes the following steps: >> * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already >> * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). 
> > Magnus Ihse Bursie has updated the pull request incrementally with three additional commits since the last revision: > > - Also document UTF-8 requirements (solves JDK-8338973) > - Let configure only accept utf-8 locales > - Address review comments from Kim It's kind of a wonder that you have been able to build at all so far..! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2809055178 From ihse at openjdk.org Wed Apr 16 10:11:57 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 16 Apr 2025 10:11:57 GMT Subject: RFR: 8354273: Restore even more pointless unicode characters to ASCII [v2] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 04:39:22 GMT, Phil Race wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove incorrectly copied "?anchor" > > src/java.xml/share/legal/xhtml11.md line 50: > >> 48: or derived from [title and URI of the W3C document]." >> 49: >> 50: Disclaimers ?anchor > > Did that come from an upstream file ? No, it is copy/pasted from a textual rendering of the html file specified in the URL above. This is what you get if you naïvely select the text in Firefox and press Ctrl-C. The `?anchor` part is not rendered on screen. > test/jdk/java/awt/geom/Path2D/GetBounds2DPrecisionTest.java line 193: > >> 191: if (str.length() >= DIGIT_COUNT) { >> 192: str = str.substring(0,DIGIT_COUNT-1)+"..."; >> 193: } > > How did you test this ? Please say more than tiers 1-3 .. because this test isn't run until tier4. I did not test tier4. Will do so now. Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24567#discussion_r2046572753 PR Review Comment: https://git.openjdk.org/jdk/pull/24567#discussion_r2046573122 From mbaesken at openjdk.org Wed Apr 16 10:37:42 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 16 Apr 2025 10:37:42 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 12:53:35 GMT, Magnus Ihse Bursie wrote: >> Most of the JDK code base has been transitioned to UTF-8, but not all. This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. >> >> The fix is basically simple, and includes the following steps: >> * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already >> * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). > > Magnus Ihse Bursie has updated the pull request incrementally with three additional commits since the last revision: > > - Also document UTF-8 requirements (solves JDK-8338973) > - Let configure only accept utf-8 locales > - Address review comments from Kim make/autoconf/basic.m4 line 155: > 153: else > 154: AC_MSG_RESULT([no UTF-8 locale found]) > 155: AC_MSG_ERROR([No UTF-8 locale found. This is required for building successfully.]) Seems we run into this 'else' part on AIX checking for locale to use... no UTF-8 locale found configure: error: No UTF-8 locale found. This is required for building successfully. configure exiting with result code 1 maybe it would be nice to display the desired ones C.UTF-8 or en_US.UTF-8 in this message too for more clarity? 
(have to check if there are other names on AIX) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24574#discussion_r2046642699 From ihse at openjdk.org Wed Apr 16 13:44:52 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 16 Apr 2025 13:44:52 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 10:35:02 GMT, Matthias Baesken wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with three additional commits since the last revision: >> >> - Also document UTF-8 requirements (solves JDK-8338973) >> - Let configure only accept utf-8 locales >> - Address review comments from Kim > > make/autoconf/basic.m4 line 155: > >> 153: else >> 154: AC_MSG_RESULT([no UTF-8 locale found]) >> 155: AC_MSG_ERROR([No UTF-8 locale found. This is required for building successfully.]) > > Seems we run into this 'else' part on AIX > > > checking for locale to use... no UTF-8 locale found > configure: error: No UTF-8 locale found. This is required for building successfully. > configure exiting with result code 1 > > maybe it would be nice to display the desired ones C.UTF-8 or en_US.UTF-8 in this message too for more clarity? (have to check if there are other names on AIX) If you have a locale named `.UTF-8` as your active locale, that will also be accepted, so it is not limited to C and en_US. But it might be an idea to include it in the error message, yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24574#discussion_r2046971091 From naoto at openjdk.org Wed Apr 16 16:17:49 2025 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 16 Apr 2025 16:17:49 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 12:53:35 GMT, Magnus Ihse Bursie wrote: >> Most of the JDK code base has been transitioned to UTF-8, but not all. 
This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. >> >> The fix is basically simple, and includes the following steps: >> * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already >> * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). > > Magnus Ihse Bursie has updated the pull request incrementally with three additional commits since the last revision: > > - Also document UTF-8 requirements (solves JDK-8338973) > - Let configure only accept utf-8 locales > - Address review comments from Kim We will probably need to make sure things are ok on Windows as well (they are the other confusing environment) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2810074157 From jlu at openjdk.org Wed Apr 16 23:11:15 2025 From: jlu at openjdk.org (Justin Lu) Date: Wed, 16 Apr 2025 23:11:15 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency Message-ID: Please review this PR which improves the _ValidateISO4217_ Currency test by adding testing of future currencies after the transition date. This is done by creating a patched version of Currency that replaces `System.currentTimeMillis()` calls with a mocked value equivalent to `Long.MAX_VALUE`. A module patch is then applied to supply the new Currency class files. As the test tracks the ISO 4217 data, manual testing of this change can be done by modifying the cut-over year from 2025 to 2026 for the `CW=ANG;2025-04-01-04-00-00;XCG` entry for both the JDK and test data. Doing so results in behavior such as, 1st test - CW returns the currency ANG. 2nd test - CW returns the currency XCG. 
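The cut-over idea described above can be sketched in a few lines. This is only a simplified illustration and not the JDK's actual `Currency` implementation: the class name `CutOverSketch`, the method `currencyAt`, the assumption that the date is in GMT, and the choice of `>=` for the comparison are all made up for this example. Only the entry string `CW=ANG;2025-04-01-04-00-00;XCG` comes from the message.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class CutOverSketch {
    // Pattern matching the date part of the entry quoted in the message.
    private static final DateTimeFormatter CUT_OVER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm-ss");

    /**
     * Picks the old or the new currency code from an entry shaped like
     * "CW=ANG;2025-04-01-04-00-00;XCG", given a clock reading in epoch millis.
     * Hypothetical helper: the date is assumed to be GMT.
     */
    static String currencyAt(String entry, long nowMillis) {
        // Drop the "CW=" country prefix, then split old code, date, new code.
        String value = entry.substring(entry.indexOf('=') + 1);
        String[] parts = value.split(";");
        long cutOverMillis = LocalDateTime.parse(parts[1], CUT_OVER)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
        // Before the cut-over the old code applies, afterwards the new one.
        return nowMillis >= cutOverMillis ? parts[2] : parts[0];
    }

    public static void main(String[] args) {
        String entry = "CW=ANG;2025-04-01-04-00-00;XCG";
        // A clock far in the past selects the old currency.
        System.out.println(currencyAt(entry, 0L));
        // A mocked clock of Long.MAX_VALUE is always past the cut-over.
        System.out.println(currencyAt(entry, Long.MAX_VALUE));
    }
}
```

Passing the clock value in as a parameter is what makes the "future" branch reachable without waiting for the date; the PR gets the same effect by replacing the `System.currentTimeMillis()` call in a patched class.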
------------- Commit messages: - shorten the ClassFile logic - init Changes: https://git.openjdk.org/jdk/pull/24701/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24701&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354344 Stats: 92 lines in 1 file changed: 87 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24701.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24701/head:pull/24701 PR: https://git.openjdk.org/jdk/pull/24701 From naoto at openjdk.org Thu Apr 17 00:18:42 2025 From: naoto at openjdk.org (Naoto Sato) Date: Thu, 17 Apr 2025 00:18:42 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 23:06:19 GMT, Justin Lu wrote: > As the test tracks the ISO 4217 data, manual testing of this change can be done by modifying the cut-over year from 2025 to 2026 for the `CW=ANG;2025-04-01-04-00-00;XCG` entry for both the JDK and test data. Would it be possible to use `currency.properties` file to provide a test transition, and use this logic test the validity, so that it is not a manual testing? Still it is not 100% same with the data from `currency.data`, but at least provides the transition logic validity, possibly for both simple and special currencies. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24701#issuecomment-2811294606 From mbaesken at openjdk.org Thu Apr 17 14:05:01 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 17 Apr 2025 14:05:01 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: On Mon, 14 Apr 2025 12:53:35 GMT, Magnus Ihse Bursie wrote: >> Most of the JDK code base has been transitioned to UTF-8, but not all. This has recently become an acute problem, since our mixing of iso-8859-1 and utf-8 in properties files confused the version of `sed` that is shipped with the new macOS 15.4. 
>> >> The fix is basically simple, and includes the following steps: >> * Look through the code base for text files containing non-ASCII characters, and convert them to UTF-8, if they are not already >> * Update tooling used in building to recognize the fact that files are now in UTF-8 and treat them accordingly (basically, updating compiler flags, git attributes, etc). > > Magnus Ihse Bursie has updated the pull request incrementally with three additional commits since the last revision: > > - Also document UTF-8 requirements (solves JDK-8338973) > - Let configure only accept utf-8 locales > - Address review comments from Kim A little additional info regarding AIX: when allowing locale C in basic.m4 too by patching your change, the build works nicely for me, even on an AIX machine without the C.UTF-8 or en_US.UTF-8 locales. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2813037836 From ihse at openjdk.org Thu Apr 17 14:04:59 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 17 Apr 2025 14:04:59 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 16:14:39 GMT, Naoto Sato wrote: > We will probably need to make sure things are ok on Windows as well (they are the other confusing environment) Windows is much more painful to work with, since there is no counterpart to `LC_ALL`; you must set the user's locale. There is a rather long paragraph detailing the requirements for building without problems in the build README. Is there some specific problem you are worried about on Windows that you were thinking about? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2813033188 From ihse at openjdk.org Thu Apr 17 14:48:23 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 17 Apr 2025 14:48:23 GMT Subject: RFR: 8354968: Replace unicode sequences in comment text with UTF-8 characters Message-ID: As part of the UTF-8 cleaning up done in [JDK-8301971](https://bugs.openjdk.org/browse/JDK-8301971), I looked at where and how we are using unicode sequences (`\uXXXX`). In several string literals, I think the unicode sequences still have merit, if they improve clarity or readability of the code. Some instances are more gray zone. But the places where it does not make sense at all are in comments, as part of fluid text comments. There they are just disruptive and not helpful at all. I tried to locate all such places (but I might have missed places, I did not do a proper lexical analysis to find comments) and fix them. 99% of this fix is to turn poor `Peter von der Ah\u00e9` into `Peter von der Ahé`. I checked some random samples on when this was introduced to see if there was some particular commit that mistreated the encoding, but they have been there since the original release of the open JDK source code. There are likely many more places where direct UTF-8 encoded characters are preferable to unicode sequences, but this seemed like a safe and trivial first start. 
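For readers less familiar with the distinction discussed in this thread, the following small sketch shows that a `\uXXXX` escape and the character itself denote the same `char` at run time, and that only the encoded byte length differs between ISO-8859-1 and UTF-8. The class name `EscapeSketch` and the helper `encodedLength` are invented for this illustration; the string is written with an escape purely so that the example source stays ASCII regardless of how it is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EscapeSketch {
    /** Hypothetical helper: number of bytes the text occupies in a charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // The escape is translated before the literal is read,
        // so this is a three-character string ending in U+00E9.
        String name = "Ah\u00e9";

        System.out.println(name.length());
        System.out.println(name.codePointAt(2) == 0xE9);

        // One byte per character in ISO-8859-1, but U+00E9 takes two in UTF-8.
        System.out.println(encodedLength(name, StandardCharsets.ISO_8859_1));
        System.out.println(encodedLength(name, StandardCharsets.UTF_8));
    }
}
```

The differing byte lengths are the reason a file's declared encoding matters to byte-oriented tools, while the compiled string is identical either way.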
------------- Commit messages: - 8354968: Replace unicode sequences in comment text with UTF-8 characters Changes: https://git.openjdk.org/jdk/pull/24727/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24727&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354968 Stats: 158 lines in 153 files changed: 0 ins; 2 del; 156 mod Patch: https://git.openjdk.org/jdk/pull/24727.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24727/head:pull/24727 PR: https://git.openjdk.org/jdk/pull/24727 From naoto at openjdk.org Thu Apr 17 16:29:56 2025 From: naoto at openjdk.org (Naoto Sato) Date: Thu, 17 Apr 2025 16:29:56 GMT Subject: RFR: 8301971: Make JDK source code UTF-8 [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 14:01:07 GMT, Magnus Ihse Bursie wrote: > Is there some specific problem you are worried about on Windows that you were thinking about? I would have tested on non-English (preferably Chinese/Japanese) Windows where users set it to English. I believe we had issues from contributors in those areas. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2813487827 From jlu at openjdk.org Thu Apr 17 17:05:59 2025 From: jlu at openjdk.org (Justin Lu) Date: Thu, 17 Apr 2025 17:05:59 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 00:15:47 GMT, Naoto Sato wrote: > > As the test tracks the ISO 4217 data, manual testing of this change can be done by modifying the cut-over year from 2025 to 2026 for the `CW=ANG;2025-04-01-04-00-00;XCG` entry for both the JDK and test data. > > Would it be possible to use `currency.properties` file to provide a test transition, and use this logic test the validity, so that it is not a manual testing? Still it is not 100% same with the data from `currency.data`, but at least provides the transition logic validity, possibly for both simple and special currencies. 
That's an interesting idea, I think it is possible. As you stated, the logic is different, but it would guarantee that the second run is indeed testing in the future. I'm not sure how combining a currency.properties override with the main ValidateISO4217 test will turn out though since the test should track 1 to 1 with the ISO data. Will look into it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24701#issuecomment-2813567220 From jlu at openjdk.org Thu Apr 17 20:45:31 2025 From: jlu at openjdk.org (Justin Lu) Date: Thu, 17 Apr 2025 20:45:31 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency [v2] In-Reply-To: References: Message-ID: > Please review this PR which improves the _ValidateISO4217_ Currency test by adding testing of future currencies after the transition date. > > This is done by creating a patched version of Currency that replaces `System.currentTimeMillis()` calls with a mocked value equivalent to `Long.MAX_VALUE`. A module patch is then applied to supply the new Currency class files. > > The mocked time behavior is tested by using the `currency.properties` override in a separate invocation. 
Justin Lu has updated the pull request incrementally with one additional commit since the last revision: Naoto's feedback - check patch/mocking is working via currency.properties override ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24701/files - new: https://git.openjdk.org/jdk/pull/24701/files/1565a986..8ab78973 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24701&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24701&range=00-01 Stats: 34 lines in 2 files changed: 23 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24701.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24701/head:pull/24701 PR: https://git.openjdk.org/jdk/pull/24701 From jlu at openjdk.org Thu Apr 17 20:45:31 2025 From: jlu at openjdk.org (Justin Lu) Date: Thu, 17 Apr 2025 20:45:31 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency In-Reply-To: References: Message-ID: <4OMcc21bk9Axp2f5FeSqmTUgZM6y85r-vmn017Z9_WM=.6878e187-d403-4810-a676-ed9006ffaa92@github.com> On Thu, 17 Apr 2025 00:15:47 GMT, Naoto Sato wrote: >> Please review this PR which improves the _ValidateISO4217_ Currency test by adding testing of future currencies after the transition date. >> >> This is done by creating a patched version of Currency that replaces `System.currentTimeMillis()` calls with a mocked value equivalent to `Long.MAX_VALUE`. A module patch is then applied to supply the new Currency class files. >> >> The mocked time behavior is tested by using the `currency.properties` override in a separate invocation. > >> As the test tracks the ISO 4217 data, manual testing of this change can be done by modifying the cut-over year from 2025 to 2026 for the `CW=ANG;2025-04-01-04-00-00;XCG` entry for both the JDK and test data. > > Would it be possible to use `currency.properties` file to provide a test transition, and use this logic test the validity, so that it is not a manual testing? 
Still it is not 100% same with the data from `currency.data`, but at least provides the transition logic validity, possibly for both simple and special currencies. @naotoj Thanks for the idea. The latest [commit](https://github.com/openjdk/jdk/pull/24701/commits/8ab7897315294acd1777cce010bd9b037f66ef3d) supplies a custom entry in the properties file override. In the year 3000, `PK` uses the custom currency `JPZ`. Another invocation (the second one) is added with the override passed via command line, which checks that `PK` is correctly returning `JPZ`. This indicates we have successfully mocked the time and the module patch has been applied. We do this separately from the third invocation so as not to conflict with the golden ISO data. Thus the invocations behave as follows, 1 - Test ISO data with current time + Build module patch with mocked time 2 - Check that the module patch and mocked time function correctly. (NO tests are run). 3 - Test ISO data with mocked time ------------- PR Comment: https://git.openjdk.org/jdk/pull/24701#issuecomment-2813977463 From jlu at openjdk.org Thu Apr 17 21:35:08 2025 From: jlu at openjdk.org (Justin Lu) Date: Thu, 17 Apr 2025 21:35:08 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency [v3] In-Reply-To: References: Message-ID: > Please review this PR which improves the _ValidateISO4217_ Currency test by adding testing of future currencies after the transition date. > > This is done by creating a patched version of Currency that replaces `System.currentTimeMillis()` calls with a mocked value equivalent to `Long.MAX_VALUE`. A module patch is then applied to supply the new Currency class files. > > The mocked time behavior is tested by using the `currency.properties` override in a separate invocation. 
Justin Lu has updated the pull request incrementally with one additional commit since the last revision: simple/special case in check invocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24701/files - new: https://git.openjdk.org/jdk/pull/24701/files/8ab78973..60cda372 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24701&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24701&range=01-02 Stats: 9 lines in 2 files changed: 3 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24701.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24701/head:pull/24701 PR: https://git.openjdk.org/jdk/pull/24701 From naoto at openjdk.org Thu Apr 17 22:11:42 2025 From: naoto at openjdk.org (Naoto Sato) Date: Thu, 17 Apr 2025 22:11:42 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 21:35:08 GMT, Justin Lu wrote: >> Please review this PR which improves the _ValidateISO4217_ Currency test by adding testing of future currencies after the transition date. >> >> This is done by creating a patched version of Currency that replaces `System.currentTimeMillis()` calls with a mocked value equivalent to `Long.MAX_VALUE`. A module patch is then applied to supply the new Currency class files. >> >> The mocked time behavior is tested by using the `currency.properties` override in a separate invocation. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > simple/special case in check invocation Good to see the future currency testing. test/jdk/java/util/Currency/ValidateISO4217.java line 157: > 155: // "check" invocation only runs the main method (and not any tests) to determine if the > 156: // future time checking is correct > 157: public static void main(String[] args) { It would probably be helpful to check if the patched Currency class does exist or not. 
Same for the `MOCKED.TIME=true` case. test/jdk/java/util/Currency/ValidateISO4217.java line 203: > 201: CodeTransform codeTransform = (codeBuilder, e) -> { > 202: switch (e) { > 203: case InvokeInstruction i when i.name().stringValue().equals("currentTimeMillis") -> `equalsString()` may be used. Regardless, is there a way to tell this call is indeed `System.currentTimeMillis()`? Might do that check in case a method on Currency with the same name is introduced (not likely though). ------------- PR Review: https://git.openjdk.org/jdk/pull/24701#pullrequestreview-2777025817 PR Review Comment: https://git.openjdk.org/jdk/pull/24701#discussion_r2049702515 PR Review Comment: https://git.openjdk.org/jdk/pull/24701#discussion_r2049704312 From jlu at openjdk.org Thu Apr 17 23:06:04 2025 From: jlu at openjdk.org (Justin Lu) Date: Thu, 17 Apr 2025 23:06:04 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency [v4] In-Reply-To: References: Message-ID: > Please review this PR which improves the _ValidateISO4217_ Currency test by adding testing of future currencies after the transition date. > > This is done by creating a patched version of Currency that replaces `System.currentTimeMillis()` calls with a mocked value equivalent to `Long.MAX_VALUE`. A module patch is then applied to supply the new Currency class files. > > The mocked time behavior is tested by using the `currency.properties` override in a separate invocation. Justin Lu has updated the pull request incrementally with one additional commit since the last revision: Naoto review - check instruction owner. 
check module patch files exist ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24701/files - new: https://git.openjdk.org/jdk/pull/24701/files/60cda372..c58f11f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24701&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24701&range=02-03 Stats: 22 lines in 1 file changed: 18 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24701.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24701/head:pull/24701 PR: https://git.openjdk.org/jdk/pull/24701 From jlu at openjdk.org Thu Apr 17 23:06:05 2025 From: jlu at openjdk.org (Justin Lu) Date: Thu, 17 Apr 2025 23:06:05 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency [v3] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 22:05:31 GMT, Naoto Sato wrote: >> Justin Lu has updated the pull request incrementally with one additional commit since the last revision: >> >> simple/special case in check invocation > > test/jdk/java/util/Currency/ValidateISO4217.java line 203: > >> 201: CodeTransform codeTransform = (codeBuilder, e) -> { >> 202: switch (e) { >> 203: case InvokeInstruction i when i.name().stringValue().equals("currentTimeMillis") -> > > `equalsString()` may be used. Regardless, is there a way to tell this call is indeed `System.currentTimeMillis()`? Might do that check in case a method on Currency with the same name is introduced (not likely though). Yes, we can check the owner name is _java/lang/System_. Addressed your other comment as well in https://github.com/openjdk/jdk/pull/24701/commits/c58f11f2affa19267aa2416bcff10f842c13d871. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24701#discussion_r2049739614 From eirbjo at openjdk.org Fri Apr 18 06:38:47 2025 From: eirbjo at openjdk.org (Eirik Bjørsnøs) Date: Fri, 18 Apr 2025 06:38:47 GMT Subject: RFR: 8354273: Restore even more pointless unicode characters to ASCII [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:36:31 GMT, Magnus Ihse Bursie wrote: >> As a follow-up to [JDK-8354213](https://bugs.openjdk.org/browse/JDK-8354213), I found some additional places where unicode characters are unnecessarily used instead of pure ASCII. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Remove incorrectly copied "?anchor" While the changes here look okay, I think the issue/PR title could be improved. The replacement of Unicode "En Dash" with ASCII hyphen-minus and the similar replacement of the Unicode "Horizontal Ellipsis" with three ASCII periods are not really "restoring" much, and these unicode characters are hardly "pointless" as they may carry different semantic meaning, behavior and rendering. It's a valid choice to normalize them into ASCII though, but perhaps a title like "Normalize even more Unicode characters as ASCII" would be more "fair" to these poor Unicode characters :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24567#issuecomment-2814667275 From naoto at openjdk.org Fri Apr 18 17:14:52 2025 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 18 Apr 2025 17:14:52 GMT Subject: RFR: 8354344: Test behavior after cut-over for future ISO 4217 currency [v4] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 23:06:04 GMT, Justin Lu wrote: >> Please review this PR which improves the _ValidateISO4217_ Currency test by adding testing of future currencies after the transition date. 
>> >> This is done by creating a patched version of Currency that replaces `System.currentTimeMillis()` calls with a mocked value equivalent to `Long.MAX_VALUE`. A module patch is then applied to supply the new Currency class files. >> >> The mocked time behavior is tested by using the `currency.properties` override in a separate invocation. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > Naoto review - check instruction owner. check module patch files exist LGTM. Thanks for providing the test! ------------- Marked as reviewed by naoto (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24701#pullrequestreview-2779037686 From naoto at openjdk.org Mon Apr 21 20:16:16 2025 From: naoto at openjdk.org (Naoto Sato) Date: Mon, 21 Apr 2025 20:16:16 GMT Subject: RFR: 8355215: Add @spec tags to Emoji related methods Message-ID: Adding @spec tags to Emoji related methods in the Character class. A CSR has also been drafted. ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24779&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355215 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24779/head:pull/24779 PR: https://git.openjdk.org/jdk/pull/24779 From jlu at openjdk.org Mon Apr 21 20:26:54 2025 From: jlu at openjdk.org (Justin Lu) Date: Mon, 21 Apr 2025 20:26:54 GMT Subject: Integrated: 8354344: Test behavior after cut-over for future ISO 4217 currency In-Reply-To: References: Message-ID: On Wed, 16 Apr 2025 23:06:19 GMT, Justin Lu wrote: > Please review this PR which improves the _ValidateISO4217_ Currency test by adding testing of future currencies after the transition date. 
> > This is done by creating a patched version of Currency that replaces `System.currentTimeMillis()` calls with a mocked value equivalent to `Long.MAX_VALUE`. A module patch is then applied to supply the new Currency class files. > > The mocked time behavior is tested by using the `currency.properties` override in a separate invocation. This pull request has now been integrated. Changeset: 1526dd81 Author: Justin Lu URL: https://git.openjdk.org/jdk/commit/1526dd81d9b5bf4abaac1546c370cf7a056d01dc Stats: 133 lines in 2 files changed: 128 ins; 1 del; 4 mod 8354344: Test behavior after cut-over for future ISO 4217 currency Reviewed-by: naoto ------------- PR: https://git.openjdk.org/jdk/pull/24701 From joehw at openjdk.org Mon Apr 21 21:03:54 2025 From: joehw at openjdk.org (Joe Wang) Date: Mon, 21 Apr 2025 21:03:54 GMT Subject: RFR: 8355215: Add @spec tags to Emoji related methods In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 20:11:47 GMT, Naoto Sato wrote: > Adding @spec tags to Emoji related methods in the Character class. A CSR has also been drafted. Marked as reviewed by joehw (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24779#pullrequestreview-2782236067 From iris at openjdk.org Mon Apr 21 21:03:55 2025 From: iris at openjdk.org (Iris Clark) Date: Mon, 21 Apr 2025 21:03:55 GMT Subject: RFR: 8355215: Add @spec tags to Emoji related methods In-Reply-To: References: Message-ID: <0_cJxrcWa57BtkpHk4OQF7TRZ1M-WwCGgPG9fYQbrKo=.8a3b5b68-d014-4640-b5ab-4c2fed25cee4@github.com> On Mon, 21 Apr 2025 20:11:47 GMT, Naoto Sato wrote: > Adding @spec tags to Emoji related methods in the Character class. A CSR has also been drafted. CSR also Reviewed. ------------- Marked as reviewed by iris (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24779#pullrequestreview-2782238754 From jlu at openjdk.org Mon Apr 21 21:53:42 2025 From: jlu at openjdk.org (Justin Lu) Date: Mon, 21 Apr 2025 21:53:42 GMT Subject: RFR: 8355215: Add @spec tags to Emoji related methods In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 20:11:47 GMT, Naoto Sato wrote: > Adding @spec tags to Emoji related methods in the Character class. A CSR has also been drafted. Marked as reviewed by jlu (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24779#pullrequestreview-2782310737 From jlu at openjdk.org Mon Apr 21 21:56:59 2025 From: jlu at openjdk.org (Justin Lu) Date: Mon, 21 Apr 2025 21:56:59 GMT Subject: RFR: 8354343: Hardening of Currency tests for not yet defined future ISO 4217 currency Message-ID: Please review this PR which improves future currency checking for ISO 4217 currencies. Checking for a currency that should not yet exist in the set of available currencies is already done. It should also be explicitly checked that such a currency can not be instantiated as well via the String getter. 
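The two checks mentioned above (absence from the set of available currencies, and failure of the String getter) could look roughly like the sketch below. It is not the code from the PR: the class `NotYetDefinedSketch`, the method `isUndefined`, and the placeholder code `ZZZ` (assumed here not to be a supported ISO 4217 code) are illustrative only. `Currency.getAvailableCurrencies()` and `Currency.getInstance(String)` are the standard `java.util.Currency` methods, and the latter is documented to throw `IllegalArgumentException` for an unsupported code.

```java
import java.util.Currency;

public class NotYetDefinedSketch {
    /**
     * Returns true only if the code is neither listed among the available
     * currencies nor obtainable through the String getter.
     */
    static boolean isUndefined(String code) {
        // Check 1: the code must not appear in the available set.
        boolean listed = Currency.getAvailableCurrencies().stream()
                .anyMatch(c -> c.getCurrencyCode().equals(code));

        // Check 2: the String getter must reject the code.
        boolean instantiable;
        try {
            Currency.getInstance(code);
            instantiable = true;
        } catch (IllegalArgumentException e) {
            instantiable = false;
        }
        return !listed && !instantiable;
    }

    public static void main(String[] args) {
        // "ZZZ" is a hypothetical stand-in for a not-yet-defined code.
        System.out.println("ZZZ undefined: " + isUndefined("ZZZ"));
        // "USD" is a supported code, so it is not undefined.
        System.out.println("USD undefined: " + isUndefined("USD"));
    }
}
```

Checking both paths matters because a code could be hidden from the listing yet still be accepted by the getter, which is the gap the PR description points at.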
------------- Commit messages: - init Changes: https://git.openjdk.org/jdk/pull/24782/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24782&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354343 Stats: 35 lines in 1 file changed: 34 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24782/head:pull/24782 PR: https://git.openjdk.org/jdk/pull/24782 From naoto at openjdk.org Mon Apr 21 22:34:51 2025 From: naoto at openjdk.org (Naoto Sato) Date: Mon, 21 Apr 2025 22:34:51 GMT Subject: RFR: 8354343: Hardening of Currency tests for not yet defined future ISO 4217 currency In-Reply-To: References: Message-ID: <-7U0IsIhO6ZZR8u31uYAOOJDCyd7fOdhWQ0DD46btQE=.e795cf83-6ef6-4f9f-8517-73c30d15fdbe@github.com> On Mon, 21 Apr 2025 21:51:35 GMT, Justin Lu wrote: > Please review this PR which improves future currency checking for ISO 4217 currencies. > > Checking for a currency that should not yet exist in the set of available currencies is already done. > It should also be explicitly checked that such a currency can not be instantiated as well via the String getter. LGTM. I think this JIRA issue and the previous test improvement one can be linked to [JDK-8321480](https://bugs.openjdk.org/browse/JDK-8321480), and both have `iso4217` labels. 
test/jdk/java/util/Currency/ValidateISO4217.java line 183: > 181: setUpPatchedClasses(); > 182: setUpTestingData(); > 183: setUpNotYetDefined(); It may be clearer to move this inside `setUpTestingData()`, and modify the comment there ------------- PR Review: https://git.openjdk.org/jdk/pull/24782#pullrequestreview-2782347271 PR Review Comment: https://git.openjdk.org/jdk/pull/24782#discussion_r2053062123 From jlu at openjdk.org Mon Apr 21 23:15:56 2025 From: jlu at openjdk.org (Justin Lu) Date: Mon, 21 Apr 2025 23:15:56 GMT Subject: RFR: 8354343: Hardening of Currency tests for not yet defined future ISO 4217 currency [v2] In-Reply-To: References: Message-ID: > Please review this PR which improves future currency checking for ISO 4217 currencies. > > Checking for a currency that should not yet exist in the set of available currencies is already done. > It should also be explicitly checked that such a currency can not be instantiated as well via the String getter. Justin Lu has updated the pull request incrementally with one additional commit since the last revision: move future currencies set up under setUpTestingData() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24782/files - new: https://git.openjdk.org/jdk/pull/24782/files/7424e964..6c4cbe94 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24782&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24782&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24782/head:pull/24782 PR: https://git.openjdk.org/jdk/pull/24782 From jlu at openjdk.org Mon Apr 21 23:15:56 2025 From: jlu at openjdk.org (Justin Lu) Date: Mon, 21 Apr 2025 23:15:56 GMT Subject: RFR: 8354343: Hardening of Currency tests for not yet defined future ISO 4217 currency [v2] In-Reply-To: <-7U0IsIhO6ZZR8u31uYAOOJDCyd7fOdhWQ0DD46btQE=.e795cf83-6ef6-4f9f-8517-73c30d15fdbe@github.com> References: 
<-7U0IsIhO6ZZR8u31uYAOOJDCyd7fOdhWQ0DD46btQE=.e795cf83-6ef6-4f9f-8517-73c30d15fdbe@github.com> Message-ID: On Mon, 21 Apr 2025 22:25:14 GMT, Naoto Sato wrote: >> Justin Lu has updated the pull request incrementally with one additional commit since the last revision: >> >> move future currencies set up under setUpTestingData() > > test/jdk/java/util/Currency/ValidateISO4217.java line 183: > >> 181: setUpPatchedClasses(); >> 182: setUpTestingData(); >> 183: setUpNotYetDefined(); > > It may be clearer to move this inside `setUpTestingData()`, and modify the comment there Right, makes more sense that way. Also updated the JBS issues as you requested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24782#discussion_r2053093655 From duke at openjdk.org Tue Apr 22 07:49:59 2025 From: duke at openjdk.org (duke) Date: Tue, 22 Apr 2025 07:49:59 GMT Subject: RFR: 8350442: Update copyright In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 17:01:36 GMT, Ivan Šipka wrote: > @naotoj @coffeys @mahendrachhipa > > update of copyright years. Details on ticket. @frkator Your change (at version 9b4552502567906e90a4314165142206f01e9e16) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23714#issuecomment-2820462241 From isipka at openjdk.org Tue Apr 22 08:49:54 2025 From: isipka at openjdk.org (Ivan Šipka) Date: Tue, 22 Apr 2025 08:49:54 GMT Subject: Integrated: 8350442: Update copyright In-Reply-To: References: Message-ID: <3kTnDi-9XueY4y8zSXHInEADNvuiodiH1CdSP0ovlQM=.9dc424f6-33e9-4db4-b212-3a90502b8ede@github.com> On Thu, 20 Feb 2025 17:01:36 GMT, Ivan Šipka wrote: > @naotoj @coffeys @mahendrachhipa > > update of copyright years. Details on ticket. This pull request has now been integrated. 
Changeset: 7cd084cf Author: Ivan Šipka Committer: Mahendra Chhipa URL: https://git.openjdk.org/jdk/commit/7cd084cf350f66fd6ed5b6f5ba9fda71072963fa Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8350442: Update copyright Reviewed-by: naoto, jlu ------------- PR: https://git.openjdk.org/jdk/pull/23714 From naoto at openjdk.org Tue Apr 22 16:49:50 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 22 Apr 2025 16:49:50 GMT Subject: RFR: 8354343: Hardening of Currency tests for not yet defined future ISO 4217 currency [v2] In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 23:15:56 GMT, Justin Lu wrote: >> Please review this PR which improves future currency checking for ISO 4217 currencies. >> >> Checking for a currency that should not yet exist in the set of available currencies is already done. >> It should also be explicitly checked that such a currency can not be instantiated as well via the String getter. > > Justin Lu has updated the pull request incrementally with one additional commit since the last revision: > > move future currencies set up under setUpTestingData() Marked as reviewed by naoto (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24782#pullrequestreview-2784722422 From jwaters at openjdk.org Wed Apr 23 06:18:50 2025 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 23 Apr 2025 06:18:50 GMT Subject: RFR: 8342868: Errors related to unused code on Windows after 8339120 in core libs [v2] In-Reply-To: References: Message-ID: <3LBsjxpWNEAajok5P7-DTOKRKkUmGmyjudWTWlshZ64=.3780c643-dd12-489f-a237-cc3b32b642e0@github.com> On Thu, 31 Oct 2024 05:43:11 GMT, Julian Waters wrote: >> After 8339120, gcc began catching many different instances of unused code in the Windows specific codebase. Some of these seem to be bugs. I've taken the effort to mark out all the relevant globals and locals that trigger the unused warnings and addressed all of them by commenting out the code as appropriate.
I am confident that in many cases this simplistic approach of commenting out code does not fix the underlying issue, and the warning actually found a bug that should be fixed. In these instances, I will be aiming to fix these bugs with help from reviewers, so I recommend anyone reviewing who knows more about the code than I do to see whether there is indeed a bug that needs fixing in a different way than what I did > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Remove the got local Stay open ------------- PR Comment: https://git.openjdk.org/jdk/pull/21654#issuecomment-2823181043 From naoto at openjdk.org Wed Apr 23 16:10:57 2025 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 23 Apr 2025 16:10:57 GMT Subject: RFR: 8355215: Add @spec tags to Emoji related methods In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 20:11:47 GMT, Naoto Sato wrote: > Adding @spec tags to Emoji related methods in the Character class. A CSR has also been drafted. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24779#issuecomment-2824812290 From naoto at openjdk.org Wed Apr 23 16:10:58 2025 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 23 Apr 2025 16:10:58 GMT Subject: Integrated: 8355215: Add @spec tags to Emoji related methods In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 20:11:47 GMT, Naoto Sato wrote: > Adding @spec tags to Emoji related methods in the Character class. A CSR has also been drafted. This pull request has now been integrated. 
Changeset: f097aa90 Author: Naoto Sato URL: https://git.openjdk.org/jdk/commit/f097aa90c91826ba6c3c7380a84b8e98f1d42bbb Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8355215: Add @spec tags to Emoji related methods Reviewed-by: joehw, iris, jlu ------------- PR: https://git.openjdk.org/jdk/pull/24779 From jlu at openjdk.org Wed Apr 23 16:55:59 2025 From: jlu at openjdk.org (Justin Lu) Date: Wed, 23 Apr 2025 16:55:59 GMT Subject: Integrated: 8354343: Hardening of Currency tests for not yet defined future ISO 4217 currency In-Reply-To: References: Message-ID: On Mon, 21 Apr 2025 21:51:35 GMT, Justin Lu wrote: > Please review this PR which improves future currency checking for ISO 4217 currencies. > > Checking for a currency that should not yet exist in the set of available currencies is already done. > It should also be explicitly checked that such a currency can not be instantiated as well via the String getter. This pull request has now been integrated. Changeset: ac41bc31 Author: Justin Lu URL: https://git.openjdk.org/jdk/commit/ac41bc31c96951b9fe51c22d16f31bdc1806a881 Stats: 36 lines in 1 file changed: 34 ins; 0 del; 2 mod 8354343: Hardening of Currency tests for not yet defined future ISO 4217 currency Reviewed-by: naoto ------------- PR: https://git.openjdk.org/jdk/pull/24782 From duke at openjdk.org Fri Apr 25 10:03:33 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Fri, 25 Apr 2025 10:03:33 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files Message-ID: MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. 
------------- Commit messages: - Update MET timezone in TimeZoneNames files Changes: https://git.openjdk.org/jdk/pull/24871/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24871&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342886 Stats: 112 lines in 12 files changed: 0 ins; 90 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24871/head:pull/24871 PR: https://git.openjdk.org/jdk/pull/24871 From naoto at openjdk.org Fri Apr 25 18:16:04 2025 From: naoto at openjdk.org (Naoto Sato) Date: Fri, 25 Apr 2025 18:16:04 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files In-Reply-To: References: Message-ID: <8KXAJ9-KCZaXabjNJzt_txzEwml8NsIGSrR87FwG0eU=.a23343f0-cb19-4b40-b3bf-8d2390206fb3@github.com> On Fri, 25 Apr 2025 09:57:38 GMT, Gautham Krishnan wrote: > MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. > > Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. Thanks for the contribution. >Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. I think this is still worth testing, with comments adjusted (not every euro country, but sampled ones do have the same short names) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24871#issuecomment-2831098820 From rriggs at openjdk.org Mon Apr 28 21:38:52 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 28 Apr 2025 21:38:52 GMT Subject: Integrated: 8354335: No longer deprecate wrapper class constructors for removal In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 22:05:04 GMT, Roger Riggs wrote: > Remove forRemoval = true from @Deprecated annotation of Boolean, Byte, Character, Double, Float, Integer, Long, Short. 
> And add `SuppressWarnings("deprecation")` where needed; and remove `SuppressWarnings("removal")` This pull request has now been integrated. Changeset: 1fd136cd Author: Roger Riggs URL: https://git.openjdk.org/jdk/commit/1fd136cd6b863ebee70e42b2966584218d0919ec Stats: 22 lines in 9 files changed: 0 ins; 0 del; 22 mod 8354335: No longer deprecate wrapper class constructors for removal Reviewed-by: liach, bchristi, iris ------------- PR: https://git.openjdk.org/jdk/pull/24586 From duke at openjdk.org Tue Apr 29 16:15:05 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Tue, 29 Apr 2025 16:15:05 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v2] In-Reply-To: References: Message-ID: > MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. > > Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: Bringing back Bug4848242.java Reverting the change to delete Bug4848242.java as it is still worth testing, with comments adjusted.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24871/files - new: https://git.openjdk.org/jdk/pull/24871/files/4087d825..93ecce79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24871&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24871&range=00-01 Stats: 71 lines in 1 file changed: 71 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24871/head:pull/24871 PR: https://git.openjdk.org/jdk/pull/24871 From duke at openjdk.org Tue Apr 29 16:17:46 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Tue, 29 Apr 2025 16:17:46 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files In-Reply-To: <8KXAJ9-KCZaXabjNJzt_txzEwml8NsIGSrR87FwG0eU=.a23343f0-cb19-4b40-b3bf-8d2390206fb3@github.com> References: <8KXAJ9-KCZaXabjNJzt_txzEwml8NsIGSrR87FwG0eU=.a23343f0-cb19-4b40-b3bf-8d2390206fb3@github.com> Message-ID: On Fri, 25 Apr 2025 18:12:32 GMT, Naoto Sato wrote: >> MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. >> >> Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. > > Thanks for the contribution. >>Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. > > I think this is still worth testing, with comments adjusted (not every euro country, but sampled ones do have the same short names) @naotoj Thanks for reviewing. Agreed that the Bug4848242.java is still worth testing, with comments adjusted. Updated the PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24871#issuecomment-2839487894 From naoto at openjdk.org Tue Apr 29 16:32:47 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 29 Apr 2025 16:32:47 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v2] In-Reply-To: References: Message-ID: <1x8msyRwOle9z9emtfJPUXgF9DqDHIuaw0Ux-m50zmU=.6a4527e1-28ae-4fd4-81a6-b96580fb04b9@github.com> On Tue, 29 Apr 2025 16:15:05 GMT, Gautham Krishnan wrote: >> MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. >> >> Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. > > Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: > > Bringing back Bug4848242.java > > Reverting the change to delete Bug4848242.java as it is still worth testing, with comments adjusted. Thanks for bringing back the test case. Please add the bug id [8342886](https://bugs.openjdk.org/browse/JDK-8342886) in the test header. test/jdk/sun/util/resources/TimeZone/Bug4848242.java line 31: > 29: * but due to changes in time zone data and locale handling, that is no longer guaranteed. > 30: * This test now verifies that a representative sample of locales (e.g., DE, FR, IT) > 31: * still use the same short names (CET/CEST). I think German is not in the samples here (it will not return "CET"/"CEST"). Also, please use lowercase for the language, e.g., fr/it; uppercase letters are usually for regions.
------------- PR Review: https://git.openjdk.org/jdk/pull/24871#pullrequestreview-2804305540 PR Review Comment: https://git.openjdk.org/jdk/pull/24871#discussion_r2066932598 From duke at openjdk.org Tue Apr 29 17:06:30 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Tue, 29 Apr 2025 17:06:30 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v3] In-Reply-To: References: Message-ID: <5JFTqSSyncgkeXZHFDxAcL15wOzfX6B_ygn-8ioAtXc=.e3cb7222-bf83-435a-b34c-7060c74c15ac@github.com> > MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. > > Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: Updating comment in Bug4848242.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24871/files - new: https://git.openjdk.org/jdk/pull/24871/files/93ecce79..56bc5cdd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24871&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24871&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24871/head:pull/24871 PR: https://git.openjdk.org/jdk/pull/24871 From duke at openjdk.org Tue Apr 29 17:06:30 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Tue, 29 Apr 2025 17:06:30 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files In-Reply-To: <8KXAJ9-KCZaXabjNJzt_txzEwml8NsIGSrR87FwG0eU=.a23343f0-cb19-4b40-b3bf-8d2390206fb3@github.com> References: <8KXAJ9-KCZaXabjNJzt_txzEwml8NsIGSrR87FwG0eU=.a23343f0-cb19-4b40-b3bf-8d2390206fb3@github.com> Message-ID: On Fri, 25 Apr 2025 18:12:32 GMT, Naoto Sato wrote: >> MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be 
updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. >> >> Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. > > Thanks for the contribution. >>Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. > > I think this is still worth testing, with comments adjusted (not every euro country, but sampled ones do have the same short names) @naotoj updated. Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/24871#issuecomment-2839606853 From naoto at openjdk.org Tue Apr 29 17:39:46 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 29 Apr 2025 17:39:46 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v3] In-Reply-To: <5JFTqSSyncgkeXZHFDxAcL15wOzfX6B_ygn-8ioAtXc=.e3cb7222-bf83-435a-b34c-7060c74c15ac@github.com> References: <5JFTqSSyncgkeXZHFDxAcL15wOzfX6B_ygn-8ioAtXc=.e3cb7222-bf83-435a-b34c-7060c74c15ac@github.com> Message-ID: On Tue, 29 Apr 2025 17:06:30 GMT, Gautham Krishnan wrote: >> MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. >> >> Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. 
> > Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: > > Updating comment in Bug4848242.java You'll need to append "8342886" to the `@bug` tag ------------- PR Comment: https://git.openjdk.org/jdk/pull/24871#issuecomment-2839695517 From duke at openjdk.org Tue Apr 29 17:50:05 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Tue, 29 Apr 2025 17:50:05 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v4] In-Reply-To: References: Message-ID: > MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. > > Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: Updating @bug tag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24871/files - new: https://git.openjdk.org/jdk/pull/24871/files/56bc5cdd..2c008179 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24871&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24871&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24871/head:pull/24871 PR: https://git.openjdk.org/jdk/pull/24871 From duke at openjdk.org Tue Apr 29 17:50:05 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Tue, 29 Apr 2025 17:50:05 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v3] In-Reply-To: References: <5JFTqSSyncgkeXZHFDxAcL15wOzfX6B_ygn-8ioAtXc=.e3cb7222-bf83-435a-b34c-7060c74c15ac@github.com> Message-ID: On Tue, 29 Apr 2025 17:36:26 GMT, Naoto Sato wrote: >> Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating comment in Bug4848242.java > > 
You'll need to append "8342886" to the `@bug` tag @naotoj updated. Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/24871#issuecomment-2839718245 From naoto at openjdk.org Tue Apr 29 18:03:47 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 29 Apr 2025 18:03:47 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v4] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 17:50:05 GMT, Gautham Krishnan wrote: >> MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. >> >> Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. > > Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: > > Updating @bug tag test/jdk/sun/util/resources/TimeZone/Bug4848242.java line 26: > 24: /* > 25: *@test > 26: *@bug 8342886 The bug id needs to be appended Suggestion: *@bug 4848242 8342886 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24871#discussion_r2067070677 From duke at openjdk.org Tue Apr 29 18:11:05 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Tue, 29 Apr 2025 18:11:05 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v5] In-Reply-To: References: Message-ID: > MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. > > Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. 
Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: Update test/jdk/sun/util/resources/TimeZone/Bug4848242.java Co-authored-by: Naoto Sato ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24871/files - new: https://git.openjdk.org/jdk/pull/24871/files/2c008179..30e2c54a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24871&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24871&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24871/head:pull/24871 PR: https://git.openjdk.org/jdk/pull/24871 From naoto at openjdk.org Tue Apr 29 19:36:47 2025 From: naoto at openjdk.org (Naoto Sato) Date: Tue, 29 Apr 2025 19:36:47 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v5] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 18:11:05 GMT, Gautham Krishnan wrote: >> MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. >> >> Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. > > Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: > > Update test/jdk/sun/util/resources/TimeZone/Bug4848242.java > > Co-authored-by: Naoto Sato LGTM. Confirmed T1-T3 tests succeeded in our CI system ------------- Marked as reviewed by naoto (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24871#pullrequestreview-2804771645 From duke at openjdk.org Wed Apr 30 04:04:46 2025 From: duke at openjdk.org (duke) Date: Wed, 30 Apr 2025 04:04:46 GMT Subject: RFR: 8342886: Update MET timezone in TimeZoneNames files [v5] In-Reply-To: References: Message-ID: <2CpRYM735S1VwXyczwCGJC1wjg4-rYe9uROjqvx4Z3E=.8a12ea9f-c9fc-4121-b3ef-d7266cfb6f6c@github.com> On Tue, 29 Apr 2025 18:11:05 GMT, Gautham Krishnan wrote: >> MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. >> >> Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. > > Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: > > Update test/jdk/sun/util/resources/TimeZone/Bug4848242.java > > Co-authored-by: Naoto Sato @gauthamkrishnanibm Your change (at version 30e2c54a1aa1a41a0dc7cdc555857eed4def8845) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24871#issuecomment-2840742566 From vyazici at openjdk.org Wed Apr 30 06:51:24 2025 From: vyazici at openjdk.org (Volkan Yazici) Date: Wed, 30 Apr 2025 06:51:24 GMT Subject: RFR: 8355391: Use Long::hashCode in java.time Message-ID: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> Replace manual bitwise operations in `hashCode` implementations of `java.time` with `Long::hashCode`. 
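A minimal sketch (not taken from the PR) of why this replacement is behavior-preserving: `Long.hashCode(long)` is specified to return `(int)(value ^ (value >>> 32))`, which is the same expression the hand-written bitwise code computes. The class name `LongHashDemo` and the helper `manualHash` are hypothetical, chosen only for illustration; they do not appear in `java.time`.

```java
public class LongHashDemo {
    // Hypothetical helper: the hand-written XOR-fold of the kind the PR replaces.
    static int manualHash(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        // A few edge-case and arbitrary values to compare both forms.
        long[] samples = {
            0L, 1L, -1L, 42L,
            Long.MIN_VALUE, Long.MAX_VALUE,
            0x1234_5678_9ABC_DEF0L
        };
        for (long v : samples) {
            if (manualHash(v) != Long.hashCode(v)) {
                throw new AssertionError("mismatch for " + v);
            }
        }
        System.out.println("all samples match");
    }
}
```

Because both forms yield the same `int` for every input, swapping one for the other should not change any existing hash value; the gain is readability only.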
------------- Commit messages: - Use `Long::hashCode` Changes: https://git.openjdk.org/jdk/pull/24959/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24959&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355391 Stats: 12 lines in 5 files changed: 0 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24959/head:pull/24959 PR: https://git.openjdk.org/jdk/pull/24959 From rriggs at openjdk.org Wed Apr 30 15:03:47 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 30 Apr 2025 15:03:47 GMT Subject: RFR: 8355391: Use Long::hashCode in java.time In-Reply-To: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> References: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> Message-ID: <6pFvkJpLIQVrGXgELkrnhDs3YgJ2-ufY6RaCA8kklkQ=.b565fc4c-994f-4329-994c-9ea11855e1bc@github.com> On Wed, 30 Apr 2025 06:46:07 GMT, Volkan Yazici wrote: > Replace manual bitwise operations in `hashCode` implementations of `java.time` with `Long::hashCode`. lgtm ------------- Marked as reviewed by rriggs (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24959#pullrequestreview-2807397426 From pminborg at openjdk.org Wed Apr 30 15:12:44 2025 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 30 Apr 2025 15:12:44 GMT Subject: RFR: 8355391: Use Long::hashCode in java.time In-Reply-To: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> References: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> Message-ID: On Wed, 30 Apr 2025 06:46:07 GMT, Volkan Yazici wrote: > Replace manual bitwise operations in `hashCode` implementations of `java.time` with `Long::hashCode`. Looks fine! Thanks for this cleanup. ------------- Marked as reviewed by pminborg (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24959#pullrequestreview-2807429588 From naoto at openjdk.org Wed Apr 30 16:06:49 2025 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 30 Apr 2025 16:06:49 GMT Subject: RFR: 8355391: Use Long::hashCode in java.time In-Reply-To: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> References: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> Message-ID: On Wed, 30 Apr 2025 06:46:07 GMT, Volkan Yazici wrote: > Replace manual bitwise operations in `hashCode` implementations of `java.time` with `Long::hashCode`. LGTM. Thanks for the refactoring ------------- Marked as reviewed by naoto (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24959#pullrequestreview-2807607225 From duke at openjdk.org Wed Apr 30 16:09:06 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Wed, 30 Apr 2025 16:09:06 GMT Subject: Integrated: 8342886: Update MET timezone in TimeZoneNames files In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 09:57:38 GMT, Gautham Krishnan wrote: > MET timezone entry in TimeZoneNames.java and TimeZoneNames_*.java needs to be updated as MET is alias to Europe/Brussels as per 2024b tzdata changes. > > Also Bug4848242.java needs to be removed as the test expects all euro locale time zones should have the same short names. This pull request has now been integrated. 
Changeset: 66122811 Author: Gautham Krishnan <140151984+gauthamkrishnanibm at users.noreply.github.com> Committer: Naoto Sato URL: https://git.openjdk.org/jdk/commit/66122811aae02caaa0545a7b6dd1fdb06b186f00 Stats: 51 lines in 12 files changed: 3 ins; 22 del; 26 mod 8342886: Update MET timezone in TimeZoneNames files Reviewed-by: naoto ------------- PR: https://git.openjdk.org/jdk/pull/24871 From swen at openjdk.org Wed Apr 30 17:32:45 2025 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 30 Apr 2025 17:32:45 GMT Subject: RFR: 8355391: Use Long::hashCode in java.time In-Reply-To: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> References: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> Message-ID: <2SMe9tRK7bG9W4t3Ab_4HirxijTI5J7A08IL6MUxpI8=.479bf491-69a5-4333-b8eb-ab30927f5a25@github.com> On Wed, 30 Apr 2025 06:46:07 GMT, Volkan Yazici wrote: > Replace manual bitwise operations in `hashCode` implementations of `java.time` with `Long::hashCode`. There is a place in java.util.Locale::hashCode that can also be changed. Current version: `long bitsWeight = Double.doubleToLongBits(weight); h = 37*h + (int)(bitsWeight ^ (bitsWeight >>> 32));` Can be changed to: `h = 37*h + Long.hashCode(Double.doubleToLongBits(weight));` ------------- PR Comment: https://git.openjdk.org/jdk/pull/24959#issuecomment-2842775815 From duke at openjdk.org Wed Apr 30 17:49:45 2025 From: duke at openjdk.org (duke) Date: Wed, 30 Apr 2025 17:49:45 GMT Subject: RFR: 8355391: Use Long::hashCode in java.time In-Reply-To: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> References: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> Message-ID: On Wed, 30 Apr 2025 06:46:07 GMT, Volkan Yazici wrote: > Replace manual bitwise operations in `hashCode` implementations of `java.time` with `Long::hashCode`.
@vy Your change (at version 891d9ada7ce6860ea8e1253021f04053cc27090a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24959#issuecomment-2842821265 From vyazici at openjdk.org Wed Apr 30 17:49:45 2025 From: vyazici at openjdk.org (Volkan Yazici) Date: Wed, 30 Apr 2025 17:49:45 GMT Subject: RFR: 8355391: Use Long::hashCode in java.time In-Reply-To: <6pFvkJpLIQVrGXgELkrnhDs3YgJ2-ufY6RaCA8kklkQ=.b565fc4c-994f-4329-994c-9ea11855e1bc@github.com> References: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> <6pFvkJpLIQVrGXgELkrnhDs3YgJ2-ufY6RaCA8kklkQ=.b565fc4c-994f-4329-994c-9ea11855e1bc@github.com> Message-ID: On Wed, 30 Apr 2025 15:01:08 GMT, Roger Riggs wrote: >> Replace manual bitwise operations in `hashCode` implementations of `java.time` with `Long::hashCode`. > > lgtm @RogerRiggs, @minborg, @naotoj, thanks for the reviews. I've attached successful `tier1,2` results to the ticket. I'd appreciate it if one of you would be kind enough to also sponsor the changes. @wenshao, thanks for the tip. I will explore if we can perform similar simplifications in `java.util` too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24959#issuecomment-2842824998 From vyazici at openjdk.org Wed Apr 30 17:55:52 2025 From: vyazici at openjdk.org (Volkan Yazici) Date: Wed, 30 Apr 2025 17:55:52 GMT Subject: Integrated: 8355391: Use Long::hashCode in java.time In-Reply-To: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> References: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> Message-ID: On Wed, 30 Apr 2025 06:46:07 GMT, Volkan Yazici wrote: > Replace manual bitwise operations in `hashCode` implementations of `java.time` with `Long::hashCode`. This pull request has now been integrated. 
Changeset: 18983b63 Author: Volkan Yazici Committer: Naoto Sato URL: https://git.openjdk.org/jdk/commit/18983b635fe3469c1d9060611eee76e0155ba21b Stats: 12 lines in 5 files changed: 0 ins; 5 del; 7 mod 8355391: Use Long::hashCode in java.time Reviewed-by: rriggs, pminborg, naoto ------------- PR: https://git.openjdk.org/jdk/pull/24959 From duke at openjdk.org Wed Apr 30 20:35:19 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Wed, 30 Apr 2025 20:35:19 GMT Subject: RFR: 8334742: Change java.time month/day field types to 'byte' Message-ID: In the following classes, month and day values are stored in fields of type 'int' or 'short'. The range of allowed values is small enough that the type can be 'byte' instead. java.time.YearMonth java.time.MonthDay java.time.LocalDate java.time.chrono.HijrahDate Refactoring the type will give the JVM a little more layout flexibility, and will be especially useful when these classes become value classes. ------------- Commit messages: - Changing month and day value data type to byte Changes: https://git.openjdk.org/jdk/pull/24975/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24975&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334742 Stats: 16 lines in 4 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/24975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24975/head:pull/24975 PR: https://git.openjdk.org/jdk/pull/24975 From duke at openjdk.org Wed Apr 30 20:38:59 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Wed, 30 Apr 2025 20:38:59 GMT Subject: RFR: 8334742: Change java.time month/day field types to 'byte' [v2] In-Reply-To: References: Message-ID: > In the following classes, month and day values are stored in fields of type 'int' or 'short'. The range of allowed values is small enough that the type can be 'byte' instead.
> > java.time.YearMonth > java.time.MonthDay > java.time.LocalDate > java.time.chrono.HijrahDate > > Refactoring the type will give the JVM a little more layout flexibility, and will be especially useful when these classes become value classes. Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: Updating copyright header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24975/files - new: https://git.openjdk.org/jdk/pull/24975/files/e198afb2..148c3834 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24975&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24975&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24975/head:pull/24975 PR: https://git.openjdk.org/jdk/pull/24975 From naoto at openjdk.org Wed Apr 30 20:52:45 2025 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 30 Apr 2025 20:52:45 GMT Subject: RFR: 8334742: Change java.time month/day field types to 'byte' [v2] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 20:38:59 GMT, Gautham Krishnan wrote: >> In the following classes, month and day values are stored in fields of type 'int' or 'short'. The range of allowed values is small enough that the type can be 'byte' instead. >> >> java.time.YearMonth >> java.time.MonthDay >> java.time.LocalDate >> java.time.chrono.HijrahDate >> >> Refactoring the type will give the JVM a little more layout flexibility, and will be especially useful when these classes become value classes. > > Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: > > Updating copyright header Since they are serialized fields (except Hijrah ones), I don't think this is doable.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24975#issuecomment-2843247511 From duke at openjdk.org Wed Apr 30 20:59:49 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Wed, 30 Apr 2025 20:59:49 GMT Subject: RFR: 8334742: Change java.time month/day field types to 'byte' [v2] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 20:50:34 GMT, Naoto Sato wrote: >> Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright header > > Since they are serialized fields (except Hijrah ones), I don't think this is doable. @naotoj Thanks for reviewing. You are right that these are serialized fields. But when I checked, these classes use a custom Externalizable encoding (see 'readExternal'/'writeExternal'), which already stores these values as bytes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24975#issuecomment-2843263464 From naoto at openjdk.org Wed Apr 30 21:06:47 2025 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 30 Apr 2025 21:06:47 GMT Subject: RFR: 8334742: Change java.time month/day field types to 'byte' [v2] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 20:57:23 GMT, Gautham Krishnan wrote: > But when I checked, these classes use a custom Externalizable encoding (see 'readExternal'/'writeExternal'), which already stores these values as bytes. Sorry, that is correct. I jumped the gun. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24975#issuecomment-2843278253 From rriggs at openjdk.org Wed Apr 30 21:14:47 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 30 Apr 2025 21:14:47 GMT Subject: RFR: 8334742: Change java.time month/day field types to 'byte' [v2] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 20:38:59 GMT, Gautham Krishnan wrote: >> In the following classes, month and day values are stored in fields of type 'int' or 'short'. The range of allowed values is small enough that the type can be 'byte' instead.
>> >> java.time.YearMonth >> java.time.MonthDay >> java.time.LocalDate >> java.time.chrono.HijrahDate >> >> Refactoring the type will give the JVM a little more layout flexibility, and will be especially useful when these classes become value classes. > > Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: > > Updating copyright header It's recommended to set the assignee of an issue and mark it in progress before starting work. @gauthamkrishnanibm Please make the necessary changes to the issue to let people know you are working on it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24975#issuecomment-2843294271 From duke at openjdk.org Wed Apr 30 21:18:47 2025 From: duke at openjdk.org (Gautham Krishnan) Date: Wed, 30 Apr 2025 21:18:47 GMT Subject: RFR: 8334742: Change java.time month/day field types to 'byte' [v2] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 21:11:51 GMT, Roger Riggs wrote: >> Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright header > > It's recommended to set the assignee of an issue and mark it in progress before starting work. > @gauthamkrishnanibm Please make the necessary changes to the issue to let people know you are working on it. @RogerRiggs Sorry. I do not have write access to JBS, as I have just started contributing. Should I discuss the change on the mailing list? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24975#issuecomment-2843302869 From rriggs at openjdk.org Wed Apr 30 21:35:44 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 30 Apr 2025 21:35:44 GMT Subject: RFR: 8334742: Change java.time month/day field types to 'byte' [v2] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 20:38:59 GMT, Gautham Krishnan wrote: >> In the following classes, month and day values are stored in fields of type 'int' or 'short'.
The range of allowed values is small enough that the type can be 'byte' instead. >> >> java.time.YearMonth >> java.time.MonthDay >> java.time.LocalDate >> java.time.chrono.HijrahDate >> >> Refactoring the type will give the JVM a little more layout flexibility, and will be especially useful when these classes become value classes. > > Gautham Krishnan has updated the pull request incrementally with one additional commit since the last revision: > > Updating copyright header I hope you've found [The OpenJDK Developers' Guide](https://openjdk.org/guide/); it's a good read to know what to expect and what is expected of contributors. Feel free to ask on the mail alias about how things work. If you had found the issue, you would have seen that it was already assigned, and it would have been appropriate to ask. Please continue. As with any modification, it's important to identify the tests that validate the change. In this case, the tests are in the repo under `test/jdk/java/time`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24975#issuecomment-2843334222 From swen at openjdk.org Wed Apr 30 23:27:22 2025 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 30 Apr 2025 23:27:22 GMT Subject: RFR: 8355391: Use Double::hashCode in java.util.Locale::hashCode Message-ID: Similar to #24959, java.util.Locale.hashCode can also make the same improvement. Replace manual bitwise operations in hashCode implementations of java.util.Locale with Double::hashCode.
------------- Commit messages: - Update src/java.base/share/classes/java/util/Locale.java - Use Long::hashCode Changes: https://git.openjdk.org/jdk/pull/24971/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24971&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355391 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24971.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24971/head:pull/24971 PR: https://git.openjdk.org/jdk/pull/24971 From liach at openjdk.org Wed Apr 30 23:27:22 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 30 Apr 2025 23:27:22 GMT Subject: RFR: 8355391: Use Double::hashCode in java.util.Locale::hashCode In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 18:01:08 GMT, Shaojin Wen wrote: > Similar to #24959, java.util.Locale.hashCode can also make the same improvement. > > Replace manual bitwise operations in hashCode implementations of java.util.Locale with Double::hashCode. src/java.base/share/classes/java/util/Locale.java line 3500: > 3498: h = 17; > 3499: h = 37*h + range.hashCode(); > 3500: h = 37*h + Long.hashCode(Double.doubleToLongBits(weight)); Suggestion: h = 37*h + Double.hashCode(weight); These are equivalent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24971#discussion_r2069218063 From swen at openjdk.org Wed Apr 30 23:55:51 2025 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 30 Apr 2025 23:55:51 GMT Subject: RFR: 8355391: Use Long::hashCode in java.time In-Reply-To: <6pFvkJpLIQVrGXgELkrnhDs3YgJ2-ufY6RaCA8kklkQ=.b565fc4c-994f-4329-994c-9ea11855e1bc@github.com> References: <35NfgP2ueIq0RkDomVTj7bLtM_R-qD7spwbhpIXNFZA=.99fab560-5969-4a75-9212-52626a3e5b62@github.com> <6pFvkJpLIQVrGXgELkrnhDs3YgJ2-ufY6RaCA8kklkQ=.b565fc4c-994f-4329-994c-9ea11855e1bc@github.com> Message-ID: On Wed, 30 Apr 2025 15:01:08 GMT, Roger Riggs wrote: >> Replace manual bitwise operations in `hashCode` implementations of `java.time` with `Long::hashCode`. 
> > lgtm > @RogerRiggs, @minborg, @naotoj, thanks for the reviews. I've attached successful `tier1,2` results to the ticket. I'd appreciate it if one of you would be kind enough to also sponsor the changes. > > @wenshao, thanks for the tip. I will explore if we can perform similar simplifications in `java.util` too. I submitted PR #24971 to use Double.hashCode to do similar simplification. In addition, sun.nio.ch.FileKey/sun.nio.fs.UnixFileKey/sun.nio.fs.UnixFileStore in java.base can also be simplified similarly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24959#issuecomment-2843751482
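For reference, the equivalence noted in the #24971 review (`Double.hashCode(weight)` in place of `Long.hashCode(Double.doubleToLongBits(weight))`) can be checked with a small standalone sketch. The class and method names below are hypothetical and are not taken from the JDK sources; `manualHash` is only an assumed example of the kind of manual bitwise hashing these PRs replace, not a copy of the code that was changed.

```java
public class HashCodeEquivalence {
    // Hypothetical long-hand form: fold the 64 bits of the double into an int.
    static int manualHash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    // Intermediate form quoted in the review comment.
    static int viaLong(double value) {
        return Long.hashCode(Double.doubleToLongBits(value));
    }

    // True when all three spellings produce the same hash for this value.
    static boolean allAgree(double value) {
        int expected = Double.hashCode(value);
        return manualHash(value) == expected && viaLong(value) == expected;
    }

    public static void main(String[] args) {
        double[] samples = {
            0.0, -0.0, 0.5, 1.0, Double.NaN,
            Double.MAX_VALUE, Double.NEGATIVE_INFINITY
        };
        for (double w : samples) {
            System.out.println(w + " -> " + allAgree(w));
        }
    }
}
```

All three forms go through `Double.doubleToLongBits`, so NaN values are canonicalized the same way in each; a variant based on `Double.doubleToRawLongBits` would not be a drop-in replacement.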