RFR: 8289227: Support for BCP 47 Extension T - Transformed Content
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted. ------------- Commit messages: - IllformedLocaleEx -> LocaleSyntaxEx - SystemProperty tests - Revived returning Optional - Some clean-ups, including making Extension a sealed class. - Bring the specialized methods back - Using Optional - FieldSeparators()/FieldSubtag() -> Fields() - 8289227: Support for BCP 47 Extension T - T-ext Changes: https://git.openjdk.org/jdk/pull/9620/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289227 Stats: 776 lines in 23 files changed: 600 ins; 135 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/9620.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9620/head:pull/9620 PR: https://git.openjdk.org/jdk/pull/9620
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: Removed unnecessary `contains()` check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9620/files - new: https://git.openjdk.org/jdk/pull/9620/files/1f1526bc..a869efaa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=00-01 Stats: 5 lines in 3 files changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9620.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9620/head:pull/9620 PR: https://git.openjdk.org/jdk/pull/9620
On Mon, 25 Jul 2022 16:10:02 GMT, Naoto Sato <naoto@openjdk.org> wrote:
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
Removed unnecessary `contains()` check
Looks good to me. Some minor comments below. src/java.base/share/classes/java/util/Locale.java line 236:
234: * particular Unicode locale attributes or key/type pairs. 235: * 236: * <h2><a id="t_extension">Transformed Content extension</a></h2>
Both the full name "Transformed Content extension" and the abbreviated forms "T Extension" are used in this document. I see Unicode and the RFC referred to it as "T Extension", " 't' Extension", and "Extension T". If we describe the extension as "T Extension", maybe we can make it clear in the title, e.g.: Transformed Content (T) Extension could thus in the doc thereafter use "T Extension". The existing methods look like used full extension name, but for T Extension, the short form, e.g. getTExtensionFields also matches its javadoc ("Returns the map of fields in the T extension"). Your call. src/java.base/share/classes/java/util/Locale.java line 265:
263: * field separator (one alpha + one digit), followed by one or more subtags of the length 3 to 8, 264: * each delimited by a hyphen. 265: * <p>The transformed content information {@code source} language tag and {@code fields} are returned
"transformed content information" seems not needed as "The {@code source} language tag and {@code fields}" are known from the previous paragraph. src/java.base/share/classes/sun/util/locale/TransformedContentExtension.java line 40:
38: public final class TransformedContentExtension extends Extension { 39: 40: public static final char SINGLETON = 't';
"singleton" as in "The singleton identifier" in the doc, seems to mean the key consists of a single character. But it can be confused with the Singleton pattern. T_EXTENSION_KEY (as methods such as isTransformedContentxxx expect a key) or Locale.TRANSFORMED_CONTENT_EXTENSION might be better. ------------- PR: https://git.openjdk.org/jdk/pull/9620
On Tue, 26 Jul 2022 01:07:29 GMT, Joe Wang <joehw@openjdk.org> wrote:
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
Removed unnecessary `contains()` check
src/java.base/share/classes/java/util/Locale.java line 236:
234: * particular Unicode locale attributes or key/type pairs. 235: * 236: * <h2><a id="t_extension">Transformed Content extension</a></h2>
Both the full name "Transformed Content extension" and the abbreviated forms "T Extension" are used in this document. I see Unicode and the RFC referred to it as "T Extension", " 't' Extension", and "Extension T". If we describe the extension as "T Extension", maybe we can make it clear in the title, e.g.: Transformed Content (T) Extension
could thus in the doc thereafter use "T Extension". The existing methods look like used full extension name, but for T Extension, the short form, e.g. getTExtensionFields also matches its javadoc ("Returns the map of fields in the T extension"). Your call.
Thanks for the suggestion, Joe. I made changes to the `Locale` class documentation, mainly to spell out `transformed content`, instead of using `T`. I believe that would be more descriptive. Since the RFC title uses `T`, I added it in the section title.
src/java.base/share/classes/java/util/Locale.java line 265:
263: * field separator (one alpha + one digit), followed by one or more subtags of the length 3 to 8, 264: * each delimited by a hyphen. 265: * <p>The transformed content information {@code source} language tag and {@code fields} are returned
"transformed content information" seems not needed as "The {@code source} language tag and {@code fields}" are known from the previous paragraph.
Modified to make it more sense.
src/java.base/share/classes/sun/util/locale/TransformedContentExtension.java line 40:
38: public final class TransformedContentExtension extends Extension { 39: 40: public static final char SINGLETON = 't';
"singleton" as in "The singleton identifier" in the doc, seems to mean the key consists of a single character. But it can be confused with the Singleton pattern. T_EXTENSION_KEY (as methods such as isTransformedContentxxx expect a key) or Locale.TRANSFORMED_CONTENT_EXTENSION might be better.
This is analogous to the implementation of `UnicodeLocaleExtension`, and I believe this means the singleton as in the pattern. No need to use literal `t` everywhere. ------------- PR: https://git.openjdk.org/jdk/pull/9620
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Modified javadoc of the transformed conent - Merge branch 'master' into JDK-8289227-T-ext - Removed unnecessary `contains()` check - IllformedLocaleEx -> LocaleSyntaxEx - SystemProperty tests - Revived returning Optional - Some clean-ups, including making Extension a sealed class. - Bring the specialized methods back Some documentation fixes - Using Optional - FieldSeparators()/FieldSubtag() -> Fields() - ... and 2 more: https://git.openjdk.org/jdk/compare/69ea6e10...780f712e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9620/files - new: https://git.openjdk.org/jdk/pull/9620/files/a869efaa..780f712e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=01-02 Stats: 6329 lines in 181 files changed: 3452 ins; 2172 del; 705 mod Patch: https://git.openjdk.org/jdk/pull/9620.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9620/head:pull/9620 PR: https://git.openjdk.org/jdk/pull/9620
On Tue, 26 Jul 2022 17:17:46 GMT, Naoto Sato <naoto@openjdk.org> wrote:
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:
- Modified javadoc of the transformed conent - Merge branch 'master' into JDK-8289227-T-ext - Removed unnecessary `contains()` check - IllformedLocaleEx -> LocaleSyntaxEx - SystemProperty tests - Revived returning Optional - Some clean-ups, including making Extension a sealed class. - Bring the specialized methods back Some documentation fixes - Using Optional - FieldSeparators()/FieldSubtag() -> Fields() - ... and 2 more: https://git.openjdk.org/jdk/compare/59bcd12a...780f712e
The basic support for the "t" extension is covered by the existing methods for `Locale.getExtension()` and `Locale.Builder.setExtension`. At least initially, there will be very few developers and applications using transformed content. The transformed content extension syntax is well defined by BCP 47 and applications that need the source language or fields have a well defined format they can parse themselves from the value returned from `getExtension('t')`. I'm concerned about adding API and implementation complexity that may not be used and might accumulate in the JDK without ever being used. The parsing of the source locale and fields is a convenience but not essential to 't' support. I would suggest deferring the APIs for `getTransformedContentFields()` and `getTransformedContentSource()`. ------------- PR: https://git.openjdk.org/jdk/pull/9620
On Tue, 26 Jul 2022 18:41:57 GMT, Roger Riggs <rriggs@openjdk.org> wrote:
I would suggest deferring the APIs for `getTransformedContentFields()` and `getTransformedContentSource()`.
OK, removed those two convenient APIs. ------------- PR: https://git.openjdk.org/jdk/pull/9620
On Tue, 26 Jul 2022 20:52:20 GMT, Naoto Sato <naoto@openjdk.org> wrote:
I would suggest deferring the APIs for `getTransformedContentFields()` and `getTransformedContentSource()`.
OK, removed those two convenient APIs.
I expected much of the complexity of parsing and storing content transformed extension strings would be unnecessary without the APIs. The original string form of the extension should be sufficient for `Locale.getExtension`. It also adds quite a bit of complexity and overhead to getDisplayName to reconstruct the strings. The correct syntax of the string can be checked with a RegEx in the builder. If a simpler implementation is sufficient, there is less code to maintain. ------------- PR: https://git.openjdk.org/jdk/pull/9620
On Tue, 26 Jul 2022 17:17:46 GMT, Naoto Sato <naoto@openjdk.org> wrote:
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:
- Modified javadoc of the transformed conent - Merge branch 'master' into JDK-8289227-T-ext - Removed unnecessary `contains()` check - IllformedLocaleEx -> LocaleSyntaxEx - SystemProperty tests - Revived returning Optional - Some clean-ups, including making Extension a sealed class. - Bring the specialized methods back Some documentation fixes - Using Optional - FieldSeparators()/FieldSubtag() -> Fields() - ... and 2 more: https://git.openjdk.org/jdk/compare/8de5da37...780f712e
Looks good to me. src/java.base/share/classes/java/util/Locale.java line 265:
263: * field separator (one alpha + one digit), followed by one or more subtags of the length 3 to 8, 264: * each delimited by a hyphen. 265: * <p>The transformed content information; namely {@code source} language tag and {@code fields}
typo ";" (s/;/,) ------------- Marked as reviewed by joehw (Reviewer). PR: https://git.openjdk.org/jdk/pull/9620
On Tue, 26 Jul 2022 19:03:24 GMT, Joe Wang <joehw@openjdk.org> wrote:
Naoto Sato has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:
- Modified javadoc of the transformed conent - Merge branch 'master' into JDK-8289227-T-ext - Removed unnecessary `contains()` check - IllformedLocaleEx -> LocaleSyntaxEx - SystemProperty tests - Revived returning Optional - Some clean-ups, including making Extension a sealed class. - Bring the specialized methods back Some documentation fixes - Using Optional - FieldSeparators()/FieldSubtag() -> Fields() - ... and 2 more: https://git.openjdk.org/jdk/compare/b512a057...780f712e
src/java.base/share/classes/java/util/Locale.java line 265:
263: * field separator (one alpha + one digit), followed by one or more subtags of the length 3 to 8, 264: * each delimited by a hyphen. 265: * <p>The transformed content information; namely {@code source} language tag and {@code fields}
typo ";" (s/;/,)
Actually, I did mean a semicolon, but a comma may be better in this case. ------------- PR: https://git.openjdk.org/jdk/pull/9620
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: Removed convenient APIs for now ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9620/files - new: https://git.openjdk.org/jdk/pull/9620/files/780f712e..e24e5dd8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=02-03 Stats: 63 lines in 2 files changed: 0 ins; 60 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9620.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9620/head:pull/9620 PR: https://git.openjdk.org/jdk/pull/9620
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: Parse invalid fields correctly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9620/files - new: https://git.openjdk.org/jdk/pull/9620/files/e24e5dd8..d3242ba9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=03-04 Stats: 29 lines in 3 files changed: 26 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9620.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9620/head:pull/9620 PR: https://git.openjdk.org/jdk/pull/9620
On Fri, 29 Jul 2022 23:27:53 GMT, Naoto Sato <naoto@openjdk.org> wrote:
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
Parse invalid fields correctly
test/jdk/java/util/Locale/bcp47/TExtensionTests.java line 190:
188: } catch (IllformedLocaleException ile) { 189: // success 190: System.out.println("IllformedLocaleException thrown correctly: " + ile.getMessage());
Could use assertThrows, but ok if you want to keep it this way. ------------- PR: https://git.openjdk.org/jdk/pull/9620
On Sat, 30 Jul 2022 01:39:54 GMT, Joe Wang <joehw@openjdk.org> wrote:
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
Parse invalid fields correctly
test/jdk/java/util/Locale/bcp47/TExtensionTests.java line 190:
188: } catch (IllformedLocaleException ile) { 189: // success 190: System.out.println("IllformedLocaleException thrown correctly: " + ile.getMessage());
Could use assertThrows, but ok if you want to keep it this way.
Thanks, Joe. Replaced them with `assertThrows`. ------------- PR: https://git.openjdk.org/jdk/pull/9620
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision: Use `assertThrows` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9620/files - new: https://git.openjdk.org/jdk/pull/9620/files/d3242ba9..9f60232e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9620&range=04-05 Stats: 16 lines in 1 file changed: 1 ins; 11 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9620.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9620/head:pull/9620 PR: https://git.openjdk.org/jdk/pull/9620
On Sat, 30 Jul 2022 20:52:49 GMT, Naoto Sato <naoto@openjdk.org> wrote:
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
Use `assertThrows`
Looks like it is not much of an additional benefit without APIs. Decided not to implement it at this time. ------------- PR: https://git.openjdk.org/jdk/pull/9620
On Sat, 30 Jul 2022 20:52:49 GMT, Naoto Sato <naoto@openjdk.org> wrote:
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
Use `assertThrows`
lgtm hi! by the way if you want to see some more `-t-` names in the wild, I recently merged https://github.com/unicode-org/cldr/pull/1755 test/jdk/java/util/Locale/bcp47/TExtensionTests.java line 80:
78: return new Object[][] { 79: {L1, Locale.US, 80: "Cyrillic (Transform: Latin, Transform Rules: UN GEGN Transliteration 2007)"},
💯 I like these display names! ------------- Marked as reviewed by srl295@github.com (no known OpenJDK username). PR: https://git.openjdk.org/jdk/pull/9620
On Mon, 1 Aug 2022 21:27:47 GMT, Steven R. Loomis <duke@openjdk.org> wrote:
Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
Use `assertThrows`
test/jdk/java/util/Locale/bcp47/TExtensionTests.java line 80:
78: return new Object[][] { 79: {L1, Locale.US, 80: "Cyrillic (Transform: Latin, Transform Rules: UN GEGN Transliteration 2007)"},
💯 I like these display names!
Note: another class of use for `-t-` is for keyboards, so for example `ar-t-k0-windows-azerty` in https://github.com/unicode-org/cldr/blob/release-41/keyboards/windows/ar-t-k... ------------- PR: https://git.openjdk.org/jdk/pull/9620
On Fri, 22 Jul 2022 21:53:35 GMT, Naoto Sato <naoto@openjdk.org> wrote:
This PR is to propose supporting the `T` extension to the BCP 47 to which `java.util.Locale` class conforms. There are two extensions to the BCP 47, one is `Unicode Locale Extension` which has been supported since JDK7, the other is this `Transformed Content` extension. A CSR has also been drafted.
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/9620
participants (4)
-
Joe Wang
-
Naoto Sato
-
Roger Riggs
-
Steven R. Loomis