[10] RFR 8176841: Additional Unicode Language-Tag Extensions

Naoto Sato naoto.sato at oracle.com
Tue Nov 21 01:45:50 UTC 2017


Hi Stephen, please see my comments below.

On 11/20/17 5:27 AM, Stephen Colebourne wrote:
> I've had a longer think about how to integrate this. Its very tricky,
> as the unicode extensions create complex conflicts.
> 
> In general, my view is that Locale is too complex and overloaded with
> different levels of meaning. Perhaps a different class should have
> been added for more complex localization settings. It seems like an
> are rife for puzzlers/security issues as unicode extensions are going
> to be hugely undertested. (Most developers think of a Locale as
> language+territory, and will only test for that, if they test at all).
> It will be all too easy for text to appear on web pages or reports in
> an unexpected calendar system or time-zone as a result of this change.
> It also means that numbers might be output in an unexpected numbering
> system or currency.
> 
> The tension is between what the developer has thought of and tested
> for, and what a user might pass in. The mechanism for passing in these
> extensions is partly subtle, being via the default locale. Although
> even via an explicit locale, I suspect very few developers will be
> sanitizing a Locale input from a user.
> 
> AFAICT, unicode extensions are not widely used at present. I can only
> find use of "ca" in Chronology.ofLocale() and Calendar.getInstance().
> So this is the key moment for deciding how they are used.
> 
> We are discussing 6 unicode extensions: "ca" (calendar system), "tz"
> (time-zone), "cu" (currency), "nu" (numbering system), "fw" (first day
> of week) and "rg" (region override).
> 
> Chronology, ZoneId, TimeZone and Currency ("ca", "tz", "cu") are not
> really aspects of formatting at all, they are explicit aspects of the
> associated value. eg. a ZonedDateTime explicitly has a chronology of
> IsoChronology and a specific time zone. A Money class (Java will have
> one at some point, probably after value types are added) explicitly
> has a Currency.
> 
> You *really, really, really* don't want a unicode extension locale to
> be changing the format of a Money object - turning $300 to £300 just
> because the user had -u-cu-gbp in their locale. The same applies to
> the formatting of dates and times - the chronology and time-zone are
> part of the value being formatted, not the format.
> 
> This just leaves "nu" (numbering system) where there is a case for the
> unicode extension to be picked up directly. But doing so would cause a
> LocalDate to be output as ????-??-?? (where ? is the digit for another
> numbering system), While I think that should be supported, should it
> be picked up automatically via the locale? No, it should be explicitly
> selected by using DateTimeFormatter.withDecimalStyle() - which the
> webrev does do.
> 
> The distinction here is that DateTimeFormatter (and our theoretical
> MoneyFormatter) are formatting the whole state of the value object,
> whereas DateFormatter and NumberFormatter essentially just format a
> single number, using extra information alongside to help.
> 
> So, what to do? Some of these are new changes, some just need testing:
> 
> 1) The changes to `Calendar.getInstance()` are not documented wrt
> "tz". Currently it says it results in the "default time zone", which
> isn't true anymore. I think this may be a behavioural compatibility
> issue.

Explicitly document that it will use the time zone in locale.

> 
> 2) `DecimalStyle.ofLocale(Locale)` should use "nu" but does not.

Document it in the javadoc.

> 
> 3) `DateTimeFormatter.localizedBy(Locale)` should use "ca" to call
> `withChronology`, `tz` to call `withZoneId` and `nu` to call
> `withDecimalStyle`. This is a change to the CSR.

Besides that "nu" needs to be spec'ed out, isn't calling withXXXX() an 
implementation note?

> 
> 4) The phrase "Unicode extensions in the locale are ignored." would
> apply to lots of methods across the JDK. In particular for this
> webrev, it would need to be added to a lot more methods, such as the
> default locale methods. Perhaps the phrase could be "If the locale has
> unicode extensions, they are not used", which is slightly easier to
> read. Or perhaps it shouldn't be added at all in most cases (ie.
> unless unicode extensions are mentioned, assume that they are not
> used.)

Removed the phrase.

> 
> 5) The withLocale(Locale) change should say "The locale is stored as
> passed in, without further processing. If the locale has unicode
> extensions, they may be used later in text processing. To set the
> chronology, time-zone and decimal style from unicode extensions, see
> localizedBy()."

Added.

> 
> 6) The "rg" extension (and no other extensions) should be used when
> looking up data to output these:
> - DateTimeFormatterBuilder.appendText()
> - DateTimeFormatterBuilder.appendLocalizedOffset()
> - DateTimeFormatterBuilder.appendZoneText()
> - DateTimeFormatterBuilder.appendChronologyText()
> - DateTimeFormatterBuilder.appendLocalized()
> - DateTimeFormatterBuilder.getLocalizedDateTimePattern()
> This may be the case, but tests should ensure it.

Will need to add tests.

> 
> 7) WeekBasedFieldPrinterParser should use "fw"/"rg", which it already
> does via WeekFields.of(Locale)

Not sure what this means. Where is the file located?

> 
> 8) For clarity of future code, two additional methods would be useful:
> - DateTimeFormatterBuilder.toFormatterIsoRoot()
> - DateTimeFormatterBuilder.toFormatterIso(Locale)
> These would set the chronology of the resulting DTF to be
> IsoChronology.INSTANCE. The "root" variant would use `Locale.ROOT`.

This would be a nice addition.

> 
> 9) Consider more generally what happens if TimeZone.getDefault() does
> not match the "tz" extension of Locale.getDefault().

This should be considered case by case, and document on each occasion 
which would be honored.

> 
> 10) Consider how localizedBy(Locale) operates. Is it the same as
> formatter
>    .withChronology(Chronology.ofLocale(loc))
>    .withZoneId(ZoneId.ofLocale(loc))
>    .withDecimalStyle(DecimalStyle.of(loc))
> or does it only set the chronology if "ca" if found, and only set
> time-zone if "tz" is found and only set decimal style if "nu" is
> found.

IIRC, the localizedBy() is added so that withLocale() would behave as it 
is now. I think localizedBy() should also have the same effect as 
withLocale if the specified locale do not contain any 
calendar/timezone/numberingSystem extensions. Otherwise, say 
localizedBy(Locale.JAPAN) would be no-operation.

As we discussed in the previous email, we don't have ZoneId.ofLocale() yet.

> 
> Perhaps this webrev should be broken into two or more parts, as it is
> very large?
> 
> 
> In summary, going forward it should be recognised that there is a
> separation/difference between formatters of full values
> (Money/Temporal) and old-style formatters of single numbers
> (DateFormat, NumberFormat). Some extensions, "ca", "tz", "cu", are
> part of the value in Money/Temporal, and so can't be overridden by the
> locale. Methods like Chronology.ofLocale or ZoneId.of(Locale) are good
> places to use the extensions, as they are explicit about what happens.

Here is the spec diff from the previous one. Please let me know your 
thought.

http://cr.openjdk.java.net/~naoto/8191349/specdiff.03/overview-summary.html

Naoto

> 
> Stephen
> 
> 
> On 15 November 2017 at 23:31, Naoto Sato <naoto.sato at oracle.com> wrote:
>> Thanks, Lance. Corrected as suggested.
>>
>> http://cr.openjdk.java.net/~naoto/8191349/specdiff.02/overview-summary.html
>> Also, I inserted "@since 10."
>>
>> Naoto
>>
>> On 11/15/17 3:06 PM, Lance Andersen wrote:
>>>
>>> Hi Naoto
>>>
>>> localizedBy, i would suggest changing:
>>>
>>> -   “If the new locale contains “ca”…”  to “if the new locale contains the
>>> “ca”…”
>>> - “Unlike withLocale method” to Unlike the withLocale method”
>>>
>>> Best
>>> Lance
>>>>
>>>> On Nov 15, 2017, at 5:36 PM, Naoto Sato <naoto.sato at oracle.com
>>>> <mailto:naoto.sato at oracle.com>> wrote:
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~naoto/8191349/specdiff.01/overview-summary.html
>>>
>>>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif><http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen|
>>> Principal Member of Technical Staff | +1.781.442.2037
>>> Oracle Java Engineering
>>> 1 Network Drive
>>> Burlington, MA 01803
>>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>>>
>>>
>>>
>>


More information about the core-libs-dev mailing list