<i18n dev> [8] Review request for JEP 127: Improve Locale Data Packaging and Adopt Unicode CLDR Data

Tue Aug 14 21:17:47 PDT 2012

> CLDR is written in XML. If its spec is well defined and stable, what's
> the problem (risk) to write an XML parser to convert the XML files to
> another format?

The issue is not the definition or stability. We have found that
implementations that are not aware of the aliasing and parent relations (as
described in the spec) often misinterpret the data. That is, the
inheritance is not as simple as just "truncate the locale name and check".
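
To make the difference concrete, here is a minimal sketch in Java that compares a truncation-only lookup chain with one that first follows a whole-locale alias and then honors explicit parent overrides. The class name, method names, and the three table entries are illustrative assumptions, not copied from any CLDR release or from the CLDR tools. In practice the values would be read from the alias elements and the parentLocales data described in the spec, and element-level aliases (which this sketch ignores) would also need handling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of the JDK or the CLDR tools.
final class LocaleChainSketch {
    // Illustrative entries only. Real values would come from CLDR's alias
    // elements and its explicit parent-locale data.
    static final Map<String, String> ALIASES = new HashMap<>();
    static final Map<String, String> EXPLICIT_PARENTS = new HashMap<>();
    static {
        ALIASES.put("zh_TW", "zh_Hant_TW");       // assumed whole-locale alias
        EXPLICIT_PARENTS.put("zh_Hant", "root");  // assumed explicit parent
        EXPLICIT_PARENTS.put("es_AR", "es_419");  // assumed explicit parent
    }

    /** Parent by truncation: drop the last subtag, or fall back to root. */
    static String truncate(String locale) {
        int cut = locale.lastIndexOf('_');
        return (cut < 0) ? "root" : locale.substring(0, cut);
    }

    /** Naive chain: "truncate the locale name and check". */
    static List<String> naiveChain(String locale) {
        List<String> chain = new ArrayList<>();
        for (String cur = locale; !cur.equals("root"); cur = truncate(cur)) {
            chain.add(cur);
        }
        chain.add("root");
        return chain;
    }

    /** Chain that follows the alias first, then prefers explicit parents. */
    static List<String> resolvedChain(String locale) {
        List<String> chain = new ArrayList<>();
        String cur = ALIASES.getOrDefault(locale, locale);
        while (!cur.equals("root")) {
            chain.add(cur);
            String explicit = EXPLICIT_PARENTS.get(cur);
            cur = (explicit != null) ? explicit : truncate(cur);
        }
        chain.add("root");
        return chain;
    }

    public static void main(String[] args) {
        // Print both chains side by side for a few sample locales.
        for (String loc : new String[] {"zh_TW", "es_AR", "fr_CA"}) {
            System.out.println(loc + " naive=" + naiveChain(loc)
                    + " resolved=" + resolvedChain(loc));
        }
    }
}
```

With these assumed tables, the naive chain for zh_TW passes through zh, while the resolved chain passes through zh_Hant_TW and zh_Hant and never consults zh. A converter that only truncates would therefore pick up inherited values from the wrong file.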

While you certainly can have your own parser, I think Steven was just
trying to warn you about some possible pitfalls. (At Google we did some
work to produce "fully resolved" locale data files using the CLDR tools, so
that people could parse those more easily, especially in languages other
than Java.)

Mark <https://plus.google.com/114199149796022210033>
— Il meglio è l’inimico del bene —

On Tue, Aug 14, 2012 at 7:04 PM, Masayoshi Okutsu <
masayoshi.okutsu at oracle.com> wrote:

>
> On 8/15/2012 1:52 AM, Steven R. Loomis wrote:
>
>  On Tue, Aug 14, 2012 at 2:37 AM, Masayoshi Okutsu <
>> masayoshi.okutsu at oracle.com> wrote:
>>
>>     On 8/14/2012 2:25 PM, Steven R. Loomis wrote:
>>
>>         Naoto,
>>           okay, thought I was done for the night, but just two more
>>         things..
>>
>>         - again on the "talk to us" category.. Sun already wrote one LDML
>>         converter, and contributed to another. They're part of the
>>         CLDR toolset and
>>         work with OOo and Solaris data.
>>
>>         - also, it appears that the new converter doesn't handle
>>         aliases at all, or
>>         parentLocales. You're guaranteed to get the wrong answer.
>>
>>         - Some of the processing (such as for Norwegian) and in other
>>         places seems
>>         to be very .. hardcoded and fragile.
>>
>>
>>     These are limitations of the existing parser. I've briefly checked
>>     the output, but I will need to work on the parser more.
>>     Please note that we use the existing JRE classes (runtime) for
>>     CLDR support, not ICU4J. My understanding is that CLDR is after
>>     all the data part of ICU. A lot of adjustments need to be made to
>>     use the JRE classes.
>>
>>
>> No, that is not correct. First, CLDR is consumed by a number of other
>> packages besides ICU, including most recently TwitterCLDR. ICU is used in
>> the development of CLDR.
>> You could take the opportunity to influence CLDR to benefit the JRE by
>> providing input into the CLDR process.
>>
>
> That's not my point. As you know, IBM took the JDK 1.3 source code as the
> basis for ICU4J. After that IBM made some incompatible changes (from JDK),
> including deprecating functionality. Then, CLDR data was adjusted with the
> ICU4J changes.
>
>
>
>> Also, I was not referring to using the ICU data generator ( in
>> org.unicode.cldr.icu ) but the parser and utility, ( org.unicode.cldr.util
>>  - particularly, CLDRFile ).
>>
>
> I wasn't, either.
>
>
>
>>         - Are you aware of the fact that CLDR 22 is nearly released?
>>
>>
>>     Yes.
>>
>>
>>           Has there been
>>         any testing with the interim data, or any plans to do so?
>>
>>
>>     Currently we have no plan to use 22 in JDK 8. There is still a ton
>>     of work to finish for JDK 8, including fixing ancient bugs.
>>
>>
>> It's ironic and unfortunate timing, to independently pull in 21 at this
>> point. The data input in 21 was from the 2.0 release, ( 2011-May-25 ),
>> which by 2013 will be two years old.
>>
>
> This kind of thing will happen if external specs, data, whatever are
> incorporated into another product. CLDR in JDK isn't special.
>
>
>>         I think the summary again is, talk to us.  Where "us" is the
>>         CLDR technical
>>         committee.
>>
>>
>>     Thanks for the suggestion, but do you mean it's risky to create
>>     something from the spec and its implementation (data)?
>>
>>
>> It's not an unacceptable risk, but it may be an unnecessary one to work
>> in isolation. The parser does not match the spec in a number of areas. As I
>> noted, I myself have been a bit absent from these discussions, both
>> physically and in catching up on the i18n-dev mail digests. But I hope that
>> more conversation will be mutually beneficial.
>>
>
> CLDR is written in XML. If its spec is well defined and stable, what's the
> problem (risk) to write an XML parser to convert the XML files to another
> format?
>
> Thanks,
> Masayoshi
>