[threeten-dev] zidtext parser

Thu Jan 10 18:44:01 PST 2013

Hi,

I have added a helper class, ZoneName, to help implement the
parsing logic suggested by CLDR. It is based on the metazone data
from metaZones.xml [1] and the zone alias info from the Link
entries of all the tzdb data files. We need the zone alias info
because metaZones.xml appears to use both old and new names in
different places.

For each zid/zname candidate, the implementation now looks up
the metazone table to see whether a metazone is specified for it.
If so, it checks whether a preferred zid is defined for a particular
locale/region. If one is, and that locale/region matches the parser's
locale, the preferred zid is used; otherwise the default/001 zid for
the metazone is returned.
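
The lookup described above could be sketched roughly as follows. This is
an illustration only, not the code in the webrev: the class name
ZoneNameSketch, the method toPreferredZid, the three map names and the
few sample entries are all assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ZoneNameSketch {
    // Hypothetical sample tables; the real ones would be generated from
    // the tzdb Link entries and metaZones.xml.
    private static final Map<String, String> ALIASES = new HashMap<>();      // alias -> zid
    private static final Map<String, String> ZID_TO_META = new HashMap<>();  // zid -> metazone
    private static final Map<String, Map<String, String>> META_TO_ZID =
            new HashMap<>();                                  // metazone -> (region -> zid)

    static {
        ALIASES.put("US/Pacific", "America/Los_Angeles");
        ZID_TO_META.put("America/Los_Angeles", "America_Pacific");
        ZID_TO_META.put("America/Vancouver", "America_Pacific");
        Map<String, String> pacific = new HashMap<>();
        pacific.put("001", "America/Los_Angeles");   // default ("world") entry
        pacific.put("CA", "America/Vancouver");      // preferred zid for region CA
        META_TO_ZID.put("America_Pacific", pacific);
    }

    // Maps a candidate zid to the preferred zid for the parser's locale.
    static String toPreferredZid(String zid, Locale locale) {
        String meta = ZID_TO_META.get(zid);
        if (meta == null) {
            // The metazone table may use the other (old/new) name, so try the alias.
            String canonical = ALIASES.get(zid);
            if (canonical != null) {
                meta = ZID_TO_META.get(canonical);
            }
        }
        if (meta == null) {
            return zid;                      // no metazone: keep the candidate as is
        }
        Map<String, String> byRegion = META_TO_ZID.get(meta);
        if (byRegion == null) {
            return zid;
        }
        String preferred = byRegion.get(locale.getCountry());
        if (preferred != null) {
            return preferred;                // region-specific preferred zid
        }
        String fallback = byRegion.get("001");
        return fallback != null ? fallback : zid;
    }

    public static void main(String[] args) {
        System.out.println(toPreferredZid("US/Pacific", Locale.US));
        System.out.println(toPreferredZid("America/Los_Angeles", Locale.CANADA));
    }
}
```

With these sample tables, a candidate without a metazone entry (for
example "Europe/Paris") is returned unchanged.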

The webrev is at

http://cr.openjdk.java.net/~sherman/jdk8_threeten/ztext_parser

The zid->metazone->zid lookup tables are hard-coded in ZoneName
for now. They are generated from the link file [2] (the result of
running grep "Link" over all the tzdb files) and metaZones.xml [1]
via the hacky tool MetaZone.java [3]. If this approach is good, I
would expect us to get this info from Naoto's TimeZoneUtilities
"someday". (I think we already parse metaZones.xml in the cldr tool,
probably for the generic names? We would just need to go a little
further.)
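
For illustration, an alias table could be built from tzdb Link lines
along these lines. This is not MetaZone.java [3]; the class
LinkTableSketch and the method parseLinks are hypothetical, and the
sketch assumes each input line has the tzdb form "Link TARGET ALIAS"
with no file-name prefix from grep.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LinkTableSketch {
    // Builds an alias -> target map from lines such as
    // "Link America/Los_Angeles US/Pacific".
    static Map<String, String> parseLinks(List<String> lines) {
        Map<String, String> aliasToTarget = new LinkedHashMap<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);      // drop trailing comments
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 3 && fields[0].equals("Link")) {
                aliasToTarget.put(fields[2], fields[1]);
            }
        }
        return aliasToTarget;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Link\tAmerica/Los_Angeles\tUS/Pacific",
                "# a comment line",
                "Link Asia/Kolkata Asia/Calcutta # old name");
        System.out.println(parseLinks(sample));
    }
}
```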

The test currently does not fail if the parsed result is not the
"expected" one (round trip); it simply prints the differences for
manual checking. If you take a close look at the output "result" [4],
which shows "zid", "input text", "parsed result" and "expected", all
the parsed results appear to be correct/reasonable. I guess the main
reason those results are not the "expected" ones is that those zids
are not in metaZones.xml.
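
A non-failing round-trip check of this kind might look like the sketch
below. It is not the test in the webrev: the class ZoneTextRoundTrip and
the method check are made up, and it uses the java.time API as it later
shipped in JDK 8, which may differ from the code under review. The
output depends on the locale data of the JDK it runs on.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.TextStyle;
import java.time.temporal.TemporalQueries;
import java.util.Locale;

public class ZoneTextRoundTrip {
    // Formats the zone as text, parses the text back, and reports
    // "zid | input text | parsed result | expected" instead of failing.
    static String check(String zid, TextStyle style, Locale locale) {
        DateTimeFormatter f = new DateTimeFormatterBuilder()
                .appendZoneText(style)
                .toFormatter(locale);
        // A fixed instant, chosen arbitrarily for the example.
        ZonedDateTime when = ZonedDateTime.ofInstant(
                Instant.parse("2013-01-10T00:00:00Z"), ZoneId.of(zid));
        String text = f.format(when);
        String parsed;
        try {
            ZoneId z = f.parse(text, TemporalQueries.zoneId());
            parsed = String.valueOf(z);
        } catch (DateTimeParseException e) {
            parsed = "<unparsable>";
        }
        String status = parsed.equals(zid) ? "OK  " : "DIFF";
        return status + " " + zid + " | " + text + " | " + parsed + " | " + zid;
    }

    public static void main(String[] args) {
        for (TextStyle style : new TextStyle[] {TextStyle.FULL, TextStyle.SHORT}) {
            System.out.println(check("America/Los_Angeles", style, Locale.US));
            System.out.println(check("Asia/Calcutta", style, Locale.US));
        }
    }
}
```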

The result [4] shows that parsing works reasonably well for the "full"
style, but there are lots of "missing/ambiguous" results for the
"short" style names. Given the nature of those short names and
the limited info available (the "locale"), it may be reasonable not to
support those ambiguous short names in parsing? An alternative is to
"specify" the mapping table in the spec, but that's not going to make
everyone happy either.

Opinions?

-Sherman

[1] http://cr.openjdk.java.net/~sherman/jdk8_threeten/ztext_parser/metaZones
[2] http://cr.openjdk.java.net/~sherman/jdk8_threeten/ztext_parser/link
[3] http://cr.openjdk.java.net/~sherman/jdk8_threeten/ztext_parser/MetaZone.java
[4] http://cr.openjdk.java.net/~sherman/jdk8_threeten/ztext_parser/result

On 01/04/2013 09:07 AM, Stephen Colebourne wrote:
> For this release or JDK 9?
> We need to ensure that we don't do anything that prevents implementing
> the full CLDR strategy if we are not doing it now.
> Stephen
>
> On 4 January 2013 17:03, Xueming Shen <xueming.shen at oracle.com> wrote:
>> On 1/4/13 2:32 AM, Stephen Colebourne wrote:
>>> Really, we should implement the rules described in CLDR, as they seem
>>> to have thought about it:
>>> http://www.unicode.org/reports/tr35/#Time_Zone_Fallback
>>
>> we need to pull in more cldr data...
>>
>>
>>> Stephen
>>