[threeten-dev] the "deduplicate" mechanism in tzdb compiler

Xueming Shen xueming.shen at oracle.com
Sun Jan 20 16:44:14 PST 2013


It produces exactly the same TZDB.dat, though the compressed
results are slightly different. Objects within other objects do
not matter in the tzdb compiler's case, since the only thing we
care about here is the ZoneRules; anything inside a ZoneRules is
written out as its actual bits. This optimization might make
some sense if we kept an in-memory ZoneRules collection and
needed to optimize its memory footprint; then "shared" objects
inside objects could bring us some benefit. It does not help
in our case.
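A small sketch of the point above: sharing equal inner objects changes object identity in memory, but not the bytes written out. It assumes Java 16+ (records). Offset, Rules and Deduplicator are hypothetical stand-ins, not the compiler's actual classes, and the interning helper is only an assumed shape for a "deduplicate" mechanism.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedInnerObjectsDemo {

    // Hypothetical stand-ins: Offset plays an inner object, Rules plays a ZoneRules.
    record Offset(int totalSeconds) {}
    record Rules(List<Offset> offsets) {}

    // Hypothetical interning helper: one canonical instance per equals() class.
    static final class Deduplicator {
        private final Map<Object, Object> canonical = new HashMap<>();

        @SuppressWarnings("unchecked")
        <T> T deduplicate(T object) {
            // putIfAbsent returns the previously stored equal instance, or null
            Object existing = canonical.putIfAbsent(object, object);
            return existing == null ? object : (T) existing;
        }
    }

    // Writes inner objects by value; object identity is never recorded.
    static byte[] serialize(Rules rules) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(rules.offsets().size());
            for (Offset offset : rules.offsets()) {
                out.writeInt(offset.totalSeconds());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Two distinct but equal Offset instances (no deduplication).
    static Rules plainRules() {
        return new Rules(List.of(new Offset(3600), new Offset(3600)));
    }

    // Both elements end up as the same shared Offset instance.
    static Rules dedupedRules() {
        Deduplicator dedup = new Deduplicator();
        return new Rules(List.of(dedup.deduplicate(new Offset(3600)),
                                 dedup.deduplicate(new Offset(3600))));
    }

    public static void main(String[] args) {
        Rules plain = plainRules();
        Rules deduped = dedupedRules();
        System.out.println(plain.offsets().get(0) == plain.offsets().get(1));     // false
        System.out.println(deduped.offsets().get(0) == deduped.offsets().get(1)); // true
        System.out.println(Arrays.equals(serialize(plain), serialize(deduped)));  // true
    }
}
```

Only the in-memory instance count differs; the serialized output is identical either way.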

-Sherman

P.S. The sizes of the "new" and "old" tzdb-all.jar files (all
versions):

sherman@sherman-linux:~/Workspace/310/tzdb$ ls -al new/tzdb-all.jar old/tzdb-all.jar
-rw-r--r-- 1 sherman 1050 60046 2013-01-20 16:34 new/tzdb-all.jar
-rw-r--r-- 1 sherman 1050 59274 2013-01-20 16:34 old/tzdb-all.jar
sherman@sherman-linux:~/Workspace/310/tzdb$ jar tvf new/tzdb-all.jar
346543 Sun Jan 20 16:34:26 PST 2013 TZDB.dat
sherman@sherman-linux:~/Workspace/310/tzdb$ jar tvf old/tzdb-all.jar
346543 Sun Jan 20 16:34:52 PST 2013 TZDB.dat
    305 Sun Jan 20 16:34:52 PST 2013 LeapSecondRules.dat
sherman@sherman-linux:~/Workspace/310/tzdb$


On 01/20/2013 03:22 PM, Stephen Colebourne wrote:
> The purpose was to deduplicate objects within other objects.
> If this patch produces a tzdb.jar file with exactly the same data,
> then it's fine by me.
>
> Stephen
>
>
> On 20 January 2013 19:53, Xueming Shen <xueming.shen at oracle.com> wrote:
>> The "deduplicate" mechanism in the tzdb compiler looks really fishy, so I
>> took a look at how it actually works. It turns out it is not necessary,
>> at least for the compiler tool we are looking at here.
>> All those date/time, offset, transition and transition-rule objects are
>> thrown away at the end, so why should we care whether they are duplicates
>> or not? They have been created anyway. The only thing we really care about
>> here is the ZoneRules, which are written out to tzdb.jar; they need to be
>> de-duplicated before being written out. But this is already done
>> perfectly in outputFile(s) via the combination of
>>
>> Set<ZoneRules> loopAllRules = new HashSet<ZoneRules>(builtZones.values());
>> ...
>> List<ZoneRules> rulesList = new ArrayList<>(allRules);
>> ...
>> int rulesIndex = rulesList.indexOf(entry.getValue());
>>
>> HashSet and ArrayList.indexOf() both rely on ZoneRules.equals() (HashSet
>> also uses hashCode()), so the ZoneRules objects are effectively
>> de-duplicated without any help from the "deduplicate" mechanism.
>>
>> http://cr.openjdk.java.net/~sherman/jdk8_threeten/zruleDup2
>>
>> -Sherman
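The equals()-based de-duplication in the quoted outputFile snippet can be sketched in isolation. This assumes Java 16+; Rules is a hypothetical record standing in for ZoneRules (records get value-based equals() and hashCode()), and the region IDs and offsets are made-up sample data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RulesIndexDemo {

    // Hypothetical stand-in for ZoneRules with value-based equals()/hashCode().
    record Rules(List<Integer> offsets) {}

    // The deduplicated rules list plus each region's index into it.
    record Indexed(List<Rules> rulesList, Map<String, Integer> regionIndex) {}

    static Indexed buildIndex(Map<String, Rules> builtZones) {
        // HashSet collapses equal Rules (hashCode() + equals()).
        Set<Rules> allRules = new HashSet<>(builtZones.values());
        List<Rules> rulesList = new ArrayList<>(allRules);
        Map<String, Integer> regionIndex = new TreeMap<>();
        for (Map.Entry<String, Rules> entry : builtZones.entrySet()) {
            // indexOf() uses equals(), so equal rules resolve to the same slot.
            regionIndex.put(entry.getKey(), rulesList.indexOf(entry.getValue()));
        }
        return new Indexed(rulesList, regionIndex);
    }

    static Indexed sample() {
        Map<String, Rules> builtZones = new TreeMap<>();
        // Distinct but equal Rules instances for two regions.
        builtZones.put("Europe/Berlin", new Rules(List.of(3600, 7200)));
        builtZones.put("Europe/Paris", new Rules(List.of(3600, 7200)));
        builtZones.put("Etc/UTC", new Rules(List.of(0)));
        return buildIndex(builtZones);
    }

    public static void main(String[] args) {
        Indexed result = sample();
        System.out.println(result.rulesList().size()); // 2
        System.out.println(result.regionIndex().get("Europe/Berlin")
                .equals(result.regionIndex().get("Europe/Paris"))); // true
    }
}
```

Because allRules is a HashSet, the order of rulesList is unspecified; only the fact that equal rules share one index is meaningful.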


