<i18n dev> RFR: 8288979: Improve CLDRConverter run time

Wed Jun 22 17:10:55 UTC 2022

On Wed, 22 Jun 2022 16:11:33 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

>> This PR improves the performance of deduplication done by ResourceBundleGenerator.
>> 
>> The original implementation compared every pair of values, requiring O(n^2) time. The new implementation uses a HashMap to find duplicates, trading some extra memory consumption for O(n) expected time. In practice, the time to generate the jdk.localedata files on my Linux VM dropped from 14 to 8 seconds.
>> 
>> The resulting files (under build/support/gensrc/java.base and jdk.localedata) have different contents; map iteration order depends on the insertion order, and the insertion order of the new implementation is different from the original.
>> The files generated before and after this change have the same size.
>
> make/jdk/src/classes/build/tools/cldrconverter/ResourceBundleGenerator.java line 146:
> 
>> 144:             // generic reduction of duplicated values
>> 145:             Map<String, Object> newMap = new HashMap<>(map);
>> 146:             Map<BundleEntryValue, BundleEntryValue> dedup = new HashMap<>(map.size());
> 
> LinkedHashMap could be used to retain the iteration order.
> Or TreeMap if some deterministic order was desirable.

True. Which raises the question: do we need any particular order? The original code used a HashMap too. It preserved the original order only when no duplicates were detected.
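
For illustration only, here is a minimal sketch of the idea being discussed. It is not the code in the PR: the class name DedupSketch, the method dedupValues, and the use of plain String values in place of BundleEntryValue are all hypothetical. It assumes the values have consistent equals() and hashCode(). A HashMap finds duplicates in a single pass, and a LinkedHashMap for the result retains the input's iteration order, as suggested in the review comment.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the ResourceBundleGenerator code.
public final class DedupSketch {

    /**
     * Returns a copy of input in which values that are equal() share a
     * single instance (the first one encountered). One pass over the
     * entries, with expected O(1) work per HashMap lookup.
     */
    public static <K, V> Map<K, V> dedupValues(Map<K, V> input) {
        // Maps each distinct value to its first-seen (canonical) instance.
        Map<V, V> canonical = new HashMap<>(input.size());
        // LinkedHashMap iterates in insertion order, so the result
        // follows the iteration order of the input map.
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : input.entrySet()) {
            V value = e.getValue();
            // putIfAbsent returns the earlier equal instance, or null
            // if this value has not been seen before.
            V first = canonical.putIfAbsent(value, value);
            result.put(e.getKey(), first != null ? first : value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("b", new String("x")); // distinct instances with
        in.put("a", new String("x")); // equal contents
        in.put("c", "y");
        Map<String, String> out = dedupValues(in);
        System.out.println(out.keySet());                 // [b, a, c]
        System.out.println(out.get("a") == out.get("b")); // true
    }
}
```

If a reproducible order that does not depend on the input were wanted instead, the LinkedHashMap in this sketch could be swapped for a TreeMap, at the cost of O(log n) per insertion.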

-------------

PR: https://git.openjdk.org/jdk/pull/9243