<i18n dev> <i18n> Performance patch for DateFormatSymbols.getZoneIndex(String)

Masayoshi Okutsu masayoshi.okutsu at oracle.com
Sun Apr 22 22:19:55 PDT 2012


Hi Deven,

As Yoshito noted, lastZoneIndex needs to be transient because 
DateFormatSymbols is serializable. Otherwise, the change looks good to me.

Thanks,
Masayoshi

On 4/20/2012 11:56 AM, Deven You wrote:
> Yoshito,
>
> It is good to me for your suggestion. So Masayoshi, what is your 
> opinion,  is cache the last zone index better than cache recent zone 
> ID and zone indexes, do you have any concern about using last zone index?
>
> [1] The updated version of this patch 
> http://cr.openjdk.java.net/~youdwei/DateFormatSymbol_perf/webrev.02/ 
> <http://cr.openjdk.java.net/%7Eyoudwei/DateFormatSymbol_perf/webrev.02/>
>
> Thanks a lot!
>
> On 04/11/2012 10:35 PM, Yoshito Umaoka wrote:
>> Deven,
>>
>> If last time zone ID is cached along with the previously resolved 
>> index, and getZoneIndex does not check the contents of array, it 
>> might not work when zone strings are updated (setZoneStrings). So, to 
>> make this change work properly, you have to reset 
>> lastZoneID/zoneIndex in setZoneStrings(String[][]).
>>
>> In my opinion, we just need to keep last zone index. And it should be 
>> 'transient'.
>>
>> For every getZoneIndex(String) call, input ID is compared to 
>> zoneStrings[lastZoneIndex][0] - and if they matches, simply return 
>> lastIndex. Otherwise, do the linear search and update lastZoneIndex- 
>> like below.
>>
>> private transient int lastZoneIndex = 0; // with initial value to be 
>> 0, you do not need to check lastZoneIndex >= 0 every time
>>
>> final int getZoneIndex(String ID) {
>>   // fast path
>>
>>   // note: if you want to get rid of lastZoneIndex < zoneStrings.length,
>>   // setZoneStrings(String[][]) should reset lastZoneIndex to 0 or 
>> any valid index.
>>
>>   if (lastZoneIndex < zoneStrings.length && 
>> ID.equals(zoneStrings[lastZoneIndex ][0]) {
>>     return lastIndex;
>>   }
>>
>>   // slow path
>>   for (int index = 0; index < zoneStrings.length; index++) {
>>     if (ID.equals(zoneStrings[index][0]) {
>>       lastZoneIndex = index;
>>       return index;
>>     }
>>   }
>> }
>>
>> -Yoshito
>>
>> On 4/11/2012 3:02 AM, Deven You wrote:
>>> I think Yoshito's suggestion make sense, since the getZoneIndex is 
>>> an internal method, if there is no manual setting to change the 
>>> timezone, The timezone ID won't be changed for one instance of 
>>> SimpleDateFormt.
>>>
>>> I updated the patch webrev[1] for this suggestion and test the 
>>> GetZoneIndexTest.java, the performance is almost the same with 
>>> webrev.00.
>>>
>>> Please review the updated patch[1].
>>>
>>> Thanks a lot!
>>>
>>> [1] http://cr.openjdk.java.net/~littlee/OJDK-548/webrev.01 
>>> <http://cr.openjdk.java.net/%7Elittlee/OJDK-548/webrev.01>
>>>
>>> On 04/11/2012 04:00 AM, Yoshito Umaoka wrote:
>>>> I'm wondering if caching index matched last time (per 
>>>> DateFormatSymbols instance) is sufficient, instead of keeping 
>>>> multiple results in a hash map.
>>>> I guess majority of use cases repeatedly refer the index of the 
>>>> zone used by SimpleDateFormat.
>>>>
>>>> -Yoshito
>>>>
>>>> On 4/9/2012 10:46 AM, Masayoshi Okutsu wrote:
>>>>> Hi Deven,
>>>>>
>>>>> Here are my comments on the proposed changes.
>>>>>
>>>>> - zoneIndexCache should be an instance field because WeakHashMap 
>>>>> isn't thread-safe and the order of IDs in zoneStrings differs in 
>>>>> each DateFormatSymbols.
>>>>>
>>>>> - equals shouldn't be replaced with equalsIgnoreCase because time 
>>>>> zone IDs are (supposed to be) case-sensitive.
>>>>>
>>>>> - Utilize auto-boxing/unboxing.
>>>>>
>>>>> Thanks,
>>>>> Masayoshi
>>>>>
>>>>> On 4/9/2012 12:10 PM, Deven You wrote:
>>>>>> Hi i18n-devs,
>>>>>>
>>>>>> The getZoneIndex() method is expensive and it's performance 
>>>>>> depends on which timezone you're in. This is because the code 
>>>>>> steps through a list of timezones until it finds the current one. 
>>>>>> This is happening over and over again. An idea would be to either 
>>>>>> cache it or rewrite the way we store time zone ids, such as a 
>>>>>> hashmap instead of an array.
>>>>>> This patch[1] is for the cache option.
>>>>>>
>>>>>> Applications which  format/parse dates using SimpleDateFormat 
>>>>>> repeatedly may obtain benefits from this patch, especially when 
>>>>>> run in a timezone far down the zoneStrings array the improvements 
>>>>>> will be even bigger.
>>>>>>
>>>>>> I have written a test case[2] which shows when a timezone which 
>>>>>> is in lower end of the zone strings array, the performance 
>>>>>> improvement this patch can get.
>>>>>>
>>>>>> test results:
>>>>>>
>>>>>> no patch:
>>>>>> The total time is 742 ms!
>>>>>>
>>>>>> with this patch[1] :
>>>>>> The total time is 508 ms!
>>>>>>
>>>>>> If increase the loop times to 1000000, the results are:
>>>>>>
>>>>>> no patch:
>>>>>> The total time is 4743 ms!
>>>>>>
>>>>>> with this patch[1] :
>>>>>> The total time is 2126 ms!
>>>>>>
>>>>>> The java version is:
>>>>>>
>>>>>> /home/deven/hgrps/jdk8/build/linux-i586/j2sdk-image/bin/java 
>>>>>> -version
>>>>>> openjdk version "1.8.0-internal"
>>>>>> OpenJDK Runtime Environment (build 
>>>>>> 1.8.0-internal-deven_2012_03_14_16_14-b00)
>>>>>> OpenJDK Server VM (build 24.0-b02, mixed mode)
>>>>>>
>>>>>> [1] 
>>>>>> http://cr.openjdk.java.net/~youdwei/DateFormatSymbol_perf/webrev.00/ 
>>>>>> <http://cr.openjdk.java.net/%7Eyoudwei/DateFormatSymbol_perf/webrev.00/> 
>>>>>>
>>>>>> [2] GetZoneIndexTest.java
>>>>
>>>
>>>
>>> -- 
>>> Best Regards,
>>>
>>> Deven
>>
>
>
> -- 
> Best Regards,
>
> Deven


More information about the i18n-dev mailing list