<i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields

Peter Levart peter.levart at gmail.com
Sat Dec 9 22:33:04 UTC 2017


Hi Naoto,

Thank you for reviewing.

Naoto Sato je 06. 12. 2017 ob 20:41 napisal:
> Hi Peter, Venkat,
>
> Thank you for the fix. It looks good to me. Improved performance is a 
> nice bonus! Would you be able to provide with a regression test?

Sure, here it is:

http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.02/


The test reliably detects several races in 2 seconds of runtime which 
cause incorrect offset calculations in JDK 9:

shared TZ: 7363986 correct, 0 incorrect calculations
cloned TZ: 3960264 correct, 1180 incorrect calculations
Exception in thread "main" java.lang.AssertionError: 1180 fatal data 
races detected
     at 
SimpleTimeZoneCloneRaceTest.main(SimpleTimeZoneCloneRaceTest.java:86)

With patch applied, there are no failures of course.

Can this go into JDK 10 ?

Regards, Peter

>
> Naoto
>
> On 12/6/17 6:10 AM, Peter Levart wrote:
>> Hi,
>>
>> On 12/06/2017 02:30 PM, Alan Bateman wrote:
>>> I think this class is normally maintained on i18n-dev but I think 
>>> introducing the Cache object looks good and making this much easier 
>>> to understand.
>>>
>>> -Alan 
>>
>> Thanks Alan, I'm forwarding to i18n-dev to see if maintainers of that 
>> part of JDK have any comments...
>>
>> This is an official request for reviewing a patch for fixing a data 
>> race between cloning a SimpleTimeZone object and lazily initializing 
>> its 3 cache fields which may produce a clone with inconsistent cache 
>> state. Here's Jira issue with details:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>
>> Venkat has proposed to simply call invalidateCache() on the clone 
>> before returning it from SimpleTimeZone.clone() method:
>>
>>      public Object clone()
>>      {
>>          SimpleTimeZone clone = (SimpleTimeZone) super.clone();
>>          clone.invalidateCache();
>>          return clone;
>>      }
>>
>> This fixes the issue and for the case of TimeZone.getDefault() which 
>> is called from ZoneId.systemDefault() even elides synchronization in 
>> clone.invalidateCache() and allocation of the clone, so JITed 
>> ZoneId.systemDefault() is unaffected by the patch. Initially I was 
>> satisfied with the fix, but then I tested something. Suppose someone 
>> sets default zone to SimpleTimeZone:
>>
>>          TimeZone.setDefault(
>>              new SimpleTimeZone(3600000,
>>                                 "Europe/Paris",
>>                                 Calendar.MARCH, -1, Calendar.SUNDAY,
>>                                 3600000, SimpleTimeZone.UTC_TIME,
>>                                 Calendar.OCTOBER, -1, Calendar.SUNDAY,
>>                                 3600000, SimpleTimeZone.UTC_TIME,
>>                                 3600000)
>>          );
>>
>> And then calls some methods that initialize the cache of the internal 
>> shared TimeZone.defaultTimeZone instance:
>>
>>      new Date().toString();
>>
>> The code which after that tries to obtain default time zone and 
>> calculate the offset from UTC at some current point in time, for 
>> example:
>>
>>      TimeZone.getDefault().getOffset(now)
>>
>> can't use the cached state because it has been invalidated in the 
>> returned clone. Such code has to re-compute the offset every time it 
>> gets new clone. I measured this with a JMH benchmark and got the 
>> following result:
>>
>> Default:
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  
>> ns/op
>>
>> Venkat's patch - invalidateCache():
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 179,476 ± 1,942  
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,538 ± 0,073  
>> ns/op
>>
>> We see that ZoneId.systemDefault() is unaffected, but 
>> TimeZone.getDefault().getOffset(now) becomes 3x slower.
>>
>> This is not good, so I tried an alternative fix for the issue - 
>> simply making the SimpleTimeZone.clone() synchronized. Access to 
>> cache fields is already synchronized, so this should fix the race. 
>> Here's the JMH result:
>>
>> Default:
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  
>> ns/op
>>
>> Synchronized clone():
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 58,103 ± 0,936  
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 4,154 ± 0,034  
>> ns/op
>>
>> We see that caching works again, but synchronization has some 
>> overhead which is not big for single-threaded access, but might get 
>> bigger when multiple threads try to get default zone concurrently.
>>
>> So I created a 3rd variant of the fix which I'm presenting here and 
>> requesting a review for:
>>
>> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.01/ 
>>
>>
>> The JMH benchmark shows this:
>>
>> Default:
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  
>> ns/op
>>
>> Cache object:
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 42,860 ± 0,274  
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,545 ± 0,057  
>> ns/op
>>
>> Not only does the fix not affect ZoneId.systemDefault() which is not 
>> surprising, but it also speeds-up cache lookup in single-threaded 
>> benchmark and certainly eliminates possible contention among threads 
>> looking up the shared instance.
>>
>> I have run the test/jdk/java/util/TimeZone and 
>> test/jdk/java/util/Calendar jtreg tests and there were 7 failures 
>> caused by compilation errors (package sun.util.locale.provider is not 
>> visible), while 59 other tests pass.
>>
>> So, what do you think?
>>
>> Regards, Peter
>>
>>
>> Venkateswara R Chintala je 21. 11. 2017 ob 10:14 napisal:
>>> Thanks Peter for sponsoring this patch. Is there anything else that 
>>> needs to be done from my end for this patch to be integrated? Please 
>>> let me know.
>>>
>>> -Venkat
>>>
>>>
>>> On 14/11/17 8:46 PM, Peter Levart wrote:
>>>> Hi Venkat,
>>>>
>>>> I created the following issue:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>>
>>>> I can also sponsor this patch and push it for JDK 10.
>>>>
>>>> The patch that you are proposing looks good to me. Does anybody 
>>>> have anything else to say?
>>>>
>>>> Regards, Peter
>>>>
>>>>
>>>> On 11/13/2017 11:28 AM, Venkateswara R Chintala wrote:
>>>>> Thanks David, Peter for your review and comments. As I am new to 
>>>>> the community, can you please help me to open a bug and integrate 
>>>>> the changes into code base?
>>>>>
>>>>> -Venkat
>>>>>
>>>>> On 12/11/17 2:19 AM, Peter Levart wrote:
>>>>>> Hi David, Venkat,
>>>>>>
>>>>>> On 11/11/17 21:15, Peter Levart wrote:
>>>>>>> For example, take the following method:
>>>>>>>
>>>>>>> String defaultTZID() {
>>>>>>>     return TimeZone.getDefault().getID();
>>>>>>> }
>>>>>>>
>>>>>>> When JIT compiles it and inlines invocations to other methods 
>>>>>>> within it, it can prove that cloned TimeZone instance never 
>>>>>>> escapes the call to defaultTZID() and can therefore skip 
>>>>>>> allocating the instance on heap.
>>>>>>>
>>>>>>> But this is fragile. If code in invoked methods changes, they 
>>>>>>> may not get inlined or EA may not be able to prove that the 
>>>>>>> cloned instance can't escape and allocation may be introduced. 
>>>>>>> ZoneId.systemDefault() is a hot method and it would be nice if 
>>>>>>> we manage to keep it allocation free.
>>>>>>
>>>>>> Well, I tried the following variant of SimpleTimeZone.clone() patch:
>>>>>>
>>>>>>     public Object clone()
>>>>>>     {
>>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>         // like tz.invalidateCache() but without holding a lock 
>>>>>> on clone
>>>>>>         tz.cacheYear = tz.startYear - 1;
>>>>>>         tz.cacheStart = tz.cacheEnd = 0;
>>>>>>         return tz;
>>>>>>     }
>>>>>>
>>>>>>
>>>>>> ...and the JMH benchmark with gc profiling shows that 
>>>>>> ZoneId.systemDefault() still manages to get JIT-compiled without 
>>>>>> introducing allocation.
>>>>>>
>>>>>> Even the following (original Venkat's) patch:
>>>>>>
>>>>>>     public Object clone()
>>>>>>     {
>>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>         tz.invalidateCache();
>>>>>>         return tz;
>>>>>>     }
>>>>>>
>>>>>> ...does the same and the locking in invalidateCache() is elided 
>>>>>> too. Allocation and lock-elision go hand-in-hand. When object 
>>>>>> doesn't escape, allocation on heap may be eliminated and locks on 
>>>>>> that instance elided.
>>>>>>
>>>>>> So this is better than synchronizing on the original instance 
>>>>>> during .clone() execution as it has potential to avoid locking 
>>>>>> overhead.
>>>>>>
>>>>>> So Venkat, go ahead. My fear was unjustified.
>>>>>>
>>>>>> Regards, Peter
>>>>>>
>>>>>
>>>>
>>>>
>>>



More information about the core-libs-dev mailing list