RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields

David Holmes david.holmes at oracle.com
Wed Dec 6 11:32:55 UTC 2017


Hi Peter,

On 6/12/2017 9:08 PM, Peter Levart wrote:
> Hi David,
> 
> Can I consider your comment as a Review? I'd like to get this patch into 
> JDK10 if possible.

No sorry. I see what you're doing and I think it is okay but the regular 
owners/maintainers of this code need to have the say on any changes here.

David

> Regards, Peter
> 
> 
> On 11/28/2017 08:17 AM, David Holmes wrote:
>> Hi Peter,
>>
>> I like what you have done here. That said the general 
>> thread-unsafeness of the code in SimpleTimeZone still causes me 
>> concern - but what you are doing is not breaking anything more than it 
>> is already broken.
>>
>> David
>>
>> On 25/11/2017 9:32 AM, Peter Levart wrote:
>>> Hi,
>>>
>>> @Venkat: Sorry for late response, but I had to try something 1st.
>>>
>>> This is an official request for reviewing a patch for fixing a data 
>>> race between cloning a SimpleTimeZone object and lazily initializing 
>>> its 3 cache fields which may produce a clone with inconsistent cache 
>>> state. Here's Jira issue with details:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>
>>> Venkat has proposed to simply call invalidateCache() on the clone 
>>> before returning it from SimpleTimeZone.clone() method:
>>>
>>>      public Object clone()
>>>      {
>>>          SimpleTimeZone clone = (SimpleTimeZone) super.clone();
>>>          clone.invalidateCache();
>>>          return clone;
>>>      }
>>>
>>> This fixes the issue and for the case of TimeZone.getDefault() which 
>>> is called from ZoneId.systemDefault() even elides synchronization in 
>>> clone.invalidateCache() and allocation of the clone, so JITed 
>>> ZoneId.systemDefault() is unaffected by the patch. Initially I was 
>>> satisfied with the fix, but then I tested something. Suppose someone 
>>> sets default zone to SimpleTimeZone:
>>>
>>>          TimeZone.setDefault(
>>>              new SimpleTimeZone(3600000,
>>>                                 "Europe/Paris",
>>>                                 Calendar.MARCH, -1, Calendar.SUNDAY,
>>>                                 3600000, SimpleTimeZone.UTC_TIME,
>>>                                 Calendar.OCTOBER, -1, Calendar.SUNDAY,
>>>                                 3600000, SimpleTimeZone.UTC_TIME,
>>>                                 3600000)
>>>          );
>>>
>>> And then calls some methods that initialize the cache of the internal 
>>> shared TimeZone.defaultTimeZone instance:
>>>
>>>      new Date().toString();
>>>
>>> The code which after that tries to obtain default time zone and 
>>> calculate the offset from UTC at some current point in time, for 
>>> example:
>>>
>>>      TimeZone.getDefault().getOffset(now)
>>>
>>> can't use the cached state because it has been invalidated in the 
>>> returned clone. Such code has to re-compute the offset every time it 
>>> gets new clone. I measured this with a JMH benchmark and got the 
>>> following result:
>>>
>>> Default:
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501 
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040 ns/op
>>>
>>> Venkat's patch - invalidateCache():
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 179,476 ± 1,942 
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,538 ± 0,073 ns/op
>>>
>>> We see that ZoneId.systemDefault() is unaffected, but 
>>> TimeZone.getDefault().getOffset(now) becomes 3x slower.
>>>
>>> This is not good, so I tried an alternative fix for the issue - 
>>> simply making the SimpleTimeZone.clone() synchronized. Access to 
>>> cache fields is already synchronized, so this should fix the race. 
>>> Here's the JMH result:
>>>
>>> Default:
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501 
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040 ns/op
>>>
>>> Synchronized clone():
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 58,103 ± 0,936 
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 4,154 ± 0,034 ns/op
>>>
>>> We see that caching works again, but synchronization has some 
>>> overhead which is not big for single-threaded access, but might get 
>>> bigger when multiple threads try to get default zone concurrently.
>>>
>>> So I created a 3rd variant of the fix which I'm presenting here and 
>>> requesting a review for:
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.01/ 
>>>
>>>
>>> The JMH benchmark shows this:
>>>
>>> Default:
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501 
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040 ns/op
>>>
>>> Cache object:
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 42,860 ± 0,274 
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,545 ± 0,057 ns/op
>>>
>>> Not only does the fix not affect ZoneId.systemDefault() which is not 
>>> surprising, but it also speeds-up cache lookup in single-threaded 
>>> benchmark and certainly eliminates possible contention among threads 
>>> looking up the shared instance.
>>>
>>> I have run the test/jdk/java/util/TimeZone and 
>>> test/jdk/java/util/Calendar jtreg tests and there were 7 failures 
>>> caused by compilation errors (package sun.util.locale.provider is not 
>>> visible), while 59 other tests pass.
>>>
>>> So, what do you think?
>>>
>>> Regards, Peter
>>>
>>>
>>> Venkateswara R Chintala je 21. 11. 2017 ob 10:14 napisal:
>>>> Thanks Peter for sponsoring this patch. Is there anything else that 
>>>> needs to be done from my end for this patch to be integrated? Please 
>>>> let me know.
>>>>
>>>> -Venkat
>>>>
>>>>
>>>> On 14/11/17 8:46 PM, Peter Levart wrote:
>>>>> Hi Venkat,
>>>>>
>>>>> I created the following issue:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>>>
>>>>> I can also sponsor this patch and push it for JDK 10.
>>>>>
>>>>> The patch that you are proposing looks good to me. Does anybody 
>>>>> have anything else to say?
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>>
>>>>> On 11/13/2017 11:28 AM, Venkateswara R Chintala wrote:
>>>>>> Thanks David, Peter for your review and comments. As I am new to 
>>>>>> the community, can you please help me to open a bug and integrate 
>>>>>> the changes into code base?
>>>>>>
>>>>>> -Venkat
>>>>>>
>>>>>> On 12/11/17 2:19 AM, Peter Levart wrote:
>>>>>>> Hi David, Venkat,
>>>>>>>
>>>>>>> On 11/11/17 21:15, Peter Levart wrote:
>>>>>>>> For example, take the following method:
>>>>>>>>
>>>>>>>> String defaultTZID() {
>>>>>>>>     return TimeZone.getDefault().getID();
>>>>>>>> }
>>>>>>>>
>>>>>>>> When JIT compiles it and inlines invocations to other methods 
>>>>>>>> within it, it can prove that cloned TimeZone instance never 
>>>>>>>> escapes the call to defaultTZID() and can therefore skip 
>>>>>>>> allocating the instance on heap.
>>>>>>>>
>>>>>>>> But this is fragile. If code in invoked methods changes, they 
>>>>>>>> may not get inlined or EA may not be able to prove that the 
>>>>>>>> cloned instance can't escape and allocation may be introduced. 
>>>>>>>> ZoneId.systemDefault() is a hot method and it would be nice if 
>>>>>>>> we manage to keep it allocation free.
>>>>>>>
>>>>>>> Well, I tried the following variant of SimpleTimeZone.clone() patch:
>>>>>>>
>>>>>>>     public Object clone()
>>>>>>>     {
>>>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>>         // like tz.invalidateCache() but without holding a lock 
>>>>>>> on clone
>>>>>>>         tz.cacheYear = tz.startYear - 1;
>>>>>>>         tz.cacheStart = tz.cacheEnd = 0;
>>>>>>>         return tz;
>>>>>>>     }
>>>>>>>
>>>>>>>
>>>>>>> ...and the JMH benchmark with gc profiling shows that 
>>>>>>> ZoneId.systemDefault() still manages to get JIT-compiled without 
>>>>>>> introducing allocation.
>>>>>>>
>>>>>>> Even the following (original Venkat's) patch:
>>>>>>>
>>>>>>>     public Object clone()
>>>>>>>     {
>>>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>>         tz.invalidateCache();
>>>>>>>         return tz;
>>>>>>>     }
>>>>>>>
>>>>>>> ...does the same and the locking in invalidateCache() is elided 
>>>>>>> too. Allocation and lock-elision go hand-in-hand. When object 
>>>>>>> doesn't escape, allocation on heap may be eliminated and locks on 
>>>>>>> that instance elided.
>>>>>>>
>>>>>>> So this is better than synchronizing on the original instance 
>>>>>>> during .clone() execution as it has potential to avoid locking 
>>>>>>> overhead.
>>>>>>>
>>>>>>> So Venkat, go ahead. My fear was unjustified.
>>>>>>>
>>>>>>> Regards, Peter
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
> 


More information about the core-libs-dev mailing list