<i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields

Naoto Sato naoto.sato at oracle.com
Wed Dec 6 19:41:41 UTC 2017


Hi Peter, Venkat,

Thank you for the fix. It looks good to me. Improved performance is a 
nice bonus! Would you be able to provide with a regression test?

Naoto

On 12/6/17 6:10 AM, Peter Levart wrote:
> Hi,
> 
> On 12/06/2017 02:30 PM, Alan Bateman wrote:
>> I think this class is normally maintained on i18n-dev but I think 
>> introducing the Cache object looks good and making this much easier to 
>> understand.
>>
>> -Alan 
> 
> Thanks Alan, I'm forwarding to i18n-dev to see if maintainers of that 
> part of JDK have any comments...
> 
> This is an official request for reviewing a patch for fixing a data race 
> between cloning a SimpleTimeZone object and lazily initializing its 3 
> cache fields which may produce a clone with inconsistent cache state. 
> Here's Jira issue with details:
> 
> https://bugs.openjdk.java.net/browse/JDK-8191216
> 
> Venkat has proposed to simply call invalidateCache() on the clone before 
> returning it from SimpleTimeZone.clone() method:
> 
>      public Object clone()
>      {
>          SimpleTimeZone clone = (SimpleTimeZone) super.clone();
>          clone.invalidateCache();
>          return clone;
>      }
> 
> This fixes the issue and for the case of TimeZone.getDefault() which is 
> called from ZoneId.systemDefault() even elides synchronization in 
> clone.invalidateCache() and allocation of the clone, so JITed 
> ZoneId.systemDefault() is unaffected by the patch. Initially I was 
> satisfied with the fix, but then I tested something. Suppose someone 
> sets default zone to SimpleTimeZone:
> 
>          TimeZone.setDefault(
>              new SimpleTimeZone(3600000,
>                                 "Europe/Paris",
>                                 Calendar.MARCH, -1, Calendar.SUNDAY,
>                                 3600000, SimpleTimeZone.UTC_TIME,
>                                 Calendar.OCTOBER, -1, Calendar.SUNDAY,
>                                 3600000, SimpleTimeZone.UTC_TIME,
>                                 3600000)
>          );
> 
> And then calls some methods that initialize the cache of the internal 
> shared TimeZone.defaultTimeZone instance:
> 
>      new Date().toString();
> 
> The code which after that tries to obtain default time zone and 
> calculate the offset from UTC at some current point in time, for example:
> 
>      TimeZone.getDefault().getOffset(now)
> 
> can't use the cached state because it has been invalidated in the 
> returned clone. Such code has to re-compute the offset every time it 
> gets new clone. I measured this with a JMH benchmark and got the 
> following result:
> 
> Default:
> 
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  ns/op
> 
> Venkat's patch - invalidateCache():
> 
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 179,476 ± 1,942  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,538 ± 0,073  ns/op
> 
> We see that ZoneId.systemDefault() is unaffected, but 
> TimeZone.getDefault().getOffset(now) becomes 3x slower.
> 
> This is not good, so I tried an alternative fix for the issue - simply 
> making the SimpleTimeZone.clone() synchronized. Access to cache fields 
> is already synchronized, so this should fix the race. Here's the JMH 
> result:
> 
> Default:
> 
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  ns/op
> 
> Synchronized clone():
> 
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 58,103 ± 0,936  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 4,154 ± 0,034  ns/op
> 
> We see that caching works again, but synchronization has some overhead 
> which is not big for single-threaded access, but might get bigger when 
> multiple threads try to get default zone concurrently.
> 
> So I created a 3rd variant of the fix which I'm presenting here and 
> requesting a review for:
> 
> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.01/ 
> 
> 
> The JMH benchmark shows this:
> 
> Default:
> 
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  ns/op
> 
> Cache object:
> 
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 42,860 ± 0,274  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,545 ± 0,057  ns/op
> 
> Not only does the fix not affect ZoneId.systemDefault() which is not 
> surprising, but it also speeds-up cache lookup in single-threaded 
> benchmark and certainly eliminates possible contention among threads 
> looking up the shared instance.
> 
> I have run the test/jdk/java/util/TimeZone and 
> test/jdk/java/util/Calendar jtreg tests and there were 7 failures caused 
> by compilation errors (package sun.util.locale.provider is not visible), 
> while 59 other tests pass.
> 
> So, what do you think?
> 
> Regards, Peter
> 
> 
> Venkateswara R Chintala je 21. 11. 2017 ob 10:14 napisal:
>> Thanks Peter for sponsoring this patch. Is there anything else that 
>> needs to be done from my end for this patch to be integrated? Please 
>> let me know.
>>
>> -Venkat
>>
>>
>> On 14/11/17 8:46 PM, Peter Levart wrote:
>>> Hi Venkat,
>>>
>>> I created the following issue:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>
>>> I can also sponsor this patch and push it for JDK 10.
>>>
>>> The patch that you are proposing looks good to me. Does anybody have 
>>> anything else to say?
>>>
>>> Regards, Peter
>>>
>>>
>>> On 11/13/2017 11:28 AM, Venkateswara R Chintala wrote:
>>>> Thanks David, Peter for your review and comments. As I am new to the 
>>>> community, can you please help me to open a bug and integrate the 
>>>> changes into code base?
>>>>
>>>> -Venkat
>>>>
>>>> On 12/11/17 2:19 AM, Peter Levart wrote:
>>>>> Hi David, Venkat,
>>>>>
>>>>> On 11/11/17 21:15, Peter Levart wrote:
>>>>>> For example, take the following method:
>>>>>>
>>>>>> String defaultTZID() {
>>>>>>     return TimeZone.getDefault().getID();
>>>>>> }
>>>>>>
>>>>>> When JIT compiles it and inlines invocations to other methods 
>>>>>> within it, it can prove that cloned TimeZone instance never 
>>>>>> escapes the call to defaultTZID() and can therefore skip 
>>>>>> allocating the instance on heap.
>>>>>>
>>>>>> But this is fragile. If code in invoked methods changes, they may 
>>>>>> not get inlined or EA may not be able to prove that the cloned 
>>>>>> instance can't escape and allocation may be introduced. 
>>>>>> ZoneId.systemDefault() is a hot method and it would be nice if we 
>>>>>> manage to keep it allocation free.
>>>>>
>>>>> Well, I tried the following variant of SimpleTimeZone.clone() patch:
>>>>>
>>>>>     public Object clone()
>>>>>     {
>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>         // like tz.invalidateCache() but without holding a lock on 
>>>>> clone
>>>>>         tz.cacheYear = tz.startYear - 1;
>>>>>         tz.cacheStart = tz.cacheEnd = 0;
>>>>>         return tz;
>>>>>     }
>>>>>
>>>>>
>>>>> ...and the JMH benchmark with gc profiling shows that 
>>>>> ZoneId.systemDefault() still manages to get JIT-compiled without 
>>>>> introducing allocation.
>>>>>
>>>>> Even the following (original Venkat's) patch:
>>>>>
>>>>>     public Object clone()
>>>>>     {
>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>         tz.invalidateCache();
>>>>>         return tz;
>>>>>     }
>>>>>
>>>>> ...does the same and the locking in invalidateCache() is elided 
>>>>> too. Allocation and lock-elision go hand-in-hand. When object 
>>>>> doesn't escape, allocation on heap may be eliminated and locks on 
>>>>> that instance elided.
>>>>>
>>>>> So this is better than synchronizing on the original instance 
>>>>> during .clone() execution as it has potential to avoid locking 
>>>>> overhead.
>>>>>
>>>>> So Venkat, go ahead. My fear was unjustified.
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>
>>>
>>>
>>


More information about the core-libs-dev mailing list