<i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields
Naoto Sato
naoto.sato at oracle.com
Mon Dec 11 18:41:03 UTC 2017
Hi Peter,
Thanks for the tests. Looks good to me.
One nit: it should throw an Exception instead of AssertionError when the
test fails. No further review is needed.
> Can this go into JDK 10 ?
You can push it before the JDK 10 fork.
Naoto
On 12/9/17 2:33 PM, Peter Levart wrote:
> Hi Naoto,
>
> Thank you for reviewing.
>
> Naoto Sato je 06. 12. 2017 ob 20:41 napisal:
>> Hi Peter, Venkat,
>>
>> Thank you for the fix. It looks good to me. Improved performance is a
>> nice bonus! Would you be able to provide with a regression test?
>
> Sure, here it is:
>
> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.02/
>
>
> The test reliably detects several races in 2 seconds of runtime which
> cause incorrect offset calculations in JDK 9:
>
> shared TZ: 7363986 correct, 0 incorrect calculations
> cloned TZ: 3960264 correct, 1180 incorrect calculations
> Exception in thread "main" java.lang.AssertionError: 1180 fatal data
> races detected
> at
> SimpleTimeZoneCloneRaceTest.main(SimpleTimeZoneCloneRaceTest.java:86)
>
> With patch applied, there are no failures of course.
>
> Can this go into JDK 10 ?
>
> Regards, Peter
>
>>
>> Naoto
>>
>> On 12/6/17 6:10 AM, Peter Levart wrote:
>>> Hi,
>>>
>>> On 12/06/2017 02:30 PM, Alan Bateman wrote:
>>>> I think this class is normally maintained on i18n-dev but I think
>>>> introducing the Cache object looks good and making this much easier
>>>> to understand.
>>>>
>>>> -Alan
>>>
>>> Thanks Alan, I'm forwarding to i18n-dev to see if maintainers of that
>>> part of JDK have any comments...
>>>
>>> This is an official request for reviewing a patch for fixing a data
>>> race between cloning a SimpleTimeZone object and lazily initializing
>>> its 3 cache fields which may produce a clone with inconsistent cache
>>> state. Here's Jira issue with details:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>
>>> Venkat has proposed to simply call invalidateCache() on the clone
>>> before returning it from SimpleTimeZone.clone() method:
>>>
>>> public Object clone()
>>> {
>>> SimpleTimeZone clone = (SimpleTimeZone) super.clone();
>>> clone.invalidateCache();
>>> return clone;
>>> }
>>>
>>> This fixes the issue and for the case of TimeZone.getDefault() which
>>> is called from ZoneId.systemDefault() even elides synchronization in
>>> clone.invalidateCache() and allocation of the clone, so JITed
>>> ZoneId.systemDefault() is unaffected by the patch. Initially I was
>>> satisfied with the fix, but then I tested something. Suppose someone
>>> sets default zone to SimpleTimeZone:
>>>
>>> TimeZone.setDefault(
>>> new SimpleTimeZone(3600000,
>>> "Europe/Paris",
>>> Calendar.MARCH, -1, Calendar.SUNDAY,
>>> 3600000, SimpleTimeZone.UTC_TIME,
>>> Calendar.OCTOBER, -1, Calendar.SUNDAY,
>>> 3600000, SimpleTimeZone.UTC_TIME,
>>> 3600000)
>>> );
>>>
>>> And then calls some methods that initialize the cache of the internal
>>> shared TimeZone.defaultTimeZone instance:
>>>
>>> new Date().toString();
>>>
>>> The code which after that tries to obtain default time zone and
>>> calculate the offset from UTC at some current point in time, for
>>> example:
>>>
>>> TimeZone.getDefault().getOffset(now)
>>>
>>> can't use the cached state because it has been invalidated in the
>>> returned clone. Such code has to re-compute the offset every time it
>>> gets new clone. I measured this with a JMH benchmark and got the
>>> following result:
>>>
>>> Default:
>>>
>>> Benchmark Mode Cnt Score Error Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 57,168 ± 0,501
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,558 ± 0,040
>>> ns/op
>>>
>>> Venkat's patch - invalidateCache():
>>>
>>> Benchmark Mode Cnt Score Error Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 179,476 ± 1,942
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,538 ± 0,073
>>> ns/op
>>>
>>> We see that ZoneId.systemDefault() is unaffected, but
>>> TimeZone.getDefault().getOffset(now) becomes 3x slower.
>>>
>>> This is not good, so I tried an alternative fix for the issue -
>>> simply making the SimpleTimeZone.clone() synchronized. Access to
>>> cache fields is already synchronized, so this should fix the race.
>>> Here's the JMH result:
>>>
>>> Default:
>>>
>>> Benchmark Mode Cnt Score Error Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 57,168 ± 0,501
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,558 ± 0,040
>>> ns/op
>>>
>>> Synchronized clone():
>>>
>>> Benchmark Mode Cnt Score Error Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 58,103 ± 0,936
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault avgt 10 4,154 ± 0,034
>>> ns/op
>>>
>>> We see that caching works again, but synchronization has some
>>> overhead which is not big for single-threaded access, but might get
>>> bigger when multiple threads try to get default zone concurrently.
>>>
>>> So I created a 3rd variant of the fix which I'm presenting here and
>>> requesting a review for:
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.01/
>>>
>>>
>>> The JMH benchmark shows this:
>>>
>>> Default:
>>>
>>> Benchmark Mode Cnt Score Error Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 57,168 ± 0,501
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,558 ± 0,040
>>> ns/op
>>>
>>> Cache object:
>>>
>>> Benchmark Mode Cnt Score Error Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 42,860 ± 0,274
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,545 ± 0,057
>>> ns/op
>>>
>>> Not only does the fix not affect ZoneId.systemDefault() which is not
>>> surprising, but it also speeds-up cache lookup in single-threaded
>>> benchmark and certainly eliminates possible contention among threads
>>> looking up the shared instance.
>>>
>>> I have run the test/jdk/java/util/TimeZone and
>>> test/jdk/java/util/Calendar jtreg tests and there were 7 failures
>>> caused by compilation errors (package sun.util.locale.provider is not
>>> visible), while 59 other tests pass.
>>>
>>> So, what do you think?
>>>
>>> Regards, Peter
>>>
>>>
>>> Venkateswara R Chintala je 21. 11. 2017 ob 10:14 napisal:
>>>> Thanks Peter for sponsoring this patch. Is there anything else that
>>>> needs to be done from my end for this patch to be integrated? Please
>>>> let me know.
>>>>
>>>> -Venkat
>>>>
>>>>
>>>> On 14/11/17 8:46 PM, Peter Levart wrote:
>>>>> Hi Venkat,
>>>>>
>>>>> I created the following issue:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>>>
>>>>> I can also sponsor this patch and push it for JDK 10.
>>>>>
>>>>> The patch that you are proposing looks good to me. Does anybody
>>>>> have anything else to say?
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>>
>>>>> On 11/13/2017 11:28 AM, Venkateswara R Chintala wrote:
>>>>>> Thanks David, Peter for your review and comments. As I am new to
>>>>>> the community, can you please help me to open a bug and integrate
>>>>>> the changes into code base?
>>>>>>
>>>>>> -Venkat
>>>>>>
>>>>>> On 12/11/17 2:19 AM, Peter Levart wrote:
>>>>>>> Hi David, Venkat,
>>>>>>>
>>>>>>> On 11/11/17 21:15, Peter Levart wrote:
>>>>>>>> For example, take the following method:
>>>>>>>>
>>>>>>>> String defaultTZID() {
>>>>>>>> return TimeZone.getDefault().getID();
>>>>>>>> }
>>>>>>>>
>>>>>>>> When JIT compiles it and inlines invocations to other methods
>>>>>>>> within it, it can prove that cloned TimeZone instance never
>>>>>>>> escapes the call to defaultTZID() and can therefore skip
>>>>>>>> allocating the instance on heap.
>>>>>>>>
>>>>>>>> But this is fragile. If code in invoked methods changes, they
>>>>>>>> may not get inlined or EA may not be able to prove that the
>>>>>>>> cloned instance can't escape and allocation may be introduced.
>>>>>>>> ZoneId.systemDefault() is a hot method and it would be nice if
>>>>>>>> we manage to keep it allocation free.
>>>>>>>
>>>>>>> Well, I tried the following variant of SimpleTimeZone.clone() patch:
>>>>>>>
>>>>>>> public Object clone()
>>>>>>> {
>>>>>>> SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>> // like tz.invalidateCache() but without holding a lock
>>>>>>> on clone
>>>>>>> tz.cacheYear = tz.startYear - 1;
>>>>>>> tz.cacheStart = tz.cacheEnd = 0;
>>>>>>> return tz;
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ...and the JMH benchmark with gc profiling shows that
>>>>>>> ZoneId.systemDefault() still manages to get JIT-compiled without
>>>>>>> introducing allocation.
>>>>>>>
>>>>>>> Even the following (original Venkat's) patch:
>>>>>>>
>>>>>>> public Object clone()
>>>>>>> {
>>>>>>> SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>> tz.invalidateCache();
>>>>>>> return tz;
>>>>>>> }
>>>>>>>
>>>>>>> ...does the same and the locking in invalidateCache() is elided
>>>>>>> too. Allocation and lock-elision go hand-in-hand. When object
>>>>>>> doesn't escape, allocation on heap may be eliminated and locks on
>>>>>>> that instance elided.
>>>>>>>
>>>>>>> So this is better than synchronizing on the original instance
>>>>>>> during .clone() execution as it has potential to avoid locking
>>>>>>> overhead.
>>>>>>>
>>>>>>> So Venkat, go ahead. My fear was unjustified.
>>>>>>>
>>>>>>> Regards, Peter
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>
More information about the i18n-dev
mailing list