<i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields
Peter Levart
peter.levart at gmail.com
Sat Dec 9 22:33:04 UTC 2017
Hi Naoto,
Thank you for reviewing.
Naoto Sato je 06. 12. 2017 ob 20:41 napisal:
> Hi Peter, Venkat,
>
> Thank you for the fix. It looks good to me. Improved performance is a
> nice bonus! Would you be able to provide with a regression test?
Sure, here it is:
http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.02/
The test reliably detects several races in 2 seconds of runtime which
cause incorrect offset calculations in JDK 9:
shared TZ: 7363986 correct, 0 incorrect calculations
cloned TZ: 3960264 correct, 1180 incorrect calculations
Exception in thread "main" java.lang.AssertionError: 1180 fatal data
races detected
at
SimpleTimeZoneCloneRaceTest.main(SimpleTimeZoneCloneRaceTest.java:86)
With patch applied, there are no failures of course.
Can this go into JDK 10 ?
Regards, Peter
>
> Naoto
>
> On 12/6/17 6:10 AM, Peter Levart wrote:
>> Hi,
>>
>> On 12/06/2017 02:30 PM, Alan Bateman wrote:
>>> I think this class is normally maintained on i18n-dev but I think
>>> introducing the Cache object looks good and making this much easier
>>> to understand.
>>>
>>> -Alan
>>
>> Thanks Alan, I'm forwarding to i18n-dev to see if maintainers of that
>> part of JDK have any comments...
>>
>> This is an official request for reviewing a patch for fixing a data
>> race between cloning a SimpleTimeZone object and lazily initializing
>> its 3 cache fields which may produce a clone with inconsistent cache
>> state. Here's Jira issue with details:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>
>> Venkat has proposed to simply call invalidateCache() on the clone
>> before returning it from SimpleTimeZone.clone() method:
>>
>> public Object clone()
>> {
>> SimpleTimeZone clone = (SimpleTimeZone) super.clone();
>> clone.invalidateCache();
>> return clone;
>> }
>>
>> This fixes the issue and for the case of TimeZone.getDefault() which
>> is called from ZoneId.systemDefault() even elides synchronization in
>> clone.invalidateCache() and allocation of the clone, so JITed
>> ZoneId.systemDefault() is unaffected by the patch. Initially I was
>> satisfied with the fix, but then I tested something. Suppose someone
>> sets default zone to SimpleTimeZone:
>>
>> TimeZone.setDefault(
>> new SimpleTimeZone(3600000,
>> "Europe/Paris",
>> Calendar.MARCH, -1, Calendar.SUNDAY,
>> 3600000, SimpleTimeZone.UTC_TIME,
>> Calendar.OCTOBER, -1, Calendar.SUNDAY,
>> 3600000, SimpleTimeZone.UTC_TIME,
>> 3600000)
>> );
>>
>> And then calls some methods that initialize the cache of the internal
>> shared TimeZone.defaultTimeZone instance:
>>
>> new Date().toString();
>>
>> The code which after that tries to obtain default time zone and
>> calculate the offset from UTC at some current point in time, for
>> example:
>>
>> TimeZone.getDefault().getOffset(now)
>>
>> can't use the cached state because it has been invalidated in the
>> returned clone. Such code has to re-compute the offset every time it
>> gets new clone. I measured this with a JMH benchmark and got the
>> following result:
>>
>> Default:
>>
>> Benchmark Mode Cnt Score Error Units
>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 57,168 ± 0,501
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,558 ± 0,040
>> ns/op
>>
>> Venkat's patch - invalidateCache():
>>
>> Benchmark Mode Cnt Score Error Units
>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 179,476 ± 1,942
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,538 ± 0,073
>> ns/op
>>
>> We see that ZoneId.systemDefault() is unaffected, but
>> TimeZone.getDefault().getOffset(now) becomes 3x slower.
>>
>> This is not good, so I tried an alternative fix for the issue -
>> simply making the SimpleTimeZone.clone() synchronized. Access to
>> cache fields is already synchronized, so this should fix the race.
>> Here's the JMH result:
>>
>> Default:
>>
>> Benchmark Mode Cnt Score Error Units
>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 57,168 ± 0,501
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,558 ± 0,040
>> ns/op
>>
>> Synchronized clone():
>>
>> Benchmark Mode Cnt Score Error Units
>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 58,103 ± 0,936
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault avgt 10 4,154 ± 0,034
>> ns/op
>>
>> We see that caching works again, but synchronization has some
>> overhead which is not big for single-threaded access, but might get
>> bigger when multiple threads try to get default zone concurrently.
>>
>> So I created a 3rd variant of the fix which I'm presenting here and
>> requesting a review for:
>>
>> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.01/
>>
>>
>> The JMH benchmark shows this:
>>
>> Default:
>>
>> Benchmark Mode Cnt Score Error Units
>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 57,168 ± 0,501
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,558 ± 0,040
>> ns/op
>>
>> Cache object:
>>
>> Benchmark Mode Cnt Score Error Units
>> ZoneIdBench.TimeZone_getDefault_getOffset avgt 10 42,860 ± 0,274
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault avgt 10 3,545 ± 0,057
>> ns/op
>>
>> Not only does the fix not affect ZoneId.systemDefault() which is not
>> surprising, but it also speeds-up cache lookup in single-threaded
>> benchmark and certainly eliminates possible contention among threads
>> looking up the shared instance.
>>
>> I have run the test/jdk/java/util/TimeZone and
>> test/jdk/java/util/Calendar jtreg tests and there were 7 failures
>> caused by compilation errors (package sun.util.locale.provider is not
>> visible), while 59 other tests pass.
>>
>> So, what do you think?
>>
>> Regards, Peter
>>
>>
>> Venkateswara R Chintala je 21. 11. 2017 ob 10:14 napisal:
>>> Thanks Peter for sponsoring this patch. Is there anything else that
>>> needs to be done from my end for this patch to be integrated? Please
>>> let me know.
>>>
>>> -Venkat
>>>
>>>
>>> On 14/11/17 8:46 PM, Peter Levart wrote:
>>>> Hi Venkat,
>>>>
>>>> I created the following issue:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>>
>>>> I can also sponsor this patch and push it for JDK 10.
>>>>
>>>> The patch that you are proposing looks good to me. Does anybody
>>>> have anything else to say?
>>>>
>>>> Regards, Peter
>>>>
>>>>
>>>> On 11/13/2017 11:28 AM, Venkateswara R Chintala wrote:
>>>>> Thanks David, Peter for your review and comments. As I am new to
>>>>> the community, can you please help me to open a bug and integrate
>>>>> the changes into code base?
>>>>>
>>>>> -Venkat
>>>>>
>>>>> On 12/11/17 2:19 AM, Peter Levart wrote:
>>>>>> Hi David, Venkat,
>>>>>>
>>>>>> On 11/11/17 21:15, Peter Levart wrote:
>>>>>>> For example, take the following method:
>>>>>>>
>>>>>>> String defaultTZID() {
>>>>>>> return TimeZone.getDefault().getID();
>>>>>>> }
>>>>>>>
>>>>>>> When JIT compiles it and inlines invocations to other methods
>>>>>>> within it, it can prove that cloned TimeZone instance never
>>>>>>> escapes the call to defaultTZID() and can therefore skip
>>>>>>> allocating the instance on heap.
>>>>>>>
>>>>>>> But this is fragile. If code in invoked methods changes, they
>>>>>>> may not get inlined or EA may not be able to prove that the
>>>>>>> cloned instance can't escape and allocation may be introduced.
>>>>>>> ZoneId.systemDefault() is a hot method and it would be nice if
>>>>>>> we manage to keep it allocation free.
>>>>>>
>>>>>> Well, I tried the following variant of SimpleTimeZone.clone() patch:
>>>>>>
>>>>>> public Object clone()
>>>>>> {
>>>>>> SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>> // like tz.invalidateCache() but without holding a lock
>>>>>> on clone
>>>>>> tz.cacheYear = tz.startYear - 1;
>>>>>> tz.cacheStart = tz.cacheEnd = 0;
>>>>>> return tz;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> ...and the JMH benchmark with gc profiling shows that
>>>>>> ZoneId.systemDefault() still manages to get JIT-compiled without
>>>>>> introducing allocation.
>>>>>>
>>>>>> Even the following (original Venkat's) patch:
>>>>>>
>>>>>> public Object clone()
>>>>>> {
>>>>>> SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>> tz.invalidateCache();
>>>>>> return tz;
>>>>>> }
>>>>>>
>>>>>> ...does the same and the locking in invalidateCache() is elided
>>>>>> too. Allocation and lock-elision go hand-in-hand. When object
>>>>>> doesn't escape, allocation on heap may be eliminated and locks on
>>>>>> that instance elided.
>>>>>>
>>>>>> So this is better than synchronizing on the original instance
>>>>>> during .clone() execution as it has potential to avoid locking
>>>>>> overhead.
>>>>>>
>>>>>> So Venkat, go ahead. My fear was unjustified.
>>>>>>
>>>>>> Regards, Peter
>>>>>>
>>>>>
>>>>
>>>>
>>>
More information about the i18n-dev
mailing list