RFR (S): 8028280 ParkEvent leak when running modified runThese which only loads classes
Frederic Parain
frederic.parain at oracle.com
Tue Jan 21 01:52:49 PST 2014
Looks good to me.
Fred
On 01/15/2014 10:37 AM, David Simms wrote:
>
> Still need a 2nd reviewer (doesn't have to be capital 'R') for this fix.
>
> Cheers
> /David Simms
>
> On 11/29/2013 08:30 AM, David Simms wrote:
>>
>> Thanks for the review !
>>
>> Inline...
>>
>> On 29/11/2013 2:06 a.m., David Holmes wrote:
>>> Hi David,
>>>
>>> Apologies for the delay in getting back to this, and even more so
>>> that I won't be able to respond further as I'll be on vacation the
>>> next two weeks. Fortunately no major issues that would need further
>>> discussion :)
>>>
>> It's okay, we took the bug of the critical list, it's actually hard to
>> cause an actual OOM.
>>> src/os/*/vm/os_*.cpp
>>>
>>> So ... os::naked_sleep is now morphed into os::naked_short_sleep and
>>> is always really "naked" ie it uses an OS call. Ok.
>>>
>>> Why does the windows version not have:
>>>
>>> assert(ms < 1000, "Long sleep will break Thread.interrupt()
>>> semantics");
>>>
>>> ?
>> Will fix.
>>>
>>> What is your concern with Thread.interrupt here? These sleeps aren't
>>> involved in Thread.sleep; and the parking logic only looks for
>>> interrupts at specific points (the existing SpinAcquire code will not
>>> return if the thread is interrupted). The main issue with long
>>> blocking is that the thread is not undergoing a state transition and
>>> so a safepoint will not be reachable until the sleep is over (and we
>>> reach a point where we check for a safepoint). I'd be worried that a
>>> 1 second sleep might cause a problem with safepoints. Of course this
>>> depends on what the caller does before using naked_short_sleep.
>>>
>> My concern is someone else looking to use naked_short_sleep() outside
>> the current context, must be aware that this only for short periods.
>> I.e. it is in no way interruptible. I can change the comment to be
>> more generic, i.e. what I just said.
>>
>>> ---
>>>
>>> src/share/vm/runtime/thread.hpp
>>>
>>> // Low-level leaf-lock primitives used to implement synchronization
>>> // and native monitor-mutex infrastructure.
>>> // Not for general synchronization use.
>>> + // Direct naked OS primitives, safe to use standalone in lock impl
>>>
>>> Your comment really only applies to SpinAcquire, not the set of
>>> methods that the original comment block refers to.
>>
>> Whoops, thought that comment referred to SpinAcquire and SpinRelease,
>> can leave it in for mux acquire and release of course.
>>>
>>> ---
>>>
>>> src/share/vm/runtime/park.cpp
>>>
>>> 91 // Note that if we didn't have the TSM/immortal constraint, then
>>> 92 // when reattaching, above, we could trim the list.
>>>
>>> What is the "reattaching, above" you are referring to? This comment
>>> seem more directed to the old code than the new ??
>> Ah, old comment, thanks for spotting that. Although it's still true
>> ",/above,/" needs to go.
>>>
>>> ---
>>>
>>> Overall this seems okay but I do wonder about the change in
>>> performance profile. The scenario that caused the leak indicates a
>>> high-level of contention. Have any of the synchronization benchmarks
>>> been run as a sanity check?
>>
>> I can talk with the local performance engineer and check for
>> regressions. Shouldn't be a problem semantically, as I'm modifying
>> slow-path...of fairly critical code - will test.
>>
>> The scenario was a targeted stress test, multi-threaded class-loading
>> from the same jar. Not quite "real-world", but since the test exist,
>> will re-run with performance in mind.
>>
>> Thanks again !
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 26/11/2013 8:17 PM, David Simms wrote:
>>>>
>>>> Updated patch to avoid Solaris RT link requirement...
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8028280
>>>> Webrev: http://cr.openjdk.java.net/~dsimms/8028280/
>>>> <http://cr.openjdk.java.net/%7Edsimms/8028280/>
>>>>
>>>> Rerun testing including vm.quick-pcl.testlist on all platforms.
>>>>
>>>>
>>>> On 11/21/2013 04:36 PM, David Simms wrote:
>>>>>
>>>>> Sorry,
>>>>>
>>>>> Some test failures due to Solaris ld not finding nanosleep...
>>>>>
>>>>>
>>>>> On 11/21/2013 04:03 PM, David Simms wrote:
>>>>>>
>>>>>> Small fix for resource leakage with concurrent free list handling
>>>>>> down in park.cpp
>>>>>>
>>>>>> I did modify Thread::SpinAcquire() to be independent of park event
>>>>>> (naked OS functions only), since this method doesn't need to follow
>>>>>> Thread.interrupt() semantics (park doesn't make sense here).
>>>>>>
>>>>>> This allowed a simple spin lock around the resources without worrying
>>>>>> about the fact that park is used in runtime locks.
>>>>>>
>>>>>> Perhaps SpinAcquire/Release could be spit out to a separate
>>>>>> stand-alone util class, a some point in the future.
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8028280
>>>>>> Webrev: http://cr.openjdk.java.net/~dsimms/8028280/
>>>>>> <http://cr.openjdk.java.net/%7Edsimms/8028280/>
>>>>>>
>>>>>> Testing: vm testbase, BigApps (30 mins), jtreg (all 64bit platforms)
>>>>>> (...almost completed)
>>>>>>
>>>>>>
>>>>>> /David Simms
>>>>>
>>>>
>>
>
--
Frederic Parain - Oracle
Grenoble Engineering Center - France
Phone: +33 4 76 18 81 17
Email: Frederic.Parain at oracle.com
More information about the hotspot-runtime-dev
mailing list