RFR (S): 8028280 ParkEvent leak when running modified runThese which only loads classes

Wed Jan 15 01:37:48 PST 2014

Still need a 2nd reviewer (doesn't have to be capital 'R') for this fix.

Cheers
/David Simms

On 11/29/2013 08:30 AM, David Simms wrote:
>
> Thanks for the review !
>
> Inline...
>
> On 29/11/2013 2:06 a.m., David Holmes wrote:
>> Hi David,
>>
>> Apologies for the delay in getting back to this, and even more so 
>> that I won't be able to respond further as I'll be on vacation the 
>> next two weeks. Fortunately no major issues that would need further 
>> discussion :)
>>
> It's okay, we took the bug of the critical list, it's actually hard to 
> cause an actual OOM.
>> src/os/*/vm/os_*.cpp
>>
>> So ... os::naked_sleep is now morphed into os::naked_short_sleep and 
>> is always really "naked" ie it uses an OS call. Ok.
>>
>> Why does the windows version not have:
>>
>>  assert(ms < 1000, "Long sleep will break Thread.interrupt() 
>> semantics");
>>
>> ?
> Will fix.
>>
>> What is your concern with Thread.interrupt here? These sleeps aren't 
>> involved in Thread.sleep; and the parking logic only looks for 
>> interrupts at specific points (the existing SpinAcquire code will not 
>> return if the thread is interrupted). The main issue with long 
>> blocking is that the thread is not undergoing a state transition and 
>> so a safepoint will not be reachable until the sleep is over (and we 
>> reach a point where we check for a safepoint). I'd be worried that a 
>> 1 second sleep might cause a problem with safepoints. Of course this 
>> depends on what the caller does before using naked_short_sleep.
>>
> My concern is someone else looking to use naked_short_sleep() outside 
> the current context, must be aware that this only for short periods. 
> I.e. it is in no way interruptible. I can change the comment to be 
> more generic, i.e. what I just said.
>
>> ---
>>
>> src/share/vm/runtime/thread.hpp
>>
>>     // Low-level leaf-lock primitives used to implement synchronization
>>     // and native monitor-mutex infrastructure.
>>     // Not for general synchronization use.
>> +   // Direct naked OS primitives, safe to use standalone in lock impl
>>
>> Your comment really only applies to SpinAcquire, not the set of 
>> methods that the original comment block refers to.
>
> Whoops, thought that comment referred to SpinAcquire and SpinRelease, 
> can leave it in for mux acquire and release of course.
>>
>> ---
>>
>> src/share/vm/runtime/park.cpp
>>
>>  91   // Note that if we didn't have the TSM/immortal constraint, then
>>  92   // when reattaching, above, we could trim the list.
>>
>> What is the "reattaching, above" you are referring to? This comment 
>> seem more directed to the old code than the new ??
> Ah, old comment, thanks for spotting that. Although it's still true 
> ",/above,/" needs to go.
>>
>> ---
>>
>> Overall this seems okay but I do wonder about the change in 
>> performance profile. The scenario that caused the leak indicates a 
>> high-level of contention. Have any of the synchronization benchmarks 
>> been run as a sanity check?
>
> I can talk with the local performance engineer and check for 
> regressions. Shouldn't be a problem semantically, as I'm modifying 
> slow-path...of fairly critical code - will test.
>
> The scenario was a targeted stress test, multi-threaded class-loading 
> from the same jar. Not quite "real-world", but since the test exist, 
> will re-run with performance in mind.
>
> Thanks again !
>
>>
>> Thanks,
>> David
>>
>> On 26/11/2013 8:17 PM, David Simms wrote:
>>>
>>> Updated patch to avoid Solaris RT link requirement...
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8028280
>>> Webrev: http://cr.openjdk.java.net/~dsimms/8028280/
>>> <http://cr.openjdk.java.net/%7Edsimms/8028280/>
>>>
>>> Rerun testing including vm.quick-pcl.testlist on all platforms.
>>>
>>>
>>> On 11/21/2013 04:36 PM, David Simms wrote:
>>>>
>>>> Sorry,
>>>>
>>>> Some test failures due to Solaris ld not finding nanosleep...
>>>>
>>>>
>>>> On 11/21/2013 04:03 PM, David Simms wrote:
>>>>>
>>>>> Small fix for resource leakage with concurrent free list handling
>>>>> down in park.cpp
>>>>>
>>>>> I did modify Thread::SpinAcquire() to be independent of park event
>>>>> (naked OS functions only), since this method doesn't need to follow
>>>>> Thread.interrupt() semantics (park doesn't make sense here).
>>>>>
>>>>> This allowed a simple spin lock around the resources without worrying
>>>>> about the fact that park is used in runtime locks.
>>>>>
>>>>> Perhaps SpinAcquire/Release could be spit out to a separate
>>>>> stand-alone util class, a some point in the future.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8028280
>>>>> Webrev: http://cr.openjdk.java.net/~dsimms/8028280/
>>>>> <http://cr.openjdk.java.net/%7Edsimms/8028280/>
>>>>>
>>>>> Testing: vm testbase, BigApps (30 mins), jtreg (all 64bit platforms)
>>>>> (...almost completed)
>>>>>
>>>>>
>>>>> /David Simms
>>>>
>>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20140115/56660762/attachment.html