RFR(L): JDK-8046936 : JEP 270: Reserved Stack Areas for Critical Sections

Frederic Parain frederic.parain at oracle.com
Thu Nov 26 16:11:58 UTC 2015


Treating stack overflows as fatal errors makes sense for
JVMs running a single application. This could be the subject
of an RFE: the feature is well defined and the implementation
should not be too complex.
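
As a rough illustration of that policy, a single-application deployment can approximate it today in plain Java, without any new VM option. The class below, FatalErrorPolicy, is a hypothetical sketch and not part of JEP 270 or the JDK. It installs a default uncaught-exception handler that halts the VM when a StackOverflowError (or OutOfMemoryError) escapes a thread. It only sees errors that no code caught, so a framework that catches Throwable will hide the overflow from it.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Hypothetical helper (not a JDK API): approximates "treat StackOverflowError
 * as fatal" for a single-application JVM by halting when such an error
 * escapes a thread.
 */
public final class FatalErrorPolicy {
    private FatalErrorPolicy() {}

    /** True if t or anything in its cause chain is an error we refuse to survive. */
    public static boolean isFatal(Throwable t) {
        // Identity set guards against cause cycles (A caused by B caused by A).
        Set<Throwable> seen =
            Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable c = t; c != null && seen.add(c); c = c.getCause()) {
            if (c instanceof StackOverflowError || c instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    /** Installs a JVM-wide handler for exceptions that no code caught. */
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, ex) -> {
            if (isFatal(ex)) {
                // halt() skips shutdown hooks: like kill -9, trust no state the
                // overflow may have left half-updated (e.g. a lock never released).
                Runtime.getRuntime().halt(1);
            }
            // Non-fatal: print a report similar to the default one. Delegating to
            // ThreadGroup.uncaughtException would call this handler again.
            System.err.print("Exception in thread \"" + thread.getName() + "\" ");
            ex.printStackTrace(System.err);
        });
    }
}
```

Calling FatalErrorPolicy.install() early in main() would enable it. Runtime.halt() is used instead of System.exit() so that shutdown hooks, which might touch possibly inconsistent state, do not run.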

However, JEP 270 has been designed with multi-tenant applications
in mind. In this context, we want to avoid crashing the VM, and
restarting the application with all its tenants, just because one
tenant had a misbehaving thread. The reserved stack area protects
the critical locks of the host application, giving it a chance
to cleanly kill the problematic tenant without affecting the
others.

Regards,

Fred

On 24/11/2015 19:16, Steven Schlansker wrote:
>
> On Nov 24, 2015, at 8:46 AM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
>
>> Doug,
>>
>> I have been thinking about this more from the perspective of the original problem
>> we set out to solve
>
> I apologize if this has already been considered, but for a lot of well-designed systems,
> occasional application failure is an expected fact of life, and we design our HA around
> this with automatic restarts and monitoring.
>
> If it is so hard to detect or resolve a stack overflow situation, maybe one useful
> mitigation of such awful situations (j.u.c. hangs, corrupt state, lost locks) would be to
> actually treat a stack overflow as a fatal condition, much like OutOfMemoryError.
>
> In fact, we configure all of our production servers with the moral equivalent of
> -XX:OnOutOfMemoryError="kill -9 %p"
> because once we are in a possibly inconsistent state, we would much rather nuke
> it from orbit and start over.
>
> Maybe introducing some new options, like
> -XX:OnStackOverflowError=
> or
> -XX:TreatStackOverflowAsOOM (piggyback on the existing tunable above)
> would allow end users to avoid the really bad behavior in a controllable way?
>
>
>> , which was identified in the ConcurrentHashMap usage that was, at the
>> time, in the class loading logic. While the class loading logic has changed, I think we
>> have enough experience with this particular example and have studied
>> the code constructs sufficiently that there is value in checking in the small set of
>> JDK changes that target that situation. I also think this gives an example of
>> the kind of model in which this approach can be effective. In addition, having this small set of
>> changes provides the ability to test and ensure that the hotspot changes continue to
>> work.
>>
>> So I would like to recommend that we go ahead and check in the hotspot changes
>> and the initial minimal set of j.u.c. updates as a way to put the new mechanism
>> in place so that the people with more domain expertise in the java.util.concurrent
>> libraries can experiment with the mechanism and add incremental improvements.
>>
>> thanks,
>> Karen
>>
>>> On Nov 22, 2015, at 7:04 PM, Doug Lea <dl at cs.oswego.edu> wrote:
>>>
>>> On 11/20/2015 12:40 PM, Karen Kinnear wrote:
>>>> Totally appreciate the suggestion that the java.util.concurrent modifications
>>>> be done by folks with more domain expertise.
>>>>
>>>> Would you have us incorporate the initial minimal set of j.u.c. updates or none
>>>> at all?
>>>
>>> Sorry that I'm still in foot-drag mode on this.
>>> Reading David's and Fred's exchanges reinforces my view
>>> that there is no defensible rule or approach for
>>> using @ReservedStackAccess so as to add as little time and
>>> space as possible while reducing the occurrence of stuck
>>> resources as much as possible during StackOverflowError.
>>>
>>> After googling "StackOverflowError java util concurrent" and seeing the
>>> range of situations that can be encountered, I don't even know
>>> which kinds of constructions to target.
>>> And I'm less sure whether using @ReservedStackAccess at all
>>> is better than doing nothing.
>>>
>>> Maybe there is some decent empirical strategy, but I can't
>>> tell until hotspot support of @ReservedStackAccess is in place.
>>> So my vote is still to keep the JDK changes out for now.
>>>
>>> -Doug
>>>
>>
>

-- 
Frederic Parain - Oracle
Grenoble Engineering Center - France
Phone: +33 4 76 18 81 17
Email: Frederic.Parain at oracle.com



More information about the core-libs-dev mailing list