synchronization of JVMTI phase notifications [Fwd: Data visibility between threads in Hotspot]

Fri Feb 13 14:13:38 PST 2009

Hiroshi,

I found a couple of good candidates:

    6357005 4/3 'SingleStep' event fails assertion in
                'src/share/vm/prims/jvmtiEventController.cpp, 590'

    6648438 4/4 src/share/vm/prims/jvmtiEnv.cpp:457 assert(phase
                == JVMTI_PHASE_LIVE,"sanity check")

I suspect that I'll have much more luck making a reproducible test
case from the test case in 6357005.

Dan

Daniel D. Daugherty wrote:
> Hiroshi,
>
> I'm leaning toward making the two fields volatile and adding the
> appropriate OrderAccess::XXX calls. I vaguely remember an existing
> bug for events posting late. I'm going to see if I can find it.
>
> Dan
>
>
> Hiroshi Yamauchi wrote:
>> Hi Dan,
>>
>> On Wed, Feb 11, 2009 at 12:43 PM, Daniel D. Daugherty 
>> <Daniel.Daugherty at sun.com <mailto:Daniel.Daugherty at sun.com>> wrote:
>>
>>     I'll chime in on parts of this thread below.
>>
>>
>>
>>
>>     Tim Bell wrote:
>>
>>         I don't know the precise answer to this question, so I am
>>         forwarding it to the Serviceability list to get a wider 
>> audience.
>>
>>         -------- Original Message --------
>>         Subject:     Data visibility between threads in Hotspot
>>         Date:     Tue, 10 Feb 2009 11:14:24 -0800
>>
>>         Hi Tim,
>>
>>         I have a Hotspot question. Chuck Rasbold pointed me to you.
>>         Feel free to
>>         forward this message to other experts at Sun, if needed.
>>
>>         If one Hotspot engineer wants to guarantee the immediate
>>         visibility of a
>>         write to a static variable to reads in other threads (assuming
>>         the reads
>>         and the writes are properly synchronized via a mutex), what
>>         does s/he do?
>>
>>         Does the use of MutexLocker guarantee the visibility? It
>>         probably does not.
>>
>>
>>     I believe that MutexLocker does guarantee the visibility:
>>
>>     src/share/vm/runtime/mutexLocker.hpp:
>>
>>      133  // See orderAccess.hpp.  We assume throughout the VM that
>>     MutexLocker's
>>      134  // and friends constructors do a fence, a lock and an
>>     acquire *in that
>>      135  // order*.  And that their destructors do a release and
>>     unlock, in *that*
>>      136  // order.  If their implementations change such that these
>>     assumptions
>>      137  // are violated, a whole lot of code will break.
>>
>>
>> OK. That's handy.
>>  
>>
>>
>>
>>     See src/share/vm/runtime/orderAccess.hpp for the gory details.
>>
>>
>>
>>
>>         There are several volatile variables in Hotpot, but it may not
>>         really
>>         work because C++ compilers do not guarantee the visibility or
>>         limit the
>>         instruction reordering.
>>
>>     See the "C++ Volatility" section in
>>     src/share/vm/runtime/orderAccess.hpp.
>>
>>
>> It's an interesting header file to read.
>>  
>>
>>
>>
>>
>>         For example, there is the static variable JvmtiEnvBase:_phase
>>         which
>>         indicates the JVMTI phase in which the JVM is currently in.
>>         But AFAICT,
>>         there is no synchronization used by the callers of 
>> get_phase() and
>>         set_phase() and apparently they can be called by multiple
>>         threads. Also,
>>         the JvmtiEventEnabled::_enabled_bits which is a jlong variable
>>         that
>>         indicates which events are enabled for a single jvmtiEnv. The
>>         writes and
>>         reads to it are not synchronized, either. These are race
>>         conditions,
>>         which implies that some JVMTI event can go off in the dead
>>         phase when no
>>         events are supposed to go off. I'm actually looking into a bug
>>         in a
>>         JVMTI agent that I suspect is due to this.
>>
>>
>>     It certainly looks like JvmtiEnvBase::_phase should minimally be a
>>     volatile; it may also require some sort of synchronization to force
>>     the updated values to be visible...
>>
>>     JvmtiEventEnabled::_enabled_bits also looks problematic. I'm going
>>     to have to mull on both of these and talk to some other folks.
>>
>>
>> The bug that I ran across was a memory corruption crash because a 
>> CompiledMethodLoad event happened to be sent (due to this race 
>> condition) after Agent_OnUnload call where I had already deallocated 
>> the agent's data objects. This happens very rarely (once in every 
>> more than 100k runs of a more-or-less javac invocation) but often 
>> enough to fix.
>>
>> I have a patch attached, which I used to suppress the bug by avoiding 
>> the events to be sent during the dead phase by adding 
>> synchronization. Basically I make the _phase variable volatile and I 
>> use a mutex to guard event callbacks in the all the post_xxx 
>> functions and the phase transition to the dead phase in 
>> post_vm_death(). Probably we don't need the volatile as long as the 
>> mutex guarantees the visibility. The other phase transitions 
>> (onload->primodial->start->live) may not need the same degree of 
>> synchronization because events that can be sent in a earlier phase 
>> can be sent in the next phase as well.
>>
>> The _enabled_bits and the callback function table will probably need 
>> a similar synchronization as well. But the attached patch does not 
>> implement it and only ensures that no events are sent during the dead 
>> phase. I'd like to get the fix into openjdk in one form or another.
>>
>> Thanks,
>> Hiroshi
>>
>>  
>>
>>
>>
>>     Dan
>>
>>
>>
>>         Thanks,
>>         Hiroshi
>>
>>