review for 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale

Mon Jan 30 02:14:43 PST 2012

Hi Tom,

Looked at your latest webrev. Looks good to me.

Some minor comments:

gcLocker.cpp

You still use tty->timestamp().milliseconds() instead of 
javaTimeNanos(). Not sure whether you just didn't get to fixing it yet 
or if you decided against it.

gcLocker.hpp

   84   static void verify_critical_count() NOT_DEBUG_RETURN;

Not sure what the correct thing is here, but the GC code seems to use 
PRODUCT_RETURN more than NOT_DEBUG_RETURN. I guess the only difference 
is for optimized builds (and of course using #ifndef PRODUCT instead of 
#ifdef ASSERT).

safepoint.cpp

Not strictly related to your change, but the 
SafepointSynchronize::begin() method is missing one indentation level 
due to the the scoping that starts on line 139:

  139   {
  140   MutexLocker mu(Safepoint_lock);
  141
  142   // Reset the count of active JNI critical threads
  143   _current_jni_active_count = 0;

  This scope ends at the end of the method:

  398   if (PrintSafepointStatistics) {
  399     // Record how much time spend on the above cleanup tasks
  400     update_statistics_on_cleanup_end(os::javaTimeNanos());
  401   }
  402   }
  403 }

It seems to me that the scope is not really necessary. In that case, can 
we remove it? It would be nice to have the method correctly indented. If 
we decide to keep the scope I'd prefer a comment on line 402, to 
indicate that it is the end of the MutexLocker scope.

Bengt

On 2012-01-30 03:26, David Holmes wrote:
> Hi Tom,
>
> On 30/01/2012 10:30 AM, Tom Rodriguez wrote:
>> On Jan 29, 2012, at 3:53 PM, David Holmes wrote:
>>
>>> On 28/01/2012 5:25 AM, Vladimir Kozlov wrote:
>>>> I change alias to hotspot-dev since it affects everyone.
>>>
>>> Thanks Vladimir!
>>>
>>> Overall I find it difficult to follow the protocol that applies here 
>>> - in particular how safepoints can and can not appear between 
>>> different methods and the way in which needs_gc() might change value 
>>> between the enter and exit of a critical region. Without that 
>>> understanding it is hard to evaluate the correctness of the new 
>>> approach.
>>
>> I tried to put in enough asserts to ensure that it was clear what 
>> invariants are in force when updating the state.  Safepoints can 
>> appear wherever they normally can appear, while locking and waiting 
>> on mutexes.  _needs_gc can only change from false to true during a 
>> safepoint.  For the most part everything operates exactly as it used 
>> to once _needs_gc is set and the only trick is computing the correct 
>> value of _jni_active_count when setting _needs_gc to be true.  The 
>> debug code that still performs the atomics attempts to verify that 
>> the computed value and actual value are in sync.  Is there any more 
>> that I can explain about how it operates?
>
> It's my ignorance of the overall operation that makes evaluating this 
> change difficult for me. But thanks to your side-band emails I now 
> have a much clear understanding of things - thanks. And the protocol 
> seems safe.
>
>>> src/share/vm/memory/gcLocker.cpp
>>>
>>> 72       _wait_begin = os::javaTimeMillis();
>>> 135      os::javaTimeMillis() - _wait_begin,
>>>
>>> Are you sure you want to use the non-monotonic javaTimeMillis here? 
>>> Use of this was recently changed elsewhere in the GC code.
>>
>> It's only printing code, but I'll switch it to use 
>> tty->timestamp().milliseconds() which will make it match our internal 
>> time stamps, which seems like a useful property.  Is that ok?
>
> The recent GC changes switched to using javaTimeNanos() converted to 
> millis - see 7117303 and 7129514. The TimeStamp is based on 
> elapsedCounter which is still a time-of-day source on some platforms 
> and so not monotonic.
>
> Thanks,
> David
>
>> I've updated the webrev.
>>
>> tom
>>
>>>
>>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>
>>>
>>>> In GC_locker::check_active_before_gc() move wait_begin initialization
>>>> under Print* check since it used only under such checks. Could method
>>>> check_active_before_gc() be called from different threads 
>>>> concurrently?
>>>> Does it need lock? Note that other methods have lock.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> Tom Rodriguez wrote:
>>>>> http://cr.openjdk.java.net/~never/7129164
>>>>> 200 lines changed: 126 ins; 33 del; 41 mod; 3958 unchg
>>>>>
>>>>> 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale
>>>>> Summary:
>>>>> Reviewed-by:
>>>>>
>>>>> The machinery for GC_locker which supports GetPrimitiveArrayCritical
>>>>> maintains a count of the number of threads that currently have raw
>>>>> pointers exposed. This is used to allow existing critical sections to
>>>>> drain before creating new ones so that a GC can occur. Currently
>>>>> these counts are updated using atomic all the time, even if a GC 
>>>>> isn't
>>>>> being requested. This creates scalability problem when a lot of
>>>>> threads are hammering atomic operations on the jni_lock_count. The
>>>>> count only needs to be valid when checking if a critical section is
>>>>> currently active and when draining the existing sections. The fix is
>>>>> to make the count be computed as part of the safepointing machinery
>>>>> and to only decrement the counters when _needs_gc is true. In debug
>>>>> mode the old count is maintained and validated against the lazily
>>>>> computed count.
>>>>>
>>>>> On a microbenchmark that stresses GetPrimitiveArrayCritical with many
>>>>> threads and relatively short critical section it can more than double
>>>>> the throughput. This also slightly speeds up the normal
>>>>> GetPrimtiveArrayCritical calls. Mostly it's not easily measurable
>>>>> with larger scale programs.
>>>>>
>>>>> Tested with microbenchmark that stresses GetPrimtiveArrayCritical and
>>>>> the crypto benchmarks of specjvm2008 on x86 and sparc. I also ran the
>>>>> java.io regression tests from the JDK.
>>>>>
>>