review for 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale
Tom Rodriguez
tom.rodriguez at oracle.com
Thu Feb 2 18:32:58 PST 2012
>>
>> I think we should implement os::elapsed_counter in terms of javaTimeNanos and that would bring these sources into agreement. On Solaris they both use getTimeNanos, which is basically gethrtime. On Linux elapsed_counter is gettimeofday and javaTimeNanos uses clock_gettime if it's supported. It looks like the sources just diverged and elapsed_counter was never updated. On Windows it's also the same, though the implementation equivalence is somewhat complicated. It might be most straightforward to move elapsed_counter into os.cpp and implement it in terms of javaTimeNanos. Is there any reason that javaTimeMillis shouldn't be implemented in terms of javaTimeNanos too?
>
> Switching os::elapsed_counter to use javaTimeNanos on all platforms, in platform-independent code, would be a good thing in my opinion. The only problem, as always, is backward compatibility. On Linux those numbers were timestamps and could be reconciled with timestamps produced by other sources - if we change them to javaTimeNanos then the correlation with system time is gone.
elapsed_counter is always the delta since the start of the JVM so there really isn't an exposed way to turn them into wall clock times.
>
> javaTimeMillis _must_ be the time-of-day, that is what it is required to return.
Ah, right.
tom
>
> David
> -----
>
>> tom
>>
>>>
>>> David
>>> -----
>>>
>>>> Bengt
>>>>
>>>>
>>>> Bengt
>>>>
>>>>>
>>>>> tom
>>>>>
>>>>>> Bengt
>>>>>>
>>>>>>> tom
>>>>>>>
>>>>>>>>>
>>>>>>>>> gcLocker.hpp
>>>>>>>>>
>>>>>>>>> 84 static void verify_critical_count() NOT_DEBUG_RETURN;
>>>>>>>>>
>>>>>>>>> Not sure what the correct thing is here, but the GC code seems to
>>>>>>>>> use PRODUCT_RETURN more than NOT_DEBUG_RETURN. I guess the only
>>>>>>>>> difference is for optimized builds (and of course using #ifndef
>>>>>>>>> PRODUCT instead of #ifdef ASSERT).
>>>>>>>> I didn't want it doing any work in optimized builds so that's why
>>>>>>>> it's NOT_DEBUG_RETURN.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> safepoint.cpp
>>>>>>>>>
>>>>>>>>> Not strictly related to your change, but the
>>>>>>>>> SafepointSynchronize::begin() method is missing one indentation
>>>>>>>>> level due to the scoping that starts on line 139:
>>>>>>>> Yes, I noticed that too. The whole thing could be factored better
>>>>>>>> too. I already pushed so I can't fix it. I've got another change
>>>>>>>> coming so maybe I'll fix it in there. Thanks.
>>>>>>>>
>>>>>>>> tom
>>>>>>>>
>>>>>>>>> 139 {
>>>>>>>>> 140 MutexLocker mu(Safepoint_lock);
>>>>>>>>> 141
>>>>>>>>> 142 // Reset the count of active JNI critical threads
>>>>>>>>> 143 _current_jni_active_count = 0;
>>>>>>>>>
>>>>>>>>> This scope ends at the end of the method:
>>>>>>>>>
>>>>>>>>> 398 if (PrintSafepointStatistics) {
>>>>>>>>> 399 // Record how much time spend on the above cleanup tasks
>>>>>>>>> 400 update_statistics_on_cleanup_end(os::javaTimeNanos());
>>>>>>>>> 401 }
>>>>>>>>> 402 }
>>>>>>>>> 403 }
>>>>>>>>>
>>>>>>>>> It seems to me that the scope is not really necessary. In that
>>>>>>>>> case, can we remove it? It would be nice to have the method
>>>>>>>>> correctly indented. If we decide to keep the scope I'd prefer a
>>>>>>>>> comment on line 402, to indicate that it is the end of the
>>>>>>>>> MutexLocker scope.
>>>>>>>>>
>>>>>>>>> Bengt
>>>>>>>>>
>>>>>>>>> On 2012-01-30 03:26, David Holmes wrote:
>>>>>>>>>> Hi Tom,
>>>>>>>>>>
>>>>>>>>>> On 30/01/2012 10:30 AM, Tom Rodriguez wrote:
>>>>>>>>>>> On Jan 29, 2012, at 3:53 PM, David Holmes wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 28/01/2012 5:25 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>>> I change alias to hotspot-dev since it affects everyone.
>>>>>>>>>>>> Thanks Vladimir!
>>>>>>>>>>>>
>>>>>>>>>>>> Overall I find it difficult to follow the protocol that applies
>>>>>>>>>>>> here - in particular how safepoints can and cannot appear
>>>>>>>>>>>> between different methods and the way in which needs_gc() might
>>>>>>>>>>>> change value between the enter and exit of a critical region.
>>>>>>>>>>>> Without that understanding it is hard to evaluate the
>>>>>>>>>>>> correctness of the new approach.
>>>>>>>>>>> I tried to put in enough asserts to ensure that it was clear
>>>>>>>>>>> what invariants are in force when updating the state. Safepoints
>>>>>>>>>>> can appear wherever they normally can appear, while locking and
>>>>>>>>>>> waiting on mutexes. _needs_gc can only change from false to true
>>>>>>>>>>> during a safepoint. For the most part everything operates
>>>>>>>>>>> exactly as it used to once _needs_gc is set and the only trick
>>>>>>>>>>> is computing the correct value of _jni_active_count when setting
>>>>>>>>>>> _needs_gc to be true. The debug code that still performs the
>>>>>>>>>>> atomics attempts to verify that the computed value and actual
>>>>>>>>>>> value are in sync. Is there any more that I can explain about
>>>>>>>>>>> how it operates?
>>>>>>>>>> It's my ignorance of the overall operation that makes evaluating
>>>>>>>>>> this change difficult for me. But thanks to your side-band emails
>>>>>>>>>> I now have a much clearer understanding of things - thanks. And the
>>>>>>>>>> protocol seems safe.
>>>>>>>>>>
>>>>>>>>>>>> src/share/vm/memory/gcLocker.cpp
>>>>>>>>>>>>
>>>>>>>>>>>> 72 _wait_begin = os::javaTimeMillis();
>>>>>>>>>>>> 135 os::javaTimeMillis() - _wait_begin,
>>>>>>>>>>>>
>>>>>>>>>>>> Are you sure you want to use the non-monotonic javaTimeMillis
>>>>>>>>>>>> here? Use of this was recently changed elsewhere in the GC code.
>>>>>>>>>>> It's only printing code, but I'll switch it to use
>>>>>>>>>>> tty->timestamp().milliseconds() which will make it match our
>>>>>>>>>>> internal time stamps, which seems like a useful property. Is
>>>>>>>>>>> that ok?
>>>>>>>>>> The recent GC changes switched to using javaTimeNanos() converted
>>>>>>>>>> to millis - see 7117303 and 7129514. The TimeStamp is based on
>>>>>>>>>> elapsedCounter which is still a time-of-day source on some
>>>>>>>>>> platforms and so not monotonic.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> I've updated the webrev.
>>>>>>>>>>>
>>>>>>>>>>> tom
>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> In GC_locker::check_active_before_gc() move wait_begin
>>>>>>>>>>>>> initialization
>>>>>>>>>>>>> under the Print* check since it is used only under such checks. Could
>>>>>>>>>>>>> method
>>>>>>>>>>>>> check_active_before_gc() be called from different threads
>>>>>>>>>>>>> concurrently?
>>>>>>>>>>>>> Does it need lock? Note that other methods have lock.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tom Rodriguez wrote:
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~never/7129164
>>>>>>>>>>>>>> 200 lines changed: 126 ins; 33 del; 41 mod; 3958 unchg
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale
>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>> Reviewed-by:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The machinery for GC_locker which supports
>>>>>>>>>>>>>> GetPrimitiveArrayCritical
>>>>>>>>>>>>>> maintains a count of the number of threads that currently
>>>>>>>>>>>>>> have raw
>>>>>>>>>>>>>> pointers exposed. This is used to allow existing critical
>>>>>>>>>>>>>> sections to
>>>>>>>>>>>>>> drain before creating new ones so that a GC can occur. Currently
>>>>>>>>>>>>>> these counts are updated using atomics all the time, even if a
>>>>>>>>>>>>>> GC isn't
>>>>>>>>>>>>>> being requested. This creates a scalability problem when a lot of
>>>>>>>>>>>>>> threads are hammering atomic operations on the
>>>>>>>>>>>>>> jni_lock_count. The
>>>>>>>>>>>>>> count only needs to be valid when checking if a critical
>>>>>>>>>>>>>> section is
>>>>>>>>>>>>>> currently active and when draining the existing sections. The
>>>>>>>>>>>>>> fix is
>>>>>>>>>>>>>> to make the count be computed as part of the safepointing
>>>>>>>>>>>>>> machinery
>>>>>>>>>>>>>> and to only decrement the counters when _needs_gc is true. In
>>>>>>>>>>>>>> debug
>>>>>>>>>>>>>> mode the old count is maintained and validated against the
>>>>>>>>>>>>>> lazily
>>>>>>>>>>>>>> computed count.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On a microbenchmark that stresses GetPrimitiveArrayCritical
>>>>>>>>>>>>>> with many
>>>>>>>>>>>>>> threads and relatively short critical section it can more
>>>>>>>>>>>>>> than double
>>>>>>>>>>>>>> the throughput. This also slightly speeds up the normal
>>>>>>>>>>>>>> GetPrimitiveArrayCritical calls. Mostly it's not easily
>>>>>>>>>>>>>> measurable
>>>>>>>>>>>>>> with larger scale programs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tested with microbenchmark that stresses
>>>>>>>>>>>>>> GetPrimitiveArrayCritical and
>>>>>>>>>>>>>> the crypto benchmarks of specjvm2008 on x86 and sparc. I also
>>>>>>>>>>>>>> ran the
>>>>>>>>>>>>>> java.io regression tests from the JDK.
>>>>>>>>>>>>>>
>>>>
>>
More information about the hotspot-dev mailing list