review for 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale

Thu Feb 2 18:43:44 PST 2012

On 3/02/2012 12:32 PM, Tom Rodriguez wrote:
>>>
>>> I think we should implement os::elapsed_counter in terms of javaTimeNanos and that would bring these sources into agreement.  On Solaris they both use getTimeNanos, which is basically gethrtime.  On linux elapsed_counter is gettimeofday and javaTimeNanos uses clock_gettime if it's supported.  It looks like the sources just diverged and elapsed_counter was never updated.  On windows it's also the same though the implementation equivalence is somewhat complicated.  It might be most straightforward to move elapsed_counter into os.cpp and implement it in terms of javaTimeNanos.  Is there any reason that javaTimeMillis shouldn't be implemented in terms of javaTimeNanos too?
>>
>> Switching os::elapsed_counter to use javaTimeNanos on all platforms, in platform-independent code, would be a good thing in my opinion. Only problem as always is the backward compatibility issue. On Linux those numbers were timestamps and could be reconciled with timestamps produced by other sources - if we change them to javaTimeNanos then the correlation with system time is gone.
>
> elapsed_counter is always the delta since the start of the JVM so there really isn't an exposed way to turn them into wall clock times.

Well there are ways because there is a way to ask for the VM start time 
through the M&M interface. But no in general elapsed_counter should 
always be a delta. I'm unclear what the TimeStamp class is actually 
presenting when it is used? Anyway if this is changed we should make 
sure we know all of the clients of this code.

Thanks,
David

>>
>> javaTimeMillis _must_ be the time-of-day, that is what it is required to return.
>
> Ah, right.
>
> tom
>
>>
>> David
>> -----
>>
>>> tom
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>> Bengt
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Bengt
>>>>>
>>>>>>
>>>>>> tom
>>>>>>
>>>>>>> Bengt
>>>>>>>
>>>>>>>> tom
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> gcLocker.hpp
>>>>>>>>>>
>>>>>>>>>> 84 static void verify_critical_count() NOT_DEBUG_RETURN;
>>>>>>>>>>
>>>>>>>>>> Not sure what the correct thing is here, but the GC code seems to
>>>>>>>>>> use PRODUCT_RETURN more than NOT_DEBUG_RETURN. I guess the only
>>>>>>>>>> difference is for optimized builds (and of course using #ifndef
>>>>>>>>>> PRODUCT instead of #ifdef ASSERT).
>>>>>>>>> I didn't want it doing any work in optimized builds so that's why
>>>>>>>>> it's NOT_DEBUG_RETURN.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> safepoint.cpp
>>>>>>>>>>
>>>>>>>>>> Not strictly related to your change, but the
>>>>>>>>>> SafepointSynchronize::begin() method is missing one indentation
>>>>>>>>>> level due to the the scoping that starts on line 139:
>>>>>>>>> Yes, I noticed that too. The whole thing could be factored better
>>>>>>>>> too. I already pushed so I can't fix it. I've got another change
>>>>>>>>> coming so maybe I'll fix it in there. Thanks.
>>>>>>>>>
>>>>>>>>> tom
>>>>>>>>>
>>>>>>>>>> 139 {
>>>>>>>>>> 140 MutexLocker mu(Safepoint_lock);
>>>>>>>>>> 141
>>>>>>>>>> 142 // Reset the count of active JNI critical threads
>>>>>>>>>> 143 _current_jni_active_count = 0;
>>>>>>>>>>
>>>>>>>>>> This scope ends at the end of the method:
>>>>>>>>>>
>>>>>>>>>> 398 if (PrintSafepointStatistics) {
>>>>>>>>>> 399 // Record how much time spend on the above cleanup tasks
>>>>>>>>>> 400 update_statistics_on_cleanup_end(os::javaTimeNanos());
>>>>>>>>>> 401 }
>>>>>>>>>> 402 }
>>>>>>>>>> 403 }
>>>>>>>>>>
>>>>>>>>>> It seems to me that the scope is not really necessary. In that
>>>>>>>>>> case, can we remove it? It would be nice to have the method
>>>>>>>>>> correctly indented. If we decide to keep the scope I'd prefer a
>>>>>>>>>> comment on line 402, to indicate that it is the end of the
>>>>>>>>>> MutexLocker scope.
>>>>>>>>>>
>>>>>>>>>> Bengt
>>>>>>>>>>
>>>>>>>>>> On 2012-01-30 03:26, David Holmes wrote:
>>>>>>>>>>> Hi Tom,
>>>>>>>>>>>
>>>>>>>>>>> On 30/01/2012 10:30 AM, Tom Rodriguez wrote:
>>>>>>>>>>>> On Jan 29, 2012, at 3:53 PM, David Holmes wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 28/01/2012 5:25 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>>>> I change alias to hotspot-dev since it affects everyone.
>>>>>>>>>>>>> Thanks Vladimir!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Overall I find it difficult to follow the protocol that applies
>>>>>>>>>>>>> here - in particular how safepoints can and can not appear
>>>>>>>>>>>>> between different methods and the way in which needs_gc() might
>>>>>>>>>>>>> change value between the enter and exit of a critical region.
>>>>>>>>>>>>> Without that understanding it is hard to evaluate the
>>>>>>>>>>>>> correctness of the new approach.
>>>>>>>>>>>> I tried to put in enough asserts to ensure that it was clear
>>>>>>>>>>>> what invariants are in force when updating the state. Safepoints
>>>>>>>>>>>> can appear wherever they normally can appear, while locking and
>>>>>>>>>>>> waiting on mutexes. _needs_gc can only change from false to true
>>>>>>>>>>>> during a safepoint. For the most part everything operates
>>>>>>>>>>>> exactly as it used to once _needs_gc is set and the only trick
>>>>>>>>>>>> is computing the correct value of _jni_active_count when setting
>>>>>>>>>>>> _needs_gc to be true. The debug code that still performs the
>>>>>>>>>>>> atomics attempts to verify that the computed value and actual
>>>>>>>>>>>> value are in sync. Is there any more that I can explain about
>>>>>>>>>>>> how it operates?
>>>>>>>>>>> It's my ignorance of the overall operation that makes evaluating
>>>>>>>>>>> this change difficult for me. But thanks to your side-band emails
>>>>>>>>>>> I now have a much clear understanding of things - thanks. And the
>>>>>>>>>>> protocol seems safe.
>>>>>>>>>>>
>>>>>>>>>>>>> src/share/vm/memory/gcLocker.cpp
>>>>>>>>>>>>>
>>>>>>>>>>>>> 72 _wait_begin = os::javaTimeMillis();
>>>>>>>>>>>>> 135 os::javaTimeMillis() - _wait_begin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you sure you want to use the non-monotonic javaTimeMillis
>>>>>>>>>>>>> here? Use of this was recently changed elsewhere in the GC code.
>>>>>>>>>>>> It's only printing code, but I'll switch it to use
>>>>>>>>>>>> tty->timestamp().milliseconds() which will make it match our
>>>>>>>>>>>> internal time stamps, which seems like a useful property. Is
>>>>>>>>>>>> that ok?
>>>>>>>>>>> The recent GC changes switched to using javaTimeNanos() converted
>>>>>>>>>>> to millis - see 7117303 and 7129514. The TimeStamp is based on
>>>>>>>>>>> elapsedCounter which is still a time-of-day source on some
>>>>>>>>>>> platforms and so not monotonic.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> I've updated the webrev.
>>>>>>>>>>>>
>>>>>>>>>>>> tom
>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> In GC_locker::check_active_before_gc() move wait_begin
>>>>>>>>>>>>>> initialization
>>>>>>>>>>>>>> under Print* check since it used only under such checks. Could
>>>>>>>>>>>>>> method
>>>>>>>>>>>>>> check_active_before_gc() be called from different threads
>>>>>>>>>>>>>> concurrently?
>>>>>>>>>>>>>> Does it need lock? Note that other methods have lock.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tom Rodriguez wrote:
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~never/7129164
>>>>>>>>>>>>>>> 200 lines changed: 126 ins; 33 del; 41 mod; 3958 unchg
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale
>>>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>>>> Reviewed-by:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The machinery for GC_locker which supports
>>>>>>>>>>>>>>> GetPrimitiveArrayCritical
>>>>>>>>>>>>>>> maintains a count of the number of threads that currently
>>>>>>>>>>>>>>> have raw
>>>>>>>>>>>>>>> pointers exposed. This is used to allow existing critical
>>>>>>>>>>>>>>> sections to
>>>>>>>>>>>>>>> drain before creating new ones so that a GC can occur. Currently
>>>>>>>>>>>>>>> these counts are updated using atomic all the time, even if a
>>>>>>>>>>>>>>> GC isn't
>>>>>>>>>>>>>>> being requested. This creates scalability problem when a lot of
>>>>>>>>>>>>>>> threads are hammering atomic operations on the
>>>>>>>>>>>>>>> jni_lock_count. The
>>>>>>>>>>>>>>> count only needs to be valid when checking if a critical
>>>>>>>>>>>>>>> section is
>>>>>>>>>>>>>>> currently active and when draining the existing sections. The
>>>>>>>>>>>>>>> fix is
>>>>>>>>>>>>>>> to make the count be computed as part of the safepointing
>>>>>>>>>>>>>>> machinery
>>>>>>>>>>>>>>> and to only decrement the counters when _needs_gc is true. In
>>>>>>>>>>>>>>> debug
>>>>>>>>>>>>>>> mode the old count is maintained and validated against the
>>>>>>>>>>>>>>> lazily
>>>>>>>>>>>>>>> computed count.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On a microbenchmark that stresses GetPrimitiveArrayCritical
>>>>>>>>>>>>>>> with many
>>>>>>>>>>>>>>> threads and relatively short critical section it can more
>>>>>>>>>>>>>>> than double
>>>>>>>>>>>>>>> the throughput. This also slightly speeds up the normal
>>>>>>>>>>>>>>> GetPrimtiveArrayCritical calls. Mostly it's not easily
>>>>>>>>>>>>>>> measurable
>>>>>>>>>>>>>>> with larger scale programs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tested with microbenchmark that stresses
>>>>>>>>>>>>>>> GetPrimtiveArrayCritical and
>>>>>>>>>>>>>>> the crypto benchmarks of specjvm2008 on x86 and sparc. I also
>>>>>>>>>>>>>>> ran the
>>>>>>>>>>>>>>> java.io regression tests from the JDK.
>>>>>>>>>>>>>>>
>>>>>
>>>
>