review for 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale

David Holmes david.holmes at oracle.com
Thu Feb 2 17:30:08 PST 2012


On 3/02/2012 4:03 AM, Tom Rodriguez wrote:
>>> First, I was not really arguing for doing this one way or the other. It
>>> was just not clear to me from the review comments if you wanted to
>>> change to javaTimeNanos() or not.
>>>
>>> You are right about the use in the GC code. We recently changed to use
>>> javaTimeNanos() for time intervals, but we use
>>> tty->timestamp().seconds() to print the time stamps. If I remember
>>> correctly, you were logging how long a thread had to wait, so it seemed
>>> like an interval to me. But, as I said, I was mostly just wondering what
>>> the decision was.
>>>
>>> It would probably be a good idea to use the same time source for time
>>> stamps and intervals. When we were using javaTimeMillis() I think it was
>>> consistent, but maybe we should now change to javaTimeNanos() for
>>> tty->timestamp()?
>>>
>>> I can't say I know the implications of changing tty->timestamp() to use
>>> the same time source as javaTimeNanos(). Currently they boil down to the
>>> same OS calls on Windows, but on Linux and Solaris the implementations
>>> are quite different.
>>
>> The primary difference is that javaTimeMillis reports the time-of-day, while javaTimeNanos does not. If you really want a timestamp that you can correlate with other time-related events then javaTimeMillis is what you need, but then you have to deal with lower resolution and non-monotonicity. A compromise is a time-source with the resolution of nanoTime which is sync'ed to time-of-day at VM start (this is what the real-time clock gives you in the Real-Time Specification for Java).
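>>
>> Roughly, such a compromise source could look like the following
>> sketch (hypothetical names, POSIX clocks assumed; the RTSJ real-time
>> clock is more involved than this):
>>
>>   #include <cstdint>
>>   #include <ctime>
>>
>>   // Hybrid clock: monotonic readings anchored to the wall-clock time
>>   // sampled once at VM start, so timestamps stay monotonic but can
>>   // still be correlated with time-of-day.
>>   static int64_t now_ns(clockid_t id) {
>>     timespec ts;
>>     clock_gettime(id, &ts);
>>     return int64_t(ts.tv_sec) * 1000000000 + ts.tv_nsec;
>>   }
>>
>>   // Wall-clock/monotonic offset, sampled once at startup.
>>   static const int64_t epoch_offset_ns =
>>       now_ns(CLOCK_REALTIME) - now_ns(CLOCK_MONOTONIC);
>>
>>   int64_t hybrid_timestamp_ns() {
>>     return now_ns(CLOCK_MONOTONIC) + epoch_offset_ns;
>>   }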
>
> I think we should implement os::elapsed_counter in terms of javaTimeNanos and that would bring these sources into agreement.  On Solaris they both use getTimeNanos, which is basically gethrtime.  On Linux elapsed_counter is gettimeofday and javaTimeNanos uses clock_gettime if it's supported.  It looks like the sources just diverged and elapsed_counter was never updated.  On Windows it's also the same, though the implementation equivalence is somewhat complicated.  It might be most straightforward to move elapsed_counter into os.cpp and implement it in terms of javaTimeNanos.  Is there any reason that javaTimeMillis shouldn't be implemented in terms of javaTimeNanos too?
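>
> Something like this is what I have in mind for the shared code (just
> a sketch against the existing os API, not an actual patch):
>
>   // os.cpp (hypothetical): one monotonic source for both interval
>   // timing and elapsed-counter timestamps.
>   jlong os::elapsed_counter() {
>     return os::javaTimeNanos();   // monotonic nanoseconds
>   }
>
>   jlong os::elapsed_frequency() {
>     return NANOSECS_PER_SEC;      // counter ticks per second
>   }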

Switching os::elapsed_counter to use javaTimeNanos on all platforms, in 
platform-independent code, would be a good thing in my opinion. The 
only problem, as always, is the backward compatibility issue. On Linux those 
numbers were timestamps and could be reconciled with timestamps produced 
by other sources - if we change them to javaTimeNanos then the 
correlation with system time is gone.

javaTimeMillis _must_ be the time-of-day, that is what it is required to 
return.

David
-----

> tom
>
>>
>> David
>> -----
>>
>>> Bengt
>>>
>>>>
>>>> tom
>>>>
>>>>> Bengt
>>>>>
>>>>>> tom
>>>>>>
>>>>>>>>
>>>>>>>> gcLocker.hpp
>>>>>>>>
>>>>>>>> 84 static void verify_critical_count() NOT_DEBUG_RETURN;
>>>>>>>>
>>>>>>>> Not sure what the correct thing is here, but the GC code seems to
>>>>>>>> use PRODUCT_RETURN more than NOT_DEBUG_RETURN. I guess the only
>>>>>>>> difference is for optimized builds (and of course using #ifndef
>>>>>>>> PRODUCT instead of #ifdef ASSERT).
>>>>>>> I didn't want it doing any work in optimized builds so that's why
>>>>>>> it's NOT_DEBUG_RETURN.
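>>>>>>>
>>>>>>> For reference, the two macros differ exactly there (paraphrasing
>>>>>>> utilities/macros.hpp):
>>>>>>>
>>>>>>>   #ifdef ASSERT                 // debug builds only
>>>>>>>   #define NOT_DEBUG_RETURN      /* real body in the .cpp file */
>>>>>>>   #else                         // product and optimized builds
>>>>>>>   #define NOT_DEBUG_RETURN {}   // empty inline stub, no work
>>>>>>>   #endif
>>>>>>>
>>>>>>>   #ifndef PRODUCT               // debug and optimized builds
>>>>>>>   #define PRODUCT_RETURN        /* real body in the .cpp file */
>>>>>>>   #else                         // product builds only
>>>>>>>   #define PRODUCT_RETURN {}
>>>>>>>   #endif
>>>>>>>
>>>>>>> so an optimized build gets the empty stub with NOT_DEBUG_RETURN
>>>>>>> but the real body with PRODUCT_RETURN.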
>>>>>>>
>>>>>>>>
>>>>>>>> safepoint.cpp
>>>>>>>>
>>>>>>>> Not strictly related to your change, but the
>>>>>>>> SafepointSynchronize::begin() method is missing one indentation
>>>>>>>> level due to the scoping that starts on line 139:
>>>>>>> Yes, I noticed that too. The whole thing could be factored better
>>>>>>> too. I already pushed so I can't fix it. I've got another change
>>>>>>> coming so maybe I'll fix it in there. Thanks.
>>>>>>>
>>>>>>> tom
>>>>>>>
>>>>>>>> 139 {
>>>>>>>> 140 MutexLocker mu(Safepoint_lock);
>>>>>>>> 141
>>>>>>>> 142 // Reset the count of active JNI critical threads
>>>>>>>> 143 _current_jni_active_count = 0;
>>>>>>>>
>>>>>>>> This scope ends at the end of the method:
>>>>>>>>
>>>>>>>> 398 if (PrintSafepointStatistics) {
>>>>>>>> 399 // Record how much time was spent on the above cleanup tasks
>>>>>>>> 400 update_statistics_on_cleanup_end(os::javaTimeNanos());
>>>>>>>> 401 }
>>>>>>>> 402 }
>>>>>>>> 403 }
>>>>>>>>
>>>>>>>> It seems to me that the scope is not really necessary. In that
>>>>>>>> case, can we remove it? It would be nice to have the method
>>>>>>>> correctly indented. If we decide to keep the scope I'd prefer a
>>>>>>>> comment on line 402, to indicate that it is the end of the
>>>>>>>> MutexLocker scope.
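>>>>>>>>
>>>>>>>> That is, if we keep the scope, something like (sketch):
>>>>>>>>
>>>>>>>> 139 {
>>>>>>>> 140 MutexLocker mu(Safepoint_lock);
>>>>>>>> ...
>>>>>>>> 402 } // MutexLocker mu(Safepoint_lock)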
>>>>>>>>
>>>>>>>> Bengt
>>>>>>>>
>>>>>>>> On 2012-01-30 03:26, David Holmes wrote:
>>>>>>>>> Hi Tom,
>>>>>>>>>
>>>>>>>>> On 30/01/2012 10:30 AM, Tom Rodriguez wrote:
>>>>>>>>>> On Jan 29, 2012, at 3:53 PM, David Holmes wrote:
>>>>>>>>>>
>>>>>>>>>>> On 28/01/2012 5:25 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>> I change alias to hotspot-dev since it affects everyone.
>>>>>>>>>>> Thanks Vladimir!
>>>>>>>>>>>
>>>>>>>>>>> Overall I find it difficult to follow the protocol that applies
>>>>>>>>>>> here - in particular how safepoints can and cannot appear
>>>>>>>>>>> between different methods and the way in which needs_gc() might
>>>>>>>>>>> change value between the enter and exit of a critical region.
>>>>>>>>>>> Without that understanding it is hard to evaluate the
>>>>>>>>>>> correctness of the new approach.
>>>>>>>>>> I tried to put in enough asserts to ensure that it was clear
>>>>>>>>>> what invariants are in force when updating the state. Safepoints
>>>>>>>>>> can appear wherever they normally can appear, while locking and
>>>>>>>>>> waiting on mutexes. _needs_gc can only change from false to true
>>>>>>>>>> during a safepoint. For the most part everything operates
>>>>>>>>>> exactly as it used to once _needs_gc is set and the only trick
>>>>>>>>>> is computing the correct value of _jni_active_count when setting
>>>>>>>>>> _needs_gc to be true. The debug code that still performs the
>>>>>>>>>> atomics attempts to verify that the computed value and actual
>>>>>>>>>> value are in sync. Is there any more that I can explain about
>>>>>>>>>> how it operates?
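>>>>>>>>>>
>>>>>>>>>> As a schematic of the fast path (simplified, not the actual
>>>>>>>>>> code):
>>>>>>>>>>
>>>>>>>>>>   void GC_locker::lock_critical(JavaThread* thread) {
>>>>>>>>>>     if (!thread->in_critical()) {
>>>>>>>>>>       if (needs_gc()) {
>>>>>>>>>>         // Slow path: a GC is pending, so block until the
>>>>>>>>>>         // current batch of critical sections has drained.
>>>>>>>>>>         jni_lock(thread);
>>>>>>>>>>         return;
>>>>>>>>>>       }
>>>>>>>>>>       // Fast path: no atomics; the global count is instead
>>>>>>>>>>       // recomputed from per-thread state at the safepoint.
>>>>>>>>>>     }
>>>>>>>>>>     thread->enter_critical();  // thread-local increment only
>>>>>>>>>>   }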
>>>>>>>>> It's my ignorance of the overall operation that makes evaluating
>>>>>>>>> this change difficult for me. But thanks to your side-band emails
>>>>>>>>> I now have a much clearer understanding of things - thanks. And the
>>>>>>>>> protocol seems safe.
>>>>>>>>>
>>>>>>>>>>> src/share/vm/memory/gcLocker.cpp
>>>>>>>>>>>
>>>>>>>>>>> 72 _wait_begin = os::javaTimeMillis();
>>>>>>>>>>> 135 os::javaTimeMillis() - _wait_begin,
>>>>>>>>>>>
>>>>>>>>>>> Are you sure you want to use the non-monotonic javaTimeMillis
>>>>>>>>>>> here? Use of this was recently changed elsewhere in the GC code.
>>>>>>>>>> It's only printing code, but I'll switch it to use
>>>>>>>>>> tty->timestamp().milliseconds() which will make it match our
>>>>>>>>>> internal time stamps, which seems like a useful property. Is
>>>>>>>>>> that ok?
>>>>>>>>> The recent GC changes switched to using javaTimeNanos() converted
>>>>>>>>> to millis - see 7117303 and 7129514. The TimeStamp is based on
>>>>>>>>> elapsedCounter which is still a time-of-day source on some
>>>>>>>>> platforms and so not monotonic.
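>>>>>>>>>
>>>>>>>>> I.e. the pattern those changes adopted is roughly (sketch,
>>>>>>>>> variable names made up):
>>>>>>>>>
>>>>>>>>>   jlong wait_begin_ns = os::javaTimeNanos();
>>>>>>>>>   // ... wait for the JNI critical sections to drain ...
>>>>>>>>>   jlong waited_ms =
>>>>>>>>>     (os::javaTimeNanos() - wait_begin_ns) / NANOSECS_PER_MILLISEC;
>>>>>>>>>   gclog_or_tty->print_cr("GC locker: waited " JLONG_FORMAT " ms",
>>>>>>>>>                          waited_ms);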
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> I've updated the webrev.
>>>>>>>>>>
>>>>>>>>>> tom
>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> In GC_locker::check_active_before_gc() move wait_begin
>>>>>>>>>>>> initialization
>>>>>>>>>>>> under the Print* check since it is used only under such checks. Could
>>>>>>>>>>>> method
>>>>>>>>>>>> check_active_before_gc() be called from different threads
>>>>>>>>>>>> concurrently?
>>>>>>>>>>>> Does it need lock? Note that other methods have lock.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>
>>>>>>>>>>>> Tom Rodriguez wrote:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~never/7129164
>>>>>>>>>>>>> 200 lines changed: 126 ins; 33 del; 41 mod; 3958 unchg
>>>>>>>>>>>>>
>>>>>>>>>>>>> 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale
>>>>>>>>>>>>> Summary:
>>>>>>>>>>>>> Reviewed-by:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The machinery for GC_locker which supports
>>>>>>>>>>>>> GetPrimitiveArrayCritical
>>>>>>>>>>>>> maintains a count of the number of threads that currently
>>>>>>>>>>>>> have raw
>>>>>>>>>>>>> pointers exposed. This is used to allow existing critical
>>>>>>>>>>>>> sections to
>>>>>>>>>>>>> drain before creating new ones so that a GC can occur. Currently
>>>>>>>>>>>>> these counts are updated using atomics all the time, even if a
>>>>>>>>>>>>> GC isn't
>>>>>>>>>>>>> being requested. This creates a scalability problem when a lot of
>>>>>>>>>>>>> threads are hammering atomic operations on the
>>>>>>>>>>>>> jni_lock_count. The
>>>>>>>>>>>>> count only needs to be valid when checking if a critical
>>>>>>>>>>>>> section is
>>>>>>>>>>>>> currently active and when draining the existing sections. The
>>>>>>>>>>>>> fix is
>>>>>>>>>>>>> to compute the count as part of the safepointing
>>>>>>>>>>>>> machinery
>>>>>>>>>>>>> and to only decrement the counters when _needs_gc is true. In
>>>>>>>>>>>>> debug
>>>>>>>>>>>>> mode the old count is maintained and validated against the
>>>>>>>>>>>>> lazily
>>>>>>>>>>>>> computed count.
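>>>>>>>>>>>>>
>>>>>>>>>>>>> Schematically, the safepoint side does something like this
>>>>>>>>>>>>> (simplified):
>>>>>>>>>>>>>
>>>>>>>>>>>>>   // During SafepointSynchronize::begin(), with all Java
>>>>>>>>>>>>>   // threads stopped, tally the threads that are inside a
>>>>>>>>>>>>>   // JNI critical region; a plain increment is safe here.
>>>>>>>>>>>>>   for (JavaThread* t = Threads::first(); t != NULL;
>>>>>>>>>>>>>        t = t->next()) {
>>>>>>>>>>>>>     if (t->in_critical()) {
>>>>>>>>>>>>>       _current_jni_active_count++;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   // Publish the tally so GC_locker can wait for exactly
>>>>>>>>>>>>>   // this many threads to leave their critical sections.
>>>>>>>>>>>>>   GC_locker::set_jni_lock_count(_current_jni_active_count);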
>>>>>>>>>>>>>
>>>>>>>>>>>>> On a microbenchmark that stresses GetPrimitiveArrayCritical
>>>>>>>>>>>>> with many
>>>>>>>>>>>>> threads and relatively short critical sections, it can more
>>>>>>>>>>>>> than double
>>>>>>>>>>>>> the throughput. This also slightly speeds up the normal
>>>>>>>>>>>>> GetPrimitiveArrayCritical calls. Mostly it's not easily
>>>>>>>>>>>>> measurable
>>>>>>>>>>>>> with larger scale programs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tested with microbenchmark that stresses
>>>>>>>>>>>>> GetPrimitiveArrayCritical and
>>>>>>>>>>>>> the crypto benchmarks of specjvm2008 on x86 and sparc. I also
>>>>>>>>>>>>> ran the
>>>>>>>>>>>>> java.io regression tests from the JDK.
>>>>>>>>>>>>>
>>>
>

