review for 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale
David Holmes
david.holmes at oracle.com
Thu Feb 2 01:31:25 PST 2012
On 2/02/2012 6:49 PM, Bengt Rutisson wrote:
> On 2012-02-01 20:08, Tom Rodriguez wrote:
>>>>>>
>>>>>> You still use tty->timestamp().milliseconds() instead of
>>>>>> javaTimeNanos(). Not sure whether you just didn't get to fixing it
>>>>>> yet or if you decided against it.
>>>>> I decided against it. It's a printing timestamp so matching the
>>>>> time stamp in tty seemed to make more sense to me.
>>>> I'm about to send out a webrev for some other changes and I'm going
>>>> to correct this to use javaTimeNanos to be consistent with the other
>>>> GC timestamps. I'm also going to remove the extra braces in
>>>> safepoint.cpp.
>>> Had a quick look at the changes you sent out for 7013347. I see that
>>> you have included these fixes there. Thanks!
>> I think that using tty->timestamp().milliseconds() was actually
>> correct. GC uses gclog_or_tty->stamp() or gclog_or_tty->date_stamp()
>> for all the times printed in the logs that I can see, which uses the
>> same value as timestamp().milliseconds(). javaTimeNanos (and previously
>> javaTimeMillis) appears to be used for measuring elapsed times. Am I
>> reading this wrong?
>
> First, I was not really arguing for doing this one way or the other. It
> was just not clear to me from the review comments if you wanted to
> change to javaTimeNanos() or not.
>
> You are right about the use in the GC code. We recently changed to use
> javaTimeNanos() for time intervals, but we use
> tty->timestamp().seconds() to print the time stamps. If I remember
> correctly, you were logging how long a thread had to wait, so it seemed
> like an interval to me. But, as I said, I was mostly just wondering what
> the decision was.
>
> It would probably be a good idea to use the same time source for time
> stamps and intervals. When we were using javaTimeMillis() I think it was
> consistent, but maybe we should now change to javaTimeNanos() for
> tty->timestamp()?
>
> I can't say I know the implications of changing tty->timestamp() to use
> the same timesource as javaTimeNanos(). Currently they boil down to the
> same OS calls on Windows, but on Linux and Solaris the implementations
> are quite different.
The primary difference is that javaTimeMillis reports the time-of-day,
while javaTimeNanos does not. If you really want a timestamp that you
can correlate with other time-related events then javaTimeMillis is what
you need, but then you have to deal with lower resolution and
non-monotonicity. A compromise is a time-source with the resolution of
nanoTime which is sync'ed to time-of-day at VM start (this is what the
real-time clock gives you in the Real-Time Specification for Java).
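Something like the following sketch, for concreteness (the helper names are
hypothetical; only os::javaTimeMillis() and os::javaTimeNanos() are existing
calls):

  // Capture both clocks once at VM start; afterwards a timestamp is the
  // wall-clock anchor plus the monotonic delta, so it stays monotonic and
  // high-resolution but can still be correlated with time-of-day events.
  static jlong _anchor_millis;  // time-of-day at VM start
  static jlong _anchor_nanos;   // monotonic reading taken at the same moment

  static void init_anchored_clock() {
    _anchor_millis = os::javaTimeMillis();
    _anchor_nanos  = os::javaTimeNanos();
  }

  static jlong anchored_timestamp_nanos() {
    return _anchor_millis * 1000000 + (os::javaTimeNanos() - _anchor_nanos);
  }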
David
-----
> Bengt
>
>>
>> tom
>>
>>> Bengt
>>>
>>>> tom
>>>>
>>>>>>
>>>>>> gcLocker.hpp
>>>>>>
>>>>>> 84 static void verify_critical_count() NOT_DEBUG_RETURN;
>>>>>>
>>>>>> Not sure what the correct thing is here, but the GC code seems to
>>>>>> use PRODUCT_RETURN more than NOT_DEBUG_RETURN. I guess the only
>>>>>> difference is for optimized builds (and of course using #ifndef
>>>>>> PRODUCT instead of #ifdef ASSERT).
>>>>> I didn't want it doing any work in optimized builds so that's why
>>>>> it's NOT_DEBUG_RETURN.
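For reference, the difference boils down to roughly this (a simplified sketch
of the definitions in utilities/macros.hpp, not a verbatim copy):

  #ifdef ASSERT
    #define NOT_DEBUG_RETURN     /* declaration only; body lives in the .cpp */
  #else
    #define NOT_DEBUG_RETURN {}  // empty body in optimized AND product builds
  #endif

  #ifndef PRODUCT
    #define PRODUCT_RETURN       /* body is still compiled in optimized builds */
  #else
    #define PRODUCT_RETURN   {}
  #endif

So NOT_DEBUG_RETURN is the natural choice when the check should only ever run
under ASSERT.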
>>>>>
>>>>>>
>>>>>> safepoint.cpp
>>>>>>
>>>>>> Not strictly related to your change, but the
>>>>>> SafepointSynchronize::begin() method is missing one indentation
>>>>>> level due to the scoping that starts on line 139:
>>>>> Yes, I noticed that too. The whole thing could be factored better
>>>>> too. I already pushed so I can't fix it. I've got another change
>>>>> coming so maybe I'll fix it in there. Thanks.
>>>>>
>>>>> tom
>>>>>
>>>>>> 139 {
>>>>>> 140 MutexLocker mu(Safepoint_lock);
>>>>>> 141
>>>>>> 142 // Reset the count of active JNI critical threads
>>>>>> 143 _current_jni_active_count = 0;
>>>>>>
>>>>>> This scope ends at the end of the method:
>>>>>>
>>>>>> 398 if (PrintSafepointStatistics) {
>>>>>> 399 // Record how much time spend on the above cleanup tasks
>>>>>> 400 update_statistics_on_cleanup_end(os::javaTimeNanos());
>>>>>> 401 }
>>>>>> 402 }
>>>>>> 403 }
>>>>>>
>>>>>> It seems to me that the scope is not really necessary. In that
>>>>>> case, can we remove it? It would be nice to have the method
>>>>>> correctly indented. If we decide to keep the scope I'd prefer a
>>>>>> comment on line 402, to indicate that it is the end of the
>>>>>> MutexLocker scope.
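If the scope stays, the closing-brace comment could be as small as:

  402   }  // end of MutexLocker mu(Safepoint_lock) scope
  403 }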
>>>>>>
>>>>>> Bengt
>>>>>>
>>>>>> On 2012-01-30 03:26, David Holmes wrote:
>>>>>>> Hi Tom,
>>>>>>>
>>>>>>> On 30/01/2012 10:30 AM, Tom Rodriguez wrote:
>>>>>>>> On Jan 29, 2012, at 3:53 PM, David Holmes wrote:
>>>>>>>>
>>>>>>>>> On 28/01/2012 5:25 AM, Vladimir Kozlov wrote:
>>>>>>>>>> I change alias to hotspot-dev since it affects everyone.
>>>>>>>>> Thanks Vladimir!
>>>>>>>>>
>>>>>>>>> Overall I find it difficult to follow the protocol that applies
>>>>>>>>> here - in particular how safepoints can and can not appear
>>>>>>>>> between different methods and the way in which needs_gc() might
>>>>>>>>> change value between the enter and exit of a critical region.
>>>>>>>>> Without that understanding it is hard to evaluate the
>>>>>>>>> correctness of the new approach.
>>>>>>>> I tried to put in enough asserts to ensure that it was clear
>>>>>>>> what invariants are in force when updating the state. Safepoints
>>>>>>>> can appear wherever they normally can appear, while locking and
>>>>>>>> waiting on mutexes. _needs_gc can only change from false to true
>>>>>>>> during a safepoint. For the most part everything operates
>>>>>>>> exactly as it used to once _needs_gc is set and the only trick
>>>>>>>> is computing the correct value of _jni_active_count when setting
>>>>>>>> _needs_gc to be true. The debug code that still performs the
>>>>>>>> atomics attempts to verify that the computed value and actual
>>>>>>>> value are in sync. Is there any more that I can explain about
>>>>>>>> how it operates?
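As a self-contained sketch of that debug cross-check (the field names here are
illustrative, not the actual gcLocker members):

  #include <cassert>

  struct GCLockerSketch {
    // kept eagerly with atomics, but only in debug builds
    volatile int _debug_jni_active_count;
    // computed lazily by the safepoint machinery once _needs_gc is set
    int          _jni_active_count;

    void verify_critical_count() const {
      assert(_debug_jni_active_count == _jni_active_count,
             "eager and lazily computed critical counts disagree");
    }
  };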
>>>>>>> It's my ignorance of the overall operation that makes evaluating
>>>>>>> this change difficult for me. But thanks to your side-band emails
>>>>>>> I now have a much clearer understanding of things - thanks. And the
>>>>>>> protocol seems safe.
>>>>>>>
>>>>>>>>> src/share/vm/memory/gcLocker.cpp
>>>>>>>>>
>>>>>>>>> 72 _wait_begin = os::javaTimeMillis();
>>>>>>>>> 135 os::javaTimeMillis() - _wait_begin,
>>>>>>>>>
>>>>>>>>> Are you sure you want to use the non-monotonic javaTimeMillis
>>>>>>>>> here? Use of this was recently changed elsewhere in the GC code.
>>>>>>>> It's only printing code, but I'll switch it to use
>>>>>>>> tty->timestamp().milliseconds() which will make it match our
>>>>>>>> internal time stamps, which seems like a useful property. Is
>>>>>>>> that ok?
>>>>>>> The recent GC changes switched to using javaTimeNanos() converted
>>>>>>> to millis - see 7117303 and 7129514. The TimeStamp is based on
>>>>>>> elapsedCounter which is still a time-of-day source on some
>>>>>>> platforms and so not monotonic.
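In other words, roughly this shape (a sketch of the pattern, not the actual
7117303 patch):

  // Derive the printed millisecond values from the monotonic nanosecond
  // clock instead of the wall clock, so logged intervals cannot go negative.
  static jlong now_millis_from_nanos() {
    return os::javaTimeNanos() / 1000000;  // 1e6 ns per ms
  }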
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> I've updated the webrev.
>>>>>>>>
>>>>>>>> tom
>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> In GC_locker::check_active_before_gc(), move the wait_begin
>>>>>>>>>> initialization under the Print* check, since it is used only under
>>>>>>>>>> such checks. Could the method check_active_before_gc() be called
>>>>>>>>>> from different threads concurrently? Does it need a lock? Note that
>>>>>>>>>> the other methods have a lock.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>> Tom Rodriguez wrote:
>>>>>>>>>>> http://cr.openjdk.java.net/~never/7129164
>>>>>>>>>>> 200 lines changed: 126 ins; 33 del; 41 mod; 3958 unchg
>>>>>>>>>>>
>>>>>>>>>>> 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale
>>>>>>>>>>> Summary:
>>>>>>>>>>> Reviewed-by:
>>>>>>>>>>>
>>>>>>>>>>> The machinery for GC_locker which supports GetPrimitiveArrayCritical
>>>>>>>>>>> maintains a count of the number of threads that currently have raw
>>>>>>>>>>> pointers exposed. This is used to allow existing critical sections
>>>>>>>>>>> to drain before creating new ones so that a GC can occur. Currently
>>>>>>>>>>> these counts are updated using atomics all the time, even if a GC
>>>>>>>>>>> isn't being requested. This creates a scalability problem when a
>>>>>>>>>>> lot of threads are hammering atomic operations on the
>>>>>>>>>>> jni_lock_count. The count only needs to be valid when checking if a
>>>>>>>>>>> critical section is currently active and when draining the existing
>>>>>>>>>>> sections. The fix is to make the count be computed as part of the
>>>>>>>>>>> safepointing machinery and to only decrement the counters when
>>>>>>>>>>> _needs_gc is true. In debug mode the old count is maintained and
>>>>>>>>>>> validated against the lazily computed count.
>>>>>>>>>>>
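The shape of the change, as a self-contained sketch rather than the actual
gcLocker code (std::atomic and thread_local stand in for the VM's primitives,
and the names are illustrative):

  #include <atomic>

  thread_local int  local_depth = 0;             // per-thread critical depth
  std::atomic<bool> needs_gc{false};             // only flipped at a safepoint
  std::atomic<int>  active_critical_threads{0};  // set by the safepoint code
                                                 // (not shown) to the sum of
                                                 // local_depth over all threads

  void enter_critical() {
    // Fast path: bump a thread-local counter; no shared atomic traffic.
    // (The real code also blocks new entries once a GC is pending.)
    local_depth++;
  }

  void exit_critical() {
    local_depth--;
    if (needs_gc.load(std::memory_order_acquire) && local_depth == 0) {
      // Slow path, taken only while a GC is pending: report that one more
      // critical section has drained so the waiting GC can proceed.
      active_critical_threads.fetch_sub(1, std::memory_order_acq_rel);
    }
  }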
>>>>>>>>>>> On a microbenchmark that stresses GetPrimitiveArrayCritical with
>>>>>>>>>>> many threads and relatively short critical sections it can more
>>>>>>>>>>> than double the throughput. This also slightly speeds up normal
>>>>>>>>>>> GetPrimitiveArrayCritical calls. Mostly it's not easily measurable
>>>>>>>>>>> with larger-scale programs.
>>>>>>>>>>>
>>>>>>>>>>> Tested with a microbenchmark that stresses GetPrimitiveArrayCritical
>>>>>>>>>>> and the crypto benchmarks of specjvm2008 on x86 and sparc. I also
>>>>>>>>>>> ran the java.io regression tests from the JDK.
>>>>>>>>>>>
>