review for 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale

Thu Feb 2 10:03:38 PST 2012

>> 
>> First, I was not really arguing for doing this one way or the other. It
>> was just not clear to me from the review comments if you wanted to
>> change to javaTimeNanos() or not.
>> 
>> You are right about the use in the GC code. We recently changed to use
>> javaTimeNanos() for time intervals, but we use
>> tty->timestamp().seconds() to print the time stamps. If I remember
>> correctly, you were logging how long a thread had to wait, so it seemed
>> like an interval to me. But, as I said, I was mostly just wondering what
>> the decision was.
>> 
>> It would probably be a good idea to use the same time source for time
>> stamps and intervals. When we were using javaTimeMillis() I think it was
>> consistent, but maybe we should now change to javaTimeNanos() for
>> tty->timestamp()?
>> 
>> I can't say I know the implications of changing tty->timestamp() to use
>> the same timesource as javaTimeNanos(). Currently they boil down to the
>> same OS calls on Windows, but on Linux and Solaris the implementations
>> are quite different.
> 
> The primary difference is that javaTimeMillis reports the time-of-day, while javaTimeNanos does not. If you really want a timestamp that you can correlate with other time-related events then javaTimeMillis is what you need, but then you have to deal with lower resolution and non-monotonicity. A compromise is a time-source with the resolution of nanoTime which is sync'ed to time-of-day at VM start (this is what the real-time clock gives you in the Real-Time Specification for Java).

I think we should implement os::elapsed_counter in terms of javaTimeNanos and that would bring these sources into agreement.  On Solaris they both use getTimeNanos, which is basically gethrtime.  On linux elapsed_counter is gettimeofday and javaTimeNanos uses clock_gettime if it's supported.  It looks like the sources just diverged and elapsed_counter was never updated.  On windows it's also the same though the implementation equivalence is somewhat complicated.  It might be most straightforward to move elapsed_counter into os.cpp and implement it in terms of javaTimeNanos.  Is there any reason that javaTimeMillis shouldn't be implemented in terms of javaTimeNanos too?

tom

> 
> David
> -----
> 
>> Bengt
>> 
>> 
>> 
>> 
>> 
>> 
>> Bengt
>> 
>>> 
>>> tom
>>> 
>>>> Bengt
>>>> 
>>>>> tom
>>>>> 
>>>>>>> 
>>>>>>> gcLocker.hpp
>>>>>>> 
>>>>>>> 84 static void verify_critical_count() NOT_DEBUG_RETURN;
>>>>>>> 
>>>>>>> Not sure what the correct thing is here, but the GC code seems to
>>>>>>> use PRODUCT_RETURN more than NOT_DEBUG_RETURN. I guess the only
>>>>>>> difference is for optimized builds (and of course using #ifndef
>>>>>>> PRODUCT instead of #ifdef ASSERT).
>>>>>> I didn't want it doing any work in optimized builds so that's why
>>>>>> it's NOT_DEBUG_RETURN.
>>>>>> 
>>>>>>> 
>>>>>>> safepoint.cpp
>>>>>>> 
>>>>>>> Not strictly related to your change, but the
>>>>>>> SafepointSynchronize::begin() method is missing one indentation
>>>>>>> level due to the the scoping that starts on line 139:
>>>>>> Yes, I noticed that too. The whole thing could be factored better
>>>>>> too. I already pushed so I can't fix it. I've got another change
>>>>>> coming so maybe I'll fix it in there. Thanks.
>>>>>> 
>>>>>> tom
>>>>>> 
>>>>>>> 139 {
>>>>>>> 140 MutexLocker mu(Safepoint_lock);
>>>>>>> 141
>>>>>>> 142 // Reset the count of active JNI critical threads
>>>>>>> 143 _current_jni_active_count = 0;
>>>>>>> 
>>>>>>> This scope ends at the end of the method:
>>>>>>> 
>>>>>>> 398 if (PrintSafepointStatistics) {
>>>>>>> 399 // Record how much time spend on the above cleanup tasks
>>>>>>> 400 update_statistics_on_cleanup_end(os::javaTimeNanos());
>>>>>>> 401 }
>>>>>>> 402 }
>>>>>>> 403 }
>>>>>>> 
>>>>>>> It seems to me that the scope is not really necessary. In that
>>>>>>> case, can we remove it? It would be nice to have the method
>>>>>>> correctly indented. If we decide to keep the scope I'd prefer a
>>>>>>> comment on line 402, to indicate that it is the end of the
>>>>>>> MutexLocker scope.
>>>>>>> 
>>>>>>> Bengt
>>>>>>> 
>>>>>>> On 2012-01-30 03:26, David Holmes wrote:
>>>>>>>> Hi Tom,
>>>>>>>> 
>>>>>>>> On 30/01/2012 10:30 AM, Tom Rodriguez wrote:
>>>>>>>>> On Jan 29, 2012, at 3:53 PM, David Holmes wrote:
>>>>>>>>> 
>>>>>>>>>> On 28/01/2012 5:25 AM, Vladimir Kozlov wrote:
>>>>>>>>>>> I change alias to hotspot-dev since it affects everyone.
>>>>>>>>>> Thanks Vladimir!
>>>>>>>>>> 
>>>>>>>>>> Overall I find it difficult to follow the protocol that applies
>>>>>>>>>> here - in particular how safepoints can and can not appear
>>>>>>>>>> between different methods and the way in which needs_gc() might
>>>>>>>>>> change value between the enter and exit of a critical region.
>>>>>>>>>> Without that understanding it is hard to evaluate the
>>>>>>>>>> correctness of the new approach.
>>>>>>>>> I tried to put in enough asserts to ensure that it was clear
>>>>>>>>> what invariants are in force when updating the state. Safepoints
>>>>>>>>> can appear wherever they normally can appear, while locking and
>>>>>>>>> waiting on mutexes. _needs_gc can only change from false to true
>>>>>>>>> during a safepoint. For the most part everything operates
>>>>>>>>> exactly as it used to once _needs_gc is set and the only trick
>>>>>>>>> is computing the correct value of _jni_active_count when setting
>>>>>>>>> _needs_gc to be true. The debug code that still performs the
>>>>>>>>> atomics attempts to verify that the computed value and actual
>>>>>>>>> value are in sync. Is there any more that I can explain about
>>>>>>>>> how it operates?
>>>>>>>> It's my ignorance of the overall operation that makes evaluating
>>>>>>>> this change difficult for me. But thanks to your side-band emails
>>>>>>>> I now have a much clear understanding of things - thanks. And the
>>>>>>>> protocol seems safe.
>>>>>>>> 
>>>>>>>>>> src/share/vm/memory/gcLocker.cpp
>>>>>>>>>> 
>>>>>>>>>> 72 _wait_begin = os::javaTimeMillis();
>>>>>>>>>> 135 os::javaTimeMillis() - _wait_begin,
>>>>>>>>>> 
>>>>>>>>>> Are you sure you want to use the non-monotonic javaTimeMillis
>>>>>>>>>> here? Use of this was recently changed elsewhere in the GC code.
>>>>>>>>> It's only printing code, but I'll switch it to use
>>>>>>>>> tty->timestamp().milliseconds() which will make it match our
>>>>>>>>> internal time stamps, which seems like a useful property. Is
>>>>>>>>> that ok?
>>>>>>>> The recent GC changes switched to using javaTimeNanos() converted
>>>>>>>> to millis - see 7117303 and 7129514. The TimeStamp is based on
>>>>>>>> elapsedCounter which is still a time-of-day source on some
>>>>>>>> platforms and so not monotonic.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> 
>>>>>>>>> I've updated the webrev.
>>>>>>>>> 
>>>>>>>>> tom
>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> In GC_locker::check_active_before_gc() move wait_begin
>>>>>>>>>>> initialization
>>>>>>>>>>> under Print* check since it used only under such checks. Could
>>>>>>>>>>> method
>>>>>>>>>>> check_active_before_gc() be called from different threads
>>>>>>>>>>> concurrently?
>>>>>>>>>>> Does it need lock? Note that other methods have lock.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>> 
>>>>>>>>>>> Tom Rodriguez wrote:
>>>>>>>>>>>> http://cr.openjdk.java.net/~never/7129164
>>>>>>>>>>>> 200 lines changed: 126 ins; 33 del; 41 mod; 3958 unchg
>>>>>>>>>>>> 
>>>>>>>>>>>> 7129164: JNI Get/ReleasePrimitiveArrayCritical doesn't scale
>>>>>>>>>>>> Summary:
>>>>>>>>>>>> Reviewed-by:
>>>>>>>>>>>> 
>>>>>>>>>>>> The machinery for GC_locker which supports
>>>>>>>>>>>> GetPrimitiveArrayCritical
>>>>>>>>>>>> maintains a count of the number of threads that currently
>>>>>>>>>>>> have raw
>>>>>>>>>>>> pointers exposed. This is used to allow existing critical
>>>>>>>>>>>> sections to
>>>>>>>>>>>> drain before creating new ones so that a GC can occur. Currently
>>>>>>>>>>>> these counts are updated using atomic all the time, even if a
>>>>>>>>>>>> GC isn't
>>>>>>>>>>>> being requested. This creates scalability problem when a lot of
>>>>>>>>>>>> threads are hammering atomic operations on the
>>>>>>>>>>>> jni_lock_count. The
>>>>>>>>>>>> count only needs to be valid when checking if a critical
>>>>>>>>>>>> section is
>>>>>>>>>>>> currently active and when draining the existing sections. The
>>>>>>>>>>>> fix is
>>>>>>>>>>>> to make the count be computed as part of the safepointing
>>>>>>>>>>>> machinery
>>>>>>>>>>>> and to only decrement the counters when _needs_gc is true. In
>>>>>>>>>>>> debug
>>>>>>>>>>>> mode the old count is maintained and validated against the
>>>>>>>>>>>> lazily
>>>>>>>>>>>> computed count.
>>>>>>>>>>>> 
>>>>>>>>>>>> On a microbenchmark that stresses GetPrimitiveArrayCritical
>>>>>>>>>>>> with many
>>>>>>>>>>>> threads and relatively short critical section it can more
>>>>>>>>>>>> than double
>>>>>>>>>>>> the throughput. This also slightly speeds up the normal
>>>>>>>>>>>> GetPrimtiveArrayCritical calls. Mostly it's not easily
>>>>>>>>>>>> measurable
>>>>>>>>>>>> with larger scale programs.
>>>>>>>>>>>> 
>>>>>>>>>>>> Tested with microbenchmark that stresses
>>>>>>>>>>>> GetPrimtiveArrayCritical and
>>>>>>>>>>>> the crypto benchmarks of specjvm2008 on x86 and sparc. I also
>>>>>>>>>>>> ran the
>>>>>>>>>>>> java.io regression tests from the JDK.
>>>>>>>>>>>> 
>>