RFR (S) CR 7177472: JSR292: MethodType interning penalizes scalability
Peter Levart
peter.levart at gmail.com
Sat Jun 8 20:25:19 UTC 2013
On 06/08/2013 10:17 PM, Peter Levart wrote:
>
> On 06/08/2013 12:21 PM, Remi Forax wrote:
>> On 06/07/2013 12:01 PM, Aleksey Shipilev wrote:
>>> (posting this to hotspot-dev@ and cc-ing core-libs-dev@, as
>>> Christian T.
>>> suggested offline)
>>>
>>> Hi guys,
>>
>> Hi Aleksey,
>>
>>>
>>> The fix for scalability problem is here:
>>> http://cr.openjdk.java.net/~shade/7177472/webrev.00/
>>
>> In add, the do/while does something a little weird: if putIfAbsent
>> returns null, interned is equal to elem, so there is no need to call
>> e.get() in that case. I would prefer code like this:
>>
>> T interned;
>> do {
>>     expungeStaleElements();
>>     WeakEntry<T> e = new WeakEntry<>(elem, stale);
>>     WeakEntry<T> exist = map.putIfAbsent(e, e);
>>     interned = (exist == null) ? elem : exist.get();
>> } while (interned == null);
>> return interned;
>
> Hi Remi, Aleksey,
>
> In case the loop retries, there's no need to construct another
> WeakEntry...
>
> T interned;
> WeakEntry<T> e = new WeakEntry<>(elem, stale);
> do {
>     expungeStaleElements();
>     WeakEntry<T> exist = map.putIfAbsent(e, e);
>     interned = (exist == null) ? elem : exist.get();
> } while (interned == null);
> return interned;
>
> Regards, Peter
Aleksey,
Also, in the retry loop you might be spin-waiting for the ReferenceHandler
thread to transfer the cleared WeakEntry to the ReferenceQueue. You
might want to do a map.replace(e, exist, e) when exist != null and
exist.get() == null, to make the loop independent of the ReferenceHandler
thread's processing. In that case a single out-of-loop call to
expungeStaleElements() would be enough.
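For illustration, a minimal self-contained sketch of that replace-based variant might look like this (the WeakInterner class name, its fields, and the main method are hypothetical, not the actual webrev code):

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: when the existing entry has been cleared but not
// yet enqueued, replace it directly instead of spin-waiting for the
// ReferenceHandler thread to move it to the ReferenceQueue.
class WeakInterner<T> {
    private final ConcurrentHashMap<WeakEntry<T>, WeakEntry<T>> map =
            new ConcurrentHashMap<>();
    private final ReferenceQueue<T> stale = new ReferenceQueue<>();

    public T add(T elem) {
        expungeStaleElements();                 // single out-of-loop cleanup
        WeakEntry<T> e = new WeakEntry<>(elem, stale);
        T interned;
        do {
            WeakEntry<T> exist = map.putIfAbsent(e, e);
            if (exist == null) {
                interned = elem;                // we inserted our own entry
            } else {
                interned = exist.get();
                if (interned == null) {
                    // cleared but not yet expunged: swap in our live entry,
                    // independent of ReferenceHandler progress, then retry
                    map.replace(e, exist, e);
                }
            }
        } while (interned == null);
        return interned;
    }

    private void expungeStaleElements() {
        Object ref;
        while ((ref = stale.poll()) != null) {
            map.remove(ref);
        }
    }

    static class WeakEntry<T> extends WeakReference<T> {
        private final int hash;                 // cached: survives clearing

        WeakEntry(T referent, ReferenceQueue<T> q) {
            super(referent, q);
            hash = referent.hashCode();
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object obj) {
            if (!(obj instanceof WeakEntry)) {
                return false;
            }
            Object that = ((WeakEntry<?>) obj).get();
            Object mine = get();
            // fall back to identity once either referent is cleared
            return (that == null || mine == null) ? this == obj : mine.equals(that);
        }
    }

    public static void main(String[] args) {
        WeakInterner<String> interner = new WeakInterner<>();
        String a = new String("hello");
        System.out.println(interner.add(a) == a);                   // true
        System.out.println(interner.add(new String("hello")) == a); // true
    }
}
```

With the replace call, a retry can only loop while another thread keeps winning the race with a live entry, never while waiting on reference processing.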
Peter
>
>>
>> The cast to WeakEntry in expungeStaleElements is not needed.
>>
>> In WeakEntry.equals, the null check is not needed because null
>> instanceof WeakEntry returns false,
>> and you don't need to cast obj.get() to a WeakEntry<T>, given that you
>> only need to call equals on the entry's referent.
>> So the code could be:
>>
>>
>> public boolean equals(Object obj) {
>>     if (!(obj instanceof WeakEntry)) {
>>         return false;
>>     }
>>     Object that = ((WeakEntry<?>) obj).get();
>>     Object mine = get();
>>     return (that == null || mine == null) ? this == obj : mine.equals(that);
>> }
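Rémi's point about the redundant null check can be confirmed directly: the JLS specifies that instanceof evaluates to false for a null operand, so equals(null) falls out at the instanceof guard without any NPE. A tiny standalone demonstration (the class and method names here are hypothetical):

```java
public class InstanceofNullDemo {
    // Mirrors the !(obj instanceof WeakEntry) guard in the equals() above,
    // using String as a stand-in type.
    static boolean isString(Object obj) {
        return obj instanceof String;
    }

    public static void main(String[] args) {
        // instanceof is false for null, so no explicit null check is needed.
        System.out.println(isString(null));  // prints "false"
        System.out.println(isString("abc")); // prints "true"
    }
}
```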
>>
>>
>> Otherwise, it looks good.
>>
>> Rémi
>>
>>>
>>> Testing:
>>> - Linux x86_64 builds OK
>>> - Linux x86_64 java/lang/invoke/ jtreg passes OK
>>>
>>> Since this is a proof-of-concept patch at this point, I kept the
>>> testing
>>> minimal. Let me know what other testing you think is needed before the
>>> integration.
>>>
>>> Thanks,
>>> Aleksey.
>>>
>>> *** Rationale and details for the issue follow: ***
>>>
>>> This issue was in limbo for a year. The performance issue with the
>>> MethodType.methodType factory leads to abysmal scalability on the
>>> targeted tests. While this can arguably be demonstrated on larger
>>> workloads, the targeted microbenchmarks showcase it nicely. This
>>> issue was blocking the JSR292 API performance testing, because on a
>>> large enough test server you will inevitably be bitten by this.
>>>
>>> If you run the JMH-enabled [1] microbenchmark:
>>> http://openjdk.java.net/projects/code-tools/jmh/
>>> http://cr.openjdk.java.net/~shade/7177472/jsr292bench.tar.gz
>>>
>>> ...on the current jdk8/jdk8, 2x2 i5 2.0 GHz, Linux x86_64, you will see
>>> that scalability degrades. ("testCached" makes sure the reference to
>>> the MethodType is always strongly reachable; "testNew" makes sure the
>>> reference is mostly non-reachable, prompting a cache re-fill; "testOOB"
>>> takes no precautions about that, and also does not suffer from the
>>> static overheads entailed by the reference management in the first two
>>> tests.)
>>>
>>> 1 thread:
>>> MethodTypeBench.testCached 78 +- 1 nsec/op
>>> MethodTypeBench.testNew 120 +- 1 nsec/op
>>> MethodTypeBench.testOOB 41 +- 1 nsec/op
>>>
>>> 4 threads:
>>> MethodTypeBench.testCached 495 +- 40 nsec/op
>>> MethodTypeBench.testNew 611 +- 37 nsec/op
>>> MethodTypeBench.testOOB 377 +- 10 nsec/op
>>>
>>> In fact, on larger machines it may be as bad as tens/hundreds of
>>> microseconds to look up the MethodType. With the patch applied, on the
>>> same 2x2 machine:
>>>
>>> 1 thread:
>>> MethodTypeBench.testCached 92 +- 1 nsec/op
>>> MethodTypeBench.testNew 101 +- 1 nsec/op
>>> MethodTypeBench.testOOB 49 +- 1 nsec/op
>>>
>>> 4 threads:
>>> MethodTypeBench.testCached 129 +- 1 nsec/op
>>> MethodTypeBench.testNew 142 +- 10 nsec/op
>>> MethodTypeBench.testOOB 89 +- 3 nsec/op
>>>
>>>
>>> ...which means the scalability is back: the average time to look up the
>>> MethodType does not grow spectacularly as the thread count goes up.
>>> Again, the effect is astonishing on large machines.
>>>
>>> Notice that we took some hit in single-threaded performance, and
>>> that is
>>> due to two reasons: a) WeakEntry allocation in get(); b) CHM. With the
>>> upcoming CHMv8, this starts to look better:
>>>
>>> 1 thread:
>>> MethodTypeBench.testCached 87 +- 1 nsec/op
>>> MethodTypeBench.testNew 95 +- 1 nsec/op
>>> MethodTypeBench.testOOB 52 +- 1 nsec/op
>>>
>>> 4 threads:
>>> MethodTypeBench.testCached 121 +- 2 nsec/op
>>> MethodTypeBench.testNew 141 +- 1 nsec/op
>>> MethodTypeBench.testOOB 93 +- 5 nsec/op
>>>
>>> ...which seems to indicate this change is future-proof.
>>
>