RFR (S) CR 7177472: JSR292: MethodType interning penalizes scalability

Aleksey Shipilev aleksey.shipilev at oracle.com
Fri Jun 7 10:01:16 UTC 2013


(posting this to hotspot-dev@ and cc-ing core-libs-dev@, as Christian T.
suggested offline)

Hi guys,

The fix for the scalability problem is here:
  http://cr.openjdk.java.net/~shade/7177472/webrev.00/

Testing:
  - Linux x86_64 builds OK
  - Linux x86_64 java/lang/invoke/ jtreg passes OK

Since this is a proof-of-concept patch at this point, I kept the testing
minimal. Let me know what other testing you think is needed before
integration.

Thanks,
Aleksey.

*** Rationale and details for the issue follow: ***

This issue has been in limbo for a year. The performance issue with the
MethodType.methodType factory leads to abysmal scalability on the
targeted tests. While this can arguably be demonstrated on larger
workloads, the targeted microbenchmarks showcase it nicely. This
issue was blocking the JSR292 API performance testing, because on a
large enough test server you will inevitably be bitten by this.

If you run the JMH-enabled [1] microbenchmark:
 http://openjdk.java.net/projects/code-tools/jmh/
 http://cr.openjdk.java.net/~shade/7177472/jsr292bench.tar.gz

...on the current jdk8/jdk8, 2x2 i5 2.0 GHz, Linux x86_64, you will see
the scalability degrading. ("testCached" makes sure the reference to the
MethodType is always strongly reachable, "testNew" makes sure the
reference is mostly unreachable, prompting cache re-fills, "testOOB"
takes no precautions about that, and also does not suffer from
the static overheads entailed by the reference management in the first
two tests.)
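
For reference, a rough sketch of what the three variants might look like.
The actual benchmark is in the tarball linked above; the annotation names
here follow later JMH releases, and the reference management is simplified
for illustration:

import java.lang.invoke.MethodType;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class MethodTypeBench {

    // Holds the last looked-up MethodType to control its reachability.
    private MethodType cached;

    @Benchmark
    public MethodType testCached() {
        // Store the result so the interned instance stays strongly
        // reachable, and the factory always hits the intern cache.
        cached = MethodType.methodType(void.class, int.class);
        return cached;
    }

    @Benchmark
    public MethodType testNew() {
        // Retain no strong reference: the weakly-held cache entry can be
        // cleared between invocations, prompting cache re-fills.
        cached = null;
        return MethodType.methodType(void.class, int.class);
    }

    @Benchmark
    public MethodType testOOB() {
        // No reference management at all, and none of its static overheads.
        return MethodType.methodType(void.class, int.class);
    }
}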

1 thread:
MethodTypeBench.testCached     78 +-  1 nsec/op
MethodTypeBench.testNew       120 +-  1 nsec/op
MethodTypeBench.testOOB        41 +-  1 nsec/op

4 threads:
MethodTypeBench.testCached    495 +- 40 nsec/op
MethodTypeBench.testNew       611 +- 37 nsec/op
MethodTypeBench.testOOB       377 +- 10 nsec/op

In fact, on larger machines it may be as bad as tens/hundreds of
microseconds to look up the MethodType. With the patch applied, on the
same 2x2 machine:

1 thread:
MethodTypeBench.testCached     92 +-  1 nsec/op
MethodTypeBench.testNew       101 +-  1 nsec/op
MethodTypeBench.testOOB        49 +-  1 nsec/op

4 threads:
MethodTypeBench.testCached    129 +-  1 nsec/op
MethodTypeBench.testNew       142 +- 10 nsec/op
MethodTypeBench.testOOB        89 +-  3 nsec/op


...which means the scalability is back: the average time to look up the
MethodType does not grow dramatically as the thread count goes up.
Again, the effect is astonishing on large machines.

Notice that we take some hit in single-threaded performance, and that is
due to two reasons: a) the WeakEntry allocation in get(); b) CHM overheads.
With the upcoming CHMv8, this starts to look better:

1 thread:
MethodTypeBench.testCached     87 +-  1 nsec/op
MethodTypeBench.testNew        95 +-  1 nsec/op
MethodTypeBench.testOOB        52 +-  1 nsec/op

4 threads:
MethodTypeBench.testCached    121 +-  2 nsec/op
MethodTypeBench.testNew       141 +-  1 nsec/op
MethodTypeBench.testOOB        93 +-  5 nsec/op

...which seems to indicate this change is future-proof.
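
For context on the approach (the actual code is in the webrev above, not
here), a rough sketch of a ConcurrentHashMap-backed weak interning table
could look like the following. The WeakInternTable name is made up for
illustration, and expunging of cleared entries is omitted; it also shows
where the single-threaded hit mentioned above comes from, namely the
WeakEntry allocated on every lookup:

import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Interning cache backed by a ConcurrentHashMap of weak entries: lookups
// are lock-free reads, and interned values can be reclaimed once they are
// no longer strongly reachable elsewhere.
class WeakInternTable<T> {

    // Weak reference that caches the referent's hash code, so the entry
    // stays addressable in the map even after the referent is cleared.
    private static final class WeakEntry<T> extends WeakReference<T> {
        private final int hash;

        WeakEntry(T referent) {
            super(referent);
            this.hash = referent.hashCode();
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof WeakEntry)) return false;
            Object mine = get();
            Object theirs = ((WeakEntry<?>) obj).get();
            // A cleared entry is only equal to itself.
            return (mine != null) && mine.equals(theirs);
        }
    }

    private final ConcurrentMap<WeakEntry<T>, WeakEntry<T>> map =
            new ConcurrentHashMap<>();

    // Returns the canonical instance for elem, installing it if absent.
    public T intern(T elem) {
        WeakEntry<T> key = new WeakEntry<>(elem);   // allocated on every lookup
        WeakEntry<T> existing = map.get(key);
        if (existing != null) {
            T value = existing.get();
            if (value != null) {
                return value;                       // cache hit
            }
        }
        // Miss or cleared entry: try to install ours; another thread may win.
        WeakEntry<T> raced = map.putIfAbsent(key, key);
        if (raced != null) {
            T value = raced.get();
            if (value != null) {
                return value;
            }
        }
        // Simplified: if the winning entry was already cleared, just hand
        // back our own instance.
        return elem;
    }
}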


