RFR (L): 8060025: Object copy time regressions after JDK-8031323 and JDK-8057536
thomas.schatzl at oracle.com
Wed Dec 3 12:41:32 UTC 2014
I would like to have reviews for the following change that improves
object copy time after we noticed performance regressions after the
changes in JDK-8031323 (alignment of survivor objects) and JDK-8057536
(context specific allocations).
In conjunction with JDK-8064473 (Improved handling of age during object
copy) the changes improve object copy time by ~8% on x64/linux and ~7%
on SPARC/solaris on SPECjbb2005.
There are no particular improvements on the scores though as there is
very little GC work done.
There seems to be some overall performance gain on CRM Fuse.
The changes include:
- merging of the FastCSetTable table with the GCAllocPurpose into a
table of in_cset_state_t. Each element not only contains information
about whether the region is humongous or not, but also what generation
it belongs to if it is in the collection set.
The encoding has been selected to allow good instruction encoding of
commonly used checks (e.g. in collection set or not, is humongous).
GCAllocPurpose has been removed.
- factor out plab allocation as fast-path for allocation from other
types of allocations. There have been a few renamings of methods to
(imo) make the various stages more clear. (i.e. The methods are not all
called "allocate" any more :))
- use a per-ParThreadScanState tenuring threshold.
- only calculate object age if required.
- some additional direct use of markOop contents instead of accessing
via the oop (like in JDK-8064473).
- manually extract some common subexpressions from the code that are not
obvious to the compiler.
There is no change in functionality, and the survivor alignment check
still has some minor performance impact. However imo these changes in
total outweigh its impact, so further attempts to factor this out (e.g.
templatizing) do not seem to have a good cost/benefit ratio.
We may still want to create an RFE that deals with that in a separate
change. There is enough good change in this change already to warrant
separate CRs if needed.
This work is largely based on changes from Tony Printezis at Twitter who
coincidentally has been working on this issue at the same time, and has
then been tweaked further (Thanks a lot!). Extensive performance testing
of many variants (of which this seems to be the best) has been performed
on internal test systems.
Tony reported even better improvements on some microbenchmarks on the
original version of the change.
As mentioned, unless the application is somewhat GC and object copy
heavy, there will not be much impact.
JPRT, specjbb2005/2013, CRM Fuse
More information about the hotspot-gc-dev