RFR(M): 8166317: InterpreterCodeSize should be computed

Volker Simonis volker.simonis at gmail.com
Thu Aug 31 06:54:31 UTC 2017


Hi,

can I please have a review and sponsor for the following change:

http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317/
https://bugs.openjdk.java.net/browse/JDK-8166317

The template interpreter is currently generated into a fixed-size part
of the CodeCache whose size is a compile-time constant. This constant
(i.e. 'TemplateInterpreter::InterpreterCodeSize') is specified per
platform in templateInterpreterGenerator_<arch>.cpp and may depend on
various other compile-time configurations like for example JVMCI.
Also, this constant is quadrupled for debug builds in order to
accommodate the additional debugging code.
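
For illustration, the sizing roughly looks as follows in the sources
(a condensed sketch; the concrete value is illustrative and differs
per platform and configuration):

  // per-platform size 'guess' (illustrative value):
  const static int InterpreterCodeSize = 256 * 1024;

  // at interpreter creation time, debug builds quadruple the allocation:
  int code_size = InterpreterCodeSize;
  NOT_PRODUCT(code_size *= 4;)  // debug VMs generate extra code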

The problem with this approach is that we have to 'guess' a good value
for 'InterpreterCodeSize'. If the value is too big, we unnecessarily
waste CodeCache space because the unused part of the memory allocated
for the interpreter is never returned to the CodeCache. If the value
is too small, the VM may fail to initialize with "not enough space for
interpreter generation". The situation is
further complicated by the fact that some dynamic, run-time
configuration options like debugging/JVMTI, compressed oops, implicit
null checks, etc. may influence the size of the generated interpreter.

Currently, the used/wasted ratio for the interpreter part of the code
cache (which can be dumped with -XX:+PrintInterpreter) looks as
follows for jdk10 on Linux/x86_64:

dbg/JVMTI
-------------
code size        =    475K bytes
total space      =   1071K bytes
wasted space     =    596K bytes

dbg
-------------
code size        =    262K bytes
total space      =   1071K bytes
wasted space     =    809K bytes

opt/JVMTI
-------------
code size        =    195K bytes
total space      =    267K bytes
wasted space     =     72K bytes

opt
-------------
code size        =    124K bytes
total space      =    267K bytes
wasted space     =    143K bytes

Unfortunately it is not easy to compute the size of the generated
interpreter dynamically (we would actually have to generate it twice
in order to do that). It is also not easy to generate the interpreter
into a bigger, temporary buffer and move it into an exactly sized
buffer afterwards, because the interpreter code is not relocatable
(and the assignment of the various entry points is spread out over
many code locations).
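
As an example of why the code is not relocatable, entry points are
recorded as absolute addresses while the code is being emitted (a
condensed example of the pattern used in the generator; '__' expands
to '_masm->'):

  // recorded absolute address - it would dangle if the generated code
  // was later copied to a different location:
  Interpreter::_rethrow_exception_entry = __ pc();
  // ... the code for rethrowing an exception is emitted here ...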

But what we can actually do quite easily is to return the unused part
of the initially allocated memory back to the CodeCache. This is
possible for two reasons. First, the interpreter codelets and stubs
are generated "densely" (see CodeletMark constructor/destructor), i.e.
the unused space of the initially allocated memory is located at the
end of the reserved memory. Second, the interpreter is generated in a
very early stage during VM startup ('interpreter_init()' is called
early from 'init_globals()'). During this early stage we're still
single threaded and can be sure that nobody else is allocating from
the CodeCache while we're generating the interpreter.
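
To illustrate the first point, here is a minimal sketch of how the
codelets get packed (hypothetical shape, not the literal HotSpot
code):

  // Each codelet is generated under a CodeletMark whose constructor
  // requests the remaining free space and whose destructor commits only
  // the bytes actually emitted - so the codelets are packed back-to-back
  // and all unused space accumulates as a single tail:
  { CodeletMark cm(_masm, "a bytecode/stub description");
    // ... generate the codelet through _masm ...
  } // destructor trims the allocation to the emitted size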

So I've introduced a new method 'CodeCache::free_unused_tail(CodeBlob*
cb, size_t used)' which frees the unused tail of the interpreter
CodeBlob. It has a guarantee which makes sure that it is only called for
the interpreter CodeBlob. 'free_unused_tail()' calls
'CodeHeap::deallocate_tail(void* p, size_t used_size)' on the
corresponding CodeHeap which in turn checks (with another guarantee)
that there have been no intermediate allocations and returns the
unused tail of the corresponding HeapBlock back to the CodeHeap.
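
In condensed form the deallocation looks roughly like this (a sketch
with simplified, partly hypothetical names; the real implementation
works in segment units):

  // CodeHeap::deallocate_tail(void* p, size_t used_size) - sketch:
  HeapBlock* b = (HeapBlock*)p - 1;            // block header precedes 'p'
  size_t used = size_to_segments(used_size + header_size());
  // no intermediate allocation may have happened - 'b' must still be
  // the block ending at the current top of the heap:
  guarantee(b == block_at(_next_segment - b->length()), "intermediate allocation!");
  _next_segment -= b->length() - used;         // give the tail back
  b->set_length(used);                         // blob keeps only the used part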

With this change, there's no more waste in the CodeCache after
interpreter generation and the output of -XX:+PrintInterpreter looks
as follows:

dbg/JVMTI
-------------
code size        =    475K bytes
total space      =    475K bytes
wasted space     =      0K bytes

dbg
-------------
code size        =    262K bytes
total space      =    262K bytes
wasted space     =      0K bytes

opt/JVMTI
-------------
code size        =    195K bytes
total space      =    195K bytes
wasted space     =      0K bytes

opt
-------------
code size        =    124K bytes
total space      =    124K bytes
wasted space     =      0K bytes

So in the normal case (product build without debugging) we're saving
143K of CodeCache. While this is not overly impressive, I think the
major benefit of this change is that we can increase the default value
for 'InterpreterCodeSize' in the future without much reasoning (e.g.
doubling it if it is too small) because the unused part will be
returned back to the CodeCache.

I've successfully (with the exception described below) tested the
change by running the hotspot JTreg tests with and without
SegmentedCodeCache.

While working on this, I found another problem which is related to the
fix of JDK-8183573 and leads to crashes when executing the JTreg test
compiler/codecache/stress/ReturnBlobToWrongHeapTest.java.

The problem is that JDK-8183573 replaced

  virtual bool contains_blob(const CodeBlob* blob) const {
    return low_boundary() <= (char*) blob && (char*) blob < high();
  }

by:

  bool contains_blob(const CodeBlob* blob) const {
    return contains(blob->code_begin());
  }

But that may be wrong in the corner case where the size of the
CodeBlob's payload is zero (i.e. the CodeBlob consists only of the
'header', i.e. the C++ object itself), because in that case
CodeBlob::code_begin() points right behind the CodeBlob's header, to a
memory location which doesn't belong to the CodeBlob anymore.
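
To make this concrete (an illustrative sketch, with 'H' standing for
the size of the blob's header):

  // CodeBlob layout: [ header (the C++ object) | payload (code) ]
  // With a zero-sized payload, code_begin() == (char*)blob + H.
  //
  // If such a blob is the last allocation in a CodeHeap ending at 'high':
  //
  //   blob               == high - H   // still inside this heap
  //   blob->code_begin() == high       // first byte *behind* this heap -
  //                                    //  possibly the start of the next one
  //
  // so contains(blob->code_begin()) answers 'false' for the owning heap
  // and may answer 'true' for the adjacent heap.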

This exact corner case is exercised by ReturnBlobToWrongHeapTest which
allocates CodeBlobs of size zero (i.e. zero 'payload') with the help
of sun.hotspot.WhiteBox.allocateCodeBlob() until the CodeCache fills
up. The test first fills the 'non-profiled nmethods' CodeHeap. If the
'non-profiled nmethods' CodeHeap is full, the VM automatically tries
to allocate from the 'profiled nmethods' CodeHeap until that fills up
as well. But in the CodeCache the 'profiled nmethods' CodeHeap is
located right before the 'non-profiled nmethods' CodeHeap. So if the
last CodeBlob allocated from the 'profiled nmethods' CodeHeap has a
payload size of zero and uses up all the CodeHeap's remaining size, we
will end up with a CodeBlob whose code_begin() address will point
right behind the actual CodeHeap (i.e. it will point right at the
beginning of the adjacent, 'non-profiled nmethods' CodeHeap). This
will cause the following guarantee to fire when we try to free the
last allocated CodeBlob (i.e. the one allocated at the end of the
'profiled nmethods' CodeHeap whose code_begin() address points at the
beginning of the adjacent 'non-profiled nmethods' CodeHeap) with
sun.hotspot.WhiteBox.freeCodeBlob():

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (heap.cpp:248), pid=27586, tid=27587
#  guarantee((char*) b >= _memory.low_boundary() && (char*) b <
_memory.high()) failed: The block to be deallocated 0x00007fffe6666f80
is not within the heap starting with 0x00007fffe6667000 and ending
with 0x00007fffe6ba000

The fix is trivial so I'll include it in this change: just revert the
part of JDK-8183573 mentioned before such that contains_blob() again
uses the address of the CodeBlob itself instead of the blob's
code_begin() address.
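
That is, contains_blob() again becomes something along the lines of:

  bool contains_blob(const CodeBlob* blob) const {
    return low_boundary() <= (char*) blob && (char*) blob < high();
  }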

Thank you and best regards,
Volker

