RFR(M): 8166317: InterpreterCodeSize should be computed

Thu Aug 31 12:35:34 UTC 2017

Hi Volker,

Thank you for doing this.  I will review this and sponsor it. 
Unfortunately the jdk10/hs repository is going to be closed for a few 
weeks while the consolidation effort is taking place, but that should 
give plenty of time for reviews.

thanks,
Coleen

On 8/31/17 2:54 AM, Volker Simonis wrote:
> Hi,
>
> can I please have a review and sponsor for the following change:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317/
> https://bugs.openjdk.java.net/browse/JDK-8166317
>
> The template interpreter is currently generated into a part of the
> CodeCache whose size is a fixed, compile-time constant. This constant (i.e.
> 'TemplateInterpreter::InterpreterCodeSize') is specified per platform
> in templateInterpreterGenerator_<arch>.cpp and may depend on various
> other compile-time configurations such as JVMCI. Also, this
> constant is quadrupled for debug builds in order to accommodate
> the additional debugging code.
>
> The problem with this approach is that we have to 'guess' a good value
> for 'InterpreterCodeSize'. If the value is too big, we will
> unnecessarily waste CodeCache because the unused parts of the code
> cache allocated for the interpreter won't be returned back to the
> CodeCache. If the value is too small the VM may fail to initialize
> with "not enough space for interpreter generation". The situation is
> further complicated by the fact that some dynamic, run-time
> configuration options like debugging/JVMTI, compressed oops, implicit
> null checks, etc. may influence the size of the generated interpreter.
>
> Currently, the used/wasted ratio for the interpreter part of the code
> cache (which can be dumped with -XX:+PrintInterpreter) looks as
> follows for jdk10 on Linux/x86_64:
>
> dbg/JVMTI
> -------------
> code size        =    475K bytes
> total space      =   1071K bytes
> wasted space     =    596K bytes
>
> dbg
> -------------
> code size        =    262K bytes
> total space      =   1071K bytes
> wasted space     =    809K bytes
>
> opt/JVMTI
> -------------
> code size        =    195K bytes
> total space      =    267K bytes
> wasted space     =     72K bytes
>
> opt
> -------------
> code size        =    124K bytes
> total space      =    267K bytes
> wasted space     =    143K bytes
>
> Unfortunately it is not easy to compute the size of the generated
> interpreter dynamically (we would actually have to generate it two
> times in order to do that). It is also not easy to create the
> interpreter into a bigger, temporary buffer and move it into an
> exactly sized buffer afterwards because the interpreter code is not
> relocatable (and the assignment of the various entry points is spread
> out over many code locations).
>
> But what we can actually do quite easily is to return the unused part
> of the initially allocated memory back to the CodeCache. This is
> possible for two reasons. First, the interpreter codelets and stubs
> are generated "densely" (see CodeletMark constructor/destructor), i.e.
> the unused space of the initially allocated memory is located at the
> end of the reserved memory. Second, the interpreter is generated at a
> very early stage during VM startup ('interpreter_init()' is called
> early from 'init_globals()'). During this early stage we're still
> single threaded and can be sure that nobody else is allocating from
> the CodeCache while we're generating the interpreter.
>
> So I've introduced a new method 'CodeCache::free_unused_tail(CodeBlob*
> cb, size_t used)' which frees the unused tail of the interpreter
> CodeBlob. It has a guarantee which makes sure that it is only called for
> the interpreter CodeBlob. 'free_unused_tail()' calls
> 'CodeHeap::deallocate_tail(void* p, size_t used_size)' on the
> corresponding CodeHeap which in turn checks (with another guarantee)
> that there have been no intermediate allocations and returns the
> unused tail of the corresponding HeapBlock back to the CodeHeap.
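
The tail-freeing idea described above can be sketched with a toy bump
allocator. This is only an illustration under stated assumptions: the
names ToyCodeHeap and kNoSpace are hypothetical, sizes are abstract
units, and the real 'CodeHeap::deallocate_tail()' works on HeapBlocks
and uses a guarantee rather than a boolean return value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical marker for "no block" (not a HotSpot name).
const std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Toy stand-in for a CodeHeap: a bump allocator over a fixed capacity.
// This is an assumption-laden sketch, not the HotSpot implementation.
class ToyCodeHeap {
 public:
  explicit ToyCodeHeap(std::size_t capacity)
      : _capacity(capacity), _top(0), _last_start(kNoSpace), _last_size(0) {}

  // Reserve 'size' units at the current top. Returns the start offset of
  // the new block, or kNoSpace if the heap is exhausted.
  std::size_t allocate(std::size_t size) {
    if (size > _capacity - _top) return kNoSpace;
    _last_start = _top;
    _last_size = size;
    _top += size;
    return _last_start;
  }

  // Shrink the block at 'start' to 'used' units and return the rest to
  // the heap. This is only legal while the block is still the most recent
  // allocation, i.e. there were no intermediate allocations. The sketch
  // reports a violation by returning false; the change described above
  // uses a guarantee for that check.
  bool deallocate_tail(std::size_t start, std::size_t used) {
    if (_last_start == kNoSpace || start != _last_start) return false;
    if (used > _last_size) return false;
    _top = start + used;
    _last_size = used;
    return true;
  }

  std::size_t unallocated() const { return _capacity - _top; }

 private:
  std::size_t _capacity;
  std::size_t _top;         // first free unit
  std::size_t _last_start;  // start of the most recent allocation
  std::size_t _last_size;   // size of the most recent allocation
};
```

With the 'opt' numbers from above, allocating 267 units and then
calling deallocate_tail(start, 124) hands 143 units back to the heap.
A second call after any further allocation is rejected, which mirrors
the "no intermediate allocations" condition.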
>
> With this change, there's no more waste in the CodeCache after
> interpreter generation and the output of -XX:+PrintInterpreter looks
> as follows:
>
> dbg/JVMTI
> -------------
> code size        =    475K bytes
> total space      =    475K bytes
> wasted space     =      0K bytes
>
> dbg
> -------------
> code size        =    262K bytes
> total space      =    262K bytes
> wasted space     =      0K bytes
>
> opt/JVMTI
> -------------
> code size        =    195K bytes
> total space      =    195K bytes
> wasted space     =      0K bytes
>
> opt
> -------------
> code size        =    124K bytes
> total space      =    124K bytes
> wasted space     =      0K bytes
>
> So in the normal case (product build without debugging) we're saving
> 143K of CodeCache. While this is not overly impressive, I think the
> major benefit of this change is that we can increase the default value
> for 'InterpreterCodeSize' in the future without much reasoning (e.g.
> doubling it if it is too small) because the unused part will be
> returned back to the CodeCache.
>
> I've successfully (with the exception described below) tested the
> change by running the hotspot JTreg tests with and without
> SegmentedCodeCache.
>
> While working on this, I found another problem which is related to the
> fix of JDK-8183573 and leads to crashes when executing the JTreg test
> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java.
>
> The problem is that JDK-8183573 replaced
>
>    virtual bool contains_blob(const CodeBlob* blob) const { return
> low_boundary() <= (char*) blob && (char*) blob < high(); }
>
> by:
>
>    bool contains_blob(const CodeBlob* blob) const { return
> contains(blob->code_begin()); }
>
> But that may be wrong in the corner case where the size of the
> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the
> 'header' - i.e. the C++ object itself) because in that case
> CodeBlob::code_begin() points right behind the CodeBlob's header which
> is a memory location which doesn't belong to the CodeBlob anymore.
>
> This exact corner case is exercised by ReturnBlobToWrongHeapTest which
> allocates CodeBlobs of size zero (i.e. zero 'payload') with the help
> of sun.hotspot.WhiteBox.allocateCodeBlob() until the CodeCache fills
> up. The test first fills the 'non-profiled nmethods' CodeHeap. If the
> 'non-profiled nmethods' CodeHeap is full, the VM automatically tries
> to allocate from the 'profiled nmethods' CodeHeap until that fills up
> as well.  But in the CodeCache the 'profiled nmethods' CodeHeap is
> located right before the 'non-profiled nmethods' CodeHeap. So if the
> last CodeBlob allocated from the 'profiled nmethods' CodeHeap has a
> payload size of zero and uses all the CodeHeap's remaining size, we
> will end up with a CodeBlob whose code_begin() address will point
> right behind the actual CodeHeap (i.e. it will point right at the
> beginning of the adjacent, 'non-profiled nmethods' CodeHeap). This
> will result in the following guarantee to fire, when we try to free
> the last allocated CodeBlob (i.e. the one allocated at the end of the
> 'profiled nmethods' CodeHeap which has its code_begin() address
> pointing at the beginning of the adjacent, 'non-profiled nmethods'
> CodeHeap) with sun.hotspot.WhiteBox.freeCodeBlob():
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (heap.cpp:248), pid=27586, tid=27587
> #  guarantee((char*) b >= _memory.low_boundary() && (char*) b <
> _memory.high()) failed: The block to be deallocated 0x00007fffe6666f80
> is not within the heap starting with 0x00007fffe6667000 and ending
> with 0x00007fffe6ba000
>
> The fix is trivial so I'll include it in this change: just revert
> the part of JDK-8183573 mentioned before such that contains_blob()
> again uses the address of the CodeBlob instead of
> CodeBlob->code_begin().
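
The corner case can be reproduced with a small model in which
addresses are plain integers. Everything here is a simplified
assumption: ToyBlob, ToyHeap and last_blob_in() are hypothetical
names, and only the two addresses 0x00007fffe6666f80 and
0x00007fffe6667000 are taken from the error message above. The outer
heap boundaries and the header size of 0x80 (the difference between
those two addresses) are made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a CodeBlob: a header followed by the payload.
struct ToyBlob {
  std::uint64_t header;        // address of the blob object itself
  std::uint64_t header_size;   // size of the C++ object (the 'header')
  std::uint64_t payload_size;  // size of the code following the header

  // First address after the header, as in the description above.
  std::uint64_t code_begin() const { return header + header_size; }
};

// Simplified model of a CodeHeap covering the range [low, high).
struct ToyHeap {
  std::uint64_t low;   // inclusive
  std::uint64_t high;  // exclusive

  bool contains(std::uint64_t p) const { return low <= p && p < high; }

  // Models the variant quoted above that tests code_begin().
  bool contains_blob_by_code_begin(const ToyBlob& b) const {
    return contains(b.code_begin());
  }

  // Models the original variant that tests the blob's own address.
  bool contains_blob_by_header(const ToyBlob& b) const {
    return contains(b.header);
  }
};

// Builds a zero-payload blob that uses up the last 'header_size' bytes
// of 'heap', so that its code_begin() equals heap.high.
inline ToyBlob last_blob_in(const ToyHeap& heap, std::uint64_t header_size) {
  ToyBlob b = { heap.high - header_size, header_size, 0 };
  return b;
}
```

For two adjacent heaps where the first one ends at 0x00007fffe6667000,
last_blob_in(first, 0x80) yields a blob at 0x00007fffe6666f80. The
code_begin() based check then attributes that blob to the second heap,
while the header based check attributes it to the first one.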
>
> Thank you and best regards,
> Volker