RFR (L): 8015774: Add support for multiple code heaps

Tobias Hartmann tobi.hartmann at gmail.com
Thu Oct 10 08:36:18 PDT 2013


Vladimir, thanks again for the quick review!

Please find the new webrev at:
http://cr.openjdk.java.net/~anoll/8015774/webrev.02/

See comments inline:

2013/10/10 Vladimir Kozlov <vladimir.kozlov at oracle.com>
>
>
> And you totally removed code in sweeper which enables compilation again.
>
> This is bad. PSR runs FA in VM environment only with few threads (2) and
> with this C2 Compiler threads will continue to consume resources even when
> there is no space to store produced code. Output will be stormed by
> "CodeHeap for %s is full." messages.
>
> It doesn't even decay invocation counters because the next code is not
> executed:
>    if (!should_compile_new_jobs()) {
>      CompilationPolicy::policy()->delay_compilation(method());
>      return NULL;
>    }
>
> The next code is also not executed. As a result you continue doing only
> partial sweeps when the codecache (some heaps) is full:
>
>  266   if (!CompileBroker::should_compile_new_jobs()) {
>  267     // If we have turned off compilations we might as well do full sweeps
>  268     // in order to reach the clean state faster. Otherwise the sleeping compiler
>  269     // threads will slow down sweeping.
>  270     _invocations = 1;
>  271   }


The "CodeHeap for %s is full" warning is printed only once for each
CodeHeap, so in fact we print fewer warnings than the old implementation
does.
Because the common transition functions of the threshold policies
(SimpleThresholdPolicy::common and AdvancedThresholdPolicy::common) check
whether the CodeHeap for the next compilation level is available and remain
at the current level if it is full, the compiler threads do not produce
code while a heap is full:

if (CodeCache::is_full(CodeCache::get_code_blob_type(next_level))) {
  // The CodeHeap for next_level is full, stay at current level
  return cur_level;
}

They only consume resources by repeatedly checking whether the CodeHeap for
the next compilation level is available again. Benchmarking showed that
this does not negatively affect performance. You are right about the code
parts that are not executed and about the partial sweeps. Although this
could easily be adapted, I followed your suggestion and reverted the
changes. As before, compilation is now disabled as long as a code heap is
full and re-enabled after sweeping.

I still think that it would be better to disable compilation for CodeHeaps
individually (as this was done implicitly by my implementation), because we
could still compile into the other heap. Maybe this can be introduced in a
future patch.


> I agree, this part is still a little bit unclear. This is partly due to
>> the fact that I'm still not sure how to set the
>> size of the non method heap. On my machine around 3.5Mb are sufficient
>> for all benchmarks (debug/product). The jprt
>> tests seem to need more, that's why it is temporarily set to 8Mb. I have
>> to evaluate that in more detail. Probably we
>> can set it depending on build/runtime settings and operating system.
>>
>
> [Non]ProfiledCodeHeapSize are already platform specific flags. You don't
> need to do anything special.


Yes, I meant that I still have to figure out the specific values for each
platform.


> One option would be to create a new heap for all blobs needed by the VM to
>> startup. This heap size can be set to
>> CodeCacheMinimumUseSpace. All non method blobs created by the running
>> java application are then put into the non method
>> heap.
>>
>
> CodeCacheMinimumUseSpace is already used to run only the Interpreter. You
> can't change its meaning. You need to make sure your new code allows
> running with such a size.

 I mean:
>
> size_t non_method_size = ReservedCodeCacheSize - (ProfiledCodeHeapSize +
> NonProfiledCodeHeapSize);
>

I implemented it as you suggested. By default, NonProfiledCodeHeapSize is
approximately ProfiledCodeHeapSize * 2. I will set the platform-specific
default values such that the non_method_size is big enough to run the
interpreter (currently it is set to 4Mb).


> My question was: why you need Profiled heap for C1 and Zero? Because this
> is what you get if COMPILER2 is not defined (Client or Zero VM).


You are perfectly right. The profiled heap is not needed if COMPILER2 is
not defined. I changed it accordingly.


>  I'm not sure if I understood your point correctly. At the moment we delay
>> compilation for the corresponding compilation
>> level if a heap is full. In practice (at least for the benchmarks I run)
>> the second heap then fills up really fast. Your
>> suggestion to allocate CodeBlobs in another heap somehow "breaks" the
>> design, because the client of the CodeCache could
>> then encounter non profiled, e.g. C2 compiled, methods while iterating
>> over the profiled heap. This is not really a
>> problem at the moment, since no code relies on the fact that the non
>> profiled heap _only_ contains C2 methods (as long
>> as it contains only _methods_).
>> However, I think that the second suggestion to move the boundary between
>> the heaps is better. One could then increase
>> the non profiled heap and decrease the profiled heap size if the first is
>> full. Over time the non profiled heap would
>> probably grow, whereas the profiled heap would shrink.
>>
>> Because I suspect this to be another big change, I would however suggest
>> to implement this behavior with a second patch
>> and first finish the current changes.
>>
>
> Agree that it is for another change. But you need to design the current
> layout to allow doing that (moving the boundary) in the future.


To move the boundary, we have to adapt the VirtualSpace used by the
CodeHeaps to support "expanding into" another VirtualSpace, as is already
implemented in class PSVirtualSpace (::expand_into) used by the parallel
scavenge GC. Having this, we can define two adjacent VirtualSpaces for the
profiled and non-profiled heaps, one of which grows from high to low, and
move the boundary in case one of them is full. Similar functionality is
implemented in class AdjoiningVirtualSpaces (also used by the parallel
scavenge GC). The current design of the code cache shouldn't be affected
much.

I will look into this more thoroughly and implement the missing
functionality in VirtualSpace and CodeHeap.


> Whole print code is this:
>
> +     warning("CodeHeap for %s is full. Compilation for these methods has
> been delayed.", CodeCache::get_heap_name(code_blob_type));
>
> +     warning("Try increasing the code cache size using
> -XX:ReservedCodeCacheSize=");
>
> You print the specific code heap name in the first message (I agree with
> Christian that the names are not good). You need to additionally print the
> corresponding flag in the next message because increasing ReservedCodeCacheSize
> will not help the NonProfiled heap if you don't increase
> NonProfiledCodeHeapSize.
>

Done (names were already changed in the last webrev).


> About changes in sweeper.cpp.
> I think you accidentally removed collection of freed_memory in
> sweep_code_cache().
>

Right, I reverted that.


> And you can simplify the code (no need for external loop while
> (_current_type <=) ) if you move "while (_current_method == NULL ..."
> inside inner loop after "_current_nmethod = next".
>

Done. Didn't notice that, thanks for the remark.


Regards,
Tobias



>      thanks,
>>     Vladimir
>>
>> 2013/10/4 Christian Thalinger <christian.thalinger at oracle.com>
>>
>>
>>     Some general comments:
>>
>>     Don't use T1, T2, … to name the tiers because SPARC processors are
>> named like this.  It's confusing.  Use tier 1,
>>     tier 2, … instead.  The compilation policies actually talk about tier
>> levels...
>>
>>
>> I changed the names of the code heaps and comments accordingly.
>>
>>     I was hoping that we can have a more loose definition of the code
>> heaps.  So we can easily add more.  What this
>>     patch does is a good start but all the hardcoded values and fields
>> need to go at a later point in time, in my opinion.
>>
>>
>> To improve the adding of new heaps I completely removed the direct
>> dependencies on code heaps from the serviceability
>> agent, dtrace and the pstack library, especially the fields
>> _heap_non_method, _heap_non_profiled and _heap_profiled
>> fields in the code cache. The GrowableArray<CodeHeap*>* is now the only
>> field used to reference the code heaps. This
>> allows adding new heaps to the cache without the need to adapt
>> references. An exception is the dtrace ustack helper
>> script (jhelper.d). Because the D language does not allow loops, one has
>> to define a probe for each code heap. For now,
>> the script supports up to 5 code heaps, but more can be added by simply
>> adding new probes to the script (comment in line
>> 183 in jhelper.d describes this).
>>
>> To summarize, if I want to add a new code heap to the cache, all I have
>> to do is to
>> - define a new CodeBlobType (and add it to IsMethodPredicate if it is a
>> method type)
>> - create the heap in CodeCache::initialize_heaps() and define default
>> sizes
>> [- define heap availability in CodeCache::heap_available]
>> [- define the compilation level -> code blob type translation in
>> CodeCache::get_code_blob_type]
>>
>> The last two steps are optional. CodeBlobs created with the new
>> CodeBlobType are then allocated in the corresponding heap.
>>
>> I'm not completely sure what you mean with "hardcoded" values. Do you
>> mean the default values for
>> NonProfiledCodeHeapSize and ProfiledCodeHeapSize? Of course we could
>> compute them implicitly as 1/3 and 2/3 of the
>> ReservedCodeCacheSize (as it is now done if the user does not provide a
>> value), without defining them as VM parameter,
>> but otherwise we have to define default values for them.
>>
>>     src/share/vm/code/codeBlob.**hpp:
>>
>>     +// CodeBlob Types
>>     +// Used in the CodeCache to assign CodeBlobs to different CodeHeaps
>>     +enum CodeBlobType {
>>     +  btMethodNoProfile   = 0,    // T1 and T4 methods (including native methods)
>>     +  btMethodProfile     = 1,    // T2 and T3 methods with profile information
>>     +  btNonMethod         = 2     // Non-methods like Buffers, Adapters and Runtime Stubs
>>     +};
>>
>>     The bt prefix is already overloaded (e.g. BasicType).  Generally
>> speaking, I never liked the fact that enums don't
>>     have their name as scope.  Maybe we can agree on doing something like
>> this:
>>
>>     struct CodeBlobType {
>>        enum Enum {
>>          NonMethod = 2
>>        };
>>     };
>>
>>     void foo() {
>>        int a = CodeBlobType::NonMethod;
>>        int b = (CodeBlobType::Enum) 1;
>>     }
>>
>>     Or we move to C++11 :-)
>>
>>     http://www.stroustrup.com/C++11FAQ.html#enum
>>
>>
>> Yes, it is probably more clear to include the scope. I followed your
>> suggestion and changed it to a struct with an
>> anonymous enum.
>>
>>     src/share/vm/code/codeCache.cpp:
>>
>>     +#define FOR_ALL_HEAPS(index) for (int index = 0; index < _heaps->length(); ++index)
>>
>>     I'm one of the people who started to hate macros (for various
>> reasons).  Although I see the point in defining these
>>     I'd rather go without them.  Is there a way we can use STL vectors
>> with our allocators?  Or we could write our own
>>     iterator class which iterates either over GrowableArrays in general
>> or the code heaps in particular.
>>
>>
>> I perfectly agree with you that macros should be used only if necessary
>> and to improve readability in special cases. I
>> checked the option of using STL vectors but since the STL is not used in
>> the code yet and I'm not sure if they can be
>> used with the hotspot allocators, I think it is better to work with our own
>> iterator classes. I therefore defined the custom
>> iterator classes GrowableArrayIterator and GrowableArrayFilterIterator.
>> They implement the STL input iterator interface
>> without actually inheriting from it (since #include <iterator> does not
>> seem to work on Solaris). The filter iterator is
>> used to iterate only over those code heaps satisfying a given predicate,
>> for example those heaps containing nmethods.
>> This makes the code a lot more readable and reduces the number of macros.
>> I think the remaining four macros, namely
>> FOR_ALL_HEAPS and FOR_ALL_METHOD_HEAPS to iterate over the heaps and
>> FOR_ALL_BLOBS and FOR_ALL_ALIVE_BLOBS to iterate
>> over the blobs are justified, because they are simple (only declaring the
>> iterators) and make the code more readable.
>>
>>     src/share/vm/memory/heap.hpp:
>>
>>     +  const char* getName() const   { return _name; }
>>
>>     This should either be get_name() or name().
>>
>> I renamed it to name().
>>
>> Thanks!
>>
>> Best regards
>> Tobias
>>
>>
>>     On 10/2/13 7:06 AM, Tobias Hartmann wrote:
>>
>>         Hi,
>>
>>         please review the following change.
>>
>>         bug: https://bugs.openjdk.java.net/browse/JDK-8015774
>>         webrev: http://cr.openjdk.java.net/~anoll/8015774/webrev.00/
>>
>>         This change implements support for multiple code heaps in the code
>>         cache. The interface of the code cache was changed accordingly and
>>         references from other components of the VM were adapted. This
>> includes
>>         the indirect references from:
>>         - the Serviceability Agent: vmStructs and the Java code cache
>> interface
>>         (sun.jvm.hotspot.code.CodeCache)
>>
>>         - the dtrace ustack helper script (jhelper.d)
>>         - the pstack support library libjvm_db.c
>>
>>         Currently the code cache contains the following three code heaps
>> each of
>>         which contains CodeBlobs of a specific type:
>>         - Non-Profiled methods: nmethods that are not profiled, i.e.,
>> those
>>         compiled at tier 1 or 4 and native methods
>>         - Profiled methods: nmethods that are profiled, i.e., those
>> compiled at
>>         tier 2 or 3
>>         - Non-methods: Non-methods like Buffers, Adapters and Runtime
>> Stubs
>>
>>         By default, 2/3 of the ReservedCodeCacheSize is used for the
>>         non-profiled heap and 1/3 is used for the profiled heap.
>> Experiments
>>         with a small code cache size have shown that this configuration
>> performs
>>         best (see [1]). Sizes can be configured using the
>>         NonProfiledCodeHeapSize and ProfiledCodeHeapSize parameters. For
>> now the
>>         non-method heap has a fixed size of 8mb. More tests have to be
>> performed
>>         to determine reasonable default values for different build and
>> runtime
>>         configurations. It would be also possible to add an additional
>> heap for
>>         the CodeBlobs needed during VM startup (adapters, stubs,
>> interpreter,
>>         ...) and use the non-method heap for non-method blobs allocated
>> during
>>         runtime.
>>
>>         The main benefit of this allocation is that functions operating on
>>         nmethods can now be optimized to only iterate over CodeBlobs on
>> the
>>         nmethod heaps, avoiding a full scan of the code cache including
>> the
>>         non-methods. Performance evaluation shows that this results in a
>> great
>>         reduction of the sweeping time. Further it is now possible to
>> extend the
>>         sweeper to selectively sweep nmethods of different compilation
>> levels.
>>         Because it is more expensive to recompile a highly optimized
>> method, it
>>         makes sense to first sweep those compiled at a lower level.
>> Possible
>>         future optimizations may also include multithreaded sweeping and
>>         separation of nmethod code and metadata.
>>
>>         The code was tested using Nashorn + Octane, DaCapo, SPECJvm2008
>> and jprt.
>>
>>         Benchmark results:
>>         - [2] shows the score of the old and new version running Nashorn
>> with
>>         the Octane benchmark at different code cache sizes (10 runs each,
>> sizes
>>         < 32mb crash with the old version). A performance improvement of
>> around
>>         15% is reached.
>>         - [3] shows the sweep time of the NMethodSweeper during runs of
>> the
>>         Octane benchmark with different code cache sizes (10 runs each).
>> The
>>         time is greatly reduced with the new version.
>>         - [4] shows the time ratio (TOld / TNew - 1) of the DaCapo
>> benchmarks
>>         with a code cache size of 64mb (30 runs with 20 warmups each).
>> With the
>>         time differences being smaller than the confidence intervals it
>> is not
>>         possible to determine a difference between the two versions.
>>         - Multiple runs of the SPECJvm2008 benchmark with different code
>> cache
>>         sizes did not show a difference between the two versions. This is
>>         probably due to the fact that not much code is compiled at runtime
>>         compared to Nashorn with the Octane benchmark.
>>
>>         The error bars at each data point show the 95% confidence
>> interval.
>>
>>         [1] https://bugs.openjdk.java.net/secure/attachment/16372/OctaneRatio.png
>>         [2] https://bugs.openjdk.java.net/secure/attachment/16374/OctaneCCSizes_new.png
>>         [3] https://bugs.openjdk.java.net/secure/attachment/16373/OctaneSweepTime_new.png
>>         [4] https://bugs.openjdk.java.net/secure/attachment/16371/DaCapoCCSizes_64.png
>>
>>         Thanks and best regards,
>>
>>         Tobias
>>
>>
>>


More information about the hotspot-compiler-dev mailing list