RFR (L): 8015774: Add support for multiple code heaps
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Oct 11 10:38:07 PDT 2013
This looks acceptable.
Thanks,
Vladimir
On 10/11/13 7:00 AM, Tobias Hartmann wrote:
> Thanks again for the review.
>
> I added a NonMethodCodeHeapSize flag, because it still wasn't clear what
> to do if the user increases or decreases the default
> ReservedCodeCacheSize without specifying the sizes of the heaps
> individually. Now the default NonMethodCodeHeapSize is used and the
> remaining size is assigned 1/3 to the profiled heap and 2/3 to the
> non-profiled heap. This way it is more transparent, because default
> values for all three heaps are available and can also be set by the
> user. I also added value checks to arguments.cpp.
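>
> (For illustration only -- a minimal sketch of that sizing logic, not the
> actual arguments.cpp code; the local variable names are made up:)
>
>   // If the user only changes ReservedCodeCacheSize, keep the default
>   // NonMethodCodeHeapSize and split the remainder 1/3 : 2/3.
>   size_t non_method_size   = NonMethodCodeHeapSize;
>   size_t remaining         = ReservedCodeCacheSize - non_method_size;
>   size_t profiled_size     = remaining / 3;              // 1/3 -> profiled heap
>   size_t non_profiled_size = remaining - profiled_size;  // ~2/3 -> non-profiled heap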
>
> Please find the new webrev at:
> http://cr.openjdk.java.net/~anoll/8015774/webrev.03/
>
> See comments inline:
>
> 2013/10/11 Vladimir Kozlov <vladimir.kozlov at oracle.com>
>
>
> You need to place "while (next == NULL" between next lines:
>
> _seen++;
> ! _current_nmethod = next;
>
> otherwise you will pass incorrect _current_type to process_nmethod().
>
> Done.
>
> I would prefer to have names similar to flags:
>
> "non-profiling nmethods"
> "profiling nmethods"
>
>
> I changed the names to "Non-profiled nmethods" and "Profiled nmethods".
>
> Regards,
> Tobias
>
>
> On 10/10/13 8:36 AM, Tobias Hartmann wrote:
>
> Vladimir, thanks again for the quick review!
>
> Please find the new webrev at:
> http://cr.openjdk.java.net/~anoll/8015774/webrev.02/
>
> See comments inline:
>
> 2013/10/10 Vladimir Kozlov <vladimir.kozlov at oracle.com>
>
>
>
> And you totally removed the code in the sweeper which enables
> compilation again.
>
> This is bad. PSR runs FA in a VM environment with only a few
> threads (2), and with this the C2 compiler threads will continue to
> consume resources even when there is no space to store the produced
> code. The output will be stormed by "CodeHeap for %s is full."
> messages.
>
> It does not even decay invocation counters, because the next code is
> not executed:
> if (!should_compile_new_jobs()) {
>   CompilationPolicy::policy()->delay_compilation(method());
>   return NULL;
> }
>
> The next code is also not executed. As a result you continue to do only
> partial sweeps when the code cache (some heaps) is full:
>
> 266   if (!CompileBroker::should_compile_new_jobs()) {
> 267     // If we have turned off compilations we might as well do full sweeps
> 268     // in order to reach the clean state faster. Otherwise the sleeping compiler
> 269     // threads will slow down sweeping.
> 270     _invocations = 1;
> 271   }
>
>
> The "CodeHeap for %s is full warning" is printed only once for each
> CodeHeap, so in fact we print less warnings than the old
> implementation
> does.
> Because we check in the common transition functions of the threshold
> policies (SimpleThresholdPolicy::common and
> AdvancedThresholdPolicy::__common) if the CodeHeap for the next
> compilation level is available and remain at the current level
> if it is
> full, the compiler threads do not produce code if a heap is full:
>
> if
> (CodeCache::is_full(CodeCache:__:get_code_blob_type(next___level)))
> {
> // The CodeHeap for next_level is full, stay at current level
> return cur_level;
> }
>
> They only consume resources to repeatedly check if the CodeHeap
> for the
> next compilation level is now available. Benchmarking showed
> that this
> does not negatively affect performance. You are right with the not
> executed code parts and the partial sweeps. Although this can be
> adapted
> easily I followed your suggestions and reverted the changes. As
> before,
> compilation is now disabled as long as a code heap is full and
> enabled
> after sweeping.
>
> I still think that it would be better to disable compilation for
> CodeHeaps individually (as this was done implicitly by my
> implementation), because we could still compile into the other heap.
> Maybe this can be introduced in a future patch.
>
> I agree, this part is still a little bit unclear. This is partly
> due to the fact that I'm still not sure how to set the
> size of the non-method heap. On my machine around 3.5 MB are
> sufficient for all benchmarks (debug/product). The jprt
> tests seem to need more, that's why it is temporarily set to
> 8 MB. I have to evaluate that in more detail. Probably we
> can set it depending on build/runtime settings and
> operating system.
>
>
> [Non]ProfiledCodeHeapSize are already platform-specific flags. You
> don't need to do anything special.
>
>
> Yes, I meant that I still have to figure out the specific values for
> each platform.
>
> One option would be to create a new heap for all blobs needed by
> the VM to start up. This heap's size could be set to
> CodeCacheMinimumUseSpace. All non-method blobs created by the
> running Java application would then be put into the non-method
> heap.
>
>
> CodeCacheMinimumUseSpace is already used to run with only the
> interpreter. You can't change its meaning. You need to make sure
> your new code allows running with such a size.
>
> I mean:
>
> size_t non_method_size = ReservedCodeCacheSize -
> (ProfiledCodeHeapSize + NonProfiledCodeHeapSize);
>
>
> I implemented it as you suggested. By default
> NonProfiledCodeHeapSize is approximately ProfiledCodeHeapSize * 2.
> I will set the platform-specific default values such that the
> non_method_size is big enough to run the interpreter (currently it
> is set to 4 MB).
>
> My question was: why do you need the Profiled heap for C1 and
> Zero? Because this is what you get if COMPILER2 is not defined
> (Client or Zero VM).
>
>
> You are perfectly right. The profiled heap is not needed if
> COMPILER2 is
> not defined. I changed it accordingly.
>
> I'm not sure if I understood your point correctly. At the moment
> we delay compilation for the corresponding compilation
> level if a heap is full. In practice (at least for the
> benchmarks I ran) the second heap then fills up really fast. Your
> suggestion to allocate CodeBlobs in another heap somewhat
> "breaks" the design, because a client of the CodeCache could
> then encounter non-profiled, e.g. C2-compiled, methods while
> iterating over the profiled heap. This is not really a
> problem at the moment, since no code relies on the fact that the
> non-profiled heap _only_ contains C2 methods (as long
> as it contains only _methods_).
> However, I think that the second suggestion, to move the boundary
> between the heaps, is better. One could then increase
> the non-profiled heap and decrease the profiled heap size if the
> first is full. Over time the non-profiled heap would
> probably grow, whereas the profiled heap would shrink.
>
> Because I suspect this to be another big change, I would
> suggest implementing this behavior in a second patch
> and first finishing the current changes.
>
>
> Agree that this is for another change. But you need to design the
> current layout to allow doing that (moving the boundary) in the future.
>
> To move the boundary we have to adapt the VirtualSpace used by the
> CodeHeaps to support "expanding into" another VirtualSpace, as is
> already implemented in class PSVirtualSpace (::expand_into) used by
> the parallel scavenge GC. Having this, we can define two adjacent
> VirtualSpaces for the profiled and non-profiled heaps, one of which
> grows high to low, and move the boundary in case one is full. Similar
> functionality is implemented in class AdjoiningVirtualSpaces (also
> used by the parallel scavenge GC). The current design of the code
> cache shouldn't be affected much.
>
> I will look into this more thoroughly and implement the missing
> functionality in VirtualSpace and CodeHeap.
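>
> (Just to sketch the idea -- a toy model of a movable boundary between two
> adjacent spaces, not the actual VirtualSpace/PSVirtualSpace interface:)
>
>   struct AdjacentSpaces {
>     char* _low;       // start of the profiled space
>     char* _boundary;  // profiled space ends here, non-profiled space starts here
>     char* _high;      // end of the non-profiled space
>
>     // Hand 'bytes' from the top of the profiled space to the non-profiled
>     // space by moving the boundary down (assumes that range is unused).
>     bool shrink_profiled(size_t bytes) {
>       if ((size_t)(_boundary - _low) < bytes) return false;  // not enough room
>       _boundary -= bytes;
>       return true;
>     }
>   };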
>
> The whole print code is this:
>
> + warning("CodeHeap for %s is full. Compilation for these methods has been delayed.",
>           CodeCache::get_heap_name(code_blob_type));
> + warning("Try increasing the code cache size using -XX:ReservedCodeCacheSize=");
>
> You print the specific code heap name in the first message (I agree
> with Christian that the names are not good). You need to additionally
> print the corresponding flag in the next message, because increasing
> ReservedCodeCacheSize will not help the NonProfiled heap if you don't
> increase NonProfiledCodeHeapSize.
>
> Done (names were already changed in the last webrev).
>
> About the changes in sweeper.cpp:
> I think you accidentally removed the collection of freed_memory in
> sweep_code_cache().
>
>
> Right, I reverted that.
>
> And you can simplify the code (no need for the external loop
> "while (_current_type <= ...)") if you move "while (_current_nmethod
> == NULL ..." inside the inner loop after "_current_nmethod = next".
>
>
> Done. Didn't notice that, thanks for the remark.
>
>
> Regards,
> Tobias
>
> thanks,
> Vladimir
>
> 2013/10/4 Christian Thalinger <christian.thalinger at oracle.com>
>
>
> Some general comments:
>
> Don't use T1, T2, … to name the tiers because SPARC
> processors are named like this. It's confusing. Use
> tier 1,
> tier 2, … instead. The compilation policies
> actually talk
> about tier levels...
>
>
> I changed the names of the code heaps and comments
> accordingly.
>
> I was hoping that we can have a looser definition of
> the code heaps, so we can easily add more. What this
> patch does is a good start, but all the hardcoded values and
> fields need to go at a later point in time, in my opinion.
>
>
> To improve the adding of new heaps I completely removed the
> direct dependencies on code heaps from the serviceability
> agent, dtrace and the pstack library, especially the
> _heap_non_method, _heap_non_profiled and _heap_profiled
> fields in the code cache. The GrowableArray<CodeHeap*>* is now
> the only field used to reference the code heaps. This
> allows adding new heaps to the cache without the need to adapt
> references. An exception is the dtrace ustack helper
> script (jhelper.d). Because the D language does not allow loops,
> one has to define a probe for each code heap. For now,
> the script supports up to 5 code heaps, but more can be added by
> simply adding new probes to the script (the comment in line
> 183 of jhelper.d describes this).
>
> To summarize, if I want to add a new code heap to the cache, all
> I have to do is:
> - define a new CodeBlobType (and add it to IsMethodPredicate if
>   it is a method type)
> - create the heap in CodeCache::initialize_heaps() and define
>   default sizes
> [- define heap availability in CodeCache::heap_available]
> [- define the compilation level -> code blob type translation in
>   CodeCache::get_code_blob_type]
>
> The last two steps are optional. CodeBlobs created with the new
> CodeBlobType are then allocated in the corresponding heap; a rough
> sketch of what this amounts to is shown below.
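>
> (Roughly, and with invented names -- the member names, add_heap() and
> SpecialCodeHeapSize below are illustrative, not part of the webrev:)
>
>   // codeBlob.hpp (sketch): one more blob type next to the existing ones
>   struct CodeBlobType {
>     enum {
>       MethodNonProfiled = 0,  // tier 1 and 4 methods (including native methods)
>       MethodProfiled    = 1,  // tier 2 and 3 methods with profile information
>       NonMethod         = 2,  // buffers, adapters and runtime stubs
>       MethodSpecial     = 3   // hypothetical new nmethod category
>     };
>   };
>
>   // codeCache.cpp, CodeCache::initialize_heaps() (sketch):
>   //   add_heap(CodeBlobType::MethodSpecial, "Special nmethods", SpecialCodeHeapSize);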
>
> I'm not completely sure what you mean by "hardcoded" values.
> Do you mean the default values for
> NonProfiledCodeHeapSize and ProfiledCodeHeapSize? Of course we
> could compute them implicitly as 1/3 and 2/3 of the
> ReservedCodeCacheSize (as is now done if the user does not
> provide a value), without defining them as VM parameters,
> but otherwise we have to define default values for them.
>
> src/share/vm/code/codeBlob.hpp:
>
> +// CodeBlob Types
> +// Used in the CodeCache to assign CodeBlobs to different CodeHeaps
> +enum CodeBlobType {
> +  btMethodNoProfile = 0,  // T1 and T4 methods (including native methods)
> +  btMethodProfile   = 1,  // T2 and T3 methods with profile information
> +  btNonMethod       = 2   // Non-methods like Buffers, Adapters and Runtime Stubs
> +};
>
> The bt prefix is already overloaded (e.g. BasicType).
> Generally speaking, I never liked the fact that enums don't
> have their name as scope. Maybe we can agree on doing
> something like this:
>
> struct CodeBlobType {
>   enum Enum {
>     NonMethod = 2
>   };
> };
>
> void foo() {
>   int a = CodeBlobType::NonMethod;
>   int b = (CodeBlobType::Enum) 1;
> }
>
> Or we move to C++11 :-)
>
> http://www.stroustrup.com/C++11FAQ.html#enum
>
>
> Yes, it is probably clearer to include the scope. I followed
> your suggestion and changed it to a struct with an
> anonymous enum.
>
> src/share/vm/code/codeCache.cpp:
>
> +#define FOR_ALL_HEAPS(index) for (int index = 0; index < _heaps->length(); ++index)
>
> I'm one of the people who started to hate macros (for
> various reasons). Although I see the point in defining
> these
> I'd rather go without them. Is there a way we can
> use STL
> vectors with our allocators? Or we could write our own
> iterator class which iterates either over
> GrowableArrays in
> general or the code heaps in particular.
>
>
> I perfectly agree with you that macros should be used only if
> necessary and to improve readability in special cases. I
> checked the option of using STL vectors, but since the STL is
> not used in the code yet and I'm not sure if they can be
> used with the HotSpot allocators, I think it is better to work
> with our own iterator classes. I therefore defined the custom
> iterator classes GrowableArrayIterator and
> GrowableArrayFilterIterator. They implement the STL input
> iterator interface without actually inheriting from it
> (since #include <iterator> seems not to work on Solaris).
> The filter iterator is used to iterate only over those code
> heaps satisfying a given predicate, for example those heaps
> containing nmethods. This makes the code a lot more readable
> and reduces the number of macros. I think the remaining four
> macros, namely FOR_ALL_HEAPS and FOR_ALL_METHOD_HEAPS to
> iterate over the heaps and FOR_ALL_BLOBS and
> FOR_ALL_ALIVE_BLOBS to iterate over the blobs, are justified,
> because they are simple (only declaring the iterators) and
> make the code more readable.
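>
> (For reference, the gist of the filter iterator in a simplified,
> self-contained form -- not the actual GrowableArrayFilterIterator code:)
>
>   // Visits only the elements of [begin, end) that satisfy a predicate,
>   // e.g. E = CodeHeap* with a predicate like IsMethodPredicate.
>   template <typename E, typename Pred>
>   class FilterIterator {
>     E*   _cur;
>     E*   _end;
>     Pred _pred;
>     void skip_filtered() { while (_cur != _end && !_pred(*_cur)) ++_cur; }
>    public:
>     FilterIterator(E* begin, E* end, Pred pred)
>       : _cur(begin), _end(end), _pred(pred) { skip_filtered(); }
>     bool done() const      { return _cur == _end; }
>     E&   operator*() const { return *_cur; }
>     void next()            { ++_cur; skip_filtered(); }
>   };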
>
> src/share/vm/memory/heap.hpp:
>
> + const char* getName() const { return _name; }
>
> This should either be get_name() or name().
>
> I renamed it to name().
>
> Thanks!
>
> Best regards
> Tobias
>
>
> On 10/2/13 7:06 AM, Tobias Hartmann wrote:
>
> Hi,
>
> please review the following change.
>
> bug:
> https://bugs.openjdk.java.net/browse/JDK-8015774
> webrev:
> http://cr.openjdk.java.net/~anoll/8015774/webrev.00/
>
> This change implements support for multiple code heaps in the code
> cache. The interface of the code cache was changed accordingly and
> references from other components of the VM were adapted. This
> includes the indirect references from:
> - the Serviceability Agent: vmStructs and the Java code cache
>   interface (sun.jvm.hotspot.code.CodeCache)
> - the dtrace ustack helper script (jhelper.d)
> - the pstack support library libjvm_db.c
>
> Currently the code cache contains the following three code heaps,
> each of which contains CodeBlobs of a specific type:
> - Non-profiled methods: nmethods that are not profiled, i.e., those
>   compiled at tier 1 or 4, and native methods
> - Profiled methods: nmethods that are profiled, i.e., those compiled
>   at tier 2 or 3
> - Non-methods: non-methods like Buffers, Adapters and Runtime Stubs
>
> By default, 2/3 of the ReservedCodeCacheSize is used for the
> non-profiled heap and 1/3 is used for the profiled heap. Experiments
> with a small code cache size have shown that this configuration
> performs best (see [1]). Sizes can be configured using the
> NonProfiledCodeHeapSize and ProfiledCodeHeapSize parameters. For now
> the non-method heap has a fixed size of 8 MB. More tests have to be
> performed to determine reasonable default values for different build
> and runtime configurations. It would also be possible to add an
> additional heap for the CodeBlobs needed during VM startup (adapters,
> stubs, interpreter, ...) and use the non-method heap for non-method
> blobs allocated during runtime.
>
> The main benefit of this allocation is that functions operating on
> nmethods can now be optimized to only iterate over CodeBlobs on the
> nmethod heaps, avoiding a full scan of the code cache including the
> non-methods. Performance evaluation shows that this results in a
> great reduction of the sweeping time. Further, it is now possible to
> extend the sweeper to selectively sweep nmethods of different
> compilation levels. Because it is more expensive to recompile a
> highly optimized method, it makes sense to first sweep those compiled
> at a lower level. Possible future optimizations may also include
> multithreaded sweeping and separation of nmethod code and metadata.
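>
> (Conceptually -- with stand-in names, this is not the actual sweeper
> code -- the sweep then only visits the nmethod heaps:)
>
>   // Sketch: scan only heaps that hold nmethods; the non-method heap is
>   // skipped entirely. contains_nmethods() is a made-up predicate.
>   for (int i = 0; i < _heaps->length(); i++) {
>     CodeHeap* heap = _heaps->at(i);
>     if (!heap->contains_nmethods()) continue;
>     // ... sweep the nmethods of 'heap' ...
>   }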
>
> The code was tested using Nashorn + Octane, DaCapo,
> SPECjvm2008 and jprt.
>
> Benchmark results:
> - [2] shows the score of the old and new version running Nashorn with
>   the Octane benchmark at different code cache sizes (10 runs each;
>   sizes < 32 MB crash with the old version). A performance
>   improvement of around 15% is achieved.
> - [3] shows the sweep time of the NMethodSweeper during runs of the
>   Octane benchmark with different code cache sizes (10 runs each).
>   The time is greatly reduced with the new version.
> - [4] shows the time ratio (TOld / TNew - 1) of the DaCapo benchmarks
>   with a code cache size of 64 MB (30 runs with 20 warmups each).
>   With the time differences being smaller than the confidence
>   intervals it is not possible to determine a difference between the
>   two versions.
> - Multiple runs of the SPECjvm2008 benchmark with different code
>   cache sizes did not show a difference between the two versions.
>   This is probably due to the fact that not much code is compiled at
>   runtime compared to Nashorn with the Octane benchmark.
>
> The error bars at each data point show the 95% confidence interval.
>
> [1] https://bugs.openjdk.java.net/secure/attachment/16372/OctaneRatio.png
> [2] https://bugs.openjdk.java.net/secure/attachment/16374/OctaneCCSizes_new.png
> [3] https://bugs.openjdk.java.net/secure/attachment/16373/OctaneSweepTime_new.png
> [4] https://bugs.openjdk.java.net/secure/attachment/16371/DaCapoCCSizes_64.png
>
> Thanks and best regards,
>
> Tobias
>
>
>
>
More information about the hotspot-compiler-dev
mailing list