Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects

Kuai Wei kuaiwei.kw at alibaba-inc.com
Mon Oct 29 14:44:02 UTC 2018


Hi Vladimir,

  I've re-run spec test on our test machine several times. In my test, I can get 2~3 percent improvement. So I think it may help to our system. It would be good for us if it could be merged into upstream JDK. Could you help us to push this change?

Thanks,
Kevin


------------------------------------------------------------------
From:Vladimir Kozlov <vladimir.kozlov at oracle.com>
Send Time:2018年10月25日(星期四) 01:04
To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; Tobias Hartmann <tobias.hartmann at oracle.com>; hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
Subject:Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects

jbb2015 finished and there is no significant difference.

To test changes I added diagnostic flag to turn changes on and off:

-XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode
-XX:+UnlockDiagnosticVMOptions -XX:-UseNewCode

Please, try it in your setup.  Based on all this data I agree to push changes but please test this latest changes with 
on/off new code to verify that you still see improvement and you still *want* these changes (the last version of it).

Thanks,
Vladimir

diff -r c9459e2f7bc8 src/hotspot/share/opto/graphKit.cpp
--- a/src/hotspot/share/opto/graphKit.cpp Tue Oct 23 17:01:48 2018 -0400
+++ b/src/hotspot/share/opto/graphKit.cpp Wed Oct 24 08:52:48 2018 -0700
@@ -2116,8 +2116,17 @@
  // We use this to determine if an object is so "fresh" that
  // it does not require card marks.
  Node* GraphKit::just_allocated_object(Node* current_control) {
-  if (C->recent_alloc_ctl() == current_control)
-    return C->recent_alloc_obj();
+  Node* ctrl = current_control;
+  // Object::<init> is invoked after allocation, most of invoke nodes
+  // will be reduced, but a region node is kept in parse time, we check
+  // the pattern and skip the region node if it degraded to a copy.
+  if (UseNewCode && ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2 &&
+      ctrl->as_Region()->is_copy()) {
+    ctrl = ctrl->as_Region()->is_copy();
+  }
+  if (C->recent_alloc_ctl() == ctrl) {
+   return C->recent_alloc_obj();
+  }
    return NULL;
  }



On 10/24/18 8:59 AM, Vladimir Kozlov wrote:
> I ran jvm2008, jbb2005, jbb2015 (which is still running) and javac tool performance tests.
> I don't see any effects in jvm2008 and jbb2005. I see slight improvement in javac performance.
> Lets wait when jbb2015 finish it runs.
> We also have very unstable jbb2015 results and use mean value from several runs.
> 
> Vladimir
> 
> On 10/23/18 7:14 PM, Kuai Wei wrote:
>>
>> I tested several times and the result is similar. I can get overall performance improvement and some test suite 
>> degradation.
>> It looks the small jOPS test will cause more unstable score. Can you share some experience to to run spec with stable 
>> result?
>>
>> Thanks,
>> Kevin
>>
>>     ------------------------------------------------------------------
>>     From:Vladimir Kozlov <vladimir.kozlov at oracle.com>
>>     Send Time:2018年10月24日(星期三) 06:57
>>     To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; Tobias Hartmann <tobias.hartmann at oracle.com>; hotspot compiler
>>     <hotspot-compiler-dev at openjdk.java.net>
>>     Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>>     Subject:Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>>
>>     Did you run it only one? It is known that jbb2015 produce unstable results. I would suggest to rerun test with
>>     regression several times.
>>
>>     My regression testing finished clean. I will submit performance testing too.
>>
>>     Regards,
>>     Vladimir
>>
>>     On 10/19/18 12:56 AM, Kuai Wei wrote:
>>      > Hi Vladimir,
>>      >
>>      >    Thanks for testing the change. It's good to me to move input check from assert to condition.
>>      > I also tested with specjbb2015. The result is
>>      >                                                          enable         disable
>>      > jbb2015.result.metric.max-jOPS           73234          71531      2.38%
>>      > jbb2015.result.metric.critical-jOPS        31256          30293      3.18%
>>      > jbb2015.result.SLA-10000-jOPS           16605          14902      11.43%
>>      > jbb2015.result.SLA-25000-jOPS            21715          23503      -7.61%
>>      > jbb2015.result.SLA-50000-jOPS            38076          34630       9.95%
>>      > jbb2015.result.SLA-75000-jOPS            43004          43145      -0.33%
>>      > jbb2015.result.SLA-100000-jOPS          50525          48751       3.64%
>>      >
>>      >   The jvm option I used is "-XX:+UseG1GC
>>      
>> > -XX:+UnlockExperimentalVMOptions -Xmx100g -Xms100g -XX:MaxGCPauseMillis=300 -XX:G1NewSizePercent=8 -XX:G1MaxNewSizePercent=50" 
>>
>>      
>> > It can improve overall performance. And there's one test suite with performance degradation. I'm not sure why it happen. 
>>
>>      >
>>      > Thanks,
>>      > Kevin
>>      >
>>      >     ------------------------------------------------------------------
>>      >     From:Vladimir Kozlov <vladimir.kozlov at oracle.com>
>>      >     Send Time:2018年10月17日(星期三) 05:16
>>      >     To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; Tobias Hartmann <tobias.hartmann at oracle.com>; hotspot compiler
>>      >     <hotspot-compiler-dev at openjdk.java.net>
>>      >     Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>>      >     Subject:Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>>      >
>>      >     On 10/16/18 4:56 AM, Kuai Wei wrote:
>>      >      > Hi Vladimir,
>>      >      >
>>      >      >    About the copy region, I made this fix to check if region node is copy. In my test case, the
>>      >      > region is degraded to copy and
>>      >      > post write barrier is removed. I haven't checked the detail of the transform. I think there's an
>>      >      > ideal phase between inlining and
>>      >      > parsing store oop. So the phi node is removed and region node is degraded to copy.
>>      >      >
>>      >      >    We tested the fix in several online trade systems. We can check cpu usage difference between
>>      >      > machines with same load. I will
>>      >      > test with some benchmark tests and give the score.
>>      >
>>      >     Sounds good. Lets see if it helps in your production system.
>>      >     I will submit our internal testing with your latest fix with small change. I moved req() == 2 check
>>      >     from assert to condition to narrow cases (copy Region may have > 2 inputs with only 1 not-null input):
>>      >
>>      >     http://cr.openjdk.java.net/~kvn/8210853/webrev.00/
>>      >
>>      >     Vladimir
>>      >
>>      >      >
>>      >      > Thanks,
>>      >      > Kevin
>>      >      >
>>      >      >     ------------------------------------------------------------------
>>      >      >     From:Vladimir Kozlov <vladimir.kozlov at oracle.com>
>>      >      >     Send Time:2018年10月16日(星期二) 08:45
>>      >      >     To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; Tobias Hartmann <tobias.hartmann at oracle.com>;
>>      >      >     hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>>      >      >     Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>>      >      >     Subject:Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>>      >      >
>>      >      >     Hi Kevin,
>>      >      >
>>      >      >     On 10/4/18 8:18 AM, Kuai Wei wrote:
>>      >      >      > Hi Vladimir,
>>      >      >      >
>>      
>> >      >      >    I'm not sure about the phi node here. Is it used for merging allocation in fast path and slow path?
>>      >      >      > In parsing phase, the allocation node is not expanded, so there's no this phi node. If my understand
>>      >      >      > is wrong, please correct me. So far as I know, the region node is created while inlining initialize
>>      >      >      > method
>>      >      >      > of super type. And phi node is not necessary. But the region node is always created.
>>      >      >
>>      >      >     It is not related to slow and fast path of allocation. It is general rule to have Phi nodes when new
>>      >      >
>>      >      >     Region node is created. Like here:
>>      >      >
>>      >      >     http://hg.openjdk.java.net/jdk/jdk/file/04d4f1e4aff2/src/hotspot/share/opto/parse1.cpp#l772
>>      >      >
>>      >      >      >
>>      
>> >      >      >    I understand your concern. But write barrier is added in parse phase, we must check it in parse time
>>      >      >      > to do this optimization. If the region node is a copy, could we assume it will not have additional
>>      >      >      > input?
>>      >      >
>>      >      >     Yes, I think it is true. Copy Region never goes back to normal (merge) Region.
>>      >      >
>>      >      >      > I checked region node, a copy region node is created when only one input edge and compiler can not
>>      >      >      > reshape, the region node will be degraded to a copy. I think it will be dead in future phase.
>>      >      >
>>      >      >     Yes. But does your case have a 'copy' Region? I originally thought that you have normal Region with
>>      >      >     just one input.
>>      >      >
>>      >      >      >
>>      
>> >      >      >    In our trade system, this optimization could improve 5%+ performance. It's a big improvement and we
>>      >      >      > don't want to lose it. If there's better solution, we are pleasure to implement it.
>>      >      >
>>      >      >     Yes, I agree that we should fix it.
>>      >      >
>>      >      >     What testing did you perform?
>>      >      >
>>      >      >     Thanks,
>>      >      >     Vladimir
>>      >      >
>>      >      >      >
>>      >      >      > Thanks,
>>      >      >      > Kevin
>>      >      >      >
>>      >      >      >     ------------------------------------------------------------------
>>      >      >      >     From:Vladimir Kozlov <vladimir.kozlov at oracle.com>
>>      >      >      >     Send Time:2018年10月4日(星期四) 06:50
>>      >      >      >     To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; Tobias Hartmann <tobias.hartmann at oracle.com>;
>>      >      >      >     hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>>      >      >      >     Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>>      >      >      >     Subject:Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>>      >      >      >
>>      >      >      >     Sorry, I mean
>>      >      >      >
>>      >      >      >     "should exist Phi *associated* with allocation"
>>      >      >      >
>>      >      >      >     On 10/3/18 3:48 PM, Vladimir Kozlov wrote:
>>      >      >      >      > Hi Kevin,
>>      >      >      >      >
>>      
>> >      >      >      > You correctly pointed in your analysis that if base of address is pointing to allocation there
>>      
>> >      >      >      > should not be other control path to this store. But it also means that before there should exist Phi 
>>
>>      
>> >      >      >      > not associated with allocation when Region node was added. If Phi node was removed at some time
>>      >      >      >      > Region node should be removed too (or not add it in first place). Please, look on that.
>>      >      >      >      >
>>      
>> >      >      >      > I share Tobias's concern about skipping Region node in Parse phase when IR graph is still constructed. 
>>
>>      >      >      >      >
>>      >      >      >      > Thanks,
>>      >      >      >      > Vladimir
>>      >      >      >      >
>>      >      >      >      > On 10/3/18 8:08 AM, Kuai Wei wrote:
>>      >      >      >      >> Hi Tobias,
>>      >      >      >      >>
>>      >      >      >      >>   I made the change to check with RegionNode::is_copy, could you check the new patch?
>>      >      >      >      >>
>>      >      >      >      >> diff --git a/src/hotspot/share/opto/graphKit.cpp b/src/hotspot/share/opto/graphKit.cpp
>>      >      >      >      >> index 068141f..884a76c 100644
>>      >      >      >      >> --- a/src/hotspot/share/opto/graphKit.cpp
>>      >      >      >      >> +++ b/src/hotspot/share/opto/graphKit.cpp
>>      >      >      >      >> @@ -2126,7 +2126,15 @@ void GraphKit::uncommon_trap(int trap_request,
>>      >      >      >      >>   // We use this to determine if an object is so "fresh" that
>>      >      >      >      >>   // it does not require card marks.
>>      >      >      >      >>   Node* GraphKit::just_allocated_object(Node* current_control) {
>>      >      >      >      >> -  if (C->recent_alloc_ctl() == current_control)
>>      >      >      >      >> +  Node * ctrl = current_control;
>>      >      >      >      >> +  // Object::<init> is invoked after allocation, most of invoke nodes
>>      >      >      >      >> +  // will be reduced, but a region node is kept in parse time, we check
>>      >      >      >      >> +  // the pattern and skip the region node
>>      >      >      >      >> +  if (ctrl != NULL && ctrl->is_Region() && ctrl->as_Region()->is_copy()) {
>>      >      >      >      >> +    assert(ctrl->req() == 2, "copy region has only 2 inputs");
>>      >      >      >      >> +    ctrl = ctrl->as_Region()->is_copy();
>>      >      >      >      >> +  }
>>      >      >      >      >> +  if (C->recent_alloc_ctl() == ctrl)
>>      >      >      >      >>       return C->recent_alloc_obj();
>>      >      >      >      >>     return NULL;
>>      >      >      >      >>   }
>>      >      >      >      >> Thanks,
>>      >      >      >      >> Kevin
>>      >      >      >      >>
>>      >      >      >      >>     ------------------------------------------------------------------
>>      >      >      >      >>     From:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>
>>      >      >      >      >>     Send Time:2018年9月25日(星期二) 21:50
>>      >      >      >      >>     To:Tobias Hartmann <tobias.hartmann at oracle.com>; hotspot compiler
>>      >      >      >      >>     <hotspot-compiler-dev at openjdk.java.net>
>>      >      >      >      >>     Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>>      >      >      >      >>     Subject:回复:回复:
>>      >      >     [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>>      >      >      >      >>
>>      >      >      >      >>     Hi Tobias,
>>      >      >      >      >>
>>      
>> >      >      >      >>        Thanks for your comments. I will check RegionNode::is_copy to see if it can be used to detect 
>>
>>      >      >      >      >>     unnecessary region node. I will send new review after testing.
>>      >      >      >      >>
>>      >      >      >      >>     Best Regards,
>>      >      >      >      >>     Kevin
>>      >      >      >      >>
>>      >      >      >      >>     ------------------------------------------------------------------
>>      >      >      >      >>     发件人:Tobias Hartmann <tobias.hartmann at oracle.com>
>>      >      >      >      >>     发送时间:2018年9月24日(星期一) 21:34
>>      >      >      >      >>     收件人:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; hotspot compiler
>>      >      >      >      >>     <hotspot-compiler-dev at openjdk.java.net>
>>      >      >      >      >>     抄 送:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>>      >      >      >      >>     主 题:Re: 回复:
>>      >      >     [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>>      >      >      >      >>
>>      >      >      >      >>     Hi Kevin,
>>      >      >      >      >>
>>      >      >      >      >>     On 24.09.2018 08:06, Kuai Wei wrote:
>>      >      >      >      >>
>>      
>> >      >      >      >> >   Thanks for your suggestion. I think your point is the region node may have new path in later parse 
>>
>>      >      >      >      >>
>>      >      >      >      >>      > phase, so we can not make sure the region node will be optimized.
>>      >      >      >      >>
>>      >      >      >      >>
>>      
>> >      >      >      >> Yes, my point is that a new path to the region might be added after your optimization and that path 
>>
>>      >      >      >      >>     might contain stores to the newly allocated object.
>>      >      >      >      >>
>>      
>> >      >      >      >>      >   It's a good question and I checked it. Now I think it may not cause trouble. In post barrier 
>>
>>      
>> >      >      >      >>      > reduce, the oop store use allocation node as base pointer. The data graph guarantee control of 
>>
>>      >      >      >      >>
>>      
>> >      >      >      >> > allocation node should dominate control of store. If allocation node is in pred of region node and 
>>
>>      >      >      >      >>
>>      
>> >      >      >      >> > there's a new path into region, the graph is bad because we can reach store without allocation.
>>      >      >      >      >>
>>      
>> >      >      >      >>     Yes but the new path might be a backedge from a loop that is dominated by the allocation.
>>      >      >      >      >>
>>      >      >      >      >>
>>      
>> >      >      >      >> > If allocation node is in a domination ancestor, the graph shape is a little complicated, so we can not 
>>
>>      >      >      >      >>
>>      >      >      >      >>      > reach control of allocation by skipping one region.
>>      >      >      >      >>
>>      >      >      >      >>
>>      
>> >      >      >      >> Right, that's basically the implicit assumption of your patch. I'm not sure if it always holds. But 
>>
>>      >      >      >      >>     I think you should at least use RegionNode::is_copy().
>>      >      >      >      >>
>>      >      >      >      >>     Let's see what other reviewers think.
>>      >      >      >      >>
>>      >      >      >      >>
>>      
>> >      >      >      >> >   The better solution is we can know the region node is created in exit_map and we will not change 
>>
>>      >      >      >      >>      > it in later. Is there any way to know it in compile time?
>>      >      >      >      >>
>>      >      >      >      >>
>>      
>> >      >      >      >> The region node is created in Parse::build_exits(). I don't think there is a way to keep track of this. 
>>
>>      >      >      >      >>
>>      >      >      >      >>
>>      >      >      >      >>     Thanks,
>>      >      >      >      >>     Tobias
>>      >      >      >      >>
>>      >      >      >
>>      >      >
>>      >      >
>>      >
>>      >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20181029/052e0126/attachment-0001.html>


More information about the hotspot-compiler-dev mailing list