Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Oct 24 17:04:13 UTC 2018
jbb2015 finished and there is no significant difference.
To test changes I added diagnostic flag to turn changes on and off:
-XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode
-XX:+UnlockDiagnosticVMOptions -XX:-UseNewCode
Please, try it in your setup. Based on all this data I agree to push changes but please test this latest changes with
on/off new code to verify that you still see improvement and you still *want* these changes (the last version of it).
Thanks,
Vladimir
diff -r c9459e2f7bc8 src/hotspot/share/opto/graphKit.cpp
--- a/src/hotspot/share/opto/graphKit.cpp Tue Oct 23 17:01:48 2018 -0400
+++ b/src/hotspot/share/opto/graphKit.cpp Wed Oct 24 08:52:48 2018 -0700
@@ -2116,8 +2116,17 @@
// We use this to determine if an object is so "fresh" that
// it does not require card marks.
Node* GraphKit::just_allocated_object(Node* current_control) {
- if (C->recent_alloc_ctl() == current_control)
- return C->recent_alloc_obj();
+ Node* ctrl = current_control;
+ // Object::<init> is invoked after allocation, most of invoke nodes
+ // will be reduced, but a region node is kept in parse time, we check
+ // the pattern and skip the region node if it degraded to a copy.
+ if (UseNewCode && ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2 &&
+ ctrl->as_Region()->is_copy()) {
+ ctrl = ctrl->as_Region()->is_copy();
+ }
+ if (C->recent_alloc_ctl() == ctrl) {
+ return C->recent_alloc_obj();
+ }
return NULL;
}
On 10/24/18 8:59 AM, Vladimir Kozlov wrote:
> I ran jvm2008, jbb2005, jbb2015 (which is still running) and javac tool performance tests.
> I don't see any effects in jvm2008 and jbb2005. I see slight improvement in javac performance.
> Lets wait when jbb2015 finish it runs.
> We also have very unstable jbb2015 results and use mean value from several runs.
>
> Vladimir
>
> On 10/23/18 7:14 PM, Kuai Wei wrote:
>>
>> I tested several times and the result is similar. I can get overall performance improvement and some test suite
>> degradation.
>> It looks the small jOPS test will cause more unstable score. Can you share some experience to to run spec with stable
>> result?
>>
>> Thanks,
>> Kevin
>>
>> ------------------------------------------------------------------
>> From:Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> Send Time:2018年10月24日(星期三) 06:57
>> To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; Tobias Hartmann <tobias.hartmann at oracle.com>; hotspot compiler
>> <hotspot-compiler-dev at openjdk.java.net>
>> Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>> Subject:Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>>
>> Did you run it only one? It is known that jbb2015 produce unstable results. I would suggest to rerun test with
>> regression several times.
>>
>> My regression testing finished clean. I will submit performance testing too.
>>
>> Regards,
>> Vladimir
>>
>> On 10/19/18 12:56 AM, Kuai Wei wrote:
>> > Hi Vladimir,
>> >
>> > Thanks for testing the change. It's good to me to move input check from assert to condition.
>> > I also tested with specjbb2015. The result is
>> > enable disable
>> > jbb2015.result.metric.max-jOPS 73234 71531 2.38%
>> > jbb2015.result.metric.critical-jOPS 31256 30293 3.18%
>> > jbb2015.result.SLA-10000-jOPS 16605 14902 11.43%
>> > jbb2015.result.SLA-25000-jOPS 21715 23503 -7.61%
>> > jbb2015.result.SLA-50000-jOPS 38076 34630 9.95%
>> > jbb2015.result.SLA-75000-jOPS 43004 43145 -0.33%
>> > jbb2015.result.SLA-100000-jOPS 50525 48751 3.64%
>> >
>> > The jvm option I used is "-XX:+UseG1GC
>>
>> > -XX:+UnlockExperimentalVMOptions -Xmx100g -Xms100g -XX:MaxGCPauseMillis=300 -XX:G1NewSizePercent=8 -XX:G1MaxNewSizePercent=50"
>>
>>
>> > It can improve overall performance. And there's one test suite with performance degradation. I'm not sure why it happen.
>>
>> >
>> > Thanks,
>> > Kevin
>> >
>> > ------------------------------------------------------------------
>> > From:Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> > Send Time:2018年10月17日(星期三) 05:16
>> > To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; Tobias Hartmann <tobias.hartmann at oracle.com>; hotspot compiler
>> > <hotspot-compiler-dev at openjdk.java.net>
>> > Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>> > Subject:Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>> >
>> > On 10/16/18 4:56 AM, Kuai Wei wrote:
>> > > Hi Vladimir,
>> > >
>> > > About the copy region, I made this fix to check if region node is copy. In my test case, the
>> > > region is degraded to copy and
>> > > post write barrier is removed. I haven't checked the detail of the transform. I think there's an
>> > > ideal phase between inlining and
>> > > parsing store oop. So the phi node is removed and region node is degraded to copy.
>> > >
>> > > We tested the fix in several online trade systems. We can check cpu usage difference between
>> > > machines with same load. I will
>> > > test with some benchmark tests and give the score.
>> >
>> > Sounds good. Lets see if it helps in your production system.
>> > I will submit our internal testing with your latest fix with small change. I moved req() == 2 check
>> > from assert to condition to narrow cases (copy Region may have > 2 inputs with only 1 not-null input):
>> >
>> > http://cr.openjdk.java.net/~kvn/8210853/webrev.00/
>> >
>> > Vladimir
>> >
>> > >
>> > > Thanks,
>> > > Kevin
>> > >
>> > > ------------------------------------------------------------------
>> > > From:Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> > > Send Time:2018年10月16日(星期二) 08:45
>> > > To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; Tobias Hartmann <tobias.hartmann at oracle.com>;
>> > > hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> > > Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>> > > Subject:Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>> > >
>> > > Hi Kevin,
>> > >
>> > > On 10/4/18 8:18 AM, Kuai Wei wrote:
>> > > > Hi Vladimir,
>> > > >
>>
>> > > > I'm not sure about the phi node here. Is it used for merging allocation in fast path and slow path?
>> > > > In parsing phase, the allocation node is not expanded, so there's no this phi node. If my understand
>> > > > is wrong, please correct me. So far as I know, the region node is created while inlining initialize
>> > > > method
>> > > > of super type. And phi node is not necessary. But the region node is always created.
>> > >
>> > > It is not related to slow and fast path of allocation. It is general rule to have Phi nodes when new
>> > >
>> > > Region node is created. Like here:
>> > >
>> > > http://hg.openjdk.java.net/jdk/jdk/file/04d4f1e4aff2/src/hotspot/share/opto/parse1.cpp#l772
>> > >
>> > > >
>>
>> > > > I understand your concern. But write barrier is added in parse phase, we must check it in parse time
>> > > > to do this optimization. If the region node is a copy, could we assume it will not have additional
>> > > > input?
>> > >
>> > > Yes, I think it is true. Copy Region never goes back to normal (merge) Region.
>> > >
>> > > > I checked region node, a copy region node is created when only one input edge and compiler can not
>> > > > reshape, the region node will be degraded to a copy. I think it will be dead in future phase.
>> > >
>> > > Yes. But does your case have a 'copy' Region? I originally thought that you have normal Region with
>> > > just one input.
>> > >
>> > > >
>>
>> > > > In our trade system, this optimization could improve 5%+ performance. It's a big improvement and we
>> > > > don't want to lose it. If there's better solution, we are pleasure to implement it.
>> > >
>> > > Yes, I agree that we should fix it.
>> > >
>> > > What testing did you perform?
>> > >
>> > > Thanks,
>> > > Vladimir
>> > >
>> > > >
>> > > > Thanks,
>> > > > Kevin
>> > > >
>> > > > ------------------------------------------------------------------
>> > > > From:Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> > > > Send Time:2018年10月4日(星期四) 06:50
>> > > > To:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; Tobias Hartmann <tobias.hartmann at oracle.com>;
>> > > > hotspot compiler <hotspot-compiler-dev at openjdk.java.net>
>> > > > Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>> > > > Subject:Re: 回复:[Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>> > > >
>> > > > Sorry, I mean
>> > > >
>> > > > "should exist Phi *associated* with allocation"
>> > > >
>> > > > On 10/3/18 3:48 PM, Vladimir Kozlov wrote:
>> > > > > Hi Kevin,
>> > > > >
>>
>> > > > > You correctly pointed in your analysis that if base of address is pointing to allocation there
>>
>> > > > > should not be other control path to this store. But it also means that before there should exist Phi
>>
>>
>> > > > > not associated with allocation when Region node was added. If Phi node was removed at some time
>> > > > > Region node should be removed too (or not add it in first place). Please, look on that.
>> > > > >
>>
>> > > > > I share Tobias's concern about skipping Region node in Parse phase when IR graph is still constructed.
>>
>> > > > >
>> > > > > Thanks,
>> > > > > Vladimir
>> > > > >
>> > > > > On 10/3/18 8:08 AM, Kuai Wei wrote:
>> > > > >> Hi Tobias,
>> > > > >>
>> > > > >> I made the change to check with RegionNode::is_copy, could you check the new patch?
>> > > > >>
>> > > > >> diff --git a/src/hotspot/share/opto/graphKit.cpp b/src/hotspot/share/opto/graphKit.cpp
>> > > > >> index 068141f..884a76c 100644
>> > > > >> --- a/src/hotspot/share/opto/graphKit.cpp
>> > > > >> +++ b/src/hotspot/share/opto/graphKit.cpp
>> > > > >> @@ -2126,7 +2126,15 @@ void GraphKit::uncommon_trap(int trap_request,
>> > > > >> // We use this to determine if an object is so "fresh" that
>> > > > >> // it does not require card marks.
>> > > > >> Node* GraphKit::just_allocated_object(Node* current_control) {
>> > > > >> - if (C->recent_alloc_ctl() == current_control)
>> > > > >> + Node * ctrl = current_control;
>> > > > >> + // Object::<init> is invoked after allocation, most of invoke nodes
>> > > > >> + // will be reduced, but a region node is kept in parse time, we check
>> > > > >> + // the pattern and skip the region node
>> > > > >> + if (ctrl != NULL && ctrl->is_Region() && ctrl->as_Region()->is_copy()) {
>> > > > >> + assert(ctrl->req() == 2, "copy region has only 2 inputs");
>> > > > >> + ctrl = ctrl->as_Region()->is_copy();
>> > > > >> + }
>> > > > >> + if (C->recent_alloc_ctl() == ctrl)
>> > > > >> return C->recent_alloc_obj();
>> > > > >> return NULL;
>> > > > >> }
>> > > > >> Thanks,
>> > > > >> Kevin
>> > > > >>
>> > > > >> ------------------------------------------------------------------
>> > > > >> From:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>
>> > > > >> Send Time:2018年9月25日(星期二) 21:50
>> > > > >> To:Tobias Hartmann <tobias.hartmann at oracle.com>; hotspot compiler
>> > > > >> <hotspot-compiler-dev at openjdk.java.net>
>> > > > >> Cc:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>> > > > >> Subject:回复:回复:
>> > > [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>> > > > >>
>> > > > >> Hi Tobias,
>> > > > >>
>>
>> > > > >> Thanks for your comments. I will check RegionNode::is_copy to see if it can be used to detect
>>
>> > > > >> unnecessary region node. I will send new review after testing.
>> > > > >>
>> > > > >> Best Regards,
>> > > > >> Kevin
>> > > > >>
>> > > > >> ------------------------------------------------------------------
>> > > > >> 发件人:Tobias Hartmann <tobias.hartmann at oracle.com>
>> > > > >> 发送时间:2018年9月24日(星期一) 21:34
>> > > > >> 收件人:蒯微(麦庶) <kuaiwei.kw at alibaba-inc.com>; hotspot compiler
>> > > > >> <hotspot-compiler-dev at openjdk.java.net>
>> > > > >> 抄 送:李三红(三红) <sanhong.lsh at alibaba-inc.com>
>> > > > >> 主 题:Re: 回复:
>> > > [Patch] 8210853: C2 doesn't skip post barrier for new allocated objects
>> > > > >>
>> > > > >> Hi Kevin,
>> > > > >>
>> > > > >> On 24.09.2018 08:06, Kuai Wei wrote:
>> > > > >>
>>
>> > > > >> > Thanks for your suggestion. I think your point is the region node may have new path in later parse
>>
>> > > > >>
>> > > > >> > phase, so we can not make sure the region node will be optimized.
>> > > > >>
>> > > > >>
>>
>> > > > >> Yes, my point is that a new path to the region might be added after your optimization and that path
>>
>> > > > >> might contain stores to the newly allocated object.
>> > > > >>
>>
>> > > > >> > It's a good question and I checked it. Now I think it may not cause trouble. In post barrier
>>
>>
>> > > > >> > reduce, the oop store use allocation node as base pointer. The data graph guarantee control of
>>
>> > > > >>
>>
>> > > > >> > allocation node should dominate control of store. If allocation node is in pred of region node and
>>
>> > > > >>
>>
>> > > > >> > there's a new path into region, the graph is bad because we can reach store without allocation.
>> > > > >>
>>
>> > > > >> Yes but the new path might be a backedge from a loop that is dominated by the allocation.
>> > > > >>
>> > > > >>
>>
>> > > > >> > If allocation node is in a domination ancestor, the graph shape is a little complicated, so we can not
>>
>> > > > >>
>> > > > >> > reach control of allocation by skipping one region.
>> > > > >>
>> > > > >>
>>
>> > > > >> Right, that's basically the implicit assumption of your patch. I'm not sure if it always holds. But
>>
>> > > > >> I think you should at least use RegionNode::is_copy().
>> > > > >>
>> > > > >> Let's see what other reviewers think.
>> > > > >>
>> > > > >>
>>
>> > > > >> > The better solution is we can know the region node is created in exit_map and we will not change
>>
>> > > > >> > it in later. Is there any way to know it in compile time?
>> > > > >>
>> > > > >>
>>
>> > > > >> The region node is created in Parse::build_exits(). I don't think there is a way to keep track of this.
>>
>> > > > >>
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Tobias
>> > > > >>
>> > > >
>> > >
>> > >
>> >
>> >
More information about the hotspot-compiler-dev
mailing list