Any way to avoid JIT overhead for small programs when using AOT?
jayaprabhakar k
jayaprabhakar at gmail.com
Tue Sep 11 06:22:51 UTC 2018
On Mon, 10 Sep 2018 at 01:40, <hotspot-compiler-dev-request at openjdk.java.net> wrote:
>
> Today's Topics:
>
> 1. Any way to avoid JIT overhead for small programs when using
> AOT? (jayaprabhakar k)
> 2. [PING] RE: RFR(S): 8210152: Optimize integer divisible by
> power-of-2 check (Pengfei Li (Arm Technology China))
> 3. Re: Any way to avoid JIT overhead for small programs when
> using AOT? (dean.long at oracle.com)
> 4. Re: Any way to avoid JIT overhead for small programs when
> using AOT? (Andrew Haley)
> 5. JIT: C2 doesn't skip post barrier for new allocated objects
> (Kuai Wei)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 9 Sep 2018 20:58:02 -0700
> From: jayaprabhakar k <jayaprabhakar at gmail.com>
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Any way to avoid JIT overhead for small programs when using
> AOT?
> Message-ID: <CA+t=Si+mtt7YVS2tAtMfD9MWhxOJkHky9T_i68EMHWvgHgEioQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
> I understand that at present AOT and -Xint are not compatible. I see the
> code explicitly disables AOT when -Xint is set
> <http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp>.
>
> For extremely short programs, typically used by beginners learning Java, I
> see that CDS, AOT and Xint all help reduce the startup time. While CDS
> works with both AOT and Xint, multiplying the benefits, AOT and Xint do
> not.
>
> Is there a way to keep both AOT + Xint: for classes/methods that are
> precompiled, use AOT code, and for others just interpret? If not now, would
> it be possible in the future?
>
> Thanks,
> JP
>
> ------------------------------
>
> Message: 2
> Date: Mon, 10 Sep 2018 04:24:16 +0000
> From: "Pengfei Li (Arm Technology China)" <Pengfei.Li at arm.com>
> To: "dean.long at oracle.com" <dean.long at oracle.com>, Vladimir Kozlov
>  <vladimir.kozlov at oracle.com>,
>  "hotspot-compiler-dev at openjdk.java.net" <hotspot-compiler-dev at openjdk.java.net>,
>  "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>,
>  "Pengfei Li (Arm Technology China)" <Pengfei.Li at arm.com>
> Cc: nd <nd at arm.com>
> Subject: [PING] RE: RFR(S): 8210152: Optimize integer divisible by
> power-of-2 check
> Message-ID: <DB7PR08MB31150B1D6C7E547538B2B99A96050 at DB7PR08MB3115.eurprd08.prod.outlook.com>
>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Dean / Vladimir / JDK experts,
>
> Do you have any further questions or comments on this patch? Or should I
> make some modifications to it, such as adding some restrictions to the
> matching condition?
> I appreciate your help.
>
> --
> Thanks,
> Pengfei
>
>
> > -----Original Message-----
> > From: Pengfei Li (Arm Technology China)
> > Sent: Monday, September 3, 2018 13:50
> > To: 'dean.long at oracle.com' <dean.long at oracle.com>; 'Vladimir Kozlov'
> > <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
> > hotspot-dev at openjdk.java.net
> > Cc: nd <nd at arm.com>
> > Subject: RE: RFR(S): 8210152: Optimize integer divisible by power-of-2 check
> >
> > Hi Vladimir, Dean,
> >
> > Thanks for your review.
> >
> > > I don't see where negation is coming from for 'X % 2 == 0' expression.
> > > It should be only 2 instructions: 'cmp (X and 1), 0'
> > The 'cmp (X and 1), 0' is just what we expected. But there's a redundant
> > conditional negation coming from the handling of a possibly negative X in
> > "X % 2". For instance, if X = -5, "X % 2" should be -1, so the "(X and 1)"
> > operation alone is not enough; we have to negate the result.
> >
> > > I will look at it next week. But it would be nice if you could provide a
> > > small test to show this issue.
> > I've already provided a case of "if (a%2 == 0) { ... }" in the JBS
> > description. The generated code and what can be optimized are listed there.
> > See https://bugs.openjdk.java.net/browse/JDK-8210152 for details.
> > You can also see the test case for this optimization that I attached below.
> >
> > > It looks like your matching may allow more patterns than expected. I was
> > > expecting it to look for < 0 or >= 0 for the conditional negation, but I
> > > don't see it.
> > Yes. I didn't limit the if condition to < 0 or >= 0, so it will match more
> > patterns. But nothing goes wrong if this ideal transformation applies to
> > more cases. In pseudo code, if someone writes:
> >     if ( some_condition ) { x = -x; }
> >     if ( x == 0 ) { do_something(); }
> > the negation in the first if-clause can always be eliminated, whatever the
> > condition is, because x and -x are either both zero or both non-zero (this
> > also holds for Integer.MIN_VALUE, whose negation is itself).
> >
> > --
> > Thanks,
> > Pengfei
> >
> >
> > -- my test case attached below --
> > public class Foo {
> >
> >     public static void main(String[] args) {
> >         int[] dividends = { 0, 17, 1553, -90, -35789, 0x80000000 };
> >         for (int i = 0; i < dividends.length; i++) {
> >             int x = dividends[i];
> >             System.out.println(testDivisible(x));
> >             System.out.println(testModulo(x));
> >             testCondNeg(x);
> >         }
> >     }
> >
> >     public static int testDivisible(int x) {
> >         // Modulo result is only used for the zero check
> >         if (x % 4 == 0) {
> >             return 444;
> >         }
> >         return 555;
> >     }
> >
> >     public static int testModulo(int x) {
> >         int y = x % 4;
> >         if (y == 0) {
> >             return 222;
> >         }
> >         // Modulo result is used elsewhere
> >         System.out.println(y);
> >         return 333;
> >     }
> >
> >     public static void testCondNeg(int x) {
> >         // Pure conditional negation
> >         if (printAndIfNeg(x)) {
> >             x = -x;
> >         }
> >         if (x == 0) {
> >             System.out.println("zero!");
> >         }
> >     }
> >
> >     static boolean printAndIfNeg(int x) {
> >         System.out.println(x);
> >         return x <= 0;
> >     }
> > }
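The two facts Pengfei relies on can be spot-checked at the Java level: x % 4 == 0 gives the same answer as (x & 3) == 0 for every int, negative values included, and negating x never changes whether it is zero, which is why a conditional negation that only feeds a zero test is removable. The class below is an illustrative sketch (DivisibleCheck and its methods are hypothetical names); it exercises Java semantics only, not the code C2 actually emits.

```java
// Spot-checks the Java-level identities behind JDK-8210152 (illustrative only):
// 1) x % 4 == 0 is equivalent to (x & 3) == 0, even for negative x;
// 2) negating x never changes whether it is zero, so a conditional
//    negation that only feeds a zero test can be dropped.
class DivisibleCheck {
    static boolean viaModulo(int x) { return x % 4 == 0; }

    static boolean viaMask(int x) { return (x & 3) == 0; }

    // Models "if (cond) x = -x; if (x == 0) ..." for a given value of cond.
    static boolean zeroAfterCondNeg(int x, boolean cond) {
        int y = cond ? -x : x;
        return y == 0;
    }

    public static void main(String[] args) {
        int[] samples = { 0, 17, 1553, -90, -35789, Integer.MIN_VALUE, -5, -4, 4 };
        for (int x : samples) {
            boolean same = viaModulo(x) == viaMask(x)
                    && zeroAfterCondNeg(x, true) == (x == 0)
                    && zeroAfterCondNeg(x, false) == (x == 0);
            System.out.println(x + " % 4 = " + (x % 4) + ", identities hold: " + same);
        }
        // Java's % takes the sign of the dividend, hence the negation in "X % 2".
        System.out.println("-5 % 2 = " + (-5 % 2));
    }
}
```

The samples include Integer.MIN_VALUE (the 0x80000000 in the test case above), whose negation overflows back to itself and therefore stays non-zero.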
>
> ------------------------------
>
> Message: 3
> Date: Mon, 10 Sep 2018 01:00:29 -0700
> From: dean.long at oracle.com
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: Any way to avoid JIT overhead for small programs when
> using AOT?
> Message-ID: <fa2711c0-e73e-b7b0-9ca6-5d0fb52cb330 at oracle.com>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> On 9/9/18 8:58 PM, jayaprabhakar k wrote:
> > Hi,
> > I understand that at present AOT and -Xint are not compatible. I see
> > the code explicitly disables AOT when -Xint is set
> > <http://cr.openjdk.java.net/%7Ekvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp>.
> >
> > For extremely short programs, typically used by beginners learning
> > Java, I see that CDS, AOT and Xint all help reduce the startup time.
> > While CDS works with both AOT and Xint, multiplying the benefits, AOT
> > and Xint do not.
> >
> > Is there a way to keep both AOT + Xint: for classes/methods that are
> > precompiled, use AOT code, and for others just interpret? If not now,
> > would it be possible in the future?
> >
> > Thanks,
> > JP
>
> Hi JP. Yes, it could be possible in the future. One problem is
> MethodHandle intrinsics. With -Xint, there's no code heap, so no place
> to generate native adapters for those intrinsics.
>
> dl
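Dean's point concerns code that goes through java.lang.invoke, and even short programs reach it: an explicit MethodHandle.invokeExact call, a lambda (linked through invokedynamic and LambdaMetafactory), or, when javac targets JDK 9 or later, plain string concatenation (StringConcatFactory). The class below is an illustrative sketch with hypothetical names; it shows only Java-level usage and says nothing about how HotSpot lays out the corresponding adapters.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

// Ordinary beginner-level code that goes through java.lang.invoke at run time.
class MethodHandleUsers {
    static int square(int x) { return x * x; }

    // Explicit MethodHandle call; invokeExact is signature-polymorphic,
    // so the call site type (int)int must match the handle's type exactly.
    static int viaMethodHandle(int x) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findStatic(
                MethodHandleUsers.class, "square",
                MethodType.methodType(int.class, int.class));
        return (int) mh.invokeExact(x);
    }

    // A lambda is linked through invokedynamic (LambdaMetafactory).
    static int viaLambda(int x) {
        IntUnaryOperator op = y -> y * y;
        return op.applyAsInt(x);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(viaMethodHandle(7)); // 49
        System.out.println(viaLambda(7));       // 49
        // With javac targeting JDK 9+, this concatenation also uses invokedynamic.
        System.out.println("square(7) = " + square(7));
    }
}
```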
>
> ------------------------------
>
> Message: 4
> Date: Mon, 10 Sep 2018 09:17:59 +0100
> From: Andrew Haley <aph at redhat.com>
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: Any way to avoid JIT overhead for small programs when
> using AOT?
> Message-ID: <2753b70f-67c7-ef7a-ca40-49266f502401 at redhat.com>
> Content-Type: text/plain; charset=utf-8
>
> On 09/10/2018 04:58 AM, jayaprabhakar k wrote:
>
> > I understand that at present AOT and -Xint are not compatible. I see the
> > code explicitly disables AOT when -Xint is set
> > <http://cr.openjdk.java.net/~kvn/8171137/webrev/raw_files/new/src/share/vm/aot/aotLoader.cpp>.
> >
> > For extremely short programs, typically used by beginners learning Java, I
> > see that CDS, AOT and Xint all help reduce the startup time. While CDS
> > works with both AOT and Xint, multiplying the benefits, AOT and Xint do
> > not.
> >
> > Is there a way to keep both AOT + Xint: for classes/methods that are
> > precompiled, use AOT code, and for others just interpret? If not now, would
> > it be possible in the future?
>
> Does it significantly help? If you precompile the Java library and your
> programs are extremely short, you'll see very little compilation activity.
>
Thanks Andrew.
I don't see any compilation (the default -XX:CompileThreshold is quite
large), but the overhead still seems to be large. I ran a small test on
AWS T2 instances.
The test class just has an empty main method, and I could reproduce exactly
the same behavior when running with the --dry-run launcher option, so most
of the delay happens during startup.
-- Default --
$ perf stat -e cpu-clock -r50 java -XX:+UseG1GC EmptyMainMethod
 Performance counter stats for 'java -XX:+UseG1GC EmptyMainMethod' (50 runs):
     104.039398  cpu-clock (msec)        ( +- 0.39% )
    0.093801870  seconds time elapsed    ( +- 2.66% )

-- Xint --
$ perf stat -e cpu-clock -r50 java -XX:+UseG1GC -Xint EmptyMainMethod
 Performance counter stats for 'java -XX:+UseG1GC -Xint EmptyMainMethod' (50 runs):
      76.203249  cpu-clock (msec)        ( +- 0.33% )
    0.083464038  seconds time elapsed    ( +- 2.03% )

-- AOT --
$ perf stat -e cpu-clock -r50 java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod
 Performance counter stats for 'java -XX:+UseG1GC -XX:AOTLibrary=jaot/touched_methods.so EmptyMainMethod' (50 runs):
     102.416037  cpu-clock (msec)        ( +- 0.22% )
    0.083394143  seconds time elapsed    ( +- 0.92% )
--
The source code for the test is:
public class EmptyMainMethod {
    public static void main(String[] args) {
    }
}
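To confirm from inside the JVM that no compilation happens and that the time goes into VM startup, a variant of EmptyMainMethod can query the standard java.lang.management API. This is a sketch: StartupProbe is a hypothetical name, and loading the management classes itself adds some startup work, so the numbers are approximate. On HotSpot the java.vm.info property typically reads "mixed mode" by default and "interpreted mode" under -Xint, which shows which flags actually took effect.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical instrumented variant of EmptyMainMethod.
class StartupProbe {
    // Milliseconds since JVM start. Loading the management classes
    // adds some cost to what is measured here.
    static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    // Total JIT compilation time in ms, or -1 if the JVM has no compilation
    // system or does not support compilation-time monitoring.
    static long compilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long uptime = uptimeMillis();
        System.out.println("java.vm.info: " + System.getProperty("java.vm.info"));
        System.out.println("uptime at main: " + uptime + " ms");
        System.out.println("JIT compilation time: " + compilationMillis() + " ms");
    }
}
```

Running it with and without -Xint, and with -XX:AOTLibrary, under the same perf stat command gives the in-process view alongside the external CPU figures.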
--
I see a similar delay for most programs written by school students learning
Java.
Context for the request: I am the developer of the Codiva.io online Java IDE
<https://www.codiva.io>. Many teachers recommend it to their students for
learning Java. To handle spiky load, I run each program on the server in a
container with reduced resource limits. At a 10% CPU limit, the difference
grows to around 200 ms.
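As a rough cross-check of that figure, the cpu-clock numbers measured above can be scaled by the CPU quota. This assumes wall-clock time under a quota is roughly CPU time divided by the quota fraction, which ignores multithreading, I/O and scheduler granularity; QuotaEstimate and wallClockMillis are illustrative names.

```java
// Back-of-envelope model (an assumption, not a measurement): under a CPU
// quota, wall-clock time for CPU-bound startup is roughly cpuTime / quota.
class QuotaEstimate {
    static double wallClockMillis(double cpuMillis, double quotaFraction) {
        return cpuMillis / quotaFraction;
    }

    public static void main(String[] args) {
        double defaultCpu = 104.039398; // perf cpu-clock (msec), default flags
        double xintCpu = 76.203249;     // perf cpu-clock (msec), -Xint
        double quota = 0.10;            // 10% CPU limit
        double gap = wallClockMillis(defaultCpu, quota) - wallClockMillis(xintCpu, quota);
        System.out.printf("estimated gap at 10%% CPU: %.0f ms%n", gap);
    }
}
```

With the default and -Xint figures this gives about 278 ms, the same order of magnitude as the ~200 ms observed.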
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 10 Sep 2018 16:39:42 +0800
> From: "Kuai Wei" <kuaiwei.kw at alibaba-inc.com>
> To: "hotspot compiler" <hotspot-compiler-dev at openjdk.java.net>
> Subject: JIT: C2 doesn't skip post barrier for new allocated objects
> Message-ID:
> <1849545e-2e42-4b24-957c-cb4924362971.kuaiwei.kw at alibaba-inc.com>
> Content-Type: text/plain; charset="utf-8"
>
>
> Hi,
>
> Recently I checked the optimization that removes G1 post barriers for newly
> allocated objects, but I found it doesn't work as expected.
> I wrote a simple test case that stores an oop in the constructor and just
> after the constructor returns.
> public class StoreTest {
>     static String val = "x";
>
>     public static Foo testMethod() {
>         Foo newfoo = new Foo(val);
>         newfoo.b = val; // the store barrier could be reduced
>         return newfoo;
>     }
>
>     public static void main(String[] args) {
>         Foo obj = new Foo(val); // init Foo class
>         testMethod();
>     }
>
>     static class Foo {
>         Object a;
>         Object b;
>         public Foo(Object val) {
>             this.a = val; // the store barrier could be reduced
>         }
>     }
> }
> I inline Foo::<init> and Object::<init> when compiling testMethod with C2, so
> I think the two stores marked with comments don't need a post barrier. But I
> still found post barriers in the generated assembly code.
> The test command:
> java -Xcomp -Xbatch -XX:+UseG1GC -XX:CompileCommandFile=compile_command -XX:+PrintCompilation -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining StoreTest
> compile_command:
> compileonly, StoreTest::testMethod
> compileonly, StoreTest$Foo::<init>
> inline, StoreTest$Foo::<init>
> compileonly, java.lang.Object::<init>
> inline, java.lang.Object::<init>
> print, StoreTest::testMethod
>
> I checked the node graph in the parsing phase. The optimization depends on
> GraphKit::just_allocated_object to detect a newly allocated object: it checks
> whether the control of the store is the control projection of the
> allocation. But in the parse phase there's a Region node between the control
> projection and the control of the store. The region has just one input edge,
> so it could be optimized away later. The region node is generated when C2
> inlines the init method of the super class; I think it's used in the exit
> map to merge all exit paths.
>
> The change is simple: in just_allocated_object, I check whether the control
> is a region node with only one input and, if so, skip it. With the change, we
> see a good performance improvement in a pressure test.
>
> Could you check the change and give comments about it?
>
> graphKit.cpp
> // We use this to determine if an object is so "fresh" that
> // it does not require card marks.
> Node* GraphKit::just_allocated_object(Node* current_control) {
> -  if (C->recent_alloc_ctl() == current_control)
> +  Node* ctrl = current_control;
> +  if (CheckJustAllocatedAggressive) {
> +    // Object::<init> is invoked after allocation. Most of the invoke nodes
> +    // will be reduced, but a region node is kept at parse time, so we
> +    // check for that pattern and skip the region node.
> +    if (ctrl != NULL && ctrl->is_Region() && ctrl->req() == 2) {
> +      ctrl = ctrl->in(1);
> +    }
> +  }
> +  if (C->recent_alloc_ctl() == ctrl)
>      return C->recent_alloc_obj();
>    return NULL;
>  }
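The idea behind the patch, looking through a Region with a single input before comparing against the allocation's control projection, can be modeled outside HotSpot. This is a hypothetical Java toy model: Node, justAllocated and the node names are illustrative stand-ins, not C2's classes or API. In HotSpot a Region keeps itself in in(0), which is why the patch tests req() == 2; the model stores only the real control inputs, so the equivalent test is a single input.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the check in the proposed GraphKit::just_allocated_object change.
class JustAllocatedModel {
    static final class Node {
        final String name;
        final boolean isRegion;
        final List<Node> inputs; // real control inputs only (no in(0) self slot)

        Node(String name, boolean isRegion, Node... inputs) {
            this.name = name;
            this.isRegion = isRegion;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // True if a store whose control is storeCtrl directly follows the
    // allocation whose control projection is allocCtrl. With
    // skipTrivialRegion set, a single-input Region is looked through,
    // mirroring the CheckJustAllocatedAggressive branch of the patch.
    static boolean justAllocated(Node storeCtrl, Node allocCtrl, boolean skipTrivialRegion) {
        Node ctrl = storeCtrl;
        if (skipTrivialRegion && ctrl != null && ctrl.isRegion && ctrl.inputs.size() == 1) {
            ctrl = ctrl.inputs.get(0);
        }
        return ctrl == allocCtrl;
    }

    public static void main(String[] args) {
        Node allocProj = new Node("alloc control proj", false);
        // Region left behind by inlining Object::<init>, with one incoming path.
        Node exitRegion = new Node("exit region", true, allocProj);
        System.out.println("original check: " + justAllocated(exitRegion, allocProj, false)); // false
        System.out.println("patched check:  " + justAllocated(exitRegion, allocProj, true));  // true
    }
}
```

A Region that genuinely merges two paths is not skipped, so the barrier is still kept when the store may be reached from somewhere other than the allocation.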
> Thanks,
> Kevin
>
> End of hotspot-compiler-dev Digest, Vol 136, Issue 30
> *****************************************************
>