From rwestrel at redhat.com Mon Oct 2 07:25:11 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 02 Oct 2017 09:25:11 +0200 Subject: RFR(S): 8187822: C2 conditonal move optimization might create broken graph In-Reply-To: References: <0d0b226d-418f-2344-2ff9-a7682747a0e2@oracle.com> <66e0634c-84a6-f8ef-6451-bf1e6a9be252@oracle.com> Message-ID: > Yes. Thanks to look on it. Changes are good. Thanks for the review. Anyone to sponsor this fix? Roland. From rwestrel at redhat.com Mon Oct 2 07:48:14 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 02 Oct 2017 09:48:14 +0200 Subject: RFR(S): 8187822: C2 conditonal move optimization might create broken graph In-Reply-To: References: <0d0b226d-418f-2344-2ff9-a7682747a0e2@oracle.com> <66e0634c-84a6-f8ef-6451-bf1e6a9be252@oracle.com> Message-ID: Ready to push changeset: http://cr.openjdk.java.net/~roland/8187822/changeset Roland. From martin.doerr at sap.com Mon Oct 2 09:03:39 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 2 Oct 2017 09:03:39 +0000 Subject: sponsor needed for 8185979: PPC64: Implement SHA2 intrinsic In-Reply-To: <4bd56460-59c6-f95a-7a9a-9a6687d84115@oracle.com> References: <4bd56460-59c6-f95a-7a9a-9a6687d84115@oracle.com> Message-ID: <6bf0bbb8001c4b6d88145abd48b15574@sap.com> Hi Vladimir, thanks a lot. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Freitag, 29. September 2017 20:41 To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: sponsor needed for 8185979: PPC64: Implement SHA2 intrinsic I will sponsor it. Vladimir On 9/29/17 8:05 AM, Doerr, Martin wrote: > Hi, > > we need a sponsor for the following PPC64 change: > > 8185979: PPC64: Implement SHA2 intrinsic > > because it touches hotspot tests. > > Latest webrev for jdk10/hs is here: > > http://cr.openjdk.java.net/~mdoerr/8185979_ppc_sha2/webrev.06/ > > It already has 2 reviews. Can somebody push it through JPRT, please? > > Best regards, > > Martin > From rwestrel at redhat.com Mon Oct 2 09:40:34 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 02 Oct 2017 11:40:34 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 Message-ID: http://cr.openjdk.java.net/~roland/8188151/webrev.00/ When Compilation::generate_exception_handler_table() walks the exception handler information to populate the exception handler table, it has some logic that removes duplicate handlers for one particular throwing pc and it is wrong AFAICT. That code iterates over already processed (handler_bci, scope_count, entry_pco) triples stored in GrowableArrays bcis, scope_depths, pcos and looks for entries for which handler_bci, scope_count are identical to the current one. It does that by looking for an entry with the same handler_bci in the bcis array and then checks whether scope_count matches too. The list of triples could be something like: 1: (13, 0, ..) 2: (13, 1, ..) and the next triple to be processed: (13, 1, ..) which is a duplicate of 2. That logic looks for a handler with bci 13, finds entry 1, which doesn't have scope count 1, and concludes that there is no duplicate entry. It would need to look at the following entry too. Given that scope counts are sorted in increasing order, rather than iterating over the list of triples from the start, looking for duplicates from the end of the list fixes that problem. Roland. 
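A minimal sketch of the idea described above, assuming the names from that mail (the GrowableArrays bcis and scope_depths, and the handler_bci/scope_count of the triple about to be added); this only illustrates the backwards search and is not the actual patch, which is in the webrev. Since scope depths are appended in increasing order, every already-processed triple with the current scope_count sits in a suffix of the arrays, so scanning from the end is enough to detect a duplicate (handler_bci, scope_count) pair:

  // Illustrative sketch only -- see the webrev above for the real change.
  bool duplicate = false;
  for (int i = bcis->length() - 1; i >= 0; i--) {
    if (scope_depths->at(i) != scope_count) {
      break; // left the suffix of entries recorded with the current scope depth
    }
    if (bcis->at(i) == handler_bci) {
      duplicate = true; // this (handler_bci, scope_count) pair was already recorded
      break;
    }
  }
  // Only append a new (handler_bci, scope_count, entry_pco) triple when no
  // duplicate was found.
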
From gustavo.scalet at eldorado.org.br Mon Oct 2 10:53:32 2017 From: gustavo.scalet at eldorado.org.br (Gustavo Serra Scalet) Date: Mon, 2 Oct 2017 10:53:32 +0000 Subject: [10] RFR(M): 8185976: PPC64: Implement MulAdd and SquareToLen intrinsics In-Reply-To: References: <1f159ee480284095b8e5c3f444dceb96@serv031.corp.eldorado.org.br> <16e8b68451e94eb79cdd7d9cb5d7984c@sap.com> <2425566a8ff74051af485c919a0bf5ee@serv030.corp.eldorado.org.br> <4ec93a6bcbe14cf99c2fa02d50a18965@sap.com> <0ef23b5fcbc54996aea876d4c60e4097@sap.com> <10a918efbd344b1fbf95c56b7beedbc0@serv031.corp.eldorado.org.br> <2badcffdc4fb44c9ba77b5a1c6cc26fb@sap.com> <6362e4c1e3ab4871b12232580f2971aa@serv031.corp.eldorado.org.br> <1a54e77a4b5e45e2b848da3fccf423dd@serv030.corp.eldorado.org.br> <170eb84ecbdb4c9b9fe8d4481e4c319f@sap.com> <0aaf319e25934903a468542d02f6a734@serv030.corp.eldorado.org.br> <2432cbfebfa342dfb560ecf4d6023581@serv030.corp.eldorado.org.br> Message-ID: <257df2509b4c4376967933c9b08ac967@serv030.corp.eldorado.org.br> Sorry, I didn't notice that. Thanks, have a great week > -----Original Message----- > From: Lindenmaier, Goetz [mailto:goetz.lindenmaier at sap.com] > Sent: sexta-feira, 29 de setembro de 2017 19:48 > To: Gustavo Serra Scalet ; Doerr, Martin > > Cc: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > SquareToLen intrinsics > > Hi, > > I pushed it a few days ago: > http://hg.openjdk.java.net/jdk10/hs/rev/122833427b36 > > Cheers, > Goetz. > > > -----Original Message----- > > From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br] > > Sent: Friday, September 29, 2017 11:26 PM > > To: Doerr, Martin ; Lindenmaier, Goetz > > > > Cc: 'hotspot-compiler-dev at openjdk.java.net' > dev at openjdk.java.net> > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > SquareToLen intrinsics > > > > Hi Martin and Goetz, > > > > A new webrev updated to the new repo structure was requested and can > > be viewed below: > > https://gut.github.io/openjdk/webrev/JDK-8185976/webrev.05/ > > > > PS: changes applied cleanly from old hotspot to new one. > > > > Can it be sponsored now? > > > > Thanks. > > > > > -----Original Message----- > > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet > > > Sent: quarta-feira, 6 de setembro de 2017 09:45 > > > To: Lindenmaier, Goetz ; Doerr, Martin > > > ; 'hotspot-compiler-dev at openjdk.java.net' > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > > SquareToLen intrinsics > > > > > > Alright, thanks for the instructions. I'll keep that in mind. > > > > > > > -----Original Message----- > > > > From: Lindenmaier, Goetz [mailto:goetz.lindenmaier at sap.com] > > > > Sent: quarta-feira, 6 de setembro de 2017 09:44 > > > > To: Gustavo Serra Scalet ; Doerr, > > > > Martin ; 'hotspot-compiler- > > dev at openjdk.java.net' > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > > > SquareToLen intrinsics > > > > > > > > Hi Gustavo, > > > > > > > > the repos are all closed. Once they are opened again, you will > > > > have to merge your change into the new repo structure, post a new > > > > webrev and only then it can be sponsored. Me or Martin will > sponsor it then. > > > > > > > > Best regards, > > > > Goetz. 
> > > > > > > > > -----Original Message----- > > > > > From: Gustavo Serra Scalet > > > > > [mailto:gustavo.scalet at eldorado.org.br] > > > > > Sent: Mittwoch, 6. September 2017 14:32 > > > > > To: Lindenmaier, Goetz ; Doerr, > > > > > Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > > > > SquareToLen intrinsics > > > > > > > > > > Thanks Goetz. > > > > > > > > > > Could somebody sponsor this change? > > > > > > > > > > THanks > > > > > > > > > > > -----Original Message----- > > > > > > From: Lindenmaier, Goetz [mailto:goetz.lindenmaier at sap.com] > > > > > > Sent: quarta-feira, 6 de setembro de 2017 03:30 > > > > > > To: Gustavo Serra Scalet ; > > > > > > Doerr, Martin ; 'hotspot-compiler- > > > > dev at openjdk.java.net' > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > > > > > SquareToLen intrinsics > > > > > > > > > > > > Hi, > > > > > > > > > > > > I had a look at this change and tested it. Reviewed. > > > > > > > > > > > > Best regards, > > > > > > Goetz. > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > > > > > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet > > > > > > > Sent: Freitag, 1. September 2017 19:12 > > > > > > > To: Doerr, Martin ; 'hotspot-compiler- > > > > > > > dev at openjdk.java.net' > > > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd > > > > > > > and SquareToLen intrinsics > > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Doerr, Martin > > > > > > > > your first webrev already works on Big Endian. So the only > > > > > > > > required change is to fix your new code by this trivial > patch: > > > > > > > > --- a/src/cpu/ppc/vm/stubGenerator_ppc.cpp Fri Sep 01 > > > > 17:47:45 > > > > > > 2017 > > > > > > > > +0200 > > > > > > > > +++ b/src/cpu/ppc/vm/stubGenerator_ppc.cpp Fri Sep 01 > > > > 17:55:08 > > > > > > 2017 > > > > > > > > +0200 > > > > > > > > @@ -3426,7 +3426,9 @@ > > > > > > > > __ srdi (product, product, 1); > > > > > > > > // join them to the same register and store it as > > > > > > > > Little > > > > Endian > > > > > > > > __ orr (product, lplw_s, product); > > > > > > > > +#ifdef VM_LITTLE_ENDIAN > > > > > > > > __ rldicl (product, product, 32, 0); > > > > > > > > +#endif > > > > > > > > __ stdu (product, 8, out_aux); > > > > > > > > __ bdnz (LOOP_SQUARE); > > > > > > > > > > > > > > > > So please enable it again for Big Endian in > vm_version_ppc. > > > > > > > > Besides that, it looks good to me. We also need a 2nd > review. > > > > > > > > > > > > > > Great! Thanks for checking it and suggesting the diff. > > > > > > > > > > > > > > I changed these things. You can find it below: > > > > > > > https://gut.github.io/openjdk/webrev/JDK-8185976/webrev.04/ > > > > > > > > > > > > > > I wonder who could be a 2nd reviewer... Anybody in mind that > > > > > > > we may > > > > > > ping? > > > > > > > Maybe Goetz Lindenmaier? > > > > > > > > > > > > > > Best Regards, > > > > > > > Gustavo Serra Scalet > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Gustavo Serra Scalet > > > > > > > > [mailto:gustavo.scalet at eldorado.org.br] > > > > > > > > Sent: Mittwoch, 30. 
August 2017 19:03 > > > > > > > > To: Doerr, Martin ; > > > > > > > > 'hotspot-compiler- dev at openjdk.java.net' > > > > > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd > > > > > > > > and SquareToLen intrinsics > > > > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > (webrev at the end) > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: Doerr, Martin > > > > > > > > > > > > > > > > > > > The s/rldicl/rldic/ was fixed for "offset", but "len" > > > > > > > > > > doesn't seem to need further changes as it's being > > > > > > > > > > cleared with clrldi, which is the same as rldic with > no shift. > > > > > > > > > > Therefore it's treated appropriately as requested for > > > > "offset" parameter. Do you agree? > > > > > > > > > > > > > > > > > > No, I didn't find clrldi for len in generate_mulAdd(). > > > > > > > > > Only > > > > for k. > > > > > > > > > > > > > > > > I'm sorry. I was thinking about "offset" and "k", which > > > > > > > > are both cleaned on generate_mulAdd(). "len" was not > > > > > > > > cleaned and it was being used on > > > > > > > > muladd() directly with cmpdi, which could lead to > problems. > > > > > > > > > > > > > > > > That is being changed. > > > > > > > > > > > > > > > > > Where are in_len and out_len fixed up in > > > > generate_squareToLen()? > > > > > > > > > > > > > > > > They are not. According to your suggestions, I agree it > > > > > > > > also needs to be done for the same reason. > > > > > > > > > > > > > > > > > > You are right. The way I'm building the 64 bits of the > > > > > > > > > > register depends on which kind of endianness it is > run. > > > > > > > > > > For now it works only on little endian so I'm adding a > > > > > > > > > > switch (just like I did for SHA) to make it available > > > > > > > > > > only on > > > > little endian systems. > > > > > > > > > > > > > > > > > > It shouldn't be that hard to get it working on big > > > > > > > > > endian > > > > > > > > > ;-) Btw., my point was not to replace the 2 4-byte store > > > > > > > > > instructions by an 8-byte one (though I'm also ok with > > > that). > > > > > > > > > It was that 2 stwu which update the same pointer doesn't > > > > > > > > > make sense from > > > > > > performance point of view. > > > > > > > > > Please keep something which works on big endian, too. > > > > > > > > > > > > > > > > I see. The 2x stwu was being used like that because it was > > > > > > > > the trivial approach when considering the original java > update: > > > > > > > > z[i++] = (lastProductLowWord << 31) | (int)(product >>> > > > > > > > > 33); z[i++] = (int)(product >>> 1); > > > > > > > > > > > > > > > > As you pointed out, that might cause some stall on the > > > > > > > > pipeline so I made it with 1s stdu (and could improve code > > > > > > > > by reducing 1 > > > > > > > > instruction) > > > > > > > > > > > > > > > > Now about having a big endian version: I'm not confident > > > > > > > > in doing so as I don't have access to such a machine at > > > > > > > > the > > > moment. > > > > > > > > You were kind on offering test support but I don't know if > > > > > > > > it'd work like that. I may support you in checking out > > > > > > > > which places are endianness-related but I'm not > > > > > > > > comfortable in sending you untested > > > > > > code. > > > > > > > > > > > > > > > > Would you be interested in doing such a changes for making > > > > > > > > it work on Big Endian? 
For this patch, I provided an > > > > > > > > interesting test that might help you to verify if it > worked. > > > > > > > > > > > > > > > > > > No, I used the jdk8u152-b01 (State of repository at > > > > > > > > > > Thu Apr > > > > > > > > > > 6 > > > > > > > > > > 14:15:31 2017). The reported performance speedup was > > > > > > > > > > calculated by running the following test > > > > (TestSquareToLen.java): > > > > > > > > > > > > > > > > > > Seems like JDK-8145913 has not been backported, yet. > > > > > > > > > Sorry for not checking this earlier. So if you want to > > > > > > > > > make RSA really fast, it should be so much better to > > > > > > > > > backport that one. But I can still sponsor this change > > > > > > > > > as it may be used > > > elsewhere. > > > > > > > > > > > > > > > > No problem. It's nice to know that I may not need to > > > > > > > > request a backport of this patch for performance reasons. > > > > > > > > > > > > > > > > And at last, but not least, the new webrev with these > > > > > > > > clrldi > > > > > > changes: > > > > > > > > https://gut.github.io/openjdk/webrev/JDK- > > > > > > > 8185976/webrev.03/index.html > > > > > > > > > > > > > > > > Thank you once again, > > > > > > > > Gustavo Serra Scalet > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: Gustavo Serra Scalet > > > > > > > > > [mailto:gustavo.scalet at eldorado.org.br] > > > > > > > > > Sent: Dienstag, 29. August 2017 22:37 > > > > > > > > > To: Doerr, Martin ; > > > > > > > > > 'hotspot-compiler- dev at openjdk.java.net' > > > > > > > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement > > > > > > > > > MulAdd and SquareToLen intrinsics > > > > > > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > > > New changes: > > > > > > > > > https://gut.github.io/openjdk/webrev/JDK-8185976/webrev. > > > > > > > > > 02/ > > > > > > > > > > > > > > > > > > Check comments below, please. > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: Doerr, Martin > > > > > > > > > > > > > > > > > > > > 1. Sign extending offset and len Right, sign and zero > > > > > > > > > > extending is equivalent for offset and len because > > > > > > > > > > they are guaranteed to be >=0 (by checks in Java). But > > > > > > > > > > you can only rely on bit 32 (IBM > > > > > > > > > > notation) to be 0. Bit 0-31 may contain > > > > > > > > > garbage. > > > > > > > > > > rldicl was incorrect. My mistake, sorry for that. > > > > > > > > > > Correct would be rldic which also clears the least > > > > > > > > > > significant > > > bits. > > > > > > > > > > len should also get fixed e.g. by replacing cmpdi by > > > > > > > > > > extsw_ in > > > > > > > > muladd. > > > > > > > > > > > > > > > > > > The s/rldicl/rldic/ was fixed for "offset", but "len" > > > > > > > > > doesn't seem to need further changes as it's being > > > > > > > > > cleared with clrldi, which is the same as rldic with no > shift. > > > > > > > > > Therefore it's treated appropriately as requested for > > > "offset" > > > > parameter. Do you agree? > > > > > > > > > > > > > > > > > > > 2. Using 8 byte instructions for int The code which > > > > > > > > > > feeds stdu is endianess specific. Doesn't work on all > > > > > > > > > > PPC64 platforms. > > > > > > > > > > > > > > > > > > You are right. 
The way I'm building the 64 bits of the > > > > > > > > > register depends on which kind of endianness it is run. > > > > > > > > > For now it works only on little endian so I'm adding a > > > > > > > > > switch (just like I did for > > > > > > > > > SHA) to make it available only on little endian systems. > > > > > > > > > > > > > > > > > > > 3.Regarding Andrew's point: Superseded by Montgomery? > > > > > > > > > > The Montgomery change got backported to jdk8u (JDK- > > 8150152 > > > > > > > > > > in > > > > > > > > 8u102). > > > > > > > > > > I'd expect the performance improvement of these > > > > > > > > > > intrinsics to be irrelevant for crypto.rsa. Did you > > > > > > > > > > measure with an older jdk8 > > > > > > > > release? > > > > > > > > > > > > > > > > > > No, I used the jdk8u152-b01 (State of repository at Thu > > > > > > > > > Apr > > > > > > > > > 6 > > > > > > > > > 14:15:31 2017). The reported performance speedup was > > > > > > > > > calculated by running the following test > > > > (TestSquareToLen.java): > > > > > > > > > import java.math.BigInteger; > > > > > > > > > > > > > > > > > > public class TestSquareToLen { > > > > > > > > > > > > > > > > > > public static void main(String args[]) throws > > > > > > > > > Exception { > > > > > > > > > > > > > > > > > > int n = 10000000; > > > > > > > > > if (args.length >=1) { > > > > > > > > > n = Integer.parseInt(args[0]); > > > > > > > > > } > > > > > > > > > > > > > > > > > > BigInteger b1 = new > > > > > > > > > > > > > > > > > > > > > > > BigInteger("3489398092355735908635051498208250392000229831187732 > > 0859 > > > > > > > 99 > > > > > > > > > 36 > > > > > > > > > > > > > > > > > > > > > > > 73955941838010214688430713917560492078731370166315598379312147 > > 54926 > > > > > > > 092 > > > > > > > > > 22 > > > > > > > > > > > > > > > > > > > > > > > 37802921102076092232721848082893366300577359694237268085206410 > > 30118 > > > > > > > 116 > > > > > > > > > 51 > > > > > > > > > > > > > > > > > > > > > > > 64401804883382348239081994789652420763585798455208997799631311 > > 31540 > > > > > > > 166 > > > > > > > > > 68 718795349783157384006672542605760392289645528307"); > > > > > > > > > BigInteger b2 = BigInteger.valueOf(0); > > > > > > > > > BigInteger check = BigInteger.valueOf(1); > > > > > > > > > for (int i = 0; i < n; i++) { > > > > > > > > > b2 = b1.multiply(b1); > > > > > > > > > if (i == 0) > > > > > > > > > // Didn't JIT yet. Comparing against > > > > > > > > > interpreted > > > > mode > > > > > > > > > check = b2; > > > > > > > > > } > > > > > > > > > if (b2.compareTo(check) == 0) > > > > > > > > > System.out.println("Check ok!"); > > > > > > > > > else > > > > > > > > > System.out.println("Check failed!"); > > > > > > > > > } > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > I got these results on JDK8 on my POWER8 machine: > > > > > > > > > $ ./javac TestSquareToLen.java $ sudo perf stat -r 5 > > > > > > > > > ./java -XX:-UseMulAddIntrinsic -XX:- > > > > > > > > > UseSquareToLenIntrinsic TestSquareToLen Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! 
> > > > > > > > > > > > > > > > > > Performance counter stats for './java > > > > > > > > > -XX:-UseMulAddIntrinsic > > > > > > > > > -XX:- UseSquareToLenIntrinsic TestSquareToLen' (5 runs): > > > > > > > > > > > > > > > > > > 15148.009557 task-clock (msec) # > 1.053 > > > > CPUs > > > > > > > > > utilized ( +- 0.48% ) > > > > > > > > > 2,425 context-switches # > 0.160 > > > > K/sec > > > > > > > > > ( +- 5.84% ) > > > > > > > > > 356 cpu-migrations # > 0.023 > > > > K/sec > > > > > > > > > ( +- 3.01% ) > > > > > > > > > 5,153 page-faults # > 0.340 > > > > K/sec > > > > > > > > > ( +- 5.22% ) > > > > > > > > > 54,536,889,909 cycles # > 3.600 > > > > GHz > > > > > > > > > ( +- 0.56% ) (66.68%) > > > > > > > > > 239,554,105 stalled-cycles-frontend # > 0.44% > > > > > > frontend > > > > > > > > > cycles idle ( +- 4.87% ) (49.90%) > > > > > > > > > 27,683,316,001 stalled-cycles-backend # > 50.76% > > > > > > backend > > > > > > > > > cycles idle ( +- 0.56% ) (50.17%) > > > > > > > > > 102,020,229,733 instructions # > 1.87 > > > > insn > > > > > > per > > > > > > > > > cycle > > > > > > > > > # > 0.27 > > > > > > stalled > > > > > > > > > cycles per insn ( +- 0.14% ) (66.94%) > > > > > > > > > 7,706,072,218 branches # > 508.718 > > > > M/sec > > > > > > > > > ( +- 0.23% ) (50.20%) > > > > > > > > > 456,051,162 branch-misses # > 5.92% > > > > of > > > > > > all > > > > > > > > > branches ( +- 0.09% ) (50.07%) > > > > > > > > > > > > > > > > > > 14.390840733 seconds time elapsed ( +- 0.09% ) > > > > > > > > > > > > > > > > > > $ sudo perf stat -r 5 ./java -XX:+UseMulAddIntrinsic - > > > > > > > > > XX:+UseSquareToLenIntrinsic TestSquareToLen Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! 
> > > > > > > > > > > > > > > > > > Performance counter stats for './java > > > > > > > > > -XX:+UseMulAddIntrinsic > > > > > > > > > - XX:+UseSquareToLenIntrinsic TestSquareToLen' (5 runs): > > > > > > > > > > > > > > > > > > 11368.141410 task-clock (msec) # > 1.045 > > > > CPUs > > > > > > > > > utilized ( +- 0.64% ) > > > > > > > > > 1,964 context-switches # > 0.173 > > > > K/sec > > > > > > > > > ( +- 8.93% ) > > > > > > > > > 338 cpu-migrations # > 0.030 > > > > K/sec > > > > > > > > > ( +- 7.65% ) > > > > > > > > > 5,627 page-faults # > 0.495 > > > > K/sec > > > > > > > > > ( +- 6.15% ) > > > > > > > > > 41,100,168,967 cycles # > 3.615 > > > > GHz > > > > > > > > > ( +- 0.50% ) (66.36%) > > > > > > > > > 309,052,316 stalled-cycles-frontend # > 0.75% > > > > > > frontend > > > > > > > > > cycles idle ( +- 2.84% ) (49.89%) > > > > > > > > > 14,188,581,685 stalled-cycles-backend # > 34.52% > > > > > > backend > > > > > > > > > cycles idle ( +- 0.99% ) (50.34%) > > > > > > > > > 77,846,029,829 instructions # > 1.89 > > > > insn > > > > > > per > > > > > > > > > cycle > > > > > > > > > # > 0.18 > > > > > > stalled > > > > > > > > > cycles per insn ( +- 0.29% ) (66.96%) > > > > > > > > > 8,435,216,989 branches # > 742.005 > > > > M/sec > > > > > > > > > ( +- 0.28% ) (50.17%) > > > > > > > > > 339,903,936 branch-misses # > 4.03% > > > > of > > > > > > all > > > > > > > > > branches ( +- 0.27% ) (49.90%) > > > > > > > > > > > > > > > > > > 10.882357546 seconds time elapsed ( +- 0.24% ) > > > > > > > > > > > > > > > > > > > > > > > > > > > (out of curiosity, these numbers are 15.19s (+- 0.32%) > > > > > > > > > and 13.42s > > > > > > > > > (+- > > > > > > > > > 0.53%) on JDK10) > > > > > > > > > > > > > > > > > > I may run for SpecJVM2008's crypto.rsa if you are > > > interested. > > > > > > > > > > > > > > > > > > Thank you once again for reviewing this. > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > Gustavo > > > > > > > > > > > > > > > > > > > (I think the change is still acceptable as the > > > > > > > > > > intrinsics could be used elsewhere and the > > > > > > > > > > implementation also exists on other > > > > > > > > > > platforms.) > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: Gustavo Serra Scalet > > > > > > > > > > [mailto:gustavo.scalet at eldorado.org.br] > > > > > > > > > > Sent: Mittwoch, 16. August 2017 18:50 > > > > > > > > > > To: Doerr, Martin ; > > > > > > > > > > 'hotspot-compiler- dev at openjdk.java.net' > > > > > > > > > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement > > > > > > > > > > MulAdd and SquareToLen intrinsics > > > > > > > > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > > > > > Thanks for dedicated review. It took me a while to be > > > > > > > > > > able to work on this but I hope to have your points > solved. > > > > > > > > > > Please check below the review as well as my comments > > > > > > > > > > quoting > > > > your email: > > > > > > > > > > https://gut.github.io/openjdk/webrev/JDK- > > 8185976/webrev.01 > > > > > > > > > > / > > > > > > > > > > > > > > > > > > > > > -----Original Message----- First of all, C2 does not > > > > > > > > > > > perform sign extend when calling > > > > > > stubs. > > > > > > > > > > > The int parms need to get zero/sign extended. 
(Could > > > > > > > > > > > even be done without extra instructions by replacing > > > > > > > > > > > sldi -> rldicl, cmpdi -> extsw_ in some > > > > > > > > > > > cases.) > > > > > > > > > > > > > > > > > > > > Does it make a difference on my case? > > > > > > > > > > > > > > > > > > > > I guess you are talking about mulAdd preparation code. > > > > > > > > > > The only aspect I found about him is to force the cast > > > > > > > > > > from 32 bits -> 64 bits by cleaning higher bits. > > > > > > > > > > Offset is a signed integer but it can't be > > > > > > > > > negative anyway. > > > > > > > > > > > > > > > > > > > > So I changed from: > > > > > > > > > > sldi (R5_ARG3, R5_ARG3, 2); > > > > > > > > > > > > > > > > > > > > to: > > > > > > > > > > rldicl (R5_ARG3, R5_ARG3, 2, 32); // always positive > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > macroAssembler_ppc.cpp: > > > > > > > > > > > - Indentation should be 2 spaces. > > > > > > > > > > > > > > > > > > > > Done > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > stubGenerator_ppc:cpp: > > > > > > > > > > > - or_, addi_ should get replaced by orr, addi when > > > > > > > > > > > CR0 result is not needed. > > > > > > > > > > > > > > > > > > > > Done > > > > > > > > > > > > > > > > > > > > > - Where is lplw initialized? > > > > > > > > > > > > > > > > > > > > It should be initialized with 0, I missed that... > > > > > > > > > > > > > > > > > > > > > - I believe that the updating load/store > > > > > > > > > > > instructions > > > e.g. > > > > > > > > > > > lwzu don't perform well on some processors. At least > > > > > > > > > > > using stwu 2 times in the loop doesn't make sense. > > > > > > > > > > > > > > > > > > > > You are right. I could manipulate the bits differently > > > > > > > > > > and ended up with a single stdu in the loop. Neat! > > > > > > > > > > Although I could not reduce the total number of > instructions. > > > > > > > > > > > > > > > > > > > > > - Note: It should be possible to use 8 byte instead > > > > > > > > > > > of 4 byte > > > > > > > > > > > instructions: MacroAssembler::multiply64, addc, > adde. > > > > > > > > > > > But I'm not requesting to change that because I > > > > > > > > > > > guess it would make the code very complicated, > > > > > > > > > > > especially when supporting both endianess > > > > > > > > > versions. > > > > > > > > > > > > > > > > > > > > Yes, that would require a new analysis on this code. > > > > > > > > > > May we consider it next? As you said, I prefer having > > > > > > > > > > an initial version that looks as simple as the > > > > > > > > > > original java > > > code. > > > > > > > > > > > > > > > > > > > > > - The squareToLen stub implementation is very close > > > > > > > > > > > the Java implementation. So it'd be interesting to > > > > > > > > > > > understand what C2 doesn't do as well as the hand > > > > > > > > > > > written assembly code. Do you know that? (Not > > > > > > > > > > > absolutely necessary for accepting this change as > > > > > > > > > > > long as the stub is measurably > > > > > > > > > > > faster.) > > > > > > > > > > > > > > > > > > > > I don't know either. Basically I chose doing it > > > > > > > > > > because I noticed some performance gain on SpecJVM2008 > > > > > > > > > > when analyzing > > > > > X64. 
> > > > > > > > > > Then, taking a closer look, I didn't notice any AVX or > > > > > > > > > > some special instructions on > > > > > > > > > > X64 so I decided to try it on ppc64 by using some > > > > > > > > > > basic > > > > > > assembly. > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > > From: hotspot-compiler-dev > > > > > > > > > > > [mailto:hotspot-compiler-dev- > > > > > > > > > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra > > > > > > > > > > > Scalet > > > > > > > > > > > Sent: Donnerstag, 10. August 2017 19:22 > > > > > > > > > > > To: 'hotspot-compiler-dev at openjdk.java.net' > > > > > > > > > > > > > > > > > > > > > > Subject: FW: [10] RFR(M): 8185976: PPC64: Implement > > > > > > > > > > > MulAdd > > > > > and > > > > > > > > > > > SquareToLen intrinsics > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev- > > > > > > > > > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra > > > > > > > > > > > Scalet > > > > > > > > > > > Sent: ter?a-feira, 8 de agosto de 2017 17:19 > > > > > > > > > > > To: ppc-aix-port-dev at openjdk.java.net > > > > > > > > > > > Subject: [10] RFR(M): 8185976: PPC64: Implement > > > > > > > > > > > MulAdd and SquareToLen intrinsics > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > Could you please review this specific PPC64 change > > > > > > > > > > > to > > > > hotspot? > > > > > > > > > > > By implementing these intrinsics I noticed a small > > > > > > > > > > > improvement with microbenchmarks analysis. On > > > > > > > > > > > SpecJVM2008's crypto.rsa benchmark, only when > > > > > > > > > > > backporting to JDK8 an improvement was > > > > > > noticed. > > > > > > > > > > > > > > > > > > > > > > JBS: > > > > > > > > > > > https://bugs.openjdk.java.net/browse/JDK-8185976 > > > > > > > > > > > Webrev: https://gut.github.io/openjdk/webrev/JDK- > > > > > > > 8185976/webrev/ > > > > > > > > > > > > > > > > > > > > > > Motivation for this implementation: > > > > > > > > > > > https://twitter.com/ijuma/status/698309312498835457 > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > Gustavo Serra Scalet From rwestrel at redhat.com Mon Oct 2 11:46:37 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 02 Oct 2017 13:46:37 +0200 Subject: RFR(XS): 8188223: IfNode::range_check_trap_proj() should handler dying subgraph with single if proj Message-ID: http://cr.openjdk.java.net/~roland/8188223/webrev.00/ I saw the following crash (that I cannot reproduce anymore having deleted the replay file by mistake). With subgraph shape: UNC->Region->IfProj->RangeCheck The region has the IfProj as single input. The following code in RegionNode::Ideal(): if (can_reshape && cnt == 1) { // Is it dead loop? // If it is LoopNopde it had 2 (+1 itself) inputs and // one of them was cut. The loop is dead if it was EntryContol. // Loop node may have only one input because entry path // is removed in PhaseIdealLoop::Dominators(). 
assert(!this->is_Loop() || cnt_orig <= 3, "Loop node should have 3 or less inputs"); if ((this->is_Loop() && (del_it == LoopNode::EntryControl || (del_it == 0 && is_unreachable_region(phase)))) || (!this->is_Loop() && has_phis && is_unreachable_region(phase))) { finds that the subgraph is unreachable which causes the IfProj to be removed. RangeCheckNode::Ideal() is later called on a dominated range check which walks the graph, hit the RangeCheck that has a single projection and causes a crash. I think it makes sense to make IfNode::range_check_trap_proj() handle the case of a RangeCheckNode with a single input. Roland. From dmitrij.pochepko at bell-sw.com Mon Oct 2 13:47:44 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 2 Oct 2017 16:47:44 +0300 Subject: [10] RFR(S): 8187684 - Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <1e07d1a7-8145-4a0e-7693-d3816551e65d@oracle.com> References: <5cfb314b-9785-fe47-8797-a899f38643ef@redhat.com> <32a35ee7-9593-bbc4-a540-37c590ec3ba6@redhat.com> <5fa7e391-bde0-9310-7bcd-98cf96b3f158@bell-sw.com> <3d76ff3c-17d7-a37c-2959-8f5e6dacd854@oracle.com> <1e07d1a7-8145-4a0e-7693-d3816551e65d@oracle.com> Message-ID: <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> Hi, please find rebased webrev here: http://cr.openjdk.java.net/~dpochepk/8187684/webrev.newws.01/ Thanks, Dmitij On 29.09.2017 02:40, Vladimir Kozlov wrote: > Dmitry, > > Please, update changes for new consolidated sources and send new > patch/webrev. > > Thanks, > Vladimir > > On 9/25/17 9:42 AM, Vladimir Kozlov wrote: >> Yes, when repo will be opened. >> >> Please, send patch and add latest webrev link to the RFE. >> >> Thanks, >> Vladimir >> >> On 9/25/17 5:04 AM, Dmitrij Pochepko wrote: >>> >>> On 25.09.2017 14:04, Andrew Haley wrote: >>>> On 20/09/17 14:29, Andrew Haley wrote: >>>>> On 20/09/17 14:08, Dmitrij Pochepko wrote: >>>>>> please review small patch for enhancement: 8187684 - Intrinsify >>>>>> Math.multiplyHigh(long, long) >>>>> OK, thanks. >>>> Dmitrij, do you have a sponsor for this?? I'm sure Vladimir would >>>> be happy to help.? :-) >>>> >>> Hi, >>> >>> Vladimir, can you sponsor it? >>> >>> Thanks, >>> Dmitrij From jcbeyler at google.com Tue Oct 3 03:52:30 2017 From: jcbeyler at google.com (JC Beyler) Date: Mon, 2 Oct 2017 20:52:30 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <2af975e6-3827-bd57-0c3d-fadd54867a67@oracle.com> <365499b6-3f4d-a4df-9e7e-e72a739fb26b@oracle.com> <102c59b8-25b6-8c21-8eef-1de7d0bbf629@oracle.com> <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> Message-ID: Dear all, Small update to the webrev: http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ Full webrev is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ I updated a bit of the naming, removed a TODO comment, and I added a test for testing the sampling rate. I also updated the maximum stack depth to 1024, there is no reason to keep it so small. I did a micro benchmark that tests the overhead and it seems relatively the same. 
I compared allocations from a stack depth of 10 and allocations from a stack depth of 1024 (allocations are from the same helper method in http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java ): - For an array of 1 integer allocated in a loop; stack depth 1024 vs stack depth 10: 1% slower - For an array of 200k integers allocated in a loop; stack depth 1024 vs stack depth 10: 3% slower So basically now moving the maximum stack depth to 1024 but we only copy over the stack depths actually used. For the next webrev, I will be adding a stack depth test to show that it works and probably put back the mutex locking so that we can see how difficult it is to keep thread safe. Let me know what you think! Jc On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler wrote: > Forgot to say that for my numbers: > - Not in the test are the actual numbers I got for the various array > sizes, I ran the program 30 times and parsed the output; here are the > averages and standard deviation: > 1000: 1.28% average; 1.13% standard deviation > 10000: 1.59% average; 1.25% standard deviation > 100000: 1.26% average; 1.26% standard deviation > > The 1000/10000/100000 are the sizes of the arrays being allocated. These > are allocated 100k times and the sampling rate is 111 times the size of the > array. > > Thanks! > Jc > > > On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler wrote: > >> Hi all, >> >> After a bit of a break, I am back working on this :). As before, here are >> two webrevs: >> >> - Full change set: http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/ >> - Compared to version 8: http://cr.openjdk.java.net/ >> ~rasbold/8171119/webrev.08_09/ >> (This version is compared to version 8 I last showed but ported to >> the new folder hierarchy) >> >> In this version I have: >> - Handled Thomas' comments from his email of 07/03: >> - Merged the logging to be standard >> - Fixed up the code a bit where asked >> - Added some notes about the code not being thread-safe yet >> - Removed additional dead code from the version that modifies >> interpreter/c1/c2 >> - Fixed compiler issues so that it compiles with >> --disable-precompiled-header >> - Tested with ./configure --with-boot-jdk= >> --with-debug-level=slowdebug --disable-precompiled-headers >> >> Additionally, I added a test to check the sanity of the sampler: >> HeapMonitorStatCorrectnessTest (http://cr.openjdk.java.net/~r >> asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch) >> - This allocates a number of arrays and checks that we obtain the >> number of samples we want with an accepted error of 5%. I tested it 100 >> times and it passed everytime, I can test more if wanted >> - Not in the test are the actual numbers I got for the various array >> sizes, I ran the program 30 times and parsed the output; here are the >> averages and standard deviation: >> 1000: 1.28% average; 1.13% standard deviation >> 10000: 1.59% average; 1.25% standard deviation >> 100000: 1.26% average; 1.26% standard deviation >> >> What this means is that we were always at about 1~2% of the number of >> samples the test expected. >> >> Let me know what you think, >> Jc >> >> >> >> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler wrote: >> >>> Hi all, >>> >>> I apologize, I have not yet handled your remarks but thought this new >>> webrev would also be useful to see and comment on perhaps. 
>>> >>> Here is the latest webrev, it is generated slightly different than the >>> others since now I'm using webrev.ksh without the -N option: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ >>> >>> And the webrev.07 to webrev.08 diff is here: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ >>> >>> (Let me know if it works well) >>> >>> It's a small change between versions but it: >>> - provides a fix that makes the average sample rate correct (more on >>> that below). >>> - fixes the code to actually have it play nicely with the fast tlab >>> refill >>> - cleaned up a bit the JVMTI text and now use jvmtiFrameInfo >>> - moved the capability to be onload solo >>> >>> With this webrev, I've done a small study of the random number generator >>> we use here for the sampling rate. I took a small program and it can be >>> simplified to: >>> >>> for (outer loop) >>> for (inner loop) >>> int[] tmp = new int[arraySize]; >>> >>> - I've fixed the outer and inner loops to being 800 for this experiment, >>> meaning we allocate 640000 times an array of a given array size. >>> >>> - Each program provides the average sample size used for the whole >>> execution >>> >>> - Then, I ran each variation 30 times and then calculated the average of >>> the average sample size used for various array sizes. I selected the array >>> size to be one of the following: 1, 10, 100, 1000. >>> >>> - When compared to 512kb, the average sample size of 30 runs: >>> 1: 4.62% of error >>> 10: 3.09% of error >>> 100: 0.36% of error >>> 1000: 0.1% of error >>> 10000: 0.03% of error >>> >>> What it shows is that, depending on the number of samples, the average >>> does become better. This is because with an allocation of 1 element per >>> array, it will take longer to hit one of the thresholds. This is seen by >>> looking at the sample count statistic I put in. For the same number of >>> iterations (800 * 800), the different array sizes provoke: >>> 1: 62 samples >>> 10: 125 samples >>> 100: 788 samples >>> 1000: 6166 samples >>> 10000: 57721 samples >>> >>> And of course, the more samples you have, the more sample rates you >>> pick, which means that your average gets closer using that math. >>> >>> Thanks, >>> Jc >>> >>> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler wrote: >>> >>>> Thanks Robbin, >>>> >>>> This seems to have worked. When I have the next webrev ready, we will >>>> find out but I'm fairly confident it will work! >>>> >>>> Thanks agian! >>>> Jc >>>> >>>> On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn >>>> wrote: >>>> >>>>> Hi JC, >>>>> >>>>> On 06/29/2017 12:15 AM, JC Beyler wrote: >>>>> >>>>>> B) Incremental changes >>>>>> >>>>> >>>>> I guess the most common work flow here is using mq : >>>>> hg qnew fix_v1 >>>>> edit files >>>>> hg qrefresh >>>>> hg qnew fix_v2 >>>>> edit files >>>>> hg qrefresh >>>>> >>>>> if you do hg log you will see 2 commits >>>>> >>>>> webrev.ksh -r -2 -o my_inc_v1_v2 >>>>> webrev.ksh -o my_full_v2 >>>>> >>>>> >>>>> In your .hgrc you might need: >>>>> [extensions] >>>>> mq = >>>>> >>>>> /Robbin >>>>> >>>>> >>>>>> Again another newbiew question here... >>>>>> >>>>>> For showing the incremental changes, is there a link that explains >>>>>> how to do that? I apologize for my newbie questions all the time :) >>>>>> >>>>>> Right now, I do: >>>>>> >>>>>> ksh ../webrev.ksh -m -N >>>>>> >>>>>> That generates a webrev.zip and send it to Chuck Rasbold. He then >>>>>> uploads it to a new webrev. >>>>>> >>>>>> I tried commiting my change and adding a small change. 
Then if I just >>>>>> do ksh ../webrev.ksh without any options, it seems to produce a similar >>>>>> page but now with only the changes I had (so the 06-07 comparison you were >>>>>> talking about) and a changeset that has it all. I imagine that is what you >>>>>> meant. >>>>>> >>>>>> Which means that my workflow would become: >>>>>> >>>>>> 1) Make changes >>>>>> 2) Make a webrev without any options to show just the differences >>>>>> with the tip >>>>>> 3) Amend my changes to my local commit so that I have it done with >>>>>> 4) Go to 1 >>>>>> >>>>>> Does that seem correct to you? >>>>>> >>>>>> Note that when I do this, I only see the full change of a file in the >>>>>> full change set (Side note here: now the page says change set and not >>>>>> patch, which is maybe why Serguei was having issues?). >>>>>> >>>>>> Thanks! >>>>>> Jc >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn >>>>> > wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> On 06/28/2017 12:04 AM, JC Beyler wrote: >>>>>> >>>>>> Dear Thomas et al, >>>>>> >>>>>> Here is the newest webrev: >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ < >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/> >>>>>> >>>>>> >>>>>> >>>>>> You have some more bits to in there but generally this looks good >>>>>> and really nice with more tests. >>>>>> I'll do and deep dive and re-test this when I get back from my >>>>>> long vacation with whatever patch version you have then. >>>>>> >>>>>> Also I think it's time you provide incremental (v06->07 changes) >>>>>> as well as complete change-sets. >>>>>> >>>>>> Thanks, Robbin >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thomas, I "think" I have answered all your remarks. The >>>>>> summary is: >>>>>> >>>>>> - The statistic system is up and provides insight on what the >>>>>> heap sampler is doing >>>>>> - I've noticed that, though the sampling rate is at the >>>>>> right mean, we are missing some samples, I have not yet tracked out why >>>>>> (details below) >>>>>> >>>>>> - I've run a tiny benchmark that is the worse case: it is a >>>>>> very tight loop and allocated a small array >>>>>> - In this case, I see no overhead when the system is off >>>>>> so that is a good start :) >>>>>> - I see right now a high overhead in this case when >>>>>> sampling is on. This is not a really too surprising but I'm going to see if >>>>>> this is consistent with our >>>>>> internal implementation. The benchmark is really allocation >>>>>> stressful so I'm not too surprised but I want to do the due diligence. >>>>>> >>>>>> - The statistic system up is up and I have a new test >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/s >>>>>> erviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTes >>>>>> t.java.patch >>>>>> >>>>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTe >>>>>> st.java.patch> >>>>>> - I did a bit of a study about the random generator >>>>>> here, more details are below but basically it seems to work well >>>>>> >>>>>> - I added a capability but since this is the first time >>>>>> doing this, I was not sure I did it right >>>>>> - I did add a test though for it and the test seems to >>>>>> do what I expect (all methods are failing with the >>>>>> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). 
>>>>>> - http://cr.openjdk.java.net/~ra >>>>>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonito >>>>>> r/MyPackage/HeapMonitorNoCapabilityTest.java.patch >>>>>> >>>>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >>>>>> bilityTest.java.patch> >>>>>> >>>>>> - I still need to figure out what to do about the >>>>>> multi-agent vs single-agent issue >>>>>> >>>>>> - As far as measurements, it seems I still need to look at: >>>>>> - Why we do the 20 random calls first, are they >>>>>> necessary? >>>>>> - Look at the mean of the sampling rate that the random >>>>>> generator does and also what is actually sampled >>>>>> - What is the overhead in terms of memory/performance >>>>>> when on? >>>>>> >>>>>> I have inlined my answers, I think I got them all in the new >>>>>> webrev, let me know your thoughts. >>>>>> >>>>>> Thanks again! >>>>>> Jc >>>>>> >>>>>> >>>>>> On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl < >>>>>> thomas.schatzl at oracle.com >>>>> thomas.schatzl at oracle.com >>>>>> >>>>>> >> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote: >>>>>> > Hi all, >>>>>> > >>>>>> > First off: Thanks again to Robbin and Thomas for their >>>>>> reviews :) >>>>>> > >>>>>> > Next, I've uploaded a new webrev: >>>>>> > http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >>>>>> >>>>>> >>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>> >>>>>> >>>>>> > >>>>>> > Here is an update: >>>>>> > >>>>>> > - @Robbin, I forgot to say that yes I need to look at >>>>>> implementing >>>>>> > this for the other architectures and testing it before >>>>>> it is all >>>>>> > ready to go. Is it common to have it working on all >>>>>> possible >>>>>> > combinations or is there a subset that I should be >>>>>> doing first and we >>>>>> > can do the others later? >>>>>> > - I've tested slowdebug, built and ran the JTreg tests >>>>>> I wrote with >>>>>> > slowdebug and fixed a few more issues >>>>>> > - I've refactored a bit of the code following Thomas' >>>>>> comments >>>>>> > - I think I've handled all the comments from Thomas >>>>>> (I put >>>>>> > comments inline below for the specifics) >>>>>> >>>>>> Thanks for handling all those. >>>>>> >>>>>> > - Following Thomas' comments on statistics, I want to >>>>>> add some >>>>>> > quality assurance tests and find that the easiest way >>>>>> would be to >>>>>> > have a few counters of what is happening in the >>>>>> sampler and expose >>>>>> > that to the user. >>>>>> > - I'll be adding that in the next version if no one >>>>>> sees any >>>>>> > objections to that. >>>>>> > - This will allow me to add a sanity test in JTreg >>>>>> about number of >>>>>> > samples and average of sampling rate >>>>>> > >>>>>> > @Thomas: I had a few questions that I inlined below >>>>>> but I will >>>>>> > summarize the "bigger ones" here: >>>>>> > - You mentioned constants are not using the right >>>>>> conventions, I >>>>>> > looked around and didn't see any convention except >>>>>> normal naming then >>>>>> > for static constants. Is that right? >>>>>> >>>>>> I looked through https://wiki.openjdk.java.net/ >>>>>> display/HotSpot/StyleGui >>>>> /display/HotSpot/StyleGui> >>>>>> >>>>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui>> >>>>>> de and the rule is to "follow an existing pattern and >>>>>> must have a >>>>>> distinct appearance from other names". Which does not >>>>>> help a lot I >>>>>> guess :/ The GC team started using upper camel case, e.g. 
>>>>>> SomeOtherConstant, but very likely this is probably not >>>>>> applied >>>>>> consistently throughout. So I am fine with not adding >>>>>> another style >>>>>> (like kMaxStackDepth with the "k" in front with some >>>>>> unknown meaning) >>>>>> is fine. >>>>>> >>>>>> (Chances are you will find that style somewhere used >>>>>> anyway too, >>>>>> apologies if so :/) >>>>>> >>>>>> >>>>>> Thanks for that link, now I know where to look. I used the >>>>>> upper camel case in my code as well then :) I should have gotten them all. >>>>>> >>>>>> >>>>>> > PS: I've also inlined my answers to Thomas below: >>>>>> > >>>>>> > On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl >>>>>> >>>>> > e.com > wrote: >>>>>> > > Hi all, >>>>>> > > >>>>>> > > On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote: >>>>>> > > > Dear all, >>>>>> > > > >>>>>> > > > I've continued working on this and have done the >>>>>> following >>>>>> > > webrev: >>>>>> > > > http://cr.openjdk.java.net/~ra >>>>>> sbold/8171119/webrev.05/ >>>>> asbold/8171119/webrev.05/> >>>>>> >>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>> >>>>>> >>>>>> > > >>>>>> > > [...] >>>>>> > > > Things I still need to do: >>>>>> > > > - Have to fix that TLAB case for the >>>>>> FastTLABRefill >>>>>> > > > - Have to start looking at the data to see >>>>>> that it is >>>>>> > > consistent and does gather the right samples, right >>>>>> frequency, etc. >>>>>> > > > - Have to check the GC elements and what that >>>>>> produces >>>>>> > > > - Run a slowdebug run and ensure I fixed all >>>>>> those issues you >>>>>> > > saw > Robbin >>>>>> > > > >>>>>> > > > Thanks for looking at the webrev and have a great >>>>>> week! >>>>>> > > >>>>>> > > scratching a bit on the surface of this change, >>>>>> so apologies for >>>>>> > > rather shallow comments: >>>>>> > > >>>>>> > > - macroAssembler_x86.cpp:5604: while this is >>>>>> compiler code, and I >>>>>> > > am not sure this is final, please avoid littering >>>>>> the code with >>>>>> > > TODO remarks :) They tend to be candidates for >>>>>> later wtf moments >>>>>> > > only. >>>>>> > > >>>>>> > > Just file a CR for that. >>>>>> > > >>>>>> > Newcomer question: what is a CR and not sure I have >>>>>> the rights to do >>>>>> > that yet ? :) >>>>>> >>>>>> Apologies. CR is a change request, this suggests to file >>>>>> a bug in the >>>>>> bug tracker. And you are right, you can't just create a >>>>>> new account in >>>>>> the OpenJDK JIRA yourselves. :( >>>>>> >>>>>> >>>>>> Ok good to know, I'll continue with my own todo list but I'll >>>>>> work hard on not letting it slip in the webrevs anymore :) >>>>>> >>>>>> >>>>>> I was mostly referring to the "... but it is a TODO" >>>>>> part of that >>>>>> comment in macroassembler_x86.cpp. Comments about the >>>>>> why of the code >>>>>> are appreciated. >>>>>> >>>>>> [Note that I now understand that this is to some degree >>>>>> still work in >>>>>> progress. As long as the final changeset does no contain >>>>>> TODO's I am >>>>>> fine (and it's not a hard objection, rather their use in >>>>>> "final" code >>>>>> is typically limited in my experience)] >>>>>> >>>>>> 5603 // Currently, if this happens, just set back the >>>>>> actual end to >>>>>> where it was. >>>>>> 5604 // We miss a chance to sample here. >>>>>> >>>>>> Would be okay, if explaining "this" and the "why" of >>>>>> missing a chance >>>>>> to sample here would be best. 
>>>>>> >>>>>> Like maybe: >>>>>> >>>>>> // If we needed to refill TLABs, just set the actual end >>>>>> point to >>>>>> // the end of the TLAB again. We do not sample here >>>>>> although we could. >>>>>> >>>>>> Done with your comment, it works well in my mind. >>>>>> >>>>>> I am not sure whether "miss a chance to sample" meant >>>>>> "we could, but >>>>>> consciously don't because it's not that useful" or "it >>>>>> would be >>>>>> necessary but don't because it's too complicated to do.". >>>>>> >>>>>> Looking at the original comment once more, I am also not >>>>>> sure if that >>>>>> comment shouldn't referring to the "end" variable (not >>>>>> actual_end) >>>>>> because that's the variable that is responsible for >>>>>> taking the sampling >>>>>> path? (Going from the member description of >>>>>> ThreadLocalAllocBuffer). >>>>>> >>>>>> >>>>>> I've moved this code and it no longer shows up here but the >>>>>> rationale and answer was: >>>>>> >>>>>> So.. Yes, end is the variable provoking the sampling. Actual >>>>>> end is the actual end of the TLAB. >>>>>> >>>>>> What was happening here is that the code is resetting _end to >>>>>> point towards the end of the new TLAB. Because, we now have the end for >>>>>> sampling and _actual_end for >>>>>> the actual end, we need to update the actual_end as well. >>>>>> >>>>>> Normally, were we to do the real work here, we would >>>>>> calculate the (end - start) offset, then do: >>>>>> >>>>>> - Set the new end to : start + (old_end - old_start) >>>>>> - Set the actual end like we do here now where it because it >>>>>> is the actual end. >>>>>> >>>>>> Why is this not done here now anymore? >>>>>> - I was still debating which path to take: >>>>>> - Do it in the fast refill code, it has its perks: >>>>>> - In a world where fast refills are happening all >>>>>> the time or a lot, we can augment there the code to do the sampling >>>>>> - Remember what we had as an end before leaving the >>>>>> slowpath and check on return >>>>>> - This is what I'm doing now, it removes the need >>>>>> to go fix up all fast refill paths but if you remain in fast refill paths, >>>>>> you won't get sampling. I >>>>>> have to think of the consequences of that, maybe a future >>>>>> change later on? >>>>>> - I have the statistics now so I'm going to >>>>>> study that >>>>>> -> By the way, though my statistics are >>>>>> showing I'm missing some samples, if I turn off FastTlabRefill, it is the >>>>>> same loss so for now, it seems >>>>>> this does not occur in my simple test. >>>>>> >>>>>> >>>>>> >>>>>> But maybe I am only confused and it's best to just leave >>>>>> the comment >>>>>> away. :) >>>>>> >>>>>> Thinking about it some more, doesn't this not-sampling >>>>>> in this case >>>>>> mean that sampling does not work in any collector that >>>>>> does inline TLAB >>>>>> allocation at the moment? (Or is inline TLAB alloc >>>>>> automatically >>>>>> disabled with sampling somehow?) >>>>>> >>>>>> That would indeed be a bigger TODO then :) >>>>>> >>>>>> >>>>>> Agreed, this remark made me think that perhaps as a first >>>>>> step the new way of doing it is better but I did have to: >>>>>> - Remove the const of the ThreadLocalBuffer remaining and >>>>>> hard_end methods >>>>>> - Move hard_end out of the header file to have a bit more >>>>>> logic there >>>>>> >>>>>> Please let me know what you think of that and if you prefer >>>>>> it this way or changing the fast refills. (I prefer this way now because it >>>>>> is more incremental). 
>>>>>> >>>>>> >>>>>> > > - calling HeapMonitoring::do_weak_oops() (which >>>>>> should probably be >>>>>> > > called weak_oops_do() like other similar methods) >>>>>> only if string >>>>>> > > deduplication is enabled (in >>>>>> g1CollectedHeap.cpp:4511) seems wrong. >>>>>> > >>>>>> > The call should be at least around 6 lines up outside >>>>>> the if. >>>>>> > >>>>>> > Preferentially in a method like >>>>>> process_weak_jni_handles(), including >>>>>> > additional logging. (No new (G1) gc phase without >>>>>> minimal logging >>>>>> > :)). >>>>>> > Done but really not sure because: >>>>>> > >>>>>> > I put for logging: >>>>>> > log_develop_trace(gc, freelist)("G1ConcRegionFreeing >>>>>> [other] : heap >>>>>> > monitoring"); >>>>>> >>>>>> I would think that "gc, ref" would be more appropriate >>>>>> log tags for >>>>>> this similar to jni handles. >>>>>> (I am als not sure what weak reference handling has to >>>>>> do with >>>>>> G1ConcRegionFreeing, so I am a bit puzzled) >>>>>> >>>>>> >>>>>> I was not sure what to put for the tags or really as the >>>>>> message. I cleaned it up a bit now to: >>>>>> log_develop_trace(gc, ref)("HeapSampling [other] : heap >>>>>> monitoring processing"); >>>>>> >>>>>> >>>>>> >>>>>> > Since weak_jni_handles didn't have logging for me to >>>>>> be inspired >>>>>> > from, I did that but unconvinced this is what should >>>>>> be done. >>>>>> >>>>>> The JNI handle processing does have logging, but only in >>>>>> ReferenceProcessor::process_discovered_references(). In >>>>>> process_weak_jni_handles() only overall time is measured >>>>>> (in a G1 >>>>>> specific way, since only G1 supports disabling reference >>>>>> procesing) :/ >>>>>> >>>>>> The code in ReferenceProcessor prints both time taken >>>>>> referenceProcessor.cpp:254, as well as the count, but >>>>>> strangely only in >>>>>> debug VMs. >>>>>> >>>>>> I have no idea why this logging is that unimportant to >>>>>> only print that >>>>>> in a debug VM. However there are reviews out for >>>>>> changing this area a >>>>>> bit, so it might be useful to wait for that >>>>>> (JDK-8173335). >>>>>> >>>>>> >>>>>> I cleaned it up a bit anyway and now it returns the count of >>>>>> objects that are in the system. >>>>>> >>>>>> >>>>>> > > - the change doubles the size of >>>>>> > > CollectedHeap::allocate_from_tlab_slow() above the >>>>>> "small and nice" >>>>>> > > threshold. Maybe it could be refactored a bit. >>>>>> > Done I think, it looks better to me :). >>>>>> >>>>>> In ThreadLocalAllocBuffer::handle_sample() I think the >>>>>> set_back_actual_end()/pick_next_sample() calls could be >>>>>> hoisted out of >>>>>> the "if" :) >>>>>> >>>>>> >>>>>> Done! >>>>>> >>>>>> >>>>>> > > - referenceProcessor.cpp:261: the change should add >>>>>> logging about >>>>>> > > the number of references encountered, maybe after >>>>>> the corresponding >>>>>> > > "JNI weak reference count" log message. >>>>>> > Just to double check, are you saying that you'd like >>>>>> to have the heap >>>>>> > sampler to keep in store how many sampled objects were >>>>>> encountered in >>>>>> > the HeapMonitoring::weak_oops_do? >>>>>> > - Would a return of the method with the number of >>>>>> handled >>>>>> > references and logging that work? >>>>>> >>>>>> Yes, it's fine if HeapMonitoring::weak_oops_do() only >>>>>> returned the >>>>>> number of processed weak oops. 
>>>>>> >>>>>> >>>>>> Done also (but I admit I have not tested the output yet) :) >>>>>> >>>>>> >>>>>> > - Additionally, would you prefer it in a separate >>>>>> block with its >>>>>> > GCTraceTime? >>>>>> >>>>>> Yes. Both kinds of information is interesting: while the >>>>>> time taken is >>>>>> typically more important, the next question would be >>>>>> why, and the >>>>>> number of references typically goes a long way there. >>>>>> >>>>>> See above though, it is probably best to wait a bit. >>>>>> >>>>>> >>>>>> Agreed that I "could" wait but, if it's ok, I'll just >>>>>> refactor/remove this when we get closer to something final. Either, >>>>>> JDK-8173335 >>>>>> has gone in and I will notice it now or it will soon and I >>>>>> can change it then. >>>>>> >>>>>> >>>>>> > > - threadLocalAllocBuffer.cpp:331: one more "TODO" >>>>>> > Removed it and added it to my personal todos to look >>>>>> at. >>>>>> > > > >>>>>> > > - threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer >>>>>> class >>>>>> > > documentation should be updated about the sampling >>>>>> additions. I >>>>>> > > would have no clue what the difference between >>>>>> "actual_end" and >>>>>> > > "end" would be from the given information. >>>>>> > If you are talking about the comments in this file, I >>>>>> made them more >>>>>> > clear I hope in the new webrev. If it was somewhere >>>>>> else, let me know >>>>>> > where to change. >>>>>> >>>>>> Thanks, that's much better. Maybe a note in the comment >>>>>> of the class >>>>>> that ThreadLocalBuffer provides some sampling facility >>>>>> by modifying the >>>>>> end() of the TLAB to cause "frequent" calls into the >>>>>> runtime call where >>>>>> actual sampling takes place. >>>>>> >>>>>> >>>>>> Done, I think it's better now. Added something about the >>>>>> slow_path_end as well. >>>>>> >>>>>> >>>>>> > > - in heapMonitoring.hpp: there are some random >>>>>> comments about some >>>>>> > > code that has been grabbed from >>>>>> "util/math/fastmath.[h|cc]". I >>>>>> > > can't tell whether this is code that can be used but >>>>>> I assume that >>>>>> > > Noam Shazeer is okay with that (i.e. that's all >>>>>> Google code). >>>>>> > Jeremy and I double checked and we can release that as >>>>>> I thought. I >>>>>> > removed the comment from that piece of code entirely. >>>>>> >>>>>> Thanks. >>>>>> >>>>>> > > - heapMonitoring.hpp/cpp static constant naming does >>>>>> not correspond >>>>>> > > to Hotspot's. Additionally, in Hotspot static >>>>>> methods are cased >>>>>> > > like other methods. >>>>>> > I think I fixed the methods to be cased the same way >>>>>> as all other >>>>>> > methods. For static constants, I was not sure. I fixed >>>>>> a few other >>>>>> > variables but I could not seem to really see a >>>>>> consistent trend for >>>>>> > constants. I made them as variables but I'm not sure >>>>>> now. >>>>>> >>>>>> Sorry again, style is a kind of mess. The goal of my >>>>>> suggestions here >>>>>> is only to prevent yet another style creeping in. >>>>>> >>>>>> > > - in heapMonitoring.cpp there are a few cryptic >>>>>> comments at the top >>>>>> > > that seem to refer to internal stuff that should >>>>>> probably be >>>>>> > > removed. >>>>>> > Sorry about that! My personal todos not cleared out. >>>>>> >>>>>> I am happy about comments, but I simply did not >>>>>> understand any of that >>>>>> and I do not know about other readers as well. 
>>>>>> >>>>>> If you think you will remember removing/updating them >>>>>> until the review >>>>>> proper (I misunderstood the review situation a little it >>>>>> seems). >>>>>> >>>>>> > > I did not think through the impact of the TLAB >>>>>> changes on collector >>>>>> > > behavior yet (if there are). Also I did not check >>>>>> for problems with >>>>>> > > concurrent mark and SATB/G1 (if there are). >>>>>> > I would love to know your thoughts on this, I think >>>>>> this is fine. I >>>>>> >>>>>> I think so too now. No objects are made live out of thin >>>>>> air :) >>>>>> >>>>>> > see issues with multiple threads right now hitting the >>>>>> stack storage >>>>>> > instance. Previous webrevs had a mutex lock here but >>>>>> we took it out >>>>>> > for simplificity (and only for now). >>>>>> >>>>>> :) When looking at this after some thinking I now assume >>>>>> for this >>>>>> review that this code is not MT safe at all. There seems >>>>>> to be more >>>>>> synchronization missing than just the one for the >>>>>> StackTraceStorage. So >>>>>> no comments about this here. >>>>>> >>>>>> >>>>>> I doubled checked a bit (quickly I admit) but it seems that >>>>>> synchronization in StackTraceStorage is really all you need (all methods >>>>>> lead to a StackTraceStorage one >>>>>> and can be multithreaded outside of that). >>>>>> There is a question about the initialization where the method >>>>>> HeapMonitoring::initialize_profiling is not thread safe. >>>>>> It would work (famous last words) and not crash if there was >>>>>> a race but we could add a synchronization point there as well (and >>>>>> therefore on the stop as well). >>>>>> >>>>>> But anyway I will really check and do this once we add back >>>>>> synchronization. >>>>>> >>>>>> >>>>>> Also, this would require some kind of specification of >>>>>> what is allowed >>>>>> to be called when and where. >>>>>> >>>>>> >>>>>> Would we specify this with the methods in the jvmti.xml file? >>>>>> We could start by specifying in each that they are not thread safe but I >>>>>> saw no mention of that for >>>>>> other methods. >>>>>> >>>>>> >>>>>> One potentially relevant observation about locking here: >>>>>> depending on >>>>>> sampling frequency, StackTraceStore::add_trace() may be >>>>>> rather >>>>>> frequently called. I assume that you are going to do >>>>>> measurements :) >>>>>> >>>>>> >>>>>> Though we don't have the TLAB implementation in our code, the >>>>>> compiler generated sampler uses 2% of overhead with a 512k sampling rate. I >>>>>> can do real measurements >>>>>> when the code settles and we can see how costly this is as a >>>>>> TLAB implementation. >>>>>> However, my theory is that if the rate is 512k, the >>>>>> memory/performance overhead should be minimal since it is what we saw with >>>>>> our code/workloads (though not called >>>>>> the same way, we call it essentially at the same rate). >>>>>> If you have a benchmark you'd like me to test, let me know! >>>>>> >>>>>> Right now, with my really small test, this does use a bit of >>>>>> overhead even for a 512k sample size. I don't know yet why, I'm going to >>>>>> see what is going on. >>>>>> >>>>>> Finally, I think it is not reasonable to suppose the overhead >>>>>> to be negligible if the sampling rate used is too low. The user should know >>>>>> that the lower the rate, >>>>>> the higher the overhead (documentation TODO?). 
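To put rough numbers on that last point (illustrative figures only, not measurements): with the default rate of one sample per 512 KiB allocated, a thread allocating 100 MB/s takes on the order of 200 samples per second; dropping the rate to one sample per 1 KiB would push the same thread to roughly 100,000 samples per second, so whatever a single sample costs is multiplied by about 500x.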
>>>>>> >>>>>> >>>>>> I am not sure what the expected usage of the API is, but >>>>>> StackTraceStore::add_trace() seems to be able to grow >>>>>> without bounds. >>>>>> Only a GC truncates them to the live ones. That in >>>>>> itself seems to be >>>>>> problematic (GCs can be *wide* apart), and of course >>>>>> some of the API >>>>>> methods add to that because they duplicate that >>>>>> unbounded array. Do you >>>>>> have any concerns/measurements about this? >>>>>> >>>>>> >>>>>> So, the theory is that yes add_trace can be able to grow >>>>>> without bounds but it grows at a sample per 512k of allocated space. The >>>>>> stacks it gathers are currently >>>>>> maxed at 64 (I'd like to expand that to an option to the user >>>>>> though at some point). So I have no concerns because: >>>>>> >>>>>> - If really this is taking a lot of space, that means the job >>>>>> is keeping a lot of objects in memory as well, therefore the entire heap is >>>>>> getting huge >>>>>> - If this is the case, you will be triggering a GC at some >>>>>> point anyway. >>>>>> >>>>>> (I'm putting under the rug the issue of "What if we set the >>>>>> rate to 1 for example" because as you lower the sampling rate, we cannot >>>>>> guarantee low overhead; the >>>>>> idea behind this feature is to have a means of having >>>>>> meaningful allocated samples at a low overhead) >>>>>> >>>>>> I have no measurements really right now but since I now have >>>>>> some statistics I can poll, I will look a bit more at this question. >>>>>> >>>>>> I have the same last sentence than above: the user should >>>>>> expect this to happen if the sampling rate is too small. That probably can >>>>>> be reflected in the >>>>>> StartHeapSampling as a note : careful this might impact your >>>>>> performance. >>>>>> >>>>>> >>>>>> Also, these stack traces might hold on to huge arrays. >>>>>> Any >>>>>> consideration of that? Particularly it might be the >>>>>> cause for OOMEs in >>>>>> tight memory situations. >>>>>> >>>>>> >>>>>> There is a stack size maximum that is set to 64 so it should >>>>>> not hold huge arrays. I don't think this is an issue but I can double check >>>>>> with a test or two. >>>>>> >>>>>> >>>>>> - please consider adding a safepoint check in >>>>>> HeapMonitoring::weak_oops_do to prevent accidental >>>>>> misuse. >>>>>> >>>>>> - in struct StackTraceStorage, the public fields may >>>>>> also need >>>>>> underscores. At least some files in the runtime >>>>>> directory have structs >>>>>> with underscored public members (and some don't). The >>>>>> runtime team >>>>>> should probably comment on that. >>>>>> >>>>>> >>>>>> Agreed I did not know. I looked around and a lot of structs >>>>>> did not have them it seemed so I left it as is. I will happily change it if >>>>>> someone prefers (I was not >>>>>> sure if you really preferred or not, your sentence seemed to >>>>>> be more a note of "this might need to change but I don't know if the >>>>>> runtime team enforces that", let >>>>>> me know if I read that wrongly). >>>>>> >>>>>> >>>>>> - In StackTraceStorage::weak_oops_do(), when examining >>>>>> the >>>>>> StackTraceData, maybe it is useful to consider having a >>>>>> non-NULL >>>>>> reference outside of the heap's reserved space an error. >>>>>> There should >>>>>> be no oop outside of the heap's reserved space ever. >>>>>> >>>>>> Unless you allow storing random values in >>>>>> StackTraceData::obj, which I >>>>>> would not encourage. 
>>>>>> >>>>>> >>>>>> I suppose you are talking about this part: >>>>>> if ((value != NULL && Universe::heap()->is_in_reserved(value)) >>>>>> && >>>>>> (is_alive == NULL || >>>>>> is_alive->do_object_b(value))) { >>>>>> >>>>>> What you are saying is that I could have something like: >>>>>> if (value != my_non_null_reference && >>>>>> (is_alive == NULL || >>>>>> is_alive->do_object_b(value))) { >>>>>> >>>>>> Is that what you meant? Is there really a reason to do so? >>>>>> When I look at the code, is_in_reserved seems like a O(1) method call. I'm >>>>>> not even sure we can have a >>>>>> NULL value to be honest. I might have to study that to see if >>>>>> this was not a paranoid test to begin with. >>>>>> >>>>>> The is_alive code has now morphed due to the comment below. >>>>>> >>>>>> >>>>>> >>>>>> - HeapMonitoring::weak_oops_do() does not seem to use the >>>>>> passed AbstractRefProcTaskExecutor. >>>>>> >>>>>> >>>>>> It did use it: >>>>>> size_t HeapMonitoring::weak_oops_do( >>>>>> AbstractRefProcTaskExecutor *task_executor, >>>>>> BoolObjectClosure* is_alive, >>>>>> OopClosure *f, >>>>>> VoidClosure *complete_gc) { >>>>>> assert(SafepointSynchronize::is_at_safepoint(), "must be >>>>>> at safepoint"); >>>>>> >>>>>> if (task_executor != NULL) { >>>>>> task_executor->set_single_threaded_mode(); >>>>>> } >>>>>> return StackTraceStorage::storage()->weak_oops_do(is_alive, >>>>>> f, complete_gc); >>>>>> } >>>>>> >>>>>> But due to the comment below, I refactored this, so this is >>>>>> no longer here. Now I have an always true closure that is passed. >>>>>> >>>>>> >>>>>> - I do not understand allowing to call this method with >>>>>> a NULL >>>>>> complete_gc closure. This would mean that objects >>>>>> referenced from the >>>>>> object that is referenced by the StackTraceData are not >>>>>> pulled, meaning >>>>>> they would get stale. >>>>>> >>>>>> - same with is_alive parameter value of NULL >>>>>> >>>>>> >>>>>> So these questions made me look a bit closer at this code. >>>>>> This code I think was written this way to have a very small impact on the >>>>>> file but you are right, there >>>>>> is no reason for this here. I've simplified the code by >>>>>> making in referenceProcessor.cpp a process_HeapSampling method that handles >>>>>> everything there. >>>>>> >>>>>> The code allowed NULLs because it depended on where you were >>>>>> coming from and how the code was being called. >>>>>> >>>>>> - I added a static always_true variable and pass that now to >>>>>> be more consistent with the rest of the code. >>>>>> - I moved the complete_gc into process_phaseHeapSampling now >>>>>> (new method) and handle the task_executor and the complete_gc there >>>>>> - Newbie question: in our code we did a >>>>>> set_single_threaded_mode but I see that process_phaseJNI does it right >>>>>> before its call, do I need to do it for the >>>>>> process_phaseHeapSample? >>>>>> That API is much cleaner (in my mind) and is consistent with >>>>>> what is done around it (again in my mind). >>>>>> >>>>>> >>>>>> - heapMonitoring.cpp:590: I do not completely understand >>>>>> the purpose of >>>>>> this code: in the end this results in a fixed value >>>>>> directly dependent >>>>>> on the Thread address anyway? In the end this results in >>>>>> a fixed value >>>>>> directly dependent on the Thread address anyway? >>>>>> IOW, what is special about exactly 20 rounds? >>>>>> >>>>>> >>>>>> So we really want a fast random number generator that has a >>>>>> specific mean (512k is the default we use). 
The code uses the thread >>>>>> address as the start number of the >>>>>> sequence (why not, it is random enough is rationale). Then >>>>>> instead of just starting there, we prime the sequence and really only start >>>>>> at the 21st number, it is >>>>>> arbitrary and I have not done a study to see if we could do >>>>>> more or less of that. >>>>>> >>>>>> As I have the statistics of the system up and running, I'll >>>>>> run some experiments to see if this is needed, is 20 good, or not. >>>>>> >>>>>> >>>>>> - also I would consider stripping a few bits of the >>>>>> threads' address as >>>>>> initialization value for your rng. The last three bits >>>>>> (and probably >>>>>> more, check whether the Thread object is allocated on >>>>>> special >>>>>> boundaries) are always zero for them. >>>>>> Not sure if the given "random" value is random enough >>>>>> before/after, >>>>>> this method, so just skip that comment if you think this >>>>>> is not >>>>>> required. >>>>>> >>>>>> >>>>>> I don't know is the honest answer. I think what is important >>>>>> is that we tend towards a mean and it is random "enough" to not fall in >>>>>> pitfalls of only sampling a >>>>>> subset of objects due to their allocation order. I added that >>>>>> as test to do to see if it changes the mean in any way for the 512k default >>>>>> value and/or if the first >>>>>> 1000 elements look better. >>>>>> >>>>>> >>>>>> Some more random nits I did not find a place to put >>>>>> anywhere: >>>>>> >>>>>> - ThreadLocalAllocBuffer::_extra_space does not seem to >>>>>> be used >>>>>> anywhere? >>>>>> >>>>>> >>>>>> Good catch :). >>>>>> >>>>>> >>>>>> - Maybe indent the declaration of >>>>>> ThreadLocalAllocBuffer::_bytes_until_sample to align below the other >>>>>> members of that group. >>>>>> >>>>>> >>>>>> Done moved it up a bit to have non static members together >>>>>> and static separate. >>>>>> >>>>>> Thanks, >>>>>> Thomas >>>>>> >>>>>> >>>>>> Thanks for your review! >>>>>> Jc >>>>>> >>>>>> >>>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Tue Oct 3 13:19:47 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 03 Oct 2017 15:19:47 +0200 Subject: RFR(L): 8186027: C2: loop strip mining Message-ID: http://cr.openjdk.java.net/~roland/8186027/webrev.00/ This converts loop: for (int i = start; i < stop; i += inc) { // body } to a loop nest: i = start; if (i < stop) { do { int next = MIN(stop, i+LoopStripMiningIter*inc); do { // body i += inc; } while (i < next); safepoint(); } while (i < stop); } (It's actually: int next = MIN(stop - i, LoopStripMiningIter*inc) + i; to protect against overflows) This should bring the best of running with UseCountedLoopSafepoints on and running with it off: low time to safepoint with little to no impact on throughput. That change was first pushed to the shenandoah repo several months ago and we've been running with it enabled since. The command line argument LoopStripMiningIter is the number of iterations between safepoints. In practice, with an arbitrary LoopStripMiningIter=1000, we observe time to safepoint on par with the current -XX:+UseCountedLoopSafepoints and most performance regressions due to -XX:+UseCountedLoopSafepoints gone. The exception is when an inner counted loop runs for a low number of iterations on average (and the compiler doesn't have an upper bound on the number of iteration). 
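To make the overflow remark above concrete (made-up numbers): with inc = 1 and LoopStripMiningIter = 1000, if stop = Integer.MAX_VALUE - 5 and i has reached Integer.MAX_VALUE - 100, then i + LoopStripMiningIter*inc wraps around to a negative value, so MIN(stop, i + LoopStripMiningIter*inc) would pick the wrapped value as the inner loop bound. Computing MIN(stop - i, LoopStripMiningIter*inc) + i instead works on the small difference (95 here), so the bound stays at stop as intended.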
This is enabled on the command line with: -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000

In PhaseIdealLoop::is_counted_loop(), when loop strip mining is enabled, for an inner loop, the compiler builds a skeleton outer loop around the counted loop. The outer loop is kept as simple as possible so required adjustments to the existing loop optimizations are not too intrusive. The reason the outer loop is inserted early in the optimization process is so that optimizations are not disrupted: an alternate implementation could have kept the safepoint in the counted loop until loop opts are over and then only have added the outer loop and moved the safepoint to the outer loop. That would have prevented nodes that are referenced in the safepoint from being sunk out of the loop, for instance.

The outer loop is a LoopNode with a backedge to a loop exit test and a safepoint. The loop exit test is a CmpI with a new Opaque5Node. The skeleton loop is populated with all required Phis after loop opts are over during macro expansion. At that point only, the loop exit tests are adjusted so the inner loop runs for at most LoopStripMiningIter. If the compiler can prove the inner loop runs for no more than LoopStripMiningIter then, during macro expansion, the outer loop is removed. The safepoint is removed only if the inner loop executes for fewer than LoopStripMiningIterShortLoop iterations, so that if there are several counted loops in a row, we still poll for safepoints regularly. Until macro expansion, there can be only a few extra nodes in the outer loop: nodes that would have sunk out of the inner loop and are kept in the outer loop by the safepoint.

PhaseIdealLoop::clone_loop(), which is used by most loop opts, now has several ways of cloning a counted loop. For loop unswitching, both inner and outer loops need to be cloned. For unrolling, only the inner loop needs to be cloned. For pre/post loop insertion, only the inner loop needs to be cloned, but the control flow must connect one of the inner loop copies to the outer loop of the other copy.

Beyond verifying performance results with the usual benchmarks, when I implemented that change, I wrote test cases for (hopefully) every loop optimization and verified by inspection of the generated code that the loop opt triggers correctly with loop strip mining. Roland.

From vladimir.kozlov at oracle.com Tue Oct 3 21:42:45 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Oct 2017 14:42:45 -0700 Subject: RFR(S): 8187822: C2 conditonal move optimization might create broken graph In-Reply-To: References: <0d0b226d-418f-2344-2ff9-a7682747a0e2@oracle.com> <66e0634c-84a6-f8ef-6451-bf1e6a9be252@oracle.com> Message-ID: <05ec8fdd-a3ea-ac73-2f53-518c57881574@oracle.com> I submitted pre-integration testing. I will push if it passes. Vladimir On 10/2/17 12:48 AM, Roland Westrelin wrote: > > Ready to push changeset: > > http://cr.openjdk.java.net/~roland/8187822/changeset > > Roland. >

From dean.long at oracle.com Wed Oct 4 01:18:05 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 3 Oct 2017 18:18:05 -0700 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: References: Message-ID: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> If handler->scope_count() > prev_scope, then can we skip the find because no duplicate is possible?
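For reference, a rough sketch of the backwards duplicate scan being discussed, relying on the scope depths being appended in increasing order (simplified names, not the actual compilation.cpp code):

    // Scan the already-emitted (bci, scope_depth) entries from the end; once a
    // stored scope depth is smaller than the current handler's, no earlier entry
    // can be a duplicate, so the scan stops.
    bool duplicate = false;
    for (int j = bcis->length() - 1; j >= 0; j--) {
      if (scope_depths->at(j) < handler->scope_count()) {
        break;
      }
      if (bcis->at(j) == handler->handler_bci() &&
          scope_depths->at(j) == handler->scope_count()) {
        duplicate = true;
        break;
      }
    }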
dl On 10/2/17 2:40 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8188151/webrev.00/ > > When Compilation::generate_exception_handler_table() walks the exception > handler information to populate the exception handler table, it has some > logic that removes duplicate handlers for one particular throwing pc and > it is wrong AFAICT. > > That code iterates over already processed (handler_bci, scope_count, > entry_pco) triples stored in GrowableArrays bcis, scope_depths, pcos and > looks for entries for which handler_bci, scope_count are identical to > the current one. It does that by looking for an entry with same > handler_bci in the bcis array and then checks whether scope_count > matches too. The list of triples could be something like: > > 1: (13, 0, ..) > 2: (13, 1, ..) > > and the next triple to be process: (13, 1, ..) which is a duplicate of > 2. That logic looks for a handler with bci 13, finds entry 1 which > doesn't have scope count 1. And concludes that there no duplicate > entry. It would need to look at the following entry too. Given scope > counts are sorted in increasing order, rather that iterate over the list > of triples from the start, looking for duplicates fromt the end of the > list fixes that problem. > > Roland. From lutz.schmidt at sap.com Wed Oct 4 07:10:35 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 4 Oct 2017 07:10:35 +0000 Subject: [10] RFR(S): 8187684 - Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> References: <5cfb314b-9785-fe47-8797-a899f38643ef@redhat.com> <32a35ee7-9593-bbc4-a540-37c590ec3ba6@redhat.com> <5fa7e391-bde0-9310-7bcd-98cf96b3f158@bell-sw.com> <3d76ff3c-17d7-a37c-2959-8f5e6dacd854@oracle.com> <1e07d1a7-8145-4a0e-7693-d3816551e65d@oracle.com> <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> Message-ID: Hi Dmitrij, Your change looks good. It works for my multiplyHigh implementation on s390 and ppc (not yet RFR?ed, delayed until your change is in). Regards, Lutz On 02.10.2017, 15:47, "hotspot-compiler-dev on behalf of Dmitrij Pochepko" wrote: Hi, please find rebased webrev here: http://cr.openjdk.java.net/~dpochepk/8187684/webrev.newws.01/ Thanks, Dmitij On 29.09.2017 02:40, Vladimir Kozlov wrote: > Dmitry, > > Please, update changes for new consolidated sources and send new > patch/webrev. > > Thanks, > Vladimir > > On 9/25/17 9:42 AM, Vladimir Kozlov wrote: >> Yes, when repo will be opened. >> >> Please, send patch and add latest webrev link to the RFE. >> >> Thanks, >> Vladimir >> >> On 9/25/17 5:04 AM, Dmitrij Pochepko wrote: >>> >>> On 25.09.2017 14:04, Andrew Haley wrote: >>>> On 20/09/17 14:29, Andrew Haley wrote: >>>>> On 20/09/17 14:08, Dmitrij Pochepko wrote: >>>>>> please review small patch for enhancement: 8187684 - Intrinsify >>>>>> Math.multiplyHigh(long, long) >>>>> OK, thanks. >>>> Dmitrij, do you have a sponsor for this? I'm sure Vladimir would >>>> be happy to help. :-) >>>> >>> Hi, >>> >>> Vladimir, can you sponsor it? >>> >>> Thanks, >>> Dmitrij From martin.doerr at sap.com Wed Oct 4 12:28:25 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 4 Oct 2017 12:28:25 +0000 Subject: RFR(S): 8187969: [s390] z/Architecture Vector Facility Support. Part II In-Reply-To: <6073FB33-D580-447F-A201-43FB0EB9867C@sap.com> References: <6073FB33-D580-447F-A201-43FB0EB9867C@sap.com> Message-ID: Hi Lutz, reviewed and pushed. Thanks for the contribution. 
Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Donnerstag, 28. September 2017 15:31 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8187969: [s390] z/Architecture Vector Facility Support. Part II Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8187969 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187969.00/index.html This change is all about providing the instruction definitions and related low-level code emitters for the vector string instructions, introduced with z13. It only facilitates code generation. No code is generated by the change itself. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Wed Oct 4 12:29:01 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 04 Oct 2017 14:29:01 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> Message-ID: Thanks for looking at this, Dean. > If handler->scope_count() > prev_scope, then can we skip the find > because no duplicate is possible? Yes, sounds like a reasonable optimization. Roland. From lutz.schmidt at sap.com Wed Oct 4 12:29:36 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 4 Oct 2017 12:29:36 +0000 Subject: RFR(S): 8187969: [s390] z/Architecture Vector Facility Support. Part II In-Reply-To: References: <6073FB33-D580-447F-A201-43FB0EB9867C@sap.com> Message-ID: Thank you, Martin! Regards, Lutz On 04.10.2017, 14:28, "Doerr, Martin" > wrote: Hi Lutz, reviewed and pushed. Thanks for the contribution. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Donnerstag, 28. September 2017 15:31 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8187969: [s390] z/Architecture Vector Facility Support. Part II Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8187969 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187969.00/index.html This change is all about providing the instruction definitions and related low-level code emitters for the vector string instructions, introduced with z13. It only facilitates code generation. No code is generated by the change itself. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Oct 4 18:52:10 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 4 Oct 2017 11:52:10 -0700 Subject: [10] RFR(S): 8187684 - Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> References: <5cfb314b-9785-fe47-8797-a899f38643ef@redhat.com> <32a35ee7-9593-bbc4-a540-37c590ec3ba6@redhat.com> <5fa7e391-bde0-9310-7bcd-98cf96b3f158@bell-sw.com> <3d76ff3c-17d7-a37c-2959-8f5e6dacd854@oracle.com> <1e07d1a7-8145-4a0e-7693-d3816551e65d@oracle.com> <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> Message-ID: <549a6c92-1bd1-4262-3831-16dc959854c5@oracle.com> These changes passed pre-integration testing. I will push them. 
Thanks, Vladimir On 10/2/17 6:47 AM, Dmitrij Pochepko wrote: > Hi, > > please find rebased webrev here: http://cr.openjdk.java.net/~dpochepk/8187684/webrev.newws.01/ > > > Thanks, > > Dmitij > > > On 29.09.2017 02:40, Vladimir Kozlov wrote: >> Dmitry, >> >> Please, update changes for new consolidated sources and send new patch/webrev. >> >> Thanks, >> Vladimir >> >> On 9/25/17 9:42 AM, Vladimir Kozlov wrote: >>> Yes, when repo will be opened. >>> >>> Please, send patch and add latest webrev link to the RFE. >>> >>> Thanks, >>> Vladimir >>> >>> On 9/25/17 5:04 AM, Dmitrij Pochepko wrote: >>>> >>>> On 25.09.2017 14:04, Andrew Haley wrote: >>>>> On 20/09/17 14:29, Andrew Haley wrote: >>>>>> On 20/09/17 14:08, Dmitrij Pochepko wrote: >>>>>>> please review small patch for enhancement: 8187684 - Intrinsify >>>>>>> Math.multiplyHigh(long, long) >>>>>> OK, thanks. >>>>> Dmitrij, do you have a sponsor for this?? I'm sure Vladimir would >>>>> be happy to help.? :-) >>>>> >>>> Hi, >>>> >>>> Vladimir, can you sponsor it? >>>> >>>> Thanks, >>>> Dmitrij > From lutz.schmidt at sap.com Fri Oct 6 09:10:27 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 6 Oct 2017 09:10:27 +0000 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) Message-ID: Dear all, I would like to request reviews for this s390-only enhancement (ppc support was already implemented for other purpose): Bug: https://bugs.openjdk.java.net/browse/JDK-8187964 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187964.00/index.html This change provides platform-specific implementations for the Math.multiplyHigh method, exploiting 64bit x 64bit => 128bit multiply instructions available on these platforms. Microbenchmark performance shows improvement of 4x to 5x for [s390] and 10x to 15x for [ppc]. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.doerr at sap.com Fri Oct 6 13:03:46 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 6 Oct 2017 13:03:46 +0000 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) In-Reply-To: References: Message-ID: Hi Lutz, looks good. If you like, you can get rid of one or both tmp registers if you want to save them. Did you also check if it improves long division which also uses multiply high nodes? I can sponsor this change. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Freitag, 6. Oktober 2017 11:10 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) Dear all, I would like to request reviews for this s390-only enhancement (ppc support was already implemented for other purpose): Bug: https://bugs.openjdk.java.net/browse/JDK-8187964 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187964.00/index.html This change provides platform-specific implementations for the Math.multiplyHigh method, exploiting 64bit x 64bit => 128bit multiply instructions available on these platforms. Microbenchmark performance shows improvement of 4x to 5x for [s390] and 10x to 15x for [ppc]. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lutz.schmidt at sap.com Fri Oct 6 14:14:38 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 6 Oct 2017 14:14:38 +0000 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) In-Reply-To: References: Message-ID: <07173354-6A6E-486D-830F-525EDBC7AC6A@sap.com> Hi Martin, thanks for your review! I have removed the use of the tmp2 register. That was easy. I do not like the idea of getting rid of the tmp1 register. This would have to be replaced by a scratch register. I try to avoid scratch registers at places where I can easily get a tmp from reg alloc. Please find the updated webrev at http://cr.openjdk.java.net/~lucy/webrevs/8187964.01/index.html The long division benefits quite a bit from multiplyHigh. With a simple MicroBenchmark, I see 4x to 5x improvement. Only the latest processor generation doesn?t benefit as much. I see a 1.5x improvement on z13 only. There is an easy explanation to the z13 ?anomaly?: the superscalar layout of a z13 core is twice as wide as that of a z196 core. Z13 needs rather complex loop bodies with independent data streams to reach its full potential. My simple benchmark obviously does not provide that. Best Regards, Lutz On 06.10.2017, 15:03, "Doerr, Martin" > wrote: Hi Lutz, looks good. If you like, you can get rid of one or both tmp registers if you want to save them. Did you also check if it improves long division which also uses multiply high nodes? I can sponsor this change. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Freitag, 6. Oktober 2017 11:10 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) Dear all, I would like to request reviews for this s390-only enhancement (ppc support was already implemented for other purpose): Bug: https://bugs.openjdk.java.net/browse/JDK-8187964 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187964.00/index.html This change provides platform-specific implementations for the Math.multiplyHigh method, exploiting 64bit x 64bit => 128bit multiply instructions available on these platforms. Microbenchmark performance shows improvement of 4x to 5x for [s390] and 10x to 15x for [ppc]. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.doerr at sap.com Fri Oct 6 15:08:20 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 6 Oct 2017 15:08:20 +0000 Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian Message-ID: Hi, I have changed the AES intrinsics to support Big Endian (linux and AIX). Please review: http://cr.openjdk.java.net/~mdoerr/8188868_PPC64_AES_BE/webrev.00/ Best regards, Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.schmidt at sap.com Fri Oct 6 15:36:44 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 6 Oct 2017 15:36:44 +0000 Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete Message-ID: Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8188857 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8188857.00/index.html z/Architecture vector instructions require operating system support. Without os support, any attempt to execute such an instruction results in a SIGFPE signal. The presence of such support cannot be checked for by inspecting the cpu?s facility bits alone. 
During startup, a vector instruction is attempted to execute and, in case of a SIGFPE, the vector facility is marked unavailable. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at redhat.com Sat Oct 7 09:09:11 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 7 Oct 2017 10:09:11 +0100 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <07173354-6A6E-486D-830F-525EDBC7AC6A@sap.com> References: <07173354-6A6E-486D-830F-525EDBC7AC6A@sap.com> Message-ID: <5c09abb7-5a9f-5b88-bbc9-3eb2ac68db15@redhat.com> One thought about this: we might generate better code on these machines if we had an unsigned multiplyHigh intrinsic and did the adjustment for signed arithmetic in Java code, where the sign adjustment could be scheduled separately by C2. OK, I'll get on with implementing unsignedMultiplyHigh! -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From lutz.schmidt at sap.com Mon Oct 9 07:54:33 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 9 Oct 2017 07:54:33 +0000 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <5c09abb7-5a9f-5b88-bbc9-3eb2ac68db15@redhat.com> References: <07173354-6A6E-486D-830F-525EDBC7AC6A@sap.com> <5c09abb7-5a9f-5b88-bbc9-3eb2ac68db15@redhat.com> Message-ID: <81216795-00FE-46E7-9F6D-3EBFEBB34278@sap.com> Andrew, unsigned multiplyHigh would definitely help [s390]. Think of just one instruction instead of seven. You could even get the full length product at the same cost. Regards, Lutz On 07.10.2017, 11:09, "Andrew Haley" wrote: One thought about this: we might generate better code on these machines if we had an unsigned multiplyHigh intrinsic and did the adjustment for signed arithmetic in Java code, where the sign adjustment could be scheduled separately by C2. OK, I'll get on with implementing unsignedMultiplyHigh! -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Mon Oct 9 09:38:36 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 9 Oct 2017 09:38:36 +0000 Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete In-Reply-To: References: Message-ID: <9ed18c99e4de4d9fbadcb6962daf74a6@sap.com> Hi Lutz, thanks for providing this fix. Looks good. I?d only like to remove the assignment of used_len from vm_version_s390.cpp because it is neither used nor set to a defined value. I can remove it before pushing if you?re ok with that. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Freitag, 6. Oktober 2017 17:37 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8188857 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8188857.00/index.html z/Architecture vector instructions require operating system support. Without os support, any attempt to execute such an instruction results in a SIGFPE signal. The presence of such support cannot be checked for by inspecting the cpu?s facility bits alone. During startup, a vector instruction is attempted to execute and, in case of a SIGFPE, the vector facility is marked unavailable. 
Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.schmidt at sap.com Mon Oct 9 09:49:09 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 9 Oct 2017 09:49:09 +0000 Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete In-Reply-To: <9ed18c99e4de4d9fbadcb6962daf74a6@sap.com> References: <9ed18c99e4de4d9fbadcb6962daf74a6@sap.com> Message-ID: <183BDAE6-3A7D-4CD7-8AD2-3F32182D6B55@sap.com> Thanks, Martin, for reviewing my change. And yes, please go ahead and remove the assignment before pushing. Thank you! Lutz On 09.10.2017, 11:38, "Doerr, Martin" > wrote: Hi Lutz, thanks for providing this fix. Looks good. I?d only like to remove the assignment of used_len from vm_version_s390.cpp because it is neither used nor set to a defined value. I can remove it before pushing if you?re ok with that. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Freitag, 6. Oktober 2017 17:37 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8188857 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8188857.00/index.html z/Architecture vector instructions require operating system support. Without os support, any attempt to execute such an instruction results in a SIGFPE signal. The presence of such support cannot be checked for by inspecting the cpu?s facility bits alone. During startup, a vector instruction is attempted to execute and, in case of a SIGFPE, the vector facility is marked unavailable. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From goetz.lindenmaier at sap.com Mon Oct 9 09:57:08 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 9 Oct 2017 09:57:08 +0000 Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian In-Reply-To: References: Message-ID: <290565198b6d490ca16375402cee8eb9@sap.com> Hi Martin, thanks for porting this to be. Unfortunately the Unsafe jtreg tests failed tonight in our tests, as the space for the stubs does not suffice. Could you please fix this in this change? No new webrev needed. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Freitag, 6. Oktober 2017 17:08 > To: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian > > Hi, > > > > I have changed the AES intrinsics to support Big Endian (linux and AIX). > > > > Please review: > > http://cr.openjdk.java.net/~mdoerr/8188868_PPC64_AES_BE/webrev.00/ > > > > Best regards, > Martin > > From martin.doerr at sap.com Mon Oct 9 12:04:28 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 9 Oct 2017 12:04:28 +0000 Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian In-Reply-To: <290565198b6d490ca16375402cee8eb9@sap.com> References: <290565198b6d490ca16375402cee8eb9@sap.com> Message-ID: <241225b6b69c41ef90f3cb7aec67b4e6@sap.com> Hi G?tz, thanks for the review. Pushed with increased code_size2 = 24000. 
Best regards, Martin -----Original Message----- From: Lindenmaier, Goetz Sent: Montag, 9. Oktober 2017 11:57 To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' Subject: RE: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian Hi Martin, thanks for porting this to be. Unfortunately the Unsafe jtreg tests failed tonight in our tests, as the space for the stubs does not suffice. Could you please fix this in this change? No new webrev needed. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Freitag, 6. Oktober 2017 17:08 > To: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian > > Hi, > > > > I have changed the AES intrinsics to support Big Endian (linux and AIX). > > > > Please review: > > http://cr.openjdk.java.net/~mdoerr/8188868_PPC64_AES_BE/webrev.00/ > > > > Best regards, > Martin > > From daniel.daugherty at oracle.com Mon Oct 9 19:41:24 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 9 Oct 2017 13:41:24 -0600 Subject: RFR(XL): 8167108 - SMR and JavaThread Lifecycle Message-ID: <1e50bb73-840c-fc3a-81ad-31f83037093f@oracle.com> Greetings, We have a (eXtra Large) fix for the following bug: 8167108 inconsistent handling of SR_lock can lead to crashes https://bugs.openjdk.java.net/browse/JDK-8167108 This fix adds a Safe Memory Reclamation (SMR) mechanism based on Hazard Pointers to manage JavaThread lifecycle. Here's a PDF for the internal wiki that we've been using to describe and track the work on this project: http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf Dan has noticed that the indenting is wrong in some of the code quotes in the PDF that are not present in the internal wiki. We don't have a solution for that problem yet. Here's the webrev for current JDK10 version of this fix: http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full This fix has been run through many rounds of JPRT and Mach5 tier[2-5] testing, additional stress testing on Dan's Solaris X64 server, and additional testing on Erik and Robbin's machines. We welcome comments, suggestions and feedback. Daniel Daugherty Erik Osterlund Robbin Ehn From daniel.daugherty at oracle.com Mon Oct 9 21:23:23 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 9 Oct 2017 15:23:23 -0600 Subject: RFR(XL): 8167108 - SMR and JavaThread Lifecycle In-Reply-To: <1e50bb73-840c-fc3a-81ad-31f83037093f@oracle.com> References: <1e50bb73-840c-fc3a-81ad-31f83037093f@oracle.com> Message-ID: <546f3f48-47cf-73d1-30b1-b388418ae0bf@oracle.com> Many thanks to the folks that reviewed this internally and provided much appreciated feedback: - Daniel Daugherty - David Holmes - Erik Osterlund - Jerry Thornbrugh - Karen Kinnear - Kim Barrett - Robbin Ehn - Serguei Spitsyn - Stefan Karlson Since there are three contributing authors, we have been reviewing (and arguing over) each other's code. It has been an adventure! Dan, Erik, and Robbin On 10/9/17 1:41 PM, Daniel D. Daugherty wrote: > Greetings, > > We have a (eXtra Large) fix for the following bug: > > 8167108 inconsistent handling of SR_lock can lead to crashes > https://bugs.openjdk.java.net/browse/JDK-8167108 > > This fix adds a Safe Memory Reclamation (SMR) mechanism based on > Hazard Pointers to manage JavaThread lifecycle. 
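As a very rough illustration of the hazard pointer idea mentioned above (the names ThreadsList, _java_thread_list and set_threads_hazard_ptr are assumptions for this sketch, not necessarily the 8167108 API):

    // A reader publishes which version of the thread list it is about to use;
    // the writer only frees an old list once no thread advertises it any more.
    ThreadsList* acquire_stable_list(JavaThread* self) {
      while (true) {
        ThreadsList* list = _java_thread_list;   // read the current list (volatile)
        self->set_threads_hazard_ptr(list);      // publish the hazard pointer
        OrderAccess::fence();
        if (list == _java_thread_list) {         // still current, so it cannot be freed
          return list;                           // while our hazard pointer points to it
        }
        self->set_threads_hazard_ptr(NULL);      // raced with a list update; retry
      }
    }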
> > Here's a PDF for the internal wiki that we've been using to describe > and track the work on this project: > > http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf > > > Dan has noticed that the indenting is wrong in some of the code quotes > in the PDF that are not present in the internal wiki. We don't have a > solution for that problem yet. > > Here's the webrev for current JDK10 version of this fix: > > http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full > > This fix has been run through many rounds of JPRT and Mach5 tier[2-5] > testing, additional stress testing on Dan's Solaris X64 server, and > additional testing on Erik and Robbin's machines. > > We welcome comments, suggestions and feedback. > > Daniel Daugherty > Erik Osterlund > Robbin Ehn > From jcbeyler at google.com Mon Oct 9 22:57:45 2017 From: jcbeyler at google.com (JC Beyler) Date: Mon, 9 Oct 2017 15:57:45 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <2af975e6-3827-bd57-0c3d-fadd54867a67@oracle.com> <365499b6-3f4d-a4df-9e7e-e72a739fb26b@oracle.com> <102c59b8-25b6-8c21-8eef-1de7d0bbf629@oracle.com> <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> Message-ID: Dear all, Thread-safety is back!! Here is the update webrev: http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ Full webrev is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ In order to really test this, I needed to add this so thought now was a good time. It required a few changes here for the creation to ensure correctness and safety. Now we keep the static pointer but clear the data internally so on re-initialize, it will be a bit more costly than before. I don't think this is a huge use-case so I did not think it was a problem. I used the internal MutexLocker, I think I used it well, let me know. I also added three tests: 1) Stack depth test: http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch This test shows that the maximum stack depth system is working. 2) Thread safety: http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch The test creates 24 threads and they all allocate at the same time. The test then checks it does find samples from all the threads. 3) Thread on/off safety http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch The test creates 24 threads that all allocate a bunch of memory. Then another thread turns the sampling on/off. Btw, both tests 2 & 3 failed without the locks. As I worked on this, I saw a lot of places where the tests are doing very similar things, I'm going to clean up the code a bit and make a HeapAllocator class that all tests can call directly. This will greatly simplify the code. Thanks for any comments/criticisms! Jc On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler wrote: > Dear all, > > Small update to the webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ > > Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ > > I updated a bit of the naming, removed a TODO comment, and I added a test > for testing the sampling rate. 
I also updated the maximum stack depth to > 1024, there is no reason to keep it so small. I did a micro benchmark that > tests the overhead and it seems relatively the same. > > I compared allocations from a stack depth of 10 and allocations from a > stack depth of 1024 (allocations are from the same helper method in > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ > raw_files/new/test/hotspot/jtreg/serviceability/jvmti/ > HeapMonitor/MyPackage/HeapMonitorStatRateTest.java): > - For an array of 1 integer allocated in a loop; stack depth > 1024 vs stack depth 10: 1% slower > - For an array of 200k integers allocated in a loop; stack depth > 1024 vs stack depth 10: 3% slower > > So basically now moving the maximum stack depth to 1024 but we only copy > over the stack depths actually used. > > For the next webrev, I will be adding a stack depth test to show that it > works and probably put back the mutex locking so that we can see how > difficult it is to keep thread safe. > > Let me know what you think! > Jc > > > > On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler wrote: > >> Forgot to say that for my numbers: >> - Not in the test are the actual numbers I got for the various array >> sizes, I ran the program 30 times and parsed the output; here are the >> averages and standard deviation: >> 1000: 1.28% average; 1.13% standard deviation >> 10000: 1.59% average; 1.25% standard deviation >> 100000: 1.26% average; 1.26% standard deviation >> >> The 1000/10000/100000 are the sizes of the arrays being allocated. These >> are allocated 100k times and the sampling rate is 111 times the size of the >> array. >> >> Thanks! >> Jc >> >> >> On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler wrote: >> >>> Hi all, >>> >>> After a bit of a break, I am back working on this :). As before, here >>> are two webrevs: >>> >>> - Full change set: http://cr.openjdk.java.ne >>> t/~rasbold/8171119/webrev.09/ >>> - Compared to version 8: http://cr.openjdk.java.net/ >>> ~rasbold/8171119/webrev.08_09/ >>> (This version is compared to version 8 I last showed but ported to >>> the new folder hierarchy) >>> >>> In this version I have: >>> - Handled Thomas' comments from his email of 07/03: >>> - Merged the logging to be standard >>> - Fixed up the code a bit where asked >>> - Added some notes about the code not being thread-safe yet >>> - Removed additional dead code from the version that modifies >>> interpreter/c1/c2 >>> - Fixed compiler issues so that it compiles with >>> --disable-precompiled-header >>> - Tested with ./configure --with-boot-jdk= >>> --with-debug-level=slowdebug --disable-precompiled-headers >>> >>> Additionally, I added a test to check the sanity of the sampler: >>> HeapMonitorStatCorrectnessTest (http://cr.openjdk.java.net/~r >>> asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >>> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch) >>> - This allocates a number of arrays and checks that we obtain the >>> number of samples we want with an accepted error of 5%. I tested it 100 >>> times and it passed everytime, I can test more if wanted >>> - Not in the test are the actual numbers I got for the various array >>> sizes, I ran the program 30 times and parsed the output; here are the >>> averages and standard deviation: >>> 1000: 1.28% average; 1.13% standard deviation >>> 10000: 1.59% average; 1.25% standard deviation >>> 100000: 1.26% average; 1.26% standard deviation >>> >>> What this means is that we were always at about 1~2% of the number of >>> samples the test expected. 
>>> >>> Let me know what you think, >>> Jc >>> >>> >>> >>> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler wrote: >>> >>>> Hi all, >>>> >>>> I apologize, I have not yet handled your remarks but thought this new >>>> webrev would also be useful to see and comment on perhaps. >>>> >>>> Here is the latest webrev, it is generated slightly different than the >>>> others since now I'm using webrev.ksh without the -N option: >>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ >>>> >>>> And the webrev.07 to webrev.08 diff is here: >>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ >>>> >>>> (Let me know if it works well) >>>> >>>> It's a small change between versions but it: >>>> - provides a fix that makes the average sample rate correct (more on >>>> that below). >>>> - fixes the code to actually have it play nicely with the fast tlab >>>> refill >>>> - cleaned up a bit the JVMTI text and now use jvmtiFrameInfo >>>> - moved the capability to be onload solo >>>> >>>> With this webrev, I've done a small study of the random number >>>> generator we use here for the sampling rate. I took a small program and it >>>> can be simplified to: >>>> >>>> for (outer loop) >>>> for (inner loop) >>>> int[] tmp = new int[arraySize]; >>>> >>>> - I've fixed the outer and inner loops to being 800 for this >>>> experiment, meaning we allocate 640000 times an array of a given array >>>> size. >>>> >>>> - Each program provides the average sample size used for the whole >>>> execution >>>> >>>> - Then, I ran each variation 30 times and then calculated the average >>>> of the average sample size used for various array sizes. I selected the >>>> array size to be one of the following: 1, 10, 100, 1000. >>>> >>>> - When compared to 512kb, the average sample size of 30 runs: >>>> 1: 4.62% of error >>>> 10: 3.09% of error >>>> 100: 0.36% of error >>>> 1000: 0.1% of error >>>> 10000: 0.03% of error >>>> >>>> What it shows is that, depending on the number of samples, the average >>>> does become better. This is because with an allocation of 1 element per >>>> array, it will take longer to hit one of the thresholds. This is seen by >>>> looking at the sample count statistic I put in. For the same number of >>>> iterations (800 * 800), the different array sizes provoke: >>>> 1: 62 samples >>>> 10: 125 samples >>>> 100: 788 samples >>>> 1000: 6166 samples >>>> 10000: 57721 samples >>>> >>>> And of course, the more samples you have, the more sample rates you >>>> pick, which means that your average gets closer using that math. >>>> >>>> Thanks, >>>> Jc >>>> >>>> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler >>>> wrote: >>>> >>>>> Thanks Robbin, >>>>> >>>>> This seems to have worked. When I have the next webrev ready, we will >>>>> find out but I'm fairly confident it will work! >>>>> >>>>> Thanks agian! >>>>> Jc >>>>> >>>>> On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn >>>>> wrote: >>>>> >>>>>> Hi JC, >>>>>> >>>>>> On 06/29/2017 12:15 AM, JC Beyler wrote: >>>>>> >>>>>>> B) Incremental changes >>>>>>> >>>>>> >>>>>> I guess the most common work flow here is using mq : >>>>>> hg qnew fix_v1 >>>>>> edit files >>>>>> hg qrefresh >>>>>> hg qnew fix_v2 >>>>>> edit files >>>>>> hg qrefresh >>>>>> >>>>>> if you do hg log you will see 2 commits >>>>>> >>>>>> webrev.ksh -r -2 -o my_inc_v1_v2 >>>>>> webrev.ksh -o my_full_v2 >>>>>> >>>>>> >>>>>> In your .hgrc you might need: >>>>>> [extensions] >>>>>> mq = >>>>>> >>>>>> /Robbin >>>>>> >>>>>> >>>>>>> Again another newbiew question here... 
>>>>>>> >>>>>>> For showing the incremental changes, is there a link that explains >>>>>>> how to do that? I apologize for my newbie questions all the time :) >>>>>>> >>>>>>> Right now, I do: >>>>>>> >>>>>>> ksh ../webrev.ksh -m -N >>>>>>> >>>>>>> That generates a webrev.zip and send it to Chuck Rasbold. He then >>>>>>> uploads it to a new webrev. >>>>>>> >>>>>>> I tried commiting my change and adding a small change. Then if I >>>>>>> just do ksh ../webrev.ksh without any options, it seems to produce a >>>>>>> similar page but now with only the changes I had (so the 06-07 comparison >>>>>>> you were talking about) and a changeset that has it all. I imagine that is >>>>>>> what you meant. >>>>>>> >>>>>>> Which means that my workflow would become: >>>>>>> >>>>>>> 1) Make changes >>>>>>> 2) Make a webrev without any options to show just the differences >>>>>>> with the tip >>>>>>> 3) Amend my changes to my local commit so that I have it done with >>>>>>> 4) Go to 1 >>>>>>> >>>>>>> Does that seem correct to you? >>>>>>> >>>>>>> Note that when I do this, I only see the full change of a file in >>>>>>> the full change set (Side note here: now the page says change set and not >>>>>>> patch, which is maybe why Serguei was having issues?). >>>>>>> >>>>>>> Thanks! >>>>>>> Jc >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn >>>>>> > wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On 06/28/2017 12:04 AM, JC Beyler wrote: >>>>>>> >>>>>>> Dear Thomas et al, >>>>>>> >>>>>>> Here is the newest webrev: >>>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ < >>>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/> >>>>>>> >>>>>>> >>>>>>> >>>>>>> You have some more bits to in there but generally this looks >>>>>>> good and really nice with more tests. >>>>>>> I'll do and deep dive and re-test this when I get back from my >>>>>>> long vacation with whatever patch version you have then. >>>>>>> >>>>>>> Also I think it's time you provide incremental (v06->07 changes) >>>>>>> as well as complete change-sets. >>>>>>> >>>>>>> Thanks, Robbin >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thomas, I "think" I have answered all your remarks. The >>>>>>> summary is: >>>>>>> >>>>>>> - The statistic system is up and provides insight on what >>>>>>> the heap sampler is doing >>>>>>> - I've noticed that, though the sampling rate is at the >>>>>>> right mean, we are missing some samples, I have not yet tracked out why >>>>>>> (details below) >>>>>>> >>>>>>> - I've run a tiny benchmark that is the worse case: it is a >>>>>>> very tight loop and allocated a small array >>>>>>> - In this case, I see no overhead when the system is >>>>>>> off so that is a good start :) >>>>>>> - I see right now a high overhead in this case when >>>>>>> sampling is on. This is not a really too surprising but I'm going to see if >>>>>>> this is consistent with our >>>>>>> internal implementation. The benchmark is really allocation >>>>>>> stressful so I'm not too surprised but I want to do the due diligence. 
>>>>>>> >>>>>>> - The statistic system up is up and I have a new test >>>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/s >>>>>>> erviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTes >>>>>>> t.java.patch >>>>>>> >>>>>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTe >>>>>>> st.java.patch> >>>>>>> - I did a bit of a study about the random generator >>>>>>> here, more details are below but basically it seems to work well >>>>>>> >>>>>>> - I added a capability but since this is the first time >>>>>>> doing this, I was not sure I did it right >>>>>>> - I did add a test though for it and the test seems to >>>>>>> do what I expect (all methods are failing with the >>>>>>> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). >>>>>>> - http://cr.openjdk.java.net/~ra >>>>>>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonito >>>>>>> r/MyPackage/HeapMonitorNoCapabilityTest.java.patch >>>>>>> >>>>>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >>>>>>> bilityTest.java.patch> >>>>>>> >>>>>>> - I still need to figure out what to do about the >>>>>>> multi-agent vs single-agent issue >>>>>>> >>>>>>> - As far as measurements, it seems I still need to look >>>>>>> at: >>>>>>> - Why we do the 20 random calls first, are they >>>>>>> necessary? >>>>>>> - Look at the mean of the sampling rate that the random >>>>>>> generator does and also what is actually sampled >>>>>>> - What is the overhead in terms of memory/performance >>>>>>> when on? >>>>>>> >>>>>>> I have inlined my answers, I think I got them all in the new >>>>>>> webrev, let me know your thoughts. >>>>>>> >>>>>>> Thanks again! >>>>>>> Jc >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl < >>>>>>> thomas.schatzl at oracle.com >>>>>>> >>>>>> >>>>>>> >> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote: >>>>>>> > Hi all, >>>>>>> > >>>>>>> > First off: Thanks again to Robbin and Thomas for >>>>>>> their reviews :) >>>>>>> > >>>>>>> > Next, I've uploaded a new webrev: >>>>>>> > http://cr.openjdk.java.net/~ra >>>>>>> sbold/8171119/webrev.06/ >>>>>> asbold/8171119/webrev.06/> >>>>>>> >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>> >>>>>>> >>>>>>> > >>>>>>> > Here is an update: >>>>>>> > >>>>>>> > - @Robbin, I forgot to say that yes I need to look at >>>>>>> implementing >>>>>>> > this for the other architectures and testing it >>>>>>> before it is all >>>>>>> > ready to go. Is it common to have it working on all >>>>>>> possible >>>>>>> > combinations or is there a subset that I should be >>>>>>> doing first and we >>>>>>> > can do the others later? >>>>>>> > - I've tested slowdebug, built and ran the JTreg >>>>>>> tests I wrote with >>>>>>> > slowdebug and fixed a few more issues >>>>>>> > - I've refactored a bit of the code following Thomas' >>>>>>> comments >>>>>>> > - I think I've handled all the comments from >>>>>>> Thomas (I put >>>>>>> > comments inline below for the specifics) >>>>>>> >>>>>>> Thanks for handling all those. >>>>>>> >>>>>>> > - Following Thomas' comments on statistics, I want to >>>>>>> add some >>>>>>> > quality assurance tests and find that the easiest way >>>>>>> would be to >>>>>>> > have a few counters of what is happening in the >>>>>>> sampler and expose >>>>>>> > that to the user. >>>>>>> > - I'll be adding that in the next version if no >>>>>>> one sees any >>>>>>> > objections to that. 
>>>>>>> > - This will allow me to add a sanity test in JTreg >>>>>>> about number of >>>>>>> > samples and average of sampling rate >>>>>>> > >>>>>>> > @Thomas: I had a few questions that I inlined below >>>>>>> but I will >>>>>>> > summarize the "bigger ones" here: >>>>>>> > - You mentioned constants are not using the right >>>>>>> conventions, I >>>>>>> > looked around and didn't see any convention except >>>>>>> normal naming then >>>>>>> > for static constants. Is that right? >>>>>>> >>>>>>> I looked through https://wiki.openjdk.java.net/ >>>>>>> display/HotSpot/StyleGui >>>>>> /display/HotSpot/StyleGui> >>>>>>> >>>>>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui>> >>>>>>> de and the rule is to "follow an existing pattern and >>>>>>> must have a >>>>>>> distinct appearance from other names". Which does not >>>>>>> help a lot I >>>>>>> guess :/ The GC team started using upper camel case, >>>>>>> e.g. >>>>>>> SomeOtherConstant, but very likely this is probably not >>>>>>> applied >>>>>>> consistently throughout. So I am fine with not adding >>>>>>> another style >>>>>>> (like kMaxStackDepth with the "k" in front with some >>>>>>> unknown meaning) >>>>>>> is fine. >>>>>>> >>>>>>> (Chances are you will find that style somewhere used >>>>>>> anyway too, >>>>>>> apologies if so :/) >>>>>>> >>>>>>> >>>>>>> Thanks for that link, now I know where to look. I used the >>>>>>> upper camel case in my code as well then :) I should have gotten them all. >>>>>>> >>>>>>> >>>>>>> > PS: I've also inlined my answers to Thomas below: >>>>>>> > >>>>>>> > On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl >>>>>>> >>>>>> > e.com > wrote: >>>>>>> > > Hi all, >>>>>>> > > >>>>>>> > > On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote: >>>>>>> > > > Dear all, >>>>>>> > > > >>>>>>> > > > I've continued working on this and have done the >>>>>>> following >>>>>>> > > webrev: >>>>>>> > > > http://cr.openjdk.java.net/~ra >>>>>>> sbold/8171119/webrev.05/ >>>>>> asbold/8171119/webrev.05/> >>>>>>> >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>> >>>>>>> >>>>>>> > > >>>>>>> > > [...] >>>>>>> > > > Things I still need to do: >>>>>>> > > > - Have to fix that TLAB case for the >>>>>>> FastTLABRefill >>>>>>> > > > - Have to start looking at the data to see >>>>>>> that it is >>>>>>> > > consistent and does gather the right samples, >>>>>>> right frequency, etc. >>>>>>> > > > - Have to check the GC elements and what that >>>>>>> produces >>>>>>> > > > - Run a slowdebug run and ensure I fixed all >>>>>>> those issues you >>>>>>> > > saw > Robbin >>>>>>> > > > >>>>>>> > > > Thanks for looking at the webrev and have a >>>>>>> great week! >>>>>>> > > >>>>>>> > > scratching a bit on the surface of this change, >>>>>>> so apologies for >>>>>>> > > rather shallow comments: >>>>>>> > > >>>>>>> > > - macroAssembler_x86.cpp:5604: while this is >>>>>>> compiler code, and I >>>>>>> > > am not sure this is final, please avoid littering >>>>>>> the code with >>>>>>> > > TODO remarks :) They tend to be candidates for >>>>>>> later wtf moments >>>>>>> > > only. >>>>>>> > > >>>>>>> > > Just file a CR for that. >>>>>>> > > >>>>>>> > Newcomer question: what is a CR and not sure I have >>>>>>> the rights to do >>>>>>> > that yet ? :) >>>>>>> >>>>>>> Apologies. CR is a change request, this suggests to >>>>>>> file a bug in the >>>>>>> bug tracker. And you are right, you can't just create a >>>>>>> new account in >>>>>>> the OpenJDK JIRA yourselves. 
:( >>>>>>> >>>>>>> >>>>>>> Ok good to know, I'll continue with my own todo list but >>>>>>> I'll work hard on not letting it slip in the webrevs anymore :) >>>>>>> >>>>>>> >>>>>>> I was mostly referring to the "... but it is a TODO" >>>>>>> part of that >>>>>>> comment in macroassembler_x86.cpp. Comments about the >>>>>>> why of the code >>>>>>> are appreciated. >>>>>>> >>>>>>> [Note that I now understand that this is to some degree >>>>>>> still work in >>>>>>> progress. As long as the final changeset does no >>>>>>> contain TODO's I am >>>>>>> fine (and it's not a hard objection, rather their use >>>>>>> in "final" code >>>>>>> is typically limited in my experience)] >>>>>>> >>>>>>> 5603 // Currently, if this happens, just set back the >>>>>>> actual end to >>>>>>> where it was. >>>>>>> 5604 // We miss a chance to sample here. >>>>>>> >>>>>>> Would be okay, if explaining "this" and the "why" of >>>>>>> missing a chance >>>>>>> to sample here would be best. >>>>>>> >>>>>>> Like maybe: >>>>>>> >>>>>>> // If we needed to refill TLABs, just set the actual >>>>>>> end point to >>>>>>> // the end of the TLAB again. We do not sample here >>>>>>> although we could. >>>>>>> >>>>>>> Done with your comment, it works well in my mind. >>>>>>> >>>>>>> I am not sure whether "miss a chance to sample" meant >>>>>>> "we could, but >>>>>>> consciously don't because it's not that useful" or "it >>>>>>> would be >>>>>>> necessary but don't because it's too complicated to >>>>>>> do.". >>>>>>> >>>>>>> Looking at the original comment once more, I am also >>>>>>> not sure if that >>>>>>> comment shouldn't referring to the "end" variable (not >>>>>>> actual_end) >>>>>>> because that's the variable that is responsible for >>>>>>> taking the sampling >>>>>>> path? (Going from the member description of >>>>>>> ThreadLocalAllocBuffer). >>>>>>> >>>>>>> >>>>>>> I've moved this code and it no longer shows up here but the >>>>>>> rationale and answer was: >>>>>>> >>>>>>> So.. Yes, end is the variable provoking the sampling. Actual >>>>>>> end is the actual end of the TLAB. >>>>>>> >>>>>>> What was happening here is that the code is resetting _end >>>>>>> to point towards the end of the new TLAB. Because, we now have the end for >>>>>>> sampling and _actual_end for >>>>>>> the actual end, we need to update the actual_end as well. >>>>>>> >>>>>>> Normally, were we to do the real work here, we would >>>>>>> calculate the (end - start) offset, then do: >>>>>>> >>>>>>> - Set the new end to : start + (old_end - old_start) >>>>>>> - Set the actual end like we do here now where it because it >>>>>>> is the actual end. >>>>>>> >>>>>>> Why is this not done here now anymore? >>>>>>> - I was still debating which path to take: >>>>>>> - Do it in the fast refill code, it has its perks: >>>>>>> - In a world where fast refills are happening all >>>>>>> the time or a lot, we can augment there the code to do the sampling >>>>>>> - Remember what we had as an end before leaving the >>>>>>> slowpath and check on return >>>>>>> - This is what I'm doing now, it removes the need >>>>>>> to go fix up all fast refill paths but if you remain in fast refill paths, >>>>>>> you won't get sampling. I >>>>>>> have to think of the consequences of that, maybe a future >>>>>>> change later on? 
>>>>>>> - I have the statistics now so I'm going to >>>>>>> study that >>>>>>> -> By the way, though my statistics are >>>>>>> showing I'm missing some samples, if I turn off FastTlabRefill, it is the >>>>>>> same loss so for now, it seems >>>>>>> this does not occur in my simple test. >>>>>>> >>>>>>> >>>>>>> >>>>>>> But maybe I am only confused and it's best to just >>>>>>> leave the comment >>>>>>> away. :) >>>>>>> >>>>>>> Thinking about it some more, doesn't this not-sampling >>>>>>> in this case >>>>>>> mean that sampling does not work in any collector that >>>>>>> does inline TLAB >>>>>>> allocation at the moment? (Or is inline TLAB alloc >>>>>>> automatically >>>>>>> disabled with sampling somehow?) >>>>>>> >>>>>>> That would indeed be a bigger TODO then :) >>>>>>> >>>>>>> >>>>>>> Agreed, this remark made me think that perhaps as a first >>>>>>> step the new way of doing it is better but I did have to: >>>>>>> - Remove the const of the ThreadLocalBuffer remaining and >>>>>>> hard_end methods >>>>>>> - Move hard_end out of the header file to have a bit more >>>>>>> logic there >>>>>>> >>>>>>> Please let me know what you think of that and if you prefer >>>>>>> it this way or changing the fast refills. (I prefer this way now because it >>>>>>> is more incremental). >>>>>>> >>>>>>> >>>>>>> > > - calling HeapMonitoring::do_weak_oops() (which >>>>>>> should probably be >>>>>>> > > called weak_oops_do() like other similar methods) >>>>>>> only if string >>>>>>> > > deduplication is enabled (in >>>>>>> g1CollectedHeap.cpp:4511) seems wrong. >>>>>>> > >>>>>>> > The call should be at least around 6 lines up outside >>>>>>> the if. >>>>>>> > >>>>>>> > Preferentially in a method like >>>>>>> process_weak_jni_handles(), including >>>>>>> > additional logging. (No new (G1) gc phase without >>>>>>> minimal logging >>>>>>> > :)). >>>>>>> > Done but really not sure because: >>>>>>> > >>>>>>> > I put for logging: >>>>>>> > log_develop_trace(gc, >>>>>>> freelist)("G1ConcRegionFreeing [other] : heap >>>>>>> > monitoring"); >>>>>>> >>>>>>> I would think that "gc, ref" would be more appropriate >>>>>>> log tags for >>>>>>> this similar to jni handles. >>>>>>> (I am als not sure what weak reference handling has to >>>>>>> do with >>>>>>> G1ConcRegionFreeing, so I am a bit puzzled) >>>>>>> >>>>>>> >>>>>>> I was not sure what to put for the tags or really as the >>>>>>> message. I cleaned it up a bit now to: >>>>>>> log_develop_trace(gc, ref)("HeapSampling [other] : heap >>>>>>> monitoring processing"); >>>>>>> >>>>>>> >>>>>>> >>>>>>> > Since weak_jni_handles didn't have logging for me to >>>>>>> be inspired >>>>>>> > from, I did that but unconvinced this is what should >>>>>>> be done. >>>>>>> >>>>>>> The JNI handle processing does have logging, but only in >>>>>>> ReferenceProcessor::process_discovered_references(). In >>>>>>> process_weak_jni_handles() only overall time is >>>>>>> measured (in a G1 >>>>>>> specific way, since only G1 supports disabling >>>>>>> reference procesing) :/ >>>>>>> >>>>>>> The code in ReferenceProcessor prints both time taken >>>>>>> referenceProcessor.cpp:254, as well as the count, but >>>>>>> strangely only in >>>>>>> debug VMs. >>>>>>> >>>>>>> I have no idea why this logging is that unimportant to >>>>>>> only print that >>>>>>> in a debug VM. However there are reviews out for >>>>>>> changing this area a >>>>>>> bit, so it might be useful to wait for that >>>>>>> (JDK-8173335). 
>>>>>>> >>>>>>> >>>>>>> I cleaned it up a bit anyway and now it returns the count of >>>>>>> objects that are in the system. >>>>>>> >>>>>>> >>>>>>> > > - the change doubles the size of >>>>>>> > > CollectedHeap::allocate_from_tlab_slow() above the >>>>>>> "small and nice" >>>>>>> > > threshold. Maybe it could be refactored a bit. >>>>>>> > Done I think, it looks better to me :). >>>>>>> >>>>>>> In ThreadLocalAllocBuffer::handle_sample() I think the >>>>>>> set_back_actual_end()/pick_next_sample() calls could >>>>>>> be hoisted out of >>>>>>> the "if" :) >>>>>>> >>>>>>> >>>>>>> Done! >>>>>>> >>>>>>> >>>>>>> > > - referenceProcessor.cpp:261: the change should add >>>>>>> logging about >>>>>>> > > the number of references encountered, maybe after >>>>>>> the corresponding >>>>>>> > > "JNI weak reference count" log message. >>>>>>> > Just to double check, are you saying that you'd like >>>>>>> to have the heap >>>>>>> > sampler to keep in store how many sampled objects >>>>>>> were encountered in >>>>>>> > the HeapMonitoring::weak_oops_do? >>>>>>> > - Would a return of the method with the number of >>>>>>> handled >>>>>>> > references and logging that work? >>>>>>> >>>>>>> Yes, it's fine if HeapMonitoring::weak_oops_do() only >>>>>>> returned the >>>>>>> number of processed weak oops. >>>>>>> >>>>>>> >>>>>>> Done also (but I admit I have not tested the output yet) :) >>>>>>> >>>>>>> >>>>>>> > - Additionally, would you prefer it in a separate >>>>>>> block with its >>>>>>> > GCTraceTime? >>>>>>> >>>>>>> Yes. Both kinds of information is interesting: while >>>>>>> the time taken is >>>>>>> typically more important, the next question would be >>>>>>> why, and the >>>>>>> number of references typically goes a long way there. >>>>>>> >>>>>>> See above though, it is probably best to wait a bit. >>>>>>> >>>>>>> >>>>>>> Agreed that I "could" wait but, if it's ok, I'll just >>>>>>> refactor/remove this when we get closer to something final. Either, >>>>>>> JDK-8173335 >>>>>>> has gone in and I will notice it now or it will soon and I >>>>>>> can change it then. >>>>>>> >>>>>>> >>>>>>> > > - threadLocalAllocBuffer.cpp:331: one more "TODO" >>>>>>> > Removed it and added it to my personal todos to look >>>>>>> at. >>>>>>> > > > >>>>>>> > > - threadLocalAllocBuffer.hpp: >>>>>>> ThreadLocalAllocBuffer class >>>>>>> > > documentation should be updated about the sampling >>>>>>> additions. I >>>>>>> > > would have no clue what the difference between >>>>>>> "actual_end" and >>>>>>> > > "end" would be from the given information. >>>>>>> > If you are talking about the comments in this file, I >>>>>>> made them more >>>>>>> > clear I hope in the new webrev. If it was somewhere >>>>>>> else, let me know >>>>>>> > where to change. >>>>>>> >>>>>>> Thanks, that's much better. Maybe a note in the comment >>>>>>> of the class >>>>>>> that ThreadLocalBuffer provides some sampling facility >>>>>>> by modifying the >>>>>>> end() of the TLAB to cause "frequent" calls into the >>>>>>> runtime call where >>>>>>> actual sampling takes place. >>>>>>> >>>>>>> >>>>>>> Done, I think it's better now. Added something about the >>>>>>> slow_path_end as well. >>>>>>> >>>>>>> >>>>>>> > > - in heapMonitoring.hpp: there are some random >>>>>>> comments about some >>>>>>> > > code that has been grabbed from >>>>>>> "util/math/fastmath.[h|cc]". I >>>>>>> > > can't tell whether this is code that can be used >>>>>>> but I assume that >>>>>>> > > Noam Shazeer is okay with that (i.e. that's all >>>>>>> Google code). 
>>>>>>> > Jeremy and I double checked and we can release that >>>>>>> as I thought. I >>>>>>> > removed the comment from that piece of code entirely. >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> > > - heapMonitoring.hpp/cpp static constant naming >>>>>>> does not correspond >>>>>>> > > to Hotspot's. Additionally, in Hotspot static >>>>>>> methods are cased >>>>>>> > > like other methods. >>>>>>> > I think I fixed the methods to be cased the same way >>>>>>> as all other >>>>>>> > methods. For static constants, I was not sure. I >>>>>>> fixed a few other >>>>>>> > variables but I could not seem to really see a >>>>>>> consistent trend for >>>>>>> > constants. I made them as variables but I'm not sure >>>>>>> now. >>>>>>> >>>>>>> Sorry again, style is a kind of mess. The goal of my >>>>>>> suggestions here >>>>>>> is only to prevent yet another style creeping in. >>>>>>> >>>>>>> > > - in heapMonitoring.cpp there are a few cryptic >>>>>>> comments at the top >>>>>>> > > that seem to refer to internal stuff that should >>>>>>> probably be >>>>>>> > > removed. >>>>>>> > Sorry about that! My personal todos not cleared out. >>>>>>> >>>>>>> I am happy about comments, but I simply did not >>>>>>> understand any of that >>>>>>> and I do not know about other readers as well. >>>>>>> >>>>>>> If you think you will remember removing/updating them >>>>>>> until the review >>>>>>> proper (I misunderstood the review situation a little >>>>>>> it seems). >>>>>>> >>>>>>> > > I did not think through the impact of the TLAB >>>>>>> changes on collector >>>>>>> > > behavior yet (if there are). Also I did not check >>>>>>> for problems with >>>>>>> > > concurrent mark and SATB/G1 (if there are). >>>>>>> > I would love to know your thoughts on this, I think >>>>>>> this is fine. I >>>>>>> >>>>>>> I think so too now. No objects are made live out of >>>>>>> thin air :) >>>>>>> >>>>>>> > see issues with multiple threads right now hitting >>>>>>> the stack storage >>>>>>> > instance. Previous webrevs had a mutex lock here but >>>>>>> we took it out >>>>>>> > for simplificity (and only for now). >>>>>>> >>>>>>> :) When looking at this after some thinking I now >>>>>>> assume for this >>>>>>> review that this code is not MT safe at all. There >>>>>>> seems to be more >>>>>>> synchronization missing than just the one for the >>>>>>> StackTraceStorage. So >>>>>>> no comments about this here. >>>>>>> >>>>>>> >>>>>>> I doubled checked a bit (quickly I admit) but it seems that >>>>>>> synchronization in StackTraceStorage is really all you need (all methods >>>>>>> lead to a StackTraceStorage one >>>>>>> and can be multithreaded outside of that). >>>>>>> There is a question about the initialization where the >>>>>>> method HeapMonitoring::initialize_profiling is not thread safe. >>>>>>> It would work (famous last words) and not crash if there was >>>>>>> a race but we could add a synchronization point there as well (and >>>>>>> therefore on the stop as well). >>>>>>> >>>>>>> But anyway I will really check and do this once we add back >>>>>>> synchronization. >>>>>>> >>>>>>> >>>>>>> Also, this would require some kind of specification of >>>>>>> what is allowed >>>>>>> to be called when and where. >>>>>>> >>>>>>> >>>>>>> Would we specify this with the methods in the jvmti.xml >>>>>>> file? We could start by specifying in each that they are not thread safe >>>>>>> but I saw no mention of that for >>>>>>> other methods. 
>>>>>>> >>>>>>> >>>>>>> One potentially relevant observation about locking >>>>>>> here: depending on >>>>>>> sampling frequency, StackTraceStore::add_trace() may be >>>>>>> rather >>>>>>> frequently called. I assume that you are going to do >>>>>>> measurements :) >>>>>>> >>>>>>> >>>>>>> Though we don't have the TLAB implementation in our code, >>>>>>> the compiler generated sampler uses 2% of overhead with a 512k sampling >>>>>>> rate. I can do real measurements >>>>>>> when the code settles and we can see how costly this is as a >>>>>>> TLAB implementation. >>>>>>> However, my theory is that if the rate is 512k, the >>>>>>> memory/performance overhead should be minimal since it is what we saw with >>>>>>> our code/workloads (though not called >>>>>>> the same way, we call it essentially at the same rate). >>>>>>> If you have a benchmark you'd like me to test, let me know! >>>>>>> >>>>>>> Right now, with my really small test, this does use a bit of >>>>>>> overhead even for a 512k sample size. I don't know yet why, I'm going to >>>>>>> see what is going on. >>>>>>> >>>>>>> Finally, I think it is not reasonable to suppose the >>>>>>> overhead to be negligible if the sampling rate used is too low. The user >>>>>>> should know that the lower the rate, >>>>>>> the higher the overhead (documentation TODO?). >>>>>>> >>>>>>> >>>>>>> I am not sure what the expected usage of the API is, but >>>>>>> StackTraceStore::add_trace() seems to be able to grow >>>>>>> without bounds. >>>>>>> Only a GC truncates them to the live ones. That in >>>>>>> itself seems to be >>>>>>> problematic (GCs can be *wide* apart), and of course >>>>>>> some of the API >>>>>>> methods add to that because they duplicate that >>>>>>> unbounded array. Do you >>>>>>> have any concerns/measurements about this? >>>>>>> >>>>>>> >>>>>>> So, the theory is that yes add_trace can be able to grow >>>>>>> without bounds but it grows at a sample per 512k of allocated space. The >>>>>>> stacks it gathers are currently >>>>>>> maxed at 64 (I'd like to expand that to an option to the >>>>>>> user though at some point). So I have no concerns because: >>>>>>> >>>>>>> - If really this is taking a lot of space, that means the >>>>>>> job is keeping a lot of objects in memory as well, therefore the entire >>>>>>> heap is getting huge >>>>>>> - If this is the case, you will be triggering a GC at some >>>>>>> point anyway. >>>>>>> >>>>>>> (I'm putting under the rug the issue of "What if we set the >>>>>>> rate to 1 for example" because as you lower the sampling rate, we cannot >>>>>>> guarantee low overhead; the >>>>>>> idea behind this feature is to have a means of having >>>>>>> meaningful allocated samples at a low overhead) >>>>>>> >>>>>>> I have no measurements really right now but since I now have >>>>>>> some statistics I can poll, I will look a bit more at this question. >>>>>>> >>>>>>> I have the same last sentence than above: the user should >>>>>>> expect this to happen if the sampling rate is too small. That probably can >>>>>>> be reflected in the >>>>>>> StartHeapSampling as a note : careful this might impact your >>>>>>> performance. >>>>>>> >>>>>>> >>>>>>> Also, these stack traces might hold on to huge arrays. >>>>>>> Any >>>>>>> consideration of that? Particularly it might be the >>>>>>> cause for OOMEs in >>>>>>> tight memory situations. >>>>>>> >>>>>>> >>>>>>> There is a stack size maximum that is set to 64 so it should >>>>>>> not hold huge arrays. 
I don't think this is an issue but I can double check >>>>>>> with a test or two. >>>>>>> >>>>>>> >>>>>>> - please consider adding a safepoint check in >>>>>>> HeapMonitoring::weak_oops_do to prevent accidental >>>>>>> misuse. >>>>>>> >>>>>>> - in struct StackTraceStorage, the public fields may >>>>>>> also need >>>>>>> underscores. At least some files in the runtime >>>>>>> directory have structs >>>>>>> with underscored public members (and some don't). The >>>>>>> runtime team >>>>>>> should probably comment on that. >>>>>>> >>>>>>> >>>>>>> Agreed I did not know. I looked around and a lot of structs >>>>>>> did not have them it seemed so I left it as is. I will happily change it if >>>>>>> someone prefers (I was not >>>>>>> sure if you really preferred or not, your sentence seemed to >>>>>>> be more a note of "this might need to change but I don't know if the >>>>>>> runtime team enforces that", let >>>>>>> me know if I read that wrongly). >>>>>>> >>>>>>> >>>>>>> - In StackTraceStorage::weak_oops_do(), when examining >>>>>>> the >>>>>>> StackTraceData, maybe it is useful to consider having a >>>>>>> non-NULL >>>>>>> reference outside of the heap's reserved space an >>>>>>> error. There should >>>>>>> be no oop outside of the heap's reserved space ever. >>>>>>> >>>>>>> Unless you allow storing random values in >>>>>>> StackTraceData::obj, which I >>>>>>> would not encourage. >>>>>>> >>>>>>> >>>>>>> I suppose you are talking about this part: >>>>>>> if ((value != NULL && Universe::heap()->is_in_reserved(value)) >>>>>>> && >>>>>>> (is_alive == NULL || >>>>>>> is_alive->do_object_b(value))) { >>>>>>> >>>>>>> What you are saying is that I could have something like: >>>>>>> if (value != my_non_null_reference && >>>>>>> (is_alive == NULL || >>>>>>> is_alive->do_object_b(value))) { >>>>>>> >>>>>>> Is that what you meant? Is there really a reason to do so? >>>>>>> When I look at the code, is_in_reserved seems like a O(1) method call. I'm >>>>>>> not even sure we can have a >>>>>>> NULL value to be honest. I might have to study that to see >>>>>>> if this was not a paranoid test to begin with. >>>>>>> >>>>>>> The is_alive code has now morphed due to the comment below. >>>>>>> >>>>>>> >>>>>>> >>>>>>> - HeapMonitoring::weak_oops_do() does not seem to use >>>>>>> the >>>>>>> passed AbstractRefProcTaskExecutor. >>>>>>> >>>>>>> >>>>>>> It did use it: >>>>>>> size_t HeapMonitoring::weak_oops_do( >>>>>>> AbstractRefProcTaskExecutor *task_executor, >>>>>>> BoolObjectClosure* is_alive, >>>>>>> OopClosure *f, >>>>>>> VoidClosure *complete_gc) { >>>>>>> assert(SafepointSynchronize::is_at_safepoint(), "must >>>>>>> be at safepoint"); >>>>>>> >>>>>>> if (task_executor != NULL) { >>>>>>> task_executor->set_single_threaded_mode(); >>>>>>> } >>>>>>> return StackTraceStorage::storage()->weak_oops_do(is_alive, >>>>>>> f, complete_gc); >>>>>>> } >>>>>>> >>>>>>> But due to the comment below, I refactored this, so this is >>>>>>> no longer here. Now I have an always true closure that is passed. >>>>>>> >>>>>>> >>>>>>> - I do not understand allowing to call this method with >>>>>>> a NULL >>>>>>> complete_gc closure. This would mean that objects >>>>>>> referenced from the >>>>>>> object that is referenced by the StackTraceData are not >>>>>>> pulled, meaning >>>>>>> they would get stale. >>>>>>> >>>>>>> - same with is_alive parameter value of NULL >>>>>>> >>>>>>> >>>>>>> So these questions made me look a bit closer at this code. 
>>>>>>> This code I think was written this way to have a very small impact on the >>>>>>> file but you are right, there >>>>>>> is no reason for this here. I've simplified the code by >>>>>>> making in referenceProcessor.cpp a process_HeapSampling method that handles >>>>>>> everything there. >>>>>>> >>>>>>> The code allowed NULLs because it depended on where you were >>>>>>> coming from and how the code was being called. >>>>>>> >>>>>>> - I added a static always_true variable and pass that now to >>>>>>> be more consistent with the rest of the code. >>>>>>> - I moved the complete_gc into process_phaseHeapSampling now >>>>>>> (new method) and handle the task_executor and the complete_gc there >>>>>>> - Newbie question: in our code we did a >>>>>>> set_single_threaded_mode but I see that process_phaseJNI does it right >>>>>>> before its call, do I need to do it for the >>>>>>> process_phaseHeapSample? >>>>>>> That API is much cleaner (in my mind) and is consistent with >>>>>>> what is done around it (again in my mind). >>>>>>> >>>>>>> >>>>>>> - heapMonitoring.cpp:590: I do not completely >>>>>>> understand the purpose of >>>>>>> this code: in the end this results in a fixed value >>>>>>> directly dependent >>>>>>> on the Thread address anyway? In the end this results >>>>>>> in a fixed value >>>>>>> directly dependent on the Thread address anyway? >>>>>>> IOW, what is special about exactly 20 rounds? >>>>>>> >>>>>>> >>>>>>> So we really want a fast random number generator that has a >>>>>>> specific mean (512k is the default we use). The code uses the thread >>>>>>> address as the start number of the >>>>>>> sequence (why not, it is random enough is rationale). Then >>>>>>> instead of just starting there, we prime the sequence and really only start >>>>>>> at the 21st number, it is >>>>>>> arbitrary and I have not done a study to see if we could do >>>>>>> more or less of that. >>>>>>> >>>>>>> As I have the statistics of the system up and running, I'll >>>>>>> run some experiments to see if this is needed, is 20 good, or not. >>>>>>> >>>>>>> >>>>>>> - also I would consider stripping a few bits of the >>>>>>> threads' address as >>>>>>> initialization value for your rng. The last three bits >>>>>>> (and probably >>>>>>> more, check whether the Thread object is allocated on >>>>>>> special >>>>>>> boundaries) are always zero for them. >>>>>>> Not sure if the given "random" value is random enough >>>>>>> before/after, >>>>>>> this method, so just skip that comment if you think >>>>>>> this is not >>>>>>> required. >>>>>>> >>>>>>> >>>>>>> I don't know is the honest answer. I think what is important >>>>>>> is that we tend towards a mean and it is random "enough" to not fall in >>>>>>> pitfalls of only sampling a >>>>>>> subset of objects due to their allocation order. I added >>>>>>> that as test to do to see if it changes the mean in any way for the 512k >>>>>>> default value and/or if the first >>>>>>> 1000 elements look better. >>>>>>> >>>>>>> >>>>>>> Some more random nits I did not find a place to put >>>>>>> anywhere: >>>>>>> >>>>>>> - ThreadLocalAllocBuffer::_extra_space does not seem >>>>>>> to be used >>>>>>> anywhere? >>>>>>> >>>>>>> >>>>>>> Good catch :). >>>>>>> >>>>>>> >>>>>>> - Maybe indent the declaration of >>>>>>> ThreadLocalAllocBuffer::_bytes_until_sample to align below the >>>>>>> other members of that group. >>>>>>> >>>>>>> >>>>>>> Done moved it up a bit to have non static members together >>>>>>> and static separate. 
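>>>>>>>
>>>>>>> Coming back to the earlier question about the 20 priming rounds and
>>>>>>> the 512k mean, the experiment I have in mind is essentially the
>>>>>>> following (a stand-in sketch only: the real generator lives in
>>>>>>> heapMonitoring.cpp and is not java.util.Random, so this only shows the
>>>>>>> shape of the measurement, not the actual numbers):
>>>>>>>
>>>>>>>   import java.util.Random;
>>>>>>>
>>>>>>>   public class SampleIntervalMean {
>>>>>>>     static final double MEAN = 512 * 1024; // target average bytes between samples
>>>>>>>
>>>>>>>     // One "bytes until next sample" draw with the given mean.
>>>>>>>     static double nextInterval(Random rng) {
>>>>>>>       return -Math.log(1.0 - rng.nextDouble()) * MEAN;
>>>>>>>     }
>>>>>>>
>>>>>>>     public static void main(String[] args) {
>>>>>>>       int warmup = Integer.parseInt(args[0]);  // e.g. 0 or 20
>>>>>>>       int samples = Integer.parseInt(args[1]); // e.g. 62, 788 or 57721
>>>>>>>       Random rng = new Random(42);             // seed stands in for the thread address
>>>>>>>       for (int i = 0; i < warmup; i++) {
>>>>>>>         nextInterval(rng);                     // discard the primed values
>>>>>>>       }
>>>>>>>       double sum = 0;
>>>>>>>       for (int i = 0; i < samples; i++) {
>>>>>>>         sum += nextInterval(rng);
>>>>>>>       }
>>>>>>>       System.out.println("average interval = " + (sum / samples));
>>>>>>>     }
>>>>>>>   }
>>>>>>>
>>>>>>> The point is only to see how quickly the running average converges to
>>>>>>> 512k for the sample counts observed above, with and without priming.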
>>>>>>> >>>>>>> Thanks, >>>>>>> Thomas >>>>>>> >>>>>>> >>>>>>> Thanks for your review! >>>>>>> Jc >>>>>>> >>>>>>> >>>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From coleen.phillimore at oracle.com Tue Oct 10 02:36:15 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 9 Oct 2017 22:36:15 -0400 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> Message-ID: <0a232ef3-8e2b-3025-14ba-eaf8c6b409fe@oracle.com> This seems ok to me with Jamsheed's explanation. Thanks, Coleen On 9/14/17 2:54 AM, Dean Long wrote: > It looks like you accidentally dropped > hotspot-compiler-dev at openjdk.java.net when you added runtime. > > dl > > > On 9/13/2017 11:21 PM, jamsheed wrote: >> (adding runtime list for inputs) >> >> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>> brief desc: special handling of Object. in >>> TemplateInterpreter::deopt_reexecute_entry >>> >>> required last_sp to be reset explicitly in normal return path >>> >>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, >>> address bcp) { >>> ? assert(method->contains(bcp), "just checkin'"); >>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>> ? if (code == Bytecodes::_return) { >>> ??? // This is used for deopt during registration of finalizers >>> ??? // during Object..? We simply need to resume execution at >>> ??? // the standard return vtos bytecode to pop the frame normally. >>> ??? // reexecuting the real bytecode would cause double registration >>> ??? // of the finalizable object. >>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >> >> last_sp ! = null not an issue for this case, so i skip the assert in >> debug build >> >> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >> >> Please review. >> >> Best Regards, >> Jamsheed >> >> >> >> >> > From jamsheed.c.m at oracle.com Tue Oct 10 06:05:37 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Tue, 10 Oct 2017 11:35:37 +0530 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <0a232ef3-8e2b-3025-14ba-eaf8c6b409fe@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <0a232ef3-8e2b-3025-14ba-eaf8c6b409fe@oracle.com> Message-ID: <0df5882b-b94c-569c-bed8-33f502e7fd8d@oracle.com> Thanks for the review, Coleen Best regards, Jamsheed On Tuesday 10 October 2017 08:06 AM, coleen.phillimore at oracle.com wrote: > > This seems ok to me with Jamsheed's explanation. > Thanks, > Coleen > > On 9/14/17 2:54 AM, Dean Long wrote: >> It looks like you accidentally dropped >> hotspot-compiler-dev at openjdk.java.net when you added runtime. >> >> dl >> >> >> On 9/13/2017 11:21 PM, jamsheed wrote: >>> (adding runtime list for inputs) >>> >>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>> brief desc: special handling of Object. in >>>> TemplateInterpreter::deopt_reexecute_entry >>>> >>>> required last_sp to be reset explicitly in normal return path >>>> >>>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, >>>> address bcp) { >>>> ? assert(method->contains(bcp), "just checkin'"); >>>> ? Bytecodes::Code code?? 
= Bytecodes::java_code_at(method, bcp); >>>> ? if (code == Bytecodes::_return) { >>>> ??? // This is used for deopt during registration of finalizers >>>> ??? // during Object..? We simply need to resume execution at >>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>> ??? // reexecuting the real bytecode would cause double registration >>>> ??? // of the finalizable object. >>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>> >>> last_sp ! = null not an issue for this case, so i skip the assert in >>> debug build >>> >>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>> >>> Please review. >>> >>> Best Regards, >>> Jamsheed >>> >>> >>> >>> >>> >> > From dmitry.chuyko at bell-sw.com Tue Oct 10 14:54:33 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 10 Oct 2017 17:54:33 +0300 Subject: RFR (XS): 8188221 - AARCH64: Return type profiling is not performed from aarch64 interpreter Message-ID: <1de74a72-ec83-9fed-ffc8-091af58de457@bell-sw.com> Hello, TestArrayCopyNoInitDeopt jtreg test (JDK-8072016) fails in -XX:-TieredCompilation mode because return type is not profiled in interpreter. Please review the fix, it adds profiling for aarch64 similar to how it's implemented for other cpus. bug: https://bugs.openjdk.java.net/browse/JDK-8188221 patch: jdk10.patch attached -Dmitry -------------- next part -------------- A non-text attachment was scrubbed... Name: jdk10.patch Type: text/x-patch Size: 712 bytes Desc: not available URL: From vladimir.kozlov at oracle.com Tue Oct 10 15:11:43 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Oct 2017 08:11:43 -0700 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> Message-ID: Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only 32-bit affected? Thanks, Vladimir On 9/13/17 11:54 PM, Dean Long wrote: > It looks like you accidentally dropped hotspot-compiler-dev at openjdk.java.net when you added runtime. > > dl > > > On 9/13/2017 11:21 PM, jamsheed wrote: >> (adding runtime list for inputs) >> >> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>> brief desc: special handling of Object. in TemplateInterpreter::deopt_reexecute_entry >>> >>> required last_sp to be reset explicitly in normal return path >>> >>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, address bcp) { >>> ? assert(method->contains(bcp), "just checkin'"); >>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>> ? if (code == Bytecodes::_return) { >>> ??? // This is used for deopt during registration of finalizers >>> ??? // during Object..? We simply need to resume execution at >>> ??? // the standard return vtos bytecode to pop the frame normally. >>> ??? // reexecuting the real bytecode would cause double registration >>> ??? // of the finalizable object. >>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >> >> last_sp ! = null not an issue for this case, so i skip the assert in debug build >> >> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >> >> Please review. 
>> >> Best Regards, >> Jamsheed >> >> >> >> >> > From dmitry.chuyko at bell-sw.com Tue Oct 10 15:13:31 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 10 Oct 2017 18:13:31 +0300 Subject: RFR (XS): 8188221 - AARCH64: Return type profiling is not performed from aarch64 interpreter In-Reply-To: <1de74a72-ec83-9fed-ffc8-091af58de457@bell-sw.com> References: <1de74a72-ec83-9fed-ffc8-091af58de457@bell-sw.com> Message-ID: <122214a9-e4e3-5765-fdca-10f73f79bf2a@bell-sw.com> --- old/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp 2017-10-02 09:10:20.917960334 +0000 +++ new/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp 2017-10-02 09:10:20.293932959 +0000 @@ -414,6 +414,14 @@ ?? __ restore_constant_pool_cache(); ?? __ get_method(rmethod); +? if (state == atos) { +??? Register obj = r0; +??? Register mdp = r1; +??? Register tmp = r2; +??? __ ldr(mdp, Address(rmethod, Method::method_data_offset())); +??? __ profile_return_type(mdp, obj, tmp); +? } + ?? // Pop N words from the stack ?? __ get_cache_and_index_at_bcp(r1, r2, 1, index_size); ?? __ ldr(r1, Address(r1, ConstantPoolCache::base_offset() + ConstantPoolCacheEntry::flags_offset())); On 10/10/2017 05:54 PM, Dmitry Chuyko wrote: > Hello, > > TestArrayCopyNoInitDeopt jtreg test (JDK-8072016) fails in > -XX:-TieredCompilation mode because return type is not profiled in > interpreter. > Please review the fix, it adds profiling for aarch64 similar to how > it's implemented for other cpus. > > bug: https://bugs.openjdk.java.net/browse/JDK-8188221 > patch: jdk10.patch attached > > -Dmitry From nils.eliasson at oracle.com Wed Oct 11 09:28:13 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 11 Oct 2017 11:28:13 +0200 Subject: RFR (XXS): 8160303: parse_method_pattern only scans 254 chars In-Reply-To: <2923877c-af26-398c-658a-2bace3b34fd3@oracle.com> References: <7555f1ba-383f-3652-a701-765eda0417ac@oracle.com> <2923877c-af26-398c-658a-2bace3b34fd3@oracle.com> Message-ID: Hi, *redface* Correct, fixed! Regards, Nils Eliasson On 2017-09-19 20:45, Vladimir Kozlov wrote: > It should be 1022: one for '(' + one for \0 at the end. > > Vladimir > > On 9/19/17 3:54 AM, Nils Eliasson wrote: >> Hi, >> >> This patch fixes the wrong (too short) scan length in the signature >> parsing in methodMatcher.cpp. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8160303 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8160303/webrev.01/ >> >> >> Please review, >> >> Nils Eliasson >> >> From vladimir.x.ivanov at oracle.com Wed Oct 11 11:59:22 2017 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Oct 2017 14:59:22 +0300 Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: <30dbb109-c259-4529-b846-e4afffc94bd0@default> References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> Message-ID: <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> Looks good. Best regards, Vladimir Ivanov On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: > Hi, > > Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev > > Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes. > > Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ > jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 > Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 > > Test: Run jtreg and jprt hotspot testsets. 
> > Regards, > Muthusamy C > From jamsheed.c.m at oracle.com Wed Oct 11 12:48:05 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Wed, 11 Oct 2017 18:18:05 +0530 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> Message-ID: <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> Hi Vladimir, Thank you for pointing this. revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ Best Regards, Jamsheed On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: > Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only > 32-bit affected? > > Thanks, > Vladimir > > On 9/13/17 11:54 PM, Dean Long wrote: >> It looks like you accidentally dropped >> hotspot-compiler-dev at openjdk.java.net when you added runtime. >> >> dl >> >> >> On 9/13/2017 11:21 PM, jamsheed wrote: >>> (adding runtime list for inputs) >>> >>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>> brief desc: special handling of Object. in >>>> TemplateInterpreter::deopt_reexecute_entry >>>> >>>> required last_sp to be reset explicitly in normal return path >>>> >>>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, >>>> address bcp) { >>>> ? assert(method->contains(bcp), "just checkin'"); >>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>> ? if (code == Bytecodes::_return) { >>>> ??? // This is used for deopt during registration of finalizers >>>> ??? // during Object..? We simply need to resume execution at >>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>> ??? // reexecuting the real bytecode would cause double registration >>>> ??? // of the finalizable object. >>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>> >>> last_sp ! = null not an issue for this case, so i skip the assert in >>> debug build >>> >>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>> >>> Please review. >>> >>> Best Regards, >>> Jamsheed >>> >>> >>> >>> >>> >> From nils.eliasson at oracle.com Wed Oct 11 13:15:37 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 11 Oct 2017 15:15:37 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: References: Message-ID: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> Hi Roland, I have started reviewing and testing I will sponsor your change when the full review is completed. Best Regards, Nils On 2017-10-03 15:19, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8186027/webrev.00/ > > This converts loop: > > for (int i = start; i < stop; i += inc) { > // body > } > > to a loop nest: > > i = start; > if (i < stop) { > do { > int next = MIN(stop, i+LoopStripMiningIter*inc); > do { > // body > i += inc; > } while (i < next); > safepoint(); > } while (i < stop); > } > > (It's actually: > int next = MIN(stop - i, LoopStripMiningIter*inc) + i; > to protect against overflows) > > This should bring the best of running with UseCountedLoopSafepoints on > and running with it off: low time to safepoint with little to no impact > on throughput. That change was first pushed to the shenandoah repo > several months ago and we've been running with it enabled since. > > The command line argument LoopStripMiningIter is the number of > iterations between safepoints. 
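>
> To make that concrete at the source level: a plain counted loop such as
>
>   for (int i = 0; i < a.length; i++) {
>     sum += a[i];
>   }
>
> run with -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000
> behaves roughly as if it had been written as the nest below (sketch only,
> assuming an int[] a and an int sum in scope; the compiler of course does
> this on the ideal graph, not on source code):
>
>   int i = 0;
>   if (i < a.length) {
>     do {
>       // at most 1000 inner iterations, using the overflow safe form above
>       int next = Math.min(a.length - i, 1000) + i;
>       do {
>         sum += a[i];
>         i++;
>       } while (i < next);
>       // the safepoint poll sits here, once per outer iteration
>     } while (i < a.length);
>   }
>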
In practice, with an arbitrary > LoopStripMiningIter=1000, we observe time to safepoint on par with the > current -XX:+UseCountedLoopSafepoints and most performance regressions > due to -XX:+UseCountedLoopSafepoints gone. The exception is when an > inner counted loop runs for a low number of iterations on average (and > the compiler doesn't have an upper bound on the number of iteration). > > This is enabled on the command line with: > -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 > > In PhaseIdealLoop::is_counted_loop(), when loop strip mining is enabled, > for an inner loop, the compiler builds a skeleton outer loop around the > the counted loop. The outer loop is kept as simple as possible so > required adjustments to the existing loop optimizations are not too > intrusive. The reason the outer loop is inserted early in the > optimization process is so that optimizations are not disrupted: an > alternate implementation could have kept the safepoint in the counted > loop until loop opts are over and then only have added the outer loop > and moved the safepoint to the outer loop. That would have prevented > nodes that are referenced in the safepoint to be sunk out of loop for > instance. > > The outer loop is a LoopNode with a backedge to a loop exit test and a > safepoint. The loop exit test is a CmpI with a new Opaque5Node. The > skeleton loop is populated with all required Phis after loop opts are > over during macro expansion. At that point only, the loop exit tests are > adjusted so the inner loop runs for at most LoopStripMiningIter. If the > compiler can prove the inner loop runs for no more than > LoopStripMiningIter then during macro expansion, the outer loop is > removed. The safepoint is removed only if the inner loop executes for > less than LoopStripMiningIterShortLoop so that if there are several > counted loops in a raw, we still poll for safepoints regularly. > > Until macro expansion, there can be only a few extra nodes in the outer > loop: nodes that would have sunk out of the inner loop and be kept in > the outer loop by the safepoint. > > PhaseIdealLoop::clone_loop() which is used by most loop opts has now > several ways of cloning a counted loop. For loop unswitching, both inner > and outer loops need to be cloned. For unrolling, only the inner loop > needs to be cloned. For pre/post loops insertion, only the inner loop > needs to be cloned but the control flow must connect one of the inner > loop copies to the outer loop of the other copy. > > Beyond verifying performance results with the usual benchmarks, when I > implemented that change, I wrote test cases for (hopefully) every loop > optimization and verified by inspection of the generated code that the > loop opt triggers correct with loop strip mining. > > Roland. From rwestrel at redhat.com Wed Oct 11 13:53:59 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 11 Oct 2017 15:53:59 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> References: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> Message-ID: > I have started reviewing and testing I will sponsor your change when the > full review is completed. Thanks! Roland. From dmitry.chuyko at bell-sw.com Wed Oct 11 16:30:54 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Wed, 11 Oct 2017 19:30:54 +0300 Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic Message-ID: Hello, Please review an improvement of CRC32 calculation on AArch64. 
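The code path being exercised is java.util.zip.CRC32 over byte arrays of
various lengths; the benchmark linked below measures roughly the following
kernel (an illustrative JMH-style sketch, not the exact CRC32Bench.java):

  import java.util.Random;
  import java.util.zip.CRC32;
  import org.openjdk.jmh.annotations.*;

  @State(Scope.Thread)
  public class Crc32Sketch {
    // lengths below 128 should fall into the new by-32 loop,
    // larger ones into the by-64 main loop
    @Param({"32", "64", "128", "1024", "65536"})
    public int length;

    public byte[] data;

    @Setup
    public void setup() {
      data = new byte[length];
      new Random(0).nextBytes(data);
    }

    @Benchmark
    public long crc32() {
      CRC32 crc = new CRC32();
      crc.update(data, 0, data.length); // compiles to the _updateBytesCRC32 intrinsic
      return crc.getValue();
    }
  }
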
MacroAssembler::kernel_crc32 gets table registers that are not used on -XX:+UseCRC32 path. They can be used to make neighbor loads and CRC calculations independent. Adding prologue and epilogue for main by-64 loop makes it applicable starting from len=128 so additional by-32 loop is added for smaller lengths. rfe: https://bugs.openjdk.java.net/browse/JDK-8189176 webrev: http://cr.openjdk.java.net/~dchuyko/8189176/webrev.00/ benchmark: http://cr.openjdk.java.net/~dchuyko/8189176/crc32/CRC32Bench.java Results for T88 and A53 are good, but splitting pair loads may slow down other CPUs so measurements on different HW are highly welcome. -Dmitry From igor.veresov at oracle.com Wed Oct 11 18:01:23 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 11 Oct 2017 11:01:23 -0700 Subject: RFR(S) 8189183: [AOT] Fix eclipse project generation after repo consolidation Message-ID: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> This is to make mx-base project generation work again. Webrev: http://cr.openjdk.java.net/~iveresov/8189183/webrev.00/ Thanks, igor From dean.long at oracle.com Wed Oct 11 21:58:00 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 11 Oct 2017 14:58:00 -0700 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> Message-ID: <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> For AARCH64 in templateTable_arm.cpp, how about using the same code as generate_deopt_entry_for? ? __ restore_sp_after_call(Rtemp);? // Restore SP to extended SP ? __ restore_stack_top(); dl On 10/11/17 5:48 AM, jamsheed wrote: > Hi Vladimir, > > Thank you for pointing this. > > revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ > > Best Regards, > > Jamsheed > > > On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only >> 32-bit affected? >> >> Thanks, >> Vladimir >> >> On 9/13/17 11:54 PM, Dean Long wrote: >>> It looks like you accidentally dropped >>> hotspot-compiler-dev at openjdk.java.net when you added runtime. >>> >>> dl >>> >>> >>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>> (adding runtime list for inputs) >>>> >>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>> brief desc: special handling of Object. in >>>>> TemplateInterpreter::deopt_reexecute_entry >>>>> >>>>> required last_sp to be reset explicitly in normal return path >>>>> >>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, >>>>> address bcp) { >>>>> ? assert(method->contains(bcp), "just checkin'"); >>>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>>> ? if (code == Bytecodes::_return) { >>>>> ??? // This is used for deopt during registration of finalizers >>>>> ??? // during Object..? We simply need to resume execution at >>>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>>> ??? // reexecuting the real bytecode would cause double registration >>>>> ??? // of the finalizable object. >>>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>> >>>> last_sp ! = null not an issue for this case, so i skip the assert >>>> in debug build >>>> >>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>> >>>> Please review. 
>>>> >>>> Best Regards, >>>> Jamsheed >>>> >>>> >>>> >>>> >>>> >>> > From dean.long at oracle.com Wed Oct 11 22:21:18 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 11 Oct 2017 15:21:18 -0700 Subject: RFR(S) 8189183: [AOT] Fix eclipse project generation after repo consolidation In-Reply-To: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> References: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> Message-ID: <6bca0cf3-ecf3-f201-c13b-65e0c6cee11e@oracle.com> Looks reasonable. dl On 10/11/17 11:01 AM, Igor Veresov wrote: > This is to make mx-base project generation work again. > > Webrev: http://cr.openjdk.java.net/~iveresov/8189183/webrev.00/ > > Thanks, > igor From igor.veresov at oracle.com Wed Oct 11 23:18:13 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 11 Oct 2017 16:18:13 -0700 Subject: RFR(S) 8189183: [AOT] Fix eclipse project generation after repo consolidation In-Reply-To: <6bca0cf3-ecf3-f201-c13b-65e0c6cee11e@oracle.com> References: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> <6bca0cf3-ecf3-f201-c13b-65e0c6cee11e@oracle.com> Message-ID: Thanks, Dean! igor > On Oct 11, 2017, at 3:21 PM, dean.long at oracle.com wrote: > > Looks reasonable. > > dl > > > On 10/11/17 11:01 AM, Igor Veresov wrote: >> This is to make mx-base project generation work again. >> >> Webrev: http://cr.openjdk.java.net/~iveresov/8189183/webrev.00/ >> >> Thanks, >> igor > From vladimir.kozlov at oracle.com Wed Oct 11 23:20:07 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Oct 2017 16:20:07 -0700 Subject: RFR(S) 8189183: [AOT] Fix eclipse project generation after repo consolidation In-Reply-To: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> References: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> Message-ID: <879c4551-7ec3-a68e-a62e-9fc7f6499817@oracle.com> Looks good. Thanks, Vladimir On 10/11/17 11:01 AM, Igor Veresov wrote: > This is to make mx-base project generation work again. > > Webrev: http://cr.openjdk.java.net/~iveresov/8189183/webrev.00/ > > Thanks, > igor > From tobias.hartmann at oracle.com Thu Oct 12 10:04:21 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 12 Oct 2017 12:04:21 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" Message-ID: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8189067 http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ The problem is in the C2 optimization that moves stores out of a loop [1]. We move stores out of a loop by creating clones at all loop exit paths that observe the stored value. When walking up the dominator chain from observers of a store and placing clones of the store at the top (right after the loop), we may end up placing multiple cloned stores at the same location. This confuses/crashes the SuperWord optimization and also affects performance of the generated code due to double execution of the same store (see details in the bug comments). My initial prototype fix (webrev.00) just checked if there already is a cloned store with the same control input and reused it instead of creating another one. I've discussed this with Roland in detail and we came to the conclusion that this is not sufficient. 
We may still end up with a broken memory graph: If one use of the store is a load, we create a clone of the store and connect it to the load but this store is not connected to the rest of the memory graph, i.e. the memory effect of the store is not propagated. Although this may not cause incorrect execution (at least we were not able to trigger that), it may cause problems if other optimizations kick in and in some cases we still end up with the same store being executed twice. We agreed to go with the safer version of computing the LCA and move the store there (only the initial store without creating clones) if it's outside of the loop. This is a bit too strict in cases where there's an uncommon trap in the loop (see 'TestMoveStoresOutOfLoops::test_after_6') but it works fine in the other cases exercised by the test. If people agree, I'll file a follow up RFE to improve the optimization but would like to go with the safe fix for now (this also affects JDK 9). Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8080289 From jamsheed.c.m at oracle.com Thu Oct 12 10:33:34 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Thu, 12 Oct 2017 16:03:34 +0530 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> Message-ID: <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> Dean, Thank you for the review, yes there is check for extended sp equality too. made the change http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ Best regards, Jamsheed On Thursday 12 October 2017 03:28 AM, dean.long at oracle.com wrote: > For AARCH64 in templateTable_arm.cpp, how about using the same code as > generate_deopt_entry_for? > > __ restore_sp_after_call(Rtemp); // Restore SP to extended SP > __ restore_stack_top(); > > > dl > > On 10/11/17 5:48 AM, jamsheed wrote: >> Hi Vladimir, >> >> Thank you for pointing this. >> >> revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >> >> Best Regards, >> >> Jamsheed >> >> >> On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >>> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only >>> 32-bit affected? >>> >>> Thanks, >>> Vladimir >>> >>> On 9/13/17 11:54 PM, Dean Long wrote: >>>> It looks like you accidentally dropped >>>> hotspot-compiler-dev at openjdk.java.net when you added runtime. >>>> >>>> dl >>>> >>>> >>>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>>> (adding runtime list for inputs) >>>>> >>>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>>> brief desc: special handling of Object.<init> in >>>>>> TemplateInterpreter::deopt_reexecute_entry >>>>>> >>>>>> required last_sp to be reset explicitly in normal return path >>>>>> >>>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* >>>>>> method, address bcp) { >>>>>> assert(method->contains(bcp), "just checkin'"); >>>>>> Bytecodes::Code code = Bytecodes::java_code_at(method, bcp); >>>>>> if (code == Bytecodes::_return) { >>>>>> // This is used for deopt during registration of finalizers >>>>>> // during Object.<init>. We simply need to resume execution at >>>>>> // the standard return vtos bytecode to pop the frame normally. >>>>>>
// reexecuting the real bytecode would cause double registration >>>>>> // of the finalizable object. >>>>>> return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>>> >>>>> last_sp != null not an issue for this case, so i skip the assert >>>>> in debug build >>>>> >>>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>>> >>>>> Please review. >>>>> >>>>> Best Regards, >>>>> Jamsheed >>>>> >>>>> >>>>> >>>>> >>>> >> > From muthusamy.chinnathambi at oracle.com Thu Oct 12 10:46:04 2017 From: muthusamy.chinnathambi at oracle.com (Muthusamy Chinnathambi) Date: Thu, 12 Oct 2017 03:46:04 -0700 (PDT) Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> Message-ID: May I please get a second review for the change. Regards, Muthusamy C -----Original Message----- From: Vladimir Ivanov Sent: Wednesday, October 11, 2017 5:29 PM To: Muthusamy Chinnathambi Cc: hotspot-gc-dev at openjdk.java.net; hotspot compiler Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev Looks good. Best regards, Vladimir Ivanov On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: >> Hi, >> >> Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev >> >> Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes.
>> >> Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ >> jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 >> Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 >> >> Test: Run jtreg and jprt hotspot testsets. >> >> Regards, >> Muthusamy C >> From rwestrel at redhat.com Thu Oct 12 14:12:10 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 12 Oct 2017 16:12:10 +0200 Subject: RFR(S): 8186125: "DU iteration must converge quickly" assert in split if with unsafe accesses Message-ID: http://cr.openjdk.java.net/~roland/8186125/webrev.00/ Split if is missing support for graph shapes with the Opaque4Node that was introduced for unsafe accesses by JDK-8176506. In the test case, the 2 Unsafe accesses share a single Opaque4Node before the if. When split if encounters the Cmp->Bol->Opaque4->If chain, it only tries to clone Cmp->Bol when it should clone Cmp->Bol->Opaque4 to make one copy for each If. Roland. From tobias.hartmann at oracle.com Thu Oct 12 14:24:42 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 12 Oct 2017 16:24:42 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Message-ID: <65e2a138-cbf9-cff9-ce73-f6caecd86852@oracle.com> I forgot to mention that my fix also re-enables UseSubwordForMaxVector which was disabled due to JDK-8184995 [1] which turned out to be a duplicate of this issue and is not caused by UseSubwordForMaxVector. [1] https://bugs.openjdk.java.net/browse/JDK-8184995 On 12.10.2017 12:04, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8189067 > http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ > > The problem is in the C2 optimization that moves stores out of a loop [1]. > > We move stores out of a loop by creating clones at all loop exit paths that observe the stored value. When walking up > the dominator chain from observers of a store and placing clones of the store at the top (right after the loop), we may > end up placing multiple cloned stores at the same location. This confuses/crashes the SuperWord optimization and also > affects performance of the generated code due to double execution of the same store (see details in the bug comments). > > My initial prototype fix (webrev.00) just checked if there already is a cloned store with the same control input and > reused it instead of creating another one. I've discussed this with Roland in detail and we came to the conclusion that > this is not sufficient. We may still end up with a broken memory graph: If one use of the store is a load, we create a > clone of the store and connect it to the load but this store is not connected to the rest of the memory graph, i.e. the > memory effect of the store is not propagated. Although this may not cause incorrect execution (at least we were not able > to trigger that), it may cause problems if other optimizations kick in and in some cases we still end up with the same > store being executed twice. > > We agreed to go with the safer version of computing the LCA and move the store there (only the initial store without > creating clones) if it's outside of the loop. 
This is a bit too strict in cases where there's an uncommon trap in the > loop (see 'TestMoveStoresOutOfLoops::test_after_6') but it works fine in the other cases exercised by the test. If > people agree, I'll file a follow up RFE to improve the optimization but would like to go with the safe fix for now (this > also affects JDK 9). > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8080289 From vladimir.kozlov at oracle.com Thu Oct 12 17:33:31 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 10:33:31 -0700 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Message-ID: <837b748a-9bcc-00a4-9c42-660bfbf76902@oracle.com> Good. I think we should leave this conservative fix without optimizing it. We should not spend a lot of time optimizing C2 now. Thanks, Vladimir On 10/12/17 3:04 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8189067 > http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ > > The problem is in the C2 optimization that moves stores out of a loop [1]. > > We move stores out of a loop by creating clones at all loop exit paths that observe the stored value. When walking up > the dominator chain from observers of a store and placing clones of the store at the top (right after the loop), we may > end up placing multiple cloned stores at the same location. This confuses/crashes the SuperWord optimization and also > affects performance of the generated code due to double execution of the same store (see details in the bug comments). > > My initial prototype fix (webrev.00) just checked if there already is a cloned store with the same control input and > reused it instead of creating another one. I've discussed this with Roland in detail and we came to the conclusion that > this is not sufficient. We may still end up with a broken memory graph: If one use of the store is a load, we create a > clone of the store and connect it to the load but this store is not connected to the rest of the memory graph, i.e. the > memory effect of the store is not propagated. Although this may not cause incorrect execution (at least we were not able > to trigger that), it may cause problems if other optimizations kick in and in some cases we still end up with the same > store being executed twice. > > We agreed to go with the safer version of computing the LCA and move the store there (only the initial store without > creating clones) if it's outside of the loop. This is a bit too strict in cases where there's an uncommon trap in the > loop (see 'TestMoveStoresOutOfLoops::test_after_6') but it works fine in the other cases exercised by the test. If > people agree, I'll file a follow up RFE to improve the optimization but would like to go with the safe fix for now (this > also affects JDK 9). 
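As an aside on the "computing the LCA" step referred to above: as described, the idea is to find the deepest point that still dominates every use of the store and sink the store there only if that point lies outside the loop. A minimal self-contained sketch of an LCA walk on a dominator tree is below; this is illustrative code, not the C2 implementation.

    #include <cstddef>

    struct DomNode {
      DomNode* idom;    // immediate dominator (NULL only for the root)
      int      depth;   // depth in the dominator tree
    };

    // Lowest common ancestor of two controls on the dominator tree: the
    // deepest block that dominates both uses of the sunk store.
    DomNode* dom_lca(DomNode* a, DomNode* b) {
      while (a != b) {
        if (a->depth >= b->depth) {
          a = a->idom;    // move the deeper node up until the paths meet
        } else {
          b = b->idom;
        }
      }
      return a;
    }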
> > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8080289 From vladimir.kozlov at oracle.com Thu Oct 12 17:49:46 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 10:49:46 -0700 Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> Message-ID: Why do you need to add test explicitly to hotspot_compiler group? It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be used in all other testing. Did you check that the test is executed without you modifying TEST.groups? Thanks, Vladimir K On 10/12/17 3:46 AM, Muthusamy Chinnathambi wrote: > May I please get a second review for the change. > > Regards, > Muthusamy C > > -----Original Message----- > From: Vladimir Ivanov > Sent: Wednesday, October 11, 2017 5:29 PM > To: Muthusamy Chinnathambi > Cc: hotspot-gc-dev at openjdk.java.net; hotspot compiler > Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev > > Looks good. > > Best regards, > Vladimir Ivanov > > On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: >> Hi, >> >> Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev >> >> Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes. >> >> Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ >> jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 >> Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 >> >> Test: Run jtreg and jprt hotspot testsets. >> >> Regards, >> Muthusamy C >> From dean.long at oracle.com Thu Oct 12 18:21:29 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 12 Oct 2017 11:21:29 -0700 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> Message-ID: Looks good. dl On 10/12/17 3:33 AM, jamsheed wrote: > Dean, > > Thank you for the review, yes there is check for extended sp equality > too. made the change > > http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ > > Best regards, > > Jamsheed > > > On Thursday 12 October 2017 03:28 AM, dean.long at oracle.com wrote: >> For AARCH64 in templateTable_arm.cpp, how about using the same code >> as generate_deopt_entry_for? >> >> ? __ restore_sp_after_call(Rtemp);? // Restore SP to extended SP >> ? __ restore_stack_top(); >> >> >> dl >> >> On 10/11/17 5:48 AM, jamsheed wrote: >>> Hi Vladimir, >>> >>> Thank you for pointing this. >>> >>> revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >>> >>> Best Regards, >>> >>> Jamsheed >>> >>> >>> On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >>>> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only >>>> 32-bit affected? 
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/13/17 11:54 PM, Dean Long wrote: >>>>> It looks like you accidentally dropped >>>>> hotspot-compiler-dev at openjdk.java.net when you added runtime. >>>>> >>>>> dl >>>>> >>>>> >>>>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>>>> (adding runtime list for inputs) >>>>>> >>>>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>>>> brief desc: special handling of Object. in >>>>>>> TemplateInterpreter::deopt_reexecute_entry >>>>>>> >>>>>>> required last_sp to be reset explicitly in normal return path >>>>>>> >>>>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* >>>>>>> method, address bcp) { >>>>>>> ? assert(method->contains(bcp), "just checkin'"); >>>>>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>>>>> ? if (code == Bytecodes::_return) { >>>>>>> ??? // This is used for deopt during registration of finalizers >>>>>>> ??? // during Object..? We simply need to resume execution at >>>>>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>>>>> ??? // reexecuting the real bytecode would cause double >>>>>>> registration >>>>>>> ??? // of the finalizable object. >>>>>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>>>> >>>>>> last_sp ! = null not an issue for this case, so i skip the assert >>>>>> in debug build >>>>>> >>>>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>>>> >>>>>> Please review. >>>>>> >>>>>> Best Regards, >>>>>> Jamsheed >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>> >> > From vladimir.kozlov at oracle.com Thu Oct 12 18:50:45 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 11:50:45 -0700 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> Message-ID: <8f40236b-c8ff-4ae5-2e9a-89de588f610e@oracle.com> +1 Thanks, Vladimir On 10/12/17 11:21 AM, dean.long at oracle.com wrote: > Looks good. > > dl > > > On 10/12/17 3:33 AM, jamsheed wrote: >> Dean, >> >> Thank you for the review, yes there is check for extended sp equality too. made the change >> >> http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >> >> Best regards, >> >> Jamsheed >> >> >> On Thursday 12 October 2017 03:28 AM, dean.long at oracle.com wrote: >>> For AARCH64 in templateTable_arm.cpp, how about using the same code as generate_deopt_entry_for? >>> >>> ? __ restore_sp_after_call(Rtemp);? // Restore SP to extended SP >>> ? __ restore_stack_top(); >>> >>> >>> dl >>> >>> On 10/11/17 5:48 AM, jamsheed wrote: >>>> Hi Vladimir, >>>> >>>> Thank you for pointing this. >>>> >>>> revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >>>> >>>> Best Regards, >>>> >>>> Jamsheed >>>> >>>> >>>> On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >>>>> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only 32-bit affected? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 9/13/17 11:54 PM, Dean Long wrote: >>>>>> It looks like you accidentally dropped hotspot-compiler-dev at openjdk.java.net when you added runtime. 
>>>>>> >>>>>> dl >>>>>> >>>>>> >>>>>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>>>>> (adding runtime list for inputs) >>>>>>> >>>>>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>>>>> brief desc: special handling of Object. in TemplateInterpreter::deopt_reexecute_entry >>>>>>>> >>>>>>>> required last_sp to be reset explicitly in normal return path >>>>>>>> >>>>>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, address bcp) { >>>>>>>> ? assert(method->contains(bcp), "just checkin'"); >>>>>>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>>>>>> ? if (code == Bytecodes::_return) { >>>>>>>> ??? // This is used for deopt during registration of finalizers >>>>>>>> ??? // during Object..? We simply need to resume execution at >>>>>>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>>>>>> ??? // reexecuting the real bytecode would cause double registration >>>>>>>> ??? // of the finalizable object. >>>>>>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>>>>> >>>>>>> last_sp ! = null not an issue for this case, so i skip the assert in debug build >>>>>>> >>>>>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>>>>> >>>>>>> Please review. >>>>>>> >>>>>>> Best Regards, >>>>>>> Jamsheed >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>> >>> >> > From vladimir.kozlov at oracle.com Thu Oct 12 19:22:08 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 12:22:08 -0700 Subject: RFR(XS): 8188223: IfNode::range_check_trap_proj() should handler dying subgraph with single if proj In-Reply-To: References: Message-ID: Yes, it is reasonable fix. We have other places where we check If's node outcnt(). May be move the check up to the method's beginning above Opcode() call which is virtual. Thanks, Vladimir On 10/2/17 4:46 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8188223/webrev.00/ > > I saw the following crash (that I cannot reproduce anymore having > deleted the replay file by mistake). > > With subgraph shape: > > UNC->Region->IfProj->RangeCheck > > The region has the IfProj as single input. The following code in > RegionNode::Ideal(): > > if (can_reshape && cnt == 1) { > // Is it dead loop? > // If it is LoopNopde it had 2 (+1 itself) inputs and > // one of them was cut. The loop is dead if it was EntryContol. > // Loop node may have only one input because entry path > // is removed in PhaseIdealLoop::Dominators(). > assert(!this->is_Loop() || cnt_orig <= 3, "Loop node should have 3 or less inputs"); > if ((this->is_Loop() && (del_it == LoopNode::EntryControl || > (del_it == 0 && is_unreachable_region(phase)))) || > (!this->is_Loop() && has_phis && is_unreachable_region(phase))) { > > finds that the subgraph is unreachable which causes the IfProj to be > removed. RangeCheckNode::Ideal() is later called on a dominated range > check which walks the graph, hit the RangeCheck that has a single > projection and causes a crash. > > I think it makes sense to make IfNode::range_check_trap_proj() handle > the case of a RangeCheckNode with a single input. > > Roland. 
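For context, the guard being discussed -- bail out when the If has already lost a projection, and do so before any other work -- can be modelled in a few self-contained lines. The types are illustrative stand-ins, not the HotSpot IfNode/ProjNode classes, and this is not Roland's webrev.

    #include <cstddef>

    struct ProjLike { bool leads_to_uncommon_trap; };

    struct IfLike {
      ProjLike* proj[2];   // a healthy If has a true and a false projection
      int       nproj;     // projections still attached
    };

    // If the node sits in a dying subgraph and one projection was already
    // removed, give up immediately instead of walking a half-deleted graph.
    ProjLike* range_check_trap_proj_like(IfLike* iff) {
      if (iff == NULL || iff->nproj < 2) {
        return NULL;                       // dying subgraph: nothing to analyze
      }
      for (int i = 0; i < 2; i++) {
        if (iff->proj[i] != NULL && iff->proj[i]->leads_to_uncommon_trap) {
          return iff->proj[i];             // projection that jumps to the trap
        }
      }
      return NULL;
    }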
> From dean.long at oracle.com Thu Oct 12 19:46:27 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 12 Oct 2017 12:46:27 -0700 Subject: RFR(XXS): 8189244: x86: eliminate frame::adjust_unextended_sp() overhead Message-ID: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8189244 http://cr.openjdk.java.net/~dlong/8189244/webrev The work that frame::adjust_unextended_sp() does is DEBUG_ONLY, but the compiler cannot completely eliminate the code because of the virtual call to is_compiled() that could have side-effects. We can fix the problem by wrapping the whole thing in #ifdef ASSERT. This change reduces the size of libjvm.so by almost 2K, and the size of frame::sender() by 8%. dl -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Thu Oct 12 21:13:15 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 14:13:15 -0700 Subject: RFR(XXS): 8189244: x86: eliminate frame::adjust_unextended_sp() overhead In-Reply-To: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> References: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> Message-ID: <35de2110-96f0-93f0-5444-94c695459e41@oracle.com> Nice find. Thanks, Vladimir On 10/12/17 12:46 PM, dean.long at oracle.com wrote: > https://bugs.openjdk.java.net/browse/JDK-8189244 > > http://cr.openjdk.java.net/~dlong/8189244/webrev > > The work that frame::adjust_unextended_sp() does is DEBUG_ONLY, but the compiler cannot completely eliminate the code > because of the virtual call to is_compiled() that could have side-effects. We can fix the problem by wrapping the whole > thing in #ifdef ASSERT. > > This change reduces the size of libjvm.so by almost 2K, and the size of frame::sender() by 8%. > > dl From dean.long at oracle.com Thu Oct 12 21:25:37 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 12 Oct 2017 14:25:37 -0700 Subject: RFR(XXS): 8189244: x86: eliminate frame::adjust_unextended_sp() overhead In-Reply-To: <35de2110-96f0-93f0-5444-94c695459e41@oracle.com> References: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> <35de2110-96f0-93f0-5444-94c695459e41@oracle.com> Message-ID: Thanks Vladimir. dl On 10/12/17 2:13 PM, Vladimir Kozlov wrote: > Nice find. > > Thanks, > Vladimir > > On 10/12/17 12:46 PM, dean.long at oracle.com wrote: >> https://bugs.openjdk.java.net/browse/JDK-8189244 >> >> http://cr.openjdk.java.net/~dlong/8189244/webrev >> >> The work that frame::adjust_unextended_sp() does is DEBUG_ONLY, but >> the compiler cannot completely eliminate the code because of the >> virtual call to is_compiled() that could have side-effects. We can >> fix the problem by wrapping the whole thing in #ifdef ASSERT. >> >> This change reduces the size of libjvm.so by almost 2K, and the size >> of frame::sender() by 8%. 
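The effect described in this thread is easy to reproduce outside HotSpot: work that only feeds verification code still costs a virtual call in release builds unless the whole block is compiled out. The sketch below uses stand-in types and the portable NDEBUG convention in place of HotSpot's ASSERT macro; it illustrates the pattern, not the actual frame::adjust_unextended_sp() code.

    #include <cassert>
    #include <cstdio>

    struct BlobLike {                       // stand-in for a code blob
      virtual ~BlobLike() { }
      virtual bool is_compiled() const { return true; }
    };

    // Before: the virtual call only feeds an assert, but because a virtual
    // call may have side effects the compiler generally cannot drop it,
    // even when NDEBUG is set.
    void verify_sender_before(const BlobLike* sender) {
      bool ok = (sender == NULL) || sender->is_compiled();
      assert(ok && "sender must be a compiled blob");
      (void)ok;
    }

    // After: the whole block disappears from release (NDEBUG) builds,
    // which is what wrapping the HotSpot code in #ifdef ASSERT achieves.
    void verify_sender_after(const BlobLike* sender) {
    #ifndef NDEBUG
      bool ok = (sender == NULL) || sender->is_compiled();
      assert(ok && "sender must be a compiled blob");
    #endif
      (void)sender;
    }

    int main() {
      BlobLike blob;
      verify_sender_before(&blob);
      verify_sender_after(&blob);
      std::puts("checks passed");
      return 0;
    }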
>> >> dl From jamsheed.c.m at oracle.com Fri Oct 13 05:38:46 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Fri, 13 Oct 2017 11:08:46 +0530 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <8f40236b-c8ff-4ae5-2e9a-89de588f610e@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> <8f40236b-c8ff-4ae5-2e9a-89de588f610e@oracle.com> Message-ID: <37326ba7-f520-ca96-bdc2-31c8fe99a52d@oracle.com> Thanks for the review, Dean, Vladimir Best regards, Jamsheed On Friday 13 October 2017 12:20 AM, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir > > On 10/12/17 11:21 AM, dean.long at oracle.com wrote: >> Looks good. >> >> dl >> >> >> On 10/12/17 3:33 AM, jamsheed wrote: >>> Dean, >>> >>> Thank you for the review, yes there is check for extended sp >>> equality too. made the change >>> >>> http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >>> >>> Best regards, >>> >>> Jamsheed >>> >>> >>> On Thursday 12 October 2017 03:28 AM, dean.long at oracle.com wrote: >>>> For AARCH64 in templateTable_arm.cpp, how about using the same code >>>> as generate_deopt_entry_for? >>>> >>>> ? __ restore_sp_after_call(Rtemp);? // Restore SP to extended SP >>>> ? __ restore_stack_top(); >>>> >>>> >>>> dl >>>> >>>> On 10/11/17 5:48 AM, jamsheed wrote: >>>>> Hi Vladimir, >>>>> >>>>> Thank you for pointing this. >>>>> >>>>> revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >>>>> >>>>> Best Regards, >>>>> >>>>> Jamsheed >>>>> >>>>> >>>>> On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >>>>>> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only >>>>>> 32-bit affected? >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 9/13/17 11:54 PM, Dean Long wrote: >>>>>>> It looks like you accidentally dropped >>>>>>> hotspot-compiler-dev at openjdk.java.net when you added runtime. >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> >>>>>>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>>>>>> (adding runtime list for inputs) >>>>>>>> >>>>>>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>>>>>> brief desc: special handling of Object. in >>>>>>>>> TemplateInterpreter::deopt_reexecute_entry >>>>>>>>> >>>>>>>>> required last_sp to be reset explicitly in normal return path >>>>>>>>> >>>>>>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* >>>>>>>>> method, address bcp) { >>>>>>>>> ? assert(method->contains(bcp), "just checkin'"); >>>>>>>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>>>>>>> ? if (code == Bytecodes::_return) { >>>>>>>>> ??? // This is used for deopt during registration of finalizers >>>>>>>>> ??? // during Object..? We simply need to resume >>>>>>>>> execution at >>>>>>>>> ??? // the standard return vtos bytecode to pop the frame >>>>>>>>> normally. >>>>>>>>> ??? // reexecuting the real bytecode would cause double >>>>>>>>> registration >>>>>>>>> ??? // of the finalizable object. >>>>>>>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>>>>>> >>>>>>>> last_sp ! = null not an issue for this case, so i skip the >>>>>>>> assert in debug build >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>>>>>> >>>>>>>> Please review. 
>>>>>>>> >>>>>>>> Best Regards, >>>>>>>> Jamsheed >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >>> >> From tobias.hartmann at oracle.com Fri Oct 13 06:08:56 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 13 Oct 2017 08:08:56 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: <837b748a-9bcc-00a4-9c42-660bfbf76902@oracle.com> References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> <837b748a-9bcc-00a4-9c42-660bfbf76902@oracle.com> Message-ID: <78d1643c-a89d-7453-01d4-1b3cbd33d5e4@oracle.com> Hi Vladimir, thanks for the review! On 12.10.2017 19:33, Vladimir Kozlov wrote: > Good. I think we should leave this conservative fix without optimizing it. We should not spend a lot of time optimizing > C2 now. Okay, that's fine with me. If anyone wants to optimize that later, he/she can file an RFE. Best regards, Tobias > On 10/12/17 3:04 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8189067 >> http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ >> >> The problem is in the C2 optimization that moves stores out of a loop [1]. >> >> We move stores out of a loop by creating clones at all loop exit paths that observe the stored value. When walking up >> the dominator chain from observers of a store and placing clones of the store at the top (right after the loop), we >> may end up placing multiple cloned stores at the same location. This confuses/crashes the SuperWord optimization and >> also affects performance of the generated code due to double execution of the same store (see details in the bug >> comments). >> >> My initial prototype fix (webrev.00) just checked if there already is a cloned store with the same control input and >> reused it instead of creating another one. I've discussed this with Roland in detail and we came to the conclusion >> that this is not sufficient. We may still end up with a broken memory graph: If one use of the store is a load, we >> create a clone of the store and connect it to the load but this store is not connected to the rest of the memory >> graph, i.e. the memory effect of the store is not propagated. Although this may not cause incorrect execution (at >> least we were not able to trigger that), it may cause problems if other optimizations kick in and in some cases we >> still end up with the same store being executed twice. >> >> We agreed to go with the safer version of computing the LCA and move the store there (only the initial store without >> creating clones) if it's outside of the loop. This is a bit too strict in cases where there's an uncommon trap in the >> loop (see 'TestMoveStoresOutOfLoops::test_after_6') but it works fine in the other cases exercised by the test. If >> people agree, I'll file a follow up RFE to improve the optimization but would like to go with the safe fix for now >> (this also affects JDK 9).
>> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8080289 From rwestrel at redhat.com Fri Oct 13 07:14:08 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 13 Oct 2017 09:14:08 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Message-ID: > http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ That looks good to me. Roland. From tobias.hartmann at oracle.com Fri Oct 13 07:15:06 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 13 Oct 2017 09:15:06 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Message-ID: <7dbb7e2e-6f1a-9607-5284-8deab79a647a@oracle.com> Thanks Roland! Best regards, Tobias On 13.10.2017 09:14, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ > > That looks good to me. > > Roland. > From kevin.walls at oracle.com Fri Oct 13 09:25:01 2017 From: kevin.walls at oracle.com (Kevin Walls) Date: Fri, 13 Oct 2017 10:25:01 +0100 Subject: [8u] RFF(S): 8164954: split_if creates empty phi and region nodes Message-ID: <6620f37e-221b-0533-00d8-3db4f8a63f09@oracle.com> Hi, I'd like to get a review of a backport to 8u. bug: https://bugs.openjdk.java.net/browse/JDK-8164954 9 changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/38f38c10a11d Review thread: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-March/025773.html It doesn't hg import cleanly as the surrounding code is a little different: this change adds a condition in split_if() which may make that method return earlier, but 8u does not have the block after the change, beginning "if (nb_predicate_proj > 1) {", that comes in with 8078426. The 8u change has been through jprt testing and also tested with the testsuite of a Java-based product which was seen hitting the same assert as in this bug.? hg diff of the proposed 8u change is below, I think that's enough but can offer a webrev if anybody needs one. Thanks! Kevin bash-4.2$ hg diff src/share/vm/opto/ifnode.cpp diff -r c89173159237 src/share/vm/opto/ifnode.cpp --- a/src/share/vm/opto/ifnode.cpp????? Thu Sep 07 10:15:21 2017 -0400 +++ b/src/share/vm/opto/ifnode.cpp????? Fri Oct 13 02:03:00 2017 -0700 @@ -234,6 +234,13 @@ ?????? predicate_proj = proj; ???? } ?? } + +? // If all the defs of the phi are the same constant, we already have the desired end state. +? // Skip the split that would create empty phi and region nodes. +? if((r->req() - req_c) == 1) { +??? return NULL; +? } + ?? Node* predicate_c = NULL; ?? Node* predicate_x = NULL; ?? bool counted_loop = r->is_CountedLoop(); -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From muthusamy.chinnathambi at oracle.com Fri Oct 13 10:53:58 2017 From: muthusamy.chinnathambi at oracle.com (Muthusamy Chinnathambi) Date: Fri, 13 Oct 2017 03:53:58 -0700 (PDT) Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> Message-ID: <91ca5a56-8d20-4714-8b09-c767574af4ae@default> Hi Vladimir, > Why do you need to add test explicitly to hotspot_compiler group? > It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be used in all other > testing. You are right, it should get picked implicitly as part of compact1_minimal group. > Did you check that the test is executed without you modifying TEST.groups? Now - yes. Without my TEST.groups modification the test gets executed. I will drop the change in TEST.groups file. Please note, this request is only for 8u. Regards, Muthusamy C -----Original Message----- From: Vladimir Kozlov Sent: Thursday, October 12, 2017 11:20 PM To: Muthusamy Chinnathambi ; hotspot compiler ; hotspot-gc-dev at openjdk.java.net Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev Why do you need to add test explicitly to hotspot_compiler group? It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be used in all other testing. Did you check that the test is executed without you modifying TEST.groups? Thanks, Vladimir K On 10/12/17 3:46 AM, Muthusamy Chinnathambi wrote: > May I please get a second review for the change. > > Regards, > Muthusamy C > > -----Original Message----- > From: Vladimir Ivanov > Sent: Wednesday, October 11, 2017 5:29 PM > To: Muthusamy Chinnathambi > Cc: hotspot-gc-dev at openjdk.java.net; hotspot compiler > Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev > > Looks good. > > Best regards, > Vladimir Ivanov > > On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: >> Hi, >> >> Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev >> >> Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes. >> >> Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ >> jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 >> Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 >> >> Test: Run jtreg and jprt hotspot testsets. >> >> Regards, >> Muthusamy C >> From claes.redestad at oracle.com Fri Oct 13 12:08:19 2017 From: claes.redestad at oracle.com (Claes Redestad) Date: Fri, 13 Oct 2017 14:08:19 +0200 Subject: RFR(XXS): 8189244: x86: eliminate frame::adjust_unextended_sp() overhead In-Reply-To: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> References: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> Message-ID: <3b0da201-1c3a-fed1-1a10-c59af9476c4e@oracle.com> Hi Dean, you asked me to do a quick check if this helps Exception/stack walk performance:

Benchmark                  Mode  Cnt  Score   Error  Units
Throw.throwSyncException  thrpt  100  0.803 ± 0.029  ops/us
Throw.throwSyncException  thrpt  100  0.867 ± 0.028  ops/us   # 8%

... thus a significant improvement!
Startup is improved on some measures (total #instructions down 500k, significant) but not enough to be statistically significant on wall clock measures. /Claes On 2017-10-12 21:46, dean.long at oracle.com wrote: > > https://bugs.openjdk.java.net/browse/JDK-8189244 > > http://cr.openjdk.java.net/~dlong/8189244/webrev > > The work that frame::adjust_unextended_sp() does is DEBUG_ONLY, but > the compiler cannot completely eliminate the code because of the > virtual call to is_compiled() that could have side-effects. We can fix > the problem by wrapping the whole thing in #ifdef ASSERT. > > This change reduces the size of libjvm.so by almost 2K, and the size > of frame::sender() by 8%. > > dl -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Oct 13 17:53:11 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 13 Oct 2017 10:53:11 -0700 Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: <91ca5a56-8d20-4714-8b09-c767574af4ae@default> References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> <91ca5a56-8d20-4714-8b09-c767574af4ae@default> Message-ID: Good. Thanks, Vladimir On 10/13/17 3:53 AM, Muthusamy Chinnathambi wrote: > Hi Vladimir, > >> Why do you need to add test explicitly to hotspot_compiler group? >> It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be used in all other >> testing. > You are right, it should get picked implicitly as part of compact1_minimal group. > >> Did you check that the test is executed without you modifying TEST.groups? > Now - yes. Without my TEST.groups modification the test gets executed. > > I will drop the change in TEST.groups file. > Please note, this request is only for 8u. > > Regards, > Muthusamy C > > -----Original Message----- > From: Vladimir Kozlov > Sent: Thursday, October 12, 2017 11:20 PM > To: Muthusamy Chinnathambi ; hotspot compiler ; hotspot-gc-dev at openjdk.java.net > Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev > > Why do you need to add test explicitly to hotspot_compiler group? > It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be > used in all other testing. Did you check that the test is executed without you modifying TEST.groups? > > Thanks, > Vladimir K > > On 10/12/17 3:46 AM, Muthusamy Chinnathambi wrote: >> May I please get a second review for the change. >> >> Regards, >> Muthusamy C >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Wednesday, October 11, 2017 5:29 PM >> To: Muthusamy Chinnathambi >> Cc: hotspot-gc-dev at openjdk.java.net; hotspot compiler >> Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev >> >> Looks good. >> >> Best regards, >> Vladimir Ivanov >> >> On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: >>> Hi, >>> >>> Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev >>> >>> Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes. 
>>> >>> Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ >>> jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 >>> Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 >>> >>> Test: Run jtreg and jprt hotspot testsets. >>> >>> Regards, >>> Muthusamy C >>> From vladimir.kozlov at oracle.com Fri Oct 13 17:57:46 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 13 Oct 2017 10:57:46 -0700 Subject: [8u] RFF(S): 8164954: split_if creates empty phi and region nodes In-Reply-To: <6620f37e-221b-0533-00d8-3db4f8a63f09@oracle.com> References: <6620f37e-221b-0533-00d8-3db4f8a63f09@oracle.com> Message-ID: 8u change looks good. Thanks, Vladimir On 10/13/17 2:25 AM, Kevin Walls wrote: > Hi, > > I'd like to get a review of a backport to 8u. > > bug: https://bugs.openjdk.java.net/browse/JDK-8164954 > > 9 changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/38f38c10a11d > > Review thread: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-March/025773.html > > > It doesn't hg import cleanly as the surrounding code is a little different: this change adds a condition in split_if() > which may make that method return earlier, but 8u does not have the block after the change, beginning "if > (nb_predicate_proj > 1) {", that comes in with 8078426. > > The 8u change has been through jprt testing and also tested with the testsuite of a Java-based product which was seen > hitting the same assert as in this bug.? hg diff of the proposed 8u change is below, I think that's enough but can offer > a webrev if anybody needs one. > > Thanks! > Kevin > > > bash-4.2$ hg diff src/share/vm/opto/ifnode.cpp > diff -r c89173159237 src/share/vm/opto/ifnode.cpp > --- a/src/share/vm/opto/ifnode.cpp????? Thu Sep 07 10:15:21 2017 -0400 > +++ b/src/share/vm/opto/ifnode.cpp????? Fri Oct 13 02:03:00 2017 -0700 > @@ -234,6 +234,13 @@ > ?????? predicate_proj = proj; > ???? } > ?? } > + > +? // If all the defs of the phi are the same constant, we already have the desired end state. > +? // Skip the split that would create empty phi and region nodes. > +? if((r->req() - req_c) == 1) { > +??? return NULL; > +? } > + > ?? Node* predicate_c = NULL; > ?? Node* predicate_x = NULL; > ?? bool counted_loop = r->is_CountedLoop(); > > From kevin.walls at oracle.com Fri Oct 13 22:04:16 2017 From: kevin.walls at oracle.com (Kevin Walls) Date: Fri, 13 Oct 2017 23:04:16 +0100 Subject: [8u] RFF(S): 8164954: split_if creates empty phi and region nodes In-Reply-To: References: <6620f37e-221b-0533-00d8-3db4f8a63f09@oracle.com> Message-ID: Thanks Vladimir! On 13/10/2017 18:57, Vladimir Kozlov wrote: > 8u change looks good. > > Thanks, > Vladimir > > On 10/13/17 2:25 AM, Kevin Walls wrote: >> Hi, >> >> I'd like to get a review of a backport to 8u. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8164954 >> >> 9 changeset: >> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/38f38c10a11d >> >> Review thread: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-March/025773.html >> >> >> It doesn't hg import cleanly as the surrounding code is a little >> different: this change adds a condition in split_if() which may make >> that method return earlier, but 8u does not have the block after the >> change, beginning "if (nb_predicate_proj > 1) {", that comes in with >> 8078426. 
>> >> The 8u change has been through jprt testing and also tested with the >> testsuite of a Java-based product which was seen hitting the same >> assert as in this bug.? hg diff of the proposed 8u change is below, I >> think that's enough but can offer a webrev if anybody needs one. >> >> Thanks! >> Kevin >> >> >> bash-4.2$ hg diff src/share/vm/opto/ifnode.cpp >> diff -r c89173159237 src/share/vm/opto/ifnode.cpp >> --- a/src/share/vm/opto/ifnode.cpp????? Thu Sep 07 10:15:21 2017 -0400 >> +++ b/src/share/vm/opto/ifnode.cpp????? Fri Oct 13 02:03:00 2017 -0700 >> @@ -234,6 +234,13 @@ >> ??????? predicate_proj = proj; >> ????? } >> ??? } >> + >> +? // If all the defs of the phi are the same constant, we already >> have the desired end state. >> +? // Skip the split that would create empty phi and region nodes. >> +? if((r->req() - req_c) == 1) { >> +??? return NULL; >> +? } >> + >> ??? Node* predicate_c = NULL; >> ??? Node* predicate_x = NULL; >> ??? bool counted_loop = r->is_CountedLoop(); >> >> From rkennke at redhat.com Sat Oct 14 22:41:05 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 00:41:05 +0200 Subject: RFR: 8171853: Remove Shark compiler Message-ID: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So here comes the big patch to remove it. What I have done: grep -i -R shark src grep -i -R shark make grep -i -R shark doc grep -i -R shark doc and purged any reference to shark. Almost everything was straightforward. The only things I wasn't really sure of: - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope that's good? - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing to do. If not, what *would* be the right thing? Then of course I did: rm -rf src/hotspot/share/shark I also went through the build machinery and removed stuff related to Shark and LLVM libs. Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) I tested by building a regular x86 JVM and running JTREG tests. All looks fine. - I could not build zero because it seems broken because of the recent Atomic::* changes - I could not test any of the other arches that seemed to reference Shark (arm and sparc) Here's the full webrev: http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ Can I get a review on this? Thanks, Roman -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkennke at redhat.com Sun Oct 15 20:20:17 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 22:20:17 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> Message-ID: Hi Adrian, > Please let me look at SPARC next week first before merging this. Thanks! Will wait for your feedback! > And thanks for notifying me that Zero is broken again *sigh*. It seems to be only a little thing. I have a fix that I'm currently testing. Will file another bug and an RFR soon. 
Thanks, Roman From david.holmes at oracle.com Sun Oct 15 20:48:23 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 06:48:23 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> Hi Roman, The build changes must be reviewed on build-dev - now cc'd. Thanks, David On 15/10/2017 8:41 AM, Roman Kennke wrote: > The JEP to remove the Shark compiler has received exclusively positive > feedback (JDK-8189173) on zero-dev. So here comes the big patch to > remove it. > > What I have done: > > grep -i -R shark src > grep -i -R shark make > grep -i -R shark doc > grep -i -R shark doc > > and purged any reference to shark. Almost everything was straightforward. > > The only things I wasn't really sure of: > > - in globals.hpp, I re-arranged the KIND_* bits to account for the gap > that removing KIND_SHARK left. I hope that's good? > - in relocInfo_zero.hpp I put a ShouldNotCallThis() in > pd_address_in_code(), I am not sure it is the right thing to do. If not, > what *would* be the right thing? > > Then of course I did: > > rm -rf src/hotspot/share/shark > > I also went through the build machinery and removed stuff related to > Shark and LLVM libs. > > Now the only references in the whole JDK tree to shark is a 'Shark Bay' > in a timezone file, and 'Wireshark' in some tests ;-) > > I tested by building a regular x86 JVM and running JTREG tests. All > looks fine. > > - I could not build zero because it seems broken because of the recent > Atomic::* changes > - I could not test any of the other arches that seemed to reference > Shark (arm and sparc) > > Here's the full webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ > > > Can I get a review on this? > > Thanks, Roman > From rkennke at redhat.com Sun Oct 15 21:01:42 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:01:42 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> Message-ID: <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Hi David, thanks! I'm uploading a 2nd revision of the patch that excludes the generated-configure.sh part, and adds a smallish Zero-related fix. http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ Thanks, Roman > Hi Roman, > > The build changes must be reviewed on build-dev - now cc'd. > > Thanks, > David > > On 15/10/2017 8:41 AM, Roman Kennke wrote: >> The JEP to remove the Shark compiler has received exclusively >> positive feedback (JDK-8189173) on zero-dev. So here comes the big >> patch to remove it. >> >> What I have done: >> >> grep -i -R shark src >> grep -i -R shark make >> grep -i -R shark doc >> grep -i -R shark doc >> >> and purged any reference to shark. Almost everything was >> straightforward. >> >> The only things I wasn't really sure of: >> >> - in globals.hpp, I re-arranged the KIND_* bits to account for the >> gap that removing KIND_SHARK left. I hope that's good? >> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >> pd_address_in_code(), I am not sure it is the right thing to do. If >> not, what *would* be the right thing? 
>> >> Then of course I did: >> >> rm -rf src/hotspot/share/shark >> >> I also went through the build machinery and removed stuff related to >> Shark and LLVM libs. >> >> Now the only references in the whole JDK tree to shark is a 'Shark >> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >> >> I tested by building a regular x86 JVM and running JTREG tests. All >> looks fine. >> >> - I could not build zero because it seems broken because of the >> recent Atomic::* changes >> - I could not test any of the other arches that seemed to reference >> Shark (arm and sparc) >> >> Here's the full webrev: >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >> >> >> Can I get a review on this? >> >> Thanks, Roman >> From david.holmes at oracle.com Sun Oct 15 21:23:52 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:23:52 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> Message-ID: <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> Hi Roman, I've looked at all the changes for the build and hotspot and everything appears okay to me. Still need someone from compiler team and build team to sign off on this though. One observation in src/hotspot/cpu/zero/sharedRuntime_zero.cpp, these includes would seem to be impossible: 38 #ifdef COMPILER1 39 #include "c1/c1_Runtime1.hpp" 40 #endif 41 #ifdef COMPILER2 42 #include "opto/runtime.hpp" 43 #endif no? In src/hotspot/share/ci/ciEnv.cpp you can just delete the comment entirely as it's obviously C2: if (is_c2_compile(comp_level)) { // C2 Ditto in src/hotspot/share/compiler/compileBroker.cpp ! // C2 make_thread(name_buffer, _c2_compile_queue, counters, _compilers[1], compiler_thread, CHECK); Thanks, David ----- On 16/10/2017 6:48 AM, David Holmes wrote: > Hi Roman, > > The build changes must be reviewed on build-dev - now cc'd. > > Thanks, > David > > On 15/10/2017 8:41 AM, Roman Kennke wrote: >> The JEP to remove the Shark compiler has received exclusively positive >> feedback (JDK-8189173) on zero-dev. So here comes the big patch to >> remove it. >> >> What I have done: >> >> grep -i -R shark src >> grep -i -R shark make >> grep -i -R shark doc >> grep -i -R shark doc >> >> and purged any reference to shark. Almost everything was straightforward. >> >> The only things I wasn't really sure of: >> >> - in globals.hpp, I re-arranged the KIND_* bits to account for the gap >> that removing KIND_SHARK left. I hope that's good? >> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >> pd_address_in_code(), I am not sure it is the right thing to do. If >> not, what *would* be the right thing? >> >> Then of course I did: >> >> rm -rf src/hotspot/share/shark >> >> I also went through the build machinery and removed stuff related to >> Shark and LLVM libs. >> >> Now the only references in the whole JDK tree to shark is a 'Shark >> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >> >> I tested by building a regular x86 JVM and running JTREG tests. All >> looks fine. >> >> - I could not build zero because it seems broken because of the recent >> Atomic::* changes >> - I could not test any of the other arches that seemed to reference >> Shark (arm and sparc) >> >> Here's the full webrev: >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >> >> >> Can I get a review on this? 
>> >> Thanks, Roman >> From david.holmes at oracle.com Sun Oct 15 21:25:04 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:25:04 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: On 16/10/2017 7:01 AM, Roman Kennke wrote: > Hi David, > > thanks! > > I'm uploading a 2nd revision of the patch that excludes the > generated-configure.sh part, and adds a smallish Zero-related fix. > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ > Can you point me to the exact change please as I don't want to re-examine it all. :) I'll pull this in and do a test build run internally. Thanks, David > Thanks, Roman > > >> Hi Roman, >> >> The build changes must be reviewed on build-dev - now cc'd. >> >> Thanks, >> David >> >> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>> The JEP to remove the Shark compiler has received exclusively >>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>> patch to remove it. >>> >>> What I have done: >>> >>> grep -i -R shark src >>> grep -i -R shark make >>> grep -i -R shark doc >>> grep -i -R shark doc >>> >>> and purged any reference to shark. Almost everything was >>> straightforward. >>> >>> The only things I wasn't really sure of: >>> >>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>> gap that removing KIND_SHARK left. I hope that's good? >>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>> pd_address_in_code(), I am not sure it is the right thing to do. If >>> not, what *would* be the right thing? >>> >>> Then of course I did: >>> >>> rm -rf src/hotspot/share/shark >>> >>> I also went through the build machinery and removed stuff related to >>> Shark and LLVM libs. >>> >>> Now the only references in the whole JDK tree to shark is a 'Shark >>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>> >>> I tested by building a regular x86 JVM and running JTREG tests. All >>> looks fine. >>> >>> - I could not build zero because it seems broken because of the >>> recent Atomic::* changes >>> - I could not test any of the other arches that seemed to reference >>> Shark (arm and sparc) >>> >>> Here's the full webrev: >>> >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>> >>> >>> Can I get a review on this? >>> >>> Thanks, Roman >>> > From david.holmes at oracle.com Sun Oct 15 21:29:33 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:29:33 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: Just spotted this: ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ David On 16/10/2017 7:25 AM, David Holmes wrote: > On 16/10/2017 7:01 AM, Roman Kennke wrote: >> Hi David, >> >> thanks! >> >> I'm uploading a 2nd revision of the patch that excludes the >> generated-configure.sh part, and adds a smallish Zero-related fix. >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >> > > Can you point me to the exact change please as I don't want to > re-examine it all. :) > > I'll pull this in and do a test build run internally. 
> > Thanks, > David > >> Thanks, Roman >> >> >>> Hi Roman, >>> >>> The build changes must be reviewed on build-dev - now cc'd. >>> >>> Thanks, >>> David >>> >>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>> The JEP to remove the Shark compiler has received exclusively >>>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>>> patch to remove it. >>>> >>>> What I have done: >>>> >>>> grep -i -R shark src >>>> grep -i -R shark make >>>> grep -i -R shark doc >>>> grep -i -R shark doc >>>> >>>> and purged any reference to shark. Almost everything was >>>> straightforward. >>>> >>>> The only things I wasn't really sure of: >>>> >>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>>> gap that removing KIND_SHARK left. I hope that's good? >>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>> pd_address_in_code(), I am not sure it is the right thing to do. If >>>> not, what *would* be the right thing? >>>> >>>> Then of course I did: >>>> >>>> rm -rf src/hotspot/share/shark >>>> >>>> I also went through the build machinery and removed stuff related to >>>> Shark and LLVM libs. >>>> >>>> Now the only references in the whole JDK tree to shark is a 'Shark >>>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>> >>>> I tested by building a regular x86 JVM and running JTREG tests. All >>>> looks fine. >>>> >>>> - I could not build zero because it seems broken because of the >>>> recent Atomic::* changes >>>> - I could not test any of the other arches that seemed to reference >>>> Shark (arm and sparc) >>>> >>>> Here's the full webrev: >>>> >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>> >>>> >>>> Can I get a review on this? >>>> >>>> Thanks, Roman >>>> >> From rkennke at redhat.com Sun Oct 15 21:31:51 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:31:51 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> Message-ID: <0c0f0e20-86d5-bd3e-06ec-5b4c103eb3e7@redhat.com> Hi David, thanks for reviewing! > > One observation in src/hotspot/cpu/zero/sharedRuntime_zero.cpp, these > includes would seem to be impossible: > > ? 38 #ifdef COMPILER1 > ? 39 #include "c1/c1_Runtime1.hpp" > ? 40 #endif > ? 41 #ifdef COMPILER2 > ? 42 #include "opto/runtime.hpp" > ? 43 #endif > > no? I have no idea. It is at least theoretically possible to have a platform with C1 and/or C2 support based on the Zero interpreter? I'm leaving that in for now as it was pre-existing and not related to Shark removal, ok? > > In src/hotspot/share/ci/ciEnv.cpp you can just delete the comment > entirely as it's obviously C2: > > if (is_c2_compile(comp_level)) { // C2 > > Ditto in src/hotspot/share/compiler/compileBroker.cpp > > !???? // C2 > ????? make_thread(name_buffer, _c2_compile_queue, counters, > _compilers[1], compiler_thread, CHECK); Ok, right. 
For consistency, I also removed the // C1 comment in ciEnv.cpp at the similarly obvious is_c1_compile() call :-) New webrev: http://cr.openjdk.java.net/~rkennke/8171853/webrev.02/ Roman From david.holmes at oracle.com Sun Oct 15 21:33:44 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:33:44 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <0c0f0e20-86d5-bd3e-06ec-5b4c103eb3e7@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> <0c0f0e20-86d5-bd3e-06ec-5b4c103eb3e7@redhat.com> Message-ID: <86c02492-ecf5-197b-7ca1-a411f68000c5@oracle.com> On 16/10/2017 7:31 AM, Roman Kennke wrote: > Hi David, > > thanks for reviewing! > >> >> One observation in src/hotspot/cpu/zero/sharedRuntime_zero.cpp, these >> includes would seem to be impossible: >> >>   38 #ifdef COMPILER1 >>   39 #include "c1/c1_Runtime1.hpp" >>   40 #endif >>   41 #ifdef COMPILER2 >>   42 #include "opto/runtime.hpp" >>   43 #endif >> >> no? > > I have no idea. It is at least theoretically possible to have a platform > with C1 and/or C2 support based on the Zero interpreter? I'm leaving > that in for now as it was pre-existing and not related to Shark removal, > ok? Yep that's fine. Thanks. David >> >> In src/hotspot/share/ci/ciEnv.cpp you can just delete the comment >> entirely as it's obviously C2: >> >> if (is_c2_compile(comp_level)) { // C2 >> >> Ditto in src/hotspot/share/compiler/compileBroker.cpp >> >> !     // C2 >>       make_thread(name_buffer, _c2_compile_queue, counters, >> _compilers[1], compiler_thread, CHECK); > > Ok, right. For consistency, I also removed the // C1 comment in ciEnv.cpp at the similarly > obvious is_c1_compile() call :-) > > New webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.02/ > > > Roman From rkennke at redhat.com Sun Oct 15 21:39:54 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:39:54 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: <7deb690b-8b74-1ca2-948e-d76d0d133814@redhat.com> On 15.10.2017 at 23:25, David Holmes wrote: > On 16/10/2017 7:01 AM, Roman Kennke wrote: >> Hi David, >> >> thanks! >> >> I'm uploading a 2nd revision of the patch that excludes the >> generated-configure.sh part, and adds a smallish Zero-related fix. >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >> > > Can you point me to the exact change please as I don't want to > re-examine it all. :) Oops, sorry. The diff between 00 and 01 is this (apart from generated-configure.sh): diff --git a/src/hotspot/share/utilities/vmError.cpp b/src/hotspot/share/utilities/vmError.cpp --- a/src/hotspot/share/utilities/vmError.cpp +++ b/src/hotspot/share/utilities/vmError.cpp @@ -192,6 +192,7 @@      st->cr();      // Print the frames +    StackFrameStream sfs(jt);      for(int i = 0; !sfs.is_done(); sfs.next(), i++) {        sfs.current()->zero_print_on_error(i, st, buf, buflen);        st->cr(); I.e. I added back the sfs variable that I accidentally removed in webrev.00.
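
For readability, the patched block in the Zero-related frame dump reads roughly as follows once the declaration is back. This is a sketch reconstructed only from the hunk quoted above, not copied from the repository: jt, st, buf and buflen are assumed to come from the enclosing error-reporting code (the faulting JavaThread, the report output stream and a scratch buffer), and StackFrameStream and frame::zero_print_on_error are HotSpot-internal types, so this is a fragment of vmError.cpp rather than a standalone program.

    // Fragment of the Zero frame dump in src/hotspot/share/utilities/vmError.cpp,
    // as implied by the webrev.00 -> webrev.01 hunk quoted above; jt, st, buf and
    // buflen come from the enclosing error-reporting code.
    st->cr();

    // Print the frames
    StackFrameStream sfs(jt);   // the declaration that webrev.01 restores
    for(int i = 0; !sfs.is_done(); sfs.next(), i++) {
      // Each frame prints itself to the error stream, using buf as scratch space.
      sfs.current()->zero_print_on_error(i, st, buf, buflen);
      st->cr();
    }                           // closing brace implied by the surrounding context

Nothing else in the loop changes: the body already used sfs, only its declaration had gone missing in webrev.00, so without this one line the uses of sfs in the loop would not compile.
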
From david.holmes at oracle.com Sun Oct 15 21:44:04 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:44:04 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <7deb690b-8b74-1ca2-948e-d76d0d133814@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <7deb690b-8b74-1ca2-948e-d76d0d133814@redhat.com> Message-ID: On 16/10/2017 7:39 AM, Roman Kennke wrote: > Am 15.10.2017 um 23:25 schrieb David Holmes: >> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>> Hi David, >>> >>> thanks! >>> >>> I'm uploading a 2nd revision of the patch that excludes the >>> generated-configure.sh part, and adds a smallish Zero-related fix. >>> >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>> >> >> Can you point me to the exact change please as I don't want to >> re-examine it all. :) > Oops, sorry. The diff between 00 and 01 is this (apart from > generated-configure.sh): > > diff --git a/src/hotspot/share/utilities/vmError.cpp > b/src/hotspot/share/utilities/vmError.cpp > --- a/src/hotspot/share/utilities/vmError.cpp > +++ b/src/hotspot/share/utilities/vmError.cpp > @@ -192,6 +192,7 @@ > ???? st->cr(); > > ???? // Print the frames > +??? StackFrameStream sfs(jt); > ???? for(int i = 0; !sfs.is_done(); sfs.next(), i++) { > ?????? sfs.current()->zero_print_on_error(i, st, buf, buflen); > ?????? st->cr(); > > I.e. I added back the sfs variable that I accidentally removed in > webrev.00. Looks good! David From rkennke at redhat.com Sun Oct 15 22:00:15 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 00:00:15 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: Ok, I fixed all the comments you mentioned. Differential (against webrev.01): http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ Full webrev: http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ Roman > Just spotted this: > > ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** > {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ > > David > > On 16/10/2017 7:25 AM, David Holmes wrote: >> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>> Hi David, >>> >>> thanks! >>> >>> I'm uploading a 2nd revision of the patch that excludes the >>> generated-configure.sh part, and adds a smallish Zero-related fix. >>> >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>> >> >> Can you point me to the exact change please as I don't want to >> re-examine it all. :) >> >> I'll pull this in and do a test build run internally. >> >> Thanks, >> David >> >>> Thanks, Roman >>> >>> >>>> Hi Roman, >>>> >>>> The build changes must be reviewed on build-dev - now cc'd. >>>> >>>> Thanks, >>>> David >>>> >>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>> The JEP to remove the Shark compiler has received exclusively >>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>>>> patch to remove it. >>>>> >>>>> What I have done: >>>>> >>>>> grep -i -R shark src >>>>> grep -i -R shark make >>>>> grep -i -R shark doc >>>>> grep -i -R shark doc >>>>> >>>>> and purged any reference to shark. Almost everything was >>>>> straightforward. 
>>>>> >>>>> The only things I wasn't really sure of: >>>>> >>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>>>> gap that removing KIND_SHARK left. I hope that's good? >>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>> If not, what *would* be the right thing? >>>>> >>>>> Then of course I did: >>>>> >>>>> rm -rf src/hotspot/share/shark >>>>> >>>>> I also went through the build machinery and removed stuff related >>>>> to Shark and LLVM libs. >>>>> >>>>> Now the only references in the whole JDK tree to shark is a 'Shark >>>>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>> >>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>> All looks fine. >>>>> >>>>> - I could not build zero because it seems broken because of the >>>>> recent Atomic::* changes >>>>> - I could not test any of the other arches that seemed to >>>>> reference Shark (arm and sparc) >>>>> >>>>> Here's the full webrev: >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>> >>>>> >>>>> Can I get a review on this? >>>>> >>>>> Thanks, Roman >>>>> >>> From david.holmes at oracle.com Sun Oct 15 22:08:52 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 08:08:52 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> Looks good. Thanks, David On 16/10/2017 8:00 AM, Roman Kennke wrote: > > Ok, I fixed all the comments you mentioned. > > Differential (against webrev.01): > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ > > Full webrev: > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ > > > Roman > >> Just spotted this: >> >> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >> >> David >> >> On 16/10/2017 7:25 AM, David Holmes wrote: >>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>> Hi David, >>>> >>>> thanks! >>>> >>>> I'm uploading a 2nd revision of the patch that excludes the >>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>> >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>> >>> >>> Can you point me to the exact change please as I don't want to >>> re-examine it all. :) >>> >>> I'll pull this in and do a test build run internally. >>> >>> Thanks, >>> David >>> >>>> Thanks, Roman >>>> >>>> >>>>> Hi Roman, >>>>> >>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>>>>> patch to remove it. >>>>>> >>>>>> What I have done: >>>>>> >>>>>> grep -i -R shark src >>>>>> grep -i -R shark make >>>>>> grep -i -R shark doc >>>>>> grep -i -R shark doc >>>>>> >>>>>> and purged any reference to shark. Almost everything was >>>>>> straightforward. >>>>>> >>>>>> The only things I wasn't really sure of: >>>>>> >>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>>>>> gap that removing KIND_SHARK left. I hope that's good? 
>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>> If not, what *would* be the right thing? >>>>>> >>>>>> Then of course I did: >>>>>> >>>>>> rm -rf src/hotspot/share/shark >>>>>> >>>>>> I also went through the build machinery and removed stuff related >>>>>> to Shark and LLVM libs. >>>>>> >>>>>> Now the only references in the whole JDK tree to shark is a 'Shark >>>>>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>> >>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>> All looks fine. >>>>>> >>>>>> - I could not build zero because it seems broken because of the >>>>>> recent Atomic::* changes >>>>>> - I could not test any of the other arches that seemed to >>>>>> reference Shark (arm and sparc) >>>>>> >>>>>> Here's the full webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>> >>>>>> >>>>>> Can I get a review on this? >>>>>> >>>>>> Thanks, Roman >>>>>> >>>> > From vladimir.kozlov at oracle.com Sun Oct 15 22:14:53 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 15 Oct 2017 15:14:53 -0700 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> Message-ID: <85b68a77-f418-c619-0a51-c7389d7c5a86@oracle.com> +1 Thanks, Vladimir On 10/15/17 3:08 PM, David Holmes wrote: > Looks good. > > Thanks, > David > > On 16/10/2017 8:00 AM, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** {@code CompLevel::CompLevel_full_optimization} -- C2 >>> or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the generated-configure.sh part, and adds a smallish >>>>> Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>> >>>> Can you point me to the exact change please as I don't want to re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So >>>>>>> here comes the big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was straightforward. >>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope >>>>>>> that's good? 
>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing >>>>>>> to do. If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff related to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in >>>>>>> some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> From david.holmes at oracle.com Mon Oct 16 00:31:55 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 10:31:55 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> Message-ID: <331579a0-29de-f152-2dd4-66987896c463@oracle.com> My internal JPRT run went fine. So this just needs a build team signoff from the perspective of the patch. However, as this has had a JEP submitted for it, the code changes can not be pushed until the JEP has been targeted. Thanks, David On 16/10/2017 8:08 AM, David Holmes wrote: > Looks good. > > Thanks, > David > > On 16/10/2017 8:00 AM, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>> >>>> >>>> Can you point me to the exact change please as I don't want to >>>> re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>> big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was >>>>>>> straightforward. 
>>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>>> If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff related >>>>>>> to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>> All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the >>>>>>> recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to >>>>>>> reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> From rkennke at redhat.com Mon Oct 16 05:49:26 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 07:49:26 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <331579a0-29de-f152-2dd4-66987896c463@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> <331579a0-29de-f152-2dd4-66987896c463@oracle.com> Message-ID: Hi David, thanks for reviewing and testing! The interaction between JEPs and patches going in is not really clear to me, nor is it well documented. For example, we're already pushing patches for JEP 304: Garbage Collection Interface, even though it's only in 'candidate' state... In any case, I'll ping Mark Reinhold about moving the Shark JEP forward. Thanks again, Roman > My internal JPRT run went fine. So this just needs a build team > signoff from the perspective of the patch. > > However, as this has had a JEP submitted for it, the code changes can > not be pushed until the JEP has been targeted. > > Thanks, > David > > On 16/10/2017 8:08 AM, David Holmes wrote: >> Looks good. >> >> Thanks, >> David >> >> On 16/10/2017 8:00 AM, Roman Kennke wrote: >>> >>> Ok, I fixed all the comments you mentioned. >>> >>> Differential (against webrev.01): >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >>> >>> Full webrev: >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >>> >>> >>> Roman >>> >>>> Just spotted this: >>>> >>>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>>> >>>> David >>>> >>>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>>> Hi David, >>>>>> >>>>>> thanks! >>>>>> >>>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>>> >>>>> >>>>> Can you point me to the exact change please as I don't want to >>>>> re-examine it all. :) >>>>> >>>>> I'll pull this in and do a test build run internally. 
>>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, Roman >>>>>> >>>>>> >>>>>>> Hi Roman, >>>>>>> >>>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>>> big patch to remove it. >>>>>>>> >>>>>>>> What I have done: >>>>>>>> >>>>>>>> grep -i -R shark src >>>>>>>> grep -i -R shark make >>>>>>>> grep -i -R shark doc >>>>>>>> grep -i -R shark doc >>>>>>>> >>>>>>>> and purged any reference to shark. Almost everything was >>>>>>>> straightforward. >>>>>>>> >>>>>>>> The only things I wasn't really sure of: >>>>>>>> >>>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>>> pd_address_in_code(), I am not sure it is the right thing to >>>>>>>> do. If not, what *would* be the right thing? >>>>>>>> >>>>>>>> Then of course I did: >>>>>>>> >>>>>>>> rm -rf src/hotspot/share/shark >>>>>>>> >>>>>>>> I also went through the build machinery and removed stuff >>>>>>>> related to Shark and LLVM libs. >>>>>>>> >>>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>>> >>>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>>> All looks fine. >>>>>>>> >>>>>>>> - I could not build zero because it seems broken because of the >>>>>>>> recent Atomic::* changes >>>>>>>> - I could not test any of the other arches that seemed to >>>>>>>> reference Shark (arm and sparc) >>>>>>>> >>>>>>>> Here's the full webrev: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>>> >>>>>>>> >>>>>>>> Can I get a review on this? >>>>>>>> >>>>>>>> Thanks, Roman >>>>>>>> >>>>>> >>> From david.holmes at oracle.com Mon Oct 16 06:10:19 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 16:10:19 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> <331579a0-29de-f152-2dd4-66987896c463@oracle.com> Message-ID: <456436e4-955c-75f5-ac92-e2fd4a2fb280@oracle.com> On 16/10/2017 3:49 PM, Roman Kennke wrote: > > Hi David, > > thanks for reviewing and testing! > > The interaction between JEPs and patches going in is not really clear to > me, nor is it well documented. For example, we're already pushing > patches for JEP 304: Garbage Collection Interface, even though it's only > in 'candidate' state... If patches can be separated out into generally useful cleanup or enabling changes then it can be okay to push them independently of the JEP AFAIK. That's obviously a little subjective. In this case though we're talking about the whole thing at once, so AFAIK the JEP has to be targeted before the changes can be pushed. > In any case, I'll ping Mark Reinhold about moving the Shark JEP forward. Thanks. Should be simple enough, I hope. :) Cheers, David > Thanks again, > Roman > >> My internal JPRT run went fine. So this just needs a build team >> signoff from the perspective of the patch. 
>> >> However, as this has had a JEP submitted for it, the code changes can >> not be pushed until the JEP has been targeted. >> >> Thanks, >> David >> >> On 16/10/2017 8:08 AM, David Holmes wrote: >>> Looks good. >>> >>> Thanks, >>> David >>> >>> On 16/10/2017 8:00 AM, Roman Kennke wrote: >>>> >>>> Ok, I fixed all the comments you mentioned. >>>> >>>> Differential (against webrev.01): >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >>>> >>>> Full webrev: >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >>>> >>>> >>>> Roman >>>> >>>>> Just spotted this: >>>>> >>>>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>>>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>>>> >>>>> David >>>>> >>>>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> thanks! >>>>>>> >>>>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>>>> >>>>>> >>>>>> Can you point me to the exact change please as I don't want to >>>>>> re-examine it all. :) >>>>>> >>>>>> I'll pull this in and do a test build run internally. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>>>> >>>>>>>> Hi Roman, >>>>>>>> >>>>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>>>> big patch to remove it. >>>>>>>>> >>>>>>>>> What I have done: >>>>>>>>> >>>>>>>>> grep -i -R shark src >>>>>>>>> grep -i -R shark make >>>>>>>>> grep -i -R shark doc >>>>>>>>> grep -i -R shark doc >>>>>>>>> >>>>>>>>> and purged any reference to shark. Almost everything was >>>>>>>>> straightforward. >>>>>>>>> >>>>>>>>> The only things I wasn't really sure of: >>>>>>>>> >>>>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>>>> pd_address_in_code(), I am not sure it is the right thing to >>>>>>>>> do. If not, what *would* be the right thing? >>>>>>>>> >>>>>>>>> Then of course I did: >>>>>>>>> >>>>>>>>> rm -rf src/hotspot/share/shark >>>>>>>>> >>>>>>>>> I also went through the build machinery and removed stuff >>>>>>>>> related to Shark and LLVM libs. >>>>>>>>> >>>>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>>>> >>>>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>>>> All looks fine. >>>>>>>>> >>>>>>>>> - I could not build zero because it seems broken because of the >>>>>>>>> recent Atomic::* changes >>>>>>>>> - I could not test any of the other arches that seemed to >>>>>>>>> reference Shark (arm and sparc) >>>>>>>>> >>>>>>>>> Here's the full webrev: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Can I get a review on this? 
>>>>>>>>> >>>>>>>>> Thanks, Roman >>>>>>>>> >>>>>>> >>>> > From erik.joelsson at oracle.com Mon Oct 16 08:24:56 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Mon, 16 Oct 2017 10:24:56 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> Hello Roman, In hotspot.m4, I believe the check on line 328 (pre changes) is still relevant for just the zero case. Otherwise build changes look good to me. /Erik On 2017-10-16 00:00, Roman Kennke wrote: > > Ok, I fixed all the comments you mentioned. > > Differential (against webrev.01): > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ > > Full webrev: > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ > > > Roman > >> Just spotted this: >> >> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >> >> David >> >> On 16/10/2017 7:25 AM, David Holmes wrote: >>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>> Hi David, >>>> >>>> thanks! >>>> >>>> I'm uploading a 2nd revision of the patch that excludes the >>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>> >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>> >>> >>> Can you point me to the exact change please as I don't want to >>> re-examine it all. :) >>> >>> I'll pull this in and do a test build run internally. >>> >>> Thanks, >>> David >>> >>>> Thanks, Roman >>>> >>>> >>>>> Hi Roman, >>>>> >>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>> big patch to remove it. >>>>>> >>>>>> What I have done: >>>>>> >>>>>> grep -i -R shark src >>>>>> grep -i -R shark make >>>>>> grep -i -R shark doc >>>>>> grep -i -R shark doc >>>>>> >>>>>> and purged any reference to shark. Almost everything was >>>>>> straightforward. >>>>>> >>>>>> The only things I wasn't really sure of: >>>>>> >>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>> If not, what *would* be the right thing? >>>>>> >>>>>> Then of course I did: >>>>>> >>>>>> rm -rf src/hotspot/share/shark >>>>>> >>>>>> I also went through the build machinery and removed stuff related >>>>>> to Shark and LLVM libs. >>>>>> >>>>>> Now the only references in the whole JDK tree to shark is a >>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>> >>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>> All looks fine. >>>>>> >>>>>> - I could not build zero because it seems broken because of the >>>>>> recent Atomic::* changes >>>>>> - I could not test any of the other arches that seemed to >>>>>> reference Shark (arm and sparc) >>>>>> >>>>>> Here's the full webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>> >>>>>> >>>>>> Can I get a review on this? 
>>>>>> >>>>>> Thanks, Roman >>>>>> >>>> > From magnus.ihse.bursie at oracle.com Mon Oct 16 09:25:59 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 16 Oct 2017 11:25:59 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> Message-ID: On 2017-10-16 10:24, Erik Joelsson wrote: > Hello Roman, > > In hotspot.m4, I believe the check on line 328 (pre changes) is still > relevant for just the zero case. Yes, it is indeed. > > Otherwise build changes look good to me. Agree, looks good. /Magnus > > /Erik > > > On 2017-10-16 00:00, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>> >>>> >>>> Can you point me to the exact change please as I don't want to >>>> re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>> big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was >>>>>>> straightforward. >>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>>> If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff >>>>>>> related to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>> All looks fine. 
>>>>>>> >>>>>>> - I could not build zero because it seems broken because of the >>>>>>> recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to >>>>>>> reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkennke at redhat.com Mon Oct 16 10:26:43 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 12:26:43 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> Message-ID: <6ed7e856-baee-a59d-6710-4ff143277dc9@redhat.com> Hi Erik, You mean like this? http://cr.openjdk.java.net/~rkennke/8171853/webrev.04.diff/ Full webrev here: http://cr.openjdk.java.net/~rkennke/8171853/webrev.04/ Thanks, Roman > Hello Roman, > > In hotspot.m4, I believe the check on line 328 (pre changes) is still > relevant for just the zero case. > > Otherwise build changes look good to me. > > /Erik > > > On 2017-10-16 00:00, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>> >>>> >>>> Can you point me to the exact change please as I don't want to >>>> re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>> big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was >>>>>>> straightforward. >>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>>> If not, what *would* be the right thing? 
>>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff >>>>>>> related to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>> All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the >>>>>>> recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to >>>>>>> reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.joelsson at oracle.com Mon Oct 16 10:55:28 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Mon, 16 Oct 2017 12:55:28 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <6ed7e856-baee-a59d-6710-4ff143277dc9@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> <6ed7e856-baee-a59d-6710-4ff143277dc9@redhat.com> Message-ID: That looks correct. Thanks! /Erik On 2017-10-16 12:26, Roman Kennke wrote: > > Hi Erik, > > You mean like this? > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.04.diff/ > > > Full webrev here: > http://cr.openjdk.java.net/~rkennke/8171853/webrev.04/ > > > Thanks, > Roman > >> Hello Roman, >> >> In hotspot.m4, I believe the check on line 328 (pre changes) is still >> relevant for just the zero case. >> >> Otherwise build changes look good to me. >> >> /Erik >> >> >> On 2017-10-16 00:00, Roman Kennke wrote: >>> >>> Ok, I fixed all the comments you mentioned. >>> >>> Differential (against webrev.01): >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >>> >>> Full webrev: >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >>> >>> >>> Roman >>> >>>> Just spotted this: >>>> >>>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>>> >>>> David >>>> >>>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>>> Hi David, >>>>>> >>>>>> thanks! >>>>>> >>>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>>> >>>>> >>>>> Can you point me to the exact change please as I don't want to >>>>> re-examine it all. :) >>>>> >>>>> I'll pull this in and do a test build run internally. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, Roman >>>>>> >>>>>> >>>>>>> Hi Roman, >>>>>>> >>>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>>> big patch to remove it. 
>>>>>>>> >>>>>>>> What I have done: >>>>>>>> >>>>>>>> grep -i -R shark src >>>>>>>> grep -i -R shark make >>>>>>>> grep -i -R shark doc >>>>>>>> grep -i -R shark doc >>>>>>>> >>>>>>>> and purged any reference to shark. Almost everything was >>>>>>>> straightforward. >>>>>>>> >>>>>>>> The only things I wasn't really sure of: >>>>>>>> >>>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>>> pd_address_in_code(), I am not sure it is the right thing to >>>>>>>> do. If not, what *would* be the right thing? >>>>>>>> >>>>>>>> Then of course I did: >>>>>>>> >>>>>>>> rm -rf src/hotspot/share/shark >>>>>>>> >>>>>>>> I also went through the build machinery and removed stuff >>>>>>>> related to Shark and LLVM libs. >>>>>>>> >>>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>>> >>>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>>> All looks fine. >>>>>>>> >>>>>>>> - I could not build zero because it seems broken because of the >>>>>>>> recent Atomic::* changes >>>>>>>> - I could not test any of the other arches that seemed to >>>>>>>> reference Shark (arm and sparc) >>>>>>>> >>>>>>>> Here's the full webrev: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>>> >>>>>>>> >>>>>>>> Can I get a review on this? >>>>>>>> >>>>>>>> Thanks, Roman >>>>>>>> >>>>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robbin.ehn at oracle.com Mon Oct 16 15:46:07 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 16 Oct 2017 17:46:07 +0200 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> Message-ID: <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> Hi JC, I saw a webrev.12 in the directory, with only test changes(11->12), so I took that version. I had a look and tested the tests, worked fine! First glance at the code (looking at full v12) some minor things below, mostly unused stuff. Thanks, Robbin diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp --- a/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 16:54:06 2017 +0200 +++ b/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 17:42:42 2017 +0200 @@ -211,2 +211,3 @@ void initialize(int max_storage) { + // validate max_storage to sane value ? What would 0 mean ? MutexLocker mu(HeapMonitor_lock); @@ -227,8 +228,4 @@ bool initialized() { return _initialized; } - volatile bool *initialized_address() { return &_initialized; } private: - // Protects the traces currently sampled (below). - volatile intptr_t _stack_storage_lock[1]; - // The traces currently sampled. @@ -313,3 +310,2 @@ _initialized(false) { - _stack_storage_lock[0] = 0; } @@ -532,13 +528,2 @@ -// Delegate the initialization question to the underlying storage system. -bool HeapMonitoring::initialized() { - return StackTraceStorage::storage()->initialized(); -} - -// Delegate the initialization question to the underlying storage system. 
-bool *HeapMonitoring::initialized_address() { - return - const_cast(StackTraceStorage::storage()->initialized_address()); -} - void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) { diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp --- a/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 16:54:06 2017 +0200 +++ b/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 17:42:42 2017 +0200 @@ -35,3 +35,2 @@ static uint64_t _rnd; - static bool _initialized; static jint _monitoring_rate; @@ -92,7 +91,2 @@ - // Is the profiler initialized and where is the address to the initialized - // boolean. - static bool initialized(); - static bool *initialized_address(); - // Called when o is to be sampled from a given thread and a given size. On 10/10/2017 12:57 AM, JC Beyler wrote: > Dear all, > > Thread-safety is back!! Here is the update webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ > > Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ > > In order to really test this, I needed to add this so thought now was a good time. It required a few changes here for the creation to ensure correctness and safety. Now we > keep the static pointer but clear the data internally so on re-initialize, it will be a bit more costly than before. I don't think this is a huge use-case so I did not > think it was a problem. I used the internal MutexLocker, I think I used it well, let me know. > > I also added three tests: > > 1) Stack depth test: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch > > This test shows that the maximum stack depth system is working. > > 2) Thread safety: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch > > The test creates 24 threads and they all allocate at the same time. The test then checks it does find samples from all the threads. > > 3) Thread on/off safety > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch > > The test creates 24 threads that all allocate a bunch of memory. Then another thread turns the sampling on/off. > > Btw, both tests 2 & 3 failed without the locks. > > As I worked on this, I saw a lot of places where the tests are doing very similar things, I'm going to clean up the code a bit and make a HeapAllocator class that all tests > can call directly. This will greatly simplify the code. > > Thanks for any comments/criticisms! > Jc > > > On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler > wrote: > > Dear all, > > Small update to the webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ > > Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ > > I updated a bit of the naming, removed a TODO comment, and I added a test for testing the sampling rate. I also updated the maximum stack depth to 1024, there is no > reason to keep it so small. I did a micro benchmark that tests the overhead and it seems relatively the same. > > I compared allocations from a stack depth of 10 and allocations from a stack depth of 1024 (allocations are from the same helper method in > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java > ): > ? ? ? ? ? 
- For an array of 1 integer allocated in a loop; stack depth 1024 vs stack depth 10: 1% slower > ??????????- For an array of 200k integers allocated in a loop; stack depth 1024 vs stack depth 10: 3% slower > > So basically now moving the maximum stack depth to 1024 but we only copy over the stack depths actually used. > > For the next webrev, I will be adding a stack depth test to show that it works and probably put back the mutex locking so that we can see how difficult it is to keep > thread safe. > > Let me know what you think! > Jc > > > > On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler > wrote: > > Forgot to say that for my numbers: > ?- Not in the test are the actual numbers I got for the various array sizes, I ran the program 30 times and parsed the output; here are the averages and standard > deviation: > ? ? ? 1000:? ? ?1.28% average; 1.13% standard deviation > ? ? ? 10000:? ? 1.59% average; 1.25% standard deviation > ? ? ? 100000:? ?1.26% average; 1.26% standard deviation > > The 1000/10000/100000 are the sizes of the arrays being allocated. These are allocated 100k times and the sampling rate is 111 times the size of the array. > > Thanks! > Jc > > > On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler > wrote: > > Hi all, > > After a bit of a break, I am back working on this :). As before, here are two webrevs: > > - Full change set: http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/ > - Compared to version 8: http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/ > ? ? (This version is compared to version 8 I last showed but ported to the new folder hierarchy) > > In this version I have: > ? - Handled Thomas' comments from his email of 07/03: > ? ? ? ?- Merged the logging to be standard > ? ? ? ?- Fixed up the code a bit where asked > ? ? ? ?- Added some notes about the code not being thread-safe yet > ? ?- Removed additional dead code from the version that modifies interpreter/c1/c2 > ? ?- Fixed compiler issues so that it compiles with --disable-precompiled-header > ? ? ? ? - Tested with ./configure --with-boot-jdk= --with-debug-level=slowdebug --disable-precompiled-headers > > Additionally, I added a test to check the sanity of the sampler: HeapMonitorStatCorrectnessTest > (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch ) > ? ?- This allocates a number of arrays and checks that we obtain the number of samples we want with an accepted error of 5%. I tested it 100 times and it > passed everytime, I can test more if wanted > ? ?- Not in the test are the actual numbers I got for the various array sizes, I ran the program 30 times and parsed the output; here are the averages and > standard deviation: > ? ? ? 1000:? ? ?1.28% average; 1.13% standard deviation > ? ? ? 10000:? ? 1.59% average; 1.25% standard deviation > ? ? ? 100000:? ?1.26% average; 1.26% standard deviation > > What this means is that we were always at about 1~2% of the number of samples the test expected. > > Let me know what you think, > Jc > > On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler > wrote: > > Hi all, > > I apologize, I have not yet handled your remarks but thought this new webrev would also be useful to see and comment on perhaps. 
> > Here is the latest webrev, it is generated slightly different than the others since now I'm using webrev.ksh without the -N option: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ > > And the webrev.07 to webrev.08 diff is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ > > (Let me know if it works well) > > It's a small change between versions but it: > ? - provides a fix that makes the average sample rate correct (more on that below). > ? - fixes the code to actually have it play nicely with the fast tlab refill > ? - cleaned up a bit the JVMTI text and now use jvmtiFrameInfo > - moved the capability to be onload solo > > With this webrev, I've done a small study of the random number generator we use here for the sampling rate. I took a small program and it can be simplified to: > > for (outer loop) > for (inner loop) > int[] tmp = new int[arraySize]; > > - I've fixed the outer and inner loops to being 800 for this experiment, meaning we allocate 640000 times an array of a given array size. > > - Each program provides the average sample size used for the whole execution > > - Then, I ran each variation 30 times and then calculated the average of the average sample size used for various array sizes. I selected the array size to > be one of the following: 1, 10, 100, 1000. > > - When compared to 512kb, the average sample size of 30 runs: > 1: 4.62% of error > 10: 3.09% of error > 100: 0.36% of error > 1000: 0.1% of error > 10000: 0.03% of error > > What it shows is that, depending on the number of samples, the average does become better. This is because with an allocation of 1 element per array, it > will take longer to hit one of the thresholds. This is seen by looking at the sample count statistic I put in. For the same number of iterations (800 * > 800), the different array sizes provoke: > 1: 62 samples > 10: 125 samples > 100: 788 samples > 1000: 6166 samples > 10000: 57721 samples > > And of course, the more samples you have, the more sample rates you pick, which means that your average gets closer using that math. > > Thanks, > Jc > > On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler > wrote: > > Thanks Robbin, > > This seems to have worked. When I have the next webrev ready, we will find out but I'm fairly confident it will work! > > Thanks agian! > Jc > > On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn > wrote: > > Hi JC, > > On 06/29/2017 12:15 AM, JC Beyler wrote: > > B) Incremental changes > > > I guess the most common work flow here is using mq : > hg qnew fix_v1 > edit files > hg qrefresh > hg qnew fix_v2 > edit files > hg qrefresh > > if you do hg log you will see 2 commits > > webrev.ksh -r -2 -o my_inc_v1_v2 > webrev.ksh -o my_full_v2 > > > In? your .hgrc you might need: > [extensions] > mq = > > /Robbin > > > Again another newbiew question here... > > For showing the incremental changes, is there a link that explains how to do that? I apologize for my newbie questions all the time :) > > Right now, I do: > > ? ksh ../webrev.ksh -m -N > > That generates a webrev.zip and send it to Chuck Rasbold. He then uploads it to a new webrev. > > I tried commiting my change and adding a small change. Then if I just do ksh ../webrev.ksh without any options, it seems to produce a similar > page but now with only the changes I had (so the 06-07 comparison you were talking about) and a changeset that has it all. I imagine that is > what you meant. 
> > Which means that my workflow would become: > > 1) Make changes > 2) Make a webrev without any options to show just the differences with the tip > 3) Amend my changes to my local commit so that I have it done with > 4) Go to 1 > > Does that seem correct to you? > > Note that when I do this, I only see the full change of a file in the full change set (Side note here: now the page says change set and not > patch, which is maybe why Serguei was having issues?). > > Thanks! > Jc > > > > On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn >> wrote: > > ? ? Hi, > > ? ? On 06/28/2017 12:04 AM, JC Beyler wrote: > > ? ? ? ? Dear Thomas et al, > > ? ? ? ? Here is the newest webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ > > > > > > ? ? You have some more bits to in there but generally this looks good and really nice with more tests. > ? ? I'll do and deep dive and re-test this when I get back from my long vacation with whatever patch version you have then. > > ? ? Also I think it's time you provide incremental (v06->07 changes) as well as complete change-sets. > > ? ? Thanks, Robbin > > > > > ? ? ? ? Thomas, I "think" I have answered all your remarks. The summary is: > > ? ? ? ? - The statistic system is up and provides insight on what the heap sampler is doing > ? ? ? ? ? ? ?- I've noticed that, though the sampling rate is at the right mean, we are missing some samples, I have not yet tracked out why > (details below) > > ? ? ? ? - I've run a tiny benchmark that is the worse case: it is a very tight loop and allocated a small array > ? ? ? ? ? ? ?- In this case, I see no overhead when the system is off so that is a good start :) > ? ? ? ? ? ? ?- I see right now a high overhead in this case when sampling is on. This is not a really too surprising but I'm going to see if > this is consistent with our > ? ? ? ? internal implementation. The benchmark is really allocation stressful so I'm not too surprised but I want to do the due diligence. > > ? ? ? ? ? ?- The statistic system up is up and I have a new test > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch > > ? ? ? ? > > ? ? ? ? ? ? ? - I did a bit of a study about the random generator here, more details are below but basically it seems to work well > > ? ? ? ? ? ?- I added a capability but since this is the first time doing this, I was not sure I did it right > ? ? ? ? ? ? ?- I did add a test though for it and the test seems to do what I expect (all methods are failing with the > JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). > ? ? ? ? ? ? ? ? ?- > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch > > > > > > ? ? ? ? ? ?- I still need to figure out what to do about the multi-agent vs single-agent issue > > ? ? ? ? ? ?- As far as measurements, it seems I still need to look at: > ? ? ? ? ? ? ?- Why we do the 20 random calls first, are they necessary? > ? ? ? ? ? ? ?- Look at the mean of the sampling rate that the random generator does and also what is actually sampled > ? ? ? ? ? ? ?- What is the overhead in terms of memory/performance when on? > > ? ? ? ? I have inlined my answers, I think I got them all in the new webrev, let me know your thoughts. > > ? ? ? ? Thanks again! > ? ? ? ? Jc > > > ? ? ? ? On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl > > > > ? ? ? ? >>> wrote: > > ? ? ? ? ? ? ?Hi, > > ? ? ? ? ? ? ?On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote: > ? 
? ? ? ? ? ?> Hi all, > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> First off: Thanks again to Robbin and Thomas for their reviews :) > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> Next, I've uploaded a new webrev: > ? ? ? ? ? ? ?> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ > > > ? ? ? ? > >> > > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> Here is an update: > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> - @Robbin, I forgot to say that yes I need to look at implementing > ? ? ? ? ? ? ?> this for the other architectures and testing it before it is all > ? ? ? ? ? ? ?> ready to go. Is it common to have it working on all possible > ? ? ? ? ? ? ?> combinations or is there a subset that I should be doing first and we > ? ? ? ? ? ? ?> can do the others later? > ? ? ? ? ? ? ?> - I've tested slowdebug, built and ran the JTreg tests I wrote with > ? ? ? ? ? ? ?> slowdebug and fixed a few more issues > ? ? ? ? ? ? ?> - I've refactored a bit of the code following Thomas' comments > ? ? ? ? ? ? ?>? ? - I think I've handled all the comments from Thomas (I put > ? ? ? ? ? ? ?> comments inline below for the specifics) > > ? ? ? ? ? ? ?Thanks for handling all those. > > ? ? ? ? ? ? ?> - Following Thomas' comments on statistics, I want to add some > ? ? ? ? ? ? ?> quality assurance tests and find that the easiest way would be to > ? ? ? ? ? ? ?> have a few counters of what is happening in the sampler and expose > ? ? ? ? ? ? ?> that to the user. > ? ? ? ? ? ? ?>? ? - I'll be adding that in the next version if no one sees any > ? ? ? ? ? ? ?> objections to that. > ? ? ? ? ? ? ?>? ? - This will allow me to add a sanity test in JTreg about number of > ? ? ? ? ? ? ?> samples and average of sampling rate > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> @Thomas: I had a few questions that I inlined below but I will > ? ? ? ? ? ? ?> summarize the "bigger ones" here: > ? ? ? ? ? ? ?>? ? - You mentioned constants are not using the right conventions, I > ? ? ? ? ? ? ?> looked around and didn't see any convention except normal naming then > ? ? ? ? ? ? ?> for static constants. Is that right? > > ? ? ? ? ? ? ?I looked through https://wiki.openjdk.java.net/display/HotSpot/StyleGui > > > ? ? ? ? > >> > ? ? ? ? ? ? ?de and the rule is to "follow an existing pattern and must have a > ? ? ? ? ? ? ?distinct appearance from other names". Which does not help a lot I > ? ? ? ? ? ? ?guess :/ The GC team started using upper camel case, e.g. > ? ? ? ? ? ? ?SomeOtherConstant, but very likely this is probably not applied > ? ? ? ? ? ? ?consistently throughout. So I am fine with not adding another style > ? ? ? ? ? ? ?(like kMaxStackDepth with the "k" in front with some unknown meaning) > ? ? ? ? ? ? ?is fine. > > ? ? ? ? ? ? ?(Chances are you will find that style somewhere used anyway too, > ? ? ? ? ? ? ?apologies if so :/) > > > ? ? ? ? Thanks for that link, now I know where to look. I used the upper camel case in my code as well then :) I should have gotten them all. > > > ? ? ? ? ? ? ? > PS: I've also inlined my answers to Thomas below: > ? ? ? ? ? ? ? > > ? ? ? ? ? ? ? > On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl ? ? ? ? ? ? ? > e.com > wrote: > ? ? ? ? ? ? ? > > Hi all, > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > > On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote: > ? ? ? ? ? ? ? > > > Dear all, > ? ? ? ? ? ? ? > > > > ? ? ? ? ? ? ? > > > I've continued working on this and have done the following > ? ? ? ? ? ? ? > > webrev: > ? ? ? ? ? ? ? > > > http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/ > > > ? ? ? ? > >> > > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > > [...] > ? ? ? ? ? ? 
? > > > Things I still need to do: > ? ? ? ? ? ? ? > > >? ? - Have to fix that TLAB case for the FastTLABRefill > ? ? ? ? ? ? ? > > >? ? - Have to start looking at the data to see that it is > ? ? ? ? ? ? ? > > consistent and does gather the right samples, right frequency, etc. > ? ? ? ? ? ? ? > > >? ? - Have to check the GC elements and what that produces > ? ? ? ? ? ? ? > > >? ? - Run a slowdebug run and ensure I fixed all those issues you > ? ? ? ? ? ? ? > > saw > Robbin > ? ? ? ? ? ? ? > > > > ? ? ? ? ? ? ? > > > Thanks for looking at the webrev and have a great week! > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > >? ?scratching a bit on the surface of this change, so apologies for > ? ? ? ? ? ? ? > > rather shallow comments: > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > > - macroAssembler_x86.cpp:5604: while this is compiler code, and I > ? ? ? ? ? ? ? > > am not sure this is final, please avoid littering the code with > ? ? ? ? ? ? ? > > TODO remarks :) They tend to be candidates for later wtf moments > ? ? ? ? ? ? ? > > only. > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > > Just file a CR for that. > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > Newcomer question: what is a CR and not sure I have the rights to do > ? ? ? ? ? ? ? > that yet ? :) > > ? ? ? ? ? ? ?Apologies. CR is a change request, this suggests to file a bug in the > ? ? ? ? ? ? ?bug tracker. And you are right, you can't just create a new account in > ? ? ? ? ? ? ?the OpenJDK JIRA yourselves. :( > > > ? ? ? ? Ok good to know, I'll continue with my own todo list but I'll work hard on not letting it slip in the webrevs anymore :) > > > ? ? ? ? ? ? ?I was mostly referring to the "... but it is a TODO" part of that > ? ? ? ? ? ? ?comment in macroassembler_x86.cpp. Comments about the why of the code > ? ? ? ? ? ? ?are appreciated. > > ? ? ? ? ? ? ?[Note that I now understand that this is to some degree still work in > ? ? ? ? ? ? ?progress. As long as the final changeset does no contain TODO's I am > ? ? ? ? ? ? ?fine (and it's not a hard objection, rather their use in "final" code > ? ? ? ? ? ? ?is typically limited in my experience)] > > ? ? ? ? ? ? ?5603? ?// Currently, if this happens, just set back the actual end to > ? ? ? ? ? ? ?where it was. > ? ? ? ? ? ? ?5604? ?// We miss a chance to sample here. > > ? ? ? ? ? ? ?Would be okay, if explaining "this" and the "why" of missing a chance > ? ? ? ? ? ? ?to sample here would be best. > > ? ? ? ? ? ? ?Like maybe: > > ? ? ? ? ? ? ?// If we needed to refill TLABs, just set the actual end point to > ? ? ? ? ? ? ?// the end of the TLAB again. We do not sample here although we could. > > ? ? ? ? Done with your comment, it works well in my mind. > > ? ? ? ? ? ? ?I am not sure whether "miss a chance to sample" meant "we could, but > ? ? ? ? ? ? ?consciously don't because it's not that useful" or "it would be > ? ? ? ? ? ? ?necessary but don't because it's too complicated to do.". > > ? ? ? ? ? ? ?Looking at the original comment once more, I am also not sure if that > ? ? ? ? ? ? ?comment shouldn't referring to the "end" variable (not actual_end) > ? ? ? ? ? ? ?because that's the variable that is responsible for taking the sampling > ? ? ? ? ? ? ?path? (Going from the member description of ThreadLocalAllocBuffer). > > > ? ? ? ? I've moved this code and it no longer shows up here but the rationale and answer was: > > ? ? ? ? So.. Yes, end is the variable provoking the sampling. Actual end is the actual end of the TLAB. > > ? ? ? ? 
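As a rough illustration of the _end/_actual_end split being described here, a minimal HotSpot-style fragment follows. This is a sketch only, not the webrev's code: the names (_end, _actual_end, _bytes_until_sample, set_back_actual_end) are the ones mentioned in the thread, but the bodies are assumptions.

    // Sketch: allocation fast paths compare against _end, so pulling _end
    // forward forces a trip into the runtime where a sample can be recorded,
    // while _actual_end keeps the real TLAB limit.
    class ThreadLocalAllocBuffer {
      HeapWord* _top;          // current allocation pointer
      HeapWord* _end;          // limit seen by the inlined fast path
      HeapWord* _actual_end;   // real end of the TLAB
      size_t    _bytes_until_sample;

     public:
      void set_sample_end() {
        HeapWord* sample_end = _top + (_bytes_until_sample / HeapWordSize);
        _end = MIN2(sample_end, _actual_end);   // never past the real end
      }

      // Once a sample has been taken (or sampling is off), restore the real
      // limit so ordinary allocation proceeds untouched.
      void set_back_actual_end() { _end = _actual_end; }
    };
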
What was happening here is that the code is resetting _end to point towards the end of the new TLAB. Because, we now have the end for > sampling and _actual_end for > ? ? ? ? the actual end, we need to update the actual_end as well. > > ? ? ? ? Normally, were we to do the real work here, we would calculate the (end - start) offset, then do: > > ? ? ? ? - Set the new end to : start + (old_end - old_start) > ? ? ? ? - Set the actual end like we do here now where it because it is the actual end. > > ? ? ? ? Why is this not done here now anymore? > ? ? ? ? ? ? - I was still debating which path to take: > ? ? ? ? ? ? ? ?- Do it in the fast refill code, it has its perks: > ? ? ? ? ? ? ? ? ? ?- In a world where fast refills are happening all the time or a lot, we can augment there the code to do the sampling > ? ? ? ? ? ? ? ?- Remember what we had as an end before leaving the slowpath and check on return > ? ? ? ? ? ? ? ? ? ?- This is what I'm doing now, it removes the need to go fix up all fast refill paths but if you remain in fast refill paths, > you won't get sampling. I > ? ? ? ? have to think of the consequences of that, maybe a future change later on? > ? ? ? ? ? ? ? ? ? ? ? - I have the statistics now so I'm going to study that > ? ? ? ? ? ? ? ? ? ? ? ? ?-> By the way, though my statistics are showing I'm missing some samples, if I turn off FastTlabRefill, it is the same > loss so for now, it seems > ? ? ? ? this does not occur in my simple test. > > > > ? ? ? ? ? ? ?But maybe I am only confused and it's best to just leave the comment > ? ? ? ? ? ? ?away. :) > > ? ? ? ? ? ? ?Thinking about it some more, doesn't this not-sampling in this case > ? ? ? ? ? ? ?mean that sampling does not work in any collector that does inline TLAB > ? ? ? ? ? ? ?allocation at the moment? (Or is inline TLAB alloc automatically > ? ? ? ? ? ? ?disabled with sampling somehow?) > > ? ? ? ? ? ? ?That would indeed be a bigger TODO then :) > > > ? ? ? ? Agreed, this remark made me think that perhaps as a first step the new way of doing it is better but I did have to: > ? ? ? ? ? ?- Remove the const of the ThreadLocalBuffer remaining and hard_end methods > ? ? ? ? ? ?- Move hard_end out of the header file to have a bit more logic there > > ? ? ? ? Please let me know what you think of that and if you prefer it this way or changing the fast refills. (I prefer this way now because it > is more incremental). > > > ? ? ? ? ? ? ?> > - calling HeapMonitoring::do_weak_oops() (which should probably be > ? ? ? ? ? ? ?> > called weak_oops_do() like other similar methods) only if string > ? ? ? ? ? ? ?> > deduplication is enabled (in g1CollectedHeap.cpp:4511) seems wrong. > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> The call should be at least around 6 lines up outside the if. > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> Preferentially in a method like process_weak_jni_handles(), including > ? ? ? ? ? ? ?> additional logging. (No new (G1) gc phase without minimal logging > ? ? ? ? ? ? ?> :)). > ? ? ? ? ? ? ?> Done but really not sure because: > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> I put for logging: > ? ? ? ? ? ? ?>? ?log_develop_trace(gc, freelist)("G1ConcRegionFreeing [other] : heap > ? ? ? ? ? ? ?> monitoring"); > > ? ? ? ? ? ? ?I would think that "gc, ref" would be more appropriate log tags for > ? ? ? ? ? ? ?this similar to jni handles. > ? ? ? ? ? ? ?(I am als not sure what weak reference handling has to do with > ? ? ? ? ? ? ?G1ConcRegionFreeing, so I am a bit puzzled) > > > ? ? ? ? I was not sure what to put for the tags or really as the message. 
I cleaned it up a bit now to: > ? ? ? ? ? ? ?log_develop_trace(gc, ref)("HeapSampling [other] : heap monitoring processing"); > > > > ? ? ? ? ? ? ?> Since weak_jni_handles didn't have logging for me to be inspired > ? ? ? ? ? ? ?> from, I did that but unconvinced this is what should be done. > > ? ? ? ? ? ? ?The JNI handle processing does have logging, but only in > ? ? ? ? ? ? ?ReferenceProcessor::process_discovered_references(). In > ? ? ? ? ? ? ?process_weak_jni_handles() only overall time is measured (in a G1 > ? ? ? ? ? ? ?specific way, since only G1 supports disabling reference procesing) :/ > > ? ? ? ? ? ? ?The code in ReferenceProcessor prints both time taken > ? ? ? ? ? ? ?referenceProcessor.cpp:254, as well as the count, but strangely only in > ? ? ? ? ? ? ?debug VMs. > > ? ? ? ? ? ? ?I have no idea why this logging is that unimportant to only print that > ? ? ? ? ? ? ?in a debug VM. However there are reviews out for changing this area a > ? ? ? ? ? ? ?bit, so it might be useful to wait for that (JDK-8173335). > > > ? ? ? ? I cleaned it up a bit anyway and now it returns the count of objects that are in the system. > > > ? ? ? ? ? ? ?> > - the change doubles the size of > ? ? ? ? ? ? ?> > CollectedHeap::allocate_from_tlab_slow() above the "small and nice" > ? ? ? ? ? ? ?> > threshold. Maybe it could be refactored a bit. > ? ? ? ? ? ? ?> Done I think, it looks better to me :). > > ? ? ? ? ? ? ?In ThreadLocalAllocBuffer::handle_sample() I think the > ? ? ? ? ? ? ?set_back_actual_end()/pick_next_sample() calls could be hoisted out of > ? ? ? ? ? ? ?the "if" :) > > > ? ? ? ? Done! > > > ? ? ? ? ? ? ?> > - referenceProcessor.cpp:261: the change should add logging about > ? ? ? ? ? ? ?> > the number of references encountered, maybe after the corresponding > ? ? ? ? ? ? ?> > "JNI weak reference count" log message. > ? ? ? ? ? ? ?> Just to double check, are you saying that you'd like to have the heap > ? ? ? ? ? ? ?> sampler to keep in store how many sampled objects were encountered in > ? ? ? ? ? ? ?> the HeapMonitoring::weak_oops_do? > ? ? ? ? ? ? ?>? ? - Would a return of the method with the number of handled > ? ? ? ? ? ? ?> references and logging that work? > > ? ? ? ? ? ? ?Yes, it's fine if HeapMonitoring::weak_oops_do() only returned the > ? ? ? ? ? ? ?number of processed weak oops. > > > ? ? ? ? Done also (but I admit I have not tested the output yet) :) > > > ? ? ? ? ? ? ?>? ? - Additionally, would you prefer it in a separate block with its > ? ? ? ? ? ? ?> GCTraceTime? > > ? ? ? ? ? ? ?Yes. Both kinds of information is interesting: while the time taken is > ? ? ? ? ? ? ?typically more important, the next question would be why, and the > ? ? ? ? ? ? ?number of references typically goes a long way there. > > ? ? ? ? ? ? ?See above though, it is probably best to wait a bit. > > > ? ? ? ? Agreed that I "could" wait but, if it's ok, I'll just refactor/remove this when we get closer to something final. Either, JDK-8173335 > ? ? ? ? has gone in and I will notice it now or it will soon and I can change it then. > > > ? ? ? ? ? ? ?> > - threadLocalAllocBuffer.cpp:331: one more "TODO" > ? ? ? ? ? ? ?> Removed it and added it to my personal todos to look at. > ? ? ? ? ? ? ?>? ? ? > > > ? ? ? ? ? ? ?> > - threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class > ? ? ? ? ? ? ?> > documentation should be updated about the sampling additions. I > ? ? ? ? ? ? ?> > would have no clue what the difference between "actual_end" and > ? ? ? ? ? ? ?> > "end" would be from the given information. > ? 
? ? ? ? ? ?> If you are talking about the comments in this file, I made them more > ? ? ? ? ? ? ?> clear I hope in the new webrev. If it was somewhere else, let me know > ? ? ? ? ? ? ?> where to change. > > ? ? ? ? ? ? ?Thanks, that's much better. Maybe a note in the comment of the class > ? ? ? ? ? ? ?that ThreadLocalBuffer provides some sampling facility by modifying the > ? ? ? ? ? ? ?end() of the TLAB to cause "frequent" calls into the runtime call where > ? ? ? ? ? ? ?actual sampling takes place. > > > ? ? ? ? Done, I think it's better now. Added something about the slow_path_end as well. > > > ? ? ? ? ? ? ?> > - in heapMonitoring.hpp: there are some random comments about some > ? ? ? ? ? ? ?> > code that has been grabbed from "util/math/fastmath.[h|cc]". I > ? ? ? ? ? ? ?> > can't tell whether this is code that can be used but I assume that > ? ? ? ? ? ? ?> > Noam Shazeer is okay with that (i.e. that's all Google code). > ? ? ? ? ? ? ?> Jeremy and I double checked and we can release that as I thought. I > ? ? ? ? ? ? ?> removed the comment from that piece of code entirely. > > ? ? ? ? ? ? ?Thanks. > > ? ? ? ? ? ? ?> > - heapMonitoring.hpp/cpp static constant naming does not correspond > ? ? ? ? ? ? ?> > to Hotspot's. Additionally, in Hotspot static methods are cased > ? ? ? ? ? ? ?> > like other methods. > ? ? ? ? ? ? ?> I think I fixed the methods to be cased the same way as all other > ? ? ? ? ? ? ?> methods. For static constants, I was not sure. I fixed a few other > ? ? ? ? ? ? ?> variables but I could not seem to really see a consistent trend for > ? ? ? ? ? ? ?> constants. I made them as variables but I'm not sure now. > > ? ? ? ? ? ? ?Sorry again, style is a kind of mess. The goal of my suggestions here > ? ? ? ? ? ? ?is only to prevent yet another style creeping in. > > ? ? ? ? ? ? ?> > - in heapMonitoring.cpp there are a few cryptic comments at the top > ? ? ? ? ? ? ?> > that seem to refer to internal stuff that should probably be > ? ? ? ? ? ? ?> > removed. > ? ? ? ? ? ? ?> Sorry about that! My personal todos not cleared out. > > ? ? ? ? ? ? ?I am happy about comments, but I simply did not understand any of that > ? ? ? ? ? ? ?and I do not know about other readers as well. > > ? ? ? ? ? ? ?If you think you will remember removing/updating them until the review > ? ? ? ? ? ? ?proper (I misunderstood the review situation a little it seems). > > ? ? ? ? ? ? ?> > I did not think through the impact of the TLAB changes on collector > ? ? ? ? ? ? ?> > behavior yet (if there are). Also I did not check for problems with > ? ? ? ? ? ? ?> > concurrent mark and SATB/G1 (if there are). > ? ? ? ? ? ? ?> I would love to know your thoughts on this, I think this is fine. I > > ? ? ? ? ? ? ?I think so too now. No objects are made live out of thin air :) > > ? ? ? ? ? ? ?> see issues with multiple threads right now hitting the stack storage > ? ? ? ? ? ? ?> instance. Previous webrevs had a mutex lock here but we took it out > ? ? ? ? ? ? ?> for simplificity (and only for now). > > ? ? ? ? ? ? ?:) When looking at this after some thinking I now assume for this > ? ? ? ? ? ? ?review that this code is not MT safe at all. There seems to be more > ? ? ? ? ? ? ?synchronization missing than just the one for the StackTraceStorage. So > ? ? ? ? ? ? ?no comments about this here. > > > ? ? ? ? I doubled checked a bit (quickly I admit) but it seems that synchronization in StackTraceStorage is really all you need (all methods > lead to a StackTraceStorage one > ? ? ? ? 
and can be multithreaded outside of that). > ? ? ? ? There is a question about the initialization where the method HeapMonitoring::initialize_profiling is not thread safe. > ? ? ? ? It would work (famous last words) and not crash if there was a race but we could add a synchronization point there as well (and > therefore on the stop as well). > > ? ? ? ? But anyway I will really check and do this once we add back synchronization. > > > ? ? ? ? ? ? ?Also, this would require some kind of specification of what is allowed > ? ? ? ? ? ? ?to be called when and where. > > > ? ? ? ? Would we specify this with the methods in the jvmti.xml file? We could start by specifying in each that they are not thread safe but I > saw no mention of that for > ? ? ? ? other methods. > > > ? ? ? ? ? ? ?One potentially relevant observation about locking here: depending on > ? ? ? ? ? ? ?sampling frequency, StackTraceStore::add_trace() may be rather > ? ? ? ? ? ? ?frequently called. I assume that you are going to do measurements :) > > > ? ? ? ? Though we don't have the TLAB implementation in our code, the compiler generated sampler uses 2% of overhead with a 512k sampling rate. > I can do real measurements > ? ? ? ? when the code settles and we can see how costly this is as a TLAB implementation. > ? ? ? ? However, my theory is that if the rate is 512k, the memory/performance overhead should be minimal since it is what we saw with our > code/workloads (though not called > ? ? ? ? the same way, we call it essentially at the same rate). > ? ? ? ? If you have a benchmark you'd like me to test, let me know! > > ? ? ? ? Right now, with my really small test, this does use a bit of overhead even for a 512k sample size. I don't know yet why, I'm going to > see what is going on. > > ? ? ? ? Finally, I think it is not reasonable to suppose the overhead to be negligible if the sampling rate used is too low. The user should > know that the lower the rate, > ? ? ? ? the higher the overhead (documentation TODO?). > > > ? ? ? ? ? ? ?I am not sure what the expected usage of the API is, but > ? ? ? ? ? ? ?StackTraceStore::add_trace() seems to be able to grow without bounds. > ? ? ? ? ? ? ?Only a GC truncates them to the live ones. That in itself seems to be > ? ? ? ? ? ? ?problematic (GCs can be *wide* apart), and of course some of the API > ? ? ? ? ? ? ?methods add to that because they duplicate that unbounded array. Do you > ? ? ? ? ? ? ?have any concerns/measurements about this? > > > ? ? ? ? So, the theory is that yes add_trace can be able to grow without bounds but it grows at a sample per 512k of allocated space. The > stacks it gathers are currently > ? ? ? ? maxed at 64 (I'd like to expand that to an option to the user though at some point). So I have no concerns because: > > ? ? ? ? - If really this is taking a lot of space, that means the job is keeping a lot of objects in memory as well, therefore the entire heap > is getting huge > ? ? ? ? - If this is the case, you will be triggering a GC at some point anyway. > > ? ? ? ? (I'm putting under the rug the issue of "What if we set the rate to 1 for example" because as you lower the sampling rate, we cannot > guarantee low overhead; the > ? ? ? ? idea behind this feature is to have a means of having meaningful allocated samples at a low overhead) > > ? ? ? ? I have no measurements really right now but since I now have some statistics I can poll, I will look a bit more at this question. > > ? ? ? ? 
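To put rough numbers on the growth concern discussed above: with the default ~512 KiB sampling interval and traces capped at 64 frames, the trace storage stays small relative to the heap that produced it. The per-frame size below is a guess for illustration, not a measured value from the webrev.

    // Standalone back-of-envelope estimate (illustrative constants).
    #include <cstdio>
    #include <cstdint>

    int main() {
      const uint64_t sampling_interval = 512 * 1024;   // bytes allocated per sample
      const uint64_t frames_per_trace  = 64;           // max stack depth kept
      const uint64_t bytes_per_frame   = 16;           // assumed: method id + bci, roughly
      const uint64_t live_sampled_heap = 4ULL << 30;   // say, 4 GiB of live sampled allocations

      uint64_t traces = live_sampled_heap / sampling_interval;       // 8192 traces
      uint64_t bytes  = traces * frames_per_trace * bytes_per_frame; // ~8 MiB

      printf("~%llu traces, ~%llu KiB of trace storage\n",
             (unsigned long long)traces, (unsigned long long)(bytes >> 10));
      return 0;
    }

So even several gigabytes of live sampled allocations translate to trace storage in the megabyte range, which matches the argument above that a heap large enough for this to matter will be triggering GCs (and pruning) anyway.
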
I have the same last sentence than above: the user should expect this to happen if the sampling rate is too small. That probably can be > reflected in the > ? ? ? ? StartHeapSampling as a note : careful this might impact your performance. > > > ? ? ? ? ? ? ?Also, these stack traces might hold on to huge arrays. Any > ? ? ? ? ? ? ?consideration of that? Particularly it might be the cause for OOMEs in > ? ? ? ? ? ? ?tight memory situations. > > > ? ? ? ? There is a stack size maximum that is set to 64 so it should not hold huge arrays. I don't think this is an issue but I can double > check with a test or two. > > > ? ? ? ? ? ? ?- please consider adding a safepoint check in > ? ? ? ? ? ? ?HeapMonitoring::weak_oops_do to prevent accidental misuse. > > ? ? ? ? ? ? ?- in struct StackTraceStorage, the public fields may also need > ? ? ? ? ? ? ?underscores. At least some files in the runtime directory have structs > ? ? ? ? ? ? ?with underscored public members (and some don't). The runtime team > ? ? ? ? ? ? ?should probably comment on that. > > > ? ? ? ? Agreed I did not know. I looked around and a lot of structs did not have them it seemed so I left it as is. I will happily change it if > someone prefers (I was not > ? ? ? ? sure if you really preferred or not, your sentence seemed to be more a note of "this might need to change but I don't know if the > runtime team enforces that", let > ? ? ? ? me know if I read that wrongly). > > > ? ? ? ? ? ? ?- In StackTraceStorage::weak_oops_do(), when examining the > ? ? ? ? ? ? ?StackTraceData, maybe it is useful to consider having a non-NULL > ? ? ? ? ? ? ?reference outside of the heap's reserved space an error. There should > ? ? ? ? ? ? ?be no oop outside of the heap's reserved space ever. > > ? ? ? ? ? ? ?Unless you allow storing random values in StackTraceData::obj, which I > ? ? ? ? ? ? ?would not encourage. > > > ? ? ? ? I suppose you are talking about this part: > ? ? ? ? if ((value != NULL && Universe::heap()->is_in_reserved(value)) && > ? ? ? ? ? ? ? ? ? ? (is_alive == NULL || is_alive->do_object_b(value))) { > > ? ? ? ? What you are saying is that I could have something like: > ? ? ? ? if (value != my_non_null_reference && > ? ? ? ? ? ? ? ? ? ? (is_alive == NULL || is_alive->do_object_b(value))) { > > ? ? ? ? Is that what you meant? Is there really a reason to do so? When I look at the code, is_in_reserved seems like a O(1) method call. I'm > not even sure we can have a > ? ? ? ? NULL value to be honest. I might have to study that to see if this was not a paranoid test to begin with. > > ? ? ? ? The is_alive code has now morphed due to the comment below. > > > > ? ? ? ? ? ? ?- HeapMonitoring::weak_oops_do() does not seem to use the > ? ? ? ? ? ? ?passed AbstractRefProcTaskExecutor. > > > ? ? ? ? It did use it: > ? ? ? ? ? ?size_t HeapMonitoring::weak_oops_do( > ? ? ? ? ? ? ? AbstractRefProcTaskExecutor *task_executor, > ? ? ? ? ? ? ? BoolObjectClosure* is_alive, > ? ? ? ? ? ? ? OopClosure *f, > ? ? ? ? ? ? ? VoidClosure *complete_gc) { > ? ? ? ? ? ? assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint"); > > ? ? ? ? ? ? if (task_executor != NULL) { > ? ? ? ? ? ? ? task_executor->set_single_threaded_mode(); > ? ? ? ? ? ? } > ? ? ? ? ? ? return StackTraceStorage::storage()->weak_oops_do(is_alive, f, complete_gc); > ? ? ? ? } > > ? ? ? ? But due to the comment below, I refactored this, so this is no longer here. Now I have an always true closure that is passed. > > > ? ? ? ? ? ? 
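The "always true closure" mentioned here presumably follows the usual HotSpot BoolObjectClosure pattern; a minimal sketch, not necessarily the exact class added in the webrev:

    // Treats every recorded oop as live, for callers that have no is_alive filter.
    class AlwaysTrueClosure : public BoolObjectClosure {
     public:
      bool do_object_b(oop p) { return true; }
    };

    // Assumed usage at the call site:
    //   static AlwaysTrueClosure always_true;
    //   StackTraceStorage::storage()->weak_oops_do(&always_true, f, complete_gc);
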
?- I do not understand allowing to call this method with a NULL > ? ? ? ? ? ? ?complete_gc closure. This would mean that objects referenced from the > ? ? ? ? ? ? ?object that is referenced by the StackTraceData are not pulled, meaning > ? ? ? ? ? ? ?they would get stale. > > ? ? ? ? ? ? ?- same with is_alive parameter value of NULL > > > ? ? ? ? So these questions made me look a bit closer at this code. This code I think was written this way to have a very small impact on the > file but you are right, there > ? ? ? ? is no reason for this here. I've simplified the code by making in referenceProcessor.cpp a process_HeapSampling method that handles > everything there. > > ? ? ? ? The code allowed NULLs because it depended on where you were coming from and how the code was being called. > > ? ? ? ? - I added a static always_true variable and pass that now to be more consistent with the rest of the code. > ? ? ? ? - I moved the complete_gc into process_phaseHeapSampling now (new method) and handle the task_executor and the complete_gc there > ? ? ? ? ? ? ?- Newbie question: in our code we did a set_single_threaded_mode but I see that process_phaseJNI does it right before its call, do > I need to do it for the > ? ? ? ? process_phaseHeapSample? > ? ? ? ? That API is much cleaner (in my mind) and is consistent with what is done around it (again in my mind). > > > ? ? ? ? ? ? ?- heapMonitoring.cpp:590: I do not completely understand the purpose of > ? ? ? ? ? ? ?this code: in the end this results in a fixed value directly dependent > ? ? ? ? ? ? ?on the Thread address anyway? In the end this results in a fixed value > ? ? ? ? ? ? ?directly dependent on the Thread address anyway? > ? ? ? ? ? ? ?IOW, what is special about exactly 20 rounds? > > > ? ? ? ? So we really want a fast random number generator that has a specific mean (512k is the default we use). The code uses the thread > address as the start number of the > ? ? ? ? sequence (why not, it is random enough is rationale). Then instead of just starting there, we prime the sequence and really only start > at the 21st number, it is > ? ? ? ? arbitrary and I have not done a study to see if we could do more or less of that. > > ? ? ? ? As I have the statistics of the system up and running, I'll run some experiments to see if this is needed, is 20 good, or not. > > > ? ? ? ? ? ? ?- also I would consider stripping a few bits of the threads' address as > ? ? ? ? ? ? ?initialization value for your rng. The last three bits (and probably > ? ? ? ? ? ? ?more, check whether the Thread object is allocated on special > ? ? ? ? ? ? ?boundaries) are always zero for them. > ? ? ? ? ? ? ?Not sure if the given "random" value is random enough before/after, > ? ? ? ? ? ? ?this method, so just skip that comment if you think this is not > ? ? ? ? ? ? ?required. > > > ? ? ? ? I don't know is the honest answer. I think what is important is that we tend towards a mean and it is random "enough" to not fall in > pitfalls of only sampling a > ? ? ? ? subset of objects due to their allocation order. I added that as test to do to see if it changes the mean in any way for the 512k > default value and/or if the first > ? ? ? ? 1000 elements look better. > > > ? ? ? ? ? ? ?Some more random nits I did not find a place to put anywhere: > > ? ? ? ? ? ? ?- ThreadLocalAllocBuffer::_extra_space does not seem to be used > ? ? ? ? ? ? ?anywhere? > > > ? ? ? ? Good catch :). > > > ? ? ? ? ? ? 
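Picking up the random-generator point a few paragraphs above (thread address as seed, the first 20 outputs discarded, ~512 KiB mean interval), a standalone sketch of that scheme follows. The generator, the constants, and the function names are illustrative, not the webrev's.

    #include <cstddef>
    #include <cstdint>
    #include <cmath>

    static uint64_t next_random(uint64_t* state) {
      uint64_t x = *state;              // xorshift64: cheap, statistically decent
      x ^= x << 13; x ^= x >> 7; x ^= x << 17;
      return *state = x;
    }

    static uint64_t init_state(const void* thread_addr) {
      uint64_t seed = (uint64_t)(uintptr_t)thread_addr >> 3;  // drop the always-zero low bits
      if (seed == 0) seed = 1;                                // xorshift state must be non-zero
      for (int i = 0; i < 20; i++) next_random(&seed);        // prime the sequence
      return seed;
    }

    static size_t pick_next_sample(uint64_t* state, size_t mean_bytes) {
      // Exponential inter-arrival gives every allocated byte the same chance
      // of being sampled, with the requested mean interval.
      double u = (next_random(state) >> 11) * (1.0 / (double)(1ULL << 53));
      if (u <= 0.0) u = 1e-20;
      return (size_t)(-std::log(u) * (double)mean_bytes) + 1;
    }

Whether the 20 warm-up draws are needed at all is exactly the open question in the thread; with a generator of this shape the warm-up mostly serves to decorrelate seeds that differ only in a few bits.
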
?- Maybe indent the declaration of ThreadLocalAllocBuffer::_bytes_until_sample to align below the other members of that group. > > > ? ? ? ? Done moved it up a bit to have non static members together and static separate. > > ? ? ? ? ? ? ?Thanks, > ? ? ? ? ? ? ? ? Thomas > > > ? ? ? ? Thanks for your review! > ? ? ? ? Jc > > > > > > > > From jcbeyler at google.com Mon Oct 16 16:34:15 2017 From: jcbeyler at google.com (JC Beyler) Date: Mon, 16 Oct 2017 09:34:15 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> References: <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> Message-ID: Hi Robbin, That is because version 11 to 12 was only a test change. I was going to write about it and say here are the webrev links: Incremental: http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ Full webrev: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/ This change focused only on refactoring the tests to be more manageable, readable, maintainable. As all tests are looking at allocations, I moved common code to a java class: http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java.patch And then most tests call into that class to turn on/off the sampling, allocate, etc. This has removed almost 500 lines of test code so I'm happy about that. Thanks for your changes, a bit of relics of previous versions :). I've already integrated them into my code and will make a new webrev end of this week with a bit of refactor of the code handling the tlab slow path. I find it could use a bit of refactoring to make it easier to follow so I'm going to take a stab at it this week. Any other issues/comments? Thanks! Jc On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn wrote: > Hi JC, > > I saw a webrev.12 in the directory, with only test changes(11->12), so I > took that version. > I had a look and tested the tests, worked fine! > > First glance at the code (looking at full v12) some minor things below, > mostly unused stuff. > > Thanks, Robbin > > diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp > --- a/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 > 16:54:06 2017 +0200 > +++ b/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 > 17:42:42 2017 +0200 > @@ -211,2 +211,3 @@ > void initialize(int max_storage) { > + // validate max_storage to sane value ? What would 0 mean ? > MutexLocker mu(HeapMonitor_lock); > @@ -227,8 +228,4 @@ > bool initialized() { return _initialized; } > - volatile bool *initialized_address() { return &_initialized; } > > private: > - // Protects the traces currently sampled (below). > - volatile intptr_t _stack_storage_lock[1]; > - > // The traces currently sampled. > @@ -313,3 +310,2 @@ > _initialized(false) { > - _stack_storage_lock[0] = 0; > } > @@ -532,13 +528,2 @@ > > -// Delegate the initialization question to the underlying storage system. > -bool HeapMonitoring::initialized() { > - return StackTraceStorage::storage()->initialized(); > -} > - > -// Delegate the initialization question to the underlying storage system. 
> -bool *HeapMonitoring::initialized_address() { > - return > - const_cast(StackTraceStorage::storage()->initialized_ > address()); > -} > - > void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) { > diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp > --- a/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 > 16:54:06 2017 +0200 > +++ b/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 > 17:42:42 2017 +0200 > @@ -35,3 +35,2 @@ > static uint64_t _rnd; > - static bool _initialized; > static jint _monitoring_rate; > @@ -92,7 +91,2 @@ > > - // Is the profiler initialized and where is the address to the > initialized > - // boolean. > - static bool initialized(); > - static bool *initialized_address(); > - > // Called when o is to be sampled from a given thread and a given size. > > > > On 10/10/2017 12:57 AM, JC Beyler wrote: > >> Dear all, >> >> Thread-safety is back!! Here is the update webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ >> >> Full webrev is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ >> >> In order to really test this, I needed to add this so thought now was a >> good time. It required a few changes here for the creation to ensure >> correctness and safety. Now we keep the static pointer but clear the data >> internally so on re-initialize, it will be a bit more costly than before. I >> don't think this is a huge use-case so I did not think it was a problem. I >> used the internal MutexLocker, I think I used it well, let me know. >> >> I also added three tests: >> >> 1) Stack depth test: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorStackDepthTest.java.patch >> >> This test shows that the maximum stack depth system is working. >> >> 2) Thread safety: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorThreadTest.java.patch >> >> The test creates 24 threads and they all allocate at the same time. The >> test then checks it does find samples from all the threads. >> >> 3) Thread on/off safety >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorThreadOnOffTest.java.patch >> >> The test creates 24 threads that all allocate a bunch of memory. Then >> another thread turns the sampling on/off. >> >> Btw, both tests 2 & 3 failed without the locks. >> >> As I worked on this, I saw a lot of places where the tests are doing very >> similar things, I'm going to clean up the code a bit and make a >> HeapAllocator class that all tests can call directly. This will greatly >> simplify the code. >> >> Thanks for any comments/criticisms! >> Jc >> >> >> On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler > jcbeyler at google.com>> wrote: >> >> Dear all, >> >> Small update to the webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/> >> >> Full webrev is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/> >> >> I updated a bit of the naming, removed a TODO comment, and I added a >> test for testing the sampling rate. I also updated the maximum stack depth >> to 1024, there is no >> reason to keep it so small. I did a micro benchmark that tests the >> overhead and it seems relatively the same. 
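The locking JC refers to a few paragraphs above presumably looks something like the fragment below. The lock name HeapMonitor_lock comes from Robbin's quoted diff; the storage helpers and the _allocated_traces container are assumptions, not the webrev's members.

    void StackTraceStorage::initialize(int max_storage) {
      MutexLocker mu(HeapMonitor_lock);   // serialize (re)initialization against samplers
      free_storage();                     // drop traces from any previous session
      allocate_storage(max_storage);
      _initialized = true;
    }

    void StackTraceStorage::add_trace(const StackTraceData& trace) {
      MutexLocker mu(HeapMonitor_lock);   // add_trace can race with on/off switching and GC pruning
      _allocated_traces->append(trace);
    }

This is also where the thread-safety tests 2 and 3 described above bite: as noted, both failed when the lock was removed.
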
>> >> I compared allocations from a stack depth of 10 and allocations from >> a stack depth of 1024 (allocations are from the same helper method in >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_fi >> les/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/ >> MyPackage/HeapMonitorStatRateTest.java >> > iles/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor >> /MyPackage/HeapMonitorStatRateTest.java>): >> - For an array of 1 integer allocated in a loop; stack >> depth 1024 vs stack depth 10: 1% slower >> - For an array of 200k integers allocated in a loop; stack >> depth 1024 vs stack depth 10: 3% slower >> >> So basically now moving the maximum stack depth to 1024 but we only >> copy over the stack depths actually used. >> >> For the next webrev, I will be adding a stack depth test to show that >> it works and probably put back the mutex locking so that we can see how >> difficult it is to keep >> thread safe. >> >> Let me know what you think! >> Jc >> >> >> >> On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler > > wrote: >> >> Forgot to say that for my numbers: >> - Not in the test are the actual numbers I got for the various >> array sizes, I ran the program 30 times and parsed the output; here are the >> averages and standard >> deviation: >> 1000: 1.28% average; 1.13% standard deviation >> 10000: 1.59% average; 1.25% standard deviation >> 100000: 1.26% average; 1.26% standard deviation >> >> The 1000/10000/100000 are the sizes of the arrays being >> allocated. These are allocated 100k times and the sampling rate is 111 >> times the size of the array. >> >> Thanks! >> Jc >> >> >> On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler > > wrote: >> >> Hi all, >> >> After a bit of a break, I am back working on this :). As >> before, here are two webrevs: >> >> - Full change set: http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.09/ > asbold/8171119/webrev.09/> >> - Compared to version 8: http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.08_09/ > asbold/8171119/webrev.08_09/> >> (This version is compared to version 8 I last showed but >> ported to the new folder hierarchy) >> >> In this version I have: >> - Handled Thomas' comments from his email of 07/03: >> - Merged the logging to be standard >> - Fixed up the code a bit where asked >> - Added some notes about the code not being >> thread-safe yet >> - Removed additional dead code from the version that >> modifies interpreter/c1/c2 >> - Fixed compiler issues so that it compiles with >> --disable-precompiled-header >> - Tested with ./configure --with-boot-jdk= >> --with-debug-level=slowdebug --disable-precompiled-headers >> >> Additionally, I added a test to check the sanity of the >> sampler: HeapMonitorStatCorrectnessTest >> (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/te >> st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStatCorrectnessTest.java.patch > asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch>) >> - This allocates a number of arrays and checks that we >> obtain the number of samples we want with an accepted error of 5%. 
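The arithmetic behind a correctness check of that kind is straightforward; a standalone sketch, with parameter names that are illustrative rather than the test's:

    #include <cmath>
    #include <cstdint>

    // With one sample expected per interval_bytes allocated on average, the
    // expected count is total allocated bytes / interval, accepted within 5%.
    bool samples_within_tolerance(uint64_t observed_samples,
                                  uint64_t allocations,
                                  uint64_t array_bytes,
                                  uint64_t interval_bytes) {
      double expected = (double)allocations * (double)array_bytes / (double)interval_bytes;
      double error    = std::fabs((double)observed_samples - expected) / expected;
      return error <= 0.05;   // the 5% bound mentioned above
    }
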
I tested >> it 100 times and it >> passed everytime, I can test more if wanted >> - Not in the test are the actual numbers I got for the >> various array sizes, I ran the program 30 times and parsed the output; here >> are the averages and >> standard deviation: >> 1000: 1.28% average; 1.13% standard deviation >> 10000: 1.59% average; 1.25% standard deviation >> 100000: 1.26% average; 1.26% standard deviation >> >> What this means is that we were always at about 1~2% of the >> number of samples the test expected. >> >> Let me know what you think, >> Jc >> >> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler < >> jcbeyler at google.com > wrote: >> >> Hi all, >> >> I apologize, I have not yet handled your remarks but >> thought this new webrev would also be useful to see and comment on perhaps. >> >> Here is the latest webrev, it is generated slightly >> different than the others since now I'm using webrev.ksh without the -N >> option: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/> >> >> And the webrev.07 to webrev.08 diff is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ >> >> >> (Let me know if it works well) >> >> It's a small change between versions but it: >> - provides a fix that makes the average sample rate >> correct (more on that below). >> - fixes the code to actually have it play nicely with >> the fast tlab refill >> - cleaned up a bit the JVMTI text and now use >> jvmtiFrameInfo >> - moved the capability to be onload solo >> >> With this webrev, I've done a small study of the random >> number generator we use here for the sampling rate. I took a small program >> and it can be simplified to: >> >> for (outer loop) >> for (inner loop) >> int[] tmp = new int[arraySize]; >> >> - I've fixed the outer and inner loops to being 800 for >> this experiment, meaning we allocate 640000 times an array of a given array >> size. >> >> - Each program provides the average sample size used for >> the whole execution >> >> - Then, I ran each variation 30 times and then calculated >> the average of the average sample size used for various array sizes. I >> selected the array size to >> be one of the following: 1, 10, 100, 1000. >> >> - When compared to 512kb, the average sample size of 30 >> runs: >> 1: 4.62% of error >> 10: 3.09% of error >> 100: 0.36% of error >> 1000: 0.1% of error >> 10000: 0.03% of error >> >> What it shows is that, depending on the number of >> samples, the average does become better. This is because with an allocation >> of 1 element per array, it >> will take longer to hit one of the thresholds. This is >> seen by looking at the sample count statistic I put in. For the same number >> of iterations (800 * >> 800), the different array sizes provoke: >> 1: 62 samples >> 10: 125 samples >> 100: 788 samples >> 1000: 6166 samples >> 10000: 57721 samples >> >> And of course, the more samples you have, the more sample >> rates you pick, which means that your average gets closer using that math. >> >> Thanks, >> Jc >> >> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler < >> jcbeyler at google.com > wrote: >> >> Thanks Robbin, >> >> This seems to have worked. When I have the next >> webrev ready, we will find out but I'm fairly confident it will work! >> >> Thanks agian! 
>> Jc >> >> On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn < >> robbin.ehn at oracle.com > wrote: >> >> Hi JC, >> >> On 06/29/2017 12:15 AM, JC Beyler wrote: >> >> B) Incremental changes >> >> >> I guess the most common work flow here is using >> mq : >> hg qnew fix_v1 >> edit files >> hg qrefresh >> hg qnew fix_v2 >> edit files >> hg qrefresh >> >> if you do hg log you will see 2 commits >> >> webrev.ksh -r -2 -o my_inc_v1_v2 >> webrev.ksh -o my_full_v2 >> >> >> In your .hgrc you might need: >> [extensions] >> mq = >> >> /Robbin >> >> >> Again another newbiew question here... >> >> For showing the incremental changes, is there >> a link that explains how to do that? I apologize for my newbie questions >> all the time :) >> >> Right now, I do: >> >> ksh ../webrev.ksh -m -N >> >> That generates a webrev.zip and send it to >> Chuck Rasbold. He then uploads it to a new webrev. >> >> I tried commiting my change and adding a >> small change. Then if I just do ksh ../webrev.ksh without any options, it >> seems to produce a similar >> page but now with only the changes I had (so >> the 06-07 comparison you were talking about) and a changeset that has it >> all. I imagine that is >> what you meant. >> >> Which means that my workflow would become: >> >> 1) Make changes >> 2) Make a webrev without any options to show >> just the differences with the tip >> 3) Amend my changes to my local commit so >> that I have it done with >> 4) Go to 1 >> >> Does that seem correct to you? >> >> Note that when I do this, I only see the full >> change of a file in the full change set (Side note here: now the page says >> change set and not >> patch, which is maybe why Serguei was having >> issues?). >> >> Thanks! >> Jc >> >> >> >> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn < >> robbin.ehn at oracle.com > robbin.ehn at oracle.com >> >> wrote: >> >> Hi, >> >> On 06/28/2017 12:04 AM, JC Beyler wrote: >> >> Dear Thomas et al, >> >> Here is the newest webrev: >> http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.07/ > asbold/8171119/webrev.07/> >> > asbold/8171119/webrev.07/ > asbold/8171119/webrev.07/>> >> >> >> >> You have some more bits to in there but >> generally this looks good and really nice with more tests. >> I'll do and deep dive and re-test this >> when I get back from my long vacation with whatever patch version you have >> then. >> >> Also I think it's time you provide >> incremental (v06->07 changes) as well as complete change-sets. >> >> Thanks, Robbin >> >> >> >> >> Thomas, I "think" I have answered >> all your remarks. The summary is: >> >> - The statistic system is up and >> provides insight on what the heap sampler is doing >> - I've noticed that, though the >> sampling rate is at the right mean, we are missing some samples, I have not >> yet tracked out why >> (details below) >> >> - I've run a tiny benchmark that is >> the worse case: it is a very tight loop and allocated a small array >> - In this case, I see no >> overhead when the system is off so that is a good start :) >> - I see right now a high >> overhead in this case when sampling is on. This is not a really too >> surprising but I'm going to see if >> this is consistent with our >> internal implementation. The >> benchmark is really allocation stressful so I'm not too surprised but I >> want to do the due diligence. 
>> >> - The statistic system up is up >> and I have a new test >> http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorStatTest.java.patch >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorStatTest.java.patch> >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorStatTest.java.patch >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorStatTest.java.patch>> >> - I did a bit of a study about >> the random generator here, more details are below but basically it seems to >> work well >> >> - I added a capability but since >> this is the first time doing this, I was not sure I did it right >> - I did add a test though for >> it and the test seems to do what I expect (all methods are failing with the >> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). >> - >> http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch>> >> >> - I still need to figure out what >> to do about the multi-agent vs single-agent issue >> >> - As far as measurements, it >> seems I still need to look at: >> - Why we do the 20 random calls >> first, are they necessary? >> - Look at the mean of the >> sampling rate that the random generator does and also what is actually >> sampled >> - What is the overhead in terms >> of memory/performance when on? >> >> I have inlined my answers, I think I >> got them all in the new webrev, let me know your thoughts. >> >> Thanks again! >> Jc >> >> >> On Fri, Jun 23, 2017 at 3:52 AM, >> Thomas Schatzl > com> >> > thomas.schatzl at oracle.com>> > thomas.schatzl at oracle.com> >> >> > >>> wrote: >> >> Hi, >> >> On Wed, 2017-06-21 at 13:45 >> -0700, JC Beyler wrote: >> > Hi all, >> > >> > First off: Thanks again to >> Robbin and Thomas for their reviews :) >> > >> > Next, I've uploaded a new >> webrev: >> > >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/> >> > asbold/8171119/webrev.06/ > asbold/8171119/webrev.06/>> >> > asbold/8171119/webrev.06/ > asbold/8171119/webrev.06/> >> > asbold/8171119/webrev.06/ > asbold/8171119/webrev.06/>>> >> >> > >> > Here is an update: >> > >> > - @Robbin, I forgot to say >> that yes I need to look at implementing >> > this for the other >> architectures and testing it before it is all >> > ready to go. Is it common to >> have it working on all possible >> > combinations or is there a >> subset that I should be doing first and we >> > can do the others later? >> > - I've tested slowdebug, >> built and ran the JTreg tests I wrote with >> > slowdebug and fixed a few >> more issues >> > - I've refactored a bit of >> the code following Thomas' comments >> > - I think I've handled all >> the comments from Thomas (I put >> > comments inline below for the >> specifics) >> >> Thanks for handling all those. 
>> >> > - Following Thomas' comments >> on statistics, I want to add some >> > quality assurance tests and >> find that the easiest way would be to >> > have a few counters of what >> is happening in the sampler and expose >> > that to the user. >> > - I'll be adding that in >> the next version if no one sees any >> > objections to that. >> > - This will allow me to >> add a sanity test in JTreg about number of >> > samples and average of >> sampling rate >> > >> > @Thomas: I had a few >> questions that I inlined below but I will >> > summarize the "bigger ones" >> here: >> > - You mentioned constants >> are not using the right conventions, I >> > looked around and didn't see >> any convention except normal naming then >> > for static constants. Is that >> right? >> >> I looked through >> https://wiki.openjdk.java.net/display/HotSpot/StyleGui < >> https://wiki.openjdk.java.net/display/HotSpot/StyleGui> >> > /display/HotSpot/StyleGui > /display/HotSpot/StyleGui>> >> > /display/HotSpot/StyleGui > /display/HotSpot/StyleGui> >> > /display/HotSpot/StyleGui > /display/HotSpot/StyleGui>>> >> de and the rule is to "follow >> an existing pattern and must have a >> distinct appearance from other >> names". Which does not help a lot I >> guess :/ The GC team started >> using upper camel case, e.g. >> SomeOtherConstant, but very >> likely this is probably not applied >> consistently throughout. So I >> am fine with not adding another style >> (like kMaxStackDepth with the >> "k" in front with some unknown meaning) >> is fine. >> >> (Chances are you will find that >> style somewhere used anyway too, >> apologies if so :/) >> >> >> Thanks for that link, now I know >> where to look. I used the upper camel case in my code as well then :) I >> should have gotten them all. >> >> >> > PS: I've also inlined my >> answers to Thomas below: >> > >> > On Tue, Jun 13, 2017 at 8:03 >> AM, Thomas Schatzl > > e.com < >> http://e.com> > wrote: >> > > Hi all, >> > > >> > > On Mon, 2017-06-12 at >> 11:11 -0700, JC Beyler wrote: >> > > > Dear all, >> > > > >> > > > I've continued working >> on this and have done the following >> > > webrev: >> > > > >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/> >> > asbold/8171119/webrev.05/ > asbold/8171119/webrev.05/>> >> > asbold/8171119/webrev.05/ > asbold/8171119/webrev.05/> >> > asbold/8171119/webrev.05/ > asbold/8171119/webrev.05/>>> >> >> > > >> > > [...] >> > > > Things I still need to >> do: >> > > > - Have to fix that >> TLAB case for the FastTLABRefill >> > > > - Have to start >> looking at the data to see that it is >> > > consistent and does gather >> the right samples, right frequency, etc. >> > > > - Have to check the >> GC elements and what that produces >> > > > - Run a slowdebug run >> and ensure I fixed all those issues you >> > > saw > Robbin >> > > > >> > > > Thanks for looking at >> the webrev and have a great week! >> > > >> > > scratching a bit on the >> surface of this change, so apologies for >> > > rather shallow comments: >> > > >> > > - >> macroAssembler_x86.cpp:5604: while this is compiler code, and I >> > > am not sure this is final, >> please avoid littering the code with >> > > TODO remarks :) They tend >> to be candidates for later wtf moments >> > > only. >> > > >> > > Just file a CR for that. >> > > >> > Newcomer question: what is a >> CR and not sure I have the rights to do >> > that yet ? :) >> >> Apologies. 
CR is a change >> request, this suggests to file a bug in the >> bug tracker. And you are right, >> you can't just create a new account in >> the OpenJDK JIRA yourselves. :( >> >> >> Ok good to know, I'll continue with >> my own todo list but I'll work hard on not letting it slip in the webrevs >> anymore :) >> >> >> I was mostly referring to the >> "... but it is a TODO" part of that >> comment in >> macroassembler_x86.cpp. Comments about the why of the code >> are appreciated. >> >> [Note that I now understand >> that this is to some degree still work in >> progress. As long as the final >> changeset does no contain TODO's I am >> fine (and it's not a hard >> objection, rather their use in "final" code >> is typically limited in my >> experience)] >> >> 5603 // Currently, if this >> happens, just set back the actual end to >> where it was. >> 5604 // We miss a chance to >> sample here. >> >> Would be okay, if explaining >> "this" and the "why" of missing a chance >> to sample here would be best. >> >> Like maybe: >> >> // If we needed to refill >> TLABs, just set the actual end point to >> // the end of the TLAB again. >> We do not sample here although we could. >> >> Done with your comment, it works >> well in my mind. >> >> I am not sure whether "miss a >> chance to sample" meant "we could, but >> consciously don't because it's >> not that useful" or "it would be >> necessary but don't because >> it's too complicated to do.". >> >> Looking at the original comment >> once more, I am also not sure if that >> comment shouldn't referring to >> the "end" variable (not actual_end) >> because that's the variable >> that is responsible for taking the sampling >> path? (Going from the member >> description of ThreadLocalAllocBuffer). >> >> >> I've moved this code and it no >> longer shows up here but the rationale and answer was: >> >> So.. Yes, end is the variable >> provoking the sampling. Actual end is the actual end of the TLAB. >> >> What was happening here is that the >> code is resetting _end to point towards the end of the new TLAB. Because, >> we now have the end for >> sampling and _actual_end for >> the actual end, we need to update >> the actual_end as well. >> >> Normally, were we to do the real >> work here, we would calculate the (end - start) offset, then do: >> >> - Set the new end to : start + >> (old_end - old_start) >> - Set the actual end like we do here >> now where it because it is the actual end. >> >> Why is this not done here now >> anymore? >> - I was still debating which >> path to take: >> - Do it in the fast refill >> code, it has its perks: >> - In a world where fast >> refills are happening all the time or a lot, we can augment there the code >> to do the sampling >> - Remember what we had as an >> end before leaving the slowpath and check on return >> - This is what I'm doing >> now, it removes the need to go fix up all fast refill paths but if you >> remain in fast refill paths, >> you won't get sampling. I >> have to think of the consequences of >> that, maybe a future change later on? >> - I have the >> statistics now so I'm going to study that >> -> By the way, >> though my statistics are showing I'm missing some samples, if I turn off >> FastTlabRefill, it is the same >> loss so for now, it seems >> this does not occur in my simple >> test. >> >> >> >> But maybe I am only confused >> and it's best to just leave the comment >> away. 
:) >> >> Thinking about it some more, >> doesn't this not-sampling in this case >> mean that sampling does not >> work in any collector that does inline TLAB >> allocation at the moment? (Or >> is inline TLAB alloc automatically >> disabled with sampling somehow?) >> >> That would indeed be a bigger >> TODO then :) >> >> >> Agreed, this remark made me think >> that perhaps as a first step the new way of doing it is better but I did >> have to: >> - Remove the const of the >> ThreadLocalBuffer remaining and hard_end methods >> - Move hard_end out of the header >> file to have a bit more logic there >> >> Please let me know what you think of >> that and if you prefer it this way or changing the fast refills. (I prefer >> this way now because it >> is more incremental). >> >> >> > > - calling >> HeapMonitoring::do_weak_oops() (which should probably be >> > > called weak_oops_do() like >> other similar methods) only if string >> > > deduplication is enabled >> (in g1CollectedHeap.cpp:4511) seems wrong. >> > >> > The call should be at least >> around 6 lines up outside the if. >> > >> > Preferentially in a method >> like process_weak_jni_handles(), including >> > additional logging. (No new >> (G1) gc phase without minimal logging >> > :)). >> > Done but really not sure >> because: >> > >> > I put for logging: >> > log_develop_trace(gc, >> freelist)("G1ConcRegionFreeing [other] : heap >> > monitoring"); >> >> I would think that "gc, ref" >> would be more appropriate log tags for >> this similar to jni handles. >> (I am als not sure what weak >> reference handling has to do with >> G1ConcRegionFreeing, so I am a >> bit puzzled) >> >> >> I was not sure what to put for the >> tags or really as the message. I cleaned it up a bit now to: >> log_develop_trace(gc, >> ref)("HeapSampling [other] : heap monitoring processing"); >> >> >> >> > Since weak_jni_handles didn't >> have logging for me to be inspired >> > from, I did that but >> unconvinced this is what should be done. >> >> The JNI handle processing does >> have logging, but only in >> ReferenceProcessor::process_discovered_references(). >> In >> process_weak_jni_handles() only >> overall time is measured (in a G1 >> specific way, since only G1 >> supports disabling reference procesing) :/ >> >> The code in ReferenceProcessor >> prints both time taken >> referenceProcessor.cpp:254, as >> well as the count, but strangely only in >> debug VMs. >> >> I have no idea why this logging >> is that unimportant to only print that >> in a debug VM. However there >> are reviews out for changing this area a >> bit, so it might be useful to >> wait for that (JDK-8173335). >> >> >> I cleaned it up a bit anyway and now >> it returns the count of objects that are in the system. >> >> >> > > - the change doubles the >> size of >> > > >> CollectedHeap::allocate_from_tlab_slow() above the "small and nice" >> > > threshold. Maybe it could >> be refactored a bit. >> > Done I think, it looks better >> to me :). >> >> In >> ThreadLocalAllocBuffer::handle_sample() I think the >> set_back_actual_end()/pick_next_sample() >> calls could be hoisted out of >> the "if" :) >> >> >> Done! >> >> >> > > - >> referenceProcessor.cpp:261: the change should add logging about >> > > the number of references >> encountered, maybe after the corresponding >> > > "JNI weak reference count" >> log message. 
>> > Just to double check, are you >> saying that you'd like to have the heap >> > sampler to keep in store how >> many sampled objects were encountered in >> > the >> HeapMonitoring::weak_oops_do? >> > - Would a return of the >> method with the number of handled >> > references and logging that >> work? >> >> Yes, it's fine if >> HeapMonitoring::weak_oops_do() only returned the >> number of processed weak oops. >> >> >> Done also (but I admit I have not >> tested the output yet) :) >> >> >> > - Additionally, would you >> prefer it in a separate block with its >> > GCTraceTime? >> >> Yes. Both kinds of information >> is interesting: while the time taken is >> typically more important, the >> next question would be why, and the >> number of references typically >> goes a long way there. >> >> See above though, it is >> probably best to wait a bit. >> >> >> Agreed that I "could" wait but, if >> it's ok, I'll just refactor/remove this when we get closer to something >> final. Either, JDK-8173335 >> has gone in and I will notice it now >> or it will soon and I can change it then. >> >> >> > > - >> threadLocalAllocBuffer.cpp:331: one more "TODO" >> > Removed it and added it to my >> personal todos to look at. >> > > > >> > > - >> threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class >> > > documentation should be >> updated about the sampling additions. I >> > > would have no clue what the >> difference between "actual_end" and >> > > "end" would be from the >> given information. >> > If you are talking about the >> comments in this file, I made them more >> > clear I hope in the new >> webrev. If it was somewhere else, let me know >> > where to change. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Mon Oct 16 18:06:59 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 16 Oct 2017 11:06:59 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> Message-ID: I looked more on changes. First, please, run RBT testing. May be ask SQE to run testing with AOTed java.base as they did before. I did not look on Graal assuming Labs reviewed it already. JVMCI changes looks fine to me. JAOTC. In addition to previous comment about change in DataPatchProcessor.java. ------ AOTCompiledClass.java - I wish metadataName() was defined in corresponding classes instead of manual checking type. It is fine for now but I would add assert for 'else' case that only HotSpotResolvedObjectType ref is expected there. Should we also unify how we generate method name? We use JavaMethodInfo.uniqueMethodName() in few places. Then we have AOTHotSpotResolvedJavaMethod.getNameAndSignature(). And now you added new metadataName(). Hotspot AOT code. ----------------- aotLoader.hpp - you don't need 2 methods. Move UseAOT check into .cpp code and in .hpp you can do: static bool reconcile_dynamic_invoke(InstanceKlass* holder, int index, Method* adapter_method, Klass *appendix_klass) NOT_AOT({ return true; }); aotCodeHeap.* - I don't like that you have separate reconcile_dynamic_klass() method only for one use. 
Instead of passin [2] array pas it as separate parameter so you can pass NULL when it is not defined. Hotspot fingerprint. ------------------- I am concern that you changed logic when and how klass's fingerprint is generated. With your changes it become more expensive: if (UseAOT && ik->supers_have_passed_fingerprint_checks()) { + uint64_t str_fp = _stream->compute_fingerprint(); Why removing !result->is_anonymous() check is not enough?: if (InstanceKlass::should_store_fingerprint()) { result->store_fingerprint(stream->compute_fingerprint()); Thanks, Vladimir On 10/6/17 12:52 PM, dean.long at oracle.com wrote: > On 10/6/17 12:37 PM, dean.long at oracle.com wrote: > >> On 10/6/17 10:03 AM, Igor Veresov wrote: >> >>> >>> >>>> On Oct 6, 2017, at 9:52 AM, Vladimir Kozlov > wrote: >>>> >>>> On 10/5/17 11:16 PM, Igor Veresov wrote: >>>>>> On Oct 5, 2017, at 10:57 AM, dean.long at oracle.com wrote: >>>>>> >>>>>> On 10/4/17 6:27 PM, Vladimir Kozlov wrote: >>>>>> >>>>>>> Yes, I start looking on it. >>>>>>> >>>>>>> In DataPatchProcessor.java why you removed addDependentKlassData() call?: >>>>>>> >>>>>>> + AOTCompiledClass.addFingerprintKlassData(binaryContainer, type); >>>>>>> + ???????????????targetSymbol = AOTCompiledClass.metadataName(type); >>>>>>> ?????????????????gotName = ((action == HotSpotConstantLoadAction.INITIALIZE) ? "got.init." : "got.") + targetSymbol; >>>>>>> - ???????????????methodInfo.addDependentKlassData(binaryContainer, type); >>>>>>> ?????????????} else if (metaspaceConstant.asResolvedJavaMethod() != null && action == >>>>>>> HotSpotConstantLoadAction.LOAD_COUNTERS) { >>>>>>> >>>>>> >>>>>> It is supposed to be an optimization, to prevent adding dependencies when we don't need them. ?We add dependencies >>>>>> elsewhere if we inline a method or reference a field, etc. ?I don't think we need a dependency just because we >>>>>> reference a constant. >>>>>> Igor, do you agree? >>>>>> >>>>> I suppose you?re right. Field offset seems to be the only place where a dependency would be required and we should >>>>> get it covered. Perhaps this was added before we had field access recording. But I?d test it in case something pops >>>>> up (although nothing come to mind right now). >>>> >>>> What about allocations and runtime guard checks (class checks)? >>> >>> >>> Yes, good point. Allocation will have the size of the object as a constant, which is definitely something we need a >>> dependency for. So either we need to collect all types allocated in the parser, or leave that statement as it were. >>> Perhaps we need a followup RFE to clean this up. >>> >> >> OK let me see if I can simply revert that change.? There may be an ordering problem that I was trying to fix at the >> same time. >> > > I forgot to mention, if it's in the metadata, then there should be a dependency, and we have an assert that checks for > that.? Do all allocations and class checks generate a metadata entry?? But I agree that a followup RFE is safer. > > dl > > >> dl >> >>> igor >>> >>>> >>>> Vladimir >>>> >>>>> igor >>>>>>> Is HotSpotConstantPoolObject is real oop (Java object)? oop_got array is scanned for oops only. >>>>>>> >>>>>> >>>>>> Yes, it's the appendix object. >>>>>> >>>>>>> Can you explain why the same class can have several Metaspace Names? Are all of them correspond to one class (and >>>>>>> its methods)? Should we do more in load_klass_data() in such case. 
>>>>>>> >>>>>> >>>>>> It is a many to one mapping (aliases) for anonymous classes because we can't rely on the temporary name that the >>>>>> JVM creates. Regular classes use load_klass_data but anonymous classes don't. ?I have a TODO in >>>>>> AOTCodeHeap::reconcile_dynamic_klass() for loading code for anonymous classes, but it is disabled because I >>>>>> haven't implemented AOT of anonymous classes yet: >>>>>> >>>>>> ??// TODO: hook up any AOT code >>>>>> ??// load_klass_data(dyno_data, thread); >>>>>> >>>>>>> Please, check consumption of Java heap since you are passing oops for metadata generation instead of strings >>>>>>> through JVMCI. >>>>>>> >>>>>> >>>>>> OK. >>>>>> >>>>>> dl >>>>>> >>>>>>> Vladimir >>>>>>> >>>>>>> On 10/4/17 2:50 PM, dean.long at oracle.com wrote: >>>>>>>> Hi Vladimir, do you have time to review this? >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>> >>>>>>>> On 9/11/17 7:21 PM, Dean Long wrote: >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132547 >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~dlong/8132547/ >>>>>>>>> >>>>>>>>> This enhancement is a first step in supporting invokedynamic instructions in AOT. ?Previously, when we saw an >>>>>>>>> invokedynamic instruction, or any anonymous class, we would generate code to bail out and deoptimize. ?With >>>>>>>>> this changeset we go a little further and call into the runtime to resolve the dynamic constant pool entry, >>>>>>>>> running the bootstrap method, and returning the adapter method and appendix object. ?Like class initialization >>>>>>>>> in AOT, we only do this the first time through. Because AOT double-checks classes using fingerprints and >>>>>>>>> symbolic names, special care was required to handle anonymous class names. ?The solution I chose was to name >>>>>>>>> anonymous types with aliases based on their constant pool location ("adapter" and >>>>>>>>> appendix"). >>>>>>>>> >>>>>>>>> Future work is needed to AOT-compile the anonymous classes and/or inline through them, so this change is not >>>>>>>>> expected to affect AOT performance. ?In my tests I was not able to measure any difference. >>>>>>>>> >>>>>>>>> Upstream Graal changes have already been pushed. ?I broke the JVMCI and hotspot changes into separate webrevs. >>>>>>>>> >>>>>>>>> dl >>> >> > From igor.veresov at oracle.com Mon Oct 16 23:51:48 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 16 Oct 2017 16:51:48 -0700 Subject: RFR(XS) 8189409: [AOT] Fix paths in aot test scripts Message-ID: This fixes paths in a couple of more places after repo consolidation. Webrev: http://cr.openjdk.java.net/~iveresov/8189409/webrev.00/ Thanks, igor From dean.long at oracle.com Tue Oct 17 07:57:00 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 17 Oct 2017 00:57:00 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> Message-ID: <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> New hotspot webrev is here: http://cr.openjdk.java.net/~dlong/8132547/hs.2/ Comments inlined below... On 10/16/17 11:06 AM, Vladimir Kozlov wrote: > I looked more on changes. > > First, please, run RBT testing. May be ask SQE to run testing with > AOTed java.base as they did before. 
> > I did not look on Graal assuming Labs reviewed it already. > JVMCI changes looks fine to me. > > JAOTC. In addition to previous comment about change in > DataPatchProcessor.java. > ------ > > AOTCompiledClass.java - I wish metadataName() was defined in > corresponding classes instead of manual checking type. To do that I would probably need to wrap the JVMCI types in new JAOTC types.? I don't want to pollute the JVMCI types. > It is fine for now but I would add assert for 'else' case that only > HotSpotResolvedObjectType ref is expected there. > Sure, thanks for catching that.? The cast will fail with an exception without the assert, but the assert can give a more informative error message. > Should we also unify how we generate method name? > We use JavaMethodInfo.uniqueMethodName() in few places. > Then we have AOTHotSpotResolvedJavaMethod.getNameAndSignature(). > And now you added new metadataName(). > OK, I filed RFE 8189411. > Hotspot AOT code. > ----------------- > > aotLoader.hpp - you don't need 2 methods. Move UseAOT check into .cpp > code and in .hpp you can do: > > static bool reconcile_dynamic_invoke(InstanceKlass* holder, int index, > Method* adapter_method, Klass *appendix_klass) NOT_AOT({ return true; }); > > aotCodeHeap.* - I don't like that you have separate > reconcile_dynamic_klass() method only for one use. Instead of passin > [2] array pas it as separate parameter so you can pass NULL when it is > not defined. > OK. > Hotspot fingerprint. > ------------------- > > I am concern that you changed logic when and how klass's fingerprint > is generated. With your changes it become more expensive: > > ?? if (UseAOT && ik->supers_have_passed_fingerprint_checks()) { > +??? uint64_t str_fp = _stream->compute_fingerprint(); > You are right, I will revert these changes that were left over from an earlier version. > ?Why removing !result->is_anonymous() check is not enough?: > > ?if (InstanceKlass::should_store_fingerprint()) { > ?? result->store_fingerprint(stream->compute_fingerprint()); > Because InstanceKlass::should_store_fingerprint() will return false for an anonymous class. dl > Thanks, > Vladimir > > On 10/6/17 12:52 PM, dean.long at oracle.com wrote: >> On 10/6/17 12:37 PM, dean.long at oracle.com wrote: >> >>> On 10/6/17 10:03 AM, Igor Veresov wrote: >>> >>>> >>>> >>>>> On Oct 6, 2017, at 9:52 AM, Vladimir Kozlov >>>>> > >>>>> wrote: >>>>> >>>>> On 10/5/17 11:16 PM, Igor Veresov wrote: >>>>>>> On Oct 5, 2017, at 10:57 AM, dean.long at oracle.com >>>>>>> wrote: >>>>>>> >>>>>>> On 10/4/17 6:27 PM, Vladimir Kozlov wrote: >>>>>>> >>>>>>>> Yes, I start looking on it. >>>>>>>> >>>>>>>> In DataPatchProcessor.java why you removed >>>>>>>> addDependentKlassData() call?: >>>>>>>> >>>>>>>> + AOTCompiledClass.addFingerprintKlassData(binaryContainer, type); >>>>>>>> + ???????????????targetSymbol = >>>>>>>> AOTCompiledClass.metadataName(type); >>>>>>>> ?????????????????gotName = ((action == >>>>>>>> HotSpotConstantLoadAction.INITIALIZE) ? "got.init." : "got.") + >>>>>>>> targetSymbol; >>>>>>>> - >>>>>>>> ???????????????methodInfo.addDependentKlassData(binaryContainer, >>>>>>>> type); >>>>>>>> ?????????????} else if >>>>>>>> (metaspaceConstant.asResolvedJavaMethod() != null && action == >>>>>>>> HotSpotConstantLoadAction.LOAD_COUNTERS) { >>>>>>>> >>>>>>> >>>>>>> It is supposed to be an optimization, to prevent adding >>>>>>> dependencies when we don't need them. ?We add dependencies >>>>>>> elsewhere if we inline a method or reference a field, etc. 
?I >>>>>>> don't think we need a dependency just because we reference a >>>>>>> constant. >>>>>>> Igor, do you agree? >>>>>>> >>>>>> I suppose you?re right. Field offset seems to be the only place >>>>>> where a dependency would be required and we should get it >>>>>> covered. Perhaps this was added before we had field access >>>>>> recording. But I?d test it in case something pops up (although >>>>>> nothing come to mind right now). >>>>> >>>>> What about allocations and runtime guard checks (class checks)? >>>> >>>> >>>> Yes, good point. Allocation will have the size of the object as a >>>> constant, which is definitely something we need a dependency for. >>>> So either we need to collect all types allocated in the parser, or >>>> leave that statement as it were. Perhaps we need a followup RFE to >>>> clean this up. >>>> >>> >>> OK let me see if I can simply revert that change.? There may be an >>> ordering problem that I was trying to fix at the same time. >>> >> >> I forgot to mention, if it's in the metadata, then there should be a >> dependency, and we have an assert that checks for that.? Do all >> allocations and class checks generate a metadata entry?? But I agree >> that a followup RFE is safer. >> >> dl >> >> >>> dl >>> >>>> igor >>>> >>>>> >>>>> Vladimir >>>>> >>>>>> igor >>>>>>>> Is HotSpotConstantPoolObject is real oop (Java object)? oop_got >>>>>>>> array is scanned for oops only. >>>>>>>> >>>>>>> >>>>>>> Yes, it's the appendix object. >>>>>>> >>>>>>>> Can you explain why the same class can have several Metaspace >>>>>>>> Names? Are all of them correspond to one class (and its >>>>>>>> methods)? Should we do more in load_klass_data() in such case. >>>>>>>> >>>>>>> >>>>>>> It is a many to one mapping (aliases) for anonymous classes >>>>>>> because we can't rely on the temporary name that the JVM >>>>>>> creates. Regular classes use load_klass_data but anonymous >>>>>>> classes don't. ?I have a TODO in >>>>>>> AOTCodeHeap::reconcile_dynamic_klass() for loading code for >>>>>>> anonymous classes, but it is disabled because I haven't >>>>>>> implemented AOT of anonymous classes yet: >>>>>>> >>>>>>> ??// TODO: hook up any AOT code >>>>>>> ??// load_klass_data(dyno_data, thread); >>>>>>> >>>>>>>> Please, check consumption of Java heap since you are passing >>>>>>>> oops for metadata generation instead of strings through JVMCI. >>>>>>>> >>>>>>> >>>>>>> OK. >>>>>>> >>>>>>> dl >>>>>>> >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 10/4/17 2:50 PM, dean.long at oracle.com >>>>>>>> wrote: >>>>>>>>> Hi Vladimir, do you have time to review this? >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> dl >>>>>>>>> >>>>>>>>> >>>>>>>>> On 9/11/17 7:21 PM, Dean Long wrote: >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132547 >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~dlong/8132547/ >>>>>>>>>> >>>>>>>>>> This enhancement is a first step in supporting invokedynamic >>>>>>>>>> instructions in AOT. ?Previously, when we saw an >>>>>>>>>> invokedynamic instruction, or any anonymous class, we would >>>>>>>>>> generate code to bail out and deoptimize. ?With this >>>>>>>>>> changeset we go a little further and call into the runtime to >>>>>>>>>> resolve the dynamic constant pool entry, running the >>>>>>>>>> bootstrap method, and returning the adapter method and >>>>>>>>>> appendix object. ?Like class initialization in AOT, we only >>>>>>>>>> do this the first time through. 
Because AOT double-checks >>>>>>>>>> classes using fingerprints and symbolic names, special care >>>>>>>>>> was required to handle anonymous class names. ?The solution I >>>>>>>>>> chose was to name anonymous types with aliases based on their >>>>>>>>>> constant pool location ("adapter" and >>>>>>>>>> appendix"). >>>>>>>>>> >>>>>>>>>> Future work is needed to AOT-compile the anonymous classes >>>>>>>>>> and/or inline through them, so this change is not expected to >>>>>>>>>> affect AOT performance. ?In my tests I was not able to >>>>>>>>>> measure any difference. >>>>>>>>>> >>>>>>>>>> Upstream Graal changes have already been pushed. ?I broke the >>>>>>>>>> JVMCI and hotspot changes into separate webrevs. >>>>>>>>>> >>>>>>>>>> dl >>>> >>> >> From george.triantafillou at oracle.com Tue Oct 17 14:59:52 2017 From: george.triantafillou at oracle.com (George Triantafillou) Date: Tue, 17 Oct 2017 10:59:52 -0400 Subject: RFR(XS) 8189409: [AOT] Fix paths in aot test scripts In-Reply-To: References: Message-ID: <0f64832d-0625-fa88-1ae8-55608dc73614@oracle.com> Hi Igor, Looks good. -George On 10/16/2017 7:51 PM, Igor Veresov wrote: > This fixes paths in a couple of more places after repo consolidation. > > Webrev: http://cr.openjdk.java.net/~iveresov/8189409/webrev.00/ > > Thanks, > igor From vladimir.kozlov at oracle.com Tue Oct 17 18:15:23 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Oct 2017 11:15:23 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> Message-ID: <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> On 10/17/17 12:57 AM, dean.long at oracle.com wrote: > New hotspot webrev is here: > > http://cr.openjdk.java.net/~dlong/8132547/hs.2/ > > Comments inlined below... > > > On 10/16/17 11:06 AM, Vladimir Kozlov wrote: > >> I looked more on changes. >> >> First, please, run RBT testing. May be ask SQE to run testing with >> AOTed java.base as they did before. >> >> I did not look on Graal assuming Labs reviewed it already. >> JVMCI changes looks fine to me. >> >> JAOTC. In addition to previous comment about change in >> DataPatchProcessor.java. >> ------ >> >> AOTCompiledClass.java - I wish metadataName() was defined in >> corresponding classes instead of manual checking type. > > To do that I would probably need to wrap the JVMCI types in new JAOTC > types.? I don't want to pollute the JVMCI types. Agree. > >> It is fine for now but I would add assert for 'else' case that only >> HotSpotResolvedObjectType ref is expected there. >> > > Sure, thanks for catching that.? The cast will fail with an exception > without the assert, but the assert can give a more informative error > message. > >> Should we also unify how we generate method name? >> We use JavaMethodInfo.uniqueMethodName() in few places. >> Then we have AOTHotSpotResolvedJavaMethod.getNameAndSignature(). >> And now you added new metadataName(). >> > > OK, I filed RFE 8189411. Okay. > >> Hotspot AOT code. >> ----------------- >> >> aotLoader.hpp - you don't need 2 methods. 
Move UseAOT check into .cpp >> code and in .hpp you can do: >> >> static bool reconcile_dynamic_invoke(InstanceKlass* holder, int index, >> Method* adapter_method, Klass *appendix_klass) NOT_AOT({ return true; }); >> >> aotCodeHeap.* - I don't like that you have separate >> reconcile_dynamic_klass() method only for one use. Instead of passin >> [2] array pas it as separate parameter so you can pass NULL when it is >> not defined. >> > > OK. > >> Hotspot fingerprint. >> ------------------- >> >> I am concern that you changed logic when and how klass's fingerprint >> is generated. With your changes it become more expensive: >> >> ?? if (UseAOT && ik->supers_have_passed_fingerprint_checks()) { >> +??? uint64_t str_fp = _stream->compute_fingerprint(); >> > > You are right, I will revert these changes that were left over from an > earlier version. > >> ?Why removing !result->is_anonymous() check is not enough?: >> >> ?if (InstanceKlass::should_store_fingerprint()) { >> ?? result->store_fingerprint(stream->compute_fingerprint()); >> > > Because InstanceKlass::should_store_fingerprint() will return false for > an anonymous class. should_store_fingerprint() only checks flags. Do you mean it to return 'true' during execution too for anonymous classes? But next code will recalculate fingerprint for all classes!!! when you need compute only for anonymous: + if (result->has_stored_fingerprint()) { + result->store_fingerprint(stream->compute_fingerprint()); } Thanks, Vladimir > > dl > > >> Thanks, >> Vladimir >> >> On 10/6/17 12:52 PM, dean.long at oracle.com wrote: >>> On 10/6/17 12:37 PM, dean.long at oracle.com wrote: >>> >>>> On 10/6/17 10:03 AM, Igor Veresov wrote: >>>> >>>>> >>>>> >>>>>> On Oct 6, 2017, at 9:52 AM, Vladimir Kozlov >>>>>> > >>>>>> wrote: >>>>>> >>>>>> On 10/5/17 11:16 PM, Igor Veresov wrote: >>>>>>>> On Oct 5, 2017, at 10:57 AM, dean.long at oracle.com >>>>>>>> wrote: >>>>>>>> >>>>>>>> On 10/4/17 6:27 PM, Vladimir Kozlov wrote: >>>>>>>> >>>>>>>>> Yes, I start looking on it. >>>>>>>>> >>>>>>>>> In DataPatchProcessor.java why you removed >>>>>>>>> addDependentKlassData() call?: >>>>>>>>> >>>>>>>>> + AOTCompiledClass.addFingerprintKlassData(binaryContainer, type); >>>>>>>>> + ???????????????targetSymbol = >>>>>>>>> AOTCompiledClass.metadataName(type); >>>>>>>>> ?????????????????gotName = ((action == >>>>>>>>> HotSpotConstantLoadAction.INITIALIZE) ? "got.init." : "got.") + >>>>>>>>> targetSymbol; >>>>>>>>> - >>>>>>>>> ???????????????methodInfo.addDependentKlassData(binaryContainer, type); >>>>>>>>> >>>>>>>>> ?????????????} else if >>>>>>>>> (metaspaceConstant.asResolvedJavaMethod() != null && action == >>>>>>>>> HotSpotConstantLoadAction.LOAD_COUNTERS) { >>>>>>>>> >>>>>>>> >>>>>>>> It is supposed to be an optimization, to prevent adding >>>>>>>> dependencies when we don't need them. ?We add dependencies >>>>>>>> elsewhere if we inline a method or reference a field, etc. ?I >>>>>>>> don't think we need a dependency just because we reference a >>>>>>>> constant. >>>>>>>> Igor, do you agree? >>>>>>>> >>>>>>> I suppose you?re right. Field offset seems to be the only place >>>>>>> where a dependency would be required and we should get it >>>>>>> covered. Perhaps this was added before we had field access >>>>>>> recording. But I?d test it in case something pops up (although >>>>>>> nothing come to mind right now). >>>>>> >>>>>> What about allocations and runtime guard checks (class checks)? >>>>> >>>>> >>>>> Yes, good point. 
Allocation will have the size of the object as a >>>>> constant, which is definitely something we need a dependency for. >>>>> So either we need to collect all types allocated in the parser, or >>>>> leave that statement as it were. Perhaps we need a followup RFE to >>>>> clean this up. >>>>> >>>> >>>> OK let me see if I can simply revert that change.? There may be an >>>> ordering problem that I was trying to fix at the same time. >>>> >>> >>> I forgot to mention, if it's in the metadata, then there should be a >>> dependency, and we have an assert that checks for that.? Do all >>> allocations and class checks generate a metadata entry?? But I agree >>> that a followup RFE is safer. >>> >>> dl >>> >>> >>>> dl >>>> >>>>> igor >>>>> >>>>>> >>>>>> Vladimir >>>>>> >>>>>>> igor >>>>>>>>> Is HotSpotConstantPoolObject is real oop (Java object)? oop_got >>>>>>>>> array is scanned for oops only. >>>>>>>>> >>>>>>>> >>>>>>>> Yes, it's the appendix object. >>>>>>>> >>>>>>>>> Can you explain why the same class can have several Metaspace >>>>>>>>> Names? Are all of them correspond to one class (and its >>>>>>>>> methods)? Should we do more in load_klass_data() in such case. >>>>>>>>> >>>>>>>> >>>>>>>> It is a many to one mapping (aliases) for anonymous classes >>>>>>>> because we can't rely on the temporary name that the JVM >>>>>>>> creates. Regular classes use load_klass_data but anonymous >>>>>>>> classes don't. ?I have a TODO in >>>>>>>> AOTCodeHeap::reconcile_dynamic_klass() for loading code for >>>>>>>> anonymous classes, but it is disabled because I haven't >>>>>>>> implemented AOT of anonymous classes yet: >>>>>>>> >>>>>>>> ??// TODO: hook up any AOT code >>>>>>>> ??// load_klass_data(dyno_data, thread); >>>>>>>> >>>>>>>>> Please, check consumption of Java heap since you are passing >>>>>>>>> oops for metadata generation instead of strings through JVMCI. >>>>>>>>> >>>>>>>> >>>>>>>> OK. >>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 10/4/17 2:50 PM, dean.long at oracle.com >>>>>>>>> wrote: >>>>>>>>>> Hi Vladimir, do you have time to review this? >>>>>>>>>> >>>>>>>>>> thanks, >>>>>>>>>> >>>>>>>>>> dl >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 9/11/17 7:21 PM, Dean Long wrote: >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132547 >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~dlong/8132547/ >>>>>>>>>>> >>>>>>>>>>> This enhancement is a first step in supporting invokedynamic >>>>>>>>>>> instructions in AOT. ?Previously, when we saw an >>>>>>>>>>> invokedynamic instruction, or any anonymous class, we would >>>>>>>>>>> generate code to bail out and deoptimize. ?With this >>>>>>>>>>> changeset we go a little further and call into the runtime to >>>>>>>>>>> resolve the dynamic constant pool entry, running the >>>>>>>>>>> bootstrap method, and returning the adapter method and >>>>>>>>>>> appendix object. ?Like class initialization in AOT, we only >>>>>>>>>>> do this the first time through. Because AOT double-checks >>>>>>>>>>> classes using fingerprints and symbolic names, special care >>>>>>>>>>> was required to handle anonymous class names. ?The solution I >>>>>>>>>>> chose was to name anonymous types with aliases based on their >>>>>>>>>>> constant pool location ("adapter" and >>>>>>>>>>> appendix"). >>>>>>>>>>> >>>>>>>>>>> Future work is needed to AOT-compile the anonymous classes >>>>>>>>>>> and/or inline through them, so this change is not expected to >>>>>>>>>>> affect AOT performance. ?In my tests I was not able to >>>>>>>>>>> measure any difference. 
>>>>>>>>>>> >>>>>>>>>>> Upstream Graal changes have already been pushed. ?I broke the >>>>>>>>>>> JVMCI and hotspot changes into separate webrevs. >>>>>>>>>>> >>>>>>>>>>> dl >>>>> >>>> >>> > From vladimir.kozlov at oracle.com Tue Oct 17 18:21:22 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Oct 2017 11:21:22 -0700 Subject: RFR(XS) 8189409: [AOT] Fix paths in aot test scripts In-Reply-To: References: Message-ID: <9c3ff9aa-cfcb-1f82-3fa4-c6a5666486ef@oracle.com> Looks good. Thanks, Vladimir On 10/16/17 4:51 PM, Igor Veresov wrote: > This fixes paths in a couple of more places after repo consolidation. > > Webrev: http://cr.openjdk.java.net/~iveresov/8189409/webrev.00/ > > Thanks, > igor > From dean.long at oracle.com Tue Oct 17 20:41:41 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 17 Oct 2017 13:41:41 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> Message-ID: <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> Comment below... On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>> ?Why removing !result->is_anonymous() check is not enough?: >>> >>> ?if (InstanceKlass::should_store_fingerprint()) { >>> result->store_fingerprint(stream->compute_fingerprint()); >>> >> >> Because InstanceKlass::should_store_fingerprint() will return false >> for an anonymous class. > > should_store_fingerprint() only checks flags. Do you mean it to return > 'true' during execution too for anonymous classes? But next code will > recalculate fingerprint for all classes!!! when you need compute only > for anonymous: > > +? if (result->has_stored_fingerprint()) { > + result->store_fingerprint(stream->compute_fingerprint()); > ?? } > It should be for anonymous only (in AOT mode), unless I'm missing something: 1982 bool InstanceKlass::has_stored_fingerprint() const { 1983 #if INCLUDE_AOT 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); 1985 #else 1986 return false; 1987 #endif 1988 } 1960 bool InstanceKlass::should_store_fingerprint(bool is_anonymous) { [...]1971 if (UseAOT && is_anonymous) { 1972 // (3) We are using AOT code from a shared library and see an anonymous class 1973 return true; 1974 } dl > Thanks, > Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Tue Oct 17 20:49:20 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 17 Oct 2017 13:49:20 -0700 Subject: RFR(XS) 8189409: [AOT] Fix paths in aot test scripts In-Reply-To: <9c3ff9aa-cfcb-1f82-3fa4-c6a5666486ef@oracle.com> References: <9c3ff9aa-cfcb-1f82-3fa4-c6a5666486ef@oracle.com> Message-ID: <87D1F4CA-C2D3-4A16-B6A1-94E48A989046@oracle.com> Thanks! > On Oct 17, 2017, at 11:21 AM, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir > > On 10/16/17 4:51 PM, Igor Veresov wrote: >> This fixes paths in a couple of more places after repo consolidation. 
>> Webrev: http://cr.openjdk.java.net/~iveresov/8189409/webrev.00/ >> Thanks, >> igor From vladimir.kozlov at oracle.com Tue Oct 17 22:30:34 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Oct 2017 15:30:34 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> Message-ID: On 10/17/17 1:41 PM, dean.long at oracle.com wrote: > Comment below... > > > On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>> ?Why removing !result->is_anonymous() check is not enough?: >>>> >>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>> result->store_fingerprint(stream->compute_fingerprint()); >>>> >>> >>> Because InstanceKlass::should_store_fingerprint() will return false >>> for an anonymous class. >> >> should_store_fingerprint() only checks flags. Do you mean it to return >> 'true' during execution too for anonymous classes? But next code will >> recalculate fingerprint for all classes!!! when you need compute only >> for anonymous: >> >> +? if (result->has_stored_fingerprint()) { >> + result->store_fingerprint(stream->compute_fingerprint()); >> ?? } >> > > It should be for anonymous only (in AOT mode), unless I'm missing something: > > 1982 bool InstanceKlass::has_stored_fingerprint() const { > 1983 #if INCLUDE_AOT > 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); I mean should_store_fingerprint() will return true for all klasses in CDS too. So you recalculating them. Vladimir > 1985 #else > 1986 return false; > 1987 #endif > 1988 } > > 1960 bool InstanceKlass::should_store_fingerprint(bool is_anonymous) { > [...]1971 if (UseAOT && is_anonymous) { > 1972 // (3) We are using AOT code from a shared library and see an > anonymous class > 1973 return true; > 1974 } dl > >> Thanks, >> Vladimir > From dean.long at oracle.com Wed Oct 18 01:36:50 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 17 Oct 2017 18:36:50 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> Message-ID: <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> On 10/17/17 3:30 PM, Vladimir Kozlov wrote: > > On 10/17/17 1:41 PM, dean.long at oracle.com wrote: >> Comment below... >> >> >> On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>>> ?Why removing !result->is_anonymous() check is not enough?: >>>>> >>>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>>> result->store_fingerprint(stream->compute_fingerprint()); >>>>> >>>> >>>> Because InstanceKlass::should_store_fingerprint() will return false >>>> for an anonymous class. >>> >>> should_store_fingerprint() only checks flags. 
Do you mean it to >>> return 'true' during execution too for anonymous classes? But next >>> code will recalculate fingerprint for all classes!!! when you need >>> compute only for anonymous: >>> >>> +? if (result->has_stored_fingerprint()) { >>> + result->store_fingerprint(stream->compute_fingerprint()); >>> ?? } >>> >> >> It should be for anonymous only (in AOT mode), unless I'm missing >> something: >> >> 1982 bool InstanceKlass::has_stored_fingerprint() const { >> 1983 #if INCLUDE_AOT >> 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); > > I mean should_store_fingerprint() will return true for all klasses in > CDS too. So you recalculating them. > I see what you mean now.? New webrev: http://cr.openjdk.java.net/~dlong/8132547//hs.3/ dl > Vladimir > >> 1985 #else >> 1986?? return false; >> 1987 #endif >> 1988 } >> >> 1960 bool InstanceKlass::should_store_fingerprint(bool is_anonymous) >> { [...]1971 if (UseAOT && is_anonymous) { >> 1972 // (3) We are using AOT code from a shared library and see an >> anonymous class >> 1973 return true; >> 1974 } dl >> >>> Thanks, >>> Vladimir >> From igor.ignatyev at oracle.com Wed Oct 18 04:45:56 2017 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 17 Oct 2017 21:45:56 -0700 Subject: RFR(S) : 8186618 : [TESTBUG] Test applications/ctw/Modules.java doesn't have timeout and hang on windows Message-ID: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html > 546 lines changed: 188 ins; 88 del; 270 mod; Hi all, could you please review this fix for ctw test? in some configurations the test takes too much time, it also didn't have timeout, so in case of a hang, e.g. due to JDK-8189604, no one interrupted its execution. the fix splits the test into several tests, one for each jigsaw module, which not only improves the tests' reliability and reportability, but also speeds up the execution via parallel execution. 2 hours timeout, which is enough for each particular module, has been added to all tests. the patch also puts ctw for java.desktop and jdk.jconsole modules in the problem list on windows. webrev: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html testing: applications/ctw/modules tests JBS: https://bugs.openjdk.java.net/browse/JDK-8186618 Thanks, -- Igor From nils.eliasson at oracle.com Wed Oct 18 08:03:19 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 18 Oct 2017 10:03:19 +0200 Subject: Reduced MaxVectorSize and vector type initialization Message-ID: HI, I ran into a problem with the interaction between MaxVectorSize and the UseAVX. For some AMD CPUs we limit the vector size to 16 because it gives the best performance. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. > FLAG_SET_DEFAULT(MaxVectorSize, 16); > } Whenf MaxVecorSize is set to 16 it has the sideeffect that the TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though the platform has the capability. Type.cpp:~660 [...] > if (Matcher::vector_size_supported(T_FLOAT,8)) { > TypeVect::VECTY = TypeVect::make(T_FLOAT,8); > } [...] > mreg2type[Op_VecY] = TypeVect::VECTY; In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has effects on specific vector registers. Here we have a mismatch. On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. 
We will also hit asserts in a few places like: assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), "sanity"); Shouldn't the type initialization in type.cpp be dependent on feature flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector registers are initialized if the platform supports them, but they might not be used if MaxVectorSize is limited.) This is a patch that solves the problem, but I have not convinced myself that it is the right way: http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ Feedback appreciated, Regards, Nils Eliasson http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From christos at zoulas.com Wed Oct 18 12:25:52 2017 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 18 Oct 2017 08:25:52 -0400 Subject: what is the SLA for responding to bugs? Message-ID: <20171018122552.DD4B117FDB6@rebar.astron.com> I am asking because I filed: https://bugs.openjdk.java.net/browse/JDK-8189172 and I have not heard a word since. Thanks, christos From volker.simonis at gmail.com Wed Oct 18 13:54:35 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 18 Oct 2017 13:54:35 +0000 Subject: what is the SLA for responding to bugs? In-Reply-To: <20171018122552.DD4B117FDB6@rebar.astron.com> References: <20171018122552.DD4B117FDB6@rebar.astron.com> Message-ID: Christos, this is an open source project, so you get exactly the SLA you are paying for :) If you want more, you could either kindly ask or get a support contract from Oracle or any other OpenJDK distributor. Regards, Volker Christos Zoulas schrieb am Mi. 18. Okt. 2017 um 14:26: > > I am asking because I filed: > https://bugs.openjdk.java.net/browse/JDK-8189172 > and I have not heard a word since. > > Thanks, > > christos > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Wed Oct 18 14:11:40 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 18 Oct 2017 16:11:40 +0200 Subject: RFR(XS): 8188223: IfNode::range_check_trap_proj() should handler dying subgraph with single if proj In-Reply-To: References: Message-ID: Thanks for the review, Vladimir. I followed your suggestion. Here is a ready to push changeset: http://cr.openjdk.java.net/~roland/8188223/8188223.patch Roland. From rwestrel at redhat.com Wed Oct 18 14:16:14 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 18 Oct 2017 16:16:14 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> Message-ID: Here is an updated webrev with Dean's suggestion: http://cr.openjdk.java.net/~roland/8188151/webrev.01/ Can this be considered reviewed by you, Dean? Roland. From christos at zoulas.com Wed Oct 18 14:47:36 2017 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 18 Oct 2017 10:47:36 -0400 Subject: what is the SLA for responding to bugs? In-Reply-To: from Volker Simonis (Oct 18, 1:54pm) Message-ID: <20171018144736.330FC17FDB6@rebar.astron.com> On Oct 18, 1:54pm, volker.simonis at gmail.com (Volker Simonis) wrote: -- Subject: Re: what is the SLA for responding to bugs? | Christos, | | this is an open source project, so you get exactly the SLA you are paying | for :) | | If you want more, you could either kindly ask or get a support contract | from Oracle or any other OpenJDK distributor. 
| | Regards, | Volker You are right, I should get paid support. I tried and I got 404... The link from http://bugreport.java.com to "Oracle Java SE Support" goes to: https://www.oracle.com/java/java-se-support.html Best, christos From vladimir.x.ivanov at oracle.com Wed Oct 18 14:48:14 2017 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 18 Oct 2017 17:48:14 +0300 Subject: what is the SLA for responding to bugs? In-Reply-To: <20171018122552.DD4B117FDB6@rebar.astron.com> References: <20171018122552.DD4B117FDB6@rebar.astron.com> Message-ID: <3f2cc7c3-b849-e034-c9d2-511d0a0acf66@oracle.com> Thanks for the detailed report, Christos. I'm not aware about any SLA, but development team tries to triage incoming bugs in prompt manner. Unfortunately, the bug was filed w/o subcategory set, so it went unnoticed. I was able to reproduce it and added some root cause analysis, but I can't promise anything about fixing it in 8u. (And the fact that 9 isn't affected makes it less likely.) Best regards, Vladimir Ivanov On 10/18/17 3:25 PM, christos at zoulas.com wrote: > I am asking because I filed: https://bugs.openjdk.java.net/browse/JDK-8189172 > and I have not heard a word since. > > Thanks, > > christos > From christos at zoulas.com Wed Oct 18 14:50:50 2017 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 18 Oct 2017 10:50:50 -0400 Subject: what is the SLA for responding to bugs? In-Reply-To: <3f2cc7c3-b849-e034-c9d2-511d0a0acf66@oracle.com> from Vladimir Ivanov (Oct 18, 5:48pm) Message-ID: <20171018145050.263FC17FDBA@rebar.astron.com> On Oct 18, 5:48pm, vladimir.x.ivanov at oracle.com (Vladimir Ivanov) wrote: -- Subject: Re: what is the SLA for responding to bugs? | Thanks for the detailed report, Christos. | | I'm not aware about any SLA, but development team tries to triage | incoming bugs in prompt manner. Unfortunately, the bug was filed w/o | subcategory set, so it went unnoticed. | | I was able to reproduce it and added some root cause analysis, but I | can't promise anything about fixing it in 8u. (And the fact that 9 isn't | affected makes it less likely.) Thanks you very much! I am trying to get some paid support on my side to see if this can be fixed in 8... Best, christos From dean.long at oracle.com Wed Oct 18 17:01:25 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 18 Oct 2017 10:01:25 -0700 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: References: Message-ID: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> How about initializing TypeVect::VECTY and friends unconditionally?? I am nervous about exchanging one guarding condition for another. dl On 10/18/17 1:03 AM, Nils Eliasson wrote: > > HI, > > I ran into a problem with the interaction between MaxVectorSize and > the UseAVX. For some AMD CPUs we limit the vector size to 16 because > it gives the best performance. > >> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >> ?????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >> ???? } > > Whenf MaxVecorSize is set to 16 it has the sideeffect that the > TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though > the platform has the capability. > > Type.cpp:~660 > > [...] > >?? if (Matcher::vector_size_supported(T_FLOAT,8)) { > >???? TypeVect::VECTY = TypeVect::make(T_FLOAT,8); > >?? } > [...] > >?? mreg2type[Op_VecY] = TypeVect::VECTY; > > > In the ad-files feature flags (UseAVX etc.) 
are used to control what > rules should be matched if it has effects on specific vector > registers. Here we have a mismatch. > > On a platform that supports AVX2 but have MaxVectorSize limited to 16, > the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is > uninitialized. We will also hit asserts in a few places like: > assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), > "sanity"); > > Shouldn't the type initialization in type.cpp be dependent on feature > flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector > registers are initialized if the platform supports them, but they > might not be used if MaxVectorSize is limited.) > > This is a patch that solves the problem, but I have not convinced > myself that it is the right way: > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ > > Feedback appreciated, > > Regards, > Nils Eliasson > > > > > > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Oct 18 17:53:23 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 18 Oct 2017 10:53:23 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> Message-ID: <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> New code is good I think. Thanks, Vladimir On 10/17/17 6:36 PM, dean.long at oracle.com wrote: > On 10/17/17 3:30 PM, Vladimir Kozlov wrote: > >> >> On 10/17/17 1:41 PM, dean.long at oracle.com wrote: >>> Comment below... >>> >>> >>> On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>>>> ?Why removing !result->is_anonymous() check is not enough?: >>>>>> >>>>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>>>> result->store_fingerprint(stream->compute_fingerprint()); >>>>>> >>>>> >>>>> Because InstanceKlass::should_store_fingerprint() will return false >>>>> for an anonymous class. >>>> >>>> should_store_fingerprint() only checks flags. Do you mean it to >>>> return 'true' during execution too for anonymous classes? But next >>>> code will recalculate fingerprint for all classes!!! when you need >>>> compute only for anonymous: >>>> >>>> +? if (result->has_stored_fingerprint()) { >>>> + result->store_fingerprint(stream->compute_fingerprint()); >>>> ?? } >>>> >>> >>> It should be for anonymous only (in AOT mode), unless I'm missing >>> something: >>> >>> 1982 bool InstanceKlass::has_stored_fingerprint() const { >>> 1983 #if INCLUDE_AOT >>> 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); >> >> I mean should_store_fingerprint() will return true for all klasses in >> CDS too. So you recalculating them. >> > > I see what you mean now.? New webrev: > > http://cr.openjdk.java.net/~dlong/8132547//hs.3/ > > dl > >> Vladimir >> >>> 1985 #else >>> 1986?? 
return false; >>> 1987 #endif >>> 1988 } >>> >>> 1960 bool InstanceKlass::should_store_fingerprint(bool is_anonymous) >>> { [...]1971 if (UseAOT && is_anonymous) { >>> 1972 // (3) We are using AOT code from a shared library and see an >>> anonymous class >>> 1973 return true; >>> 1974 } dl >>> >>>> Thanks, >>>> Vladimir >>> > From vladimir.kozlov at oracle.com Wed Oct 18 18:21:55 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 18 Oct 2017 11:21:55 -0700 Subject: RFR(XS): 8188223: IfNode::range_check_trap_proj() should handler dying subgraph with single if proj In-Reply-To: References: Message-ID: Good. Thanks, Vladimir On 10/18/17 7:11 AM, Roland Westrelin wrote: > > Thanks for the review, Vladimir. I followed your suggestion. Here is a > ready to push changeset: > > http://cr.openjdk.java.net/~roland/8188223/8188223.patch > > Roland. > From dean.long at oracle.com Thu Oct 19 01:55:13 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 18 Oct 2017 18:55:13 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> Message-ID: Thanks Vladimir. dl On 10/18/17 10:53 AM, Vladimir Kozlov wrote: > New code is good I think. > > Thanks, > Vladimir > > On 10/17/17 6:36 PM, dean.long at oracle.com wrote: >> On 10/17/17 3:30 PM, Vladimir Kozlov wrote: >> >>> >>> On 10/17/17 1:41 PM, dean.long at oracle.com wrote: >>>> Comment below... >>>> >>>> >>>> On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>>>>> ?Why removing !result->is_anonymous() check is not enough?: >>>>>>> >>>>>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>>>>> result->store_fingerprint(stream->compute_fingerprint()); >>>>>>> >>>>>> >>>>>> Because InstanceKlass::should_store_fingerprint() will return >>>>>> false for an anonymous class. >>>>> >>>>> should_store_fingerprint() only checks flags. Do you mean it to >>>>> return 'true' during execution too for anonymous classes? But next >>>>> code will recalculate fingerprint for all classes!!! when you need >>>>> compute only for anonymous: >>>>> >>>>> +? if (result->has_stored_fingerprint()) { >>>>> + result->store_fingerprint(stream->compute_fingerprint()); >>>>> ?? } >>>>> >>>> >>>> It should be for anonymous only (in AOT mode), unless I'm missing >>>> something: >>>> >>>> 1982 bool InstanceKlass::has_stored_fingerprint() const { >>>> 1983 #if INCLUDE_AOT >>>> 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); >>> >>> I mean should_store_fingerprint() will return true for all klasses >>> in CDS too. So you recalculating them. >>> >> >> I see what you mean now.? New webrev: >> >> http://cr.openjdk.java.net/~dlong/8132547//hs.3/ >> >> dl >> >>> Vladimir >>> >>>> 1985 #else >>>> 1986?? 
return false; >>>> 1987 #endif >>>> 1988 } >>>> >>>> 1960 bool InstanceKlass::should_store_fingerprint(bool >>>> is_anonymous) { [...]1971 if (UseAOT && is_anonymous) { >>>> 1972 // (3) We are using AOT code from a shared library and see an >>>> anonymous class >>>> 1973 return true; >>>> 1974 } dl >>>> >>>>> Thanks, >>>>> Vladimir >>>> >> From dean.long at oracle.com Thu Oct 19 03:19:13 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 18 Oct 2017 20:19:13 -0700 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> Message-ID: <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> Yes, but I'm not a Reviewer. dl On 10/18/17 7:16 AM, Roland Westrelin wrote: > Here is an updated webrev with Dean's suggestion: > > http://cr.openjdk.java.net/~roland/8188151/webrev.01/ > > Can this be considered reviewed by you, Dean? > > Roland. From lutz.schmidt at sap.com Thu Oct 19 08:10:33 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Thu, 19 Oct 2017 08:10:33 +0000 Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction Message-ID: <224FFCB0-0B4A-4B08-8842-03A6004F9193@sap.com> Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8189616 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189616.00/index.html STCK is an ancient instruction to store a CPU timer value. It guarantees strict monotonicity of the stored values across all CPUs in a system. The inherent synchronization has a performance impact which becomes ?considerable? (according to IBM specialists) with the recently announced processor generation (z14). This change removes the STCK instruction from s390 platform code. The intent is to prevent inadvertent use of the instruction. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Thu Oct 19 09:21:34 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 19 Oct 2017 11:21:34 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> Message-ID: > Yes, but I'm not a Reviewer. Thanks for the review! Anyone for another review? Roland. From goetz.lindenmaier at sap.com Thu Oct 19 11:04:58 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 19 Oct 2017 11:04:58 +0000 Subject: FW: 8188131: [PPC] Increase inlining thresholds to the same as other platforms In-Reply-To: References: Message-ID: <68cb012b5aa2478aba35be73b91b5995@sap.com> Resending this to hotspot-compiler-dev, which is proper list for this. Best regards, Goetz. -----Original Message----- From: Lindenmaier, Goetz Sent: Donnerstag, 19. Oktober 2017 13:03 To: 'Kazunori Ogata' ; Doerr, Martin Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other platforms Hi Kazunori, To me, this seems to be a very large increase. 
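For reference, the proposal boils down to new ppc64 platform defaults along these lines (only a sketch of the shape of the change; the exact file and the current ppc64 values are in Ogata's webrev):

// Raise the ppc64 C2 defaults to the aarch64 values under discussion.
define_pd_global(intx, FreqInlineSize,   325);
define_pd_global(intx, InlineSmallCode, 2500);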
Considering that not only the required code cache size but also the compiler cpu time will increase in this magnitude, this seems to be a rather risky step that should be tested for its benefits on systems that are highly contended. In this case, you probably had enough space in the code cache so that no recompilation etc. happened. To further look at this I could think of 1. finding the minimal code cache size with the old flags where the JIT is not disabled 2. finding the same size for the new flag settings --> How much more is needed for the new settings? Then you should compare the performance with the bigger code cache size for both, and see whether there still is performance improvement, or whether it's eaten up by more compile time. I.e. you should have a setup where compiler threads and application threads compete for the available CPUs. What do you think? Best regards, Goetz. > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf Of Kazunori Ogata > Sent: Donnerstag, 19. Oktober 2017 08:43 > To: Doerr, Martin > Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other > platforms > > Hi Martin, > > Thank you for your comment. I checked the code cache size by running > SPECjbb2015 (composite mode, i.e., single JVM mode, heap size is 31GB). > > The used code cache size was increased by 4.5MB from 41982Kb to 47006Kb > (+12%). Is the increase too large? > > > The raw output of -XX:+PrintCodeCache are: > > === Original === > CodeHeap 'non-profiled nmethods': size=652480Kb used=13884Kb > max_used=13884Kb free=638595Kb > bounds [0x00001000356f0000, 0x0000100036480000, 0x000010005d420000] > CodeHeap 'profiled nmethods': size=652480Kb used=26593Kb > max_used=26593Kb > free=625886Kb > bounds [0x000010000d9c0000, 0x000010000f3c0000, 0x00001000356f0000] > CodeHeap 'non-nmethods': size=5760Kb used=1505Kb max_used=1559Kb > free=4254Kb > bounds [0x000010000d420000, 0x000010000d620000, 0x000010000d9c0000] > total_blobs=16606 nmethods=10265 adapters=653 > compilation: enabled > > > === Modified (webrev.00) === > CodeHeap 'non-profiled nmethods': size=652480Kb used=18516Kb > max_used=18516Kb free=633964Kb > bounds [0x0000100035730000, 0x0000100036950000, 0x000010005d460000] > CodeHeap 'profiled nmethods': size=652480Kb used=26963Kb > max_used=26963Kb > free=625516Kb > bounds [0x000010000da00000, 0x000010000f460000, 0x0000100035730000] > CodeHeap 'non-nmethods': size=5760Kb used=1527Kb max_used=1565Kb > free=4232Kb > bounds [0x000010000d460000, 0x000010000d660000, 0x000010000da00000] > total_blobs=16561 nmethods=10295 adapters=653 > compilation: enabled > > > Regards, > Ogata > > > > > From: "Doerr, Martin" > To: Kazunori Ogata , "hotspot- > dev at openjdk.java.net" > , "ppc-aix-port-dev at openjdk.java.net" > > Date: 2017/10/18 19:43 > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > same as other platforms > > > > Hi Ogata, > > sorry for the delay. I had missed this one. > > The change looks feasible to me. > > It may only impact the utilization of the Code Cache. Can you evaluate > that (e.g. by running large benchmarks with -XX:+PrintCodeCache)? > > Thanks and best regards, > Martin > > > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf > Of Kazunori Ogata > Sent: Freitag, 29. 
September 2017 08:42 > To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as > other platforms > > Hi all, > > Please review a change for JDK-8188131. > > Bug report: > https://urldefense.proofpoint.com/v2/url?u=https- > 3A__bugs.openjdk.java.net_browse_JDK- > 2D8188131&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > 73lAZxkNhGsrlDkk- > YUYORQ&s=ic27Fb2_vyTSsUAPraEI89UDJy9cbodGojvMw9DNHiU&e= > > Webrev: > https://urldefense.proofpoint.com/v2/url?u=http- > 3A__cr.openjdk.java.net_- > 7Ehorii_8188131_webrev.00_&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > 73lAZxkNhGsrlDkk-YUYORQ&s=xS8PbLyuVtbOBRDMIB- > i9r6lTggpGH3Np8kmONkkMAg&e= > > > This change increases the default values of FreqInlineSize and > InlineSmallCode in ppc64 to 325 and 2500, respectively. These values are > the same as aarch64. The performance of TPC-DS Q96 was improved by > about > 6% with this change. > > > Regards, > Ogata > > > From dean.long at oracle.com Fri Oct 20 06:01:58 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 19 Oct 2017 23:01:58 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> Message-ID: <649e1071-8d59-f56e-d69b-8327667be7ed@oracle.com> Sorry, I need to make one additional change: diff -r 578d216b57ad src/hotspot/share/jvmci/compilerRuntime.cpp --- a/src/hotspot/share/jvmci/compilerRuntime.cpp??? Thu Oct 19 19:23:48 2017 -0700 +++ b/src/hotspot/share/jvmci/compilerRuntime.cpp??? Thu Oct 19 22:59:49 2017 -0700 @@ -24,7 +24,9 @@ ?#include "precompiled.hpp" ?#include "classfile/stringTable.hpp" ?#include "classfile/symbolTable.hpp" +#include "interpreter/linkResolver.hpp" ?#include "jvmci/compilerRuntime.hpp" +#include "oops/oop.inline.hpp" ?#include "runtime/compilationPolicy.hpp" ?#include "runtime/deoptimization.hpp" ?#include "runtime/interfaceSupport.hpp" JPRT caught the missing header files in the open solaris build without precompiled headers. dl On 10/18/17 10:53 AM, Vladimir Kozlov wrote: > New code is good I think. > > Thanks, > Vladimir > > On 10/17/17 6:36 PM, dean.long at oracle.com wrote: >> On 10/17/17 3:30 PM, Vladimir Kozlov wrote: >> >>> >>> On 10/17/17 1:41 PM, dean.long at oracle.com wrote: >>>> Comment below... >>>> >>>> >>>> On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>>>>> ?Why removing !result->is_anonymous() check is not enough?: >>>>>>> >>>>>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>>>>> result->store_fingerprint(stream->compute_fingerprint()); >>>>>>> >>>>>> >>>>>> Because InstanceKlass::should_store_fingerprint() will return >>>>>> false for an anonymous class. >>>>> >>>>> should_store_fingerprint() only checks flags. Do you mean it to >>>>> return 'true' during execution too for anonymous classes? 
But next >>>>> code will recalculate fingerprint for all classes!!! when you need >>>>> compute only for anonymous: >>>>> >>>>> +? if (result->has_stored_fingerprint()) { >>>>> + result->store_fingerprint(stream->compute_fingerprint()); >>>>> ?? } >>>>> >>>> >>>> It should be for anonymous only (in AOT mode), unless I'm missing >>>> something: >>>> >>>> 1982 bool InstanceKlass::has_stored_fingerprint() const { >>>> 1983 #if INCLUDE_AOT >>>> 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); >>> >>> I mean should_store_fingerprint() will return true for all klasses >>> in CDS too. So you recalculating them. >>> >> >> I see what you mean now.? New webrev: >> >> http://cr.openjdk.java.net/~dlong/8132547//hs.3/ >> >> dl >> >>> Vladimir >>> >>>> 1985 #else >>>> 1986?? return false; >>>> 1987 #endif >>>> 1988 } >>>> >>>> 1960 bool InstanceKlass::should_store_fingerprint(bool >>>> is_anonymous) { [...]1971 if (UseAOT && is_anonymous) { >>>> 1972 // (3) We are using AOT code from a shared library and see an >>>> anonymous class >>>> 1973 return true; >>>> 1974 } dl >>>> >>>>> Thanks, >>>>> Vladimir >>>> >> From tobias.hartmann at oracle.com Fri Oct 20 08:04:04 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 20 Oct 2017 10:04:04 +0200 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8188785 http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ Since 8186777 [1], we require two loads to retrieve the java mirror from a klass oop: LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) The problem is that now the type of the outermost LoadP does not depend on the inner LoadP (which has a raw pointer type) but on the type of the AddP which is one level up. CPP only propagates the types downwards to the direct users and as a result, the mirror LoadP ends up with an incorrect (too narrow/optimistic) type. I've verified the fix with the failing test and also verified that 8188835 [2] is a duplicate. Gory details: During CCP, we compute the type of a Phi that merges oops of type A and B where B is a subtype of A. Since the type of the A input was not computed yet (it was initialized to TOP at the beginning of CCP), the Phi temporarily ends up with type B (i.e. with a type that is too narrow/optimistic). This type is propagated downwards and is being used to optimize a java mirror load from the klass oop: LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) The mirror load is then folded to TypeInstPtr::make(B) which is not correct because the oop can be of type A at runtime. Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8186777 [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From vladimir.kozlov at oracle.com Fri Oct 20 16:36:29 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Oct 2017 09:36:29 -0700 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: References: Message-ID: Hmm. Is this only LoadP or general problem? May be add code to next lines when m->is_AddP() : 1734 if (m->bottom_type() != type(m)) { // If not already bottomed out 1735 worklist.push(m); // Propagate change to user I think we should do similar to PhaseIterGVN::add_users_to_worklist(). 
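(For illustration only: a minimal, hypothetical Java shape that produces the kind of IR discussed in this thread -- a Phi merging two oop types whose java mirror is then loaded. The class names and the driver below are assumptions made for the sketch, not the actual failing test from the bug report.)

class A { }
class B extends A { }

public class MirrorLoadExample {
    // 'flag ? a : b' becomes a Phi merging the A and B oop types; getClass()
    // turns into the mirror load LoadP(LoadP(AddP(klass, java_mirror_offset))).
    static Class<?> test(boolean flag, A a, B b) {
        A merged = flag ? a : b;
        return merged.getClass();
    }

    public static void main(String[] args) {
        A a = new A();
        B b = new B();
        for (int i = 0; i < 100_000; i++) {
            test((i & 1) == 0, a, b);         // warm up so C2 compiles test()
        }
        System.out.println(test(true, a, b)); // must print "class A", never "class B"
    }
}

If CCP temporarily types the Phi as B (while the A input is still TOP) and that optimistic type reaches the mirror load before the dependent loads are pushed back onto the worklist, the load can be folded to B's mirror, which is the miscompile the webrev above guards against.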
Thanks, Vladimir On 10/20/17 1:04 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8188785 > http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ > > Since 8186777 [1], we require two loads to retrieve the java mirror from > a klass oop: > > LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) > > The problem is that now the type of the outermost LoadP does not depend > on the inner LoadP (which has a raw pointer type) but on the type of the > AddP which is one level up. CPP only propagates the types downwards to > the direct users and as a result, the mirror LoadP ends up with an > incorrect (too narrow/optimistic) type. > > I've verified the fix with the failing test and also verified that > 8188835 [2] is a duplicate. > > Gory details: > During CCP, we compute the type of a Phi that merges oops of type A and > B where B is a subtype of A. Since the type of the A input was not > computed yet (it was initialized to TOP at the beginning of CCP), the > Phi temporarily ends up with type B (i.e. with a type that is too > narrow/optimistic). This type is propagated downwards and is being used > to optimize a java mirror load from the klass oop: > > LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) > > The mirror load is then folded to TypeInstPtr::make(B) which is not > correct because the oop can be of type A at runtime. > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8186777 > [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From vladimir.kozlov at oracle.com Fri Oct 20 16:43:22 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Oct 2017 09:43:22 -0700 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: References: Message-ID: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> On 10/20/17 9:36 AM, Vladimir Kozlov wrote: > Hmm. Is this only LoadP or general problem? > > May be add code to next lines when m->is_AddP() : > > 1734???????? if (m->bottom_type() != type(m)) { // If not already > bottomed out > 1735?????????? worklist.push(m);???? // Propagate change to user > > I think we should do similar to PhaseIterGVN::add_users_to_worklist(). Hmm, PhaseIterGVN::add_users_to_worklist() is not good example - it only puts near loads/stores. Should we fix it too? Do we have other cases when we calculate type based not on immediate inputs but their inputs? Thanks, Vladimir > > Thanks, > Vladimir > > On 10/20/17 1:04 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8188785 >> http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ >> >> Since 8186777 [1], we require two loads to retrieve the java mirror >> from a klass oop: >> >> LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) >> >> The problem is that now the type of the outermost LoadP does not >> depend on the inner LoadP (which has a raw pointer type) but on the >> type of the AddP which is one level up. CPP only propagates the types >> downwards to the direct users and as a result, the mirror LoadP ends >> up with an incorrect (too narrow/optimistic) type. >> >> I've verified the fix with the failing test and also verified that >> 8188835 [2] is a duplicate. >> >> Gory details: >> During CCP, we compute the type of a Phi that merges oops of type A >> and B where B is a subtype of A. 
Since the type of the A input was not >> computed yet (it was initialized to TOP at the beginning of CCP), the >> Phi temporarily ends up with type B (i.e. with a type that is too >> narrow/optimistic). This type is propagated downwards and is being >> used to optimize a java mirror load from the klass oop: >> >> LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) >> >> The mirror load is then folded to TypeInstPtr::make(B) which is not >> correct because the oop can be of type A at runtime. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8186777 >> [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From dmitry.chuyko at bell-sw.com Fri Oct 20 17:45:47 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Fri, 20 Oct 2017 20:45:47 +0300 Subject: [10] RFR (S): 8189177 - AARCH64: Improve _updateBytesCRC32C intrinsic Message-ID: <18544a56-5885-784f-b448-7f412861d916@bell-sw.com> Hello, Please review an improvement of CRC32C calculation on AArch64. It is done pretty similar to a change for JDK-8189176 described in [1]. MacroAssembler::kernel_crc32c gets unused table registers. They can be used to make neighbor loads and CRC calculations independent. Adding prologue and epilogue for main by-64 loop makes it applicable starting from len=128 so additional by-32 loop is added for smaller lengths. rfe: https://bugs.openjdk.java.net/browse/JDK-8189177 webrev: http://cr.openjdk.java.net/~dchuyko/8189177/webrev.00/ benchmark: http://cr.openjdk.java.net/~dchuyko/8189177/crc32c/CRC32CBench.java Results for T88 and A53 [2] are similar to CRC32 change (good), but again splitting pair loads may slow down other CPUs so measurements on different HW are welcome. -Dmitry [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-October/027225.html [2] https://bugs.openjdk.java.net/browse/JDK-8189177?focusedCommentId=14124535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14124535 From riasat.abir at gmail.com Fri Oct 20 21:35:04 2017 From: riasat.abir at gmail.com (Riasat Abir) Date: Fri, 20 Oct 2017 14:35:04 -0700 Subject: Jdk random crashes Message-ID: I can't figure out the problem, on this system jdk is crashing randomly. Attached 3 different logs. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hs_err_pid3437.log Type: text/x-log Size: 58961 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hs_err_pid6466.log Type: text/x-log Size: 57579 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hs_err_pid6471.log Type: text/x-log Size: 58731 bytes Desc: not available URL: From aph at redhat.com Sat Oct 21 08:30:14 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 21 Oct 2017 09:30:14 +0100 Subject: Jdk random crashes In-Reply-To: References: Message-ID: <8d5124bb-7c73-eab8-30fd-cb131ac62f26@redhat.com> On 20/10/17 22:35, Riasat Abir wrote: > I can't figure out the problem, on this system jdk is crashing randomly. > Attached 3 different logs. It's very hard to say. But we can't really diagnose anything because this isn't OpenJDK: it's the Oracle proprietary JDK. It's also out of date. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kirk.pepperdine at gmail.com Sat Oct 21 09:01:14 2017 From: kirk.pepperdine at gmail.com (Kirk Pepperdine) Date: Sat, 21 Oct 2017 11:01:14 +0200 Subject: Jdk random crashes In-Reply-To: References: Message-ID: You may want to look at the bug database as these all appear to be internal errors. Moving to a newer version of the JDK may fix them but then it may not. Can you confirm that you?re environment is not corrupted in any way? Kind regards, Kirk Pepperdine > On Oct 20, 2017, at 11:35 PM, Riasat Abir wrote: > > I can't figure out the problem, on this system jdk is crashing randomly. > Attached 3 different logs. > From tobias.hartmann at oracle.com Mon Oct 23 08:04:06 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 23 Oct 2017 10:04:06 +0200 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> Message-ID: <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> Hi Vladimir, thanks for the review! On 20.10.2017 18:43, Vladimir Kozlov wrote: > On 10/20/17 9:36 AM, Vladimir Kozlov wrote: >> Hmm. Is this only LoadP or general problem? This is a general problem with nodes that compute their type not based on immediate inputs. >> May be add code to next lines when m->is_AddP() : >> >> 1734???????? if (m->bottom_type() != type(m)) { // If not already bottomed out >> 1735?????????? worklist.push(m);???? // Propagate change to user Where should I add that code exactly? My fix already checks for "ut != type(u)". >> I think we should do similar to PhaseIterGVN::add_users_to_worklist(). > > Hmm, PhaseIterGVN::add_users_to_worklist() is not good example - it only puts near loads/stores. Should we fix it too? Yes, I think it makes sense to update add_users_to_worklist() as well: http://cr.openjdk.java.net/~thartmann/8188785/webrev.01/ > Do we have other cases when we calculate type based not on immediate inputs but their inputs? Yes, see code right above my changes: // CmpU nodes can get their type information from two nodes up in the // graph (instead of from the nodes immediately above). Make sure they // are added to the worklist if nodes they depend on are updated, since // they could be missed and get wrong types otherwise. http://hg.openjdk.java.net/jdk10/hs/file/6126617b8508/src/hotspot/share/opto/phaseX.cpp#l1738 The same goes for CallNodes and counted loop exit conditions (see surrounding code). I'm not aware of any other cases. Thanks, Tobias >> On 10/20/17 1:04 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8188785 >>> http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ >>> >>> Since 8186777 [1], we require two loads to retrieve the java mirror from a klass oop: >>> >>> LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) >>> >>> The problem is that now the type of the outermost LoadP does not depend on the inner LoadP (which has a raw pointer >>> type) but on the type of the AddP which is one level up. CPP only propagates the types downwards to the direct users >>> and as a result, the mirror LoadP ends up with an incorrect (too narrow/optimistic) type. >>> >>> I've verified the fix with the failing test and also verified that 8188835 [2] is a duplicate. >>> >>> Gory details: >>> During CCP, we compute the type of a Phi that merges oops of type A and B where B is a subtype of A. 
Since the type >> of the A input was not computed yet (it was initialized to TOP at the beginning of CCP), the Phi temporarily ends up >> with type B (i.e. with a type that is too narrow/optimistic). This type is propagated downwards and is being used to >> optimize a java mirror load from the klass oop: >> >> LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) >> >> The mirror load is then folded to TypeInstPtr::make(B) which is not correct because the oop can be of type A at runtime. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8186777 >> [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From martin.doerr at sap.com Mon Oct 23 08:36:01 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 23 Oct 2017 08:36:01 +0000 Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction In-Reply-To: <224FFCB0-0B4A-4B08-8842-03A6004F9193@sap.com> References: <224FFCB0-0B4A-4B08-8842-03A6004F9193@sap.com> Message-ID: <1f7f471e58414a78a375e321dba08f2a@sap.com> Hi Lutz, looks good. I think there's no reason for using stck since we have stckf, so I'm ok with it. Thanks for removing the z900 code. We only support z10 and newer. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Donnerstag, 19. Oktober 2017 10:11 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8189616 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189616.00/index.html STCK is an ancient instruction to store a CPU timer value. It guarantees strict monotonicity of the stored values across all CPUs in a system. The inherent synchronization has a performance impact which becomes "considerable" (according to IBM specialists) with the recently announced processor generation (z14). This change removes the STCK instruction from s390 platform code. The intent is to prevent inadvertent use of the instruction. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.schmidt at sap.com Mon Oct 23 09:06:09 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 23 Oct 2017 09:06:09 +0000 Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction In-Reply-To: <1f7f471e58414a78a375e321dba08f2a@sap.com> References: <224FFCB0-0B4A-4B08-8842-03A6004F9193@sap.com> <1f7f471e58414a78a375e321dba08f2a@sap.com> Message-ID: <021930F1-D5CD-403D-917D-3B5793F9B7C9@sap.com> Hi Martin, thank you for your review! Regards, Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 On 23.10.2017, 10:36, "Doerr, Martin" > wrote: Hi Lutz, looks good. I think there's no reason for using stck since we have stckf, so I'm ok with it. Thanks for removing the z900 code. We only support z10 and newer. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Donnerstag, 19.
Oktober 2017 10:11 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8189616 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189616.00/index.html STCK is an ancient instruction to store a CPU timer value. It guarantees strict monotonicity of the stored values across all CPUs in a system. The inherent synchronization has a performance impact which becomes "considerable" (according to IBM specialists) with the recently announced processor generation (z14). This change removes the STCK instruction from s390 platform code. The intent is to prevent inadvertent use of the instruction. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From nils.eliasson at oracle.com Mon Oct 23 14:16:35 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 23 Oct 2017 16:16:35 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: References: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> Message-ID: <63b109f1-48af-f594-588b-519364ad931f@oracle.com> Hi Roland, Sorry for the delay. First - It's a very impressive work you have done! Currently your patch doesn't apply cleanly. The fix of JDK-8189067 changes loopopts.cpp. I have run your code (based on jdk10 before JDK-8189067) through testing. I encountered a minor build problem on solaris_x64 (patch below), otherwise it was stable with no encountered test failures. I have also run performance testing with the conclusion that no significant regression can be seen. In some benchmarks like scimark.sparse.large that has a known safepointing issue (https://bugs.openjdk.java.net/browse/JDK-8177704), very good results can be seen. scimark.sparse.large using G1: -XX:-UseCountedLoopSafepoints (default) ~86 ops/m -XX:+UseCountedLoopSafepoints ~106 ops/m -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 ~111 ops/m The positive results lead us to the conclusion that we would like UseCountedLoopSafepoints to default to true, and LoopStripMiningIter to default to 1000. c2_globals.hpp: - product(bool, UseCountedLoopSafepoints, false, + product(bool, UseCountedLoopSafepoints, true, - product(uintx, LoopStripMiningIter, 0, + product(uintx, LoopStripMiningIter, 1000, solaris_x64 complained about type conversion: src/hotspot/share/opto/loopopts.cpp: @@ -1729,7 +1729,7 @@ Node* l = cl->outer_loop(); Node* tail = cl->outer_loop_tail(); IfNode* le = cl->outer_loop_end(); - Node* sfpt = cl->outer_safepoint(); + Node* sfpt = (Node*) cl->outer_safepoint(); src/hotspot/share/opto/opaquenode.cpp @@ -144,7 +144,7 @@ assert(iter_estimate > 0, "broken"); if ((jlong)scaled_iters != scaled_iters_long || iter_estimate <= short_scaled_iters) { // Remove outer loop and safepoint (too few iterations) - Node* outer_sfpt = inner_cl->outer_safepoint(); + Node* outer_sfpt = (Node*) inner_cl->outer_safepoint(); In the TraceLoopOpts print out I suggest changing space to underscore to conform with how the other print outs look: "PreMainPost Loop: N153/N130 limit_check predicated counted [0,int),+1 (26 iters) has_sfpt strip mined" loopnode.cpp:1867 - tty->print(" strip mined"); + tty->print(" strip_mined"); When your patch is updated, I will do some additional functional testing. Also, a second reviewer is required.
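(For illustration only: a minimal sketch of the kind of long counted loop these flags target. The class name and sizes are assumptions made for the sketch, not part of the webrev. With counted-loop safepoints disabled, the compiled loop body has no safepoint poll per iteration, so a safepoint request can stall behind a long-running call; with -XX:+UseCountedLoopSafepoints plus strip mining the poll moves to an outer loop that runs after strips of up to LoopStripMiningIter inner iterations.)

public class LongCountedLoop {
    static long blackhole;

    // A simple counted loop: one call over a large array can run for a long
    // time without reaching a safepoint if no poll is emitted inside the loop.
    static long run(int[] data) {
        long s = 0;
        for (int i = 0; i < data.length; i++) {
            s += data[i];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] data = new int[50_000_000];
        java.util.Arrays.fill(data, 1);
        for (int i = 0; i < 40; i++) {
            blackhole += run(data);   // long-running compiled loop
        }
        System.out.println(blackhole);
    }
}

Hypothetical invocation with the proposed defaults made explicit: java -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 LongCountedLoop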
Best regards, Nils Eliasson On 2017-10-11 15:53, Roland Westrelin wrote: >> I have started reviewing and testing I will sponsor your change when the >> full review is completed. > Thanks! > > Roland. From jcbeyler at google.com Mon Oct 23 15:27:50 2017 From: jcbeyler at google.com (JC Beyler) Date: Mon, 23 Oct 2017 08:27:50 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> Message-ID: Dear all, Small update this week with this new webrev: - http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/ - Incremental is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/ I patched the code changes showed by Robbin last week and I refactored collectedHeap.cpp: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src/hotspot/share/gc/shared/collectedHeap.cpp.patch The original code became a bit too complex in my opinion with the handle_heap_sampling handling too many things. So I subdivided the logic into two smaller methods and moved out a bit of the logic to make it more clear. Hopefully it is :) Let me know if you have any questions/comments :) Jc On Mon, Oct 16, 2017 at 9:34 AM, JC Beyler wrote: > Hi Robbin, > > That is because version 11 to 12 was only a test change. I was going to > write about it and say here are the webrev links: > Incremental: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ > > Full webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/ > > This change focused only on refactoring the tests to be more manageable, > readable, maintainable. As all tests are looking at allocations, I moved > common code to a java class: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ > test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ > HeapMonitor.java.patch > > And then most tests call into that class to turn on/off the sampling, > allocate, etc. This has removed almost 500 lines of test code so I'm happy > about that. > > Thanks for your changes, a bit of relics of previous versions :). I've > already integrated them into my code and will make a new webrev end of this > week with a bit of refactor of the code handling the tlab slow path. I find > it could use a bit of refactoring to make it easier to follow so I'm going > to take a stab at it this week. > > Any other issues/comments? > > Thanks! > Jc > > > On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn wrote: > >> Hi JC, >> >> I saw a webrev.12 in the directory, with only test changes(11->12), so I >> took that version. >> I had a look and tested the tests, worked fine! >> >> First glance at the code (looking at full v12) some minor things below, >> mostly unused stuff. >> >> Thanks, Robbin >> >> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp >> --- a/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 >> 16:54:06 2017 +0200 >> +++ b/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 >> 17:42:42 2017 +0200 >> @@ -211,2 +211,3 @@ >> void initialize(int max_storage) { >> + // validate max_storage to sane value ? What would 0 mean ? >> MutexLocker mu(HeapMonitor_lock); >> @@ -227,8 +228,4 @@ >> bool initialized() { return _initialized; } >> - volatile bool *initialized_address() { return &_initialized; } >> >> private: >> - // Protects the traces currently sampled (below). 
>> - volatile intptr_t _stack_storage_lock[1]; >> - >> // The traces currently sampled. >> @@ -313,3 +310,2 @@ >> _initialized(false) { >> - _stack_storage_lock[0] = 0; >> } >> @@ -532,13 +528,2 @@ >> >> -// Delegate the initialization question to the underlying storage system. >> -bool HeapMonitoring::initialized() { >> - return StackTraceStorage::storage()->initialized(); >> -} >> - >> -// Delegate the initialization question to the underlying storage system. >> -bool *HeapMonitoring::initialized_address() { >> - return >> - const_cast(StackTraceStorage::storage()->initialized_ >> address()); >> -} >> - >> void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) { >> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp >> --- a/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 >> 16:54:06 2017 +0200 >> +++ b/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 >> 17:42:42 2017 +0200 >> @@ -35,3 +35,2 @@ >> static uint64_t _rnd; >> - static bool _initialized; >> static jint _monitoring_rate; >> @@ -92,7 +91,2 @@ >> >> - // Is the profiler initialized and where is the address to the >> initialized >> - // boolean. >> - static bool initialized(); >> - static bool *initialized_address(); >> - >> // Called when o is to be sampled from a given thread and a given size. >> >> >> >> On 10/10/2017 12:57 AM, JC Beyler wrote: >> >>> Dear all, >>> >>> Thread-safety is back!! Here is the update webrev: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ >>> >>> Full webrev is here: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ >>> >>> In order to really test this, I needed to add this so thought now was a >>> good time. It required a few changes here for the creation to ensure >>> correctness and safety. Now we keep the static pointer but clear the data >>> internally so on re-initialize, it will be a bit more costly than before. I >>> don't think this is a huge use-case so I did not think it was a problem. I >>> used the internal MutexLocker, I think I used it well, let me know. >>> >>> I also added three tests: >>> >>> 1) Stack depth test: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >>> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >>> eapMonitorStackDepthTest.java.patch >>> >>> This test shows that the maximum stack depth system is working. >>> >>> 2) Thread safety: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >>> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >>> eapMonitorThreadTest.java.patch >>> >>> The test creates 24 threads and they all allocate at the same time. The >>> test then checks it does find samples from all the threads. >>> >>> 3) Thread on/off safety >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >>> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >>> eapMonitorThreadOnOffTest.java.patch >>> >>> The test creates 24 threads that all allocate a bunch of memory. Then >>> another thread turns the sampling on/off. >>> >>> Btw, both tests 2 & 3 failed without the locks. >>> >>> As I worked on this, I saw a lot of places where the tests are doing >>> very similar things, I'm going to clean up the code a bit and make a >>> HeapAllocator class that all tests can call directly. This will greatly >>> simplify the code. >>> >>> Thanks for any comments/criticisms! 
>>> Jc >>> >>> >>> On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler >> jcbeyler at google.com>> wrote: >>> >>> Dear all, >>> >>> Small update to the webrev: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/> >>> >>> Full webrev is here: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/> >>> >>> I updated a bit of the naming, removed a TODO comment, and I added a >>> test for testing the sampling rate. I also updated the maximum stack depth >>> to 1024, there is no >>> reason to keep it so small. I did a micro benchmark that tests the >>> overhead and it seems relatively the same. >>> >>> I compared allocations from a stack depth of 10 and allocations from >>> a stack depth of 1024 (allocations are from the same helper method in >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_fi >>> les/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/ >>> MyPackage/HeapMonitorStatRateTest.java >>> >> iles/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor >>> /MyPackage/HeapMonitorStatRateTest.java>): >>> - For an array of 1 integer allocated in a loop; stack >>> depth 1024 vs stack depth 10: 1% slower >>> - For an array of 200k integers allocated in a loop; >>> stack depth 1024 vs stack depth 10: 3% slower >>> >>> So basically now moving the maximum stack depth to 1024 but we only >>> copy over the stack depths actually used. >>> >>> For the next webrev, I will be adding a stack depth test to show >>> that it works and probably put back the mutex locking so that we can see >>> how difficult it is to keep >>> thread safe. >>> >>> Let me know what you think! >>> Jc >>> >>> >>> >>> On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler >> > wrote: >>> >>> Forgot to say that for my numbers: >>> - Not in the test are the actual numbers I got for the various >>> array sizes, I ran the program 30 times and parsed the output; here are the >>> averages and standard >>> deviation: >>> 1000: 1.28% average; 1.13% standard deviation >>> 10000: 1.59% average; 1.25% standard deviation >>> 100000: 1.26% average; 1.26% standard deviation >>> >>> The 1000/10000/100000 are the sizes of the arrays being >>> allocated. These are allocated 100k times and the sampling rate is 111 >>> times the size of the array. >>> >>> Thanks! >>> Jc >>> >>> >>> On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler >> > wrote: >>> >>> Hi all, >>> >>> After a bit of a break, I am back working on this :). 
As >>> before, here are two webrevs: >>> >>> - Full change set: http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.09/ >> asbold/8171119/webrev.09/> >>> - Compared to version 8: http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.08_09/ >> asbold/8171119/webrev.08_09/> >>> (This version is compared to version 8 I last showed >>> but ported to the new folder hierarchy) >>> >>> In this version I have: >>> - Handled Thomas' comments from his email of 07/03: >>> - Merged the logging to be standard >>> - Fixed up the code a bit where asked >>> - Added some notes about the code not being >>> thread-safe yet >>> - Removed additional dead code from the version that >>> modifies interpreter/c1/c2 >>> - Fixed compiler issues so that it compiles with >>> --disable-precompiled-header >>> - Tested with ./configure --with-boot-jdk= >>> --with-debug-level=slowdebug --disable-precompiled-headers >>> >>> Additionally, I added a test to check the sanity of the >>> sampler: HeapMonitorStatCorrectnessTest >>> (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/te >>> st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >>> HeapMonitorStatCorrectnessTest.java.patch >> asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >>> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch >>> >) >>> - This allocates a number of arrays and checks that we >>> obtain the number of samples we want with an accepted error of 5%. I tested >>> it 100 times and it >>> passed everytime, I can test more if wanted >>> - Not in the test are the actual numbers I got for the >>> various array sizes, I ran the program 30 times and parsed the output; here >>> are the averages and >>> standard deviation: >>> 1000: 1.28% average; 1.13% standard deviation >>> 10000: 1.59% average; 1.25% standard deviation >>> 100000: 1.26% average; 1.26% standard deviation >>> >>> What this means is that we were always at about 1~2% of the >>> number of samples the test expected. >>> >>> Let me know what you think, >>> Jc >>> >>> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler < >>> jcbeyler at google.com > wrote: >>> >>> Hi all, >>> >>> I apologize, I have not yet handled your remarks but >>> thought this new webrev would also be useful to see and comment on perhaps. >>> >>> Here is the latest webrev, it is generated slightly >>> different than the others since now I'm using webrev.ksh without the -N >>> option: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/> >>> >>> And the webrev.07 to webrev.08 diff is here: >>> http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.07_08/ >> asbold/8171119/webrev.07_08/> >>> >>> (Let me know if it works well) >>> >>> It's a small change between versions but it: >>> - provides a fix that makes the average sample rate >>> correct (more on that below). >>> - fixes the code to actually have it play nicely with >>> the fast tlab refill >>> - cleaned up a bit the JVMTI text and now use >>> jvmtiFrameInfo >>> - moved the capability to be onload solo >>> >>> With this webrev, I've done a small study of the random >>> number generator we use here for the sampling rate. I took a small program >>> and it can be simplified to: >>> >>> for (outer loop) >>> for (inner loop) >>> int[] tmp = new int[arraySize]; >>> >>> - I've fixed the outer and inner loops to being 800 for >>> this experiment, meaning we allocate 640000 times an array of a given array >>> size. 
>>> >>> - Each program provides the average sample size used for >>> the whole execution >>> >>> - Then, I ran each variation 30 times and then >>> calculated the average of the average sample size used for various array >>> sizes. I selected the array size to >>> be one of the following: 1, 10, 100, 1000. >>> >>> - When compared to 512kb, the average sample size of 30 >>> runs: >>> 1: 4.62% of error >>> 10: 3.09% of error >>> 100: 0.36% of error >>> 1000: 0.1% of error >>> 10000: 0.03% of error >>> >>> What it shows is that, depending on the number of >>> samples, the average does become better. This is because with an allocation >>> of 1 element per array, it >>> will take longer to hit one of the thresholds. This is >>> seen by looking at the sample count statistic I put in. For the same number >>> of iterations (800 * >>> 800), the different array sizes provoke: >>> 1: 62 samples >>> 10: 125 samples >>> 100: 788 samples >>> 1000: 6166 samples >>> 10000: 57721 samples >>> >>> And of course, the more samples you have, the more >>> sample rates you pick, which means that your average gets closer using that >>> math. >>> >>> Thanks, >>> Jc >>> >>> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler < >>> jcbeyler at google.com > wrote: >>> >>> Thanks Robbin, >>> >>> This seems to have worked. When I have the next >>> webrev ready, we will find out but I'm fairly confident it will work! >>> >>> Thanks agian! >>> Jc >>> >>> On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn < >>> robbin.ehn at oracle.com > wrote: >>> >>> Hi JC, >>> >>> On 06/29/2017 12:15 AM, JC Beyler wrote: >>> >>> B) Incremental changes >>> >>> >>> I guess the most common work flow here is using >>> mq : >>> hg qnew fix_v1 >>> edit files >>> hg qrefresh >>> hg qnew fix_v2 >>> edit files >>> hg qrefresh >>> >>> if you do hg log you will see 2 commits >>> >>> webrev.ksh -r -2 -o my_inc_v1_v2 >>> webrev.ksh -o my_full_v2 >>> >>> >>> In your .hgrc you might need: >>> [extensions] >>> mq = >>> >>> /Robbin >>> >>> >>> Again another newbiew question here... >>> >>> For showing the incremental changes, is >>> there a link that explains how to do that? I apologize for my newbie >>> questions all the time :) >>> >>> Right now, I do: >>> >>> ksh ../webrev.ksh -m -N >>> >>> That generates a webrev.zip and send it to >>> Chuck Rasbold. He then uploads it to a new webrev. >>> >>> I tried commiting my change and adding a >>> small change. Then if I just do ksh ../webrev.ksh without any options, it >>> seems to produce a similar >>> page but now with only the changes I had (so >>> the 06-07 comparison you were talking about) and a changeset that has it >>> all. I imagine that is >>> what you meant. >>> >>> Which means that my workflow would become: >>> >>> 1) Make changes >>> 2) Make a webrev without any options to show >>> just the differences with the tip >>> 3) Amend my changes to my local commit so >>> that I have it done with >>> 4) Go to 1 >>> >>> Does that seem correct to you? >>> >>> Note that when I do this, I only see the >>> full change of a file in the full change set (Side note here: now the page >>> says change set and not >>> patch, which is maybe why Serguei was having >>> issues?). >>> >>> Thanks! 
>>> Jc >>> >>> >>> >>> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn < >>> robbin.ehn at oracle.com >> robbin.ehn at oracle.com >>> >> wrote: >>> >>> Hi, >>> >>> On 06/28/2017 12:04 AM, JC Beyler wrote: >>> >>> Dear Thomas et al, >>> >>> Here is the newest webrev: >>> http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.07/ >> asbold/8171119/webrev.07/> >>> >> asbold/8171119/webrev.07/ >> asbold/8171119/webrev.07/>> >>> >>> >>> >>> You have some more bits to in there but >>> generally this looks good and really nice with more tests. >>> I'll do and deep dive and re-test this >>> when I get back from my long vacation with whatever patch version you have >>> then. >>> >>> Also I think it's time you provide >>> incremental (v06->07 changes) as well as complete change-sets. >>> >>> Thanks, Robbin >>> >>> >>> >>> >>> Thomas, I "think" I have answered >>> all your remarks. The summary is: >>> >>> - The statistic system is up and >>> provides insight on what the heap sampler is doing >>> - I've noticed that, though >>> the sampling rate is at the right mean, we are missing some samples, I have >>> not yet tracked out why >>> (details below) >>> >>> - I've run a tiny benchmark that is >>> the worse case: it is a very tight loop and allocated a small array >>> - In this case, I see no >>> overhead when the system is off so that is a good start :) >>> - I see right now a high >>> overhead in this case when sampling is on. This is not a really too >>> surprising but I'm going to see if >>> this is consistent with our >>> internal implementation. The >>> benchmark is really allocation stressful so I'm not too surprised but I >>> want to do the due diligence. >>> >>> - The statistic system up is up >>> and I have a new test >>> http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonito >>> r/MyPackage/HeapMonitorStatTest.java.patch >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorStatTest.java.patch> >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorStatTest.java.patch >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorStatTest.java.patch>> >>> - I did a bit of a study >>> about the random generator here, more details are below but basically it >>> seems to work well >>> >>> - I added a capability but since >>> this is the first time doing this, I was not sure I did it right >>> - I did add a test though for >>> it and the test seems to do what I expect (all methods are failing with the >>> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). >>> - >>> http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonito >>> r/MyPackage/HeapMonitorNoCapabilityTest.java.patch >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch> >>> < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >>> bilityTest.java.patch >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch>> >>> >>> - I still need to figure out >>> what to do about the multi-agent vs single-agent issue >>> >>> - As far as measurements, it >>> seems I still need to look at: >>> - Why we do the 20 random >>> calls first, are they necessary? 
>>> - Look at the mean of the >>> sampling rate that the random generator does and also what is actually >>> sampled >>> - What is the overhead in >>> terms of memory/performance when on? >>> >>> I have inlined my answers, I think >>> I got them all in the new webrev, let me know your thoughts. >>> >>> Thanks again! >>> Jc >>> >>> >>> On Fri, Jun 23, 2017 at 3:52 AM, >>> Thomas Schatzl >> com> >>> >> thomas.schatzl at oracle.com>> >> thomas.schatzl at oracle.com> >>> >>> >> >>> wrote: >>> >>> Hi, >>> >>> On Wed, 2017-06-21 at 13:45 >>> -0700, JC Beyler wrote: >>> > Hi all, >>> > >>> > First off: Thanks again to >>> Robbin and Thomas for their reviews :) >>> > >>> > Next, I've uploaded a new >>> webrev: >>> > >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/> >>> >> asbold/8171119/webrev.06/ >> asbold/8171119/webrev.06/>> >>> >> asbold/8171119/webrev.06/ >> asbold/8171119/webrev.06/> >>> >> asbold/8171119/webrev.06/ >> asbold/8171119/webrev.06/>>> >>> >>> > >>> > Here is an update: >>> > >>> > - @Robbin, I forgot to say >>> that yes I need to look at implementing >>> > this for the other >>> architectures and testing it before it is all >>> > ready to go. Is it common to >>> have it working on all possible >>> > combinations or is there a >>> subset that I should be doing first and we >>> > can do the others later? >>> > - I've tested slowdebug, >>> built and ran the JTreg tests I wrote with >>> > slowdebug and fixed a few >>> more issues >>> > - I've refactored a bit of >>> the code following Thomas' comments >>> > - I think I've handled >>> all the comments from Thomas (I put >>> > comments inline below for >>> the specifics) >>> >>> Thanks for handling all those. >>> >>> > - Following Thomas' comments >>> on statistics, I want to add some >>> > quality assurance tests and >>> find that the easiest way would be to >>> > have a few counters of what >>> is happening in the sampler and expose >>> > that to the user. >>> > - I'll be adding that in >>> the next version if no one sees any >>> > objections to that. >>> > - This will allow me to >>> add a sanity test in JTreg about number of >>> > samples and average of >>> sampling rate >>> > >>> > @Thomas: I had a few >>> questions that I inlined below but I will >>> > summarize the "bigger ones" >>> here: >>> > - You mentioned constants >>> are not using the right conventions, I >>> > looked around and didn't see >>> any convention except normal naming then >>> > for static constants. Is >>> that right? >>> >>> I looked through >>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui < >>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui> >>> >> /display/HotSpot/StyleGui >> /display/HotSpot/StyleGui>> >>> >> /display/HotSpot/StyleGui >> /display/HotSpot/StyleGui> >>> >> /display/HotSpot/StyleGui >> /display/HotSpot/StyleGui>>> >>> de and the rule is to "follow >>> an existing pattern and must have a >>> distinct appearance from other >>> names". Which does not help a lot I >>> guess :/ The GC team started >>> using upper camel case, e.g. >>> SomeOtherConstant, but very >>> likely this is probably not applied >>> consistently throughout. So I >>> am fine with not adding another style >>> (like kMaxStackDepth with the >>> "k" in front with some unknown meaning) >>> is fine. >>> >>> (Chances are you will find >>> that style somewhere used anyway too, >>> apologies if so :/) >>> >>> >>> Thanks for that link, now I know >>> where to look. 
I used the upper camel case in my code as well then :) I >>> should have gotten them all. >>> >>> >>> > PS: I've also inlined my >>> answers to Thomas below: >>> > >>> > On Tue, Jun 13, 2017 at >>> 8:03 AM, Thomas Schatzl >> > e.com < >>> http://e.com> > wrote: >>> > > Hi all, >>> > > >>> > > On Mon, 2017-06-12 at >>> 11:11 -0700, JC Beyler wrote: >>> > > > Dear all, >>> > > > >>> > > > I've continued working >>> on this and have done the following >>> > > webrev: >>> > > > >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/> >>> >> asbold/8171119/webrev.05/ >> asbold/8171119/webrev.05/>> >>> >> asbold/8171119/webrev.05/ >> asbold/8171119/webrev.05/> >>> >> asbold/8171119/webrev.05/ >> asbold/8171119/webrev.05/>>> >>> >>> > > >>> > > [...] >>> > > > Things I still need to >>> do: >>> > > > - Have to fix that >>> TLAB case for the FastTLABRefill >>> > > > - Have to start >>> looking at the data to see that it is >>> > > consistent and does >>> gather the right samples, right frequency, etc. >>> > > > - Have to check the >>> GC elements and what that produces >>> > > > - Run a slowdebug >>> run and ensure I fixed all those issues you >>> > > saw > Robbin >>> > > > >>> > > > Thanks for looking at >>> the webrev and have a great week! >>> > > >>> > > scratching a bit on the >>> surface of this change, so apologies for >>> > > rather shallow comments: >>> > > >>> > > - >>> macroAssembler_x86.cpp:5604: while this is compiler code, and I >>> > > am not sure this is >>> final, please avoid littering the code with >>> > > TODO remarks :) They tend >>> to be candidates for later wtf moments >>> > > only. >>> > > >>> > > Just file a CR for that. >>> > > >>> > Newcomer question: what is >>> a CR and not sure I have the rights to do >>> > that yet ? :) >>> >>> Apologies. CR is a change >>> request, this suggests to file a bug in the >>> bug tracker. And you are >>> right, you can't just create a new account in >>> the OpenJDK JIRA yourselves. :( >>> >>> >>> Ok good to know, I'll continue with >>> my own todo list but I'll work hard on not letting it slip in the webrevs >>> anymore :) >>> >>> >>> I was mostly referring to the >>> "... but it is a TODO" part of that >>> comment in >>> macroassembler_x86.cpp. Comments about the why of the code >>> are appreciated. >>> >>> [Note that I now understand >>> that this is to some degree still work in >>> progress. As long as the final >>> changeset does no contain TODO's I am >>> fine (and it's not a hard >>> objection, rather their use in "final" code >>> is typically limited in my >>> experience)] >>> >>> 5603 // Currently, if this >>> happens, just set back the actual end to >>> where it was. >>> 5604 // We miss a chance to >>> sample here. >>> >>> Would be okay, if explaining >>> "this" and the "why" of missing a chance >>> to sample here would be best. >>> >>> Like maybe: >>> >>> // If we needed to refill >>> TLABs, just set the actual end point to >>> // the end of the TLAB again. >>> We do not sample here although we could. >>> >>> Done with your comment, it works >>> well in my mind. >>> >>> I am not sure whether "miss a >>> chance to sample" meant "we could, but >>> consciously don't because it's >>> not that useful" or "it would be >>> necessary but don't because >>> it's too complicated to do.". 
>>> >>> Looking at the original >>> comment once more, I am also not sure if that >>> comment shouldn't referring to >>> the "end" variable (not actual_end) >>> because that's the variable >>> that is responsible for taking the sampling >>> path? (Going from the member >>> description of ThreadLocalAllocBuffer). >>> >>> >>> I've moved this code and it no >>> longer shows up here but the rationale and answer was: >>> >>> So.. Yes, end is the variable >>> provoking the sampling. Actual end is the actual end of the TLAB. >>> >>> What was happening here is that the >>> code is resetting _end to point towards the end of the new TLAB. Because, >>> we now have the end for >>> sampling and _actual_end for >>> the actual end, we need to update >>> the actual_end as well. >>> >>> Normally, were we to do the real >>> work here, we would calculate the (end - start) offset, then do: >>> >>> - Set the new end to : start + >>> (old_end - old_start) >>> - Set the actual end like we do >>> here now where it because it is the actual end. >>> >>> Why is this not done here now >>> anymore? >>> - I was still debating which >>> path to take: >>> - Do it in the fast refill >>> code, it has its perks: >>> - In a world where fast >>> refills are happening all the time or a lot, we can augment there the code >>> to do the sampling >>> - Remember what we had as an >>> end before leaving the slowpath and check on return >>> - This is what I'm doing >>> now, it removes the need to go fix up all fast refill paths but if you >>> remain in fast refill paths, >>> you won't get sampling. I >>> have to think of the consequences >>> of that, maybe a future change later on? >>> - I have the >>> statistics now so I'm going to study that >>> -> By the way, >>> though my statistics are showing I'm missing some samples, if I turn off >>> FastTlabRefill, it is the same >>> loss so for now, it seems >>> this does not occur in my simple >>> test. >>> >>> >>> >>> But maybe I am only confused >>> and it's best to just leave the comment >>> away. :) >>> >>> Thinking about it some more, >>> doesn't this not-sampling in this case >>> mean that sampling does not >>> work in any collector that does inline TLAB >>> allocation at the moment? (Or >>> is inline TLAB alloc automatically >>> disabled with sampling >>> somehow?) >>> >>> That would indeed be a bigger >>> TODO then :) >>> >>> >>> Agreed, this remark made me think >>> that perhaps as a first step the new way of doing it is better but I did >>> have to: >>> - Remove the const of the >>> ThreadLocalBuffer remaining and hard_end methods >>> - Move hard_end out of the >>> header file to have a bit more logic there >>> >>> Please let me know what you think >>> of that and if you prefer it this way or changing the fast refills. (I >>> prefer this way now because it >>> is more incremental). >>> >>> >>> > > - calling >>> HeapMonitoring::do_weak_oops() (which should probably be >>> > > called weak_oops_do() like >>> other similar methods) only if string >>> > > deduplication is enabled >>> (in g1CollectedHeap.cpp:4511) seems wrong. >>> > >>> > The call should be at least >>> around 6 lines up outside the if. >>> > >>> > Preferentially in a method >>> like process_weak_jni_handles(), including >>> > additional logging. (No new >>> (G1) gc phase without minimal logging >>> > :)). 
>>> > Done but really not sure >>> because: >>> > >>> > I put for logging: >>> > log_develop_trace(gc, >>> freelist)("G1ConcRegionFreeing [other] : heap >>> > monitoring"); >>> >>> I would think that "gc, ref" >>> would be more appropriate log tags for >>> this similar to jni handles. >>> (I am als not sure what weak >>> reference handling has to do with >>> G1ConcRegionFreeing, so I am a >>> bit puzzled) >>> >>> >>> I was not sure what to put for the >>> tags or really as the message. I cleaned it up a bit now to: >>> log_develop_trace(gc, >>> ref)("HeapSampling [other] : heap monitoring processing"); >>> >>> >>> >>> > Since weak_jni_handles >>> didn't have logging for me to be inspired >>> > from, I did that but >>> unconvinced this is what should be done. >>> >>> The JNI handle processing does >>> have logging, but only in >>> ReferenceProcessor::process_discovered_references(). >>> In >>> process_weak_jni_handles() >>> only overall time is measured (in a G1 >>> specific way, since only G1 >>> supports disabling reference procesing) :/ >>> >>> The code in ReferenceProcessor >>> prints both time taken >>> referenceProcessor.cpp:254, as >>> well as the count, but strangely only in >>> debug VMs. >>> >>> I have no idea why this >>> logging is that unimportant to only print that >>> in a debug VM. However there >>> are reviews out for changing this area a >>> bit, so it might be useful to >>> wait for that (JDK-8173335). >>> >>> >>> I cleaned it up a bit anyway and >>> now it returns the count of objects that are in the system. >>> >>> >>> > > - the change doubles the >>> size of >>> > > >>> CollectedHeap::allocate_from_tlab_slow() above the "small and nice" >>> > > threshold. Maybe it could >>> be refactored a bit. >>> > Done I think, it looks >>> better to me :). >>> >>> In >>> ThreadLocalAllocBuffer::handle_sample() I think the >>> set_back_actual_end()/pick_next_sample() >>> calls could be hoisted out of >>> the "if" :) >>> >>> >>> Done! >>> >>> >>> > > - >>> referenceProcessor.cpp:261: the change should add logging about >>> > > the number of references >>> encountered, maybe after the corresponding >>> > > "JNI weak reference count" >>> log message. >>> > Just to double check, are >>> you saying that you'd like to have the heap >>> > sampler to keep in store how >>> many sampled objects were encountered in >>> > the >>> HeapMonitoring::weak_oops_do? >>> > - Would a return of the >>> method with the number of handled >>> > references and logging that >>> work? >>> >>> Yes, it's fine if >>> HeapMonitoring::weak_oops_do() only returned the >>> number of processed weak oops. >>> >>> >>> Done also (but I admit I have not >>> tested the output yet) :) >>> >>> >>> > - Additionally, would you >>> prefer it in a separate block with its >>> > GCTraceTime? >>> >>> Yes. Both kinds of information >>> is interesting: while the time taken is >>> typically more important, the >>> next question would be why, and the >>> number of references typically >>> goes a long way there. >>> >>> See above though, it is >>> probably best to wait a bit. >>> >>> >>> Agreed that I "could" wait but, if >>> it's ok, I'll just refactor/remove this when we get closer to something >>> final. Either, JDK-8173335 >>> has gone in and I will notice it >>> now or it will soon and I can change it then. >>> >>> >>> > > - >>> threadLocalAllocBuffer.cpp:331: one more "TODO" >>> > Removed it and added it to >>> my personal todos to look at. 
>>> > > > >>> > > - >>> threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class >>> > > documentation should be >>> updated about the sampling additions. I >>> > > would have no clue what >>> the difference between "actual_end" and >>> > > "end" would be from the >>> given information. >>> > If you are talking about the >>> comments in this file, I made them more >>> > clear I hope in the new >>> webrev. If it was somewhere else, let me know >>> > where to change. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Mon Oct 23 16:19:16 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 23 Oct 2017 09:19:16 -0700 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> Message-ID: <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> Hi Tobias On 10/23/17 1:04 AM, Tobias Hartmann wrote: > Hi Vladimir, > > thanks for the review! > > On 20.10.2017 18:43, Vladimir Kozlov wrote: >> On 10/20/17 9:36 AM, Vladimir Kozlov wrote: >>> Hmm. Is this only LoadP or general problem? > > This is a general problem with nodes that compute their type not based > on immediate inputs. I think we need to file a bug or rfe to fix other cases too. > >>> May be add code to next lines when m->is_AddP() : >>> >>> 1734???????? if (m->bottom_type() != type(m)) { // If not already >>> bottomed out >>> 1735?????????? worklist.push(m);???? // Propagate change to user > > Where should I add that code exactly? My fix already checks for "ut != > type(u)". My bad - I forgot that raw LoadP may not change its type but you still want to push its users when n (AddP) change its type. > >>> I think we should do similar to PhaseIterGVN::add_users_to_worklist(). >> >> Hmm, PhaseIterGVN::add_users_to_worklist() is not good example - it >> only puts near loads/stores. Should we fix it too? > > Yes, I think it makes sense to update add_users_to_worklist() as well: > http://cr.openjdk.java.net/~thartmann/8188785/webrev.01/ In add_users_to_worklist() you don't need to check type ut != > type(u) - just push node on worklist. Also can you remove ut->isa_instptr() check in *both* cases. And use u->is_Mem() instead of u->Opcode() == Op_LoadP to cover stores too. The motivation is that original LoadP is raw as result memory operations which use it may look for more precise type of the field somewhere so they should be on worklist. > >> Do we have other cases when we calculate type based not on immediate >> inputs but their inputs? > > Yes, see code right above my changes: > ? // CmpU nodes can get their type information from two nodes up in the > ? // graph (instead of from the nodes immediately above). Make sure they > ? // are added to the worklist if nodes they depend on are updated, since > ? // they could be missed and get wrong types otherwise. > > http://hg.openjdk.java.net/jdk10/hs/file/6126617b8508/src/hotspot/share/opto/phaseX.cpp#l1738 > Okay. Thanks, Vladimir > > The same goes for CallNodes and counted loop exit conditions (see > surrounding code). > > I'm not aware of any other cases. 
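For readers following the thread, the Java-level shape behind this issue is a merge of a type and its subtype followed by a mirror load (getClass()). A minimal stand-alone sketch of that shape - the class names here are invented and this is not the regression test from the webrev - looks like:

class A {}
class B extends A {}

public class MirrorLoadShape {
    // The ternary merges A and its subtype B (a Phi in C2's IR); getClass()
    // is the java mirror load whose type must not be narrowed to B only.
    static Class<?> test(boolean flag, A a, B b) {
        A merged = flag ? a : b;
        return merged.getClass();
    }

    public static void main(String[] args) {
        // Exercise both paths so the merged value really can be an A or a B at runtime.
        System.out.println(test(true, new A(), new B()));
        System.out.println(test(false, new A(), new B()));
    }
}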
> > Thanks, > Tobias > >>> On 10/20/17 1:04 AM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8188785 >>>> http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ >>>> >>>> Since 8186777 [1], we require two loads to retrieve the java mirror >>>> from a klass oop: >>>> >>>> LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) >>>> >>>> The problem is that now the type of the outermost LoadP does not >>>> depend on the inner LoadP (which has a raw pointer type) but on the >>>> type of the AddP which is one level up. CPP only propagates the >>>> types downwards to the direct users and as a result, the mirror >>>> LoadP ends up with an incorrect (too narrow/optimistic) type. >>>> >>>> I've verified the fix with the failing test and also verified that >>>> 8188835 [2] is a duplicate. >>>> >>>> Gory details: >>>> During CCP, we compute the type of a Phi that merges oops of type A >>>> and B where B is a subtype of A. Since the type of the A input was >>>> not computed yet (it was initialized to TOP at the beginning of >>>> CCP), the Phi temporarily ends up with type B (i.e. with a type that >>>> is too narrow/optimistic). This type is propagated downwards and is >>>> being used to optimize a java mirror load from the klass oop: >>>> >>>> LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) >>>> >>>> The mirror load is then folded to TypeInstPtr::make(B) which is not >>>> correct because the oop can be of type A at runtime. >>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8186777 >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From dean.long at oracle.com Tue Oct 24 00:27:45 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 23 Oct 2017 17:27:45 -0700 Subject: RFR(XS): 8189649: AOT: assert(caller_frame.cb()->as_nmethod_or_null() == cm) failed: expect top frame nmethod Message-ID: <8692a329-0811-4a78-3937-8a244863737f@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8189649 http://cr.openjdk.java.net/~dlong/8189649/webrev/ We just need to relax the assert to allow any compiled method. dl From vladimir.kozlov at oracle.com Tue Oct 24 02:22:32 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 23 Oct 2017 19:22:32 -0700 Subject: RFR(XS): 8189649: AOT: assert(caller_frame.cb()->as_nmethod_or_null() == cm) failed: expect top frame nmethod In-Reply-To: <8692a329-0811-4a78-3937-8a244863737f@oracle.com> References: <8692a329-0811-4a78-3937-8a244863737f@oracle.com> Message-ID: <5f8da8a8-489e-f9b7-388b-dfe61c359413@oracle.com> Good. Thanks, Vladimir On 10/23/17 5:27 PM, dean.long at oracle.com wrote: > https://bugs.openjdk.java.net/browse/JDK-8189649 > > http://cr.openjdk.java.net/~dlong/8189649/webrev/ > > > We just need to relax the assert to allow any compiled method. > > > dl > From dean.long at oracle.com Tue Oct 24 04:47:17 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 23 Oct 2017 21:47:17 -0700 Subject: RFR(XS): 8189649: AOT: assert(caller_frame.cb()->as_nmethod_or_null() == cm) failed: expect top frame nmethod In-Reply-To: <5f8da8a8-489e-f9b7-388b-dfe61c359413@oracle.com> References: <8692a329-0811-4a78-3937-8a244863737f@oracle.com> <5f8da8a8-489e-f9b7-388b-dfe61c359413@oracle.com> Message-ID: <53eda7fb-bb1e-e432-31fa-747541d60b95@oracle.com> Thanks Vladimir. dl On 10/23/17 7:22 PM, Vladimir Kozlov wrote: > Good. 
> > Thanks, > Vladimir > > On 10/23/17 5:27 PM, dean.long at oracle.com wrote: >> https://bugs.openjdk.java.net/browse/JDK-8189649 >> >> http://cr.openjdk.java.net/~dlong/8189649/webrev/ >> >> >> We just need to relax the assert to allow any compiled method. >> >> >> dl >> From goetz.lindenmaier at sap.com Tue Oct 24 07:24:43 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 24 Oct 2017 07:24:43 +0000 Subject: 8188131: [PPC] Increase inlining thresholds to the same as other platforms In-Reply-To: References: Message-ID: <374596dc72ed4b54bd9e3cc43e221d72@sap.com> Hi Ogata, > It is helpful if you could explain what is the difference of the JIT > behavior when the code cache is large enough and when it is the minimum If the code cache is not large enough, code can get evicted and recompiled. Then the compiler threads keep concurring for cpu with the application threads, assuming the application utilizes all cpus for application threads. Generating bigger code obviously will bring the application faster into this situation. Please, as this is a compiler issue, it should be discussed on hotspot-compiler-dev. Best regards, Goetz. > -----Original Message----- > From: Kazunori Ogata [mailto:OGATAK at jp.ibm.com] > Sent: Freitag, 20. Oktober 2017 08:32 > To: Lindenmaier, Goetz > Cc: hotspot-dev at openjdk.java.net; Doerr, Martin ; > ppc-aix-port-dev at openjdk.java.net > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other > platforms > > Hi Goetz, > > Thank you for your comment. OK, I'll evaluate the patch more by comparing > the minimum code cache sizes and the performance on the cache size. > > It is helpful if you could explain what is the difference of the JIT > behavior when the code cache is large enough and when it is the minimum > size. It seems almost the same to me because all the methods that needed > to be compiled should be compiled in both cases, but I may miss something. > > > By the way, the benchmark I confirmed performance improvement was TPC- > DS > q96, but I measured the code cache size of SPECjbb2015 by my mistake. I'll > compare the minimum code cache sizes and the performance of both > benchmarks, as this patch will affect all applications. > > > Regards, > Ogata > > > > From: "Lindenmaier, Goetz" > To: Kazunori Ogata , "Doerr, Martin" > > Cc: "ppc-aix-port-dev at openjdk.java.net" > , "hotspot-dev at openjdk.java.net" > > Date: 2017/10/19 20:03 > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > same as other platforms > > > > Hi Kazunori, > > To me, this seems to be a very large increase. > Considering that not only the required code cache size but also the > compiler cpu time will increase in this magnitude, this seems to be > a rather risky step that should be tested for its benefits on systems > that are highly contended. > > In this case, you probably had enough space in the code cache so that > no recompilation etc. happened. > > To further look at this I could think of > 1. finding the minimal code cache size with the old flags where > the JIT is not disabled > 2. finding the same size for the new flag settings > --> How much more is needed for the new settings? > > Then you should compare the performance with the bigger > code cache size for both, and see whether there still is performance > improvement, or whether it's eaten up by more compile time. > I.e. you should have a setup where compiler threads and application > threads compete for the available CPUs. > > What do you think? 
> > Best regards, > Goetz. > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > > Behalf Of Kazunori Ogata > > Sent: Donnerstag, 19. Oktober 2017 08:43 > > To: Doerr, Martin > > Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as > other > > platforms > > > > Hi Martin, > > > > Thank you for your comment. I checked the code cache size by running > > SPECjbb2015 (composite mode, i.e., single JVM mode, heap size is 31GB). > > > > The used code cache size was increased by 4.5MB from 41982Kb to 47006Kb > > (+12%). Is the increase too large? > > > > > > The raw output of -XX:+PrintCodeCache are: > > > > === Original === > > CodeHeap 'non-profiled nmethods': size=652480Kb used=13884Kb > > max_used=13884Kb free=638595Kb > > bounds [0x00001000356f0000, 0x0000100036480000, 0x000010005d420000] > > CodeHeap 'profiled nmethods': size=652480Kb used=26593Kb > > max_used=26593Kb > > free=625886Kb > > bounds [0x000010000d9c0000, 0x000010000f3c0000, 0x00001000356f0000] > > CodeHeap 'non-nmethods': size=5760Kb used=1505Kb max_used=1559Kb > > free=4254Kb > > bounds [0x000010000d420000, 0x000010000d620000, 0x000010000d9c0000] > > total_blobs=16606 nmethods=10265 adapters=653 > > compilation: enabled > > > > > > === Modified (webrev.00) === > > CodeHeap 'non-profiled nmethods': size=652480Kb used=18516Kb > > max_used=18516Kb free=633964Kb > > bounds [0x0000100035730000, 0x0000100036950000, 0x000010005d460000] > > CodeHeap 'profiled nmethods': size=652480Kb used=26963Kb > > max_used=26963Kb > > free=625516Kb > > bounds [0x000010000da00000, 0x000010000f460000, 0x0000100035730000] > > CodeHeap 'non-nmethods': size=5760Kb used=1527Kb max_used=1565Kb > > free=4232Kb > > bounds [0x000010000d460000, 0x000010000d660000, 0x000010000da00000] > > total_blobs=16561 nmethods=10295 adapters=653 > > compilation: enabled > > > > > > Regards, > > Ogata > > > > > > > > > > From: "Doerr, Martin" > > To: Kazunori Ogata , "hotspot- > > dev at openjdk.java.net" > > , "ppc-aix-port-dev at openjdk.java.net" > > > > Date: 2017/10/18 19:43 > > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > > same as other platforms > > > > > > > > Hi Ogata, > > > > sorry for the delay. I had missed this one. > > > > The change looks feasible to me. > > > > It may only impact the utilization of the Code Cache. Can you evaluate > > that (e.g. by running large benchmarks with -XX:+PrintCodeCache)? > > > > Thanks and best regards, > > Martin > > > > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > > Behalf > > Of Kazunori Ogata > > Sent: Freitag, 29. September 2017 08:42 > > To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > > Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as > > other platforms > > > > Hi all, > > > > Please review a change for JDK-8188131. 
> > > > Bug report: > > https://urldefense.proofpoint.com/v2/url?u=https- > > 3A__bugs.openjdk.java.net_browse_JDK- > > 2D8188131&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > > 73lAZxkNhGsrlDkk- > > YUYORQ&s=ic27Fb2_vyTSsUAPraEI89UDJy9cbodGojvMw9DNHiU&e= > > > > Webrev: > > https://urldefense.proofpoint.com/v2/url?u=http- > > 3A__cr.openjdk.java.net_- > > 7Ehorii_8188131_webrev.00_&d=DwIFAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=p- > > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > > 73lAZxkNhGsrlDkk-YUYORQ&s=xS8PbLyuVtbOBRDMIB- > > i9r6lTggpGH3Np8kmONkkMAg&e= > > > > > > This change increases the default values of FreqInlineSize and > > InlineSmallCode in ppc64 to 325 and 2500, respectively. These values > are > > the same as aarch64. The performance of TPC-DS Q96 was improved by > > about > > 6% with this change. > > > > > > Regards, > > Ogata > > > > > > > > > From tobias.hartmann at oracle.com Tue Oct 24 07:33:52 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 24 Oct 2017 09:33:52 +0200 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> Message-ID: <364705bf-55af-84de-5cd8-cabe20bb2e57@oracle.com> Hi Vladimir, On 23.10.2017 18:19, Vladimir Kozlov wrote: > I think we need to file a bug or rfe to fix other cases too. Okay, it's difficult to file a bug for a not yet known issue so I've filed an RFE to look into this: https://bugs.openjdk.java.net/browse/JDK-8189856 > In add_users_to_worklist() you don't need to check type ut != > > type(u) - just push node on worklist. Right, fixed: http://cr.openjdk.java.net/~thartmann/8188785/webrev.02/ > Also can you remove ut->isa_instptr() check in *both* cases. And use u->is_Mem() instead of u->Opcode() == Op_LoadP to > cover stores too. > The motivation is that original LoadP is raw as result memory operations which use it may look for more precise type of > the field somewhere so they should be on worklist. Why is that necessary? If the raw LoadP changes its type, all direct users will be added to the worklist anyway. The problem in the failing case is that the type of the AddP changed but the type of the raw LoadP didn't (it stays raw). However, the InstPtr load depends on the type of the AddP: InstPtrLoadP(RawLoadP(AddP(..))) Do you expect other memory users of the raw LoadP to depend on the type of the AddP? I think we should only add handling for known special cases but here's the corresponding webrev: http://cr.openjdk.java.net/~thartmann/8188785/webrev.03/ Thanks, Tobias From rwestrel at redhat.com Tue Oct 24 08:37:01 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 24 Oct 2017 10:37:01 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <63b109f1-48af-f594-588b-519364ad931f@oracle.com> References: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> <63b109f1-48af-f594-588b-519364ad931f@oracle.com> Message-ID: Hi Nils, Thanks for going over the patch and testing it. 
Here is an updated webrev:

http://cr.openjdk.java.net/~roland/8186027/webrev.01/

I also made the changes you suggested, except for:

> src/hotspot/share/opto/loopopts.cpp:
> @@ -1729,7 +1729,7 @@
>          Node* l = cl->outer_loop();
>          Node* tail = cl->outer_loop_tail();
>          IfNode* le = cl->outer_loop_end();
> -        Node* sfpt = cl->outer_safepoint();
> +        Node* sfpt = (Node*) cl->outer_safepoint();
>
> src/hotspot/share/opto/opaquenode.cpp
> @@ -144,7 +144,7 @@
>      assert(iter_estimate > 0, "broken");
>      if ((jlong)scaled_iters != scaled_iters_long || iter_estimate <=
> short_scaled_iters) {
>        // Remove outer loop and safepoint (too few iterations)
> -      Node* outer_sfpt = inner_cl->outer_safepoint();
> +      Node* outer_sfpt = (Node*) inner_cl->outer_safepoint();

for which I used the patch below instead (I ran the build with precompiled
headers disabled to verify that change).

Roland.

diff --git a/src/hotspot/share/opto/loopopts.cpp b/src/hotspot/share/opto/loopopts.cpp
--- a/src/hotspot/share/opto/loopopts.cpp
+++ b/src/hotspot/share/opto/loopopts.cpp
@@ -26,6 +26,7 @@
 #include "memory/allocation.inline.hpp"
 #include "memory/resourceArea.hpp"
 #include "opto/addnode.hpp"
+#include "opto/callnode.hpp"
 #include "opto/castnode.hpp"
 #include "opto/connode.hpp"
 #include "opto/castnode.hpp"
@@ -845,7 +846,6 @@
       assert(n_loop->_parent == outer_loop, "broken loop tree");
     }
 #endif
-
       int count = phi->replace_edge(n, n->in(MemNode::Memory));
       assert(count > 0, "inconsistent phi");
       // Compute latest point this store can go
diff --git a/src/hotspot/share/opto/opaquenode.cpp b/src/hotspot/share/opto/opaquenode.cpp
--- a/src/hotspot/share/opto/opaquenode.cpp
+++ b/src/hotspot/share/opto/opaquenode.cpp
@@ -24,6 +24,7 @@
 #include "precompiled.hpp"
 #include "opto/addnode.hpp"
+#include "opto/callnode.hpp"
 #include "opto/cfgnode.hpp"
 #include "opto/connode.hpp"
 #include "opto/divnode.hpp"

From ionutb83 at yahoo.com Tue Oct 24 09:05:38 2017
From: ionutb83 at yahoo.com (Ionut)
Date: Tue, 24 Oct 2017 09:05:38 +0000 (UTC)
Subject: Vectorized Loop Unrolling on x64?
References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com>
Message-ID: <1302875736.3225693.1508835938207@mail.yahoo.com>

Hello All,

I want to ask you about https://bugs.openjdk.java.net/browse/JDK-8129920 - Vectorized loop unrolling, which says it is applicable only for x86 targets. Do you plan to port this for x64 as well? Or do I miss something here?

Regards
Ionut

From nils.eliasson at oracle.com Tue Oct 24 09:06:50 2017
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 24 Oct 2017 11:06:50 +0200
Subject: Vectorized Loop Unrolling on x64?
In-Reply-To: <1302875736.3225693.1508835938207@mail.yahoo.com>
References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com>
Message-ID: <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com>

Hi Ionut,

In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64.

Regards,
Nils Eliasson

On 2017-10-24 11:05, Ionut wrote:
> Hello All,
>
> I want to ask you about
> https://bugs.openjdk.java.net/browse/JDK-8129920 - Vectorized loop
> unrolling, which says it is applicable only for x86 targets. Do you
> plan to port this for x64 as well? Or do I miss something here?
>
> Regards
> Ionut
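As a point of reference for the benchmark discussion that follows, the superword optimization targets simple counted loops over arrays with independent iterations. A small illustrative example of such a loop shape - not taken from JDK-8129920 or from any webrev - is:

public class AddArrays {
    // Element-wise loop with no cross-iteration dependence - the kind of
    // counted loop C2's SuperWord pass can typically turn into SIMD code.
    static void add(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] + c[i];
        }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] a = new int[n], b = new int[n], c = new int[n];
        for (int i = 0; i < n; i++) {
            b[i] = i;
            c[i] = 2 * i;
        }
        for (int iter = 0; iter < 100; iter++) {
            add(a, b, c); // repeat so the method gets hot and C2-compiled
        }
        System.out.println(a[n - 1]);
    }
}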
From ionutb83 at yahoo.com Tue Oct 24 10:23:57 2017
From: ionutb83 at yahoo.com (Ionut)
Date: Tue, 24 Oct 2017 10:23:57 +0000 (UTC)
Subject: Vectorized Loop Unrolling on x64?
In-Reply-To: <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com>
References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com>
Message-ID: <1933684779.3254078.1508840637072@mail.yahoo.com>

Hi Nils,

Thanks, it is clear. However, I have tried a simple example (e.g. just iterating through an array and doing the sum, using JMH) on my x64 Linux and it seems not to be vectorized ... Below are the initial source code and assembly. Could you please provide me any hint, am I doing something wrong?

JDK is 9.0.1

Source code:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)
@Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", "-Xbatch", "-XX:+UseSuperWord" })
@State(Scope.Benchmark)
public class Sum1ToNArray {
    private int[] array;

    public static void main(String[] args) {
        Options opt =
            new OptionsBuilder()
                .include(Sum1ToNArray.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }

    @Setup(Level.Trial)
    public void setUp() {
        this.array = new int[100_000_000];
        for (int i = 0; i < array.length; i++)
            array[i] = i + 1;
    }

    @Benchmark
    public long hotMethod() {
        long sum = 0;
        for (int i = 0; i < array.length; i++) {
            sum += array[i];
        }
        return sum;
    }
}

Assembly:

....[Hottest Region 1]..............................................................................
c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes)

                    0x00007f7bf1bff0f9: mov    r8d,r10d
                    0x00007f7bf1bff0fc: add    r8d,0xfffffff9
                    0x00007f7bf1bff100: mov    r11d,0x1
                    0x00007f7bf1bff106: cmp    r8d,0x1
                    0x00007f7bf1bff10a: jg     0x00007f7bf1bff114
                    0x00007f7bf1bff10c: mov    rax,rdx
                    0x00007f7bf1bff10f: jmp    0x00007f7bf1bff15d
                    0x00007f7bf1bff111: mov    rdx,rax            ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)
                    0x00007f7bf1bff114: movsxd rsi,DWORD PTR [r14+r11*4+0x10]
 11.08%    8.55%    0x00007f7bf1bff119: movsxd rbp,DWORD PTR [r14+r11*4+0x14]
  0.30%    0.17%    0x00007f7bf1bff11e: movsxd r13,DWORD PTR [r14+r11*4+0x18]
                    0x00007f7bf1bff123: movsxd rax,DWORD PTR [r14+r11*4+0x2c]
  8.86%    2.85%    0x00007f7bf1bff128: movsxd r9,DWORD PTR [r14+r11*4+0x28]
 10.49%   23.29%    0x00007f7bf1bff12d: movsxd rcx,DWORD PTR [r14+r11*4+0x24]
  0.38%    0.45%    0x00007f7bf1bff132: movsxd rbx,DWORD PTR [r14+r11*4+0x20]
  0.03%    0.06%    0x00007f7bf1bff137: movsxd rdi,DWORD PTR [r14+r11*4+0x1c]
  0.23%    0.22%    0x00007f7bf1bff13c: add    rsi,rdx
 10.58%   18.59%    0x00007f7bf1bff13f: add    rbp,rsi
  0.32%    0.17%    0x00007f7bf1bff142: add    r13,rbp
  0.05%    0.04%    0x00007f7bf1bff145: add    rdi,r13
 26.10%   28.47%    0x00007f7bf1bff148: add    rbx,rdi
  5.55%    5.48%    0x00007f7bf1bff14b: add    rcx,rbx
  5.66%    1.32%    0x00007f7bf1bff14e: add    r9,rcx
  7.85%    3.11%    0x00007f7bf1bff151: add    rax,r9             ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53)
 10.19%    5.67%    0x00007f7bf1bff154: add    r11d,0x8           ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 23 (line 52)
  0.38%    0.12%    0x00007f7bf1bff158: cmp    r11d,r8d
                    0x00007f7bf1bff15b: jl     0x00007f7bf1bff111 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 10 (line 52)
                    0x00007f7bf1bff15d: cmp    r11d,r10d
                    0x00007f7bf1bff160: jge    0x00007f7bf1bff174
                    0x00007f7bf1bff162: xchg   ax,ax              ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)
                    0x00007f7bf1bff164: movsxd r8,DWORD PTR [r14+r11*4+0x10]
                    0x00007f7bf1bff169: add    rax,r8             ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53)

Regards

On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson wrote:

Hi Ionut,
In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64.
Regards,
Nils Eliasson

On 2017-10-24 11:05, Ionut wrote:
Hello All,

I want to ask you about https://bugs.openjdk.java.net/browse/JDK-8129920 - Vectorized loop unrolling, which says it is applicable only for x86 targets. Do you plan to port this for x64 as well? Or do I miss something here?

Regards
Ionut

From OGATAK at jp.ibm.com Tue Oct 24 11:11:53 2017
From: OGATAK at jp.ibm.com (Kazunori Ogata)
Date: Tue, 24 Oct 2017 20:11:53 +0900
Subject: 8188131: [PPC] Increase inlining thresholds to the same as other platforms
In-Reply-To:
References:
Message-ID:

Hi Goetz,

Thank you for the clarification and for re-directing the discussion to the hotspot-compiler-dev ML. I understood the intention of the measurement around the lower bound of the code cache size. I'll post the results when I finish the measurements.

Regards,
Ogata

From: "Lindenmaier, Goetz"
To: Kazunori Ogata , "'hotspot-compiler-dev at openjdk.java.net'"
Cc: "Doerr, Martin" , "ppc-aix-port-dev at openjdk.java.net"
Date: 2017/10/24 16:30
Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other platforms

Hi Ogata,

> It is helpful if you could explain what is the difference of the JIT
> behavior when the code cache is large enough and when it is the minimum

If the code cache is not large enough, code can get evicted and recompiled.
Then the compiler threads keep concurring for cpu with the application threads, assuming the application utilizes all cpus for application threads. Generating bigger code obviously will bring the application faster into this situation. Please, as this is a compiler issue, it should be discussed on hotspot-compiler-dev. Best regards, Goetz. > -----Original Message----- > From: Kazunori Ogata [mailto:OGATAK at jp.ibm.com] > Sent: Freitag, 20. Oktober 2017 08:32 > To: Lindenmaier, Goetz > Cc: hotspot-dev at openjdk.java.net; Doerr, Martin ; > ppc-aix-port-dev at openjdk.java.net > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other > platforms > > Hi Goetz, > > Thank you for your comment. OK, I'll evaluate the patch more by comparing > the minimum code cache sizes and the performance on the cache size. > > It is helpful if you could explain what is the difference of the JIT > behavior when the code cache is large enough and when it is the minimum > size. It seems almost the same to me because all the methods that needed > to be compiled should be compiled in both cases, but I may miss something. > > > By the way, the benchmark I confirmed performance improvement was TPC- > DS > q96, but I measured the code cache size of SPECjbb2015 by my mistake. I'll > compare the minimum code cache sizes and the performance of both > benchmarks, as this patch will affect all applications. > > > Regards, > Ogata > > > > From: "Lindenmaier, Goetz" > To: Kazunori Ogata , "Doerr, Martin" > > Cc: "ppc-aix-port-dev at openjdk.java.net" > , "hotspot-dev at openjdk.java.net" > > Date: 2017/10/19 20:03 > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > same as other platforms > > > > Hi Kazunori, > > To me, this seems to be a very large increase. > Considering that not only the required code cache size but also the > compiler cpu time will increase in this magnitude, this seems to be > a rather risky step that should be tested for its benefits on systems > that are highly contended. > > In this case, you probably had enough space in the code cache so that > no recompilation etc. happened. > > To further look at this I could think of > 1. finding the minimal code cache size with the old flags where > the JIT is not disabled > 2. finding the same size for the new flag settings > --> How much more is needed for the new settings? > > Then you should compare the performance with the bigger > code cache size for both, and see whether there still is performance > improvement, or whether it's eaten up by more compile time. > I.e. you should have a setup where compiler threads and application > threads compete for the available CPUs. > > What do you think? > > Best regards, > Goetz. > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > > Behalf Of Kazunori Ogata > > Sent: Donnerstag, 19. Oktober 2017 08:43 > > To: Doerr, Martin > > Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as > other > > platforms > > > > Hi Martin, > > > > Thank you for your comment. I checked the code cache size by running > > SPECjbb2015 (composite mode, i.e., single JVM mode, heap size is 31GB). > > > > The used code cache size was increased by 4.5MB from 41982Kb to 47006Kb > > (+12%). Is the increase too large? 
> > > > > > The raw output of -XX:+PrintCodeCache are: > > > > === Original === > > CodeHeap 'non-profiled nmethods': size=652480Kb used=13884Kb > > max_used=13884Kb free=638595Kb > > bounds [0x00001000356f0000, 0x0000100036480000, 0x000010005d420000] > > CodeHeap 'profiled nmethods': size=652480Kb used=26593Kb > > max_used=26593Kb > > free=625886Kb > > bounds [0x000010000d9c0000, 0x000010000f3c0000, 0x00001000356f0000] > > CodeHeap 'non-nmethods': size=5760Kb used=1505Kb max_used=1559Kb > > free=4254Kb > > bounds [0x000010000d420000, 0x000010000d620000, 0x000010000d9c0000] > > total_blobs=16606 nmethods=10265 adapters=653 > > compilation: enabled > > > > > > === Modified (webrev.00) === > > CodeHeap 'non-profiled nmethods': size=652480Kb used=18516Kb > > max_used=18516Kb free=633964Kb > > bounds [0x0000100035730000, 0x0000100036950000, 0x000010005d460000] > > CodeHeap 'profiled nmethods': size=652480Kb used=26963Kb > > max_used=26963Kb > > free=625516Kb > > bounds [0x000010000da00000, 0x000010000f460000, 0x0000100035730000] > > CodeHeap 'non-nmethods': size=5760Kb used=1527Kb max_used=1565Kb > > free=4232Kb > > bounds [0x000010000d460000, 0x000010000d660000, 0x000010000da00000] > > total_blobs=16561 nmethods=10295 adapters=653 > > compilation: enabled > > > > > > Regards, > > Ogata > > > > > > > > > > From: "Doerr, Martin" > > To: Kazunori Ogata , "hotspot- > > dev at openjdk.java.net" > > , "ppc-aix-port-dev at openjdk.java.net" > > > > Date: 2017/10/18 19:43 > > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > > same as other platforms > > > > > > > > Hi Ogata, > > > > sorry for the delay. I had missed this one. > > > > The change looks feasible to me. > > > > It may only impact the utilization of the Code Cache. Can you evaluate > > that (e.g. by running large benchmarks with -XX:+PrintCodeCache)? > > > > Thanks and best regards, > > Martin > > > > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > > Behalf > > Of Kazunori Ogata > > Sent: Freitag, 29. September 2017 08:42 > > To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > > Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as > > other platforms > > > > Hi all, > > > > Please review a change for JDK-8188131. > > > > Bug report: > > https://urldefense.proofpoint.com/v2/url?u=https- > > 3A__bugs.openjdk.java.net_browse_JDK- > > 2D8188131&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > > 73lAZxkNhGsrlDkk- > > YUYORQ&s=ic27Fb2_vyTSsUAPraEI89UDJy9cbodGojvMw9DNHiU&e= > > > > Webrev: > > https://urldefense.proofpoint.com/v2/url?u=http- > > 3A__cr.openjdk.java.net_- > > 7Ehorii_8188131_webrev.00_&d=DwIFAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=p- > > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > > 73lAZxkNhGsrlDkk-YUYORQ&s=xS8PbLyuVtbOBRDMIB- > > i9r6lTggpGH3Np8kmONkkMAg&e= > > > > > > This change increases the default values of FreqInlineSize and > > InlineSmallCode in ppc64 to 325 and 2500, respectively. These values > are > > the same as aarch64. The performance of TPC-DS Q96 was improved by > > about > > 6% with this change. 
> > > > > > Regards, > > Ogata > > > > > > > > > From vladimir.kozlov at oracle.com Tue Oct 24 16:43:13 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 09:43:13 -0700 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <364705bf-55af-84de-5cd8-cabe20bb2e57@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> <364705bf-55af-84de-5cd8-cabe20bb2e57@oracle.com> Message-ID: <66832428-3a6a-232d-9c57-7efd01cc97a2@oracle.com> On 10/24/17 12:33 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 23.10.2017 18:19, Vladimir Kozlov wrote: >> I think we need to file a bug or rfe to fix other cases too. > > Okay, it's difficult to file a bug for a not yet known issue so I've filed an RFE to look into this: > https://bugs.openjdk.java.net/browse/JDK-8189856 Good. RFE is fine. > >> In add_users_to_worklist() you don't need to check type ut != >> ?> type(u) - just push node on worklist. > > Right, fixed: > http://cr.openjdk.java.net/~thartmann/8188785/webrev.02/ Okay, you are right, lets use this version for the fix. We can do additional changes for 8189856. Thanks, Vladimir > >> Also can you remove ut->isa_instptr() check in *both* cases. And use u->is_Mem() instead of u->Opcode() == Op_LoadP to >> cover stores too. >> The motivation is that original LoadP is raw as result memory operations which use it may look for more precise type >> of the field somewhere so they should be on worklist. > > Why is that necessary? If the raw LoadP changes its type, all direct users will be added to the worklist anyway. > > The problem in the failing case is that the type of the AddP changed but the type of the raw LoadP didn't (it stays > raw). However, the InstPtr load depends on the type of the AddP: > > ? InstPtrLoadP(RawLoadP(AddP(..))) > > Do you expect other memory users of the raw LoadP to depend on the type of the AddP? I think we should only add handling > for known special cases but here's the corresponding webrev: > http://cr.openjdk.java.net/~thartmann/8188785/webrev.03/ > > Thanks, > Tobias From ionutb83 at yahoo.com Tue Oct 24 16:46:17 2017 From: ionutb83 at yahoo.com (Ionut) Date: Tue, 24 Oct 2017 16:46:17 +0000 (UTC) Subject: Vectorized Loop Unrolling on x64? In-Reply-To: <1933684779.3254078.1508840637072@mail.yahoo.com> References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> Message-ID: <354890084.3509873.1508863577533@mail.yahoo.com> Hello All, ? ?Meanwhile I tested two more other scenarios, as follows: - a[i] = b[i] + c[i]? ? ? ? ? ? ? ? ? ? // where?a, b, c are arrays of ints- a[i] = a[i] + ? ? ? // where might be a constant, etc In both cases they were vectorized, but my initial example (e.g. iterating through the array of ints and computing the sum of elements) is not ... which makes me think this case is currently not supported by JIT. Could you please confirm this? RegardsIonut On Tuesday, October 24, 2017 12:24 PM, Ionut wrote: Hi?Nils, ? Thanks, it is clear. However, I have tried a simple example (e.g.??just iterating through an array and do the sum?using JMH) on my x64 Linux and it seems to not be vectorized ...? Below initial source code and assembly.?Could you please provide me any hint, am I doing something wrong? 
JDK is 9.0.1 Source code: @BenchmarkMode(Mode.AverageTime)@OutputTimeUnit(TimeUnit.NANOSECONDS)@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)@Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", "-Xbatch", "-XX:+UseSuperWord" })@State(Scope.Benchmark)public class Sum1ToNArray {? ? private int[] array; ? ? public static void main(String[] args) {? ? ? ? Options opt = ? ? ? ? ? ? new OptionsBuilder()? ? ? ? ? ? ? ? .include(Sum1ToNArray.class.getSimpleName())? ? ? ? ? ? ? ? .build();? ? ? ? new Runner(opt).run();? ? } ? ? @Setup(Level.Trial)? ? public void setUp() {? ? ? ? this.array = new int[100_000_000];? ? ? ? for (int i = 0; i < array.length; i++)? ? ? ? ? ? array[i] = i + 1;? ? } ? ? @Benchmark? ? public long hotMethod() { ? ? ? ? long sum = 0;? ? ? ? for (int i = 0; i < array.length; i++) {? ? ? ? ? ? sum += array[i];? ? ? ? }? ? ? ? return sum;? ? }} Assembly:....[Hottest Region 1]..............................................................................c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes)? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0f9: mov? ? r8d,r10d? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0fc: add? ? r8d,0xfffffff9? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff100: mov? ? r11d,0x1? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff106: cmp? ? r8d,0x1? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? 0x00007f7bf1bff10a: jg? ? ?0x00007f7bf1bff114? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? 0x00007f7bf1bff10c: mov? ? rax,rdx? ? ? ? ? ? ? ? ? ? ? ? ? ? ???? ?0x00007f7bf1bff10f: jmp? ? 0x00007f7bf1bff15d? ? ? ? ? ? ? ? ? ? ? ? ? ? ????? 0x00007f7bf1bff111: mov? ? rdx,rax? ? ? ? ? ? ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ????? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)? ? ? ? ? ? ? ? ? ? ? ? ? ? ???? 0x00007f7bf1bff114: movsxd rsi,DWORD PTR [r14+r11*4+0x10]?11.08%? ? 8.55%? ? ??? 0x00007f7bf1bff119: movsxd rbp,DWORD PTR [r14+r11*4+0x14]? 0.30%? ? 0.17%? ? ???? 0x00007f7bf1bff11e: movsxd r13,DWORD PTR [r14+r11*4+0x18]? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff123: movsxd rax,DWORD PTR [r14+r11*4+0x2c]? 8.86%? ? 2.85%? ? ???? 0x00007f7bf1bff128: movsxd r9,DWORD PTR? [r14+r11*4+0x28]?10.49%? ?23.29%? ???? 0x00007f7bf1bff12d: movsxd rcx,DWORD PTR [r14+r11*4+0x24]? 0.38%? ? 0.45%? ? ???? 0x00007f7bf1bff132: movsxd rbx,DWORD PTR [r14+r11*4+0x20]? 0.03%? ? 0.06%? ? ???? 0x00007f7bf1bff137: movsxd rdi,DWORD PTR [r14+r11*4+0x1c]? 0.23%? ? 0.22%? ? ???? 0x00007f7bf1bff13c: add? ? rsi,rdx?10.58%? ?18.59%? ???? 0x00007f7bf1bff13f: add? ? rbp,rsi? 0.32%? ? 0.17%? ? ???? 0x00007f7bf1bff142: add? ? r13,rbp? 0.05%? ? 0.04%? ? ???? 0x00007f7bf1bff145: add? ? rdi,r13?26.10%? ?28.47%? ???? 0x00007f7bf1bff148: add? ? rbx,rdi? 5.55%? ? 5.48%? ? ???? 0x00007f7bf1bff14b: add? ? rcx,rbx? 5.66%? ? 1.32%? ? ???? 0x00007f7bf1bff14e: add? ? r9,rcx? 7.85%? ? 3.11%? ? ???? 0x00007f7bf1bff151: add? ? rax,r9? ? ? ? ? ? ?;*ladd {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53)?10.19%? ? 5.67%? ? ??? 0x00007f7bf1bff154: add? ? r11d,0x8? ? ? ? ?;*iinc {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.jpt.Sum1ToNArray::hotMethod at 23 (line 52)? 
0.38%? ? 0.12%? ? ???? 0x00007f7bf1bff158: cmp? ? r11d,r8d? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff15b: jl? ? ? ? 0x00007f7bf1bff111? ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 10 (line 52)? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ?0x00007f7bf1bff15d: cmp? ? r11d,r10d? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff160: jge? ? ? ?0x00007f7bf1bff174? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff162: xchg? ? ax,ax? ? ? ? ? ? ? ? ? ? ? ; *lload_1 {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0x00007f7bf1bff164: movsxd r8,DWORD PTR [r14+r11*4+0x10]? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff169: add? ? ? ?rax,r8? ? ? ? ? ? ? ? ? ? ;*ladd {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) Regards On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson wrote: Hi Ionut, In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64. Regards, Nils Eliasson On 2017-10-24 11:05, Ionut wrote: Hello All, ? ? I want to ask you about?https://bugs.openjdk.java.net/browse/JDK-8129920?- Vectorized loop unrolling?which says it is applicable?only?for x86 targets. Do you plan to port this for x64 as well? Or I miss something here? Regards Ionut -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.hartmann at oracle.com Tue Oct 24 16:51:56 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 24 Oct 2017 18:51:56 +0200 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <66832428-3a6a-232d-9c57-7efd01cc97a2@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> <364705bf-55af-84de-5cd8-cabe20bb2e57@oracle.com> <66832428-3a6a-232d-9c57-7efd01cc97a2@oracle.com> Message-ID: Hi Vladimir, On 24.10.2017 18:43, Vladimir Kozlov wrote: > Okay, you are right, lets use this version for the fix. We can do additional changes for 8189856. Okay, thanks for reviewing! I'll push webrev.02. Best regards, Tobias >>> Also can you remove ut->isa_instptr() check in *both* cases. And use u->is_Mem() instead of u->Opcode() == Op_LoadP >>> to cover stores too. >>> The motivation is that original LoadP is raw as result memory operations which use it may look for more precise type >>> of the field somewhere so they should be on worklist. >> >> Why is that necessary? If the raw LoadP changes its type, all direct users will be added to the worklist anyway. >> >> The problem in the failing case is that the type of the AddP changed but the type of the raw LoadP didn't (it stays >> raw). However, the InstPtr load depends on the type of the AddP: >> >> ?? InstPtrLoadP(RawLoadP(AddP(..))) >> >> Do you expect other memory users of the raw LoadP to depend on the type of the AddP? 
I think we should only add >> handling for known special cases but here's the corresponding webrev: >> http://cr.openjdk.java.net/~thartmann/8188785/webrev.03/ >> >> Thanks, >> Tobias From nils.eliasson at oracle.com Tue Oct 24 16:59:44 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 24 Oct 2017 18:59:44 +0200 Subject: Vectorized Loop Unrolling on x64? In-Reply-To: <354890084.3509873.1508863577533@mail.yahoo.com> References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: <5da48501-b7ca-f814-3d19-1d482d9bb337@oracle.com> Hi, Array reduction operations is implemented but are disabled in some settings. See excellent blog post by Richard Startin: http://richardstartin.uk/tricking-java-into-adding-up-arrays-faster/ https://bugs.openjdk.java.net/browse/JDK-8188313 https://bugs.openjdk.java.net/browse/JDK-8078563 Regards, Nils Eliasosn On 2017-10-24 18:46, Ionut wrote: > Hello All, > > ? ?Meanwhile I tested two more other scenarios, as follows: > > - a[i] = b[i] + c[i]? ? ? ? ? ? ? ? ? ? // where?a, b, c are arrays of > ints > - a[i] = a[i] + ? ? ? // where might be a > constant, etc > > In both cases they were vectorized, but my initial example (e.g. > iterating through the array of ints and computing the sum of elements) > is not ... which makes me think this case is currently not supported > by JIT. > > Could you please confirm this? > > Regards > Ionut > > > On Tuesday, October 24, 2017 12:24 PM, Ionut wrote: > > > Hi Nils, > > Thanks, it is clear. However, I have tried a simple example (e.g. just > iterating through an array and do the sum using JMH) on my x64 Linux > and it seems to not be vectorized ...? Below initial source code and > assembly. > Could you please provide me any hint, am I doing something wrong? > > *JDK is 9.0.1* > > *_Source code:_* > > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS) > @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS) > @Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", > "-Xbatch", "-XX:+UseSuperWord" }) > @State(Scope.Benchmark) > public class _Sum1ToNArray _{ > private int[] array; > > ? ? public static void main(String[] args) { > ? ? ? ? Options opt = > ? new OptionsBuilder() > .include(Sum1ToNArray.class.getSimpleName()) > ? ? ? .build(); > new Runner(opt).run(); > ? ? } > > @Setup(Level.Trial) > ? ? public void setUp() { > this.array = new int[100_000_000]; > for (int i = 0; i < array.length; i++) > ? array[i] = i + 1; > ? ? } > > @Benchmark > public long hotMethod() { > long sum = 0; > for (int i = 0; i < array.length; i++) { > ? sum += array[i]; > ? ? ? ? } > return sum; > ? ? } > } > > *_Assembly:_* > ....[Hottest Region > 1].............................................................................. > c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0f9: mov? ? r8d,r10d > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0fc: add? ? r8d,0xfffffff9 > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff100: mov? ? r11d,0x1 > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff106: cmp? ? r8d,0x1 > ? ? ? ? ? ? ? ? ? ??? ? 0x00007f7bf1bff10a: jg? ? ?0x00007f7bf1bff114 > ? ? ? ? ? ? ? ? ? ??? ? ? 0x00007f7bf1bff10c: mov? ? rax,rdx > ? ? ? ? ? ? ? ? ? ???? 
?0x00007f7bf1bff10f: jmp? ? 0x00007f7bf1bff15d > ? ? ? ? ? ? ? ? ? ????? 0x00007f7bf1bff111: mov? ? rdx,rax? ? ? ? ? ? > ;*lload_1 {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ???? ?; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53) > ? ? ? ? ? ? ? ? ? ???? 0x00007f7bf1bff114: movsxd rsi,DWORD PTR > [r14+r11*4+0x10] > ?11.08% 8.55%? ? ??? 0x00007f7bf1bff119: movsxd rbp,DWORD PTR > [r14+r11*4+0x14] > ? 0.30% 0.17%? ? ???? 0x00007f7bf1bff11e: movsxd r13,DWORD PTR > [r14+r11*4+0x18] > ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff123: movsxd rax,DWORD PTR > [r14+r11*4+0x2c] > ? 8.86% 2.85%? ? ???? 0x00007f7bf1bff128: movsxd r9,DWORD PTR? > [r14+r11*4+0x28] > ?10.49% ?23.29%? ???? 0x00007f7bf1bff12d: movsxd rcx,DWORD PTR > [r14+r11*4+0x24] > ? 0.38% 0.45%? ? ???? 0x00007f7bf1bff132: movsxd rbx,DWORD PTR > [r14+r11*4+0x20] > ? 0.03% 0.06%? ? ???? 0x00007f7bf1bff137: movsxd rdi,DWORD PTR > [r14+r11*4+0x1c] > ? 0.23% 0.22%? ? ???? 0x00007f7bf1bff13c: add rsi,rdx > ?10.58% ?18.59%? ???? 0x00007f7bf1bff13f: add rbp,rsi > ? 0.32% 0.17%? ? ???? 0x00007f7bf1bff142: add r13,rbp > ? 0.05% 0.04%? ? ???? 0x00007f7bf1bff145: add rdi,r13 > ?26.10% ?28.47%? ???? 0x00007f7bf1bff148: add rbx,rdi > ? 5.55% 5.48%? ? ???? 0x00007f7bf1bff14b: add rcx,rbx > ? 5.66% 1.32%? ? ???? 0x00007f7bf1bff14e: add r9,rcx > ? 7.85% 3.11%? ? ???? 0x00007f7bf1bff151: add rax,r9? ? ? ? ? ? > ?;*ladd {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; > - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) > ?10.19% 5.67%? ? ??? 0x00007f7bf1bff154: add r11d,0x8? ? ? ? ?;*iinc > {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ?? ; - com.jpt.Sum1ToNArray::hotMethod at 23 (line 52) > ? 0.38% 0.12%? ? ???? 0x00007f7bf1bff158: cmp r11d,r8d > ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff15b: jl? ? ? ? > 0x00007f7bf1bff111? ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 10 > (line 52) > ? ? ? ? ? ? ? ? ? ? ?? ?0x00007f7bf1bff15d: cmp? ? r11d,r10d > ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff160: jge? ? ? ?0x00007f7bf1bff174 > ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff162: xchg? ? ax,ax? ? ? ? ? ? > ? ? ? ? ? ; *lload_1 {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53) > ? ? ? ? ? ? ? ? ? ? ? ? ? 0x00007f7bf1bff164: movsxd r8,DWORD PTR > [r14+r11*4+0x10] > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff169: add? ? ? ?rax,r8? ? ? ? > ? ? ? ? ? ? ;*ladd {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) > > Regards > > > On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson > wrote: > > > Hi Ionut, > In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64. > Regards, > Nils Eliasson > > On 2017-10-24 11:05, Ionut wrote: >> Hello All, >> >> ? I want to ask you about >> https://bugs.openjdk.java.net/browse/JDK-8129920*?- Vectorized loop >> unrolling *which says it is applicable _only__for x86 targets_. Do >> you plan to port this for x64 as well? Or I miss something here? >> >> Regards >> Ionut > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Oct 24 17:03:51 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 10:03:51 -0700 Subject: Vectorized Loop Unrolling on x64? 
In-Reply-To: <354890084.3509873.1508863577533@mail.yahoo.com> References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: You are right - your initial examples are not supported by current HotSpot JIT vectorization. Second example (sum/reduction) could be optimized https://bugs.openjdk.java.net/browse/JDK-8074981 but because generated code is very expensive we limited it to cases where benefit overweights expense: https://bugs.openjdk.java.net/browse/JDK-8078563. Regards, Vladimir On 10/24/17 9:46 AM, Ionut wrote: > Hello All, > > ? ?Meanwhile I tested two more other scenarios, as follows: > > - a[i] = b[i] + c[i]? ? ? ? ? ? ? ? ? ? // where?a, b, c are arrays of ints > - a[i] = a[i] + ? ? ? // where might be a constant, etc > > In both cases they were vectorized, but my initial example (e.g. iterating through the array of ints and computing the > sum of elements) is not ... which makes me think this case is currently not supported by JIT. > > Could you please confirm this? > > Regards > Ionut > > > On Tuesday, October 24, 2017 12:24 PM, Ionut wrote: > > > Hi Nils, > > ? Thanks, it is clear. However, I have tried a simple example (e.g. just iterating through an array and do the sum > using JMH) on my x64 Linux and it seems to not be vectorized ...? Below initial source code and assembly. > Could you please provide me any hint, am I doing something wrong? > > *JDK is 9.0.1* > > *_Source code:_* > > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS) > @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS) > @Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", "-Xbatch", "-XX:+UseSuperWord" }) > @State(Scope.Benchmark) > public class _Sum1ToNArray _{ > ? ? private int[] array; > > ? ? public static void main(String[] args) { > ? ? ? ? Options opt = > ? ? ? ? ? ? new OptionsBuilder() > ? ? ? ? ? ? ? ? .include(Sum1ToNArray.class.getSimpleName()) > ? ? ? ? ? ? ? ? .build(); > ? ? ? ? new Runner(opt).run(); > ? ? } > > ? ? @Setup(Level.Trial) > ? ? public void setUp() { > ? ? ? ? this.array = new int[100_000_000]; > ? ? ? ? for (int i = 0; i < array.length; i++) > ? ? ? ? ? ? array[i] = i + 1; > ? ? } > > ? ? @Benchmark > ? ? public long hotMethod() { > ? ? ? ? long sum = 0; > ? ? ? ? for (int i = 0; i < array.length; i++) { > ? ? ? ? ? ? sum += array[i]; > ? ? ? ? } > ? ? ? ? return sum; > ? ? } > } > > *_Assembly:_* > ....[Hottest Region 1].............................................................................. > c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0f9: mov? ? r8d,r10d > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0fc: add? ? r8d,0xfffffff9 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff100: mov? ? r11d,0x1 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff106: cmp? ? r8d,0x1 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? 0x00007f7bf1bff10a: jg? ? ?0x00007f7bf1bff114 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? 0x00007f7bf1bff10c: mov? ? rax,rdx > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???? ?0x00007f7bf1bff10f: jmp? ? 0x00007f7bf1bff15d > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ????? 0x00007f7bf1bff111: mov? ? rdx,rax? ? ? ? ? ? ;*lload_1 {reexecute=0 rethrow=0 > return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? 
? ? ????? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - > com.jpt.Sum1ToNArray::hotMethod at 13 (line 53) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???? 0x00007f7bf1bff114: movsxd rsi,DWORD PTR [r14+r11*4+0x10] > ?11.08%? ? 8.55%? ? ??? 0x00007f7bf1bff119: movsxd rbp,DWORD PTR [r14+r11*4+0x14] > ? 0.30%? ? 0.17%? ? ???? 0x00007f7bf1bff11e: movsxd r13,DWORD PTR [r14+r11*4+0x18] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff123: movsxd rax,DWORD PTR [r14+r11*4+0x2c] > ? 8.86%? ? 2.85%? ? ???? 0x00007f7bf1bff128: movsxd r9,DWORD PTR? [r14+r11*4+0x28] > ?10.49%? ?23.29%? ???? 0x00007f7bf1bff12d: movsxd rcx,DWORD PTR [r14+r11*4+0x24] > ? 0.38%? ? 0.45%? ? ???? 0x00007f7bf1bff132: movsxd rbx,DWORD PTR [r14+r11*4+0x20] > ? 0.03%? ? 0.06%? ? ???? 0x00007f7bf1bff137: movsxd rdi,DWORD PTR [r14+r11*4+0x1c] > ? 0.23%? ? 0.22%? ? ???? 0x00007f7bf1bff13c: add? ? rsi,rdx > ?10.58%? ?18.59%? ???? 0x00007f7bf1bff13f: add? ? rbp,rsi > ? 0.32%? ? 0.17%? ? ???? 0x00007f7bf1bff142: add? ? r13,rbp > ? 0.05%? ? 0.04%? ? ???? 0x00007f7bf1bff145: add? ? rdi,r13 > ?26.10%? ?28.47%? ???? 0x00007f7bf1bff148: add? ? rbx,rdi > ? 5.55%? ? 5.48%? ? ???? 0x00007f7bf1bff14b: add? ? rcx,rbx > ? 5.66%? ? 1.32%? ? ???? 0x00007f7bf1bff14e: add? ? r9,rcx > ? 7.85%? ? 3.11%? ? ???? 0x00007f7bf1bff151: add? ? rax,r9? ? ? ? ? ? ?;*ladd {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - > com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) > ?10.19%? ? 5.67%? ? ??? 0x00007f7bf1bff154: add? ? r11d,0x8? ? ? ? ?;*iinc {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > com.jpt.Sum1ToNArray::hotMethod at 23 (line 52) > ? 0.38%? ? 0.12%? ? ???? 0x00007f7bf1bff158: cmp? ? r11d,r8d > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff15b: jl? ? ? ? 0x00007f7bf1bff111? ;*if_icmpge {reexecute=0 rethrow=0 > return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - > com.jpt.Sum1ToNArray::hotMethod at 10 (line 52) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ?0x00007f7bf1bff15d: cmp? ? r11d,r10d > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff160: jge? ? ? ?0x00007f7bf1bff174 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff162: xchg? ? ax,ax? ? ? ? ? ? ? ? ? ? ? ; *lload_1 {reexecute=0 > rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > com.jpt.Sum1ToNArray::hotMethod at 13 (line 53) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0x00007f7bf1bff164: movsxd r8,DWORD PTR [r14+r11*4+0x10] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff169: add? ? ? ?rax,r8? ? ? ? ? ? ? ? ? ? ;*ladd {reexecute=0 > rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - > com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) > > Regards > > > On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson wrote: > > > Hi Ionut, > In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64. > Regards, > Nils Eliasson > > On 2017-10-24 11:05, Ionut wrote: >> Hello All, >> >> ? ? I want to ask you about https://bugs.openjdk.java.net/browse/JDK-8129920*?- Vectorized loop unrolling *which says >> it is applicable _only__for x86 targets_. Do you plan to port this for x64 as well? Or I miss something here? 
>> >> Regards >> Ionut > > > > > From sitnikov.vladimir at gmail.com Tue Oct 24 17:20:41 2017 From: sitnikov.vladimir at gmail.com (Vladimir Sitnikov) Date: Tue, 24 Oct 2017 17:20:41 +0000 Subject: Vectorized Loop Unrolling on x64? In-Reply-To: References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: Just in case, here's Vladimir Ivanov's vectorization talk: *http://2017.jpoint.ru/en/talks/vector-programming-in-java/ * Slide 89 describes sum misundervectorization. Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Oct 24 21:08:11 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 14:08:11 -0700 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> Message-ID: <91ffc3a8-3c02-16f4-9e5b-051169ec19a7@oracle.com> It looks good to me too. The only issue is test's placement - /c1 subdir is nothing to do with C1 compiler. I think test should be put into compiler/exceptions/ directory. I submitted pre-integration testing. Thanks, Vladimir On 10/18/17 8:19 PM, dean.long at oracle.com wrote: > Yes, but I'm not a Reviewer. > > dl > > > On 10/18/17 7:16 AM, Roland Westrelin wrote: >> Here is an updated webrev with Dean's suggestion: >> >> http://cr.openjdk.java.net/~roland/8188151/webrev.01/ >> >> Can this be considered reviewed by you, Dean? >> >> Roland. > From vladimir.kozlov at oracle.com Tue Oct 24 22:02:37 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 15:02:37 -0700 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> Message-ID: We can't use platform specific UseAVX flag in shared code in type.cpp. I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 and corresponding vectors 32 and 64 bytes. If AMD's Instructions Set before 17h does not support whole 32 bytes vectors we can't call it AVX. Thanks, Vladimir On 10/18/17 10:01 AM, dean.long at oracle.com wrote: > How about initializing TypeVect::VECTY and friends unconditionally?? I am nervous about exchanging one guarding condition for another. > > dl > > > On 10/18/17 1:03 AM, Nils Eliasson wrote: >> >> HI, >> >> I ran into a problem with the interaction between MaxVectorSize and the UseAVX. For some AMD CPUs we limit the vector size to 16 because it gives the best performance. >> >>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>> ?????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> ???? } >> >> Whenf MaxVecorSize is set to 16 it has the sideeffect that the TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though the platform has the capability. >> >> Type.cpp:~660 >> >> [...] >> >?? if (Matcher::vector_size_supported(T_FLOAT,8)) { >> >???? TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >> >?? } >> [...] >> >?? 
mreg2type[Op_VecY] = TypeVect::VECTY; >> >> >> In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has effects on specific vector registers. Here we have a mismatch. >> >> On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will also hit asserts in a few places like: >> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), "sanity"); >> >> Shouldn't the type initialization in type.cpp be dependent on feature flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector registers are initialized if the platform supports them, >> but they might not be used if MaxVectorSize is limited.) >> >> This is a patch that solves the problem, but I have not convinced myself that it is the right way: >> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >> >> Feedback appreciated, >> >> Regards, >> Nils Eliasson >> >> >> >> >> >> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >> > From vladimir.kozlov at oracle.com Tue Oct 24 23:02:46 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 16:02:46 -0700 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: References: Message-ID: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> Roland, Did you consider less intrusive approach by adding branch over SafePoint with masking on index variable? int mask = LoopStripMiningMask * inc; // simplified for (int i = start; i < stop; i += inc) { // body if (i & mask != 0) continue; safepoint; } Or may be doing it inside .ad file in new SafePoint node implementation so that ideal graph is not affected. I am concern that suggested changes may affect Range Check elimination (you changed limit to variable value/flag) in addition to complexity of changes which may affect stability of C2. Thanks, Vladimir On 10/3/17 6:19 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8186027/webrev.00/ > > This converts loop: > > for (int i = start; i < stop; i += inc) { > // body > } > > to a loop nest: > > i = start; > if (i < stop) { > do { > int next = MIN(stop, i+LoopStripMiningIter*inc); > do { > // body > i += inc; > } while (i < next); > safepoint(); > } while (i < stop); > } > > (It's actually: > int next = MIN(stop - i, LoopStripMiningIter*inc) + i; > to protect against overflows) > > This should bring the best of running with UseCountedLoopSafepoints on > and running with it off: low time to safepoint with little to no impact > on throughput. That change was first pushed to the shenandoah repo > several months ago and we've been running with it enabled since. > > The command line argument LoopStripMiningIter is the number of > iterations between safepoints. In practice, with an arbitrary > LoopStripMiningIter=1000, we observe time to safepoint on par with the > current -XX:+UseCountedLoopSafepoints and most performance regressions > due to -XX:+UseCountedLoopSafepoints gone. The exception is when an > inner counted loop runs for a low number of iterations on average (and > the compiler doesn't have an upper bound on the number of iteration). > > This is enabled on the command line with: > -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 > > In PhaseIdealLoop::is_counted_loop(), when loop strip mining is enabled, > for an inner loop, the compiler builds a skeleton outer loop around the > the counted loop. 
> The outer loop is kept as simple as possible, so
> required adjustments to the existing loop optimizations are not too
> intrusive. The reason the outer loop is inserted early in the
> optimization process is so that optimizations are not disrupted: an
> alternate implementation could have kept the safepoint in the counted
> loop until loop opts are over and then only have added the outer loop
> and moved the safepoint to the outer loop. That would have prevented
> nodes that are referenced in the safepoint from being sunk out of the
> loop, for instance.
>
> The outer loop is a LoopNode with a backedge to a loop exit test and a
> safepoint. The loop exit test is a CmpI with a new Opaque5Node. The
> skeleton loop is populated with all required Phis after loop opts are
> over, during macro expansion. Only at that point are the loop exit tests
> adjusted so the inner loop runs for at most LoopStripMiningIter. If the
> compiler can prove the inner loop runs for no more than
> LoopStripMiningIter then, during macro expansion, the outer loop is
> removed. The safepoint is removed only if the inner loop executes for
> less than LoopStripMiningIterShortLoop, so that if there are several
> counted loops in a row, we still poll for safepoints regularly.
>
> Until macro expansion, there can be only a few extra nodes in the outer
> loop: nodes that would have sunk out of the inner loop and are kept in
> the outer loop by the safepoint.
>
> PhaseIdealLoop::clone_loop(), which is used by most loop opts, now has
> several ways of cloning a counted loop. For loop unswitching, both inner
> and outer loops need to be cloned. For unrolling, only the inner loop
> needs to be cloned. For pre/post loop insertion, only the inner loop
> needs to be cloned, but the control flow must connect one of the inner
> loop copies to the outer loop of the other copy.
>
> Beyond verifying performance results with the usual benchmarks, when I
> implemented that change, I wrote test cases for (hopefully) every loop
> optimization and verified by inspection of the generated code that the
> loop opts trigger correctly with loop strip mining.
>
> Roland.
>

From igor.veresov at oracle.com Wed Oct 25 03:52:42 2017
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 24 Oct 2017 20:52:42 -0700
Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter
Message-ID: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com>

This is a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn't completely solve the problem of the interpreter-C1 profiling style discrepancy, it speeds up profiling of statically bindable call sites and we'd like to push that. I also added a bit of code to JVMCI to do the profile fixup analogous to what happens in CI.
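(For illustration only - a minimal sketch, in plain Java, of the kind of fixup being described; it
is not code from the webrev, and the names below - ReceiverProfileSketch, fixupStaticallyBindable -
are invented for the example. The idea: for a statically bindable call site the profiling code may
only bump an untyped total counter, so the consumer of the profile attributes that total to the
single recorded receiver; using '+=' rather than '=' also preserves counts already attributed, e.g.
by the interpreter - compare Doug Simon's follow-up further down in this archive.)

    // Hypothetical, simplified model of a per-call-site receiver profile.
    final class ReceiverProfileSketch {
        String[] types;      // recorded receiver types, at most a few rows
        long[] counts;       // per-row counts
        long totalCount;     // count not attributed to any row

        // Fix up a statically bindable call site: attribute the untyped total
        // to the single recorded receiver. '+=' keeps counts that were already
        // attributed to that row (e.g. by the interpreter).
        void fixupStaticallyBindable() {
            if (types.length == 1) {
                counts[0] += totalCount;
                totalCount = 0;
            }
        }

        public static void main(String[] args) {
            ReceiverProfileSketch p = new ReceiverProfileSketch();
            p.types = new String[] { "java.lang.String" };
            p.counts = new long[] { 10 };   // e.g. recorded by the interpreter
            p.totalCount = 990;             // e.g. recorded by a cheaper total counter
            p.fixupStaticallyBindable();
            System.out.println(p.counts[0]); // prints 1000
        }
    }
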
Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 Thanks, igor From robbin.ehn at oracle.com Wed Oct 25 07:30:38 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 25 Oct 2017 09:30:38 +0200 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> Message-ID: <68d73f67-1113-0997-8f5a-0baa23151397@oracle.com> Hi, 325 HeapWord *tlab_old_end = thread->tlab().return end(); Should be something like: 325 HeapWord *tlab_old_end = thread->tlab().end(); Thanks, Robbin On 2017-10-23 17:27, JC Beyler wrote: > Dear all, > > Small update this week with this new webrev: > ? - http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/ > ? - Incremental is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/ > > I patched the code changes showed by Robbin last week and I refactored > collectedHeap.cpp: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src/hotspot/share/gc/shared/collectedHeap.cpp.patch > > The original code became a bit too complex in my opinion with the > handle_heap_sampling handling too many things. So I subdivided the logic into > two smaller methods and moved out a bit of the logic to make it more clear. > Hopefully it is :) > > Let me know if you have any questions/comments :) > Jc > > On Mon, Oct 16, 2017 at 9:34 AM, JC Beyler > wrote: > > Hi Robbin, > > That is because version 11 to 12 was only a test change. I was going to > write about it and say here are the webrev links: > Incremental: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ > > > Full webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/ > > > This change focused only on refactoring the tests to be more manageable, > readable, maintainable. As all tests are looking at allocations, I moved > common code to a java class: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java.patch > > > And then most tests call into that class to turn on/off the sampling, > allocate, etc. This has removed almost 500 lines of test code so I'm happy > about that. > > Thanks for your changes, a bit of relics of previous versions :). I've > already integrated them into my code and will make a new webrev end of this > week with a bit of refactor of the code handling the tlab slow path. I find > it could use a bit of refactoring to make it easier to follow so I'm going > to take a stab at it this week. > > Any other issues/comments? > > Thanks! > Jc > > > On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn > wrote: > > Hi JC, > > I saw a webrev.12 in the directory, with only test changes(11->12), so I > took that version. > I had a look and tested the tests, worked fine! > > First glance at the code (looking at full v12) some minor things below, > mostly unused stuff. > > Thanks, Robbin > > diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp > --- a/src/hotspot/share/runtime/heapMonitoring.cpp? ? ? Mon Oct 16 > 16:54:06 2017 +0200 > +++ b/src/hotspot/share/runtime/heapMonitoring.cpp? ? ? Mon Oct 16 > 17:42:42 2017 +0200 > @@ -211,2 +211,3 @@ > ? ?void initialize(int max_storage) { > +? ? // validate max_storage to sane value ? What would 0 mean ? > ? ? ?MutexLocker mu(HeapMonitor_lock); > @@ -227,8 +228,4 @@ > ? 
?bool initialized() { return _initialized; } > -? volatile bool *initialized_address() { return &_initialized; } > > ? private: > -? // Protects the traces currently sampled (below). > -? volatile intptr_t _stack_storage_lock[1]; > - > ? ?// The traces currently sampled. > @@ -313,3 +310,2 @@ > ? ?_initialized(false) { > -? ? _stack_storage_lock[0] = 0; > ?} > @@ -532,13 +528,2 @@ > > -// Delegate the initialization question to the underlying storage system. > -bool HeapMonitoring::initialized() { > -? return StackTraceStorage::storage()->initialized(); > -} > - > -// Delegate the initialization question to the underlying storage system. > -bool *HeapMonitoring::initialized_address() { > -? return > - > const_cast(StackTraceStorage::storage()->initialized_address()); > -} > - > ?void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) { > diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp > --- a/src/hotspot/share/runtime/heapMonitoring.hpp? ? ? Mon Oct 16 > 16:54:06 2017 +0200 > +++ b/src/hotspot/share/runtime/heapMonitoring.hpp? ? ? Mon Oct 16 > 17:42:42 2017 +0200 > @@ -35,3 +35,2 @@ > ? ?static uint64_t _rnd; > -? static bool _initialized; > ? ?static jint _monitoring_rate; > @@ -92,7 +91,2 @@ > > -? // Is the profiler initialized and where is the address to the > initialized > -? // boolean. > -? static bool initialized(); > -? static bool *initialized_address(); > - > ? ?// Called when o is to be sampled from a given thread and a given size. > > > > On 10/10/2017 12:57 AM, JC Beyler wrote: > > Dear all, > > Thread-safety is back!! Here is the update webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ > > > Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ > > > In order to really test this, I needed to add this so thought now > was a good time. It required a few changes here for the creation to > ensure correctness and safety. Now we keep the static pointer but > clear the data internally so on re-initialize, it will be a bit more > costly than before. I don't think this is a huge use-case so I did > not think it was a problem. I used the internal MutexLocker, I think > I used it well, let me know. > > I also added three tests: > > 1) Stack depth test: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch > > > This test shows that the maximum stack depth system is working. > > 2) Thread safety: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch > > > The test creates 24 threads and they all allocate at the same time. > The test then checks it does find samples from all the threads. > > 3) Thread on/off safety > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch > > > The test creates 24 threads that all allocate a bunch of memory. > Then another thread turns the sampling on/off. > > Btw, both tests 2 & 3 failed without the locks. > > As I worked on this, I saw a lot of places where the tests are doing > very similar things, I'm going to clean up the code a bit and make a > HeapAllocator class that all tests can call directly. This will > greatly simplify the code. > > Thanks for any comments/criticisms! > Jc > > > On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler >> wrote: > > ? ? Dear all, > > ? ? 
Small update to the webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ > > > > > ? ? Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ > > > > > ? ? I updated a bit of the naming, removed a TODO comment, and I > added a test for testing the sampling rate. I also updated the > maximum stack depth to 1024, there is no > ? ? reason to keep it so small. I did a micro benchmark that tests > the overhead and it seems relatively the same. > > ? ? I compared allocations from a stack depth of 10 and allocations > from a stack depth of 1024 (allocations are from the same helper > method in > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java > > > >): > ? ? ?? ? ? ? ? - For an array of 1 integer allocated in a loop; > stack depth 1024 vs stack depth 10: 1% slower > ? ? ???????????- For an array of 200k integers allocated in a loop; > stack depth 1024 vs stack depth 10: 3% slower > > ? ? So basically now moving the maximum stack depth to 1024 but we > only copy over the stack depths actually used. > > ? ? For the next webrev, I will be adding a stack depth test to > show that it works and probably put back the mutex locking so that > we can see how difficult it is to keep > ? ? thread safe. > > ? ? Let me know what you think! > ? ? Jc > > > > ? ? On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler >> wrote: > > ? ? ? ? Forgot to say that for my numbers: > ? ? ? ? ??- Not in the test are the actual numbers I got for the > various array sizes, I ran the program 30 times and parsed the > output; here are the averages and standard > ? ? ? ? deviation: > ? ? ? ? ?? ? ? 1000:? ? ?1.28% average; 1.13% standard deviation > ? ? ? ? ?? ? ? 10000:? ? 1.59% average; 1.25% standard deviation > ? ? ? ? ?? ? ? 100000:? ?1.26% average; 1.26% standard deviation > > ? ? ? ? The 1000/10000/100000 are the sizes of the arrays being > allocated. These are allocated 100k times and the sampling rate is > 111 times the size of the array. > > ? ? ? ? Thanks! > ? ? ? ? Jc > > > ? ? ? ? On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler > > >> wrote: > > ? ? ? ? ? ? Hi all, > > ? ? ? ? ? ? After a bit of a break, I am back working on this :). > As before, here are two webrevs: > > ? ? ? ? ? ? - Full change set: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/ > > > > ? ? ? ? ? ? - Compared to version 8: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/ > > > > ? ? ? ? ? ? ?? ? (This version is compared to version 8 I last > showed but ported to the new folder hierarchy) > > ? ? ? ? ? ? In this version I have: > ? ? ? ? ? ? ?? - Handled Thomas' comments from his email of 07/03: > ? ? ? ? ? ? ?? ? ? ?- Merged the logging to be standard > ? ? ? ? ? ? ?? ? ? ?- Fixed up the code a bit where asked > ? ? ? ? ? ? ?? ? ? ?- Added some notes about the code not being > thread-safe yet > ? ? ? ? ? ? ?? ?- Removed additional dead code from the version > that modifies interpreter/c1/c2 > ? ? ? ? ? ? ?? ?- Fixed compiler issues so that it compiles with > --disable-precompiled-header > ? ? ? ? ? ? ?? ? ? ? - Tested with ./configure > --with-boot-jdk= --with-debug-level=slowdebug > --disable-precompiled-headers > > ? ? ? ? ? ? 
Additionally, I added a test to check the sanity of the > sampler: HeapMonitorStatCorrectnessTest > > (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch > > >) > ? ? ? ? ? ? ?? ?- This allocates a number of arrays and checks that > we obtain the number of samples we want with an accepted error of > 5%. I tested it 100 times and it > ? ? ? ? ? ? passed everytime, I can test more if wanted > ? ? ? ? ? ? ?? ?- Not in the test are the actual numbers I got for > the various array sizes, I ran the program 30 times and parsed the > output; here are the averages and > ? ? ? ? ? ? standard deviation: > ? ? ? ? ? ? ?? ? ? 1000:? ? ?1.28% average; 1.13% standard deviation > ? ? ? ? ? ? ?? ? ? 10000:? ? 1.59% average; 1.25% standard deviation > ? ? ? ? ? ? ?? ? ? 100000:? ?1.26% average; 1.26% standard deviation > > ? ? ? ? ? ? What this means is that we were always at about 1~2% of > the number of samples the test expected. > > ? ? ? ? ? ? Let me know what you think, > ? ? ? ? ? ? Jc > > ? ? ? ? ? ? On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler > > >> wrote: > > ? ? ? ? ? ? ? ? Hi all, > > ? ? ? ? ? ? ? ? I apologize, I have not yet handled your remarks > but thought this new webrev would also be useful to see and comment > on perhaps. > > ? ? ? ? ? ? ? ? Here is the latest webrev, it is generated slightly > different than the others since now I'm using webrev.ksh without the > -N option: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ > > > > > ? ? ? ? ? ? ? ? And the webrev.07 to webrev.08 diff is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ > > > > > ? ? ? ? ? ? ? ? (Let me know if it works well) > > ? ? ? ? ? ? ? ? It's a small change between versions but it: > ? ? ? ? ? ? ? ? ?? - provides a fix that makes the average sample > rate correct (more on that below). > ? ? ? ? ? ? ? ? ?? - fixes the code to actually have it play nicely > with the fast tlab refill > ? ? ? ? ? ? ? ? ?? - cleaned up a bit the JVMTI text and now use > jvmtiFrameInfo > ? ? ? ? ? ? ? ? - moved the capability to be onload solo > > ? ? ? ? ? ? ? ? With this webrev, I've done a small study of the > random number generator we use here for the sampling rate. I took a > small program and it can be simplified to: > > ? ? ? ? ? ? ? ? for (outer loop) > ? ? ? ? ? ? ? ? for (inner loop) > ? ? ? ? ? ? ? ? int[] tmp = new int[arraySize]; > > ? ? ? ? ? ? ? ? - I've fixed the outer and inner loops to being 800 > for this experiment, meaning we allocate 640000 times an array of a > given array size. > > ? ? ? ? ? ? ? ? - Each program provides the average sample size > used for the whole execution > > ? ? ? ? ? ? ? ? - Then, I ran each variation 30 times and then > calculated the average of the average sample size used for various > array sizes. I selected the array size to > ? ? ? ? ? ? ? ? be one of the following: 1, 10, 100, 1000. > > ? ? ? ? ? ? ? ? - When compared to 512kb, the average sample size > of 30 runs: > ? ? ? ? ? ? ? ? 1: 4.62% of error > ? ? ? ? ? ? ? ? 10: 3.09% of error > ? ? ? ? ? ? ? ? 100: 0.36% of error > ? ? ? ? ? ? ? ? 1000: 0.1% of error > ? ? ? ? ? ? ? ? 10000: 0.03% of error > > ? ? ? ? ? ? ? ? What it shows is that, depending on the number of > samples, the average does become better. This is because with an > allocation of 1 element per array, it > ? ? ? ? ? ? ? ? will take longer to hit one of the thresholds. This > is seen by looking at the sample count statistic I put in. 
For the > same number of iterations (800 * > ? ? ? ? ? ? ? ? 800), the different array sizes provoke: > ? ? ? ? ? ? ? ? 1: 62 samples > ? ? ? ? ? ? ? ? 10: 125 samples > ? ? ? ? ? ? ? ? 100: 788 samples > ? ? ? ? ? ? ? ? 1000: 6166 samples > ? ? ? ? ? ? ? ? 10000: 57721 samples > > ? ? ? ? ? ? ? ? And of course, the more samples you have, the more > sample rates you pick, which means that your average gets closer > using that math. > > ? ? ? ? ? ? ? ? Thanks, > ? ? ? ? ? ? ? ? Jc > > ? ? ? ? ? ? ? ? On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler > > >> wrote: > > ? ? ? ? ? ? ? ? ? ? Thanks Robbin, > > ? ? ? ? ? ? ? ? ? ? This seems to have worked. When I have the next > webrev ready, we will find out but I'm fairly confident it will work! > > ? ? ? ? ? ? ? ? ? ? Thanks agian! > ? ? ? ? ? ? ? ? ? ? Jc > > ? ? ? ? ? ? ? ? ? ? On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn > > >> wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? Hi JC, > > ? ? ? ? ? ? ? ? ? ? ? ? On 06/29/2017 12:15 AM, JC Beyler wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? B) Incremental changes > > > ? ? ? ? ? ? ? ? ? ? ? ? I guess the most common work flow here is > using mq : > ? ? ? ? ? ? ? ? ? ? ? ? hg qnew fix_v1 > ? ? ? ? ? ? ? ? ? ? ? ? edit files > ? ? ? ? ? ? ? ? ? ? ? ? hg qrefresh > ? ? ? ? ? ? ? ? ? ? ? ? hg qnew fix_v2 > ? ? ? ? ? ? ? ? ? ? ? ? edit files > ? ? ? ? ? ? ? ? ? ? ? ? hg qrefresh > > ? ? ? ? ? ? ? ? ? ? ? ? if you do hg log you will see 2 commits > > ? ? ? ? ? ? ? ? ? ? ? ? webrev.ksh -r -2 -o my_inc_v1_v2 > ? ? ? ? ? ? ? ? ? ? ? ? webrev.ksh -o my_full_v2 > > > ? ? ? ? ? ? ? ? ? ? ? ? In? your .hgrc you might need: > ? ? ? ? ? ? ? ? ? ? ? ? [extensions] > ? ? ? ? ? ? ? ? ? ? ? ? mq = > > ? ? ? ? ? ? ? ? ? ? ? ? /Robbin > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Again another newbiew question here... > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? For showing the incremental changes, is > there a link that explains how to do that? I apologize for my newbie > questions all the time :) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Right now, I do: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ksh ../webrev.ksh -m -N > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? That generates a webrev.zip and send it > to Chuck Rasbold. He then uploads it to a new webrev. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? I tried commiting my change and adding > a small change. Then if I just do ksh ../webrev.ksh without any > options, it seems to produce a similar > ? ? ? ? ? ? ? ? ? ? ? ? ? ? page but now with only the changes I > had (so the 06-07 comparison you were talking about) and a changeset > that has it all. I imagine that is > ? ? ? ? ? ? ? ? ? ? ? ? ? ? what you meant. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Which means that my workflow would become: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1) Make changes > ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2) Make a webrev without any options to > show just the differences with the tip > ? ? ? ? ? ? ? ? ? ? ? ? ? ? 3) Amend my changes to my local commit > so that I have it done with > ? ? ? ? ? ? ? ? ? ? ? ? ? ? 4) Go to 1 > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Does that seem correct to you? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Note that when I do this, I only see > the full change of a file in the full change set (Side note here: > now the page says change set and not > ? ? ? ? ? ? ? ? ? ? ? ? ? ? patch, which is maybe why Serguei was > having issues?). > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Thanks! > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Jc > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? On Wed, Jun 28, 2017 at 1:12 AM, Robbin > Ehn > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? >>> wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? Hi, > > ? ? ? ? ? ? ? 
? ? ? ? ? ? ? ?? ? On 06/28/2017 12:04 AM, JC Beyler > wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Dear Thomas et al, > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Here is the newest webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ > > > > > > >> > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? You have some more bits to in > there but generally this looks good and really nice with more tests. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? I'll do and deep dive and re-test > this when I get back from my long vacation with whatever patch > version you have then. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? Also I think it's time you provide > incremental (v06->07 changes) as well as complete change-sets. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? Thanks, Robbin > > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Thomas, I "think" I have > answered all your remarks. The summary is: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? - The statistic system is up > and provides insight on what the heap sampler is doing > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- I've noticed that, > though the sampling rate is at the right mean, we are missing some > samples, I have not yet tracked out why > ? ? ? ? ? ? ? ? ? ? ? ? ? ? (details below) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? - I've run a tiny benchmark > that is the worse case: it is a very tight loop and allocated a > small array > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- In this case, I see no > overhead when the system is off so that is a good start :) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- I see right now a high > overhead in this case when sampling is on. This is not a really too > surprising but I'm going to see if > ? ? ? ? ? ? ? ? ? ? ? ? ? ? this is consistent with our > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? internal implementation. The > benchmark is really allocation stressful so I'm not too surprised > but I want to do the due diligence. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- The statistic system up > is up and I have a new test > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch > > > > > > > > >> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? - I did a bit of a study > about the random generator here, more details are below but > basically it seems to work well > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- I added a capability but > since this is the first time doing this, I was not sure I did it right > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- I did add a test though > for it and the test seems to do what I expect (all methods are > failing with the > ? ? ? ? ? ? ? ? ? ? ? ? ? ? JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ?- > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch > > > > > > > > >> > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- I still need to figure > out what to do about the multi-agent vs single-agent issue > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- As far as measurements, > it seems I still need to look at: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- Why we do the 20 random > calls first, are they necessary? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- Look at the mean of the > sampling rate that the random generator does and also what is > actually sampled > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- What is the overhead in > terms of memory/performance when on? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? 
I have inlined my answers, I > think I got them all in the new webrev, let me know your thoughts. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Thanks again! > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Jc > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? On Fri, Jun 23, 2017 at 3:52 > AM, Thomas Schatzl > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? >> > > > > > > > >>>> wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Hi, > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?On Wed, 2017-06-21 at > 13:45 -0700, JC Beyler wrote: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Hi all, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> First off: Thanks again > to Robbin and Thomas for their reviews :) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Next, I've uploaded a > new webrev: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ > > > > > > >> > > > > > > > >>> > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Here is an update: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> - @Robbin, I forgot to > say that yes I need to look at implementing > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> this for the other > architectures and testing it before it is all > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> ready to go. Is it > common to have it working on all possible > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> combinations or is > there a subset that I should be doing first and we > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> can do the others later? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> - I've tested > slowdebug, built and ran the JTreg tests I wrote with > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> slowdebug and fixed a > few more issues > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> - I've refactored a bit > of the code following Thomas' comments > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - I think I've > handled all the comments from Thomas (I put > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> comments inline below > for the specifics) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Thanks for handling all > those. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> - Following Thomas' > comments on statistics, I want to add some > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> quality assurance tests > and find that the easiest way would be to > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> have a few counters of > what is happening in the sampler and expose > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> that to the user. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - I'll be adding > that in the next version if no one sees any > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> objections to that. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - This will allow me > to add a sanity test in JTreg about number of > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> samples and average of > sampling rate > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> @Thomas: I had a few > questions that I inlined below but I will > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> summarize the "bigger > ones" here: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - You mentioned > constants are not using the right conventions, I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> looked around and > didn't see any convention except normal naming then > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? 
? ?> for static constants. > Is that right? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I looked through > https://wiki.openjdk.java.net/display/HotSpot/StyleGui > > > > > > >> > > > > > > > >>> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?de and the rule is to > "follow an existing pattern and must have a > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?distinct appearance from > other names". Which does not help a lot I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?guess :/ The GC team > started using upper camel case, e.g. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?SomeOtherConstant, but > very likely this is probably not applied > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?consistently throughout. > So I am fine with not adding another style > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?(like kMaxStackDepth with > the "k" in front with some unknown meaning) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?is fine. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?(Chances are you will > find that style somewhere used anyway too, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?apologies if so :/) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Thanks for that link, now I > know where to look. I used the upper camel case in my code as well > then :) I should have gotten them all. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > PS: I've also inlined > my answers to Thomas below: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > On Tue, Jun 13, 2017 > at 8:03 AM, Thomas Schatzl ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > e.com > > wrote: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > Hi all, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > On Mon, 2017-06-12 > at 11:11 -0700, JC Beyler wrote: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > Dear all, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > I've continued > working on this and have done the following > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > webrev: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > > http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/ > > > > > > >> > > > > > > > >>> > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > [...] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > Things I still > need to do: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > >? ? - Have to fix > that TLAB case for the FastTLABRefill > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > >? ? - Have to start > looking at the data to see that it is > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > consistent and does > gather the right samples, right frequency, etc. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > >? ? - Have to check > the GC elements and what that produces > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > >? ? - Run a > slowdebug run and ensure I fixed all those issues you > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > saw > Robbin > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > Thanks for looking > at the webrev and have a great week! > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > >? ?scratching a bit > on the surface of this change, so apologies for > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > rather shallow comments: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? 
> > - > macroAssembler_x86.cpp:5604: while this is compiler code, and I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > am not sure this is > final, please avoid littering the code with > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > TODO remarks :) They > tend to be candidates for later wtf moments > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > only. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > Just file a CR for that. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > Newcomer question: > what is a CR and not sure I have the rights to do > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > that yet ? :) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Apologies. CR is a change > request, this suggests to file a bug in the > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?bug tracker. And you are > right, you can't just create a new account in > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?the OpenJDK JIRA > yourselves. :( > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Ok good to know, I'll continue > with my own todo list but I'll work hard on not letting it slip in > the webrevs anymore :) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I was mostly referring to > the "... but it is a TODO" part of that > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?comment in > macroassembler_x86.cpp. Comments about the why of the code > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?are appreciated. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?[Note that I now > understand that this is to some degree still work in > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?progress. As long as the > final changeset does no contain TODO's I am > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?fine (and it's not a hard > objection, rather their use in "final" code > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?is typically limited in > my experience)] > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?5603? ?// Currently, if > this happens, just set back the actual end to > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?where it was. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?5604? ?// We miss a > chance to sample here. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Would be okay, if > explaining "this" and the "why" of missing a chance > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?to sample here would be best. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Like maybe: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?// If we needed to refill > TLABs, just set the actual end point to > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?// the end of the TLAB > again. We do not sample here although we could. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Done with your comment, it > works well in my mind. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I am not sure whether > "miss a chance to sample" meant "we could, but > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?consciously don't because > it's not that useful" or "it would be > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?necessary but don't > because it's too complicated to do.". > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Looking at the original > comment once more, I am also not sure if that > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?comment shouldn't > referring to the "end" variable (not actual_end) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?because that's the > variable that is responsible for taking the sampling > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?path? (Going from the > member description of ThreadLocalAllocBuffer). 
> > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? I've moved this code and it no > longer shows up here but the rationale and answer was: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? So.. Yes, end is the variable > provoking the sampling. Actual end is the actual end of the TLAB. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? What was happening here is > that the code is resetting _end to point towards the end of the new > TLAB. Because, we now have the end for > ? ? ? ? ? ? ? ? ? ? ? ? ? ? sampling and _actual_end for > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? the actual end, we need to > update the actual_end as well. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Normally, were we to do the > real work here, we would calculate the (end - start) offset, then do: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? - Set the new end to : start + > (old_end - old_start) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? - Set the actual end like we > do here now where it because it is the actual end. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Why is this not done here now > anymore? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? - I was still debating > which path to take: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ?- Do it in the fast > refill code, it has its perks: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ?- In a world where > fast refills are happening all the time or a lot, we can augment > there the code to do the sampling > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ?- Remember what we had > as an end before leaving the slowpath and check on return > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ?- This is what I'm > doing now, it removes the need to go fix up all fast refill paths > but if you remain in fast refill paths, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? you won't get sampling. I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? have to think of the > consequences of that, maybe a future change later on? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? - I have the > statistics now so I'm going to study that > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? ? ?-> By the > way, though my statistics are showing I'm missing some samples, if I > turn off FastTlabRefill, it is the same > ? ? ? ? ? ? ? ? ? ? ? ? ? ? loss so for now, it seems > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? this does not occur in my > simple test. > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?But maybe I am only > confused and it's best to just leave the comment > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?away. :) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Thinking about it some > more, doesn't this not-sampling in this case > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?mean that sampling does > not work in any collector that does inline TLAB > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?allocation at the moment? > (Or is inline TLAB alloc automatically > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?disabled with sampling > somehow?) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?That would indeed be a > bigger TODO then :) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Agreed, this remark made me > think that perhaps as a first step the new way of doing it is better > but I did have to: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- Remove the const of the > ThreadLocalBuffer remaining and hard_end methods > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- Move hard_end out of the > header file to have a bit more logic there > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Please let me know what you > think of that and if you prefer it this way or changing the fast > refills. 
(I prefer this way now because it > ? ? ? ? ? ? ? ? ? ? ? ? ? ? is more incremental). > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - calling > HeapMonitoring::do_weak_oops() (which should probably be > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > called weak_oops_do() > like other similar methods) only if string > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > deduplication is > enabled (in g1CollectedHeap.cpp:4511) seems wrong. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> The call should be at > least around 6 lines up outside the if. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Preferentially in a > method like process_weak_jni_handles(), including > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> additional logging. (No > new (G1) gc phase without minimal logging > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> :)). > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Done but really not > sure because: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> I put for logging: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ?log_develop_trace(gc, > freelist)("G1ConcRegionFreeing [other] : heap > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> monitoring"); > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I would think that "gc, > ref" would be more appropriate log tags for > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?this similar to jni handles. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?(I am als not sure what > weak reference handling has to do with > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?G1ConcRegionFreeing, so I > am a bit puzzled) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? I was not sure what to put for > the tags or really as the message. I cleaned it up a bit now to: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?log_develop_trace(gc, > ref)("HeapSampling [other] : heap monitoring processing"); > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Since weak_jni_handles > didn't have logging for me to be inspired > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> from, I did that but > unconvinced this is what should be done. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?The JNI handle processing > does have logging, but only in > > ?ReferenceProcessor::process_discovered_references(). In > > ?process_weak_jni_handles() only overall time is measured (in a G1 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?specific way, since only > G1 supports disabling reference procesing) :/ > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?The code in > ReferenceProcessor prints both time taken > > ?referenceProcessor.cpp:254, as well as the count, but strangely > only in > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?debug VMs. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I have no idea why this > logging is that unimportant to only print that > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?in a debug VM. However > there are reviews out for changing this area a > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?bit, so it might be > useful to wait for that (JDK-8173335). > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? I cleaned it up a bit anyway > and now it returns the count of objects that are in the system. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - the change doubles > the size of > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > > CollectedHeap::allocate_from_tlab_slow() above the "small and nice" > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > threshold. Maybe it > could be refactored a bit. > ? 
? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Done I think, it looks > better to me :). > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?In > ThreadLocalAllocBuffer::handle_sample() I think the > > ?set_back_actual_end()/pick_next_sample() calls could be hoisted out of > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?the "if" :) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Done! > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - > referenceProcessor.cpp:261: the change should add logging about > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > the number of > references encountered, maybe after the corresponding > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > "JNI weak reference > count" log message. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Just to double check, > are you saying that you'd like to have the heap > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> sampler to keep in > store how many sampled objects were encountered in > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> the > HeapMonitoring::weak_oops_do? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - Would a return of > the method with the number of handled > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> references and logging > that work? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Yes, it's fine if > HeapMonitoring::weak_oops_do() only returned the > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?number of processed weak > oops. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Done also (but I admit I have > not tested the output yet) :) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - Additionally, > would you prefer it in a separate block with its > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> GCTraceTime? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Yes. Both kinds of > information is interesting: while the time taken is > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?typically more important, > the next question would be why, and the > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?number of references > typically goes a long way there. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?See above though, it is > probably best to wait a bit. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Agreed that I "could" wait > but, if it's ok, I'll just refactor/remove this when we get closer > to something final. Either, JDK-8173335 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? has gone in and I will notice > it now or it will soon and I can change it then. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - > threadLocalAllocBuffer.cpp:331: one more "TODO" > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Removed it and added it > to my personal todos to look at. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - > threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > documentation should > be updated about the sampling additions. I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > would have no clue > what the difference between "actual_end" and > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > "end" would be from > the given information. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> If you are talking > about the comments in this file, I made them more > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> clear I hope in the new > webrev. If it was somewhere else, let me know > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> where to change. 
> > > From jamsheed.c.m at oracle.com Wed Oct 25 09:35:35 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Wed, 25 Oct 2017 15:05:35 +0530 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly Message-ID: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> Hi, request for review, webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ jbs: https://bugs.openjdk.java.net/browse/JDK-6523512 desc: removed the is_deopt_suspend() from has_special_runtime_exit_condition checks Best regards, Jamsheed From lutz.schmidt at sap.com Wed Oct 25 10:01:41 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 25 Oct 2017 10:01:41 +0000 Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions Message-ID: Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8189793 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189793.00/index.html Vector instructions, which have been available on System z for a while (since z13), promise noticeable performance improvements. This enhancement improves the String Compress and String Inflate intrinsics by exploiting vector instructions, when available. For long strings, up to 2x performance improvement has been observed in micro-benchmarks. Special care was taken to preserve good performance for short strings. All examined workloads showed a high ratio of short and very short strings. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug.simon at oracle.com Wed Oct 25 10:07:57 2017 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 25 Oct 2017 12:07:57 +0200 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> Message-ID: <8150BCA2-3C50-4624-98ED-D914D51A5C6B@oracle.com> 581 582 // Fixup the case of C1's inability to optimize profiling of a statically bindable call site 583 if (entries == 1) { 584 counts[0] = totalCount; 585 } 586 But what happens if we're looking at a profile from the interpreter? In that case, won't totalCount == 0 && counts[0] have the right value? In which case, the above fixup will lose this information. Maybe it should be: counts[0] += totalCount; -Doug > On 25 Oct 2017, at 05:52, Igor Veresov wrote: > > This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. > > Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 > > Thanks, > igor -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From thomas.schatzl at oracle.com Wed Oct 25 12:43:08 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 25 Oct 2017 14:43:08 +0200
Subject: Low-Overhead Heap Profiling
In-Reply-To: 
References: <1497366226.2829.109.camel@oracle.com>
 <1498215147.2741.34.camel@oracle.com>
 <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com>
 <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com>
 <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com>
Message-ID: <1508935388.13554.11.camel@oracle.com>

Hi Jc,

  sorry for taking a bit long to respond... ;)

On Mon, 2017-10-23 at 08:27 -0700, JC Beyler wrote:
> Dear all,
>
> Small update this week with this new webrev:
>   - http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/
>   - Incremental is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/
>
> I patched the code changes shown by Robbin last week and I
> refactored collectedHeap.cpp:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src/hotspot/share/gc/shared/collectedHeap.cpp.patch
>
> The original code became a bit too complex in my opinion with the
> handle_heap_sampling handling too many things. So I subdivided the
> logic into two smaller methods and moved out a bit of the logic to
> make it more clear. Hopefully it is :)
>
> Let me know if you have any questions/comments :)
> Jc

A few minor issues:

- weak reference handling has been factored out in JDK-8189359; now you
only need to add the additions required for this change to one place. :)
Please update the webrev :)

- the one issue Robbin noticed.

- in the declaration of CollectedHeap::sample_allocation, it would be
nice if the fix_sample_rate parameter were described - it takes some time
to figure out what it's used for. I.e., in case an allocation goes beyond
the sampling watermark, this value, which represents the amount of
overallocation, is used to adjust the next sampling watermark to sample
at the correct rate. Something like this - and if what I wrote is
incorrect, there is even more reason to document it. Or maybe just rename
"fix_sample_rate" to something more descriptive - but I have no good idea
about that. Given the lack of units in the type, it would also be nice to
have the unit in the identifier name, as done elsewhere.

- some (or most, actually) of the new setters and getters in the
ThreadLocalAllocBuffer class could be private, I think. Also, we
typically do not use "simple" getters that just return a member in the
class where they are defined.

- ThreadLocalAllocBuffer::set_sample_end(): please use pointer_delta()
for pointer subtractions.

- ThreadLocalAllocBuffer::pick_next_sample() - I recommend making the
first check an assert - it seems that it is only useful to call this with
heap monitoring enabled, as is done right now.

- ThreadLocalAllocBuffer::pick_next_sample() - please use "PTR_FORMAT"
(or INTPTR_FORMAT - they are the same) as the format string for printing
pointer values, as is customary within HotSpot. %p output is OS
dependent; e.g. I heard that on Ubuntu it prints "null" instead of
0x0...0, which is kind of annoying.

- personal preference: do not allocate HeapMonitoring::AlwaysTrueClosure
globally, but only locally when it's used. Setting it up seems to be very
cheap.

- HeapMonitoring::next_random() - the different names for the constants
use different formatting. Preferable (to me) is UpperCamelCase, but at
least make them uniform.

- in HeapMonitoring::next_random(), you might want to use right_n_bits()
to create your mask.
- not really convinced that it is a good idea to not somehow guard StartHeapSampling() and StopHeapSampling() against being called by multiple threads. Otherwise looks okay from what I can see. Thanks, Thomas From nils.eliasson at oracle.com Wed Oct 25 13:13:08 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 25 Oct 2017 15:13:08 +0200 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> Message-ID: <3ee0024d-6af5-afe9-8127-0dc2cc3a1711@oracle.com> Deans suggestion with making the TypeVect initialization unconditional also removes all platform dependencies on type.cpp: http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev.02/ Regards, Nils On 2017-10-25 00:02, Vladimir Kozlov wrote: > We can't use platform specific UseAVX flag in shared code in type.cpp. > > I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. > And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 > and corresponding vectors 32 and 64 bytes. > If AMD's Instructions Set before 17h does not support whole 32 bytes > vectors we can't call it AVX. > > Thanks, > Vladimir > > On 10/18/17 10:01 AM, dean.long at oracle.com wrote: >> How about initializing TypeVect::VECTY and friends unconditionally? >> I am nervous about exchanging one guarding condition for another. >> >> dl >> >> >> On 10/18/17 1:03 AM, Nils Eliasson wrote: >>> >>> HI, >>> >>> I ran into a problem with the interaction between MaxVectorSize and >>> the UseAVX. For some AMD CPUs we limit the vector size to 16 because >>> it gives the best performance. >>> >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>> >>> Whenf MaxVecorSize is set to 16 it has the sideeffect that the >>> TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even >>> though the platform has the capability. >>> >>> Type.cpp:~660 >>> >>> [...] >>> > if (Matcher::vector_size_supported(T_FLOAT,8)) { >>> > TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >>> > } >>> [...] >>> > mreg2type[Op_VecY] = TypeVect::VECTY; >>> >>> >>> In the ad-files feature flags (UseAVX etc.) are used to control what >>> rules should be matched if it has effects on specific vector >>> registers. Here we have a mismatch. >>> >>> On a platform that supports AVX2 but have MaxVectorSize limited to >>> 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is >>> uninitialized. We will also hit asserts in a few places like: >>> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), >>> "sanity"); >>> >>> Shouldn't the type initialization in type.cpp be dependent on >>> feature flag (UseAVX etc.) instead of MaxVectorSize? (The type for >>> the vector registers are initialized if the platform supports them, >>> but they might not be used if MaxVectorSize is limited.) 
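For context on the two approaches being discussed: Dean's "initialize unconditionally" suggestion, which Nils picked up for webrev.02 above, amounts to dropping the vector_size_supported() guard around the type setup in type.cpp. A simplified sketch, not the actual webrev.02 hunk:

    // Guarded form quoted above: VECTY is only created when the current
    // MaxVectorSize admits 32-byte vectors, so an AVX2 machine running with
    // MaxVectorSize=16 leaves TypeVect::VECTY and mreg2type[Op_VecY] unset.
    if (Matcher::vector_size_supported(T_FLOAT,8)) {
      TypeVect::VECTY = TypeVect::make(T_FLOAT,8);
    }
    mreg2type[Op_VecY] = TypeVect::VECTY;

    // Unconditional variant: always create the type; MaxVectorSize then only
    // decides whether such vectors are actually generated, not whether the
    // type exists.
    TypeVect::VECTY = TypeVect::make(T_FLOAT,8);
    mreg2type[Op_VecY] = TypeVect::VECTY;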
>>> >>> This is a patch that solves the problem, but I have not convinced >>> myself that it is the right way: >>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>> >>> Feedback appreciated, >>> >>> Regards, >>> Nils Eliasson >>> >>> >>> >>> >>> >>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>> >> From rwestrel at redhat.com Wed Oct 25 14:29:03 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 25 Oct 2017 16:29:03 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> Message-ID: Hi Vladimir, Thanks for looking at this. > Did you consider less intrusive approach by adding branch over > SafePoint with masking on index variable? > > int mask = LoopStripMiningMask * inc; // simplified > for (int i = start; i < stop; i += inc) { > // body > if (i & mask != 0) continue; > safepoint; > } > > Or may be doing it inside .ad file in new SafePoint node > implementation so that ideal graph is not affected. We're looking for the best trade off between latency and thoughput: we want the safepoint poll overhead to be entirely eliminated even when the safepoint doesn't trigger. > I am concern that suggested changes may affect Range Check elimination > (you changed limit to variable value/flag) in addition to complexity > of changes which may affect stability of C2. The CountedLoop that is created with my patch is strictly identical to the CountedLoop created today with -UseCountedLoopSafepoints. Bounds are not changed at that time. They are left as they are today. The difference, with loop strip mining, is that the counted loop has a skeleton outer loop. The bounds of the counted loop are adjusted once loop opts are over. If the counted loop has a predicate, the predicate is moved out of loop just as it is today. The only difference with today, is that the predicate should be moved out of the outer loop. If a pre and post loop needs to be created, then the only difference with today is that the clones need to be moved out of the outer loop and logic that locate the pre from the main loop need to account for the outer loop. It's obviously a complex change so if your primary concern is stability then loop strip mining can be disabled by default. Assuming strip mining off, then that patch is mostly some code refactoring and some logic that never triggers. Roland. From ionutb83 at yahoo.com Wed Oct 25 15:30:25 2017 From: ionutb83 at yahoo.com (Ionut) Date: Wed, 25 Oct 2017 15:30:25 +0000 (UTC) Subject: Vectorized Loop Unrolling on x64? In-Reply-To: References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: <473261957.4194696.1508945425413@mail.yahoo.com> Hello All, ? ?Thanks for you input and useful links. Indeed, it confirms my initial guess. RegardsIonut On Tuesday, October 24, 2017 8:20 PM, Vladimir Sitnikov wrote: Just in case, here's Vladimir Ivanov's vectorization talk:?http://2017.jpoint.ru/en/talks/vector-programming-in-java/Slide 89 describes sum misundervectorization. Vladimir -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From igor.veresov at oracle.com Wed Oct 25 16:13:47 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 25 Oct 2017 09:13:47 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <8150BCA2-3C50-4624-98ED-D914D51A5C6B@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <8150BCA2-3C50-4624-98ED-D914D51A5C6B@oracle.com> Message-ID: <4D25A442-F0BD-41D1-837B-1115A165EA78@oracle.com> > On Oct 25, 2017, at 3:07 AM, Doug Simon wrote: > > 581 > 582 // Fixup the case of C1's inability to optimize profiling of a statically bindable call site > 583 if (entries == 1) { > 584 counts[0] = totalCount; > 585 } > 586 > But what happens if we're looking at a profile from the interpreter? In that case, won't totalCount == 0 && counts[0] have the right value? In which case, the above fixup will lose this information. Maybe it should be: > > counts[0] += totalCount; If it?s pure interpreter you?d have entries == 0, so this fixup won?t fire. Also totalCount at the point of the fixup is a sum of every counter in profile (for all the types + the counter for types that weren?t recorded). So what the fixup does is that it attributes all the counts to the first type (if it?s a monomorphic call site). igor > > -Doug > >> On 25 Oct 2017, at 05:52, Igor Veresov > wrote: >> >> This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. >> >> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >> >> Thanks, >> igor > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcbeyler at google.com Wed Oct 25 17:03:18 2017 From: jcbeyler at google.com (JC Beyler) Date: Wed, 25 Oct 2017 10:03:18 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: <68d73f67-1113-0997-8f5a-0baa23151397@oracle.com> References: <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> <68d73f67-1113-0997-8f5a-0baa23151397@oracle.com> Message-ID: Clearly a last minute clean-up gone awry... Fixed for next webrev :) On Wed, Oct 25, 2017 at 12:30 AM, Robbin Ehn wrote: > Hi, > > 325 HeapWord *tlab_old_end = thread->tlab().return end(); > > Should be something like: > > 325 HeapWord *tlab_old_end = thread->tlab().end(); > > Thanks, Robbin > > On 2017-10-23 17:27, JC Beyler wrote: > >> Dear all, >> >> Small update this week with this new webrev: >> - http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/ >> - Incremental is here: http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.12_13/ >> >> I patched the code changes showed by Robbin last week and I refactored >> collectedHeap.cpp: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src >> /hotspot/share/gc/shared/collectedHeap.cpp.patch >> >> The original code became a bit too complex in my opinion with the >> handle_heap_sampling handling too many things. So I subdivided the logic >> into two smaller methods and moved out a bit of the logic to make it more >> clear. 
Hopefully it is :) >> >> Let me know if you have any questions/comments :) >> Jc >> >> On Mon, Oct 16, 2017 at 9:34 AM, JC Beyler > jcbeyler at google.com>> wrote: >> >> Hi Robbin, >> >> That is because version 11 to 12 was only a test change. I was going >> to >> write about it and say here are the webrev links: >> Incremental: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ >> >> >> Full webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/ >> >> >> This change focused only on refactoring the tests to be more >> manageable, >> readable, maintainable. As all tests are looking at allocations, I >> moved >> common code to a java class: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitor.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitor.java.patch> >> >> And then most tests call into that class to turn on/off the sampling, >> allocate, etc. This has removed almost 500 lines of test code so I'm >> happy >> about that. >> >> Thanks for your changes, a bit of relics of previous versions :). I've >> already integrated them into my code and will make a new webrev end >> of this >> week with a bit of refactor of the code handling the tlab slow path. >> I find >> it could use a bit of refactoring to make it easier to follow so I'm >> going >> to take a stab at it this week. >> >> Any other issues/comments? >> >> Thanks! >> Jc >> >> >> On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn > > wrote: >> >> Hi JC, >> >> I saw a webrev.12 in the directory, with only test >> changes(11->12), so I >> took that version. >> I had a look and tested the tests, worked fine! >> >> First glance at the code (looking at full v12) some minor things >> below, >> mostly unused stuff. >> >> Thanks, Robbin >> >> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp >> --- a/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct >> 16 >> 16:54:06 2017 +0200 >> +++ b/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct >> 16 >> 17:42:42 2017 +0200 >> @@ -211,2 +211,3 @@ >> void initialize(int max_storage) { >> + // validate max_storage to sane value ? What would 0 mean ? >> MutexLocker mu(HeapMonitor_lock); >> @@ -227,8 +228,4 @@ >> bool initialized() { return _initialized; } >> - volatile bool *initialized_address() { return &_initialized; } >> >> private: >> - // Protects the traces currently sampled (below). >> - volatile intptr_t _stack_storage_lock[1]; >> - >> // The traces currently sampled. >> @@ -313,3 +310,2 @@ >> _initialized(false) { >> - _stack_storage_lock[0] = 0; >> } >> @@ -532,13 +528,2 @@ >> >> -// Delegate the initialization question to the underlying >> storage system. >> -bool HeapMonitoring::initialized() { >> - return StackTraceStorage::storage()->initialized(); >> -} >> - >> -// Delegate the initialization question to the underlying >> storage system. 
>> -bool *HeapMonitoring::initialized_address() { >> - return >> - const_cast(StackTraceS >> torage::storage()->initialized_address()); >> -} >> - >> void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) >> { >> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp >> --- a/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct >> 16 >> 16:54:06 2017 +0200 >> +++ b/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct >> 16 >> 17:42:42 2017 +0200 >> @@ -35,3 +35,2 @@ >> static uint64_t _rnd; >> - static bool _initialized; >> static jint _monitoring_rate; >> @@ -92,7 +91,2 @@ >> >> - // Is the profiler initialized and where is the address to the >> initialized >> - // boolean. >> - static bool initialized(); >> - static bool *initialized_address(); >> - >> // Called when o is to be sampled from a given thread and a >> given size. >> >> >> >> On 10/10/2017 12:57 AM, JC Beyler wrote: >> >> Dear all, >> >> Thread-safety is back!! Here is the update webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ >> >> >> Full webrev is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ >> >> >> In order to really test this, I needed to add this so thought >> now >> was a good time. It required a few changes here for the >> creation to >> ensure correctness and safety. Now we keep the static pointer >> but >> clear the data internally so on re-initialize, it will be a >> bit more >> costly than before. I don't think this is a huge use-case so >> I did >> not think it was a problem. I used the internal MutexLocker, >> I think >> I used it well, let me know. >> >> I also added three tests: >> >> 1) Stack depth test: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorStackDepthTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStackDepthTest.java.patch> >> >> This test shows that the maximum stack depth system is >> working. >> >> 2) Thread safety: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorThreadTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorThreadTest.java.patch> >> >> The test creates 24 threads and they all allocate at the same >> time. >> The test then checks it does find samples from all the >> threads. >> >> 3) Thread on/off safety >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorThreadOnOffTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorThreadOnOffTest.java.patch> >> >> The test creates 24 threads that all allocate a bunch of >> memory. >> Then another thread turns the sampling on/off. >> >> Btw, both tests 2 & 3 failed without the locks. >> >> As I worked on this, I saw a lot of places where the tests >> are doing >> very similar things, I'm going to clean up the code a bit and >> make a >> HeapAllocator class that all tests can call directly. This >> will >> greatly simplify the code. >> >> Thanks for any comments/criticisms! 
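The MutexLocker change JC describes above boils down to a pattern like the following sketch. StackTraceStorage, initialize(int max_storage), HeapMonitor_lock and _initialized appear in the quoted diff; the two helper calls are placeholders, not the webrev code:

    // (Re)initialization keeps the static storage pointer but clears its
    // contents under the lock, so toggling sampling on/off cannot race with
    // threads that are currently recording samples.
    void StackTraceStorage::initialize(int max_storage) {
      MutexLocker mu(HeapMonitor_lock);
      release_old_entries();           // placeholder: drop data from a previous session
      allocate_storage(max_storage);   // placeholder: set up the new sample arrays
      _initialized = true;
    }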
>> Jc >> >> >> On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler < >> jcbeyler at google.com >> > >> wrote: >> >> Dear all, >> >> Small update to the webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ >> >> > > >> >> Full webrev is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ >> >> > > >> >> I updated a bit of the naming, removed a TODO comment, >> and I >> added a test for testing the sampling rate. I also updated the >> maximum stack depth to 1024, there is no >> reason to keep it so small. I did a micro benchmark that >> tests >> the overhead and it seems relatively the same. >> >> I compared allocations from a stack depth of 10 and >> allocations >> from a stack depth of 1024 (allocations are from the same >> helper >> method in >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_fi >> les/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/ >> MyPackage/HeapMonitorStatRateTest.java >> > iles/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor >> /MyPackage/HeapMonitorStatRateTest.java> >> > asbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/se >> rviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java >> > iles/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor >> /MyPackage/HeapMonitorStatRateTest.java>>): >> - For an array of 1 integer allocated in a >> loop; >> stack depth 1024 vs stack depth 10: 1% slower >> - For an array of 200k integers allocated in >> a loop; >> stack depth 1024 vs stack depth 10: 3% slower >> >> So basically now moving the maximum stack depth to 1024 >> but we >> only copy over the stack depths actually used. >> >> For the next webrev, I will be adding a stack depth test >> to >> show that it works and probably put back the mutex locking so >> that >> we can see how difficult it is to keep >> thread safe. >> >> Let me know what you think! >> Jc >> >> >> >> On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler < >> jcbeyler at google.com >> > >> wrote: >> >> Forgot to say that for my numbers: >> - Not in the test are the actual numbers I got for >> the >> various array sizes, I ran the program 30 times and parsed the >> output; here are the averages and standard >> deviation: >> 1000: 1.28% average; 1.13% standard >> deviation >> 10000: 1.59% average; 1.25% standard >> deviation >> 100000: 1.26% average; 1.26% standard >> deviation >> >> The 1000/10000/100000 are the sizes of the arrays >> being >> allocated. These are allocated 100k times and the sampling >> rate is >> 111 times the size of the array. >> >> Thanks! >> Jc >> >> >> On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler >> >> >> >> wrote: >> >> Hi all, >> >> After a bit of a break, I am back working on >> this :). 
>> As before, here are two webrevs: >> >> - Full change set: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/ >> >> > > >> - Compared to version 8: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/ >> >> > > >> (This version is compared to version 8 I >> last >> showed but ported to the new folder hierarchy) >> >> In this version I have: >> - Handled Thomas' comments from his email of >> 07/03: >> - Merged the logging to be standard >> - Fixed up the code a bit where asked >> - Added some notes about the code not >> being >> thread-safe yet >> - Removed additional dead code from the >> version >> that modifies interpreter/c1/c2 >> - Fixed compiler issues so that it compiles >> with >> --disable-precompiled-header >> - Tested with ./configure >> --with-boot-jdk= --with-debug-level=slowdebug >> --disable-precompiled-headers >> >> Additionally, I added a test to check the sanity >> of the >> sampler: HeapMonitorStatCorrectnessTest >> (http://cr.openjdk.java.net/~r >> asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStatCorrectnessTest.java.patch> >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStatCorrectnessTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStatCorrectnessTest.java.patch>>) >> - This allocates a number of arrays and >> checks that >> we obtain the number of samples we want with an accepted >> error of >> 5%. I tested it 100 times and it >> passed everytime, I can test more if wanted >> - Not in the test are the actual numbers I >> got for >> the various array sizes, I ran the program 30 times and >> parsed the >> output; here are the averages and >> standard deviation: >> 1000: 1.28% average; 1.13% standard >> deviation >> 10000: 1.59% average; 1.25% standard >> deviation >> 100000: 1.26% average; 1.26% standard >> deviation >> >> What this means is that we were always at about >> 1~2% of >> the number of samples the test expected. >> >> Let me know what you think, >> Jc >> >> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler >> >> >> >> wrote: >> >> Hi all, >> >> I apologize, I have not yet handled your >> remarks >> but thought this new webrev would also be useful to see and >> comment >> on perhaps. >> >> Here is the latest webrev, it is generated >> slightly >> different than the others since now I'm using webrev.ksh >> without the >> -N option: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ >> >> > > >> >> And the webrev.07 to webrev.08 diff is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ >> >> > > >> >> (Let me know if it works well) >> >> It's a small change between versions but it: >> - provides a fix that makes the average >> sample >> rate correct (more on that below). >> - fixes the code to actually have it play >> nicely >> with the fast tlab refill >> - cleaned up a bit the JVMTI text and now >> use >> jvmtiFrameInfo >> - moved the capability to be onload solo >> >> With this webrev, I've done a small study of >> the >> random number generator we use here for the sampling rate. I >> took a >> small program and it can be simplified to: >> >> for (outer loop) >> for (inner loop) >> int[] tmp = new int[arraySize]; >> >> - I've fixed the outer and inner loops to >> being 800 >> for this experiment, meaning we allocate 640000 times an >> array of a >> given array size. 
>> >> - Each program provides the average sample >> size >> used for the whole execution >> >> - Then, I ran each variation 30 times and >> then >> calculated the average of the average sample size used for >> various >> array sizes. I selected the array size to >> be one of the following: 1, 10, 100, 1000. >> >> - When compared to 512kb, the average sample >> size >> of 30 runs: >> 1: 4.62% of error >> 10: 3.09% of error >> 100: 0.36% of error >> 1000: 0.1% of error >> 10000: 0.03% of error >> >> What it shows is that, depending on the >> number of >> samples, the average does become better. This is because with >> an >> allocation of 1 element per array, it >> will take longer to hit one of the >> thresholds. This >> is seen by looking at the sample count statistic I put in. >> For the >> same number of iterations (800 * >> 800), the different array sizes provoke: >> 1: 62 samples >> 10: 125 samples >> 100: 788 samples >> 1000: 6166 samples >> 10000: 57721 samples >> >> And of course, the more samples you have, >> the more >> sample rates you pick, which means that your average gets >> closer >> using that math. >> >> Thanks, >> Jc >> >> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler >> >> >> >> wrote: >> >> Thanks Robbin, >> >> This seems to have worked. When I have >> the next >> webrev ready, we will find out but I'm fairly confident it >> will work! >> >> Thanks agian! >> Jc >> >> On Wed, Jun 28, 2017 at 11:46 PM, Robbin >> Ehn >> >> >> >> wrote: >> >> Hi JC, >> >> On 06/29/2017 12:15 AM, JC Beyler >> wrote: >> >> B) Incremental changes >> >> >> I guess the most common work flow >> here is >> using mq : >> hg qnew fix_v1 >> edit files >> hg qrefresh >> hg qnew fix_v2 >> edit files >> hg qrefresh >> >> if you do hg log you will see 2 >> commits >> >> webrev.ksh -r -2 -o my_inc_v1_v2 >> webrev.ksh -o my_full_v2 >> >> >> In your .hgrc you might need: >> [extensions] >> mq = >> >> /Robbin >> >> >> Again another newbiew question >> here... >> >> For showing the incremental >> changes, is >> there a link that explains how to do that? I apologize for my >> newbie >> questions all the time :) >> >> Right now, I do: >> >> ksh ../webrev.ksh -m -N >> >> That generates a webrev.zip and >> send it >> to Chuck Rasbold. He then uploads it to a new webrev. >> >> I tried commiting my change and >> adding >> a small change. Then if I just do ksh ../webrev.ksh without >> any >> options, it seems to produce a similar >> page but now with only the >> changes I >> had (so the 06-07 comparison you were talking about) and a >> changeset >> that has it all. I imagine that is >> what you meant. >> >> Which means that my workflow >> would become: >> >> 1) Make changes >> 2) Make a webrev without any >> options to >> show just the differences with the tip >> 3) Amend my changes to my local >> commit >> so that I have it done with >> 4) Go to 1 >> >> Does that seem correct to you? >> >> Note that when I do this, I only >> see >> the full change of a file in the full change set (Side note >> here: >> now the page says change set and not >> patch, which is maybe why >> Serguei was >> having issues?). >> >> Thanks! 
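One note on the sampling-rate numbers earlier in this message: the averages converging towards the configured rate as the sample count grows is what you would expect if every sampling interval is drawn independently around that rate. A typical way such samplers pick the next interval is an exponential draw, for example (purely illustrative, not the webrev code):

    #include <cmath>
    #include <cstdlib>

    // Pick the next sampling point so that intervals average 'rate' bytes.
    // With few samples the empirical mean is noisy; it tightens as more
    // intervals are drawn, which matches the 1-element vs 10000-element
    // array numbers reported above.
    static size_t next_sample_interval(size_t rate) {
      double u = (std::rand() + 1.0) / ((double)RAND_MAX + 2.0);  // uniform in (0,1)
      return (size_t)(-std::log(u) * (double)rate);
    }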
>> Jc >> >> >> >> On Wed, Jun 28, 2017 at 1:12 AM, >> Robbin >> Ehn >> > >> >> > >>> wrote: >> >> Hi, >> >> On 06/28/2017 12:04 AM, JC >> Beyler >> wrote: >> >> Dear Thomas et al, >> >> Here is the newest >> webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ >> >> > > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ >> >> > >> >> >> >> >> You have some more bits to >> in >> there but generally this looks good and really nice with more >> tests. >> I'll do and deep dive and >> re-test >> this when I get back from my long vacation with whatever patch >> version you have then. >> >> Also I think it's time you >> provide >> incremental (v06->07 changes) as well as complete change-sets. >> >> Thanks, Robbin >> >> >> >> >> Thomas, I "think" I have >> answered all your remarks. The summary is: >> >> - The statistic system >> is up >> and provides insight on what the heap sampler is doing >> - I've noticed >> that, >> though the sampling rate is at the right mean, we are missing >> some >> samples, I have not yet tracked out why >> (details below) >> >> - I've run a tiny >> benchmark >> that is the worse case: it is a very tight loop and allocated >> a >> small array >> - In this case, I >> see no >> overhead when the system is off so that is a good start :) >> - I see right now >> a high >> overhead in this case when sampling is on. This is not a >> really too >> surprising but I'm going to see if >> this is consistent with our >> internal >> implementation. The >> benchmark is really allocation stressful so I'm not too >> surprised >> but I want to do the due diligence. >> >> - The statistic >> system up >> is up and I have a new test >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> >> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test >> /serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatT >> est.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> >>> >> - I did a bit of >> a study >> about the random generator here, more details are below but >> basically it seems to work well >> >> - I added a >> capability but >> since this is the first time doing this, I was not sure I did >> it right >> - I did add a test >> though >> for it and the test seems to do what I expect (all methods are >> failing with the >> JVMTI_ERROR_MUST_POSSESS_CAPABILITY >> error). 
>> - >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch>> >> >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch>>> >> >> - I still need to >> figure >> out what to do about the multi-agent vs single-agent issue >> >> - As far as >> measurements, >> it seems I still need to look at: >> - Why we do the 20 >> random >> calls first, are they necessary? >> - Look at the mean >> of the >> sampling rate that the random generator does and also what is >> actually sampled >> - What is the >> overhead in >> terms of memory/performance when on? >> >> I have inlined my >> answers, I >> think I got them all in the new webrev, let me know your >> thoughts. >> >> Thanks again! >> Jc >> >> >> On Fri, Jun 23, 2017 at >> 3:52 >> AM, Thomas Schatzl > > thomas.schatzl at oracle.com >> > >> > com >> > thomas.schatzl at oracle.com >> >> >> > thomas.schatzl at oracle.com> >> > thomas.schatzl at oracle.com>> >> >> > thomas.schatzl at oracle.com >> > >>>> wrote: >> >> Hi, >> >> On Wed, 2017-06-21 >> at >> 13:45 -0700, JC Beyler wrote: >> > Hi all, >> > >> > First off: >> Thanks again >> to Robbin and Thomas for their reviews :) >> > >> > Next, I've >> uploaded a >> new webrev: >> > >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >> >> > > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >> >> > >> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >> >> > > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >> >> > >>> >> >> > >> > Here is an >> update: >> > >> > - @Robbin, I >> forgot to >> say that yes I need to look at implementing >> > this for the >> other >> architectures and testing it before it is all >> > ready to go. Is >> it >> common to have it working on all possible >> > combinations or >> is >> there a subset that I should be doing first and we > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug.simon at oracle.com Wed Oct 25 17:23:16 2017 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 25 Oct 2017 19:23:16 +0200 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <4D25A442-F0BD-41D1-837B-1115A165EA78@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <8150BCA2-3C50-4624-98ED-D914D51A5C6B@oracle.com> <4D25A442-F0BD-41D1-837B-1115A165EA78@oracle.com> Message-ID: Thanks for the explanation - maybe you could add it to code as a comment. 
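For reference, the comment Doug is asking for would essentially capture Igor's explanation from earlier in the thread. Spelled out next to the quoted lines it might read roughly as follows (a sketch, not the committed HotSpotMethodData.java text):

    // totalCount is the sum of every counter in the profile (all recorded
    // types plus the not-recorded bucket). For a statically bindable call
    // site C1 only bumps the total counter, so a single-entry profile gets
    // the whole total attributed to its one recorded type.
    if (entries == 1) {
      counts[0] = totalCount;
    }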
Sent from my iPhone > On 25 Oct 2017, at 6:13 pm, Igor Veresov wrote: > > > >> On Oct 25, 2017, at 3:07 AM, Doug Simon wrote: >> >> 581 >> 582 // Fixup the case of C1's inability to optimize profiling of a statically bindable call site >> 583 if (entries == 1) { >> 584 counts[0] = totalCount; >> 585 } >> 586 >> But what happens if we're looking at a profile from the interpreter? In that case, won't totalCount == 0 && counts[0] have the right value? In which case, the above fixup will lose this information. Maybe it should be: >> >> counts[0] += totalCount; > > > If it?s pure interpreter you?d have entries == 0, so this fixup won?t fire. Also totalCount at the point of the fixup is a sum of every counter in profile (for all the types + the counter for types that weren?t recorded). So what the fixup does is that it attributes all the counts to the first type (if it?s a monomorphic call site). > > igor > >> >> -Doug >> >>> On 25 Oct 2017, at 05:52, Igor Veresov wrote: >>> >>> This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. >>> >>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>> >>> Thanks, >>> igor >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Oct 25 17:29:36 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Oct 2017 10:29:36 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> Message-ID: Igor Can you factor out checks into boolean function in shared place? May be move some surrounding code into it too - I see the same code on all platforms. Thanks, Vladimir On 10/24/17 8:52 PM, Igor Veresov wrote: > This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. > > Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 > > Thanks, > igor > From vladimir.kozlov at oracle.com Wed Oct 25 19:07:26 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Oct 2017 12:07:26 -0700 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: <3ee0024d-6af5-afe9-8127-0dc2cc3a1711@oracle.com> References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> <3ee0024d-6af5-afe9-8127-0dc2cc3a1711@oracle.com> Message-ID: <43716769-5d0a-f663-ef8e-c6da60346ac8@oracle.com> Hi Nils, "On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will also hit asserts in a few places like" MaxVectorSize was designed to limit vector size for testing purpose. 
I just run compiler/codegen jtreg tests, which includes vector tests, on avx2 Intel machine with -XX:MaxVectorSize=16 and did not hit any problems. I looked and did not find what mismatch you are talking about: "In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has effects on specific vector registers. Here we have a mismatch." C2 should not generate vector with size > MaxVectorSize so they should not be any instructions in .ad file which conflict with it. Can you show output of -Xlog:os+cpu on your machine? vector_size_supported() takes into account MaxVectorSize: static const bool vector_size_supported(const BasicType bt, int size) { return (Matcher::max_vector_size(bt) >= size && Matcher::min_vector_size(bt) <= size); } const int Matcher::max_vector_size(const BasicType bt) { return vector_width_in_bytes(bt)/type2aelembytes(bt); } const int Matcher::vector_width_in_bytes(BasicType bt) { ... // Use flag to limit vector size. size = MIN2(size,(int)MaxVectorSize); Thanks, Vladimir On 10/25/17 6:13 AM, Nils Eliasson wrote: > Deans suggestion with making the TypeVect initialization unconditional also removes all platform dependencies on type.cpp: > > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev.02/ > > Regards, > Nils > > On 2017-10-25 00:02, Vladimir Kozlov wrote: >> We can't use platform specific UseAVX flag in shared code in type.cpp. >> >> I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. >> And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 and corresponding vectors 32 and 64 bytes. >> If AMD's Instructions Set before 17h does not support whole 32 bytes vectors we can't call it AVX. >> >> Thanks, >> Vladimir >> >> On 10/18/17 10:01 AM, dean.long at oracle.com wrote: >>> How about initializing TypeVect::VECTY and friends unconditionally? I am nervous about exchanging one guarding condition for another. >>> >>> dl >>> >>> >>> On 10/18/17 1:03 AM, Nils Eliasson wrote: >>>> >>>> HI, >>>> >>>> I ran into a problem with the interaction between MaxVectorSize and the UseAVX. For some AMD CPUs we limit the vector size to 16 because it gives the best performance. >>>> >>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>> ?????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> ???? } >>>> >>>> Whenf MaxVecorSize is set to 16 it has the sideeffect that the TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though the platform has the capability. >>>> >>>> Type.cpp:~660 >>>> >>>> [...] >>>> >?? if (Matcher::vector_size_supported(T_FLOAT,8)) { >>>> >???? TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >>>> >?? } >>>> [...] >>>> >?? mreg2type[Op_VecY] = TypeVect::VECTY; >>>> >>>> >>>> In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has effects on specific vector registers. Here we have a mismatch. >>>> >>>> On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will also hit asserts in a few places like: >>>> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), "sanity"); >>>> >>>> Shouldn't the type initialization in type.cpp be dependent on feature flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector registers are initialized if the platform supports them, >>>> but they might not be used if MaxVectorSize is limited.) 
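The mismatch Nils describes can be pictured as the difference between a feature-keyed check and a width-keyed check. A simplified illustration, not actual x86.ad predicates:

    // Feature-keyed: true on an AVX2 machine even after MaxVectorSize has
    // been clamped to 16, so anything guarded this way can still fire while
    // TypeVect::VECTY was never initialized.
    bool vecy_by_feature = (UseAVX > 1);

    // Width-keyed: folds the MaxVectorSize clamp in via
    // vector_width_in_bytes(), so it is false once vectors are limited to
    // 16 bytes - this is the guard type.cpp uses.
    bool vecy_by_width = Matcher::vector_size_supported(T_FLOAT, 8);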
>>>> >>>> This is a patch that solves the problem, but I have not convinced myself that it is the right way: >>>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>>> >>>> Feedback appreciated, >>>> >>>> Regards, >>>> Nils Eliasson >>>> >>>> >>>> >>>> >>>> >>>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>>> >>> > From martin.doerr at sap.com Wed Oct 25 19:08:59 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 25 Oct 2017 19:08:59 +0000 Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions In-Reply-To: References: Message-ID: <18ddb703d81a4a22bc97f134dd276eff@sap.com> Hi Lutz, thanks for working on vector-based enhancements and for providing this webrev. assembler_s390: -The changes in the assembler look good. s390.ad: -It doesn't make sense to load constant len to a register and generate complex compare instructions for it and still to emit code for all cases. I assume that e.g. the 4 characters cases usually have a constant length. If so, much better code could be generated for them by omitting all the stuff around the simple instructions. (ppc64.ad already contains nodes for constant length of needle in indexOf rules.) macroAssembler_s390: -Are you sure the prefetch instructions improve performance? I remember that we had them in other String intrinsics but removed them again as they showed absolutely no performance gain. -Comment: Using hardcoded vector registers is ok for now, but may need to get changed e.g. when using them for C2's SuperWord optimization. -Comment: You could use the vperm instruction instead of vo+vn, but I'm ok with the current implementation because loading a mask is much more convenient than getting the permutation vector loaded (e.g. from constant pool or pc relative). -So the new vector loop looks good to me. -In my opinion, the size of all the generated cases should be in relationship to their performance benefit. As intrinsics are not like stubs and may get inlined often, I can't get rid of the impression that generating so large code wastes valuable code cache space with questionable performance gain in real world scenarios. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Mittwoch, 25. Oktober 2017 12:02 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8189793 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189793.00/index.html Vector instructions, which have been available on System z for a while (since z13), promise noticeable performance improvements. This enhancement improves the String Compress and String Inflate intrinsics by exploiting vector instructions, when available. For long strings, up to 2x performance improvement has been observed in micro-benchmarks. Special care was taken to preserve good performance for short strings. All examined workloads showed a high ratio of short and very short strings. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dean.long at oracle.com Wed Oct 25 20:32:22 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 25 Oct 2017 13:32:22 -0700 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly In-Reply-To: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> References: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> Message-ID: <34bf5c2b-0b8e-1d9c-1b3e-e152d7e3cbf4@oracle.com> Looks OK.? It appears that only Sparc uses is_deopt_suspend(), and then only when we exit native. dl On 10/25/17 2:35 AM, jamsheed wrote: > Hi, > > request for review, > > webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ > > jbs: https://bugs.openjdk.java.net/browse/JDK-6523512 > > desc: removed the is_deopt_suspend() from > has_special_runtime_exit_condition checks > > Best regards, > > Jamsheed > From igor.veresov at oracle.com Wed Oct 25 22:16:47 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 25 Oct 2017 15:16:47 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> Message-ID: <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> Sure. I?ve updated the webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ Also added a comment in HotSpotMethodData.java per Doug?s request. igor > On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov wrote: > > Igor > > Can you factor out checks into boolean function in shared place? May be move some surrounding code into it too - I see the same code on all platforms. > > Thanks, > Vladimir > > On 10/24/17 8:52 PM, Igor Veresov wrote: >> This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. >> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >> Thanks, >> igor -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Oct 25 23:04:09 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Oct 2017 16:04:09 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> Message-ID: <83757fa4-fed9-3d65-dd91-e547e3bcac05@oracle.com> Looks good. Thanks, Vladimir On 10/25/17 3:16 PM, Igor Veresov wrote: > Sure. I?ve updated the webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ > Also added a comment in HotSpotMethodData.java per Doug?s request. > > igor > >> On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov > wrote: >> >> Igor >> >> Can you factor out checks into boolean function in shared place? May be move some surrounding code into it too - I see the same code on all platforms. >> >> Thanks, >> Vladimir >> >> On 10/24/17 8:52 PM, Igor Veresov wrote: >>> This a fix from Tom that I ported to all architectures and the new repo structure. 
While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it >>> speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. >>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>> Thanks, >>> igor > From vladimir.kozlov at oracle.com Wed Oct 25 23:21:36 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Oct 2017 16:21:36 -0700 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <91ffc3a8-3c02-16f4-9e5b-051169ec19a7@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> <91ffc3a8-3c02-16f4-9e5b-051169ec19a7@oracle.com> Message-ID: <59e5b4ff-1fb3-db98-264c-ea6b6f98e526@oracle.com> Hi Roland, Tests passed. Please, send changeset with test moved into compiler/exceptions/ directory. Thanks, Vladimir On 10/24/17 2:08 PM, Vladimir Kozlov wrote: > It looks good to me too. The only issue is test's placement - /c1 subdir is nothing to do with C1 compiler. I think test should be put into compiler/exceptions/ directory. > I submitted pre-integration testing. > > Thanks, > Vladimir > > On 10/18/17 8:19 PM, dean.long at oracle.com wrote: >> Yes, but I'm not a Reviewer. >> >> dl >> >> >> On 10/18/17 7:16 AM, Roland Westrelin wrote: >>> Here is an updated webrev with Dean's suggestion: >>> >>> http://cr.openjdk.java.net/~roland/8188151/webrev.01/ >>> >>> Can this be considered reviewed by you, Dean? >>> >>> Roland. >> From igor.veresov at oracle.com Wed Oct 25 23:25:48 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 25 Oct 2017 16:25:48 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <83757fa4-fed9-3d65-dd91-e547e3bcac05@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> <83757fa4-fed9-3d65-dd91-e547e3bcac05@oracle.com> Message-ID: <147B8B6A-1BDE-42C8-BEC5-9BD6538625DC@oracle.com> Thanks! igor > On Oct 25, 2017, at 4:04 PM, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir > > On 10/25/17 3:16 PM, Igor Veresov wrote: >> Sure. I?ve updated the webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ >> Also added a comment in HotSpotMethodData.java per Doug?s request. >> igor >>> On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov > wrote: >>> >>> Igor >>> >>> Can you factor out checks into boolean function in shared place? May be move some surrounding code into it too - I see the same code on all platforms. >>> >>> Thanks, >>> Vladimir >>> >>> On 10/24/17 8:52 PM, Igor Veresov wrote: >>>> This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. 
>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>>> Thanks, >>>> igor From rohitarulraj at gmail.com Thu Oct 26 04:48:35 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Thu, 26 Oct 2017 10:18:35 +0530 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> Message-ID: Hello Vladimir, Please find the requested details: AVX/AVX2 support availability on AMD Processors: Family 14h and earlier ? No AVX support Family 15h - (1st-gen), (2nd-gen), (3rd-gen) AVX support available, max vector width is 32 bytes (we limit the vector size to 16 bytes in openJDK). Family 16h ? AVX support available, max vector width is 32 bytes (we limit the vector size to 16 bytes in openJDK). Family 15h - (4th-gen) AVX, AVX2 support available, max vector width is 32 bytes (we limit the vector size to 16 bytes in openJDK). Family 17h ? AVX, AVX2 support available, max vector width is 32 bytes (our proposed changes have vector size set to 32 bytes in openJDK). AVX3 support is not available on AMD processors yet. >From the comments below, Dean's suggestions seems reasonable. Regards, Rohit On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Kozlov wrote: > We can't use platform specific UseAVX flag in shared code in type.cpp. > > I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. > And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 and > corresponding vectors 32 and 64 bytes. > If AMD's Instructions Set before 17h does not support whole 32 bytes > vectors we can't call it AVX. > > Thanks, > Vladimir > > On 10/18/17 10:01 AM, dean.long at oracle.com wrote: > >> How about initializing TypeVect::VECTY and friends unconditionally? I am >> nervous about exchanging one guarding condition for another. >> >> dl >> >> >> On 10/18/17 1:03 AM, Nils Eliasson wrote: >> >>> >>> HI, >>> >>> I ran into a problem with the interaction between MaxVectorSize and the >>> UseAVX. For some AMD CPUs we limit the vector size to 16 because it gives >>> the best performance. >>> >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> >>> >>> Whenf MaxVecorSize is set to 16 it has the sideeffect that the >>> TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though the >>> platform has the capability. >>> >>> Type.cpp:~660 >>> >>> [...] >>> > if (Matcher::vector_size_supported(T_FLOAT,8)) { >>> > TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >>> > } >>> [...] >>> > mreg2type[Op_VecY] = TypeVect::VECTY; >>> >>> >>> In the ad-files feature flags (UseAVX etc.) are used to control what >>> rules should be matched if it has effects on specific vector registers. >>> Here we have a mismatch. >>> >>> On a platform that supports AVX2 but have MaxVectorSize limited to 16, >>> the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is >>> uninitialized. We will also hit asserts in a few places like: >>> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), >>> "sanity"); >>> >>> Shouldn't the type initialization in type.cpp be dependent on feature >>> flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector >>> registers are initialized if the platform supports them, but they might not >>> be used if MaxVectorSize is limited.) 
>>> >>> This is a patch that solves the problem, but I have not convinced myself >>> that it is the right way: >>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>> >>> Feedback appreciated, >>> >>> Regards, >>> Nils Eliasson >>> >>> >>> >>> >>> >>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamsheed.c.m at oracle.com Thu Oct 26 07:18:10 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Thu, 26 Oct 2017 12:48:10 +0530 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly In-Reply-To: <34bf5c2b-0b8e-1d9c-1b3e-e152d7e3cbf4@oracle.com> References: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> <34bf5c2b-0b8e-1d9c-1b3e-e152d7e3cbf4@oracle.com> Message-ID: <112bc062-2bb4-25e4-dfc5-546e2b740de4@oracle.com> Thank you for the review, Dean Best regards, Jamsheed On Thursday 26 October 2017 02:02 AM, dean.long at oracle.com wrote: > Looks OK.? It appears that only Sparc uses is_deopt_suspend(), and > then only when we exit native. > > dl > > > On 10/25/17 2:35 AM, jamsheed wrote: >> Hi, >> >> request for review, >> >> webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ >> >> jbs: https://bugs.openjdk.java.net/browse/JDK-6523512 >> >> desc: removed the is_deopt_suspend() from >> has_special_runtime_exit_condition checks >> >> Best regards, >> >> Jamsheed >> > From tobias.hartmann at oracle.com Thu Oct 26 08:00:35 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 26 Oct 2017 10:00:35 +0200 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly In-Reply-To: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> References: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> Message-ID: <373f160f-0cd9-9bd8-e89e-7320bd342977@oracle.com> Hi Jamsheed, On 25.10.2017 11:35, jamsheed wrote: > webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ Looks good to me. Best regards, Tobias From jamsheed.c.m at oracle.com Thu Oct 26 08:57:06 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Thu, 26 Oct 2017 14:27:06 +0530 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly In-Reply-To: <373f160f-0cd9-9bd8-e89e-7320bd342977@oracle.com> References: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> <373f160f-0cd9-9bd8-e89e-7320bd342977@oracle.com> Message-ID: <5942a7f2-20be-d35e-3d25-c7cf599228fd@oracle.com> Thank you for the review, Tobias Best regards, Jamsheed On Thursday 26 October 2017 01:30 PM, Tobias Hartmann wrote: > Hi Jamsheed, > > On 25.10.2017 11:35, jamsheed wrote: >> webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ > > Looks good to me. > > Best regards, > Tobias From jamsheed.c.m at oracle.com Thu Oct 26 13:09:49 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Thu, 26 Oct 2017 18:39:49 +0530 Subject: RFR [10]: 8185989: overview.html files should be deleted? 
Message-ID: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> Hi, request for review, jbs: https://bugs.openjdk.java.net/browse/JDK-8185989 webrev: http://cr.openjdk.java.net/~jcm/8185989/webrev.00/ desc: src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/overview.html src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/overview.html deleted Best regards, Jamsheed From tobias.hartmann at oracle.com Thu Oct 26 13:20:18 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 26 Oct 2017 15:20:18 +0200 Subject: RFR [10]: 8185989: overview.html files should be deleted? In-Reply-To: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> References: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> Message-ID: Hi Jamsheed, looks good. Best regards, Tobias On 26.10.2017 15:09, jamsheed wrote: > Hi, > > request for review, > > jbs: https://bugs.openjdk.java.net/browse/JDK-8185989 > > webrev: http://cr.openjdk.java.net/~jcm/8185989/webrev.00/ > > desc: > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/overview.html > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/overview.html > > deleted > > Best regards, > > Jamsheed > From rwestrel at redhat.com Thu Oct 26 13:59:10 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 26 Oct 2017 15:59:10 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <59e5b4ff-1fb3-db98-264c-ea6b6f98e526@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> <91ffc3a8-3c02-16f4-9e5b-051169ec19a7@oracle.com> <59e5b4ff-1fb3-db98-264c-ea6b6f98e526@oracle.com> Message-ID: > Tests passed. Please, send changeset with test moved into compiler/exceptions/ directory. Thanks for the review and testing. Here is the changeset: http://cr.openjdk.java.net/~roland/8188151/8188151.changeset Roland. From tom.rodriguez at oracle.com Thu Oct 26 16:48:03 2017 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 26 Oct 2017 09:48:03 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> Message-ID: <59F211C3.90104@oracle.com> Sorry I'm late to this, but I don't think the HotSpotMethodData changes are correct. If you run with -XX:TypeProfileWidth=1 you'll get incorrect profiles for non-statically bindable call sites. Shouldn't it be entries == 1 && methods[0].canBeStaticallyBound()? I think the ciMethod workaround for this problem has the same issue. Also I think it would make sense to null out the entry so it looks the same as a properly profiled vfinal call site. tom Igor Veresov wrote: > Sure. I?ve updated the webrev: > http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ > Also added a comment in HotSpotMethodData.java per Doug?s request. > > igor > >> On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov >> > wrote: >> >> Igor >> >> Can you factor out checks into boolean function in shared place? May >> be move some surrounding code into it too - I see the same code on all >> platforms. >> >> Thanks, >> Vladimir >> >> On 10/24/17 8:52 PM, Igor Veresov wrote: >>> This a fix from Tom that I ported to all architectures and the new >>> repo structure. 
While that fix doesn?t not solve the problem of the >>> interpreter-C1 profiling style discrepancy completely it speeds up >>> profiling of the statically bindable call sites and we?d like to push >>> that. I also added a bit of a code to JVMCI to do the profile fix up >>> analogous to what happens in CI. >>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>> Thanks, >>> igor > From vladimir.kozlov at oracle.com Thu Oct 26 17:28:10 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 26 Oct 2017 10:28:10 -0700 Subject: RFR [10]: 8185989: overview.html files should be deleted? In-Reply-To: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> References: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> Message-ID: <2a3bb4f1-07c8-fac2-69d2-2ad7853bfd6a@oracle.com> Good. Thanks, Vladimir On 10/26/17 6:09 AM, jamsheed wrote: > Hi, > > request for review, > > jbs: https://bugs.openjdk.java.net/browse/JDK-8185989 > > webrev: http://cr.openjdk.java.net/~jcm/8185989/webrev.00/ > > desc: > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/overview.html > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/overview.html > > deleted > > Best regards, > > Jamsheed > From vladimir.kozlov at oracle.com Thu Oct 26 17:36:38 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 26 Oct 2017 10:36:38 -0700 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> Message-ID: <0bf320d1-aa35-e05c-0959-3ffa09110499@oracle.com> Thank you, Rohit Do you plan to propose changes to increase vector size to 32 for 15h and 16h? Or AMD is fine with current settings? Thanks, Vladimir On 10/25/17 9:48 PM, Rohit Arul Raj wrote: > Hello Vladimir, > > > Please find the requested details: > > > AVX/AVX2 support availability on AMD Processors: > > Family 14h and earlier ? No AVX support > > Family 15h -? (1^st -gen), (2nd-gen), (3rd-gen) AVX support available, max vector width is 32 bytes (we limit the vector > size to 16 bytes in openJDK). > > Family 16h ? AVX support available, max vector width is 32 bytes (we limit the vector size to 16 bytes in openJDK). > > Family 15h -? (4^th -gen) AVX, AVX2 support available, max vector width is 32 bytes (we limit the vector size to 16 > bytes in openJDK). > > Family 17h ? AVX, AVX2 support available, max vector width is 32 bytes (our proposed changes have vector size set to 32 > bytes in openJDK). > > AVX3 support is not available on AMD processors yet. > > > From the comments below, Dean's suggestions seems reasonable. > > Regards, > Rohit > > > On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Kozlov > wrote: > > We can't use platform specific UseAVX flag in shared code in type.cpp. > > I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. > And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 and corresponding vectors 32 and 64 bytes. > If AMD's Instructions Set before 17h does not support whole 32 bytes vectors we can't call it AVX. > > Thanks, > Vladimir > > On 10/18/17 10:01 AM, dean.long at oracle.com wrote: > > How about initializing TypeVect::VECTY and friends unconditionally?? I am nervous about exchanging one guarding > condition for another. > > dl > > > On 10/18/17 1:03 AM, Nils Eliasson wrote: > > > HI, > > I ran into a problem with the interaction between MaxVectorSize and the UseAVX. 
For some AMD CPUs we limit > the vector size to 16 because it gives the best performance. > > +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { > +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. > ?????? FLAG_SET_DEFAULT(MaxVectorSize, 16); > ???? } > > > Whenf MaxVecorSize is set to 16 it has the sideeffect that the TypeVect::VECTY and mreg2type[Op_VecY] won't > be initalized even though the platform has the capability. > > Type.cpp:~660 > > [...] > >?? if (Matcher::vector_size_supported(T_FLOAT,8)) { > >???? TypeVect::VECTY = TypeVect::make(T_FLOAT,8); > >?? } > [...] > >?? mreg2type[Op_VecY] = TypeVect::VECTY; > > > In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has > effects on specific vector registers. Here we have a mismatch. > > On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the > TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will also hit asserts in a few places like: > assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), "sanity"); > > Shouldn't the type initialization in type.cpp be dependent on feature flag (UseAVX etc.) instead of > MaxVectorSize? (The type for the vector registers are initialized if the platform supports them, but they > might not be used if MaxVectorSize is limited.) > > This is a patch that solves the problem, but I have not convinced myself that it is the right way: > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ > > > Feedback appreciated, > > Regards, > Nils Eliasson > > > > > > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ > > > > From igor.veresov at oracle.com Thu Oct 26 19:42:52 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 26 Oct 2017 12:42:52 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <59F211C3.90104@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> <59F211C3.90104@oracle.com> Message-ID: <4B298FB6-326A-4C7F-BBF2-147DA4C4B8F2@oracle.com> Good points, since I already push it, I?ll file a new bug. igor > On Oct 26, 2017, at 9:48 AM, Tom Rodriguez wrote: > > Sorry I'm late to this, but I don't think the HotSpotMethodData changes are correct. If you run with -XX:TypeProfileWidth=1 you'll get incorrect profiles for non-statically bindable call sites. Shouldn't it be entries == 1 && methods[0].canBeStaticallyBound()? I think the ciMethod workaround for this problem has the same issue. Also I think it would make sense to null out the entry so it looks the same as a properly profiled vfinal call site. > > tom > > Igor Veresov wrote: >> Sure. I?ve updated the webrev: >> http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ >> Also added a comment in HotSpotMethodData.java per Doug?s request. >> >> igor >> >>> On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov >>> > wrote: >>> >>> Igor >>> >>> Can you factor out checks into boolean function in shared place? May >>> be move some surrounding code into it too - I see the same code on all >>> platforms. >>> >>> Thanks, >>> Vladimir >>> >>> On 10/24/17 8:52 PM, Igor Veresov wrote: >>>> This a fix from Tom that I ported to all architectures and the new >>>> repo structure. While that fix doesn?t not solve the problem of the >>>> interpreter-C1 profiling style discrepancy completely it speeds up >>>> profiling of the statically bindable call sites and we?d like to push >>>> that. 
I also added a bit of a code to JVMCI to do the profile fix up >>>> analogous to what happens in CI. >>>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>>> Thanks, >>>> igor >> From ekaterina.pavlova at oracle.com Fri Oct 27 00:40:24 2017 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 26 Oct 2017 17:40:24 -0700 Subject: RFR(S) : 8186618 : [TESTBUG] Test applications/ctw/Modules.java doesn't have timeout and hang on windows In-Reply-To: References: Message-ID: <75f3096e-b9f6-ca2e-c336-ada8c519db3b@oracle.com> Looks good. Thanks for fixing it, -katya On 10/17/17 9:45 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >> 546 lines changed: 188 ins; 88 del; 270 mod; > > Hi all, > > could you please review this fix for ctw test? > in some configurations the test takes too much time, it also didn't have timeout, so in case of a hang, e.g. due to JDK-8189604, no one interrupted its execution. > > the fix splits the test into several tests, one for each jigsaw module, which not only improves the tests' reliability and reportability, but also speeds up the execution via parallel execution. 2 hours timeout, which is enough for each particular module, has been added to all tests. the patch also puts ctw for java.desktop and jdk.jconsole modules in the problem list on windows. > > webrev: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html > testing: applications/ctw/modules tests > JBS: https://bugs.openjdk.java.net/browse/JDK-8186618 > > Thanks, > -- Igor > From vladimir.kozlov at oracle.com Fri Oct 27 02:02:15 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 26 Oct 2017 19:02:15 -0700 Subject: [10] RFR(S) 8189064: Crash with compiler/codegen/*Vect.java on Solaris-sparc Message-ID: <8ce768c1-5bd5-f811-49ad-8b584873ca8b@oracle.com> webrev: http://cr.openjdk.java.net/~kvn/8189064/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8189064 New code from JDK-8187601 triggers an other round of loopopts to try to unroll more loops which were not vectorized. But that also trigger second round of vectorization. To avoid vectorization of already vectorized loops there is cl->is_vectorized_loop() check in SuperWord::transform_loop(). Unfortunately cl->mark_loop_vectorized() is called in SuperWord::output() under several conditions and one of them (compare vector length with unroll count) is not true on SPARC because it has very small vectors (8 bytes) as result cl->mark_loop_vectorized() is not called. The fix is unconditionally call cl->mark_loop_vectorized() when vectors are generated. I also modified JDK-8187601 changes to trigger an other round of loopopts only when main loop is not vectorized. Failed vector tests from bug report passed. I submitted pre-integration testing. Thanks, Vladimir From igor.ignatyev at oracle.com Fri Oct 27 02:44:32 2017 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 26 Oct 2017 19:44:32 -0700 Subject: RFR(S) : 8186618 : [TESTBUG] Test applications/ctw/Modules.java doesn't have timeout and hang on windows In-Reply-To: <75f3096e-b9f6-ca2e-c336-ada8c519db3b@oracle.com> References: <75f3096e-b9f6-ca2e-c336-ada8c519db3b@oracle.com> Message-ID: <79997CB7-FF94-4354-BC7E-8CE5B73BDC10@oracle.com> Katya, thank you reviewing it. can I have another review for this patch from a Reviewer? Thanks, -- Igor > On Oct 26, 2017, at 5:40 PM, Ekaterina Pavlova wrote: > > Looks good. 
> > Thanks for fixing it, > > -katya > > On 10/17/17 9:45 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >>> 546 lines changed: 188 ins; 88 del; 270 mod; >> Hi all, >> could you please review this fix for ctw test? >> in some configurations the test takes too much time, it also didn't have timeout, so in case of a hang, e.g. due to JDK-8189604, no one interrupted its execution. >> the fix splits the test into several tests, one for each jigsaw module, which not only improves the tests' reliability and reportability, but also speeds up the execution via parallel execution. 2 hours timeout, which is enough for each particular module, has been added to all tests. the patch also puts ctw for java.desktop and jdk.jconsole modules in the problem list on windows. >> webrev: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >> testing: applications/ctw/modules tests >> JBS: https://bugs.openjdk.java.net/browse/JDK-8186618 >> Thanks, >> -- Igor > From jamsheed.c.m at oracle.com Fri Oct 27 05:24:45 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Fri, 27 Oct 2017 10:54:45 +0530 Subject: RFR [10]: 8185989: overview.html files should be deleted? In-Reply-To: <2a3bb4f1-07c8-fac2-69d2-2ad7853bfd6a@oracle.com> References: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> <2a3bb4f1-07c8-fac2-69d2-2ad7853bfd6a@oracle.com> Message-ID: Thank you for the review, Tobias, Vladimir Best regards, Jamsheed On Thursday 26 October 2017 10:58 PM, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 10/26/17 6:09 AM, jamsheed wrote: >> Hi, >> >> request for review, >> >> jbs: https://bugs.openjdk.java.net/browse/JDK-8185989 >> >> webrev: http://cr.openjdk.java.net/~jcm/8185989/webrev.00/ >> >> desc: >> >> src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/overview.html >> >> src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/overview.html >> >> deleted >> >> Best regards, >> >> Jamsheed >> From tobias.hartmann at oracle.com Fri Oct 27 07:35:58 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 27 Oct 2017 09:35:58 +0200 Subject: [10] RFR(S) 8189064: Crash with compiler/codegen/*Vect.java on Solaris-sparc In-Reply-To: <8ce768c1-5bd5-f811-49ad-8b584873ca8b@oracle.com> References: <8ce768c1-5bd5-f811-49ad-8b584873ca8b@oracle.com> Message-ID: Hi Vladimir, On 27.10.2017 04:02, Vladimir Kozlov wrote: > webrev: http://cr.openjdk.java.net/~kvn/8189064/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8189064 Looks good to me! Best regards, Tobias From vladimir.kozlov at oracle.com Fri Oct 27 08:00:45 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 27 Oct 2017 01:00:45 -0700 Subject: [10] RFR(S) 8189064: Crash with compiler/codegen/*Vect.java on Solaris-sparc In-Reply-To: References: <8ce768c1-5bd5-f811-49ad-8b584873ca8b@oracle.com> Message-ID: Thank you, Tobias Vladimir On 10/27/17 12:35 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 27.10.2017 04:02, Vladimir Kozlov wrote: >> webrev: http://cr.openjdk.java.net/~kvn/8189064/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8189064 > > Looks good to me! 
> > Best regards, > Tobias From rohitarulraj at gmail.com Fri Oct 27 11:03:54 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Fri, 27 Oct 2017 16:33:54 +0530 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: <0bf320d1-aa35-e05c-0959-3ffa09110499@oracle.com> References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> <0bf320d1-aa35-e05c-0959-3ffa09110499@oracle.com> Message-ID: Hello Vladimir, We are fine with the current settings. Thanks, Rohit On Thu, Oct 26, 2017 at 11:06 PM, Vladimir Kozlov < vladimir.kozlov at oracle.com> wrote: > Thank you, Rohit > > Do you plan to propose changes to increase vector size to 32 for 15h and > 16h? Or AMD is fine with current settings? > > Thanks, > Vladimir > > On 10/25/17 9:48 PM, Rohit Arul Raj wrote: > >> Hello Vladimir, >> >> >> Please find the requested details: >> >> >> AVX/AVX2 support availability on AMD Processors: >> >> Family 14h and earlier ? No AVX support >> >> Family 15h - (1^st -gen), (2nd-gen), (3rd-gen) AVX support available, >> max vector width is 32 bytes (we limit the vector size to 16 bytes in >> openJDK). >> >> Family 16h ? AVX support available, max vector width is 32 bytes (we >> limit the vector size to 16 bytes in openJDK). >> >> Family 15h - (4^th -gen) AVX, AVX2 support available, max vector width >> is 32 bytes (we limit the vector size to 16 bytes in openJDK). >> >> Family 17h ? AVX, AVX2 support available, max vector width is 32 bytes >> (our proposed changes have vector size set to 32 bytes in openJDK). >> >> AVX3 support is not available on AMD processors yet. >> >> >> From the comments below, Dean's suggestions seems reasonable. >> >> Regards, >> Rohit >> >> >> On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Kozlov < >> vladimir.kozlov at oracle.com > wrote: >> >> We can't use platform specific UseAVX flag in shared code in type.cpp. >> >> I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. >> And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 >> and corresponding vectors 32 and 64 bytes. >> If AMD's Instructions Set before 17h does not support whole 32 bytes >> vectors we can't call it AVX. >> >> Thanks, >> Vladimir >> >> On 10/18/17 10:01 AM, dean.long at oracle.com > dean.long at oracle.com> wrote: >> >> How about initializing TypeVect::VECTY and friends >> unconditionally? I am nervous about exchanging one guarding >> condition for another. >> >> dl >> >> >> On 10/18/17 1:03 AM, Nils Eliasson wrote: >> >> >> HI, >> >> I ran into a problem with the interaction between >> MaxVectorSize and the UseAVX. For some AMD CPUs we limit >> the vector size to 16 because it gives the best performance. >> >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < >> 17h. >> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> >> >> Whenf MaxVecorSize is set to 16 it has the sideeffect that >> the TypeVect::VECTY and mreg2type[Op_VecY] won't >> be initalized even though the platform has the capability. >> >> Type.cpp:~660 >> >> [...] >> > if (Matcher::vector_size_supported(T_FLOAT,8)) { >> > TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >> > } >> [...] >> > mreg2type[Op_VecY] = TypeVect::VECTY; >> >> >> In the ad-files feature flags (UseAVX etc.) are used to >> control what rules should be matched if it has >> effects on specific vector registers. Here we have a mismatch. 
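To put the two guards side by side (the type.cpp lines are the ones quoted above; the predicate line is a typical x86.ad guard given only as an example, not quoted from the webrev):

    // type.cpp - the 32-byte vector type is only created when MaxVectorSize allows it
    if (Matcher::vector_size_supported(T_FLOAT, 8)) {   // false once MaxVectorSize is capped at 16
      TypeVect::VECTY = TypeVect::make(T_FLOAT, 8);
    }
    // ...
    mreg2type[Op_VecY] = TypeVect::VECTY;               // stays NULL if the branch above was skipped

    // x86.ad - VecY rules are instead guarded by the CPU feature, e.g.
    //   predicate(UseAVX > 0 && n->as_Vector()->length() == 8);

so on an AVX2-capable CPU with MaxVectorSize limited to 16, the feature-flag side and the type-initialization side disagree.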
>> >> On a platform that supports AVX2 but have MaxVectorSize >> limited to 16, the VM will fail when the >> TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will >> also hit asserts in a few places like: >> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), >> "sanity"); >> >> Shouldn't the type initialization in type.cpp be dependent on >> feature flag (UseAVX etc.) instead of >> MaxVectorSize? (The type for the vector registers are >> initialized if the platform supports them, but they >> might not be used if MaxVectorSize is limited.) >> >> This is a patch that solves the problem, but I have not >> convinced myself that it is the right way: >> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >> >> >> Feedback appreciated, >> >> Regards, >> Nils Eliasson >> >> >> >> >> >> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.schmidt at sap.com Fri Oct 27 11:06:50 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 27 Oct 2017 11:06:50 +0000 Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions In-Reply-To: <18ddb703d81a4a22bc97f134dd276eff@sap.com> References: <18ddb703d81a4a22bc97f134dd276eff@sap.com> Message-ID: <6F73CAE2-2FEC-4BC0-9F3A-FEE9748EB694@sap.com> Hi Martin, Thanks for reviewing my change! This is a preliminary response just to let you know I?m working on the change. I?m putting a lot of effort in producing reliable performance measurement data. Turns out this is not easy (to be more honest: almost impossible). s390.ad: You are absolutely right, the sequence load_const/string_compress makes no sense at all. But it does not hurt either ? I could not find one match in all tests I ran. -> Match rule deleted. macroAssembler_s390: prefetch: did not see impact, neither positive nor negative. Artificial micro benchmarks will not benefit (data is in cache anyway). More complex benchmarks show measurement noise which covers the possible prefetch benefit. -> prefetch deleted. Hardcoded vector registers: you are right. There are some design decisions pending, e.g. how many vector scratch registers? Vperm instruction: using that is just another implementation variant that could save the vn vector instruction. On the other hand, loading the index vector is a (compared to vgmh) costly memory access. Given the fact that we mostly deal with short strings, initialization effort is relevant. Code size vs. performance: the old, well known, often discussed tradeoff. Starting from the existing implementation, I invested quite some time in optimizing the (len <= 8) cases. With every refinement step I saw (or believed to see (measurement noise)) some improvement ? or discarded it. Is the overall improvement worth the larger code size? -> tradeoff, discussion. Best Regards, Lutz On 25.10.2017, 21:08, "Doerr, Martin" > wrote: Hi Lutz, thanks for working on vector-based enhancements and for providing this webrev. assembler_s390: -The changes in the assembler look good. s390.ad: -It doesn't make sense to load constant len to a register and generate complex compare instructions for it and still to emit code for all cases. I assume that e.g. the 4 characters cases usually have a constant length. If so, much better code could be generated for them by omitting all the stuff around the simple instructions. (ppc64.ad already contains nodes for constant length of needle in indexOf rules.) 
macroAssembler_s390: -Are you sure the prefetch instructions improve performance? I remember that we had them in other String intrinsics but removed them again as they showed absolutely no performance gain. -Comment: Using hardcoded vector registers is ok for now, but may need to get changed e.g. when using them for C2's SuperWord optimization. -Comment: You could use the vperm instruction instead of vo+vn, but I'm ok with the current implementation because loading a mask is much more convenient than getting the permutation vector loaded (e.g. from constant pool or pc relative). -So the new vector loop looks good to me. -In my opinion, the size of all the generated cases should be in relationship to their performance benefit. As intrinsics are not like stubs and may get inlined often, I can't get rid of the impression that generating so large code wastes valuable code cache space with questionable performance gain in real world scenarios. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Mittwoch, 25. Oktober 2017 12:02 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8189793 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189793.00/index.html Vector instructions, which have been available on System z for a while (since z13), promise noticeable performance improvements. This enhancement improves the String Compress and String Inflate intrinsics by exploiting vector instructions, when available. For long strings, up to 2x performance improvement has been observed in micro-benchmarks. Special care was taken to preserve good performance for short strings. All examined workloads showed a high ratio of short and very short strings. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Oct 27 15:50:16 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 27 Oct 2017 08:50:16 -0700 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> Message-ID: <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> I ran pre-integration testing with latest webrev.01 and it passed. But, give me more time to look though changes. Thanks, Vladimir On 10/25/17 7:29 AM, Roland Westrelin wrote: > > Hi Vladimir, > > Thanks for looking at this. > >> Did you consider less intrusive approach by adding branch over >> SafePoint with masking on index variable? >> >> int mask = LoopStripMiningMask * inc; // simplified >> for (int i = start; i < stop; i += inc) { >> // body >> if (i & mask != 0) continue; >> safepoint; >> } >> >> Or may be doing it inside .ad file in new SafePoint node >> implementation so that ideal graph is not affected. > > We're looking for the best trade off between latency and thoughput: we > want the safepoint poll overhead to be entirely eliminated even when the > safepoint doesn't trigger. > >> I am concern that suggested changes may affect Range Check elimination >> (you changed limit to variable value/flag) in addition to complexity >> of changes which may affect stability of C2. 
> > The CountedLoop that is created with my patch is strictly identical to > the CountedLoop created today with -UseCountedLoopSafepoints. Bounds are > not changed at that time. They are left as they are today. The > difference, with loop strip mining, is that the counted loop has a > skeleton outer loop. The bounds of the counted loop are adjusted once > loop opts are over. If the counted loop has a predicate, the predicate > is moved out of loop just as it is today. The only difference with > today, is that the predicate should be moved out of the outer loop. If a > pre and post loop needs to be created, then the only difference with > today is that the clones need to be moved out of the outer loop and > logic that locate the pre from the main loop need to account for the > outer loop. > > It's obviously a complex change so if your primary concern is stability > then loop strip mining can be disabled by default. Assuming strip mining > off, then that patch is mostly some code refactoring and some logic that > never triggers. > > Roland. > From rwestrel at redhat.com Fri Oct 27 16:09:10 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 27 Oct 2017 18:09:10 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> Message-ID: > I ran pre-integration testing with latest webrev.01 and it passed. > But, give me more time to look though changes. Sure. Thanks for testing it. Roland. From maaartinus at gmail.com Fri Oct 27 19:46:06 2017 From: maaartinus at gmail.com (Martin Grajcar) Date: Fri, 27 Oct 2017 21:46:06 +0200 Subject: Vectorized Loop Unrolling on x64? In-Reply-To: References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: IIUIC the code on slide 90 is slow due to data dependencies as the only accumulator sum is the bottleneck. Some very long time ago, I played with unrolling it manually using multiple accumulators and gained a factor of maybe 3. But this is well-known, so I wonder what am I missing? IMHO there's no reason why sum += A[i] should be slower than B[i] += A[i] assuming a sufficient iteration count. On Tue, Oct 24, 2017 at 7:20 PM, Vladimir Sitnikov < sitnikov.vladimir at gmail.com> wrote: > Just in case, here's Vladimir Ivanov's vectorization talk: *http://2017.jpoint.ru/en/talks/vector-programming-in-java/ > * > Slide 89 describes sum misundervectorization. > > Vladimir > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug.simon at oracle.com Fri Oct 27 21:05:17 2017 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 27 Oct 2017 23:05:17 +0200 Subject: RFR: 8188102: [JVMCI] Convert special JVMCI oops in nmethod to jweak values Message-ID: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> Please review this change that converts the JVMCI-specific object references in nmethod from oops to weak values. This removes GC API extensions added purely for these fields (e.g. so that G1 can insert it into the right remembered set, and when unloading an nmethod, to go and remove the nmethod from that remembered set). 
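To make the mechanics concrete, a minimal sketch of what the conversion amounts to (the setter name below is made up for illustration and is not lifted from the webrev; JNIHandles::make_weak_global, destroy_weak_global and resolve are the existing JNI handle APIs):

    // The nmethod field becomes a weak JNI handle instead of a strong oop:
    //   jweak _jvmci_installed_code;    // was: oop _jvmci_installed_code;

    void nmethod::set_jvmci_installed_code(oop code) {  // illustrative helper, not from the webrev
      if (_jvmci_installed_code != NULL) {
        JNIHandles::destroy_weak_global(_jvmci_installed_code);
        _jvmci_installed_code = NULL;
      }
      if (code != NULL) {
        _jvmci_installed_code = JNIHandles::make_weak_global(Handle(Thread::current(), code));
      }
    }

Readers of the field then go through JNIHandles::resolve(_jvmci_installed_code), which returns NULL once the referent has been collected, so the field no longer needs its own oops_do or remembered-set handling.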
Testing: I've run the Graal unit tests (mx unittest --verbose --gc-after-test -Xlog:class+unload=trace) which trigger a lot of nmethod unloading. https://bugs.openjdk.java.net/browse/JDK-8188102 http://cr.openjdk.java.net/~dnsimon/8188102/ From vladimir.kozlov at oracle.com Fri Oct 27 21:19:45 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 27 Oct 2017 14:19:45 -0700 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> Message-ID: <0972a0db-2115-daa3-9990-7d58915a74a5@oracle.com> First observations. src/hotspot/share/opto/c2_globals.hpp We have uint and int types for flags now. Don't use uintx, which is 64-bit. src/hotspot/share/runtime/arguments.cpp I agree that UseCountedLoopSafepoints should enable strip mining by default. I am concern about enabling UseCountedLoopSafepoints by default. I will look on performance data late. But for regular/nightly testing we need to add special testing with it on and off. src/hotspot/share/opto/loopnode.hpp Should we just make _loop_flags field type uint (32-bit) since we hit 16-bit limit? There is confusion (because you did not have enough bits?) about which loops are marked as strip_mined. I thought it is only inner loop but it looks like out (skeleton) loop also marked as such. I would suggest to mark them differently. I was thinking may be we should create new Loop node subclass for outer loop. Then you don't need special flag for it and it will be obvious what they are in Ideal Graph. The same for outer loop end node. src/hotspot/share/opto/superword.cpp Where next change come from? + if (t2->Opcode() == Op_AddI && t2 == _lp->as_CountedLoop()->incr()) continue; // don't mess with the iv Thanks, Vladimir On 10/27/17 8:50 AM, Vladimir Kozlov wrote: > I ran pre-integration testing with latest webrev.01 and it passed. > But, give me more time to look though changes. > > Thanks, > Vladimir > > On 10/25/17 7:29 AM, Roland Westrelin wrote: >> >> Hi Vladimir, >> >> Thanks for looking at this. >> >>> Did you consider less intrusive approach by adding branch over >>> SafePoint with masking on index variable? >>> >>> ??? int mask = LoopStripMiningMask * inc; // simplified >>> ??? for (int i = start; i < stop; i += inc) { >>> ?????? // body >>> ?????? if (i & mask != 0) continue; >>> ?????? safepoint; >>> ??? } >>> >>> Or may be doing it inside .ad file in new SafePoint node >>> implementation so that ideal graph is not affected. >> >> We're looking for the best trade off between latency and thoughput: we >> want the safepoint poll overhead to be entirely eliminated even when the >> safepoint doesn't trigger. >> >>> I am concern that suggested changes may affect Range Check elimination >>> (you changed limit to variable value/flag) in addition to complexity >>> of changes which may affect stability of C2. >> >> The CountedLoop that is created with my patch is strictly identical to >> the CountedLoop created today with -UseCountedLoopSafepoints. Bounds are >> not changed at that time. They are left as they are today. The >> difference, with loop strip mining, is that the counted loop has a >> skeleton outer loop. The bounds of the counted loop are adjusted once >> loop opts are over. If the counted loop has a predicate, the predicate >> is moved out of loop just as it is today. The only difference with >> today, is that the predicate should be moved out of the outer loop. 
If a >> pre and post loop needs to be created, then the only difference with >> today is that the clones need to be moved out of the outer loop and >> logic that locate the pre from the main loop need to account for the >> outer loop. >> >> It's obviously a complex change so if your primary concern is stability >> then loop strip mining can be disabled by default. Assuming strip mining >> off, then that patch is mostly some code refactoring and some logic that >> never triggers. >> >> Roland. >> From vladimir.kozlov at oracle.com Fri Oct 27 21:31:07 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 27 Oct 2017 14:31:07 -0700 Subject: RFR: 8188102: [JVMCI] Convert special JVMCI oops in nmethod to jweak values In-Reply-To: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> References: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> Message-ID: CCing to GC group too. Would be nice to run Hotspot testing with Graal as JIT. Katya, can you help with it? Thanks, Vladimir On 10/27/17 2:05 PM, Doug Simon wrote: > Please review this change that converts the JVMCI-specific object references in nmethod from oops to weak values. This removes GC API extensions added purely for these fields (e.g. so that G1 can insert it into the right remembered set, and when unloading an nmethod, to go and remove the nmethod from that remembered set). > > Testing: I've run the Graal unit tests (mx unittest --verbose --gc-after-test -Xlog:class+unload=trace) which trigger a lot of nmethod unloading. > > https://bugs.openjdk.java.net/browse/JDK-8188102 > http://cr.openjdk.java.net/~dnsimon/8188102/ > From Derek.White at cavium.com Fri Oct 27 22:31:30 2017 From: Derek.White at cavium.com (White, Derek) Date: Fri, 27 Oct 2017 22:31:30 +0000 Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic In-Reply-To: References: Message-ID: Hi Dmitry, The code looks good. I have one suggestion for MacroAssembler::kernel_crc32(). It's a matter of taste, so it really is just a suggestion: - The use of temp registers in the UseCRC32 case is kind of muddled, using tmp, and table0..table3 as temp registers, and the name "table" is confusing in this case. - Maybe it would be cleaner to refactor the UseCRC32 code into a separate kernel_crc32_using_crc32() subroutine (static or macro?). This would accept the main args and 4 registers for temps. The caller can supply some combination of table or tmp registers. - This would shrink the size of kernel_crc32() by a lot too. - The next person to touch the UseNeon code could factor that out as well ?? This obviously would apply to kernel_crc32c as well. Thanks! - Derek > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Dmitry Chuyko > Sent: Wednesday, October 11, 2017 12:31 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic > > Hello, > > Please review an improvement of CRC32 calculation on AArch64. > > MacroAssembler::kernel_crc32 gets table registers that are not used on > -XX:+UseCRC32 path. They can be used to make neighbor loads and CRC > calculations independent. Adding prologue and epilogue for main by-64 loop > makes it applicable starting from len=128 so additional by-32 loop is added > for smaller lengths. 
> > rfe: https://bugs.openjdk.java.net/browse/JDK-8189176 > webrev: http://cr.openjdk.java.net/~dchuyko/8189176/webrev.00/ > benchmark: > http://cr.openjdk.java.net/~dchuyko/8189176/crc32/CRC32Bench.java > > Results for T88 and A53 are good, but splitting pair loads may slow down > other CPUs so measurements on different HW are highly welcome. > > -Dmitry From aph at redhat.com Sat Oct 28 07:51:57 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 28 Oct 2017 08:51:57 +0100 Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic In-Reply-To: References: Message-ID: <035eedd5-8385-5f22-0316-0df784140442@redhat.com> On 11/10/17 17:30, Dmitry Chuyko wrote: > Results for T88 and A53 are good, but splitting pair loads may slow down > other CPUs so measurements on different HW are highly welcome. Ah, yes. OK, so I should do some measurements here. Please remind me offlist if I don't respond in a few days. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Sat Oct 28 07:52:33 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 28 Oct 2017 08:52:33 +0100 Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic In-Reply-To: References: Message-ID: On 27/10/17 23:31, White, Derek wrote: > - The use of temp registers in the UseCRC32 case is kind of muddled, using tmp, and table0..table3 as temp registers, and the name "table" is confusing in this case. > - Maybe it would be cleaner to refactor the UseCRC32 code into a separate kernel_crc32_using_crc32() subroutine (static or macro?). This would accept the main args and 4 registers for temps. The caller can supply some combination of table or tmp registers. > - This would shrink the size of kernel_crc32() by a lot too. That would be nice. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kim.barrett at oracle.com Sun Oct 29 22:29:51 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 29 Oct 2017 18:29:51 -0400 Subject: RFR: 8188102: [JVMCI] Convert special JVMCI oops in nmethod to jweak values In-Reply-To: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> References: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> Message-ID: [added hotspot-gc-dev to cc list] > On Oct 27, 2017, at 5:05 PM, Doug Simon wrote: > > Please review this change that converts the JVMCI-specific object references in nmethod from oops to weak values. This removes GC API extensions added purely for these fields (e.g. so that G1 can insert it into the right remembered set, and when unloading an nmethod, to go and remove the nmethod from that remembered set). > > Testing: I've run the Graal unit tests (mx unittest --verbose --gc-after-test -Xlog:class+unload=trace) which trigger a lot of nmethod unloading. > > https://bugs.openjdk.java.net/browse/JDK-8188102 > http://cr.openjdk.java.net/~dnsimon/8188102/ I didn't look at the .java, .py, or project files. ------------------------------------------------------------------------------ src/hotspot/share/jvmci/jvmciCompilerToVM.cpp 1061 nmethod* nm = cb->as_nmethod_or_null(); This appears to be dead code now. ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.cpp 1023 assert(Universe::heap()->is_gc_active(), "should only be called during gc"); ... 
1036 if (!Universe::heap()->is_gc_active() && cause != NULL) 1037 cause->klass()->print_on(&ls); I was going to mention that lines 1036-1037 are missing braces around the if-body. However, those lines appear to be dead code, given the assertion on line 1023. ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.cpp 1504 bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { ... 1506 oop installed_code = JNIHandles::resolve(_jvmci_installed_code); Resolving a weak reference can keep an otherwise dead referent alive. See JDK-8188055 for a discussion of the corresponding problem for j.l.r.Reference. Right now, I think JNIHandles doesn't provide a (public) solution to what I think is being attempted here that works for all collectors. There is in-progress work toward a solution, but it's just that, "in progress". As a (possibly interim) solution, a function like the following might be added to JNIHandles (put the definition near resolve_jweak). bool JNIHandles::is_global_weak_cleared(jweak handle) { assert(is_jweak(handle), "not a weak handle"); return guard_value(jweak_ref(handle)) == NULL; } (That's completely untested, and I haven't thought carefully about the name. And should get input from other GC folks on how to deal with this.) I *think* do_unloading_jvmci then becomes something like the following (again, completely untested) bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { if (_jvmci_installed_code != NULL) { if (JNIHandles::is_global_weak_cleared(_jvmci_installed_code)) { if (_jvmci_installed_code_triggers_unloading) { make_unloaded(is_alive, NULL); return true; } else { clear_jvmci_installed_code(); } } } return false; } ------------------------------------------------------------------------------ From doug.simon at oracle.com Mon Oct 30 11:14:18 2017 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 30 Oct 2017 12:14:18 +0100 Subject: RFR: 8188102: [JVMCI] Convert special JVMCI oops in nmethod to jweak values In-Reply-To: References: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> Message-ID: Hi Kim, Thanks for the detailed review. > On 29 Oct 2017, at 23:29, Kim Barrett wrote: > > [added hotspot-gc-dev to cc list] > >> On Oct 27, 2017, at 5:05 PM, Doug Simon wrote: >> >> Please review this change that converts the JVMCI-specific object references in nmethod from oops to weak values. This removes GC API extensions added purely for these fields (e.g. so that G1 can insert it into the right remembered set, and when unloading an nmethod, to go and remove the nmethod from that remembered set). >> >> Testing: I've run the Graal unit tests (mx unittest --verbose --gc-after-test -Xlog:class+unload=trace) which trigger a lot of nmethod unloading. >> >> https://bugs.openjdk.java.net/browse/JDK-8188102 >> http://cr.openjdk.java.net/~dnsimon/8188102/ > > I didn't look at the .java, .py, or project files. > > ------------------------------------------------------------------------------ > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp > 1061 nmethod* nm = cb->as_nmethod_or_null(); > > This appears to be dead code now. Indeed. > ------------------------------------------------------------------------------ > src/hotspot/share/code/nmethod.cpp > 1023 assert(Universe::heap()->is_gc_active(), "should only be called during gc"); > ... 
> 1036 if (!Universe::heap()->is_gc_active() && cause != NULL) > 1037 cause->klass()->print_on(&ls); > > I was going to mention that lines 1036-1037 are missing braces around > the if-body. However, those lines appear to be dead code, given the > assertion on line 1023. Good catch. That problem pre-dates this webrev but I will clean it up here. > ------------------------------------------------------------------------------ > src/hotspot/share/code/nmethod.cpp > 1504 bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { > ... > 1506 oop installed_code = JNIHandles::resolve(_jvmci_installed_code); > > Resolving a weak reference can keep an otherwise dead referent alive. > See JDK-8188055 for a discussion of the corresponding problem for > j.l.r.Reference. > > Right now, I think JNIHandles doesn't provide a (public) solution to > what I think is being attempted here that works for all collectors. > There is in-progress work toward a solution, but it's just that, "in > progress". > > As a (possibly interim) solution, a function like the following might > be added to JNIHandles (put the definition near resolve_jweak). > > bool JNIHandles::is_global_weak_cleared(jweak handle) { > assert(is_jweak(handle), "not a weak handle"); > return guard_value(jweak_ref(handle)) == NULL; > } Adding JNIHandles::is_global_weak_cleared makes sense. I've put it the public section near destroy_weak_global instead of the private section where resolve_jweak is declared. > (That's completely untested, and I haven't thought carefully about the > name. And should get input from other GC folks on how to deal with > this.) I *think* do_unloading_jvmci then becomes something like the > following (again, completely untested) > > bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { > if (_jvmci_installed_code != NULL) { > if (JNIHandles::is_global_weak_cleared(_jvmci_installed_code)) { > if (_jvmci_installed_code_triggers_unloading) { > make_unloaded(is_alive, NULL); > return true; > } else { > clear_jvmci_installed_code(); > } > } > } > return false; > } I think your change works but comes at the cost of potentially preventing nmethod unloading for 1 extra (full?) GC cycle. It assumes that jweak clearing occurs before nmethod scanning. Is that guaranteed? If not, then I think what we want is: bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { if (_jvmci_installed_code != NULL) { bool cleared = JNIHandles::is_global_weak_cleared(_jvmci_installed_code); if (_jvmci_installed_code_triggers_unloading) { if (cleared) { // jweak reference processing has already cleared the referent make_unloaded(is_alive, NULL); return true; } else { oop installed_code = JNIHandles::resolve(_jvmci_installed_code); if (can_unload(is_alive, (oop*)&installed_code, unloading_occurred)) { return true; } } } else { if (cleared || !is_alive->do_object_b(JNIHandles::resolve(_jvmci_installed_code))) { clear_jvmci_installed_code(); } } } return false; } I've created a new webrev at http://cr.openjdk.java.net/~dnsimon/8188102_2. 
-Doug From tobias.hartmann at oracle.com Mon Oct 30 11:49:09 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 30 Oct 2017 12:49:09 +0100 Subject: [10] RFR(S): 8190351: InitialAndMaxUsageTest does not free allocated blob Message-ID: <9402723c-02eb-72de-9fd5-87cc64ec628e@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8190351 http://cr.openjdk.java.net/~thartmann/8190351/webrev.00/ If the fillWithSize method bails out because bean.getUsage().getUsed() > CACHE_USAGE_COEF * maxSize, it does not add the just allocated blob to the list. Also, we start with allocating blobs of size 368 Mb which is too large for a default code cache size of 256 Mb. I've refactored the test and changed the allocation loop to start with blobs of size ~36 Mb. Thanks, Tobias From vladimir.kozlov at oracle.com Mon Oct 30 15:02:43 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 30 Oct 2017 08:02:43 -0700 Subject: [10] RFR(S): 8190351: InitialAndMaxUsageTest does not free allocated blob In-Reply-To: <9402723c-02eb-72de-9fd5-87cc64ec628e@oracle.com> References: <9402723c-02eb-72de-9fd5-87cc64ec628e@oracle.com> Message-ID: Looks good. Thanks, Vladimir On 10/30/17 4:49 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8190351 > http://cr.openjdk.java.net/~thartmann/8190351/webrev.00/ > > If the fillWithSize method bails out because bean.getUsage().getUsed() > > CACHE_USAGE_COEF * maxSize, it does not add the just allocated blob to > the list. Also, we start with allocating blobs of size 368 Mb which is > too large for a default code cache size of 256 Mb. > > I've refactored the test and changed the allocation loop to start with > blobs of size ~36 Mb. > > Thanks, > Tobias From tobias.hartmann at oracle.com Mon Oct 30 15:07:49 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 30 Oct 2017 16:07:49 +0100 Subject: [10] RFR(S): 8190351: InitialAndMaxUsageTest does not free allocated blob In-Reply-To: References: <9402723c-02eb-72de-9fd5-87cc64ec628e@oracle.com> Message-ID: <93d15054-61dc-1bfc-3503-23081521e49c@oracle.com> Thanks Vladimir! Best regards, Tobias On 30.10.2017 16:02, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 10/30/17 4:49 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8190351 >> http://cr.openjdk.java.net/~thartmann/8190351/webrev.00/ >> >> If the fillWithSize method bails out because bean.getUsage().getUsed() > CACHE_USAGE_COEF * maxSize, it does not add >> the just allocated blob to the list. Also, we start with allocating blobs of size 368 Mb which is too large for a >> default code cache size of 256 Mb. >> >> I've refactored the test and changed the allocation loop to start with blobs of size ~36 Mb. >> >> Thanks, >> Tobias From dmitrij.pochepko at bell-sw.com Mon Oct 30 15:42:35 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 30 Oct 2017 18:42:35 +0300 Subject: [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays Message-ID: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> Hi, as part of JEP ?Improve performance of String and Array operations on AArch64? I wanted to send out a pre-review for some of the improved intrinsics to get early feedback. This is the first in a row. Please pre-review patch for 8187472 - ?AARCH64: array_equals intrinsic doesn't use prefetch for large arrays? 
which improves large array handling (small arrays are unaffected). In short, this patch uses large (64 byte) loop with prefetch instruction to handle large arrays, which is done in a stub. I can observe performance boost on systems without h/w prefetcher up to x6. System with hardware prefetching (Cortex A53 and some very modern ones) also benefit from this patch (15% improvement). I've tried a number of different versions (attached to JDK-8187472) with different load instructions (ldr/ldp/), slightly different code shapes, different data dependencies across registers, alignments, e.t.c. Version presented in webrev (version 2.6d from JDK-8187472 attachments) is the simplest from the fast ones (as measured on 3 systems available for testing). I've used this simple benchmark to measure performance: http://cr.openjdk.java.net/~dpochepk/8187472/ArrayEqualsBench.java Chart for ThunderX: http://cr.openjdk.java.net/~dpochepk/8187472/ThunderX.png Chart for Cortex A53(R-Pi): http://cr.openjdk.java.net/~dpochepk/8187472/R-Pi.png Raw numbers for ThunderX: http://cr.openjdk.java.net/~dpochepk/8187472/ThunderX.results.txt Raw numbers for R-Pi: http://cr.openjdk.java.net/~dpochepk/8187472/R-Pi.results.txt webrev: http://cr.openjdk.java.net/~dpochepk/8187472/webrev.01/ Testing: I've run existing jtreg test (java/util/Arrays/ArraysEqCmpTest.java) in both Xmixed and Xcomp and found no regressions. Any additional numbers on other systems are welcome, as well as early feedback on the code. Thanks, Dmitrij From aph at redhat.com Mon Oct 30 16:13:06 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 30 Oct 2017 16:13:06 +0000 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> Message-ID: On 30/10/17 15:42, Dmitrij Pochepko wrote: > Any additional numbers on other systems are welcome, as well as early > feedback on the code. I take it that the small comparisons are unaffected. The small comparisons are very common, so they shouldn't be ignored. The patch seems unobjectionable, but it's extremely hard to test this stuff. Why is this change: @@ -16154,7 +16154,7 @@ ins_pipe(pipe_class_memory); %} -instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegI_R4 cnt, +instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegP_R4 cnt, iRegI_R0 result, rFlagsReg cr) %{ predicate(((StrEqualsNode*)n)->encoding() == StrIntrinsicNode::LL); It seems very odd to me. Was a vertor-based implementation considered? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Mon Oct 30 16:43:30 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 30 Oct 2017 19:43:30 +0300 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> Message-ID: On 30.10.2017 19:13, Andrew Haley wrote: > On 30/10/17 15:42, Dmitrij Pochepko wrote: >> Any additional numbers on other systems are welcome, as well as early >> feedback on the code. > I take it that the small comparisons are unaffected. The small > comparisons are very common, so they shouldn't be ignored. > > The patch seems unobjectionable, but it's extremely hard to test > this stuff. 
Well, I've actually used small brute force test which generates all cases for arrays length from 1 to N(parameter) to test it, because I couldn't find better way. i.e.: case 0: equal arrays case 1: arrays different in 1st symbol ... case N: arrays different in (N-1)th symbol And this test passed. However, I don't think such test should be added to jtreg testbase, because it takes long time to run, so, I assume existing array equals test is enough. > > Why is this change: > > @@ -16154,7 +16154,7 @@ > ins_pipe(pipe_class_memory); > %} > > -instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegI_R4 cnt, > +instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegP_R4 cnt, > iRegI_R0 result, rFlagsReg cr) > %{ > predicate(((StrEqualsNode*)n)->encoding() == StrIntrinsicNode::LL); > > It seems very odd to me. You're right. It's leftover from previous versions. It can be reverted back to iRegI_R4. > > Was a vertor-based implementation considered? > Yes. I've tried simd loads(even aligned ones to be sure that alignment is not an issue). simd versions were attached into JDK-8187472 as ?- v5.0(simd loads, 16-byte address alignment, 64 bytes per 1 loop iteration) ?- v7.0(simd loads, 16-byte alignment, 64 bytes per 1 loop iteration) ?- v9.0(simd loads, 64 byte alignment, 128 bytes per 1 loop iteration). I've measured it on ThunderX and found while best non-simd version handles 1000000 bytes arrays in ~295 microseconds, simd versions had numbers about ~355 microseconds. Thanks, Dmitrij From jamsheed.c.m at oracle.com Mon Oct 30 16:45:19 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Mon, 30 Oct 2017 22:15:19 +0530 Subject: [10] RFR: 8167409: Invalid value passed to critical JNI function Message-ID: Hi, request for review, jbs: https://bugs.openjdk.java.net/browse/JDK-8167409 webrev: http://cr.openjdk.java.net/~jcm/8167409/webrev.00/ (contributed by Ioannis Tsakpinis) desc: the tmp? reg used to break the shuffling cycle (handled in ComputeMoveOrder) is set to 64 bit. Best regards, Jamsheed From jamsheed.c.m at oracle.com Mon Oct 30 16:45:53 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Mon, 30 Oct 2017 22:15:53 +0530 Subject: [10] JBS: 8167408: Invalid critical JNI function lookup Message-ID: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com> Hi, request for review, jbs : https://bugs.openjdk.java.net/browse/JDK-8167408 webrev: http://cr.openjdk.java.net/~jcm/8167408/webrev.00/ (contributed by Ioannis Tsakpinis) desc: -- it starts with JavaCritical_ instead of Java_; -- it does not have extra JNIEnv* and jclass arguments; -- Java arrays are passed in two arguments: the first is an array length, and the second is a pointer to raw array data. That is, no need to call GetArrayElements and friends, you can instantly use a direct array pointer. updated arg_size calculation wrt above points. Best regards, Jamsheed From rwestrel at redhat.com Mon Oct 30 17:02:03 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 30 Oct 2017 18:02:03 +0100 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <0972a0db-2115-daa3-9990-7d58915a74a5@oracle.com> References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> <0972a0db-2115-daa3-9990-7d58915a74a5@oracle.com> Message-ID: Hi Vladimir, > Should we just make _loop_flags field type uint (32-bit) since we hit 16-bit limit? We don't hit the limit with this change. I have some other changes for which I had to change _loop_flags to uint. 
That's where the int -> uint tweaks are coming from. I can remove them if you like as they are not required. Sorry for the confusion. > There is confusion (because you did not have enough bits?) about which loops are marked as > strip_mined. I thought it is only inner loop but it looks like out (skeleton) loop also marked as > such. I would suggest to mark them differently. The way it works currently is: Opcode() == Op_Loop && is_strip_mined() => outer loop Opcode() == Op_CountedLoop && is_strip_mined() => inner loop The outer loop can't be transformed to a counted loop so that scheme shouldn't break. > I was thinking may be we should create new Loop node subclass for outer loop. Then you don't need > special flag for it and it will be obvious what they are in Ideal Graph. The same for outer loop end > node. Ok. That sounds like it could clean up the code a bit. Do you want me to look into that? > src/hotspot/share/opto/superword.cpp > > Where next change come from? > > + if (t2->Opcode() == Op_AddI && t2 == _lp->as_CountedLoop()->incr()) continue; // don't mess > with the iv I saw a few cases where t2 is the increment of the CountedLoop iv. SuperWord::opnd_positions_match() then swaps the edges of the AddI and later CountedLoopEndNode::phi() fails because the edges of the iv's AddI are not in the expected order anymore. Roland. From glaubitz at physik.fu-berlin.de Sun Oct 15 06:09:19 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Sun, 15 Oct 2017 06:09:19 -0000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> Hi Roman! Please let me look at SPARC next week first before merging this. And thanks for notifying me that Zero is broken again *sigh*. People, please test your changes. Yes, I know you all just care about Hotspot. But please understand that there are many people out there who rely on Zero, i.e. they are using it. Breaking code that people actively use is not nice and should not happen in a project like OpenJDK. Building Zero takes maybe 5 minutes on a fast x86 machine, so I would like to ask everyone to please test their changes against Zero as well. These tests will keep the headaches for people relying on Zero low and also avoids that distributions have to ship many patches on top of OpenJDK upstream. If you cannot test your patch on a given platform X, please let me know. I have access to every platform supported by OpenJDK except AIX/PPC. Thanks, Adrian > On Oct 15, 2017, at 12:41 AM, Roman Kennke wrote: > > The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So here comes the big patch to remove it. > > What I have done: > > grep -i -R shark src > grep -i -R shark make > grep -i -R shark doc > grep -i -R shark doc > > and purged any reference to shark. Almost everything was straightforward. > > The only things I wasn't really sure of: > > - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope that's good? > - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing to do. If not, what *would* be the right thing? > > Then of course I did: > > rm -rf src/hotspot/share/shark > > I also went through the build machinery and removed stuff related to Shark and LLVM libs. 
> > Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) > > I tested by building a regular x86 JVM and running JTREG tests. All looks fine. > > - I could not build zero because it seems broken because of the recent Atomic::* changes > - I could not test any of the other arches that seemed to reference Shark (arm and sparc) > > Here's the full webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ > > Can I get a review on this? > > Thanks, Roman From vparfinenko at excelsior-usa.com Fri Oct 27 09:26:04 2017 From: vparfinenko at excelsior-usa.com (Vladimir Parfinenko) Date: Fri, 27 Oct 2017 16:26:04 +0700 Subject: Bug in HS interpreter: invokeinterface calls non-public method Message-ID: Hi all, I think I have found a bug in HotSpot interpreter. The problems happens while invokeinterface of public method from java.lang.Object (e.g. hashCode()) in case when the actual method implementation is non-public (e.g. protected). JVMS tells the following about invokeinterface instruction: Otherwise, if step 1 or step 2 of the lookup procedure selects a method that is not public, invokeinterface throws an IllegalAccessError. However in some cases HS interpreter ignores this access check and invokes non-public method. Minimal example using jasm from asmtools is attached below. Compiling and running it gives the following: $ jasm BadImpl.jasm && javac Caller.java $ java -Xint Caller Should pass: Should throw IAE: Exception in thread "main" java.lang.RuntimeException: protected hashCode was called at BadImpl.hashCode(BadImpl.jasm) at Caller.main(Caller.java:11) $ java -Xcomp Caller Should pass: Should throw IAE: Exception in thread "main" java.lang.IllegalAccessError: BadImpl.hashCode()I at Caller.main(Caller.java:11) Note that first invocation ("Should pass") is necessary to reproduce the problem. If you remove it everything works as expected. Regards, Vladimir Parfinenko ----------------------- Caller.java ----------------------- public class Caller { public static void main(String[] args) { Interf x; System.out.println("Should pass:"); x = new GoodImpl(); x.hashCode(); System.out.println("Should throw IAE:"); x = new BadImpl(); x.hashCode(); } } interface Interf { @Override int hashCode(); } class GoodImpl implements Interf { } ----------------------- Caller.java ----------------------- ----------------------- BadImpl.jasm ----------------------- super class BadImpl implements Interf { Method "":"()V" stack 1 locals 1 { aload_0; invokespecial Method java/lang/Object."":"()V"; return; } // override of Object method with protected one, javac doesn't allow this protected Method hashCode:"()I" stack 3 locals 1 { new class java/lang/RuntimeException; dup; ldc String "protected hashCode was called"; invokespecial Method java/lang/RuntimeException."":"(Ljava/lang/String;)V"; athrow; } } ----------------------- BadImpl.jasm ----------------------- From rwestrel at redhat.com Mon Oct 30 17:02:55 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 30 Oct 2017 18:02:55 +0100 Subject: RFR(S): 8186125: "DU iteration must converge quickly" assert in split if with unsafe accesses In-Reply-To: References: Message-ID: Anyone to review this fix? Roland. > http://cr.openjdk.java.net/~roland/8186125/webrev.00/ > > Split if is missing support for graph shapes with the Opaque4Node that > was introduced for unsafe accesses by JDK-8176506. > > In the test case, the 2 Unsafe accesses share a single Opaque4Node > before the if. 
When split if encounters the Cmp->Bol->Opaque4->If chain, > it only tries to clone Cmp->Bol when it should clone Cmp->Bol->Opaque4 > to make one copy for each If. > > Roland. From aph at redhat.com Mon Oct 30 17:30:36 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 30 Oct 2017 17:30:36 +0000 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> Message-ID: <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> On 30/10/17 16:43, Dmitrij Pochepko wrote: > I've tried simd loads(even aligned ones to be sure that alignment is not > an issue). simd versions were attached into JDK-8187472 as > ?- v5.0(simd loads, 16-byte address alignment, 64 bytes per 1 loop > iteration) > ?- v7.0(simd loads, 16-byte alignment, 64 bytes per 1 loop iteration) > ?- v9.0(simd loads, 64 byte alignment, 128 bytes per 1 loop iteration). > > I've measured it on ThunderX and found while best non-simd version > handles 1000000 bytes arrays in ~295 microseconds, simd versions had > numbers about ~355 microseconds. I'm rather reluctant to accept non-SIMD intrinsics because I expect SIMD performance to improve, and I expect SIMD to be the future. The same is true of implementations which avoid the use of ldp. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Mon Oct 30 18:03:54 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 30 Oct 2017 21:03:54 +0300 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> Message-ID: <8144c663-ea6b-8d21-384b-baeb79f596c4@bell-sw.com> On 30.10.2017 20:30, Andrew Haley wrote: > On 30/10/17 16:43, Dmitrij Pochepko wrote: >> I've tried simd loads(even aligned ones to be sure that alignment is not >> an issue). simd versions were attached into JDK-8187472 as >> ?- v5.0(simd loads, 16-byte address alignment, 64 bytes per 1 loop >> iteration) >> ?- v7.0(simd loads, 16-byte alignment, 64 bytes per 1 loop iteration) >> ?- v9.0(simd loads, 64 byte alignment, 128 bytes per 1 loop iteration). >> >> I've measured it on ThunderX and found while best non-simd version >> handles 1000000 bytes arrays in ~295 microseconds, simd versions had >> numbers about ~355 microseconds. > I'm rather reluctant to accept non-SIMD intrinsics because I expect > SIMD performance to improve, and I expect SIMD to be the future. The > same is true of implementations which avoid the use of ldp. > I also expected NEON to be faster on very new designs. Since I have a SIMD version of this intrinsic that I can merge into stub under an if with new option (like UseSIMDForArrayEquals with default value set to false, almost the same as existing UseSIMDForMemoryOps, which is used in array copy intrinsic) if you want, but it is slower for the CPUs we have access to and likely not going to be the default. This way we'll have a fast version and a SIMD version. I am hesitant if it is best to do this, or keep a single, simple, and fastest version for now for this intrinsic, and get back to it when SVE becomes widely available. What do you think? Note that other intrinsics that are in the works will use SIMD. 
Thanks, Dmitrij From aph at redhat.com Mon Oct 30 18:06:40 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 30 Oct 2017 18:06:40 +0000 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: <8144c663-ea6b-8d21-384b-baeb79f596c4@bell-sw.com> References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> <8144c663-ea6b-8d21-384b-baeb79f596c4@bell-sw.com> Message-ID: <47fb00b1-c51a-03d8-83f8-9c7cbd436f74@redhat.com> On 30/10/17 18:03, Dmitrij Pochepko wrote: > I am hesitant if it is best to do this, or keep a single, simple, and > fastest version for now for this intrinsic, and get back to it when SVE > becomes widely available. > > What do you think? Do it now, or we'll have merge problems later. > Note that other intrinsics that are in the works will use SIMD. OK, thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Mon Oct 30 18:20:06 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 30 Oct 2017 21:20:06 +0300 Subject: [10] RFR: 8189101 - AARCH64: AARCH64: string compare intrinsic doesn't use prefetch Message-ID: Hi, this is a second pre-review as part of JEP ?Improve performance of String and Array operations on AArch64? for another improved intrinsics to get early feedback. Please pre-review patch for 8189101 - ?AARCH64: AARCH64: string compare intrinsic doesn't use prefetch? This patch moves code for long string processing to a stub and reorganize it. For large strings code was re-organized, added large 64-byte unrolled loops and prefetch. Webrev is available at [1]. Surpisingly, it helps a bit for small strings, because code for string comparison node is now shorter, so, less icache lines needed to be populated to execute it. A benchmark was developed to measure performance [2], which contains 4 cases with various sizes: LL (latin1 vs latin1), LU (latin1 vs utf), UL (utf vs latin1) and UU (utf vs utf). I can see up to x5 performance on systems without h/w prefetcher (ThunderX) and up to 40% improvement on system with h/w prefetcher(Cortex A53). Raw performance numbers are at [3]. Charts for performance numbers above are: Cortex A53 [4] and ThunderX [5]. Testing: I've run java/lang/String (contains test for String::compareTo method) jtreg tests with both Xmixed and Xcomp modes and found no regressions. Any additional numbers on other systems are welcome, as well as early feedback on the code. [1] http://cr.openjdk.java.net/~dpochepk/8189101/webrev/ [2] http://cr.openjdk.java.net/~dpochepk/8189101/StringCompareBench.java [3] http://cr.openjdk.java.net/~dpochepk/8189101/strCmp_T88.txt and http://cr.openjdk.java.net/~dpochepk/8189101/strCmp_RPi.txt [4] http://cr.openjdk.java.net/~dpochepk/8189101/R_Pi_LL.png http://cr.openjdk.java.net/~dpochepk/8189101/R_Pi_LU.png http://cr.openjdk.java.net/~dpochepk/8189101/R_Pi_UL.png and http://cr.openjdk.java.net/~dpochepk/8189101/R_Pi_UU.png [5] http://cr.openjdk.java.net/~dpochepk/8189101/ThunderX_LL.png http://cr.openjdk.java.net/~dpochepk/8189101/ThunderX_UL.png http://cr.openjdk.java.net/~dpochepk/8189101/ThunderX_LU.png and http://cr.openjdk.java.net/~dpochepk/8189101/ThunderX_UU.png Thanks, Dmitrij -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dmitrij.pochepko at bell-sw.com Mon Oct 30 19:18:45 2017 From: dmitrij.pochepko at bell-sw.com (dmitrij.pochepko at bell-sw.com) Date: Mon, 30 Oct 2017 22:18:45 +0300 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: <47fb00b1-c51a-03d8-83f8-9c7cbd436f74@redhat.com> References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> <8144c663-ea6b-8d21-384b-baeb79f596c4@bell-sw.com> <47fb00b1-c51a-03d8-83f8-9c7cbd436f74@redhat.com> Message-ID: <47181509391125@web22j.yandex.ru> An HTML attachment was scrubbed... URL: From dean.long at oracle.com Mon Oct 30 20:48:46 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 30 Oct 2017 13:48:46 -0700 Subject: [10] JBS: 8167408: Invalid critical JNI function lookup In-Reply-To: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com> References: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com> Message-ID: <2e195876-97cf-bc1f-5e3b-e19c5c5f240d@oracle.com> I think you need a native test for Windows x86 that defines JavaCritical methods with various signatures (especially arrays) to make sure this is working correctly. dl On 10/30/17 9:45 AM, jamsheed wrote: > Hi, > > request for review, > > jbs : https://bugs.openjdk.java.net/browse/JDK-8167408 > > webrev: http://cr.openjdk.java.net/~jcm/8167408/webrev.00/ > > (contributed by Ioannis Tsakpinis) > > desc: > > -- it starts with JavaCritical_ instead of Java_; > -- it does not have extra JNIEnv* and jclass arguments; > -- Java arrays are passed in two arguments: the first is an array > length, and the second is a pointer to raw array data. That is, no > need to call GetArrayElements and friends, you can instantly use a > direct array pointer. > > updated arg_size calculation wrt above points. > > Best regards, > > Jamsheed > From dean.long at oracle.com Mon Oct 30 21:30:37 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 30 Oct 2017 14:30:37 -0700 Subject: [10] RFR: 8167409: Invalid value passed to critical JNI function In-Reply-To: References: Message-ID: Hi Jamsheed.? Do you have a test for this? dl On 10/30/17 9:45 AM, jamsheed wrote: > Hi, > > request for review, > > jbs: https://bugs.openjdk.java.net/browse/JDK-8167409 > > webrev: http://cr.openjdk.java.net/~jcm/8167409/webrev.00/ > > (contributed by Ioannis Tsakpinis) > > desc: the tmp? reg used to break the shuffling cycle (handled in > ComputeMoveOrder) > > is set to 64 bit. > > Best regards, > > Jamsheed > > From dmitry.chuyko at bell-sw.com Tue Oct 31 16:01:09 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 31 Oct 2017 19:01:09 +0300 Subject: [10] RFR: 8189745 - AARCH64: Use CRC32C intrinsic code in interpreter and C1 Message-ID: Hello, Please review an improvement of CRC32C calculation on AArch64. The implementation is based on JDK-8155162 [1] and the code for CRC32. Intrinsics for array / byte buffer and direct byte buffer are enabled in C1 on AArch64, LIRGenerator::do_update_CRC32C calculates parameters and calls StubRoutines::updateBytesCRC32C(). Template interpreter now also generates TemplateInterpreterGenerator::generate_CRC32C_updateBytes_entry where it calculates parameters and jumps to StubRoutines::updateBytesCRC32C(). 
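[Editor's illustration, not part of the original mail or the linked benchmark: the Java-level code that reaches these intrinsified entries is plain java.util.zip.CRC32C usage; the buffer sizes and names below are arbitrary.]

    import java.nio.ByteBuffer;
    import java.util.zip.CRC32C;

    public class Crc32cSketch {
        public static void main(String[] args) {
            byte[] data = new byte[4096];              // heap byte[] path (array intrinsic)
            CRC32C crc = new CRC32C();
            crc.update(data, 0, data.length);
            long arrayCrc = crc.getValue();

            crc.reset();
            ByteBuffer direct = ByteBuffer.allocateDirect(4096);
            crc.update(direct);                        // direct ByteBuffer path (separate intrinsic)
            long directCrc = crc.getValue();

            System.out.println(arrayCrc + " " + directCrc);
        }
    }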
rfe: https://bugs.openjdk.java.net/browse/JDK-8189745 webrev: http://cr.openjdk.java.net/~dchuyko/8189745/webrev.00/ benchmark: http://cr.openjdk.java.net/~dchuyko/8189745/crc32c/CRC32CBench.java Performance results for T88 [2] show ~7x boost in C1 and ~30-50x boost in interpreter. For testing I made comparison of CRC32C result sets in C1 and interpreter for both array and direct byte buffer with zero and non-zero offset. -Dmitry [1] https://bugs.openjdk.java.net/browse/JDK-8155162 [2] https://bugs.openjdk.java.net/browse/JDK-8189745?focusedCommentId=14127141&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14127141 From aph at redhat.com Tue Oct 31 17:25:58 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 31 Oct 2017 17:25:58 +0000 Subject: [10] RFR: 8189745 - AARCH64: Use CRC32C intrinsic code in interpreter and C1 In-Reply-To: References: Message-ID: Hi, On 31/10/17 16:01, Dmitry Chuyko wrote: > Please review an improvement of CRC32C calculation on AArch64. The > implementation is based on JDK-8155162 [1] and the code for CRC32. > > Intrinsics for array / byte buffer and direct byte buffer are enabled in > C1 on AArch64, LIRGenerator::do_update_CRC32C calculates parameters and > calls StubRoutines::updateBytesCRC32C(). > Template interpreter now also generates > TemplateInterpreterGenerator::generate_CRC32C_updateBytes_entry where it > calculates parameters and jumps to StubRoutines::updateBytesCRC32C(). > > rfe: https://bugs.openjdk.java.net/browse/JDK-8189745 > webrev: http://cr.openjdk.java.net/~dchuyko/8189745/webrev.00/ > benchmark: > http://cr.openjdk.java.net/~dchuyko/8189745/crc32c/CRC32CBench.java > > Performance results for T88 [2] show ~7x boost in C1 and ~30-50x boost > in interpreter. > > For testing I made comparison of CRC32C result sets in C1 and > interpreter for both array and direct byte buffer with zero and non-zero > offset. That looks good to me, thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vitalyd at gmail.com Tue Oct 31 18:08:44 2017 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 31 Oct 2017 14:08:44 -0400 Subject: 8u144 hotspot fails to reach safepoint due to compiler thread - VM frozen Message-ID: Hi guys, I have some colleagues who appear to be running into https://bugs.openjdk.java.net/browse/JDK-8059128 on Oracle JDK 8u144 (Linux, x86-64). Naturally, there's no reproducer but they've seen this happen several times in the last couple of months. The symptom is the JVM becomes unresponsive - the application is not servicing any traffic, and jstack doesn't work without the force option. jstack output (with native frames) captured some time apart shows the compiler thread either in Parse::do_all_blocks -> do_one_block -> do_one_bytecode -> ... InstanceKlass::has_finalizable_subclass -> Dependencies::find_finalizable_subclass or ... Dependencies::has_finalizable_subclass() -> Klass::next_sibling() I see that 8059128 was closed as Incomplete, but it does look like there's a real issue here. Has anyone looked into this further or has any new thoughts/ideas? My understanding is the working theory is it's related to some data race between class unloading and the compiler thread observing an inconsistent (corrupt?) type hierarchy. I see https://bugs.openjdk.java.net/browse/JDK-8114823 is also noted as possibly related - the app we're having trouble with is using G1, but class unloading isn't disabled of course. 
Is there some workaround to reduce the likelihood of having the compiler thread and GC cross paths like this?

Let me know if you need more info.

Thanks

From jamsheed.c.m at oracle.com  Tue Oct 31 19:37:51 2017
From: jamsheed.c.m at oracle.com (jamsheed)
Date: Wed, 1 Nov 2017 01:07:51 +0530
Subject: [10] JBS: 8167408: Invalid critical JNI function lookup
In-Reply-To: <2e195876-97cf-bc1f-5e3b-e19c5c5f240d@oracle.com>
References: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com>
 <2e195876-97cf-bc1f-5e3b-e19c5c5f240d@oracle.com>
Message-ID:

Hi Dean,

Thank you for the review.

Tested with a test case: previously it was not working for windows-x86, and now it works.

Revised webrev with test case: http://cr.openjdk.java.net/~jcm/8167408/webrev.01/

Best regards,

Jamsheed

On Tuesday 31 October 2017 02:18 AM, dean.long at oracle.com wrote:
> I think you need a native test for Windows x86 that defines
> JavaCritical methods with various signatures (especially arrays) to
> make sure this is working correctly.
>
> dl
>
>
> On 10/30/17 9:45 AM, jamsheed wrote:
>> Hi,
>>
>> request for review,
>>
>> jbs : https://bugs.openjdk.java.net/browse/JDK-8167408
>>
>> webrev: http://cr.openjdk.java.net/~jcm/8167408/webrev.00/
>>
>> (contributed by Ioannis Tsakpinis)
>>
>> desc:
>>
>> -- it starts with JavaCritical_ instead of Java_;
>> -- it does not have extra JNIEnv* and jclass arguments;
>> -- Java arrays are passed in two arguments: the first is an array
>> length, and the second is a pointer to raw array data. That is, no
>> need to call GetArrayElements and friends, you can instantly use a
>> direct array pointer.
>>
>> updated arg_size calculation wrt above points.
>>
>> Best regards,
>>
>> Jamsheed
>>
>

From ionutb83 at yahoo.com  Tue Oct 31 21:59:07 2017
From: ionutb83 at yahoo.com (Ionut)
Date: Tue, 31 Oct 2017 21:59:07 +0000 (UTC)
Subject: Sum of integers optimization
References: <345880303.38177.1509487147494.ref@mail.yahoo.com>
Message-ID: <345880303.38177.1509487147494@mail.yahoo.com>

Hello All,

I am playing with the example below (very trivial, just computing a sum of the integers 1...N):

@Benchmark
public long sum() {
    long sum = 0;
    for (int i = 1; i <= N; i++) {
        sum += i;
    }
    return sum;
}

Generated asm on my machine (snapshot from the main scalar loop):

                    ......................................................
                    0x00007f4779bff060: movsxd r10,r11d
                    0x00007f4779bff063: add    rax,r10
  7.67%    24.83%   0x00007f4779bff066: add    rax,r10
  6.11%     3.64%   0x00007f4779bff069: add    rax,r10
  4.54%     3.71%   0x00007f4779bff06c: add    rax,r10
  6.12%     5.85%   0x00007f4779bff06f: add    rax,r10
  5.75%     4.21%   0x00007f4779bff072: add    rax,r10
  5.96%     4.38%   0x00007f4779bff075: add    rax,r10
  4.23%     3.63%   0x00007f4779bff078: add    rax,r10
  6.70%     6.32%   0x00007f4779bff07b: add    rax,r10
  7.40%     4.56%   0x00007f4779bff07e: add    rax,r10
  4.61%     3.31%   0x00007f4779bff081: add    rax,r10
  5.45%     5.24%   0x00007f4779bff084: add    rax,r10
  5.99%     5.14%   0x00007f4779bff087: add    rax,r10
  7.70%     5.36%   0x00007f4779bff08a: add    rax,r10
  5.17%     4.16%   0x00007f4779bff08d: add    rax,r10
  3.97%     3.83%   0x00007f4779bff090: add    rax,r10
  4.80%     3.97%   0x00007f4779bff093: add    rax,0x78
  5.92%     5.97%   0x00007f4779bff097: add    r11d,0x10
            0.01%   0x00007f4779bff09b: cmp    r11d,0x5f5e0f2
                    0x00007f4779bff0a2: jl     0x00007f4779bff060
                    ......................................................

Questions:
- Would it be possible for JIT C2 to perform a better optimization in this context, for example replacing the main loop (which might be costly) with a reduction formula such as N*(N-1)/2 (in this specific case)?
- Is there any context where JIT C2 can perform such an optimization that I am missing?
- If not, what prevents it from doing this?

Thanks
Ionut
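[Editor's note, not from the original mail: for i = 1..N inclusive the sum is N*(N+1)/2 (N*(N-1)/2 would be the sum of 1..N-1), so the closed-form reduction asked about would look like the sketch below, where n stands for the same bound the benchmark reads from its field N.]

    // Illustrative only: the closed form of what the loop computes.
    // The cast to long keeps the intermediate product from overflowing int.
    static long sumClosedForm(int n) {
        return (long) n * (n + 1) / 2;
    }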
From vladimir.x.ivanov at oracle.com  Tue Oct 31 22:28:55 2017
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 Nov 2017 01:28:55 +0300
Subject: [10] JBS: 8167408: Invalid critical JNI function lookup
In-Reply-To:
References: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com>
 <2e195876-97cf-bc1f-5e3b-e19c5c5f240d@oracle.com>
Message-ID:

Jamsheed, nice test! Two suggestions:

(1) Enable the test on all platforms: though the bug is platform-specific, it doesn't mean the test should be. I don't see any platform-specific code there, and it's beneficial to test other platforms as well.

(2) Add some test cases with multiple array parameters.

Otherwise, looks good.

Best regards,
Vladimir Ivanov

On 10/31/17 10:37 PM, jamsheed wrote:
> Hi Dean,
>
> Thank you for the review,
>
> tested with a test case, previously it was not working for windows-x86,
> now it works.
>
> revised webrev with test
> case: http://cr.openjdk.java.net/~jcm/8167408/webrev.01/
>
> Best regards,
>
> Jamsheed
>
>
> On Tuesday 31 October 2017 02:18 AM, dean.long at oracle.com wrote:
>> I think you need a native test for Windows x86 that defines
>> JavaCritical methods with various signatures (especially arrays) to
>> make sure this is working correctly.
>>
>> dl
>>
>>
>> On 10/30/17 9:45 AM, jamsheed wrote:
>>> Hi,
>>>
>>> request for review,
>>>
>>> jbs : https://bugs.openjdk.java.net/browse/JDK-8167408
>>>
>>> webrev: http://cr.openjdk.java.net/~jcm/8167408/webrev.00/
>>>
>>> (contributed by Ioannis Tsakpinis)
>>>
>>> desc:
>>>
>>> -- it starts with JavaCritical_ instead of Java_;
>>> -- it does not have extra JNIEnv* and jclass arguments;
>>> -- Java arrays are passed in two arguments: the first is an array
>>> length, and the second is a pointer to raw array data. That is, no
>>> need to call GetArrayElements and friends, you can instantly use a
>>> direct array pointer.
>>>
>>> updated arg_size calculation wrt above points.
>>>
>>> Best regards,
>>>
>>> Jamsheed
>>>
>>
>

From igor.ignatyev at oracle.com  Tue Oct 31 23:30:33 2017
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 31 Oct 2017 16:30:33 -0700
Subject: RFR(S) : 8186618 : [TESTBUG] Test applications/ctw/Modules.java doesn't have timeout and hang on windows
In-Reply-To: <79997CB7-FF94-4354-BC7E-8CE5B73BDC10@oracle.com>
References: <75f3096e-b9f6-ca2e-c336-ada8c519db3b@oracle.com>
 <79997CB7-FF94-4354-BC7E-8CE5B73BDC10@oracle.com>
Message-ID:

Got an off-list review from Jesper (cc'ed). Thank you, Jesper!

-- Igor

> On Oct 26, 2017, at 7:44 PM, Igor Ignatyev wrote:
>
> Katya, thank you for reviewing it.
>
> can I have another review for this patch from a Reviewer?
>
> Thanks,
> -- Igor
>> On Oct 26, 2017, at 5:40 PM, Ekaterina Pavlova wrote:
>>
>> Looks good.
>> >> Thanks for fixing it, >> >> -katya >> >> On 10/17/17 9:45 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >>>> 546 lines changed: 188 ins; 88 del; 270 mod; >>> Hi all, >>> could you please review this fix for ctw test? >>> in some configurations the test takes too much time, it also didn't have timeout, so in case of a hang, e.g. due to JDK-8189604, no one interrupted its execution. >>> the fix splits the test into several tests, one for each jigsaw module, which not only improves the tests' reliability and reportability, but also speeds up the execution via parallel execution. 2 hours timeout, which is enough for each particular module, has been added to all tests. the patch also puts ctw for java.desktop and jdk.jconsole modules in the problem list on windows. >>> webrev: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >>> testing: applications/ctw/modules tests >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8186618 >>> Thanks, >>> -- Igor >> > From rednaxelafx at gmail.com Tue Oct 31 23:42:56 2017 From: rednaxelafx at gmail.com (Krystal Mok) Date: Tue, 31 Oct 2017 16:42:56 -0700 Subject: Sum of integers optimization In-Reply-To: <345880303.38177.1509487147494@mail.yahoo.com> References: <345880303.38177.1509487147494.ref@mail.yahoo.com> <345880303.38177.1509487147494@mail.yahoo.com> Message-ID: Hi Ionut, tl;dr: C2's infrastructure for optimizing loops can be made a lot stronger, but from the current directions we can see around the OpenJDK community, it's very unlikely for C2 to receive a major infrastructural upgrade in the future. If you'd like to contribute to Graal to help optimize this kind of code, I'm sure a lot of us in the community would love that. You're right about the code produced by C2. Just ran your example on JDK9/macOS and the main loop produced by C2 is: 0x0000000118ee6640: movslq %r11d,%r10 ;*i2l {reexecute=0 rethrow=0 return_oop=0} ; - XYZ::sum at 12 (line 7) 0x0000000118ee6643: add %r10,%rax 0x0000000118ee6646: add %r10,%rax 0x0000000118ee6649: add %r10,%rax 0x0000000118ee664c: add %r10,%rax 0x0000000118ee664f: add %r10,%rax 0x0000000118ee6652: add %r10,%rax 0x0000000118ee6655: add %r10,%rax 0x0000000118ee6658: add %r10,%rax 0x0000000118ee665b: add %r10,%rax 0x0000000118ee665e: add %r10,%rax 0x0000000118ee6661: add %r10,%rax 0x0000000118ee6664: add %r10,%rax 0x0000000118ee6667: add %r10,%rax 0x0000000118ee666a: add %r10,%rax 0x0000000118ee666d: add %r10,%rax 0x0000000118ee6670: add %r10,%rax 0x0000000118ee6673: add $0x78,%rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} ; - XYZ::sum at 13 (line 7) 0x0000000118ee6677: add $0x10,%r11d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - XYZ::sum at 15 (line 6) Pretty much the same as what you saw. It's certainly possible to tweak C2 or some other JIT compiler to make it more optimized for this test case. I don't have a copy of Zing right now but I believe its Falcon compiler will compile this down to the N*(N-1)/2 form that you expected, since the LLVM it's based on can compile this piece of C code: #include int64_t sum(int n) { int64_t sum = 0; for (int32_t i = 1; i <= n; i++) { sum += i; } return sum; } Down to: sum: test edi, edi jle .LBB0_1 lea eax, [rdi - 1] add edi, -2 imul rdi, rax shr rdi lea rax, [rdi + 2*rax + 1] ret .LBB0_1: xor eax, eax ret For this test case, C2 could at least do a few things to generate better code: 1. A better expression canonicalizer that flattens expression trees. 
The chain of adds you see in the resulting code is there because the 16x-unrolled sum += i is turned into:

// 120 == 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15
sum = ((((((((((((((((sum + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + 120

See how the additions involving i are skewed to the left, effectively degenerating the expression tree into a "linked list of additions". C2's value numbering, on its own, doesn't recognize that it can reassociate the expression into a flatter tree, e.g.

((((i + i) + (i + i)) + ((i + i) + (i + i))) + (((i + i) + (i + i)) + ((i + i) + (i + i)))) + sum + 120

in which case C2's value numbering would be able to turn it into:

t1 = i + i
t2 = t1 + t1
t3 = t2 + t2
t4 = t3 + t3
sum = t4 + sum + 120

and then into sum = (i << 4) + sum + 120. This kind of reassociation will at least make the loop body better, without involving any complicated loop optimizations.

The "tree flattening" reassociation can actually be implemented by directly linearizing an expression tree into a C0*X + C1*Y + ... + C2 form. To get to the end goal of optimizing the whole loop into the N*(N-1)/2 form, you'd need more advanced loop analysis, e.g. something akin to LLVM's SCEV, to recognize how "sum" is related to the loop induction variable.
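[Editor's illustration, not from the original mail: a minimal, compiler-agnostic sketch of the "linearize into C0*X + C1*Y + ... + C2" idea. The Node shape and names are invented for the example and are not C2 code; a real pass would also handle Sub/Mul and negative coefficients, but the shape is the same.]

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy IR: a node is either a constant, a leaf variable, or an Add of two nodes.
    final class Node {
        final String var;        // non-null for leaf variables
        final long con;          // used when var == null and left/right are null
        final Node left, right;  // non-null for Add nodes

        private Node(String v, long c, Node l, Node r) { var = v; con = c; left = l; right = r; }
        static Node con(long c)         { return new Node(null, c, null, null); }
        static Node var(String name)    { return new Node(name, 0, null, null); }
        static Node add(Node a, Node b) { return new Node(null, 0, a, b); }
        boolean isAdd() { return left != null; }
        boolean isCon() { return var == null && left == null; }
    }

    public class Linearize {
        // Flatten an Add tree into coefficient form: a sum of coeff*var terms plus a constant.
        static void walk(Node n, Map<String, Long> coeffs, long[] constant) {
            if (n.isAdd()) {               // recurse into both operands of an Add
                walk(n.left, coeffs, constant);
                walk(n.right, coeffs, constant);
            } else if (n.isCon()) {
                constant[0] += n.con;      // fold constants
            } else {
                coeffs.merge(n.var, 1L, Long::sum);  // count occurrences of each variable
            }
        }

        public static void main(String[] args) {
            // Build the left-skewed chain from the mail: ((((sum + i) + i) + ... + i) + 120, 16 i's.
            Node e = Node.var("sum");
            for (int k = 0; k < 16; k++) e = Node.add(e, Node.var("i"));
            e = Node.add(e, Node.con(120));

            Map<String, Long> coeffs = new LinkedHashMap<>();
            long[] constant = {0};
            walk(e, coeffs, constant);
            // Prints {sum=1, i=16} + 120, i.e. sum + 16*i + 120 == sum + (i << 4) + 120.
            System.out.println(coeffs + " + " + constant[0]);
        }
    }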
BTW, Graal from graalvm-0.22 generates a straightforward loop for this case:

XYZ.sum (null) [0x000000010cc091e0, 0x000000010cc09230] 80 bytes
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000000011ebfd768} 'sum' '()J' in 'XYZ'
  #           [sp+0x10]  (sp of caller)
  0x000000010cc091e0: nopl   0x0(%rax,%rax,1)
  0x000000010cc091e5: mov    $0x1,%r10d
  0x000000010cc091eb: mov    $0x0,%rax
  0x000000010cc091f2: jmpq   0x000000010cc0920f
  0x000000010cc091f7: nopw   0x0(%rax,%rax,1)    ;*if_icmpgt {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 7 (line 6)
  0x000000010cc09200: mov    %r10d,%r11d
  0x000000010cc09203: inc    %r11d               ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 15 (line 6)
  0x000000010cc09206: movslq %r10d,%r10          ;*i2l {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 12 (line 7)
  0x000000010cc09209: add    %r10,%rax           ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 13 (line 7)
  0x000000010cc0920c: mov    %r11d,%r10d         ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 18 (line 6)
  0x000000010cc0920f: cmp    $0x186a1,%r10d
  0x000000010cc09216: jl     0x000000010cc09200  ;*if_icmpgt {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 7 (line 6)
  0x000000010cc09218: test   %eax,-0x1d69218(%rip)  # 0x000000010aea0006
                                                 ;   {poll_return}
  0x000000010cc0921e: vzeroupper
  0x000000010cc09221: retq

- Kris