From rwestrel at redhat.com Mon Oct 2 07:25:11 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 02 Oct 2017 09:25:11 +0200 Subject: RFR(S): 8187822: C2 conditonal move optimization might create broken graph In-Reply-To: References: <0d0b226d-418f-2344-2ff9-a7682747a0e2@oracle.com> <66e0634c-84a6-f8ef-6451-bf1e6a9be252@oracle.com> Message-ID: > Yes. Thanks to look on it. Changes are good. Thanks for the review. Anyone to sponsor this fix? Roland. From rwestrel at redhat.com Mon Oct 2 07:48:14 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 02 Oct 2017 09:48:14 +0200 Subject: RFR(S): 8187822: C2 conditonal move optimization might create broken graph In-Reply-To: References: <0d0b226d-418f-2344-2ff9-a7682747a0e2@oracle.com> <66e0634c-84a6-f8ef-6451-bf1e6a9be252@oracle.com> Message-ID: Ready to push changeset: http://cr.openjdk.java.net/~roland/8187822/changeset Roland. From martin.doerr at sap.com Mon Oct 2 09:03:39 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 2 Oct 2017 09:03:39 +0000 Subject: sponsor needed for 8185979: PPC64: Implement SHA2 intrinsic In-Reply-To: <4bd56460-59c6-f95a-7a9a-9a6687d84115@oracle.com> References: <4bd56460-59c6-f95a-7a9a-9a6687d84115@oracle.com> Message-ID: <6bf0bbb8001c4b6d88145abd48b15574@sap.com> Hi Vladimir, thanks a lot. Best regards, Martin -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Freitag, 29. September 2017 20:41 To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: sponsor needed for 8185979: PPC64: Implement SHA2 intrinsic I will sponsor it. Vladimir On 9/29/17 8:05 AM, Doerr, Martin wrote: > Hi, > > we need a sponsor for the following PPC64 change: > > 8185979: PPC64: Implement SHA2 intrinsic > > because it touches hotspot tests. > > Latest webrev for jdk10/hs is here: > > http://cr.openjdk.java.net/~mdoerr/8185979_ppc_sha2/webrev.06/ > > It already has 2 reviews. Can somebody push it through JPRT, please? > > Best regards, > > Martin > From rwestrel at redhat.com Mon Oct 2 09:40:34 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 02 Oct 2017 11:40:34 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 Message-ID: http://cr.openjdk.java.net/~roland/8188151/webrev.00/ When Compilation::generate_exception_handler_table() walks the exception handler information to populate the exception handler table, it has some logic that removes duplicate handlers for one particular throwing pc and it is wrong AFAICT. That code iterates over already processed (handler_bci, scope_count, entry_pco) triples stored in GrowableArrays bcis, scope_depths, pcos and looks for entries for which handler_bci, scope_count are identical to the current one. It does that by looking for an entry with the same handler_bci in the bcis array and then checks whether scope_count matches too. The list of triples could be something like: 1: (13, 0, ..) 2: (13, 1, ..) and the next triple to be processed: (13, 1, ..) which is a duplicate of 2. That logic looks for a handler with bci 13, finds entry 1, which doesn't have scope count 1, and concludes that there is no duplicate entry. It would need to look at the following entry too. Given that scope counts are sorted in increasing order, rather than iterating over the list of triples from the start, looking for duplicates from the end of the list fixes that problem. Roland. 
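A minimal sketch of the idea described above, assuming the names from that mail (the GrowableArrays bcis and scope_depths, and the handler_bci/scope_count of the triple about to be added); this only illustrates the backwards search and is not the actual patch, which is in the webrev. Since scope depths are appended in increasing order, every already-processed triple with the current scope_count sits in a suffix of the arrays, so scanning from the end is enough to detect a duplicate (handler_bci, scope_count) pair:

  // Illustrative sketch only -- see the webrev above for the real change.
  bool duplicate = false;
  for (int i = bcis->length() - 1; i >= 0; i--) {
    if (scope_depths->at(i) != scope_count) {
      break; // left the suffix of entries recorded with the current scope depth
    }
    if (bcis->at(i) == handler_bci) {
      duplicate = true; // this (handler_bci, scope_count) pair was already recorded
      break;
    }
  }
  // Only append a new (handler_bci, scope_count, entry_pco) triple when no
  // duplicate was found.
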
From gustavo.scalet at eldorado.org.br Mon Oct 2 10:53:32 2017 From: gustavo.scalet at eldorado.org.br (Gustavo Serra Scalet) Date: Mon, 2 Oct 2017 10:53:32 +0000 Subject: [10] RFR(M): 8185976: PPC64: Implement MulAdd and SquareToLen intrinsics In-Reply-To: References: <1f159ee480284095b8e5c3f444dceb96@serv031.corp.eldorado.org.br> <16e8b68451e94eb79cdd7d9cb5d7984c@sap.com> <2425566a8ff74051af485c919a0bf5ee@serv030.corp.eldorado.org.br> <4ec93a6bcbe14cf99c2fa02d50a18965@sap.com> <0ef23b5fcbc54996aea876d4c60e4097@sap.com> <10a918efbd344b1fbf95c56b7beedbc0@serv031.corp.eldorado.org.br> <2badcffdc4fb44c9ba77b5a1c6cc26fb@sap.com> <6362e4c1e3ab4871b12232580f2971aa@serv031.corp.eldorado.org.br> <1a54e77a4b5e45e2b848da3fccf423dd@serv030.corp.eldorado.org.br> <170eb84ecbdb4c9b9fe8d4481e4c319f@sap.com> <0aaf319e25934903a468542d02f6a734@serv030.corp.eldorado.org.br> <2432cbfebfa342dfb560ecf4d6023581@serv030.corp.eldorado.org.br> Message-ID: <257df2509b4c4376967933c9b08ac967@serv030.corp.eldorado.org.br> Sorry, I didn't notice that. Thanks, have a great week > -----Original Message----- > From: Lindenmaier, Goetz [mailto:goetz.lindenmaier at sap.com] > Sent: sexta-feira, 29 de setembro de 2017 19:48 > To: Gustavo Serra Scalet ; Doerr, Martin > > Cc: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > SquareToLen intrinsics > > Hi, > > I pushed it a few days ago: > http://hg.openjdk.java.net/jdk10/hs/rev/122833427b36 > > Cheers, > Goetz. > > > -----Original Message----- > > From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br] > > Sent: Friday, September 29, 2017 11:26 PM > > To: Doerr, Martin ; Lindenmaier, Goetz > > > > Cc: 'hotspot-compiler-dev at openjdk.java.net' > dev at openjdk.java.net> > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > SquareToLen intrinsics > > > > Hi Martin and Goetz, > > > > A new webrev updated to the new repo structure was requested and can > > be viewed below: > > https://gut.github.io/openjdk/webrev/JDK-8185976/webrev.05/ > > > > PS: changes applied cleanly from old hotspot to new one. > > > > Can it be sponsored now? > > > > Thanks. > > > > > -----Original Message----- > > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet > > > Sent: quarta-feira, 6 de setembro de 2017 09:45 > > > To: Lindenmaier, Goetz ; Doerr, Martin > > > ; 'hotspot-compiler-dev at openjdk.java.net' > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > > SquareToLen intrinsics > > > > > > Alright, thanks for the instructions. I'll keep that in mind. > > > > > > > -----Original Message----- > > > > From: Lindenmaier, Goetz [mailto:goetz.lindenmaier at sap.com] > > > > Sent: quarta-feira, 6 de setembro de 2017 09:44 > > > > To: Gustavo Serra Scalet ; Doerr, > > > > Martin ; 'hotspot-compiler- > > dev at openjdk.java.net' > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > > > SquareToLen intrinsics > > > > > > > > Hi Gustavo, > > > > > > > > the repos are all closed. Once they are opened again, you will > > > > have to merge your change into the new repo structure, post a new > > > > webrev and only then it can be sponsored. Me or Martin will > sponsor it then. > > > > > > > > Best regards, > > > > Goetz. 
> > > > > > > > > -----Original Message----- > > > > > From: Gustavo Serra Scalet > > > > > [mailto:gustavo.scalet at eldorado.org.br] > > > > > Sent: Mittwoch, 6. September 2017 14:32 > > > > > To: Lindenmaier, Goetz ; Doerr, > > > > > Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > > > > SquareToLen intrinsics > > > > > > > > > > Thanks Goetz. > > > > > > > > > > Could somebody sponsor this change? > > > > > > > > > > THanks > > > > > > > > > > > -----Original Message----- > > > > > > From: Lindenmaier, Goetz [mailto:goetz.lindenmaier at sap.com] > > > > > > Sent: quarta-feira, 6 de setembro de 2017 03:30 > > > > > > To: Gustavo Serra Scalet ; > > > > > > Doerr, Martin ; 'hotspot-compiler- > > > > dev at openjdk.java.net' > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd and > > > > > > SquareToLen intrinsics > > > > > > > > > > > > Hi, > > > > > > > > > > > > I had a look at this change and tested it. Reviewed. > > > > > > > > > > > > Best regards, > > > > > > Goetz. > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > > > > > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet > > > > > > > Sent: Freitag, 1. September 2017 19:12 > > > > > > > To: Doerr, Martin ; 'hotspot-compiler- > > > > > > > dev at openjdk.java.net' > > > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd > > > > > > > and SquareToLen intrinsics > > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Doerr, Martin > > > > > > > > your first webrev already works on Big Endian. So the only > > > > > > > > required change is to fix your new code by this trivial > patch: > > > > > > > > --- a/src/cpu/ppc/vm/stubGenerator_ppc.cpp Fri Sep 01 > > > > 17:47:45 > > > > > > 2017 > > > > > > > > +0200 > > > > > > > > +++ b/src/cpu/ppc/vm/stubGenerator_ppc.cpp Fri Sep 01 > > > > 17:55:08 > > > > > > 2017 > > > > > > > > +0200 > > > > > > > > @@ -3426,7 +3426,9 @@ > > > > > > > > __ srdi (product, product, 1); > > > > > > > > // join them to the same register and store it as > > > > > > > > Little > > > > Endian > > > > > > > > __ orr (product, lplw_s, product); > > > > > > > > +#ifdef VM_LITTLE_ENDIAN > > > > > > > > __ rldicl (product, product, 32, 0); > > > > > > > > +#endif > > > > > > > > __ stdu (product, 8, out_aux); > > > > > > > > __ bdnz (LOOP_SQUARE); > > > > > > > > > > > > > > > > So please enable it again for Big Endian in > vm_version_ppc. > > > > > > > > Besides that, it looks good to me. We also need a 2nd > review. > > > > > > > > > > > > > > Great! Thanks for checking it and suggesting the diff. > > > > > > > > > > > > > > I changed these things. You can find it below: > > > > > > > https://gut.github.io/openjdk/webrev/JDK-8185976/webrev.04/ > > > > > > > > > > > > > > I wonder who could be a 2nd reviewer... Anybody in mind that > > > > > > > we may > > > > > > ping? > > > > > > > Maybe Goetz Lindenmaier? > > > > > > > > > > > > > > Best Regards, > > > > > > > Gustavo Serra Scalet > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Gustavo Serra Scalet > > > > > > > > [mailto:gustavo.scalet at eldorado.org.br] > > > > > > > > Sent: Mittwoch, 30. 
August 2017 19:03 > > > > > > > > To: Doerr, Martin ; > > > > > > > > 'hotspot-compiler- dev at openjdk.java.net' > > > > > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement MulAdd > > > > > > > > and SquareToLen intrinsics > > > > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > (webrev at the end) > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: Doerr, Martin > > > > > > > > > > > > > > > > > > > The s/rldicl/rldic/ was fixed for "offset", but "len" > > > > > > > > > > doesn't seem to need further changes as it's being > > > > > > > > > > cleared with clrldi, which is the same as rldic with > no shift. > > > > > > > > > > Therefore it's treated appropriately as requested for > > > > "offset" parameter. Do you agree? > > > > > > > > > > > > > > > > > > No, I didn't find clrldi for len in generate_mulAdd(). > > > > > > > > > Only > > > > for k. > > > > > > > > > > > > > > > > I'm sorry. I was thinking about "offset" and "k", which > > > > > > > > are both cleaned on generate_mulAdd(). "len" was not > > > > > > > > cleaned and it was being used on > > > > > > > > muladd() directly with cmpdi, which could lead to > problems. > > > > > > > > > > > > > > > > That is being changed. > > > > > > > > > > > > > > > > > Where are in_len and out_len fixed up in > > > > generate_squareToLen()? > > > > > > > > > > > > > > > > They are not. According to your suggestions, I agree it > > > > > > > > also needs to be done for the same reason. > > > > > > > > > > > > > > > > > > You are right. The way I'm building the 64 bits of the > > > > > > > > > > register depends on which kind of endianness it is > run. > > > > > > > > > > For now it works only on little endian so I'm adding a > > > > > > > > > > switch (just like I did for SHA) to make it available > > > > > > > > > > only on > > > > little endian systems. > > > > > > > > > > > > > > > > > > It shouldn't be that hard to get it working on big > > > > > > > > > endian > > > > > > > > > ;-) Btw., my point was not to replace the 2 4-byte store > > > > > > > > > instructions by an 8-byte one (though I'm also ok with > > > that). > > > > > > > > > It was that 2 stwu which update the same pointer doesn't > > > > > > > > > make sense from > > > > > > performance point of view. > > > > > > > > > Please keep something which works on big endian, too. > > > > > > > > > > > > > > > > I see. The 2x stwu was being used like that because it was > > > > > > > > the trivial approach when considering the original java > update: > > > > > > > > z[i++] = (lastProductLowWord << 31) | (int)(product >>> > > > > > > > > 33); z[i++] = (int)(product >>> 1); > > > > > > > > > > > > > > > > As you pointed out, that might cause some stall on the > > > > > > > > pipeline so I made it with 1s stdu (and could improve code > > > > > > > > by reducing 1 > > > > > > > > instruction) > > > > > > > > > > > > > > > > Now about having a big endian version: I'm not confident > > > > > > > > in doing so as I don't have access to such a machine at > > > > > > > > the > > > moment. > > > > > > > > You were kind on offering test support but I don't know if > > > > > > > > it'd work like that. I may support you in checking out > > > > > > > > which places are endianness-related but I'm not > > > > > > > > comfortable in sending you untested > > > > > > code. > > > > > > > > > > > > > > > > Would you be interested in doing such a changes for making > > > > > > > > it work on Big Endian? 
For this patch, I provided an > > > > > > > > interesting test that might help you to verify if it > worked. > > > > > > > > > > > > > > > > > > No, I used the jdk8u152-b01 (State of repository at > > > > > > > > > > Thu Apr > > > > > > > > > > 6 > > > > > > > > > > 14:15:31 2017). The reported performance speedup was > > > > > > > > > > calculated by running the following test > > > > (TestSquareToLen.java): > > > > > > > > > > > > > > > > > > Seems like JDK-8145913 has not been backported, yet. > > > > > > > > > Sorry for not checking this earlier. So if you want to > > > > > > > > > make RSA really fast, it should be so much better to > > > > > > > > > backport that one. But I can still sponsor this change > > > > > > > > > as it may be used > > > elsewhere. > > > > > > > > > > > > > > > > No problem. It's nice to know that I may not need to > > > > > > > > request a backport of this patch for performance reasons. > > > > > > > > > > > > > > > > And at last, but not least, the new webrev with these > > > > > > > > clrldi > > > > > > changes: > > > > > > > > https://gut.github.io/openjdk/webrev/JDK- > > > > > > > 8185976/webrev.03/index.html > > > > > > > > > > > > > > > > Thank you once again, > > > > > > > > Gustavo Serra Scalet > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: Gustavo Serra Scalet > > > > > > > > > [mailto:gustavo.scalet at eldorado.org.br] > > > > > > > > > Sent: Dienstag, 29. August 2017 22:37 > > > > > > > > > To: Doerr, Martin ; > > > > > > > > > 'hotspot-compiler- dev at openjdk.java.net' > > > > > > > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement > > > > > > > > > MulAdd and SquareToLen intrinsics > > > > > > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > > > New changes: > > > > > > > > > https://gut.github.io/openjdk/webrev/JDK-8185976/webrev. > > > > > > > > > 02/ > > > > > > > > > > > > > > > > > > Check comments below, please. > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: Doerr, Martin > > > > > > > > > > > > > > > > > > > > 1. Sign extending offset and len Right, sign and zero > > > > > > > > > > extending is equivalent for offset and len because > > > > > > > > > > they are guaranteed to be >=0 (by checks in Java). But > > > > > > > > > > you can only rely on bit 32 (IBM > > > > > > > > > > notation) to be 0. Bit 0-31 may contain > > > > > > > > > garbage. > > > > > > > > > > rldicl was incorrect. My mistake, sorry for that. > > > > > > > > > > Correct would be rldic which also clears the least > > > > > > > > > > significant > > > bits. > > > > > > > > > > len should also get fixed e.g. by replacing cmpdi by > > > > > > > > > > extsw_ in > > > > > > > > muladd. > > > > > > > > > > > > > > > > > > The s/rldicl/rldic/ was fixed for "offset", but "len" > > > > > > > > > doesn't seem to need further changes as it's being > > > > > > > > > cleared with clrldi, which is the same as rldic with no > shift. > > > > > > > > > Therefore it's treated appropriately as requested for > > > "offset" > > > > parameter. Do you agree? > > > > > > > > > > > > > > > > > > > 2. Using 8 byte instructions for int The code which > > > > > > > > > > feeds stdu is endianess specific. Doesn't work on all > > > > > > > > > > PPC64 platforms. > > > > > > > > > > > > > > > > > > You are right. 
The way I'm building the 64 bits of the > > > > > > > > > register depends on which kind of endianness it is run. > > > > > > > > > For now it works only on little endian so I'm adding a > > > > > > > > > switch (just like I did for > > > > > > > > > SHA) to make it available only on little endian systems. > > > > > > > > > > > > > > > > > > > 3.Regarding Andrew's point: Superseded by Montgomery? > > > > > > > > > > The Montgomery change got backported to jdk8u (JDK- > > 8150152 > > > > > > > > > > in > > > > > > > > 8u102). > > > > > > > > > > I'd expect the performance improvement of these > > > > > > > > > > intrinsics to be irrelevant for crypto.rsa. Did you > > > > > > > > > > measure with an older jdk8 > > > > > > > > release? > > > > > > > > > > > > > > > > > > No, I used the jdk8u152-b01 (State of repository at Thu > > > > > > > > > Apr > > > > > > > > > 6 > > > > > > > > > 14:15:31 2017). The reported performance speedup was > > > > > > > > > calculated by running the following test > > > > (TestSquareToLen.java): > > > > > > > > > import java.math.BigInteger; > > > > > > > > > > > > > > > > > > public class TestSquareToLen { > > > > > > > > > > > > > > > > > > public static void main(String args[]) throws > > > > > > > > > Exception { > > > > > > > > > > > > > > > > > > int n = 10000000; > > > > > > > > > if (args.length >=1) { > > > > > > > > > n = Integer.parseInt(args[0]); > > > > > > > > > } > > > > > > > > > > > > > > > > > > BigInteger b1 = new > > > > > > > > > > > > > > > > > > > > > > > BigInteger("3489398092355735908635051498208250392000229831187732 > > 0859 > > > > > > > 99 > > > > > > > > > 36 > > > > > > > > > > > > > > > > > > > > > > > 73955941838010214688430713917560492078731370166315598379312147 > > 54926 > > > > > > > 092 > > > > > > > > > 22 > > > > > > > > > > > > > > > > > > > > > > > 37802921102076092232721848082893366300577359694237268085206410 > > 30118 > > > > > > > 116 > > > > > > > > > 51 > > > > > > > > > > > > > > > > > > > > > > > 64401804883382348239081994789652420763585798455208997799631311 > > 31540 > > > > > > > 166 > > > > > > > > > 68 718795349783157384006672542605760392289645528307"); > > > > > > > > > BigInteger b2 = BigInteger.valueOf(0); > > > > > > > > > BigInteger check = BigInteger.valueOf(1); > > > > > > > > > for (int i = 0; i < n; i++) { > > > > > > > > > b2 = b1.multiply(b1); > > > > > > > > > if (i == 0) > > > > > > > > > // Didn't JIT yet. Comparing against > > > > > > > > > interpreted > > > > mode > > > > > > > > > check = b2; > > > > > > > > > } > > > > > > > > > if (b2.compareTo(check) == 0) > > > > > > > > > System.out.println("Check ok!"); > > > > > > > > > else > > > > > > > > > System.out.println("Check failed!"); > > > > > > > > > } > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > I got these results on JDK8 on my POWER8 machine: > > > > > > > > > $ ./javac TestSquareToLen.java $ sudo perf stat -r 5 > > > > > > > > > ./java -XX:-UseMulAddIntrinsic -XX:- > > > > > > > > > UseSquareToLenIntrinsic TestSquareToLen Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! 
> > > > > > > > > > > > > > > > > > Performance counter stats for './java > > > > > > > > > -XX:-UseMulAddIntrinsic > > > > > > > > > -XX:- UseSquareToLenIntrinsic TestSquareToLen' (5 runs): > > > > > > > > > > > > > > > > > > 15148.009557 task-clock (msec) # > 1.053 > > > > CPUs > > > > > > > > > utilized ( +- 0.48% ) > > > > > > > > > 2,425 context-switches # > 0.160 > > > > K/sec > > > > > > > > > ( +- 5.84% ) > > > > > > > > > 356 cpu-migrations # > 0.023 > > > > K/sec > > > > > > > > > ( +- 3.01% ) > > > > > > > > > 5,153 page-faults # > 0.340 > > > > K/sec > > > > > > > > > ( +- 5.22% ) > > > > > > > > > 54,536,889,909 cycles # > 3.600 > > > > GHz > > > > > > > > > ( +- 0.56% ) (66.68%) > > > > > > > > > 239,554,105 stalled-cycles-frontend # > 0.44% > > > > > > frontend > > > > > > > > > cycles idle ( +- 4.87% ) (49.90%) > > > > > > > > > 27,683,316,001 stalled-cycles-backend # > 50.76% > > > > > > backend > > > > > > > > > cycles idle ( +- 0.56% ) (50.17%) > > > > > > > > > 102,020,229,733 instructions # > 1.87 > > > > insn > > > > > > per > > > > > > > > > cycle > > > > > > > > > # > 0.27 > > > > > > stalled > > > > > > > > > cycles per insn ( +- 0.14% ) (66.94%) > > > > > > > > > 7,706,072,218 branches # > 508.718 > > > > M/sec > > > > > > > > > ( +- 0.23% ) (50.20%) > > > > > > > > > 456,051,162 branch-misses # > 5.92% > > > > of > > > > > > all > > > > > > > > > branches ( +- 0.09% ) (50.07%) > > > > > > > > > > > > > > > > > > 14.390840733 seconds time elapsed ( +- 0.09% ) > > > > > > > > > > > > > > > > > > $ sudo perf stat -r 5 ./java -XX:+UseMulAddIntrinsic - > > > > > > > > > XX:+UseSquareToLenIntrinsic TestSquareToLen Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! > > > > > > > > > Check ok! 
> > > > > > > > > > > > > > > > > > Performance counter stats for './java > > > > > > > > > -XX:+UseMulAddIntrinsic > > > > > > > > > - XX:+UseSquareToLenIntrinsic TestSquareToLen' (5 runs): > > > > > > > > > > > > > > > > > > 11368.141410 task-clock (msec) # > 1.045 > > > > CPUs > > > > > > > > > utilized ( +- 0.64% ) > > > > > > > > > 1,964 context-switches # > 0.173 > > > > K/sec > > > > > > > > > ( +- 8.93% ) > > > > > > > > > 338 cpu-migrations # > 0.030 > > > > K/sec > > > > > > > > > ( +- 7.65% ) > > > > > > > > > 5,627 page-faults # > 0.495 > > > > K/sec > > > > > > > > > ( +- 6.15% ) > > > > > > > > > 41,100,168,967 cycles # > 3.615 > > > > GHz > > > > > > > > > ( +- 0.50% ) (66.36%) > > > > > > > > > 309,052,316 stalled-cycles-frontend # > 0.75% > > > > > > frontend > > > > > > > > > cycles idle ( +- 2.84% ) (49.89%) > > > > > > > > > 14,188,581,685 stalled-cycles-backend # > 34.52% > > > > > > backend > > > > > > > > > cycles idle ( +- 0.99% ) (50.34%) > > > > > > > > > 77,846,029,829 instructions # > 1.89 > > > > insn > > > > > > per > > > > > > > > > cycle > > > > > > > > > # > 0.18 > > > > > > stalled > > > > > > > > > cycles per insn ( +- 0.29% ) (66.96%) > > > > > > > > > 8,435,216,989 branches # > 742.005 > > > > M/sec > > > > > > > > > ( +- 0.28% ) (50.17%) > > > > > > > > > 339,903,936 branch-misses # > 4.03% > > > > of > > > > > > all > > > > > > > > > branches ( +- 0.27% ) (49.90%) > > > > > > > > > > > > > > > > > > 10.882357546 seconds time elapsed ( +- 0.24% ) > > > > > > > > > > > > > > > > > > > > > > > > > > > (out of curiosity, these numbers are 15.19s (+- 0.32%) > > > > > > > > > and 13.42s > > > > > > > > > (+- > > > > > > > > > 0.53%) on JDK10) > > > > > > > > > > > > > > > > > > I may run for SpecJVM2008's crypto.rsa if you are > > > interested. > > > > > > > > > > > > > > > > > > Thank you once again for reviewing this. > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > Gustavo > > > > > > > > > > > > > > > > > > > (I think the change is still acceptable as the > > > > > > > > > > intrinsics could be used elsewhere and the > > > > > > > > > > implementation also exists on other > > > > > > > > > > platforms.) > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: Gustavo Serra Scalet > > > > > > > > > > [mailto:gustavo.scalet at eldorado.org.br] > > > > > > > > > > Sent: Mittwoch, 16. August 2017 18:50 > > > > > > > > > > To: Doerr, Martin ; > > > > > > > > > > 'hotspot-compiler- dev at openjdk.java.net' > > > > > > > > > > > > > > > > > > > > Subject: RE: [10] RFR(M): 8185976: PPC64: Implement > > > > > > > > > > MulAdd and SquareToLen intrinsics > > > > > > > > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > > > > > Thanks for dedicated review. It took me a while to be > > > > > > > > > > able to work on this but I hope to have your points > solved. > > > > > > > > > > Please check below the review as well as my comments > > > > > > > > > > quoting > > > > your email: > > > > > > > > > > https://gut.github.io/openjdk/webrev/JDK- > > 8185976/webrev.01 > > > > > > > > > > / > > > > > > > > > > > > > > > > > > > > > -----Original Message----- First of all, C2 does not > > > > > > > > > > > perform sign extend when calling > > > > > > stubs. > > > > > > > > > > > The int parms need to get zero/sign extended. 
(Could > > > > > > > > > > > even be done without extra instructions by replacing > > > > > > > > > > > sldi -> rldicl, cmpdi -> extsw_ in some > > > > > > > > > > > cases.) > > > > > > > > > > > > > > > > > > > > Does it make a difference on my case? > > > > > > > > > > > > > > > > > > > > I guess you are talking about mulAdd preparation code. > > > > > > > > > > The only aspect I found about him is to force the cast > > > > > > > > > > from 32 bits -> 64 bits by cleaning higher bits. > > > > > > > > > > Offset is a signed integer but it can't be > > > > > > > > > negative anyway. > > > > > > > > > > > > > > > > > > > > So I changed from: > > > > > > > > > > sldi (R5_ARG3, R5_ARG3, 2); > > > > > > > > > > > > > > > > > > > > to: > > > > > > > > > > rldicl (R5_ARG3, R5_ARG3, 2, 32); // always positive > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > macroAssembler_ppc.cpp: > > > > > > > > > > > - Indentation should be 2 spaces. > > > > > > > > > > > > > > > > > > > > Done > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > stubGenerator_ppc:cpp: > > > > > > > > > > > - or_, addi_ should get replaced by orr, addi when > > > > > > > > > > > CR0 result is not needed. > > > > > > > > > > > > > > > > > > > > Done > > > > > > > > > > > > > > > > > > > > > - Where is lplw initialized? > > > > > > > > > > > > > > > > > > > > It should be initialized with 0, I missed that... > > > > > > > > > > > > > > > > > > > > > - I believe that the updating load/store > > > > > > > > > > > instructions > > > e.g. > > > > > > > > > > > lwzu don't perform well on some processors. At least > > > > > > > > > > > using stwu 2 times in the loop doesn't make sense. > > > > > > > > > > > > > > > > > > > > You are right. I could manipulate the bits differently > > > > > > > > > > and ended up with a single stdu in the loop. Neat! > > > > > > > > > > Although I could not reduce the total number of > instructions. > > > > > > > > > > > > > > > > > > > > > - Note: It should be possible to use 8 byte instead > > > > > > > > > > > of 4 byte > > > > > > > > > > > instructions: MacroAssembler::multiply64, addc, > adde. > > > > > > > > > > > But I'm not requesting to change that because I > > > > > > > > > > > guess it would make the code very complicated, > > > > > > > > > > > especially when supporting both endianess > > > > > > > > > versions. > > > > > > > > > > > > > > > > > > > > Yes, that would require a new analysis on this code. > > > > > > > > > > May we consider it next? As you said, I prefer having > > > > > > > > > > an initial version that looks as simple as the > > > > > > > > > > original java > > > code. > > > > > > > > > > > > > > > > > > > > > - The squareToLen stub implementation is very close > > > > > > > > > > > the Java implementation. So it'd be interesting to > > > > > > > > > > > understand what C2 doesn't do as well as the hand > > > > > > > > > > > written assembly code. Do you know that? (Not > > > > > > > > > > > absolutely necessary for accepting this change as > > > > > > > > > > > long as the stub is measurably > > > > > > > > > > > faster.) > > > > > > > > > > > > > > > > > > > > I don't know either. Basically I chose doing it > > > > > > > > > > because I noticed some performance gain on SpecJVM2008 > > > > > > > > > > when analyzing > > > > > X64. 
> > > > > > > > > > Then, taking a closer look, I didn't notice any AVX or > > > > > > > > > > some special instructions on > > > > > > > > > > X64 so I decided to try it on ppc64 by using some > > > > > > > > > > basic > > > > > > assembly. > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > > From: hotspot-compiler-dev > > > > > > > > > > > [mailto:hotspot-compiler-dev- > > > > > > > > > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra > > > > > > > > > > > Scalet > > > > > > > > > > > Sent: Donnerstag, 10. August 2017 19:22 > > > > > > > > > > > To: 'hotspot-compiler-dev at openjdk.java.net' > > > > > > > > > > > > > > > > > > > > > > Subject: FW: [10] RFR(M): 8185976: PPC64: Implement > > > > > > > > > > > MulAdd > > > > > and > > > > > > > > > > > SquareToLen intrinsics > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev- > > > > > > > > > > > bounces at openjdk.java.net] On Behalf Of Gustavo Serra > > > > > > > > > > > Scalet > > > > > > > > > > > Sent: ter?a-feira, 8 de agosto de 2017 17:19 > > > > > > > > > > > To: ppc-aix-port-dev at openjdk.java.net > > > > > > > > > > > Subject: [10] RFR(M): 8185976: PPC64: Implement > > > > > > > > > > > MulAdd and SquareToLen intrinsics > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > Could you please review this specific PPC64 change > > > > > > > > > > > to > > > > hotspot? > > > > > > > > > > > By implementing these intrinsics I noticed a small > > > > > > > > > > > improvement with microbenchmarks analysis. On > > > > > > > > > > > SpecJVM2008's crypto.rsa benchmark, only when > > > > > > > > > > > backporting to JDK8 an improvement was > > > > > > noticed. > > > > > > > > > > > > > > > > > > > > > > JBS: > > > > > > > > > > > https://bugs.openjdk.java.net/browse/JDK-8185976 > > > > > > > > > > > Webrev: https://gut.github.io/openjdk/webrev/JDK- > > > > > > > 8185976/webrev/ > > > > > > > > > > > > > > > > > > > > > > Motivation for this implementation: > > > > > > > > > > > https://twitter.com/ijuma/status/698309312498835457 > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > Gustavo Serra Scalet From rwestrel at redhat.com Mon Oct 2 11:46:37 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 02 Oct 2017 13:46:37 +0200 Subject: RFR(XS): 8188223: IfNode::range_check_trap_proj() should handler dying subgraph with single if proj Message-ID: http://cr.openjdk.java.net/~roland/8188223/webrev.00/ I saw the following crash (that I cannot reproduce anymore having deleted the replay file by mistake). With subgraph shape: UNC->Region->IfProj->RangeCheck The region has the IfProj as single input. The following code in RegionNode::Ideal(): if (can_reshape && cnt == 1) { // Is it dead loop? // If it is LoopNopde it had 2 (+1 itself) inputs and // one of them was cut. The loop is dead if it was EntryContol. // Loop node may have only one input because entry path // is removed in PhaseIdealLoop::Dominators(). 
assert(!this->is_Loop() || cnt_orig <= 3, "Loop node should have 3 or less inputs"); if ((this->is_Loop() && (del_it == LoopNode::EntryControl || (del_it == 0 && is_unreachable_region(phase)))) || (!this->is_Loop() && has_phis && is_unreachable_region(phase))) { finds that the subgraph is unreachable which causes the IfProj to be removed. RangeCheckNode::Ideal() is later called on a dominated range check which walks the graph, hit the RangeCheck that has a single projection and causes a crash. I think it makes sense to make IfNode::range_check_trap_proj() handle the case of a RangeCheckNode with a single input. Roland. From dmitrij.pochepko at bell-sw.com Mon Oct 2 13:47:44 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 2 Oct 2017 16:47:44 +0300 Subject: [10] RFR(S): 8187684 - Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <1e07d1a7-8145-4a0e-7693-d3816551e65d@oracle.com> References: <5cfb314b-9785-fe47-8797-a899f38643ef@redhat.com> <32a35ee7-9593-bbc4-a540-37c590ec3ba6@redhat.com> <5fa7e391-bde0-9310-7bcd-98cf96b3f158@bell-sw.com> <3d76ff3c-17d7-a37c-2959-8f5e6dacd854@oracle.com> <1e07d1a7-8145-4a0e-7693-d3816551e65d@oracle.com> Message-ID: <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> Hi, please find rebased webrev here: http://cr.openjdk.java.net/~dpochepk/8187684/webrev.newws.01/ Thanks, Dmitij On 29.09.2017 02:40, Vladimir Kozlov wrote: > Dmitry, > > Please, update changes for new consolidated sources and send new > patch/webrev. > > Thanks, > Vladimir > > On 9/25/17 9:42 AM, Vladimir Kozlov wrote: >> Yes, when repo will be opened. >> >> Please, send patch and add latest webrev link to the RFE. >> >> Thanks, >> Vladimir >> >> On 9/25/17 5:04 AM, Dmitrij Pochepko wrote: >>> >>> On 25.09.2017 14:04, Andrew Haley wrote: >>>> On 20/09/17 14:29, Andrew Haley wrote: >>>>> On 20/09/17 14:08, Dmitrij Pochepko wrote: >>>>>> please review small patch for enhancement: 8187684 - Intrinsify >>>>>> Math.multiplyHigh(long, long) >>>>> OK, thanks. >>>> Dmitrij, do you have a sponsor for this?? I'm sure Vladimir would >>>> be happy to help.? :-) >>>> >>> Hi, >>> >>> Vladimir, can you sponsor it? >>> >>> Thanks, >>> Dmitrij From jcbeyler at google.com Tue Oct 3 03:52:30 2017 From: jcbeyler at google.com (JC Beyler) Date: Mon, 2 Oct 2017 20:52:30 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <2af975e6-3827-bd57-0c3d-fadd54867a67@oracle.com> <365499b6-3f4d-a4df-9e7e-e72a739fb26b@oracle.com> <102c59b8-25b6-8c21-8eef-1de7d0bbf629@oracle.com> <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> Message-ID: Dear all, Small update to the webrev: http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ Full webrev is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ I updated a bit of the naming, removed a TODO comment, and I added a test for testing the sampling rate. I also updated the maximum stack depth to 1024, there is no reason to keep it so small. I did a micro benchmark that tests the overhead and it seems relatively the same. 
I compared allocations from a stack depth of 10 and allocations from a stack depth of 1024 (allocations are from the same helper method in http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java ): - For an array of 1 integer allocated in a loop; stack depth 1024 vs stack depth 10: 1% slower - For an array of 200k integers allocated in a loop; stack depth 1024 vs stack depth 10: 3% slower So basically now moving the maximum stack depth to 1024 but we only copy over the stack depths actually used. For the next webrev, I will be adding a stack depth test to show that it works and probably put back the mutex locking so that we can see how difficult it is to keep thread safe. Let me know what you think! Jc On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler wrote: > Forgot to say that for my numbers: > - Not in the test are the actual numbers I got for the various array > sizes, I ran the program 30 times and parsed the output; here are the > averages and standard deviation: > 1000: 1.28% average; 1.13% standard deviation > 10000: 1.59% average; 1.25% standard deviation > 100000: 1.26% average; 1.26% standard deviation > > The 1000/10000/100000 are the sizes of the arrays being allocated. These > are allocated 100k times and the sampling rate is 111 times the size of the > array. > > Thanks! > Jc > > > On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler wrote: > >> Hi all, >> >> After a bit of a break, I am back working on this :). As before, here are >> two webrevs: >> >> - Full change set: http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/ >> - Compared to version 8: http://cr.openjdk.java.net/ >> ~rasbold/8171119/webrev.08_09/ >> (This version is compared to version 8 I last showed but ported to >> the new folder hierarchy) >> >> In this version I have: >> - Handled Thomas' comments from his email of 07/03: >> - Merged the logging to be standard >> - Fixed up the code a bit where asked >> - Added some notes about the code not being thread-safe yet >> - Removed additional dead code from the version that modifies >> interpreter/c1/c2 >> - Fixed compiler issues so that it compiles with >> --disable-precompiled-header >> - Tested with ./configure --with-boot-jdk= >> --with-debug-level=slowdebug --disable-precompiled-headers >> >> Additionally, I added a test to check the sanity of the sampler: >> HeapMonitorStatCorrectnessTest (http://cr.openjdk.java.net/~r >> asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch) >> - This allocates a number of arrays and checks that we obtain the >> number of samples we want with an accepted error of 5%. I tested it 100 >> times and it passed everytime, I can test more if wanted >> - Not in the test are the actual numbers I got for the various array >> sizes, I ran the program 30 times and parsed the output; here are the >> averages and standard deviation: >> 1000: 1.28% average; 1.13% standard deviation >> 10000: 1.59% average; 1.25% standard deviation >> 100000: 1.26% average; 1.26% standard deviation >> >> What this means is that we were always at about 1~2% of the number of >> samples the test expected. >> >> Let me know what you think, >> Jc >> >> >> >> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler wrote: >> >>> Hi all, >>> >>> I apologize, I have not yet handled your remarks but thought this new >>> webrev would also be useful to see and comment on perhaps. 
>>> >>> Here is the latest webrev, it is generated slightly different than the >>> others since now I'm using webrev.ksh without the -N option: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ >>> >>> And the webrev.07 to webrev.08 diff is here: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ >>> >>> (Let me know if it works well) >>> >>> It's a small change between versions but it: >>> - provides a fix that makes the average sample rate correct (more on >>> that below). >>> - fixes the code to actually have it play nicely with the fast tlab >>> refill >>> - cleaned up a bit the JVMTI text and now use jvmtiFrameInfo >>> - moved the capability to be onload solo >>> >>> With this webrev, I've done a small study of the random number generator >>> we use here for the sampling rate. I took a small program and it can be >>> simplified to: >>> >>> for (outer loop) >>> for (inner loop) >>> int[] tmp = new int[arraySize]; >>> >>> - I've fixed the outer and inner loops to being 800 for this experiment, >>> meaning we allocate 640000 times an array of a given array size. >>> >>> - Each program provides the average sample size used for the whole >>> execution >>> >>> - Then, I ran each variation 30 times and then calculated the average of >>> the average sample size used for various array sizes. I selected the array >>> size to be one of the following: 1, 10, 100, 1000. >>> >>> - When compared to 512kb, the average sample size of 30 runs: >>> 1: 4.62% of error >>> 10: 3.09% of error >>> 100: 0.36% of error >>> 1000: 0.1% of error >>> 10000: 0.03% of error >>> >>> What it shows is that, depending on the number of samples, the average >>> does become better. This is because with an allocation of 1 element per >>> array, it will take longer to hit one of the thresholds. This is seen by >>> looking at the sample count statistic I put in. For the same number of >>> iterations (800 * 800), the different array sizes provoke: >>> 1: 62 samples >>> 10: 125 samples >>> 100: 788 samples >>> 1000: 6166 samples >>> 10000: 57721 samples >>> >>> And of course, the more samples you have, the more sample rates you >>> pick, which means that your average gets closer using that math. >>> >>> Thanks, >>> Jc >>> >>> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler wrote: >>> >>>> Thanks Robbin, >>>> >>>> This seems to have worked. When I have the next webrev ready, we will >>>> find out but I'm fairly confident it will work! >>>> >>>> Thanks agian! >>>> Jc >>>> >>>> On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn >>>> wrote: >>>> >>>>> Hi JC, >>>>> >>>>> On 06/29/2017 12:15 AM, JC Beyler wrote: >>>>> >>>>>> B) Incremental changes >>>>>> >>>>> >>>>> I guess the most common work flow here is using mq : >>>>> hg qnew fix_v1 >>>>> edit files >>>>> hg qrefresh >>>>> hg qnew fix_v2 >>>>> edit files >>>>> hg qrefresh >>>>> >>>>> if you do hg log you will see 2 commits >>>>> >>>>> webrev.ksh -r -2 -o my_inc_v1_v2 >>>>> webrev.ksh -o my_full_v2 >>>>> >>>>> >>>>> In your .hgrc you might need: >>>>> [extensions] >>>>> mq = >>>>> >>>>> /Robbin >>>>> >>>>> >>>>>> Again another newbiew question here... >>>>>> >>>>>> For showing the incremental changes, is there a link that explains >>>>>> how to do that? I apologize for my newbie questions all the time :) >>>>>> >>>>>> Right now, I do: >>>>>> >>>>>> ksh ../webrev.ksh -m -N >>>>>> >>>>>> That generates a webrev.zip and send it to Chuck Rasbold. He then >>>>>> uploads it to a new webrev. >>>>>> >>>>>> I tried commiting my change and adding a small change. 
Then if I just >>>>>> do ksh ../webrev.ksh without any options, it seems to produce a similar >>>>>> page but now with only the changes I had (so the 06-07 comparison you were >>>>>> talking about) and a changeset that has it all. I imagine that is what you >>>>>> meant. >>>>>> >>>>>> Which means that my workflow would become: >>>>>> >>>>>> 1) Make changes >>>>>> 2) Make a webrev without any options to show just the differences >>>>>> with the tip >>>>>> 3) Amend my changes to my local commit so that I have it done with >>>>>> 4) Go to 1 >>>>>> >>>>>> Does that seem correct to you? >>>>>> >>>>>> Note that when I do this, I only see the full change of a file in the >>>>>> full change set (Side note here: now the page says change set and not >>>>>> patch, which is maybe why Serguei was having issues?). >>>>>> >>>>>> Thanks! >>>>>> Jc >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn >>>>> > wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> On 06/28/2017 12:04 AM, JC Beyler wrote: >>>>>> >>>>>> Dear Thomas et al, >>>>>> >>>>>> Here is the newest webrev: >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ < >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/> >>>>>> >>>>>> >>>>>> >>>>>> You have some more bits to in there but generally this looks good >>>>>> and really nice with more tests. >>>>>> I'll do and deep dive and re-test this when I get back from my >>>>>> long vacation with whatever patch version you have then. >>>>>> >>>>>> Also I think it's time you provide incremental (v06->07 changes) >>>>>> as well as complete change-sets. >>>>>> >>>>>> Thanks, Robbin >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thomas, I "think" I have answered all your remarks. The >>>>>> summary is: >>>>>> >>>>>> - The statistic system is up and provides insight on what the >>>>>> heap sampler is doing >>>>>> - I've noticed that, though the sampling rate is at the >>>>>> right mean, we are missing some samples, I have not yet tracked out why >>>>>> (details below) >>>>>> >>>>>> - I've run a tiny benchmark that is the worse case: it is a >>>>>> very tight loop and allocated a small array >>>>>> - In this case, I see no overhead when the system is off >>>>>> so that is a good start :) >>>>>> - I see right now a high overhead in this case when >>>>>> sampling is on. This is not a really too surprising but I'm going to see if >>>>>> this is consistent with our >>>>>> internal implementation. The benchmark is really allocation >>>>>> stressful so I'm not too surprised but I want to do the due diligence. >>>>>> >>>>>> - The statistic system up is up and I have a new test >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/s >>>>>> erviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTes >>>>>> t.java.patch >>>>>> >>>>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTe >>>>>> st.java.patch> >>>>>> - I did a bit of a study about the random generator >>>>>> here, more details are below but basically it seems to work well >>>>>> >>>>>> - I added a capability but since this is the first time >>>>>> doing this, I was not sure I did it right >>>>>> - I did add a test though for it and the test seems to >>>>>> do what I expect (all methods are failing with the >>>>>> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). 
>>>>>> - http://cr.openjdk.java.net/~ra >>>>>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonito >>>>>> r/MyPackage/HeapMonitorNoCapabilityTest.java.patch >>>>>> >>>>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >>>>>> bilityTest.java.patch> >>>>>> >>>>>> - I still need to figure out what to do about the >>>>>> multi-agent vs single-agent issue >>>>>> >>>>>> - As far as measurements, it seems I still need to look at: >>>>>> - Why we do the 20 random calls first, are they >>>>>> necessary? >>>>>> - Look at the mean of the sampling rate that the random >>>>>> generator does and also what is actually sampled >>>>>> - What is the overhead in terms of memory/performance >>>>>> when on? >>>>>> >>>>>> I have inlined my answers, I think I got them all in the new >>>>>> webrev, let me know your thoughts. >>>>>> >>>>>> Thanks again! >>>>>> Jc >>>>>> >>>>>> >>>>>> On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl < >>>>>> thomas.schatzl at oracle.com >>>>> thomas.schatzl at oracle.com >>>>>> >>>>>> >> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote: >>>>>> > Hi all, >>>>>> > >>>>>> > First off: Thanks again to Robbin and Thomas for their >>>>>> reviews :) >>>>>> > >>>>>> > Next, I've uploaded a new webrev: >>>>>> > http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >>>>>> >>>>>> >>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>> >>>>>> >>>>>> > >>>>>> > Here is an update: >>>>>> > >>>>>> > - @Robbin, I forgot to say that yes I need to look at >>>>>> implementing >>>>>> > this for the other architectures and testing it before >>>>>> it is all >>>>>> > ready to go. Is it common to have it working on all >>>>>> possible >>>>>> > combinations or is there a subset that I should be >>>>>> doing first and we >>>>>> > can do the others later? >>>>>> > - I've tested slowdebug, built and ran the JTreg tests >>>>>> I wrote with >>>>>> > slowdebug and fixed a few more issues >>>>>> > - I've refactored a bit of the code following Thomas' >>>>>> comments >>>>>> > - I think I've handled all the comments from Thomas >>>>>> (I put >>>>>> > comments inline below for the specifics) >>>>>> >>>>>> Thanks for handling all those. >>>>>> >>>>>> > - Following Thomas' comments on statistics, I want to >>>>>> add some >>>>>> > quality assurance tests and find that the easiest way >>>>>> would be to >>>>>> > have a few counters of what is happening in the >>>>>> sampler and expose >>>>>> > that to the user. >>>>>> > - I'll be adding that in the next version if no one >>>>>> sees any >>>>>> > objections to that. >>>>>> > - This will allow me to add a sanity test in JTreg >>>>>> about number of >>>>>> > samples and average of sampling rate >>>>>> > >>>>>> > @Thomas: I had a few questions that I inlined below >>>>>> but I will >>>>>> > summarize the "bigger ones" here: >>>>>> > - You mentioned constants are not using the right >>>>>> conventions, I >>>>>> > looked around and didn't see any convention except >>>>>> normal naming then >>>>>> > for static constants. Is that right? >>>>>> >>>>>> I looked through https://wiki.openjdk.java.net/ >>>>>> display/HotSpot/StyleGui >>>>> /display/HotSpot/StyleGui> >>>>>> >>>>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui>> >>>>>> de and the rule is to "follow an existing pattern and >>>>>> must have a >>>>>> distinct appearance from other names". Which does not >>>>>> help a lot I >>>>>> guess :/ The GC team started using upper camel case, e.g. 
>>>>>> SomeOtherConstant, but very likely this is probably not >>>>>> applied >>>>>> consistently throughout. So I am fine with not adding >>>>>> another style >>>>>> (like kMaxStackDepth with the "k" in front with some >>>>>> unknown meaning) >>>>>> is fine. >>>>>> >>>>>> (Chances are you will find that style somewhere used >>>>>> anyway too, >>>>>> apologies if so :/) >>>>>> >>>>>> >>>>>> Thanks for that link, now I know where to look. I used the >>>>>> upper camel case in my code as well then :) I should have gotten them all. >>>>>> >>>>>> >>>>>> > PS: I've also inlined my answers to Thomas below: >>>>>> > >>>>>> > On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl >>>>>> >>>>> > e.com > wrote: >>>>>> > > Hi all, >>>>>> > > >>>>>> > > On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote: >>>>>> > > > Dear all, >>>>>> > > > >>>>>> > > > I've continued working on this and have done the >>>>>> following >>>>>> > > webrev: >>>>>> > > > http://cr.openjdk.java.net/~ra >>>>>> sbold/8171119/webrev.05/ >>>>> asbold/8171119/webrev.05/> >>>>>> >>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>> >>>>>> >>>>>> > > >>>>>> > > [...] >>>>>> > > > Things I still need to do: >>>>>> > > > - Have to fix that TLAB case for the >>>>>> FastTLABRefill >>>>>> > > > - Have to start looking at the data to see >>>>>> that it is >>>>>> > > consistent and does gather the right samples, right >>>>>> frequency, etc. >>>>>> > > > - Have to check the GC elements and what that >>>>>> produces >>>>>> > > > - Run a slowdebug run and ensure I fixed all >>>>>> those issues you >>>>>> > > saw > Robbin >>>>>> > > > >>>>>> > > > Thanks for looking at the webrev and have a great >>>>>> week! >>>>>> > > >>>>>> > > scratching a bit on the surface of this change, >>>>>> so apologies for >>>>>> > > rather shallow comments: >>>>>> > > >>>>>> > > - macroAssembler_x86.cpp:5604: while this is >>>>>> compiler code, and I >>>>>> > > am not sure this is final, please avoid littering >>>>>> the code with >>>>>> > > TODO remarks :) They tend to be candidates for >>>>>> later wtf moments >>>>>> > > only. >>>>>> > > >>>>>> > > Just file a CR for that. >>>>>> > > >>>>>> > Newcomer question: what is a CR and not sure I have >>>>>> the rights to do >>>>>> > that yet ? :) >>>>>> >>>>>> Apologies. CR is a change request, this suggests to file >>>>>> a bug in the >>>>>> bug tracker. And you are right, you can't just create a >>>>>> new account in >>>>>> the OpenJDK JIRA yourselves. :( >>>>>> >>>>>> >>>>>> Ok good to know, I'll continue with my own todo list but I'll >>>>>> work hard on not letting it slip in the webrevs anymore :) >>>>>> >>>>>> >>>>>> I was mostly referring to the "... but it is a TODO" >>>>>> part of that >>>>>> comment in macroassembler_x86.cpp. Comments about the >>>>>> why of the code >>>>>> are appreciated. >>>>>> >>>>>> [Note that I now understand that this is to some degree >>>>>> still work in >>>>>> progress. As long as the final changeset does no contain >>>>>> TODO's I am >>>>>> fine (and it's not a hard objection, rather their use in >>>>>> "final" code >>>>>> is typically limited in my experience)] >>>>>> >>>>>> 5603 // Currently, if this happens, just set back the >>>>>> actual end to >>>>>> where it was. >>>>>> 5604 // We miss a chance to sample here. >>>>>> >>>>>> Would be okay, if explaining "this" and the "why" of >>>>>> missing a chance >>>>>> to sample here would be best. 
>>>>>> >>>>>> Like maybe: >>>>>> >>>>>> // If we needed to refill TLABs, just set the actual end >>>>>> point to >>>>>> // the end of the TLAB again. We do not sample here >>>>>> although we could. >>>>>> >>>>>> Done with your comment, it works well in my mind. >>>>>> >>>>>> I am not sure whether "miss a chance to sample" meant >>>>>> "we could, but >>>>>> consciously don't because it's not that useful" or "it >>>>>> would be >>>>>> necessary but don't because it's too complicated to do.". >>>>>> >>>>>> Looking at the original comment once more, I am also not >>>>>> sure if that >>>>>> comment shouldn't referring to the "end" variable (not >>>>>> actual_end) >>>>>> because that's the variable that is responsible for >>>>>> taking the sampling >>>>>> path? (Going from the member description of >>>>>> ThreadLocalAllocBuffer). >>>>>> >>>>>> >>>>>> I've moved this code and it no longer shows up here but the >>>>>> rationale and answer was: >>>>>> >>>>>> So.. Yes, end is the variable provoking the sampling. Actual >>>>>> end is the actual end of the TLAB. >>>>>> >>>>>> What was happening here is that the code is resetting _end to >>>>>> point towards the end of the new TLAB. Because, we now have the end for >>>>>> sampling and _actual_end for >>>>>> the actual end, we need to update the actual_end as well. >>>>>> >>>>>> Normally, were we to do the real work here, we would >>>>>> calculate the (end - start) offset, then do: >>>>>> >>>>>> - Set the new end to : start + (old_end - old_start) >>>>>> - Set the actual end like we do here now where it because it >>>>>> is the actual end. >>>>>> >>>>>> Why is this not done here now anymore? >>>>>> - I was still debating which path to take: >>>>>> - Do it in the fast refill code, it has its perks: >>>>>> - In a world where fast refills are happening all >>>>>> the time or a lot, we can augment there the code to do the sampling >>>>>> - Remember what we had as an end before leaving the >>>>>> slowpath and check on return >>>>>> - This is what I'm doing now, it removes the need >>>>>> to go fix up all fast refill paths but if you remain in fast refill paths, >>>>>> you won't get sampling. I >>>>>> have to think of the consequences of that, maybe a future >>>>>> change later on? >>>>>> - I have the statistics now so I'm going to >>>>>> study that >>>>>> -> By the way, though my statistics are >>>>>> showing I'm missing some samples, if I turn off FastTlabRefill, it is the >>>>>> same loss so for now, it seems >>>>>> this does not occur in my simple test. >>>>>> >>>>>> >>>>>> >>>>>> But maybe I am only confused and it's best to just leave >>>>>> the comment >>>>>> away. :) >>>>>> >>>>>> Thinking about it some more, doesn't this not-sampling >>>>>> in this case >>>>>> mean that sampling does not work in any collector that >>>>>> does inline TLAB >>>>>> allocation at the moment? (Or is inline TLAB alloc >>>>>> automatically >>>>>> disabled with sampling somehow?) >>>>>> >>>>>> That would indeed be a bigger TODO then :) >>>>>> >>>>>> >>>>>> Agreed, this remark made me think that perhaps as a first >>>>>> step the new way of doing it is better but I did have to: >>>>>> - Remove the const of the ThreadLocalBuffer remaining and >>>>>> hard_end methods >>>>>> - Move hard_end out of the header file to have a bit more >>>>>> logic there >>>>>> >>>>>> Please let me know what you think of that and if you prefer >>>>>> it this way or changing the fast refills. (I prefer this way now because it >>>>>> is more incremental). 
>>>>>> >>>>>> >>>>>> > > - calling HeapMonitoring::do_weak_oops() (which >>>>>> should probably be >>>>>> > > called weak_oops_do() like other similar methods) >>>>>> only if string >>>>>> > > deduplication is enabled (in >>>>>> g1CollectedHeap.cpp:4511) seems wrong. >>>>>> > >>>>>> > The call should be at least around 6 lines up outside >>>>>> the if. >>>>>> > >>>>>> > Preferentially in a method like >>>>>> process_weak_jni_handles(), including >>>>>> > additional logging. (No new (G1) gc phase without >>>>>> minimal logging >>>>>> > :)). >>>>>> > Done but really not sure because: >>>>>> > >>>>>> > I put for logging: >>>>>> > log_develop_trace(gc, freelist)("G1ConcRegionFreeing >>>>>> [other] : heap >>>>>> > monitoring"); >>>>>> >>>>>> I would think that "gc, ref" would be more appropriate >>>>>> log tags for >>>>>> this similar to jni handles. >>>>>> (I am als not sure what weak reference handling has to >>>>>> do with >>>>>> G1ConcRegionFreeing, so I am a bit puzzled) >>>>>> >>>>>> >>>>>> I was not sure what to put for the tags or really as the >>>>>> message. I cleaned it up a bit now to: >>>>>> log_develop_trace(gc, ref)("HeapSampling [other] : heap >>>>>> monitoring processing"); >>>>>> >>>>>> >>>>>> >>>>>> > Since weak_jni_handles didn't have logging for me to >>>>>> be inspired >>>>>> > from, I did that but unconvinced this is what should >>>>>> be done. >>>>>> >>>>>> The JNI handle processing does have logging, but only in >>>>>> ReferenceProcessor::process_discovered_references(). In >>>>>> process_weak_jni_handles() only overall time is measured >>>>>> (in a G1 >>>>>> specific way, since only G1 supports disabling reference >>>>>> procesing) :/ >>>>>> >>>>>> The code in ReferenceProcessor prints both time taken >>>>>> referenceProcessor.cpp:254, as well as the count, but >>>>>> strangely only in >>>>>> debug VMs. >>>>>> >>>>>> I have no idea why this logging is that unimportant to >>>>>> only print that >>>>>> in a debug VM. However there are reviews out for >>>>>> changing this area a >>>>>> bit, so it might be useful to wait for that >>>>>> (JDK-8173335). >>>>>> >>>>>> >>>>>> I cleaned it up a bit anyway and now it returns the count of >>>>>> objects that are in the system. >>>>>> >>>>>> >>>>>> > > - the change doubles the size of >>>>>> > > CollectedHeap::allocate_from_tlab_slow() above the >>>>>> "small and nice" >>>>>> > > threshold. Maybe it could be refactored a bit. >>>>>> > Done I think, it looks better to me :). >>>>>> >>>>>> In ThreadLocalAllocBuffer::handle_sample() I think the >>>>>> set_back_actual_end()/pick_next_sample() calls could be >>>>>> hoisted out of >>>>>> the "if" :) >>>>>> >>>>>> >>>>>> Done! >>>>>> >>>>>> >>>>>> > > - referenceProcessor.cpp:261: the change should add >>>>>> logging about >>>>>> > > the number of references encountered, maybe after >>>>>> the corresponding >>>>>> > > "JNI weak reference count" log message. >>>>>> > Just to double check, are you saying that you'd like >>>>>> to have the heap >>>>>> > sampler to keep in store how many sampled objects were >>>>>> encountered in >>>>>> > the HeapMonitoring::weak_oops_do? >>>>>> > - Would a return of the method with the number of >>>>>> handled >>>>>> > references and logging that work? >>>>>> >>>>>> Yes, it's fine if HeapMonitoring::weak_oops_do() only >>>>>> returned the >>>>>> number of processed weak oops. 
>>>>>> >>>>>> >>>>>> Done also (but I admit I have not tested the output yet) :) >>>>>> >>>>>> >>>>>> > - Additionally, would you prefer it in a separate >>>>>> block with its >>>>>> > GCTraceTime? >>>>>> >>>>>> Yes. Both kinds of information is interesting: while the >>>>>> time taken is >>>>>> typically more important, the next question would be >>>>>> why, and the >>>>>> number of references typically goes a long way there. >>>>>> >>>>>> See above though, it is probably best to wait a bit. >>>>>> >>>>>> >>>>>> Agreed that I "could" wait but, if it's ok, I'll just >>>>>> refactor/remove this when we get closer to something final. Either, >>>>>> JDK-8173335 >>>>>> has gone in and I will notice it now or it will soon and I >>>>>> can change it then. >>>>>> >>>>>> >>>>>> > > - threadLocalAllocBuffer.cpp:331: one more "TODO" >>>>>> > Removed it and added it to my personal todos to look >>>>>> at. >>>>>> > > > >>>>>> > > - threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer >>>>>> class >>>>>> > > documentation should be updated about the sampling >>>>>> additions. I >>>>>> > > would have no clue what the difference between >>>>>> "actual_end" and >>>>>> > > "end" would be from the given information. >>>>>> > If you are talking about the comments in this file, I >>>>>> made them more >>>>>> > clear I hope in the new webrev. If it was somewhere >>>>>> else, let me know >>>>>> > where to change. >>>>>> >>>>>> Thanks, that's much better. Maybe a note in the comment >>>>>> of the class >>>>>> that ThreadLocalBuffer provides some sampling facility >>>>>> by modifying the >>>>>> end() of the TLAB to cause "frequent" calls into the >>>>>> runtime call where >>>>>> actual sampling takes place. >>>>>> >>>>>> >>>>>> Done, I think it's better now. Added something about the >>>>>> slow_path_end as well. >>>>>> >>>>>> >>>>>> > > - in heapMonitoring.hpp: there are some random >>>>>> comments about some >>>>>> > > code that has been grabbed from >>>>>> "util/math/fastmath.[h|cc]". I >>>>>> > > can't tell whether this is code that can be used but >>>>>> I assume that >>>>>> > > Noam Shazeer is okay with that (i.e. that's all >>>>>> Google code). >>>>>> > Jeremy and I double checked and we can release that as >>>>>> I thought. I >>>>>> > removed the comment from that piece of code entirely. >>>>>> >>>>>> Thanks. >>>>>> >>>>>> > > - heapMonitoring.hpp/cpp static constant naming does >>>>>> not correspond >>>>>> > > to Hotspot's. Additionally, in Hotspot static >>>>>> methods are cased >>>>>> > > like other methods. >>>>>> > I think I fixed the methods to be cased the same way >>>>>> as all other >>>>>> > methods. For static constants, I was not sure. I fixed >>>>>> a few other >>>>>> > variables but I could not seem to really see a >>>>>> consistent trend for >>>>>> > constants. I made them as variables but I'm not sure >>>>>> now. >>>>>> >>>>>> Sorry again, style is a kind of mess. The goal of my >>>>>> suggestions here >>>>>> is only to prevent yet another style creeping in. >>>>>> >>>>>> > > - in heapMonitoring.cpp there are a few cryptic >>>>>> comments at the top >>>>>> > > that seem to refer to internal stuff that should >>>>>> probably be >>>>>> > > removed. >>>>>> > Sorry about that! My personal todos not cleared out. >>>>>> >>>>>> I am happy about comments, but I simply did not >>>>>> understand any of that >>>>>> and I do not know about other readers as well. 
>>>>>> >>>>>> If you think you will remember removing/updating them >>>>>> until the review >>>>>> proper (I misunderstood the review situation a little it >>>>>> seems). >>>>>> >>>>>> > > I did not think through the impact of the TLAB >>>>>> changes on collector >>>>>> > > behavior yet (if there are). Also I did not check >>>>>> for problems with >>>>>> > > concurrent mark and SATB/G1 (if there are). >>>>>> > I would love to know your thoughts on this, I think >>>>>> this is fine. I >>>>>> >>>>>> I think so too now. No objects are made live out of thin >>>>>> air :) >>>>>> >>>>>> > see issues with multiple threads right now hitting the >>>>>> stack storage >>>>>> > instance. Previous webrevs had a mutex lock here but >>>>>> we took it out >>>>>> > for simplificity (and only for now). >>>>>> >>>>>> :) When looking at this after some thinking I now assume >>>>>> for this >>>>>> review that this code is not MT safe at all. There seems >>>>>> to be more >>>>>> synchronization missing than just the one for the >>>>>> StackTraceStorage. So >>>>>> no comments about this here. >>>>>> >>>>>> >>>>>> I doubled checked a bit (quickly I admit) but it seems that >>>>>> synchronization in StackTraceStorage is really all you need (all methods >>>>>> lead to a StackTraceStorage one >>>>>> and can be multithreaded outside of that). >>>>>> There is a question about the initialization where the method >>>>>> HeapMonitoring::initialize_profiling is not thread safe. >>>>>> It would work (famous last words) and not crash if there was >>>>>> a race but we could add a synchronization point there as well (and >>>>>> therefore on the stop as well). >>>>>> >>>>>> But anyway I will really check and do this once we add back >>>>>> synchronization. >>>>>> >>>>>> >>>>>> Also, this would require some kind of specification of >>>>>> what is allowed >>>>>> to be called when and where. >>>>>> >>>>>> >>>>>> Would we specify this with the methods in the jvmti.xml file? >>>>>> We could start by specifying in each that they are not thread safe but I >>>>>> saw no mention of that for >>>>>> other methods. >>>>>> >>>>>> >>>>>> One potentially relevant observation about locking here: >>>>>> depending on >>>>>> sampling frequency, StackTraceStore::add_trace() may be >>>>>> rather >>>>>> frequently called. I assume that you are going to do >>>>>> measurements :) >>>>>> >>>>>> >>>>>> Though we don't have the TLAB implementation in our code, the >>>>>> compiler generated sampler uses 2% of overhead with a 512k sampling rate. I >>>>>> can do real measurements >>>>>> when the code settles and we can see how costly this is as a >>>>>> TLAB implementation. >>>>>> However, my theory is that if the rate is 512k, the >>>>>> memory/performance overhead should be minimal since it is what we saw with >>>>>> our code/workloads (though not called >>>>>> the same way, we call it essentially at the same rate). >>>>>> If you have a benchmark you'd like me to test, let me know! >>>>>> >>>>>> Right now, with my really small test, this does use a bit of >>>>>> overhead even for a 512k sample size. I don't know yet why, I'm going to >>>>>> see what is going on. >>>>>> >>>>>> Finally, I think it is not reasonable to suppose the overhead >>>>>> to be negligible if the sampling rate used is too low. The user should know >>>>>> that the lower the rate, >>>>>> the higher the overhead (documentation TODO?). 
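To put rough numbers on that last point (illustrative figures only, not measurements): with the default rate of one sample per 512 KiB allocated, a thread allocating 100 MB/s takes on the order of 200 samples per second; dropping the rate to one sample per 1 KiB would push the same thread to roughly 100,000 samples per second, so whatever a single sample costs is multiplied by about 500x.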
>>>>>> >>>>>> >>>>>> I am not sure what the expected usage of the API is, but >>>>>> StackTraceStore::add_trace() seems to be able to grow >>>>>> without bounds. >>>>>> Only a GC truncates them to the live ones. That in >>>>>> itself seems to be >>>>>> problematic (GCs can be *wide* apart), and of course >>>>>> some of the API >>>>>> methods add to that because they duplicate that >>>>>> unbounded array. Do you >>>>>> have any concerns/measurements about this? >>>>>> >>>>>> >>>>>> So, the theory is that yes add_trace can be able to grow >>>>>> without bounds but it grows at a sample per 512k of allocated space. The >>>>>> stacks it gathers are currently >>>>>> maxed at 64 (I'd like to expand that to an option to the user >>>>>> though at some point). So I have no concerns because: >>>>>> >>>>>> - If really this is taking a lot of space, that means the job >>>>>> is keeping a lot of objects in memory as well, therefore the entire heap is >>>>>> getting huge >>>>>> - If this is the case, you will be triggering a GC at some >>>>>> point anyway. >>>>>> >>>>>> (I'm putting under the rug the issue of "What if we set the >>>>>> rate to 1 for example" because as you lower the sampling rate, we cannot >>>>>> guarantee low overhead; the >>>>>> idea behind this feature is to have a means of having >>>>>> meaningful allocated samples at a low overhead) >>>>>> >>>>>> I have no measurements really right now but since I now have >>>>>> some statistics I can poll, I will look a bit more at this question. >>>>>> >>>>>> I have the same last sentence than above: the user should >>>>>> expect this to happen if the sampling rate is too small. That probably can >>>>>> be reflected in the >>>>>> StartHeapSampling as a note : careful this might impact your >>>>>> performance. >>>>>> >>>>>> >>>>>> Also, these stack traces might hold on to huge arrays. >>>>>> Any >>>>>> consideration of that? Particularly it might be the >>>>>> cause for OOMEs in >>>>>> tight memory situations. >>>>>> >>>>>> >>>>>> There is a stack size maximum that is set to 64 so it should >>>>>> not hold huge arrays. I don't think this is an issue but I can double check >>>>>> with a test or two. >>>>>> >>>>>> >>>>>> - please consider adding a safepoint check in >>>>>> HeapMonitoring::weak_oops_do to prevent accidental >>>>>> misuse. >>>>>> >>>>>> - in struct StackTraceStorage, the public fields may >>>>>> also need >>>>>> underscores. At least some files in the runtime >>>>>> directory have structs >>>>>> with underscored public members (and some don't). The >>>>>> runtime team >>>>>> should probably comment on that. >>>>>> >>>>>> >>>>>> Agreed I did not know. I looked around and a lot of structs >>>>>> did not have them it seemed so I left it as is. I will happily change it if >>>>>> someone prefers (I was not >>>>>> sure if you really preferred or not, your sentence seemed to >>>>>> be more a note of "this might need to change but I don't know if the >>>>>> runtime team enforces that", let >>>>>> me know if I read that wrongly). >>>>>> >>>>>> >>>>>> - In StackTraceStorage::weak_oops_do(), when examining >>>>>> the >>>>>> StackTraceData, maybe it is useful to consider having a >>>>>> non-NULL >>>>>> reference outside of the heap's reserved space an error. >>>>>> There should >>>>>> be no oop outside of the heap's reserved space ever. >>>>>> >>>>>> Unless you allow storing random values in >>>>>> StackTraceData::obj, which I >>>>>> would not encourage. 
>>>>>> >>>>>> >>>>>> I suppose you are talking about this part: >>>>>> if ((value != NULL && Universe::heap()->is_in_reserved(value)) >>>>>> && >>>>>> (is_alive == NULL || >>>>>> is_alive->do_object_b(value))) { >>>>>> >>>>>> What you are saying is that I could have something like: >>>>>> if (value != my_non_null_reference && >>>>>> (is_alive == NULL || >>>>>> is_alive->do_object_b(value))) { >>>>>> >>>>>> Is that what you meant? Is there really a reason to do so? >>>>>> When I look at the code, is_in_reserved seems like a O(1) method call. I'm >>>>>> not even sure we can have a >>>>>> NULL value to be honest. I might have to study that to see if >>>>>> this was not a paranoid test to begin with. >>>>>> >>>>>> The is_alive code has now morphed due to the comment below. >>>>>> >>>>>> >>>>>> >>>>>> - HeapMonitoring::weak_oops_do() does not seem to use the >>>>>> passed AbstractRefProcTaskExecutor. >>>>>> >>>>>> >>>>>> It did use it: >>>>>> size_t HeapMonitoring::weak_oops_do( >>>>>> AbstractRefProcTaskExecutor *task_executor, >>>>>> BoolObjectClosure* is_alive, >>>>>> OopClosure *f, >>>>>> VoidClosure *complete_gc) { >>>>>> assert(SafepointSynchronize::is_at_safepoint(), "must be >>>>>> at safepoint"); >>>>>> >>>>>> if (task_executor != NULL) { >>>>>> task_executor->set_single_threaded_mode(); >>>>>> } >>>>>> return StackTraceStorage::storage()->weak_oops_do(is_alive, >>>>>> f, complete_gc); >>>>>> } >>>>>> >>>>>> But due to the comment below, I refactored this, so this is >>>>>> no longer here. Now I have an always true closure that is passed. >>>>>> >>>>>> >>>>>> - I do not understand allowing to call this method with >>>>>> a NULL >>>>>> complete_gc closure. This would mean that objects >>>>>> referenced from the >>>>>> object that is referenced by the StackTraceData are not >>>>>> pulled, meaning >>>>>> they would get stale. >>>>>> >>>>>> - same with is_alive parameter value of NULL >>>>>> >>>>>> >>>>>> So these questions made me look a bit closer at this code. >>>>>> This code I think was written this way to have a very small impact on the >>>>>> file but you are right, there >>>>>> is no reason for this here. I've simplified the code by >>>>>> making in referenceProcessor.cpp a process_HeapSampling method that handles >>>>>> everything there. >>>>>> >>>>>> The code allowed NULLs because it depended on where you were >>>>>> coming from and how the code was being called. >>>>>> >>>>>> - I added a static always_true variable and pass that now to >>>>>> be more consistent with the rest of the code. >>>>>> - I moved the complete_gc into process_phaseHeapSampling now >>>>>> (new method) and handle the task_executor and the complete_gc there >>>>>> - Newbie question: in our code we did a >>>>>> set_single_threaded_mode but I see that process_phaseJNI does it right >>>>>> before its call, do I need to do it for the >>>>>> process_phaseHeapSample? >>>>>> That API is much cleaner (in my mind) and is consistent with >>>>>> what is done around it (again in my mind). >>>>>> >>>>>> >>>>>> - heapMonitoring.cpp:590: I do not completely understand >>>>>> the purpose of >>>>>> this code: in the end this results in a fixed value >>>>>> directly dependent >>>>>> on the Thread address anyway? In the end this results in >>>>>> a fixed value >>>>>> directly dependent on the Thread address anyway? >>>>>> IOW, what is special about exactly 20 rounds? >>>>>> >>>>>> >>>>>> So we really want a fast random number generator that has a >>>>>> specific mean (512k is the default we use). 
The code uses the thread >>>>>> address as the start number of the >>>>>> sequence (why not, it is random enough is rationale). Then >>>>>> instead of just starting there, we prime the sequence and really only start >>>>>> at the 21st number, it is >>>>>> arbitrary and I have not done a study to see if we could do >>>>>> more or less of that. >>>>>> >>>>>> As I have the statistics of the system up and running, I'll >>>>>> run some experiments to see if this is needed, is 20 good, or not. >>>>>> >>>>>> >>>>>> - also I would consider stripping a few bits of the >>>>>> threads' address as >>>>>> initialization value for your rng. The last three bits >>>>>> (and probably >>>>>> more, check whether the Thread object is allocated on >>>>>> special >>>>>> boundaries) are always zero for them. >>>>>> Not sure if the given "random" value is random enough >>>>>> before/after, >>>>>> this method, so just skip that comment if you think this >>>>>> is not >>>>>> required. >>>>>> >>>>>> >>>>>> I don't know is the honest answer. I think what is important >>>>>> is that we tend towards a mean and it is random "enough" to not fall in >>>>>> pitfalls of only sampling a >>>>>> subset of objects due to their allocation order. I added that >>>>>> as test to do to see if it changes the mean in any way for the 512k default >>>>>> value and/or if the first >>>>>> 1000 elements look better. >>>>>> >>>>>> >>>>>> Some more random nits I did not find a place to put >>>>>> anywhere: >>>>>> >>>>>> - ThreadLocalAllocBuffer::_extra_space does not seem to >>>>>> be used >>>>>> anywhere? >>>>>> >>>>>> >>>>>> Good catch :). >>>>>> >>>>>> >>>>>> - Maybe indent the declaration of >>>>>> ThreadLocalAllocBuffer::_bytes_until_sample to align below the other >>>>>> members of that group. >>>>>> >>>>>> >>>>>> Done moved it up a bit to have non static members together >>>>>> and static separate. >>>>>> >>>>>> Thanks, >>>>>> Thomas >>>>>> >>>>>> >>>>>> Thanks for your review! >>>>>> Jc >>>>>> >>>>>> >>>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Tue Oct 3 13:19:47 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 03 Oct 2017 15:19:47 +0200 Subject: RFR(L): 8186027: C2: loop strip mining Message-ID: http://cr.openjdk.java.net/~roland/8186027/webrev.00/ This converts loop: for (int i = start; i < stop; i += inc) { // body } to a loop nest: i = start; if (i < stop) { do { int next = MIN(stop, i+LoopStripMiningIter*inc); do { // body i += inc; } while (i < next); safepoint(); } while (i < stop); } (It's actually: int next = MIN(stop - i, LoopStripMiningIter*inc) + i; to protect against overflows) This should bring the best of running with UseCountedLoopSafepoints on and running with it off: low time to safepoint with little to no impact on throughput. That change was first pushed to the shenandoah repo several months ago and we've been running with it enabled since. The command line argument LoopStripMiningIter is the number of iterations between safepoints. In practice, with an arbitrary LoopStripMiningIter=1000, we observe time to safepoint on par with the current -XX:+UseCountedLoopSafepoints and most performance regressions due to -XX:+UseCountedLoopSafepoints gone. The exception is when an inner counted loop runs for a low number of iterations on average (and the compiler doesn't have an upper bound on the number of iteration). 
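To make the overflow remark above concrete (made-up numbers): with inc = 1 and LoopStripMiningIter = 1000, if stop = Integer.MAX_VALUE - 5 and i has reached Integer.MAX_VALUE - 100, then i + LoopStripMiningIter*inc wraps around to a negative value, so MIN(stop, i + LoopStripMiningIter*inc) would pick the wrapped value as the inner loop bound. Computing MIN(stop - i, LoopStripMiningIter*inc) + i instead works on the small difference (95 here), so the bound stays at stop as intended.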
This is enabled on the command line with: -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000

In PhaseIdealLoop::is_counted_loop(), when loop strip mining is enabled, for an inner loop, the compiler builds a skeleton outer loop around the counted loop. The outer loop is kept as simple as possible so required adjustments to the existing loop optimizations are not too intrusive. The reason the outer loop is inserted early in the optimization process is so that optimizations are not disrupted: an alternate implementation could have kept the safepoint in the counted loop until loop opts are over and then only have added the outer loop and moved the safepoint to the outer loop. That would have prevented nodes that are referenced in the safepoint from being sunk out of the loop, for instance.

The outer loop is a LoopNode with a backedge to a loop exit test and a safepoint. The loop exit test is a CmpI with a new Opaque5Node. The skeleton loop is populated with all required Phis after loop opts are over during macro expansion. At that point only, the loop exit tests are adjusted so the inner loop runs for at most LoopStripMiningIter. If the compiler can prove the inner loop runs for no more than LoopStripMiningIter then, during macro expansion, the outer loop is removed. The safepoint is removed only if the inner loop executes for fewer than LoopStripMiningIterShortLoop iterations, so that if there are several counted loops in a row, we still poll for safepoints regularly. Until macro expansion, there can be only a few extra nodes in the outer loop: nodes that would have sunk out of the inner loop and are kept in the outer loop by the safepoint.

PhaseIdealLoop::clone_loop(), which is used by most loop opts, now has several ways of cloning a counted loop. For loop unswitching, both inner and outer loops need to be cloned. For unrolling, only the inner loop needs to be cloned. For pre/post loop insertion, only the inner loop needs to be cloned, but the control flow must connect one of the inner loop copies to the outer loop of the other copy.

Beyond verifying performance results with the usual benchmarks, when I implemented that change, I wrote test cases for (hopefully) every loop optimization and verified by inspection of the generated code that the loop opt triggers correctly with loop strip mining. Roland.

From vladimir.kozlov at oracle.com Tue Oct 3 21:42:45 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Oct 2017 14:42:45 -0700 Subject: RFR(S): 8187822: C2 conditonal move optimization might create broken graph In-Reply-To: References: <0d0b226d-418f-2344-2ff9-a7682747a0e2@oracle.com> <66e0634c-84a6-f8ef-6451-bf1e6a9be252@oracle.com> Message-ID: <05ec8fdd-a3ea-ac73-2f53-518c57881574@oracle.com> I submitted pre-integration testing. I will push if it passes. Vladimir On 10/2/17 12:48 AM, Roland Westrelin wrote: > > Ready to push changeset: > > http://cr.openjdk.java.net/~roland/8187822/changeset > > Roland. >

From dean.long at oracle.com Wed Oct 4 01:18:05 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 3 Oct 2017 18:18:05 -0700 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: References: Message-ID: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> If handler->scope_count() > prev_scope, then can we skip the find because no duplicate is possible?
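For reference, a rough sketch of the backwards duplicate scan being discussed, relying on the scope depths being appended in increasing order (simplified names, not the actual compilation.cpp code):

    // Scan the already-emitted (bci, scope_depth) entries from the end; once a
    // stored scope depth is smaller than the current handler's, no earlier entry
    // can be a duplicate, so the scan stops.
    bool duplicate = false;
    for (int j = bcis->length() - 1; j >= 0; j--) {
      if (scope_depths->at(j) < handler->scope_count()) {
        break;
      }
      if (bcis->at(j) == handler->handler_bci() &&
          scope_depths->at(j) == handler->scope_count()) {
        duplicate = true;
        break;
      }
    }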
dl On 10/2/17 2:40 AM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8188151/webrev.00/ > > When Compilation::generate_exception_handler_table() walks the exception > handler information to populate the exception handler table, it has some > logic that removes duplicate handlers for one particular throwing pc and > it is wrong AFAICT. > > That code iterates over already processed (handler_bci, scope_count, > entry_pco) triples stored in GrowableArrays bcis, scope_depths, pcos and > looks for entries for which handler_bci, scope_count are identical to > the current one. It does that by looking for an entry with same > handler_bci in the bcis array and then checks whether scope_count > matches too. The list of triples could be something like: > > 1: (13, 0, ..) > 2: (13, 1, ..) > > and the next triple to be process: (13, 1, ..) which is a duplicate of > 2. That logic looks for a handler with bci 13, finds entry 1 which > doesn't have scope count 1. And concludes that there no duplicate > entry. It would need to look at the following entry too. Given scope > counts are sorted in increasing order, rather that iterate over the list > of triples from the start, looking for duplicates fromt the end of the > list fixes that problem. > > Roland. From lutz.schmidt at sap.com Wed Oct 4 07:10:35 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 4 Oct 2017 07:10:35 +0000 Subject: [10] RFR(S): 8187684 - Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> References: <5cfb314b-9785-fe47-8797-a899f38643ef@redhat.com> <32a35ee7-9593-bbc4-a540-37c590ec3ba6@redhat.com> <5fa7e391-bde0-9310-7bcd-98cf96b3f158@bell-sw.com> <3d76ff3c-17d7-a37c-2959-8f5e6dacd854@oracle.com> <1e07d1a7-8145-4a0e-7693-d3816551e65d@oracle.com> <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> Message-ID: Hi Dmitrij, Your change looks good. It works for my multiplyHigh implementation on s390 and ppc (not yet RFR?ed, delayed until your change is in). Regards, Lutz On 02.10.2017, 15:47, "hotspot-compiler-dev on behalf of Dmitrij Pochepko" wrote: Hi, please find rebased webrev here: http://cr.openjdk.java.net/~dpochepk/8187684/webrev.newws.01/ Thanks, Dmitij On 29.09.2017 02:40, Vladimir Kozlov wrote: > Dmitry, > > Please, update changes for new consolidated sources and send new > patch/webrev. > > Thanks, > Vladimir > > On 9/25/17 9:42 AM, Vladimir Kozlov wrote: >> Yes, when repo will be opened. >> >> Please, send patch and add latest webrev link to the RFE. >> >> Thanks, >> Vladimir >> >> On 9/25/17 5:04 AM, Dmitrij Pochepko wrote: >>> >>> On 25.09.2017 14:04, Andrew Haley wrote: >>>> On 20/09/17 14:29, Andrew Haley wrote: >>>>> On 20/09/17 14:08, Dmitrij Pochepko wrote: >>>>>> please review small patch for enhancement: 8187684 - Intrinsify >>>>>> Math.multiplyHigh(long, long) >>>>> OK, thanks. >>>> Dmitrij, do you have a sponsor for this? I'm sure Vladimir would >>>> be happy to help. :-) >>>> >>> Hi, >>> >>> Vladimir, can you sponsor it? >>> >>> Thanks, >>> Dmitrij From martin.doerr at sap.com Wed Oct 4 12:28:25 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 4 Oct 2017 12:28:25 +0000 Subject: RFR(S): 8187969: [s390] z/Architecture Vector Facility Support. Part II In-Reply-To: <6073FB33-D580-447F-A201-43FB0EB9867C@sap.com> References: <6073FB33-D580-447F-A201-43FB0EB9867C@sap.com> Message-ID: Hi Lutz, reviewed and pushed. Thanks for the contribution. 
Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Donnerstag, 28. September 2017 15:31 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8187969: [s390] z/Architecture Vector Facility Support. Part II Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8187969 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187969.00/index.html This change is all about providing the instruction definitions and related low-level code emitters for the vector string instructions, introduced with z13. It only facilitates code generation. No code is generated by the change itself. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Wed Oct 4 12:29:01 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 04 Oct 2017 14:29:01 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> Message-ID: Thanks for looking at this, Dean. > If handler->scope_count() > prev_scope, then can we skip the find > because no duplicate is possible? Yes, sounds like a reasonable optimization. Roland. From lutz.schmidt at sap.com Wed Oct 4 12:29:36 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 4 Oct 2017 12:29:36 +0000 Subject: RFR(S): 8187969: [s390] z/Architecture Vector Facility Support. Part II In-Reply-To: References: <6073FB33-D580-447F-A201-43FB0EB9867C@sap.com> Message-ID: Thank you, Martin! Regards, Lutz On 04.10.2017, 14:28, "Doerr, Martin" > wrote: Hi Lutz, reviewed and pushed. Thanks for the contribution. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Donnerstag, 28. September 2017 15:31 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8187969: [s390] z/Architecture Vector Facility Support. Part II Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8187969 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187969.00/index.html This change is all about providing the instruction definitions and related low-level code emitters for the vector string instructions, introduced with z13. It only facilitates code generation. No code is generated by the change itself. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Oct 4 18:52:10 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 4 Oct 2017 11:52:10 -0700 Subject: [10] RFR(S): 8187684 - Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> References: <5cfb314b-9785-fe47-8797-a899f38643ef@redhat.com> <32a35ee7-9593-bbc4-a540-37c590ec3ba6@redhat.com> <5fa7e391-bde0-9310-7bcd-98cf96b3f158@bell-sw.com> <3d76ff3c-17d7-a37c-2959-8f5e6dacd854@oracle.com> <1e07d1a7-8145-4a0e-7693-d3816551e65d@oracle.com> <768247bc-0ae5-b813-5b66-dbd5e91a8e8e@bell-sw.com> Message-ID: <549a6c92-1bd1-4262-3831-16dc959854c5@oracle.com> These changes passed pre-integration testing. I will push them. 
Thanks, Vladimir On 10/2/17 6:47 AM, Dmitrij Pochepko wrote: > Hi, > > please find rebased webrev here: http://cr.openjdk.java.net/~dpochepk/8187684/webrev.newws.01/ > > > Thanks, > > Dmitij > > > On 29.09.2017 02:40, Vladimir Kozlov wrote: >> Dmitry, >> >> Please, update changes for new consolidated sources and send new patch/webrev. >> >> Thanks, >> Vladimir >> >> On 9/25/17 9:42 AM, Vladimir Kozlov wrote: >>> Yes, when repo will be opened. >>> >>> Please, send patch and add latest webrev link to the RFE. >>> >>> Thanks, >>> Vladimir >>> >>> On 9/25/17 5:04 AM, Dmitrij Pochepko wrote: >>>> >>>> On 25.09.2017 14:04, Andrew Haley wrote: >>>>> On 20/09/17 14:29, Andrew Haley wrote: >>>>>> On 20/09/17 14:08, Dmitrij Pochepko wrote: >>>>>>> please review small patch for enhancement: 8187684 - Intrinsify >>>>>>> Math.multiplyHigh(long, long) >>>>>> OK, thanks. >>>>> Dmitrij, do you have a sponsor for this?? I'm sure Vladimir would >>>>> be happy to help.? :-) >>>>> >>>> Hi, >>>> >>>> Vladimir, can you sponsor it? >>>> >>>> Thanks, >>>> Dmitrij > From lutz.schmidt at sap.com Fri Oct 6 09:10:27 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 6 Oct 2017 09:10:27 +0000 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) Message-ID: Dear all, I would like to request reviews for this s390-only enhancement (ppc support was already implemented for other purpose): Bug: https://bugs.openjdk.java.net/browse/JDK-8187964 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187964.00/index.html This change provides platform-specific implementations for the Math.multiplyHigh method, exploiting 64bit x 64bit => 128bit multiply instructions available on these platforms. Microbenchmark performance shows improvement of 4x to 5x for [s390] and 10x to 15x for [ppc]. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.doerr at sap.com Fri Oct 6 13:03:46 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 6 Oct 2017 13:03:46 +0000 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) In-Reply-To: References: Message-ID: Hi Lutz, looks good. If you like, you can get rid of one or both tmp registers if you want to save them. Did you also check if it improves long division which also uses multiply high nodes? I can sponsor this change. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Freitag, 6. Oktober 2017 11:10 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) Dear all, I would like to request reviews for this s390-only enhancement (ppc support was already implemented for other purpose): Bug: https://bugs.openjdk.java.net/browse/JDK-8187964 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187964.00/index.html This change provides platform-specific implementations for the Math.multiplyHigh method, exploiting 64bit x 64bit => 128bit multiply instructions available on these platforms. Microbenchmark performance shows improvement of 4x to 5x for [s390] and 10x to 15x for [ppc]. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lutz.schmidt at sap.com Fri Oct 6 14:14:38 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 6 Oct 2017 14:14:38 +0000 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) In-Reply-To: References: Message-ID: <07173354-6A6E-486D-830F-525EDBC7AC6A@sap.com> Hi Martin, thanks for your review! I have removed the use of the tmp2 register. That was easy. I do not like the idea of getting rid of the tmp1 register. This would have to be replaced by a scratch register. I try to avoid scratch registers at places where I can easily get a tmp from reg alloc. Please find the updated webrev at http://cr.openjdk.java.net/~lucy/webrevs/8187964.01/index.html The long division benefits quite a bit from multiplyHigh. With a simple MicroBenchmark, I see 4x to 5x improvement. Only the latest processor generation doesn?t benefit as much. I see a 1.5x improvement on z13 only. There is an easy explanation to the z13 ?anomaly?: the superscalar layout of a z13 core is twice as wide as that of a z196 core. Z13 needs rather complex loop bodies with independent data streams to reach its full potential. My simple benchmark obviously does not provide that. Best Regards, Lutz On 06.10.2017, 15:03, "Doerr, Martin" > wrote: Hi Lutz, looks good. If you like, you can get rid of one or both tmp registers if you want to save them. Did you also check if it improves long division which also uses multiply high nodes? I can sponsor this change. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Freitag, 6. Oktober 2017 11:10 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) Dear all, I would like to request reviews for this s390-only enhancement (ppc support was already implemented for other purpose): Bug: https://bugs.openjdk.java.net/browse/JDK-8187964 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8187964.00/index.html This change provides platform-specific implementations for the Math.multiplyHigh method, exploiting 64bit x 64bit => 128bit multiply instructions available on these platforms. Microbenchmark performance shows improvement of 4x to 5x for [s390] and 10x to 15x for [ppc]. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.doerr at sap.com Fri Oct 6 15:08:20 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 6 Oct 2017 15:08:20 +0000 Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian Message-ID: Hi, I have changed the AES intrinsics to support Big Endian (linux and AIX). Please review: http://cr.openjdk.java.net/~mdoerr/8188868_PPC64_AES_BE/webrev.00/ Best regards, Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.schmidt at sap.com Fri Oct 6 15:36:44 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 6 Oct 2017 15:36:44 +0000 Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete Message-ID: Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8188857 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8188857.00/index.html z/Architecture vector instructions require operating system support. Without os support, any attempt to execute such an instruction results in a SIGFPE signal. The presence of such support cannot be checked for by inspecting the cpu?s facility bits alone. 
During startup, a vector instruction is attempted to execute and, in case of a SIGFPE, the vector facility is marked unavailable. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at redhat.com Sat Oct 7 09:09:11 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 7 Oct 2017 10:09:11 +0100 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <07173354-6A6E-486D-830F-525EDBC7AC6A@sap.com> References: <07173354-6A6E-486D-830F-525EDBC7AC6A@sap.com> Message-ID: <5c09abb7-5a9f-5b88-bbc9-3eb2ac68db15@redhat.com> One thought about this: we might generate better code on these machines if we had an unsigned multiplyHigh intrinsic and did the adjustment for signed arithmetic in Java code, where the sign adjustment could be scheduled separately by C2. OK, I'll get on with implementing unsignedMultiplyHigh! -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From lutz.schmidt at sap.com Mon Oct 9 07:54:33 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 9 Oct 2017 07:54:33 +0000 Subject: RFR(S): 8187964: [s390][ppc]: Intrinsify Math.multiplyHigh(long, long) In-Reply-To: <5c09abb7-5a9f-5b88-bbc9-3eb2ac68db15@redhat.com> References: <07173354-6A6E-486D-830F-525EDBC7AC6A@sap.com> <5c09abb7-5a9f-5b88-bbc9-3eb2ac68db15@redhat.com> Message-ID: <81216795-00FE-46E7-9F6D-3EBFEBB34278@sap.com> Andrew, unsigned multiplyHigh would definitely help [s390]. Think of just one instruction instead of seven. You could even get the full length product at the same cost. Regards, Lutz On 07.10.2017, 11:09, "Andrew Haley" wrote: One thought about this: we might generate better code on these machines if we had an unsigned multiplyHigh intrinsic and did the adjustment for signed arithmetic in Java code, where the sign adjustment could be scheduled separately by C2. OK, I'll get on with implementing unsignedMultiplyHigh! -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Mon Oct 9 09:38:36 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 9 Oct 2017 09:38:36 +0000 Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete In-Reply-To: References: Message-ID: <9ed18c99e4de4d9fbadcb6962daf74a6@sap.com> Hi Lutz, thanks for providing this fix. Looks good. I?d only like to remove the assignment of used_len from vm_version_s390.cpp because it is neither used nor set to a defined value. I can remove it before pushing if you?re ok with that. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Freitag, 6. Oktober 2017 17:37 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8188857 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8188857.00/index.html z/Architecture vector instructions require operating system support. Without os support, any attempt to execute such an instruction results in a SIGFPE signal. The presence of such support cannot be checked for by inspecting the cpu?s facility bits alone. During startup, a vector instruction is attempted to execute and, in case of a SIGFPE, the vector facility is marked unavailable. 
Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.schmidt at sap.com Mon Oct 9 09:49:09 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 9 Oct 2017 09:49:09 +0000 Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete In-Reply-To: <9ed18c99e4de4d9fbadcb6962daf74a6@sap.com> References: <9ed18c99e4de4d9fbadcb6962daf74a6@sap.com> Message-ID: <183BDAE6-3A7D-4CD7-8AD2-3F32182D6B55@sap.com> Thanks, Martin, for reviewing my change. And yes, please go ahead and remove the assignment before pushing. Thank you! Lutz On 09.10.2017, 11:38, "Doerr, Martin" > wrote: Hi Lutz, thanks for providing this fix. Looks good. I?d only like to remove the assignment of used_len from vm_version_s390.cpp because it is neither used nor set to a defined value. I can remove it before pushing if you?re ok with that. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Freitag, 6. Oktober 2017 17:37 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8188857: [s390]: CPU feature detection incomplete Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8188857 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8188857.00/index.html z/Architecture vector instructions require operating system support. Without os support, any attempt to execute such an instruction results in a SIGFPE signal. The presence of such support cannot be checked for by inspecting the cpu?s facility bits alone. During startup, a vector instruction is attempted to execute and, in case of a SIGFPE, the vector facility is marked unavailable. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From goetz.lindenmaier at sap.com Mon Oct 9 09:57:08 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 9 Oct 2017 09:57:08 +0000 Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian In-Reply-To: References: Message-ID: <290565198b6d490ca16375402cee8eb9@sap.com> Hi Martin, thanks for porting this to be. Unfortunately the Unsafe jtreg tests failed tonight in our tests, as the space for the stubs does not suffice. Could you please fix this in this change? No new webrev needed. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Freitag, 6. Oktober 2017 17:08 > To: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian > > Hi, > > > > I have changed the AES intrinsics to support Big Endian (linux and AIX). > > > > Please review: > > http://cr.openjdk.java.net/~mdoerr/8188868_PPC64_AES_BE/webrev.00/ > > > > Best regards, > Martin > > From martin.doerr at sap.com Mon Oct 9 12:04:28 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 9 Oct 2017 12:04:28 +0000 Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian In-Reply-To: <290565198b6d490ca16375402cee8eb9@sap.com> References: <290565198b6d490ca16375402cee8eb9@sap.com> Message-ID: <241225b6b69c41ef90f3cb7aec67b4e6@sap.com> Hi G?tz, thanks for the review. Pushed with increased code_size2 = 24000. 
Best regards, Martin -----Original Message----- From: Lindenmaier, Goetz Sent: Montag, 9. Oktober 2017 11:57 To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' Subject: RE: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian Hi Martin, thanks for porting this to be. Unfortunately the Unsafe jtreg tests failed tonight in our tests, as the space for the stubs does not suffice. Could you please fix this in this change? No new webrev needed. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Freitag, 6. Oktober 2017 17:08 > To: 'hotspot-compiler-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: RFR(M): 8188868: PPC64: Support AES intrinsics on Big Endian > > Hi, > > > > I have changed the AES intrinsics to support Big Endian (linux and AIX). > > > > Please review: > > http://cr.openjdk.java.net/~mdoerr/8188868_PPC64_AES_BE/webrev.00/ > > > > Best regards, > Martin > > From daniel.daugherty at oracle.com Mon Oct 9 19:41:24 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 9 Oct 2017 13:41:24 -0600 Subject: RFR(XL): 8167108 - SMR and JavaThread Lifecycle Message-ID: <1e50bb73-840c-fc3a-81ad-31f83037093f@oracle.com> Greetings, We have a (eXtra Large) fix for the following bug: 8167108 inconsistent handling of SR_lock can lead to crashes https://bugs.openjdk.java.net/browse/JDK-8167108 This fix adds a Safe Memory Reclamation (SMR) mechanism based on Hazard Pointers to manage JavaThread lifecycle. Here's a PDF for the internal wiki that we've been using to describe and track the work on this project: http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf Dan has noticed that the indenting is wrong in some of the code quotes in the PDF that are not present in the internal wiki. We don't have a solution for that problem yet. Here's the webrev for current JDK10 version of this fix: http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full This fix has been run through many rounds of JPRT and Mach5 tier[2-5] testing, additional stress testing on Dan's Solaris X64 server, and additional testing on Erik and Robbin's machines. We welcome comments, suggestions and feedback. Daniel Daugherty Erik Osterlund Robbin Ehn From daniel.daugherty at oracle.com Mon Oct 9 21:23:23 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 9 Oct 2017 15:23:23 -0600 Subject: RFR(XL): 8167108 - SMR and JavaThread Lifecycle In-Reply-To: <1e50bb73-840c-fc3a-81ad-31f83037093f@oracle.com> References: <1e50bb73-840c-fc3a-81ad-31f83037093f@oracle.com> Message-ID: <546f3f48-47cf-73d1-30b1-b388418ae0bf@oracle.com> Many thanks to the folks that reviewed this internally and provided much appreciated feedback: - Daniel Daugherty - David Holmes - Erik Osterlund - Jerry Thornbrugh - Karen Kinnear - Kim Barrett - Robbin Ehn - Serguei Spitsyn - Stefan Karlson Since there are three contributing authors, we have been reviewing (and arguing over) each other's code. It has been an adventure! Dan, Erik, and Robbin On 10/9/17 1:41 PM, Daniel D. Daugherty wrote: > Greetings, > > We have a (eXtra Large) fix for the following bug: > > 8167108 inconsistent handling of SR_lock can lead to crashes > https://bugs.openjdk.java.net/browse/JDK-8167108 > > This fix adds a Safe Memory Reclamation (SMR) mechanism based on > Hazard Pointers to manage JavaThread lifecycle. 
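As a very rough illustration of the hazard pointer idea mentioned above (the names ThreadsList, _java_thread_list and set_threads_hazard_ptr are assumptions for this sketch, not necessarily the 8167108 API):

    // A reader publishes which version of the thread list it is about to use;
    // the writer only frees an old list once no thread advertises it any more.
    ThreadsList* acquire_stable_list(JavaThread* self) {
      while (true) {
        ThreadsList* list = _java_thread_list;   // read the current list (volatile)
        self->set_threads_hazard_ptr(list);      // publish the hazard pointer
        OrderAccess::fence();
        if (list == _java_thread_list) {         // still current, so it cannot be freed
          return list;                           // while our hazard pointer points to it
        }
        self->set_threads_hazard_ptr(NULL);      // raced with a list update; retry
      }
    }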
> > Here's a PDF for the internal wiki that we've been using to describe > and track the work on this project: > > http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf > > > Dan has noticed that the indenting is wrong in some of the code quotes > in the PDF that are not present in the internal wiki. We don't have a > solution for that problem yet. > > Here's the webrev for current JDK10 version of this fix: > > http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full > > This fix has been run through many rounds of JPRT and Mach5 tier[2-5] > testing, additional stress testing on Dan's Solaris X64 server, and > additional testing on Erik and Robbin's machines. > > We welcome comments, suggestions and feedback. > > Daniel Daugherty > Erik Osterlund > Robbin Ehn > From jcbeyler at google.com Mon Oct 9 22:57:45 2017 From: jcbeyler at google.com (JC Beyler) Date: Mon, 9 Oct 2017 15:57:45 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <2af975e6-3827-bd57-0c3d-fadd54867a67@oracle.com> <365499b6-3f4d-a4df-9e7e-e72a739fb26b@oracle.com> <102c59b8-25b6-8c21-8eef-1de7d0bbf629@oracle.com> <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> Message-ID: Dear all, Thread-safety is back!! Here is the update webrev: http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ Full webrev is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ In order to really test this, I needed to add this so thought now was a good time. It required a few changes here for the creation to ensure correctness and safety. Now we keep the static pointer but clear the data internally so on re-initialize, it will be a bit more costly than before. I don't think this is a huge use-case so I did not think it was a problem. I used the internal MutexLocker, I think I used it well, let me know. I also added three tests: 1) Stack depth test: http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch This test shows that the maximum stack depth system is working. 2) Thread safety: http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch The test creates 24 threads and they all allocate at the same time. The test then checks it does find samples from all the threads. 3) Thread on/off safety http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch The test creates 24 threads that all allocate a bunch of memory. Then another thread turns the sampling on/off. Btw, both tests 2 & 3 failed without the locks. As I worked on this, I saw a lot of places where the tests are doing very similar things, I'm going to clean up the code a bit and make a HeapAllocator class that all tests can call directly. This will greatly simplify the code. Thanks for any comments/criticisms! Jc On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler wrote: > Dear all, > > Small update to the webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ > > Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ > > I updated a bit of the naming, removed a TODO comment, and I added a test > for testing the sampling rate. 
I also updated the maximum stack depth to > 1024, there is no reason to keep it so small. I did a micro benchmark that > tests the overhead and it seems relatively the same. > > I compared allocations from a stack depth of 10 and allocations from a > stack depth of 1024 (allocations are from the same helper method in > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ > raw_files/new/test/hotspot/jtreg/serviceability/jvmti/ > HeapMonitor/MyPackage/HeapMonitorStatRateTest.java): > - For an array of 1 integer allocated in a loop; stack depth > 1024 vs stack depth 10: 1% slower > - For an array of 200k integers allocated in a loop; stack depth > 1024 vs stack depth 10: 3% slower > > So basically now moving the maximum stack depth to 1024 but we only copy > over the stack depths actually used. > > For the next webrev, I will be adding a stack depth test to show that it > works and probably put back the mutex locking so that we can see how > difficult it is to keep thread safe. > > Let me know what you think! > Jc > > > > On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler wrote: > >> Forgot to say that for my numbers: >> - Not in the test are the actual numbers I got for the various array >> sizes, I ran the program 30 times and parsed the output; here are the >> averages and standard deviation: >> 1000: 1.28% average; 1.13% standard deviation >> 10000: 1.59% average; 1.25% standard deviation >> 100000: 1.26% average; 1.26% standard deviation >> >> The 1000/10000/100000 are the sizes of the arrays being allocated. These >> are allocated 100k times and the sampling rate is 111 times the size of the >> array. >> >> Thanks! >> Jc >> >> >> On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler wrote: >> >>> Hi all, >>> >>> After a bit of a break, I am back working on this :). As before, here >>> are two webrevs: >>> >>> - Full change set: http://cr.openjdk.java.ne >>> t/~rasbold/8171119/webrev.09/ >>> - Compared to version 8: http://cr.openjdk.java.net/ >>> ~rasbold/8171119/webrev.08_09/ >>> (This version is compared to version 8 I last showed but ported to >>> the new folder hierarchy) >>> >>> In this version I have: >>> - Handled Thomas' comments from his email of 07/03: >>> - Merged the logging to be standard >>> - Fixed up the code a bit where asked >>> - Added some notes about the code not being thread-safe yet >>> - Removed additional dead code from the version that modifies >>> interpreter/c1/c2 >>> - Fixed compiler issues so that it compiles with >>> --disable-precompiled-header >>> - Tested with ./configure --with-boot-jdk= >>> --with-debug-level=slowdebug --disable-precompiled-headers >>> >>> Additionally, I added a test to check the sanity of the sampler: >>> HeapMonitorStatCorrectnessTest (http://cr.openjdk.java.net/~r >>> asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >>> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch) >>> - This allocates a number of arrays and checks that we obtain the >>> number of samples we want with an accepted error of 5%. I tested it 100 >>> times and it passed everytime, I can test more if wanted >>> - Not in the test are the actual numbers I got for the various array >>> sizes, I ran the program 30 times and parsed the output; here are the >>> averages and standard deviation: >>> 1000: 1.28% average; 1.13% standard deviation >>> 10000: 1.59% average; 1.25% standard deviation >>> 100000: 1.26% average; 1.26% standard deviation >>> >>> What this means is that we were always at about 1~2% of the number of >>> samples the test expected. 
>>> >>> Let me know what you think, >>> Jc >>> >>> >>> >>> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler wrote: >>> >>>> Hi all, >>>> >>>> I apologize, I have not yet handled your remarks but thought this new >>>> webrev would also be useful to see and comment on perhaps. >>>> >>>> Here is the latest webrev, it is generated slightly different than the >>>> others since now I'm using webrev.ksh without the -N option: >>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ >>>> >>>> And the webrev.07 to webrev.08 diff is here: >>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ >>>> >>>> (Let me know if it works well) >>>> >>>> It's a small change between versions but it: >>>> - provides a fix that makes the average sample rate correct (more on >>>> that below). >>>> - fixes the code to actually have it play nicely with the fast tlab >>>> refill >>>> - cleaned up a bit the JVMTI text and now use jvmtiFrameInfo >>>> - moved the capability to be onload solo >>>> >>>> With this webrev, I've done a small study of the random number >>>> generator we use here for the sampling rate. I took a small program and it >>>> can be simplified to: >>>> >>>> for (outer loop) >>>> for (inner loop) >>>> int[] tmp = new int[arraySize]; >>>> >>>> - I've fixed the outer and inner loops to being 800 for this >>>> experiment, meaning we allocate 640000 times an array of a given array >>>> size. >>>> >>>> - Each program provides the average sample size used for the whole >>>> execution >>>> >>>> - Then, I ran each variation 30 times and then calculated the average >>>> of the average sample size used for various array sizes. I selected the >>>> array size to be one of the following: 1, 10, 100, 1000. >>>> >>>> - When compared to 512kb, the average sample size of 30 runs: >>>> 1: 4.62% of error >>>> 10: 3.09% of error >>>> 100: 0.36% of error >>>> 1000: 0.1% of error >>>> 10000: 0.03% of error >>>> >>>> What it shows is that, depending on the number of samples, the average >>>> does become better. This is because with an allocation of 1 element per >>>> array, it will take longer to hit one of the thresholds. This is seen by >>>> looking at the sample count statistic I put in. For the same number of >>>> iterations (800 * 800), the different array sizes provoke: >>>> 1: 62 samples >>>> 10: 125 samples >>>> 100: 788 samples >>>> 1000: 6166 samples >>>> 10000: 57721 samples >>>> >>>> And of course, the more samples you have, the more sample rates you >>>> pick, which means that your average gets closer using that math. >>>> >>>> Thanks, >>>> Jc >>>> >>>> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler >>>> wrote: >>>> >>>>> Thanks Robbin, >>>>> >>>>> This seems to have worked. When I have the next webrev ready, we will >>>>> find out but I'm fairly confident it will work! >>>>> >>>>> Thanks agian! >>>>> Jc >>>>> >>>>> On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn >>>>> wrote: >>>>> >>>>>> Hi JC, >>>>>> >>>>>> On 06/29/2017 12:15 AM, JC Beyler wrote: >>>>>> >>>>>>> B) Incremental changes >>>>>>> >>>>>> >>>>>> I guess the most common work flow here is using mq : >>>>>> hg qnew fix_v1 >>>>>> edit files >>>>>> hg qrefresh >>>>>> hg qnew fix_v2 >>>>>> edit files >>>>>> hg qrefresh >>>>>> >>>>>> if you do hg log you will see 2 commits >>>>>> >>>>>> webrev.ksh -r -2 -o my_inc_v1_v2 >>>>>> webrev.ksh -o my_full_v2 >>>>>> >>>>>> >>>>>> In your .hgrc you might need: >>>>>> [extensions] >>>>>> mq = >>>>>> >>>>>> /Robbin >>>>>> >>>>>> >>>>>>> Again another newbiew question here... 
>>>>>>> >>>>>>> For showing the incremental changes, is there a link that explains >>>>>>> how to do that? I apologize for my newbie questions all the time :) >>>>>>> >>>>>>> Right now, I do: >>>>>>> >>>>>>> ksh ../webrev.ksh -m -N >>>>>>> >>>>>>> That generates a webrev.zip and send it to Chuck Rasbold. He then >>>>>>> uploads it to a new webrev. >>>>>>> >>>>>>> I tried commiting my change and adding a small change. Then if I >>>>>>> just do ksh ../webrev.ksh without any options, it seems to produce a >>>>>>> similar page but now with only the changes I had (so the 06-07 comparison >>>>>>> you were talking about) and a changeset that has it all. I imagine that is >>>>>>> what you meant. >>>>>>> >>>>>>> Which means that my workflow would become: >>>>>>> >>>>>>> 1) Make changes >>>>>>> 2) Make a webrev without any options to show just the differences >>>>>>> with the tip >>>>>>> 3) Amend my changes to my local commit so that I have it done with >>>>>>> 4) Go to 1 >>>>>>> >>>>>>> Does that seem correct to you? >>>>>>> >>>>>>> Note that when I do this, I only see the full change of a file in >>>>>>> the full change set (Side note here: now the page says change set and not >>>>>>> patch, which is maybe why Serguei was having issues?). >>>>>>> >>>>>>> Thanks! >>>>>>> Jc >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn >>>>>> > wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On 06/28/2017 12:04 AM, JC Beyler wrote: >>>>>>> >>>>>>> Dear Thomas et al, >>>>>>> >>>>>>> Here is the newest webrev: >>>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ < >>>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/> >>>>>>> >>>>>>> >>>>>>> >>>>>>> You have some more bits to in there but generally this looks >>>>>>> good and really nice with more tests. >>>>>>> I'll do and deep dive and re-test this when I get back from my >>>>>>> long vacation with whatever patch version you have then. >>>>>>> >>>>>>> Also I think it's time you provide incremental (v06->07 changes) >>>>>>> as well as complete change-sets. >>>>>>> >>>>>>> Thanks, Robbin >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thomas, I "think" I have answered all your remarks. The >>>>>>> summary is: >>>>>>> >>>>>>> - The statistic system is up and provides insight on what >>>>>>> the heap sampler is doing >>>>>>> - I've noticed that, though the sampling rate is at the >>>>>>> right mean, we are missing some samples, I have not yet tracked out why >>>>>>> (details below) >>>>>>> >>>>>>> - I've run a tiny benchmark that is the worse case: it is a >>>>>>> very tight loop and allocated a small array >>>>>>> - In this case, I see no overhead when the system is >>>>>>> off so that is a good start :) >>>>>>> - I see right now a high overhead in this case when >>>>>>> sampling is on. This is not a really too surprising but I'm going to see if >>>>>>> this is consistent with our >>>>>>> internal implementation. The benchmark is really allocation >>>>>>> stressful so I'm not too surprised but I want to do the due diligence. 
>>>>>>> >>>>>>> - The statistic system up is up and I have a new test >>>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/s >>>>>>> erviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTes >>>>>>> t.java.patch >>>>>>> >>>>>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTe >>>>>>> st.java.patch> >>>>>>> - I did a bit of a study about the random generator >>>>>>> here, more details are below but basically it seems to work well >>>>>>> >>>>>>> - I added a capability but since this is the first time >>>>>>> doing this, I was not sure I did it right >>>>>>> - I did add a test though for it and the test seems to >>>>>>> do what I expect (all methods are failing with the >>>>>>> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). >>>>>>> - http://cr.openjdk.java.net/~ra >>>>>>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonito >>>>>>> r/MyPackage/HeapMonitorNoCapabilityTest.java.patch >>>>>>> >>>>>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >>>>>>> bilityTest.java.patch> >>>>>>> >>>>>>> - I still need to figure out what to do about the >>>>>>> multi-agent vs single-agent issue >>>>>>> >>>>>>> - As far as measurements, it seems I still need to look >>>>>>> at: >>>>>>> - Why we do the 20 random calls first, are they >>>>>>> necessary? >>>>>>> - Look at the mean of the sampling rate that the random >>>>>>> generator does and also what is actually sampled >>>>>>> - What is the overhead in terms of memory/performance >>>>>>> when on? >>>>>>> >>>>>>> I have inlined my answers, I think I got them all in the new >>>>>>> webrev, let me know your thoughts. >>>>>>> >>>>>>> Thanks again! >>>>>>> Jc >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl < >>>>>>> thomas.schatzl at oracle.com >>>>>>> >>>>>> >>>>>>> >> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote: >>>>>>> > Hi all, >>>>>>> > >>>>>>> > First off: Thanks again to Robbin and Thomas for >>>>>>> their reviews :) >>>>>>> > >>>>>>> > Next, I've uploaded a new webrev: >>>>>>> > http://cr.openjdk.java.net/~ra >>>>>>> sbold/8171119/webrev.06/ >>>>>> asbold/8171119/webrev.06/> >>>>>>> >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/>> >>>>>>> >>>>>>> > >>>>>>> > Here is an update: >>>>>>> > >>>>>>> > - @Robbin, I forgot to say that yes I need to look at >>>>>>> implementing >>>>>>> > this for the other architectures and testing it >>>>>>> before it is all >>>>>>> > ready to go. Is it common to have it working on all >>>>>>> possible >>>>>>> > combinations or is there a subset that I should be >>>>>>> doing first and we >>>>>>> > can do the others later? >>>>>>> > - I've tested slowdebug, built and ran the JTreg >>>>>>> tests I wrote with >>>>>>> > slowdebug and fixed a few more issues >>>>>>> > - I've refactored a bit of the code following Thomas' >>>>>>> comments >>>>>>> > - I think I've handled all the comments from >>>>>>> Thomas (I put >>>>>>> > comments inline below for the specifics) >>>>>>> >>>>>>> Thanks for handling all those. >>>>>>> >>>>>>> > - Following Thomas' comments on statistics, I want to >>>>>>> add some >>>>>>> > quality assurance tests and find that the easiest way >>>>>>> would be to >>>>>>> > have a few counters of what is happening in the >>>>>>> sampler and expose >>>>>>> > that to the user. >>>>>>> > - I'll be adding that in the next version if no >>>>>>> one sees any >>>>>>> > objections to that. 
>>>>>>> > - This will allow me to add a sanity test in JTreg >>>>>>> about number of >>>>>>> > samples and average of sampling rate >>>>>>> > >>>>>>> > @Thomas: I had a few questions that I inlined below >>>>>>> but I will >>>>>>> > summarize the "bigger ones" here: >>>>>>> > - You mentioned constants are not using the right >>>>>>> conventions, I >>>>>>> > looked around and didn't see any convention except >>>>>>> normal naming then >>>>>>> > for static constants. Is that right? >>>>>>> >>>>>>> I looked through https://wiki.openjdk.java.net/ >>>>>>> display/HotSpot/StyleGui >>>>>> /display/HotSpot/StyleGui> >>>>>>> >>>>>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui>> >>>>>>> de and the rule is to "follow an existing pattern and >>>>>>> must have a >>>>>>> distinct appearance from other names". Which does not >>>>>>> help a lot I >>>>>>> guess :/ The GC team started using upper camel case, >>>>>>> e.g. >>>>>>> SomeOtherConstant, but very likely this is probably not >>>>>>> applied >>>>>>> consistently throughout. So I am fine with not adding >>>>>>> another style >>>>>>> (like kMaxStackDepth with the "k" in front with some >>>>>>> unknown meaning) >>>>>>> is fine. >>>>>>> >>>>>>> (Chances are you will find that style somewhere used >>>>>>> anyway too, >>>>>>> apologies if so :/) >>>>>>> >>>>>>> >>>>>>> Thanks for that link, now I know where to look. I used the >>>>>>> upper camel case in my code as well then :) I should have gotten them all. >>>>>>> >>>>>>> >>>>>>> > PS: I've also inlined my answers to Thomas below: >>>>>>> > >>>>>>> > On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl >>>>>>> >>>>>> > e.com > wrote: >>>>>>> > > Hi all, >>>>>>> > > >>>>>>> > > On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote: >>>>>>> > > > Dear all, >>>>>>> > > > >>>>>>> > > > I've continued working on this and have done the >>>>>>> following >>>>>>> > > webrev: >>>>>>> > > > http://cr.openjdk.java.net/~ra >>>>>>> sbold/8171119/webrev.05/ >>>>>> asbold/8171119/webrev.05/> >>>>>>> >>>>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/>> >>>>>>> >>>>>>> > > >>>>>>> > > [...] >>>>>>> > > > Things I still need to do: >>>>>>> > > > - Have to fix that TLAB case for the >>>>>>> FastTLABRefill >>>>>>> > > > - Have to start looking at the data to see >>>>>>> that it is >>>>>>> > > consistent and does gather the right samples, >>>>>>> right frequency, etc. >>>>>>> > > > - Have to check the GC elements and what that >>>>>>> produces >>>>>>> > > > - Run a slowdebug run and ensure I fixed all >>>>>>> those issues you >>>>>>> > > saw > Robbin >>>>>>> > > > >>>>>>> > > > Thanks for looking at the webrev and have a >>>>>>> great week! >>>>>>> > > >>>>>>> > > scratching a bit on the surface of this change, >>>>>>> so apologies for >>>>>>> > > rather shallow comments: >>>>>>> > > >>>>>>> > > - macroAssembler_x86.cpp:5604: while this is >>>>>>> compiler code, and I >>>>>>> > > am not sure this is final, please avoid littering >>>>>>> the code with >>>>>>> > > TODO remarks :) They tend to be candidates for >>>>>>> later wtf moments >>>>>>> > > only. >>>>>>> > > >>>>>>> > > Just file a CR for that. >>>>>>> > > >>>>>>> > Newcomer question: what is a CR and not sure I have >>>>>>> the rights to do >>>>>>> > that yet ? :) >>>>>>> >>>>>>> Apologies. CR is a change request, this suggests to >>>>>>> file a bug in the >>>>>>> bug tracker. And you are right, you can't just create a >>>>>>> new account in >>>>>>> the OpenJDK JIRA yourselves. 
:( >>>>>>> >>>>>>> >>>>>>> Ok good to know, I'll continue with my own todo list but >>>>>>> I'll work hard on not letting it slip in the webrevs anymore :) >>>>>>> >>>>>>> >>>>>>> I was mostly referring to the "... but it is a TODO" >>>>>>> part of that >>>>>>> comment in macroassembler_x86.cpp. Comments about the >>>>>>> why of the code >>>>>>> are appreciated. >>>>>>> >>>>>>> [Note that I now understand that this is to some degree >>>>>>> still work in >>>>>>> progress. As long as the final changeset does no >>>>>>> contain TODO's I am >>>>>>> fine (and it's not a hard objection, rather their use >>>>>>> in "final" code >>>>>>> is typically limited in my experience)] >>>>>>> >>>>>>> 5603 // Currently, if this happens, just set back the >>>>>>> actual end to >>>>>>> where it was. >>>>>>> 5604 // We miss a chance to sample here. >>>>>>> >>>>>>> Would be okay, if explaining "this" and the "why" of >>>>>>> missing a chance >>>>>>> to sample here would be best. >>>>>>> >>>>>>> Like maybe: >>>>>>> >>>>>>> // If we needed to refill TLABs, just set the actual >>>>>>> end point to >>>>>>> // the end of the TLAB again. We do not sample here >>>>>>> although we could. >>>>>>> >>>>>>> Done with your comment, it works well in my mind. >>>>>>> >>>>>>> I am not sure whether "miss a chance to sample" meant >>>>>>> "we could, but >>>>>>> consciously don't because it's not that useful" or "it >>>>>>> would be >>>>>>> necessary but don't because it's too complicated to >>>>>>> do.". >>>>>>> >>>>>>> Looking at the original comment once more, I am also >>>>>>> not sure if that >>>>>>> comment shouldn't referring to the "end" variable (not >>>>>>> actual_end) >>>>>>> because that's the variable that is responsible for >>>>>>> taking the sampling >>>>>>> path? (Going from the member description of >>>>>>> ThreadLocalAllocBuffer). >>>>>>> >>>>>>> >>>>>>> I've moved this code and it no longer shows up here but the >>>>>>> rationale and answer was: >>>>>>> >>>>>>> So.. Yes, end is the variable provoking the sampling. Actual >>>>>>> end is the actual end of the TLAB. >>>>>>> >>>>>>> What was happening here is that the code is resetting _end >>>>>>> to point towards the end of the new TLAB. Because, we now have the end for >>>>>>> sampling and _actual_end for >>>>>>> the actual end, we need to update the actual_end as well. >>>>>>> >>>>>>> Normally, were we to do the real work here, we would >>>>>>> calculate the (end - start) offset, then do: >>>>>>> >>>>>>> - Set the new end to : start + (old_end - old_start) >>>>>>> - Set the actual end like we do here now where it because it >>>>>>> is the actual end. >>>>>>> >>>>>>> Why is this not done here now anymore? >>>>>>> - I was still debating which path to take: >>>>>>> - Do it in the fast refill code, it has its perks: >>>>>>> - In a world where fast refills are happening all >>>>>>> the time or a lot, we can augment there the code to do the sampling >>>>>>> - Remember what we had as an end before leaving the >>>>>>> slowpath and check on return >>>>>>> - This is what I'm doing now, it removes the need >>>>>>> to go fix up all fast refill paths but if you remain in fast refill paths, >>>>>>> you won't get sampling. I >>>>>>> have to think of the consequences of that, maybe a future >>>>>>> change later on? 
>>>>>>> - I have the statistics now so I'm going to >>>>>>> study that >>>>>>> -> By the way, though my statistics are >>>>>>> showing I'm missing some samples, if I turn off FastTlabRefill, it is the >>>>>>> same loss so for now, it seems >>>>>>> this does not occur in my simple test. >>>>>>> >>>>>>> >>>>>>> >>>>>>> But maybe I am only confused and it's best to just >>>>>>> leave the comment >>>>>>> away. :) >>>>>>> >>>>>>> Thinking about it some more, doesn't this not-sampling >>>>>>> in this case >>>>>>> mean that sampling does not work in any collector that >>>>>>> does inline TLAB >>>>>>> allocation at the moment? (Or is inline TLAB alloc >>>>>>> automatically >>>>>>> disabled with sampling somehow?) >>>>>>> >>>>>>> That would indeed be a bigger TODO then :) >>>>>>> >>>>>>> >>>>>>> Agreed, this remark made me think that perhaps as a first >>>>>>> step the new way of doing it is better but I did have to: >>>>>>> - Remove the const of the ThreadLocalBuffer remaining and >>>>>>> hard_end methods >>>>>>> - Move hard_end out of the header file to have a bit more >>>>>>> logic there >>>>>>> >>>>>>> Please let me know what you think of that and if you prefer >>>>>>> it this way or changing the fast refills. (I prefer this way now because it >>>>>>> is more incremental). >>>>>>> >>>>>>> >>>>>>> > > - calling HeapMonitoring::do_weak_oops() (which >>>>>>> should probably be >>>>>>> > > called weak_oops_do() like other similar methods) >>>>>>> only if string >>>>>>> > > deduplication is enabled (in >>>>>>> g1CollectedHeap.cpp:4511) seems wrong. >>>>>>> > >>>>>>> > The call should be at least around 6 lines up outside >>>>>>> the if. >>>>>>> > >>>>>>> > Preferentially in a method like >>>>>>> process_weak_jni_handles(), including >>>>>>> > additional logging. (No new (G1) gc phase without >>>>>>> minimal logging >>>>>>> > :)). >>>>>>> > Done but really not sure because: >>>>>>> > >>>>>>> > I put for logging: >>>>>>> > log_develop_trace(gc, >>>>>>> freelist)("G1ConcRegionFreeing [other] : heap >>>>>>> > monitoring"); >>>>>>> >>>>>>> I would think that "gc, ref" would be more appropriate >>>>>>> log tags for >>>>>>> this similar to jni handles. >>>>>>> (I am als not sure what weak reference handling has to >>>>>>> do with >>>>>>> G1ConcRegionFreeing, so I am a bit puzzled) >>>>>>> >>>>>>> >>>>>>> I was not sure what to put for the tags or really as the >>>>>>> message. I cleaned it up a bit now to: >>>>>>> log_develop_trace(gc, ref)("HeapSampling [other] : heap >>>>>>> monitoring processing"); >>>>>>> >>>>>>> >>>>>>> >>>>>>> > Since weak_jni_handles didn't have logging for me to >>>>>>> be inspired >>>>>>> > from, I did that but unconvinced this is what should >>>>>>> be done. >>>>>>> >>>>>>> The JNI handle processing does have logging, but only in >>>>>>> ReferenceProcessor::process_discovered_references(). In >>>>>>> process_weak_jni_handles() only overall time is >>>>>>> measured (in a G1 >>>>>>> specific way, since only G1 supports disabling >>>>>>> reference procesing) :/ >>>>>>> >>>>>>> The code in ReferenceProcessor prints both time taken >>>>>>> referenceProcessor.cpp:254, as well as the count, but >>>>>>> strangely only in >>>>>>> debug VMs. >>>>>>> >>>>>>> I have no idea why this logging is that unimportant to >>>>>>> only print that >>>>>>> in a debug VM. However there are reviews out for >>>>>>> changing this area a >>>>>>> bit, so it might be useful to wait for that >>>>>>> (JDK-8173335). 
>>>>>>> >>>>>>> >>>>>>> I cleaned it up a bit anyway and now it returns the count of >>>>>>> objects that are in the system. >>>>>>> >>>>>>> >>>>>>> > > - the change doubles the size of >>>>>>> > > CollectedHeap::allocate_from_tlab_slow() above the >>>>>>> "small and nice" >>>>>>> > > threshold. Maybe it could be refactored a bit. >>>>>>> > Done I think, it looks better to me :). >>>>>>> >>>>>>> In ThreadLocalAllocBuffer::handle_sample() I think the >>>>>>> set_back_actual_end()/pick_next_sample() calls could >>>>>>> be hoisted out of >>>>>>> the "if" :) >>>>>>> >>>>>>> >>>>>>> Done! >>>>>>> >>>>>>> >>>>>>> > > - referenceProcessor.cpp:261: the change should add >>>>>>> logging about >>>>>>> > > the number of references encountered, maybe after >>>>>>> the corresponding >>>>>>> > > "JNI weak reference count" log message. >>>>>>> > Just to double check, are you saying that you'd like >>>>>>> to have the heap >>>>>>> > sampler to keep in store how many sampled objects >>>>>>> were encountered in >>>>>>> > the HeapMonitoring::weak_oops_do? >>>>>>> > - Would a return of the method with the number of >>>>>>> handled >>>>>>> > references and logging that work? >>>>>>> >>>>>>> Yes, it's fine if HeapMonitoring::weak_oops_do() only >>>>>>> returned the >>>>>>> number of processed weak oops. >>>>>>> >>>>>>> >>>>>>> Done also (but I admit I have not tested the output yet) :) >>>>>>> >>>>>>> >>>>>>> > - Additionally, would you prefer it in a separate >>>>>>> block with its >>>>>>> > GCTraceTime? >>>>>>> >>>>>>> Yes. Both kinds of information is interesting: while >>>>>>> the time taken is >>>>>>> typically more important, the next question would be >>>>>>> why, and the >>>>>>> number of references typically goes a long way there. >>>>>>> >>>>>>> See above though, it is probably best to wait a bit. >>>>>>> >>>>>>> >>>>>>> Agreed that I "could" wait but, if it's ok, I'll just >>>>>>> refactor/remove this when we get closer to something final. Either, >>>>>>> JDK-8173335 >>>>>>> has gone in and I will notice it now or it will soon and I >>>>>>> can change it then. >>>>>>> >>>>>>> >>>>>>> > > - threadLocalAllocBuffer.cpp:331: one more "TODO" >>>>>>> > Removed it and added it to my personal todos to look >>>>>>> at. >>>>>>> > > > >>>>>>> > > - threadLocalAllocBuffer.hpp: >>>>>>> ThreadLocalAllocBuffer class >>>>>>> > > documentation should be updated about the sampling >>>>>>> additions. I >>>>>>> > > would have no clue what the difference between >>>>>>> "actual_end" and >>>>>>> > > "end" would be from the given information. >>>>>>> > If you are talking about the comments in this file, I >>>>>>> made them more >>>>>>> > clear I hope in the new webrev. If it was somewhere >>>>>>> else, let me know >>>>>>> > where to change. >>>>>>> >>>>>>> Thanks, that's much better. Maybe a note in the comment >>>>>>> of the class >>>>>>> that ThreadLocalBuffer provides some sampling facility >>>>>>> by modifying the >>>>>>> end() of the TLAB to cause "frequent" calls into the >>>>>>> runtime call where >>>>>>> actual sampling takes place. >>>>>>> >>>>>>> >>>>>>> Done, I think it's better now. Added something about the >>>>>>> slow_path_end as well. >>>>>>> >>>>>>> >>>>>>> > > - in heapMonitoring.hpp: there are some random >>>>>>> comments about some >>>>>>> > > code that has been grabbed from >>>>>>> "util/math/fastmath.[h|cc]". I >>>>>>> > > can't tell whether this is code that can be used >>>>>>> but I assume that >>>>>>> > > Noam Shazeer is okay with that (i.e. that's all >>>>>>> Google code). 
>>>>>>> > Jeremy and I double checked and we can release that >>>>>>> as I thought. I >>>>>>> > removed the comment from that piece of code entirely. >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> > > - heapMonitoring.hpp/cpp static constant naming >>>>>>> does not correspond >>>>>>> > > to Hotspot's. Additionally, in Hotspot static >>>>>>> methods are cased >>>>>>> > > like other methods. >>>>>>> > I think I fixed the methods to be cased the same way >>>>>>> as all other >>>>>>> > methods. For static constants, I was not sure. I >>>>>>> fixed a few other >>>>>>> > variables but I could not seem to really see a >>>>>>> consistent trend for >>>>>>> > constants. I made them as variables but I'm not sure >>>>>>> now. >>>>>>> >>>>>>> Sorry again, style is a kind of mess. The goal of my >>>>>>> suggestions here >>>>>>> is only to prevent yet another style creeping in. >>>>>>> >>>>>>> > > - in heapMonitoring.cpp there are a few cryptic >>>>>>> comments at the top >>>>>>> > > that seem to refer to internal stuff that should >>>>>>> probably be >>>>>>> > > removed. >>>>>>> > Sorry about that! My personal todos not cleared out. >>>>>>> >>>>>>> I am happy about comments, but I simply did not >>>>>>> understand any of that >>>>>>> and I do not know about other readers as well. >>>>>>> >>>>>>> If you think you will remember removing/updating them >>>>>>> until the review >>>>>>> proper (I misunderstood the review situation a little >>>>>>> it seems). >>>>>>> >>>>>>> > > I did not think through the impact of the TLAB >>>>>>> changes on collector >>>>>>> > > behavior yet (if there are). Also I did not check >>>>>>> for problems with >>>>>>> > > concurrent mark and SATB/G1 (if there are). >>>>>>> > I would love to know your thoughts on this, I think >>>>>>> this is fine. I >>>>>>> >>>>>>> I think so too now. No objects are made live out of >>>>>>> thin air :) >>>>>>> >>>>>>> > see issues with multiple threads right now hitting >>>>>>> the stack storage >>>>>>> > instance. Previous webrevs had a mutex lock here but >>>>>>> we took it out >>>>>>> > for simplificity (and only for now). >>>>>>> >>>>>>> :) When looking at this after some thinking I now >>>>>>> assume for this >>>>>>> review that this code is not MT safe at all. There >>>>>>> seems to be more >>>>>>> synchronization missing than just the one for the >>>>>>> StackTraceStorage. So >>>>>>> no comments about this here. >>>>>>> >>>>>>> >>>>>>> I doubled checked a bit (quickly I admit) but it seems that >>>>>>> synchronization in StackTraceStorage is really all you need (all methods >>>>>>> lead to a StackTraceStorage one >>>>>>> and can be multithreaded outside of that). >>>>>>> There is a question about the initialization where the >>>>>>> method HeapMonitoring::initialize_profiling is not thread safe. >>>>>>> It would work (famous last words) and not crash if there was >>>>>>> a race but we could add a synchronization point there as well (and >>>>>>> therefore on the stop as well). >>>>>>> >>>>>>> But anyway I will really check and do this once we add back >>>>>>> synchronization. >>>>>>> >>>>>>> >>>>>>> Also, this would require some kind of specification of >>>>>>> what is allowed >>>>>>> to be called when and where. >>>>>>> >>>>>>> >>>>>>> Would we specify this with the methods in the jvmti.xml >>>>>>> file? We could start by specifying in each that they are not thread safe >>>>>>> but I saw no mention of that for >>>>>>> other methods. 
>>>>>>> >>>>>>> >>>>>>> One potentially relevant observation about locking >>>>>>> here: depending on >>>>>>> sampling frequency, StackTraceStore::add_trace() may be >>>>>>> rather >>>>>>> frequently called. I assume that you are going to do >>>>>>> measurements :) >>>>>>> >>>>>>> >>>>>>> Though we don't have the TLAB implementation in our code, >>>>>>> the compiler generated sampler uses 2% of overhead with a 512k sampling >>>>>>> rate. I can do real measurements >>>>>>> when the code settles and we can see how costly this is as a >>>>>>> TLAB implementation. >>>>>>> However, my theory is that if the rate is 512k, the >>>>>>> memory/performance overhead should be minimal since it is what we saw with >>>>>>> our code/workloads (though not called >>>>>>> the same way, we call it essentially at the same rate). >>>>>>> If you have a benchmark you'd like me to test, let me know! >>>>>>> >>>>>>> Right now, with my really small test, this does use a bit of >>>>>>> overhead even for a 512k sample size. I don't know yet why, I'm going to >>>>>>> see what is going on. >>>>>>> >>>>>>> Finally, I think it is not reasonable to suppose the >>>>>>> overhead to be negligible if the sampling rate used is too low. The user >>>>>>> should know that the lower the rate, >>>>>>> the higher the overhead (documentation TODO?). >>>>>>> >>>>>>> >>>>>>> I am not sure what the expected usage of the API is, but >>>>>>> StackTraceStore::add_trace() seems to be able to grow >>>>>>> without bounds. >>>>>>> Only a GC truncates them to the live ones. That in >>>>>>> itself seems to be >>>>>>> problematic (GCs can be *wide* apart), and of course >>>>>>> some of the API >>>>>>> methods add to that because they duplicate that >>>>>>> unbounded array. Do you >>>>>>> have any concerns/measurements about this? >>>>>>> >>>>>>> >>>>>>> So, the theory is that yes add_trace can be able to grow >>>>>>> without bounds but it grows at a sample per 512k of allocated space. The >>>>>>> stacks it gathers are currently >>>>>>> maxed at 64 (I'd like to expand that to an option to the >>>>>>> user though at some point). So I have no concerns because: >>>>>>> >>>>>>> - If really this is taking a lot of space, that means the >>>>>>> job is keeping a lot of objects in memory as well, therefore the entire >>>>>>> heap is getting huge >>>>>>> - If this is the case, you will be triggering a GC at some >>>>>>> point anyway. >>>>>>> >>>>>>> (I'm putting under the rug the issue of "What if we set the >>>>>>> rate to 1 for example" because as you lower the sampling rate, we cannot >>>>>>> guarantee low overhead; the >>>>>>> idea behind this feature is to have a means of having >>>>>>> meaningful allocated samples at a low overhead) >>>>>>> >>>>>>> I have no measurements really right now but since I now have >>>>>>> some statistics I can poll, I will look a bit more at this question. >>>>>>> >>>>>>> I have the same last sentence than above: the user should >>>>>>> expect this to happen if the sampling rate is too small. That probably can >>>>>>> be reflected in the >>>>>>> StartHeapSampling as a note : careful this might impact your >>>>>>> performance. >>>>>>> >>>>>>> >>>>>>> Also, these stack traces might hold on to huge arrays. >>>>>>> Any >>>>>>> consideration of that? Particularly it might be the >>>>>>> cause for OOMEs in >>>>>>> tight memory situations. >>>>>>> >>>>>>> >>>>>>> There is a stack size maximum that is set to 64 so it should >>>>>>> not hold huge arrays. 
I don't think this is an issue but I can double check >>>>>>> with a test or two. >>>>>>> >>>>>>> >>>>>>> - please consider adding a safepoint check in >>>>>>> HeapMonitoring::weak_oops_do to prevent accidental >>>>>>> misuse. >>>>>>> >>>>>>> - in struct StackTraceStorage, the public fields may >>>>>>> also need >>>>>>> underscores. At least some files in the runtime >>>>>>> directory have structs >>>>>>> with underscored public members (and some don't). The >>>>>>> runtime team >>>>>>> should probably comment on that. >>>>>>> >>>>>>> >>>>>>> Agreed I did not know. I looked around and a lot of structs >>>>>>> did not have them it seemed so I left it as is. I will happily change it if >>>>>>> someone prefers (I was not >>>>>>> sure if you really preferred or not, your sentence seemed to >>>>>>> be more a note of "this might need to change but I don't know if the >>>>>>> runtime team enforces that", let >>>>>>> me know if I read that wrongly). >>>>>>> >>>>>>> >>>>>>> - In StackTraceStorage::weak_oops_do(), when examining >>>>>>> the >>>>>>> StackTraceData, maybe it is useful to consider having a >>>>>>> non-NULL >>>>>>> reference outside of the heap's reserved space an >>>>>>> error. There should >>>>>>> be no oop outside of the heap's reserved space ever. >>>>>>> >>>>>>> Unless you allow storing random values in >>>>>>> StackTraceData::obj, which I >>>>>>> would not encourage. >>>>>>> >>>>>>> >>>>>>> I suppose you are talking about this part: >>>>>>> if ((value != NULL && Universe::heap()->is_in_reserved(value)) >>>>>>> && >>>>>>> (is_alive == NULL || >>>>>>> is_alive->do_object_b(value))) { >>>>>>> >>>>>>> What you are saying is that I could have something like: >>>>>>> if (value != my_non_null_reference && >>>>>>> (is_alive == NULL || >>>>>>> is_alive->do_object_b(value))) { >>>>>>> >>>>>>> Is that what you meant? Is there really a reason to do so? >>>>>>> When I look at the code, is_in_reserved seems like a O(1) method call. I'm >>>>>>> not even sure we can have a >>>>>>> NULL value to be honest. I might have to study that to see >>>>>>> if this was not a paranoid test to begin with. >>>>>>> >>>>>>> The is_alive code has now morphed due to the comment below. >>>>>>> >>>>>>> >>>>>>> >>>>>>> - HeapMonitoring::weak_oops_do() does not seem to use >>>>>>> the >>>>>>> passed AbstractRefProcTaskExecutor. >>>>>>> >>>>>>> >>>>>>> It did use it: >>>>>>> size_t HeapMonitoring::weak_oops_do( >>>>>>> AbstractRefProcTaskExecutor *task_executor, >>>>>>> BoolObjectClosure* is_alive, >>>>>>> OopClosure *f, >>>>>>> VoidClosure *complete_gc) { >>>>>>> assert(SafepointSynchronize::is_at_safepoint(), "must >>>>>>> be at safepoint"); >>>>>>> >>>>>>> if (task_executor != NULL) { >>>>>>> task_executor->set_single_threaded_mode(); >>>>>>> } >>>>>>> return StackTraceStorage::storage()->weak_oops_do(is_alive, >>>>>>> f, complete_gc); >>>>>>> } >>>>>>> >>>>>>> But due to the comment below, I refactored this, so this is >>>>>>> no longer here. Now I have an always true closure that is passed. >>>>>>> >>>>>>> >>>>>>> - I do not understand allowing to call this method with >>>>>>> a NULL >>>>>>> complete_gc closure. This would mean that objects >>>>>>> referenced from the >>>>>>> object that is referenced by the StackTraceData are not >>>>>>> pulled, meaning >>>>>>> they would get stale. >>>>>>> >>>>>>> - same with is_alive parameter value of NULL >>>>>>> >>>>>>> >>>>>>> So these questions made me look a bit closer at this code. 
>>>>>>> This code I think was written this way to have a very small impact on the >>>>>>> file but you are right, there >>>>>>> is no reason for this here. I've simplified the code by >>>>>>> making in referenceProcessor.cpp a process_HeapSampling method that handles >>>>>>> everything there. >>>>>>> >>>>>>> The code allowed NULLs because it depended on where you were >>>>>>> coming from and how the code was being called. >>>>>>> >>>>>>> - I added a static always_true variable and pass that now to >>>>>>> be more consistent with the rest of the code. >>>>>>> - I moved the complete_gc into process_phaseHeapSampling now >>>>>>> (new method) and handle the task_executor and the complete_gc there >>>>>>> - Newbie question: in our code we did a >>>>>>> set_single_threaded_mode but I see that process_phaseJNI does it right >>>>>>> before its call, do I need to do it for the >>>>>>> process_phaseHeapSample? >>>>>>> That API is much cleaner (in my mind) and is consistent with >>>>>>> what is done around it (again in my mind). >>>>>>> >>>>>>> >>>>>>> - heapMonitoring.cpp:590: I do not completely >>>>>>> understand the purpose of >>>>>>> this code: in the end this results in a fixed value >>>>>>> directly dependent >>>>>>> on the Thread address anyway? In the end this results >>>>>>> in a fixed value >>>>>>> directly dependent on the Thread address anyway? >>>>>>> IOW, what is special about exactly 20 rounds? >>>>>>> >>>>>>> >>>>>>> So we really want a fast random number generator that has a >>>>>>> specific mean (512k is the default we use). The code uses the thread >>>>>>> address as the start number of the >>>>>>> sequence (why not, it is random enough is rationale). Then >>>>>>> instead of just starting there, we prime the sequence and really only start >>>>>>> at the 21st number, it is >>>>>>> arbitrary and I have not done a study to see if we could do >>>>>>> more or less of that. >>>>>>> >>>>>>> As I have the statistics of the system up and running, I'll >>>>>>> run some experiments to see if this is needed, is 20 good, or not. >>>>>>> >>>>>>> >>>>>>> - also I would consider stripping a few bits of the >>>>>>> threads' address as >>>>>>> initialization value for your rng. The last three bits >>>>>>> (and probably >>>>>>> more, check whether the Thread object is allocated on >>>>>>> special >>>>>>> boundaries) are always zero for them. >>>>>>> Not sure if the given "random" value is random enough >>>>>>> before/after, >>>>>>> this method, so just skip that comment if you think >>>>>>> this is not >>>>>>> required. >>>>>>> >>>>>>> >>>>>>> I don't know is the honest answer. I think what is important >>>>>>> is that we tend towards a mean and it is random "enough" to not fall in >>>>>>> pitfalls of only sampling a >>>>>>> subset of objects due to their allocation order. I added >>>>>>> that as test to do to see if it changes the mean in any way for the 512k >>>>>>> default value and/or if the first >>>>>>> 1000 elements look better. >>>>>>> >>>>>>> >>>>>>> Some more random nits I did not find a place to put >>>>>>> anywhere: >>>>>>> >>>>>>> - ThreadLocalAllocBuffer::_extra_space does not seem >>>>>>> to be used >>>>>>> anywhere? >>>>>>> >>>>>>> >>>>>>> Good catch :). >>>>>>> >>>>>>> >>>>>>> - Maybe indent the declaration of >>>>>>> ThreadLocalAllocBuffer::_bytes_until_sample to align below the >>>>>>> other members of that group. >>>>>>> >>>>>>> >>>>>>> Done moved it up a bit to have non static members together >>>>>>> and static separate. 
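>>>>>>>
>>>>>>> Coming back to the earlier question about the 20 priming rounds and
>>>>>>> the 512k mean, the experiment I have in mind is essentially the
>>>>>>> following (a stand-in sketch only: the real generator lives in
>>>>>>> heapMonitoring.cpp and is not java.util.Random, so this only shows the
>>>>>>> shape of the measurement, not the actual numbers):
>>>>>>>
>>>>>>>   import java.util.Random;
>>>>>>>
>>>>>>>   public class SampleIntervalMean {
>>>>>>>     static final double MEAN = 512 * 1024; // target average bytes between samples
>>>>>>>
>>>>>>>     // One "bytes until next sample" draw with the given mean.
>>>>>>>     static double nextInterval(Random rng) {
>>>>>>>       return -Math.log(1.0 - rng.nextDouble()) * MEAN;
>>>>>>>     }
>>>>>>>
>>>>>>>     public static void main(String[] args) {
>>>>>>>       int warmup = Integer.parseInt(args[0]);  // e.g. 0 or 20
>>>>>>>       int samples = Integer.parseInt(args[1]); // e.g. 62, 788 or 57721
>>>>>>>       Random rng = new Random(42);             // seed stands in for the thread address
>>>>>>>       for (int i = 0; i < warmup; i++) {
>>>>>>>         nextInterval(rng);                     // discard the primed values
>>>>>>>       }
>>>>>>>       double sum = 0;
>>>>>>>       for (int i = 0; i < samples; i++) {
>>>>>>>         sum += nextInterval(rng);
>>>>>>>       }
>>>>>>>       System.out.println("average interval = " + (sum / samples));
>>>>>>>     }
>>>>>>>   }
>>>>>>>
>>>>>>> The point is only to see how quickly the running average converges to
>>>>>>> 512k for the sample counts observed above, with and without priming.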
>>>>>>> >>>>>>> Thanks, >>>>>>> Thomas >>>>>>> >>>>>>> >>>>>>> Thanks for your review! >>>>>>> Jc >>>>>>> >>>>>>> >>>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From coleen.phillimore at oracle.com Tue Oct 10 02:36:15 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 9 Oct 2017 22:36:15 -0400 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> Message-ID: <0a232ef3-8e2b-3025-14ba-eaf8c6b409fe@oracle.com> This seems ok to me with Jamsheed's explanation. Thanks, Coleen On 9/14/17 2:54 AM, Dean Long wrote: > It looks like you accidentally dropped > hotspot-compiler-dev at openjdk.java.net when you added runtime. > > dl > > > On 9/13/2017 11:21 PM, jamsheed wrote: >> (adding runtime list for inputs) >> >> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>> brief desc: special handling of Object. in >>> TemplateInterpreter::deopt_reexecute_entry >>> >>> required last_sp to be reset explicitly in normal return path >>> >>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, >>> address bcp) { >>> ? assert(method->contains(bcp), "just checkin'"); >>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>> ? if (code == Bytecodes::_return) { >>> ??? // This is used for deopt during registration of finalizers >>> ??? // during Object..? We simply need to resume execution at >>> ??? // the standard return vtos bytecode to pop the frame normally. >>> ??? // reexecuting the real bytecode would cause double registration >>> ??? // of the finalizable object. >>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >> >> last_sp ! = null not an issue for this case, so i skip the assert in >> debug build >> >> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >> >> Please review. >> >> Best Regards, >> Jamsheed >> >> >> >> >> > From jamsheed.c.m at oracle.com Tue Oct 10 06:05:37 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Tue, 10 Oct 2017 11:35:37 +0530 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <0a232ef3-8e2b-3025-14ba-eaf8c6b409fe@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <0a232ef3-8e2b-3025-14ba-eaf8c6b409fe@oracle.com> Message-ID: <0df5882b-b94c-569c-bed8-33f502e7fd8d@oracle.com> Thanks for the review, Coleen Best regards, Jamsheed On Tuesday 10 October 2017 08:06 AM, coleen.phillimore at oracle.com wrote: > > This seems ok to me with Jamsheed's explanation. > Thanks, > Coleen > > On 9/14/17 2:54 AM, Dean Long wrote: >> It looks like you accidentally dropped >> hotspot-compiler-dev at openjdk.java.net when you added runtime. >> >> dl >> >> >> On 9/13/2017 11:21 PM, jamsheed wrote: >>> (adding runtime list for inputs) >>> >>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>> brief desc: special handling of Object. in >>>> TemplateInterpreter::deopt_reexecute_entry >>>> >>>> required last_sp to be reset explicitly in normal return path >>>> >>>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, >>>> address bcp) { >>>> ? assert(method->contains(bcp), "just checkin'"); >>>> ? Bytecodes::Code code?? 
= Bytecodes::java_code_at(method, bcp); >>>> ? if (code == Bytecodes::_return) { >>>> ??? // This is used for deopt during registration of finalizers >>>> ??? // during Object..? We simply need to resume execution at >>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>> ??? // reexecuting the real bytecode would cause double registration >>>> ??? // of the finalizable object. >>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>> >>> last_sp ! = null not an issue for this case, so i skip the assert in >>> debug build >>> >>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>> >>> Please review. >>> >>> Best Regards, >>> Jamsheed >>> >>> >>> >>> >>> >> > From dmitry.chuyko at bell-sw.com Tue Oct 10 14:54:33 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 10 Oct 2017 17:54:33 +0300 Subject: RFR (XS): 8188221 - AARCH64: Return type profiling is not performed from aarch64 interpreter Message-ID: <1de74a72-ec83-9fed-ffc8-091af58de457@bell-sw.com> Hello, TestArrayCopyNoInitDeopt jtreg test (JDK-8072016) fails in -XX:-TieredCompilation mode because return type is not profiled in interpreter. Please review the fix, it adds profiling for aarch64 similar to how it's implemented for other cpus. bug: https://bugs.openjdk.java.net/browse/JDK-8188221 patch: jdk10.patch attached -Dmitry -------------- next part -------------- A non-text attachment was scrubbed... Name: jdk10.patch Type: text/x-patch Size: 712 bytes Desc: not available URL: From vladimir.kozlov at oracle.com Tue Oct 10 15:11:43 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Oct 2017 08:11:43 -0700 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> Message-ID: Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only 32-bit affected? Thanks, Vladimir On 9/13/17 11:54 PM, Dean Long wrote: > It looks like you accidentally dropped hotspot-compiler-dev at openjdk.java.net when you added runtime. > > dl > > > On 9/13/2017 11:21 PM, jamsheed wrote: >> (adding runtime list for inputs) >> >> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>> brief desc: special handling of Object. in TemplateInterpreter::deopt_reexecute_entry >>> >>> required last_sp to be reset explicitly in normal return path >>> >>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, address bcp) { >>> ? assert(method->contains(bcp), "just checkin'"); >>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>> ? if (code == Bytecodes::_return) { >>> ??? // This is used for deopt during registration of finalizers >>> ??? // during Object..? We simply need to resume execution at >>> ??? // the standard return vtos bytecode to pop the frame normally. >>> ??? // reexecuting the real bytecode would cause double registration >>> ??? // of the finalizable object. >>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >> >> last_sp ! = null not an issue for this case, so i skip the assert in debug build >> >> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >> >> Please review. 
>> >> Best Regards, >> Jamsheed >> >> >> >> >> > From dmitry.chuyko at bell-sw.com Tue Oct 10 15:13:31 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 10 Oct 2017 18:13:31 +0300 Subject: RFR (XS): 8188221 - AARCH64: Return type profiling is not performed from aarch64 interpreter In-Reply-To: <1de74a72-ec83-9fed-ffc8-091af58de457@bell-sw.com> References: <1de74a72-ec83-9fed-ffc8-091af58de457@bell-sw.com> Message-ID: <122214a9-e4e3-5765-fdca-10f73f79bf2a@bell-sw.com> --- old/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp 2017-10-02 09:10:20.917960334 +0000 +++ new/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp 2017-10-02 09:10:20.293932959 +0000 @@ -414,6 +414,14 @@ ?? __ restore_constant_pool_cache(); ?? __ get_method(rmethod); +? if (state == atos) { +??? Register obj = r0; +??? Register mdp = r1; +??? Register tmp = r2; +??? __ ldr(mdp, Address(rmethod, Method::method_data_offset())); +??? __ profile_return_type(mdp, obj, tmp); +? } + ?? // Pop N words from the stack ?? __ get_cache_and_index_at_bcp(r1, r2, 1, index_size); ?? __ ldr(r1, Address(r1, ConstantPoolCache::base_offset() + ConstantPoolCacheEntry::flags_offset())); On 10/10/2017 05:54 PM, Dmitry Chuyko wrote: > Hello, > > TestArrayCopyNoInitDeopt jtreg test (JDK-8072016) fails in > -XX:-TieredCompilation mode because return type is not profiled in > interpreter. > Please review the fix, it adds profiling for aarch64 similar to how > it's implemented for other cpus. > > bug: https://bugs.openjdk.java.net/browse/JDK-8188221 > patch: jdk10.patch attached > > -Dmitry From nils.eliasson at oracle.com Wed Oct 11 09:28:13 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 11 Oct 2017 11:28:13 +0200 Subject: RFR (XXS): 8160303: parse_method_pattern only scans 254 chars In-Reply-To: <2923877c-af26-398c-658a-2bace3b34fd3@oracle.com> References: <7555f1ba-383f-3652-a701-765eda0417ac@oracle.com> <2923877c-af26-398c-658a-2bace3b34fd3@oracle.com> Message-ID: Hi, *redface* Correct, fixed! Regards, Nils Eliasson On 2017-09-19 20:45, Vladimir Kozlov wrote: > It should be 1022: one for '(' + one for \0 at the end. > > Vladimir > > On 9/19/17 3:54 AM, Nils Eliasson wrote: >> Hi, >> >> This patch fixes the wrong (too short) scan length in the signature >> parsing in methodMatcher.cpp. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8160303 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8160303/webrev.01/ >> >> >> Please review, >> >> Nils Eliasson >> >> From vladimir.x.ivanov at oracle.com Wed Oct 11 11:59:22 2017 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Oct 2017 14:59:22 +0300 Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: <30dbb109-c259-4529-b846-e4afffc94bd0@default> References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> Message-ID: <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> Looks good. Best regards, Vladimir Ivanov On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: > Hi, > > Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev > > Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes. > > Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ > jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 > Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 > > Test: Run jtreg and jprt hotspot testsets. 
> > Regards, > Muthusamy C > From jamsheed.c.m at oracle.com Wed Oct 11 12:48:05 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Wed, 11 Oct 2017 18:18:05 +0530 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> Message-ID: <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> Hi Vladimir, Thank you for pointing this. revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ Best Regards, Jamsheed On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: > Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only > 32-bit affected? > > Thanks, > Vladimir > > On 9/13/17 11:54 PM, Dean Long wrote: >> It looks like you accidentally dropped >> hotspot-compiler-dev at openjdk.java.net when you added runtime. >> >> dl >> >> >> On 9/13/2017 11:21 PM, jamsheed wrote: >>> (adding runtime list for inputs) >>> >>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>> brief desc: special handling of Object. in >>>> TemplateInterpreter::deopt_reexecute_entry >>>> >>>> required last_sp to be reset explicitly in normal return path >>>> >>>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, >>>> address bcp) { >>>> ? assert(method->contains(bcp), "just checkin'"); >>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>> ? if (code == Bytecodes::_return) { >>>> ??? // This is used for deopt during registration of finalizers >>>> ??? // during Object..? We simply need to resume execution at >>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>> ??? // reexecuting the real bytecode would cause double registration >>>> ??? // of the finalizable object. >>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>> >>> last_sp ! = null not an issue for this case, so i skip the assert in >>> debug build >>> >>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>> >>> Please review. >>> >>> Best Regards, >>> Jamsheed >>> >>> >>> >>> >>> >> From nils.eliasson at oracle.com Wed Oct 11 13:15:37 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 11 Oct 2017 15:15:37 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: References: Message-ID: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> Hi Roland, I have started reviewing and testing I will sponsor your change when the full review is completed. Best Regards, Nils On 2017-10-03 15:19, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/8186027/webrev.00/ > > This converts loop: > > for (int i = start; i < stop; i += inc) { > // body > } > > to a loop nest: > > i = start; > if (i < stop) { > do { > int next = MIN(stop, i+LoopStripMiningIter*inc); > do { > // body > i += inc; > } while (i < next); > safepoint(); > } while (i < stop); > } > > (It's actually: > int next = MIN(stop - i, LoopStripMiningIter*inc) + i; > to protect against overflows) > > This should bring the best of running with UseCountedLoopSafepoints on > and running with it off: low time to safepoint with little to no impact > on throughput. That change was first pushed to the shenandoah repo > several months ago and we've been running with it enabled since. > > The command line argument LoopStripMiningIter is the number of > iterations between safepoints. 
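>
> To make that concrete at the source level: a plain counted loop such as
>
>   for (int i = 0; i < a.length; i++) {
>     sum += a[i];
>   }
>
> run with -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000
> behaves roughly as if it had been written as the nest below (sketch only,
> assuming an int[] a and an int sum in scope; the compiler of course does
> this on the ideal graph, not on source code):
>
>   int i = 0;
>   if (i < a.length) {
>     do {
>       // at most 1000 inner iterations, using the overflow safe form above
>       int next = Math.min(a.length - i, 1000) + i;
>       do {
>         sum += a[i];
>         i++;
>       } while (i < next);
>       // the safepoint poll sits here, once per outer iteration
>     } while (i < a.length);
>   }
>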
In practice, with an arbitrary > LoopStripMiningIter=1000, we observe time to safepoint on par with the > current -XX:+UseCountedLoopSafepoints and most performance regressions > due to -XX:+UseCountedLoopSafepoints gone. The exception is when an > inner counted loop runs for a low number of iterations on average (and > the compiler doesn't have an upper bound on the number of iteration). > > This is enabled on the command line with: > -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 > > In PhaseIdealLoop::is_counted_loop(), when loop strip mining is enabled, > for an inner loop, the compiler builds a skeleton outer loop around the > the counted loop. The outer loop is kept as simple as possible so > required adjustments to the existing loop optimizations are not too > intrusive. The reason the outer loop is inserted early in the > optimization process is so that optimizations are not disrupted: an > alternate implementation could have kept the safepoint in the counted > loop until loop opts are over and then only have added the outer loop > and moved the safepoint to the outer loop. That would have prevented > nodes that are referenced in the safepoint to be sunk out of loop for > instance. > > The outer loop is a LoopNode with a backedge to a loop exit test and a > safepoint. The loop exit test is a CmpI with a new Opaque5Node. The > skeleton loop is populated with all required Phis after loop opts are > over during macro expansion. At that point only, the loop exit tests are > adjusted so the inner loop runs for at most LoopStripMiningIter. If the > compiler can prove the inner loop runs for no more than > LoopStripMiningIter then during macro expansion, the outer loop is > removed. The safepoint is removed only if the inner loop executes for > less than LoopStripMiningIterShortLoop so that if there are several > counted loops in a raw, we still poll for safepoints regularly. > > Until macro expansion, there can be only a few extra nodes in the outer > loop: nodes that would have sunk out of the inner loop and be kept in > the outer loop by the safepoint. > > PhaseIdealLoop::clone_loop() which is used by most loop opts has now > several ways of cloning a counted loop. For loop unswitching, both inner > and outer loops need to be cloned. For unrolling, only the inner loop > needs to be cloned. For pre/post loops insertion, only the inner loop > needs to be cloned but the control flow must connect one of the inner > loop copies to the outer loop of the other copy. > > Beyond verifying performance results with the usual benchmarks, when I > implemented that change, I wrote test cases for (hopefully) every loop > optimization and verified by inspection of the generated code that the > loop opt triggers correct with loop strip mining. > > Roland. From rwestrel at redhat.com Wed Oct 11 13:53:59 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 11 Oct 2017 15:53:59 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> References: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> Message-ID: > I have started reviewing and testing I will sponsor your change when the > full review is completed. Thanks! Roland. From dmitry.chuyko at bell-sw.com Wed Oct 11 16:30:54 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Wed, 11 Oct 2017 19:30:54 +0300 Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic Message-ID: Hello, Please review an improvement of CRC32 calculation on AArch64. 
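The code path being exercised is java.util.zip.CRC32 over byte arrays of
various lengths; the benchmark linked below measures roughly the following
kernel (an illustrative JMH-style sketch, not the exact CRC32Bench.java):

  import java.util.Random;
  import java.util.zip.CRC32;
  import org.openjdk.jmh.annotations.*;

  @State(Scope.Thread)
  public class Crc32Sketch {
    // lengths below 128 should fall into the new by-32 loop,
    // larger ones into the by-64 main loop
    @Param({"32", "64", "128", "1024", "65536"})
    public int length;

    public byte[] data;

    @Setup
    public void setup() {
      data = new byte[length];
      new Random(0).nextBytes(data);
    }

    @Benchmark
    public long crc32() {
      CRC32 crc = new CRC32();
      crc.update(data, 0, data.length); // compiles to the _updateBytesCRC32 intrinsic
      return crc.getValue();
    }
  }
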
MacroAssembler::kernel_crc32 gets table registers that are not used on -XX:+UseCRC32 path. They can be used to make neighbor loads and CRC calculations independent. Adding prologue and epilogue for main by-64 loop makes it applicable starting from len=128 so additional by-32 loop is added for smaller lengths. rfe: https://bugs.openjdk.java.net/browse/JDK-8189176 webrev: http://cr.openjdk.java.net/~dchuyko/8189176/webrev.00/ benchmark: http://cr.openjdk.java.net/~dchuyko/8189176/crc32/CRC32Bench.java Results for T88 and A53 are good, but splitting pair loads may slow down other CPUs so measurements on different HW are highly welcome. -Dmitry From igor.veresov at oracle.com Wed Oct 11 18:01:23 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 11 Oct 2017 11:01:23 -0700 Subject: RFR(S) 8189183: [AOT] Fix eclipse project generation after repo consolidation Message-ID: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> This is to make mx-base project generation work again. Webrev: http://cr.openjdk.java.net/~iveresov/8189183/webrev.00/ Thanks, igor From dean.long at oracle.com Wed Oct 11 21:58:00 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 11 Oct 2017 14:58:00 -0700 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> Message-ID: <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> For AARCH64 in templateTable_arm.cpp, how about using the same code as generate_deopt_entry_for? ? __ restore_sp_after_call(Rtemp);? // Restore SP to extended SP ? __ restore_stack_top(); dl On 10/11/17 5:48 AM, jamsheed wrote: > Hi Vladimir, > > Thank you for pointing this. > > revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ > > Best Regards, > > Jamsheed > > > On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only >> 32-bit affected? >> >> Thanks, >> Vladimir >> >> On 9/13/17 11:54 PM, Dean Long wrote: >>> It looks like you accidentally dropped >>> hotspot-compiler-dev at openjdk.java.net when you added runtime. >>> >>> dl >>> >>> >>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>> (adding runtime list for inputs) >>>> >>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>> brief desc: special handling of Object. in >>>>> TemplateInterpreter::deopt_reexecute_entry >>>>> >>>>> required last_sp to be reset explicitly in normal return path >>>>> >>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, >>>>> address bcp) { >>>>> ? assert(method->contains(bcp), "just checkin'"); >>>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>>> ? if (code == Bytecodes::_return) { >>>>> ??? // This is used for deopt during registration of finalizers >>>>> ??? // during Object..? We simply need to resume execution at >>>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>>> ??? // reexecuting the real bytecode would cause double registration >>>>> ??? // of the finalizable object. >>>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>> >>>> last_sp ! = null not an issue for this case, so i skip the assert >>>> in debug build >>>> >>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>> >>>> Please review. 
>>>> >>>> Best Regards, >>>> Jamsheed >>>> >>>> >>>> >>>> >>>> >>> > From dean.long at oracle.com Wed Oct 11 22:21:18 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 11 Oct 2017 15:21:18 -0700 Subject: RFR(S) 8189183: [AOT] Fix eclipse project generation after repo consolidation In-Reply-To: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> References: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> Message-ID: <6bca0cf3-ecf3-f201-c13b-65e0c6cee11e@oracle.com> Looks reasonable. dl On 10/11/17 11:01 AM, Igor Veresov wrote: > This is to make mx-base project generation work again. > > Webrev: http://cr.openjdk.java.net/~iveresov/8189183/webrev.00/ > > Thanks, > igor From igor.veresov at oracle.com Wed Oct 11 23:18:13 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 11 Oct 2017 16:18:13 -0700 Subject: RFR(S) 8189183: [AOT] Fix eclipse project generation after repo consolidation In-Reply-To: <6bca0cf3-ecf3-f201-c13b-65e0c6cee11e@oracle.com> References: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> <6bca0cf3-ecf3-f201-c13b-65e0c6cee11e@oracle.com> Message-ID: Thanks, Dean! igor > On Oct 11, 2017, at 3:21 PM, dean.long at oracle.com wrote: > > Looks reasonable. > > dl > > > On 10/11/17 11:01 AM, Igor Veresov wrote: >> This is to make mx-base project generation work again. >> >> Webrev: http://cr.openjdk.java.net/~iveresov/8189183/webrev.00/ >> >> Thanks, >> igor > From vladimir.kozlov at oracle.com Wed Oct 11 23:20:07 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Oct 2017 16:20:07 -0700 Subject: RFR(S) 8189183: [AOT] Fix eclipse project generation after repo consolidation In-Reply-To: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> References: <83CF05B6-E9D8-40E8-AC65-99B35DCFC087@oracle.com> Message-ID: <879c4551-7ec3-a68e-a62e-9fc7f6499817@oracle.com> Looks good. Thanks, Vladimir On 10/11/17 11:01 AM, Igor Veresov wrote: > This is to make mx-base project generation work again. > > Webrev: http://cr.openjdk.java.net/~iveresov/8189183/webrev.00/ > > Thanks, > igor > From tobias.hartmann at oracle.com Thu Oct 12 10:04:21 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 12 Oct 2017 12:04:21 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" Message-ID: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8189067 http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ The problem is in the C2 optimization that moves stores out of a loop [1]. We move stores out of a loop by creating clones at all loop exit paths that observe the stored value. When walking up the dominator chain from observers of a store and placing clones of the store at the top (right after the loop), we may end up placing multiple cloned stores at the same location. This confuses/crashes the SuperWord optimization and also affects performance of the generated code due to double execution of the same store (see details in the bug comments). My initial prototype fix (webrev.00) just checked if there already is a cloned store with the same control input and reused it instead of creating another one. I've discussed this with Roland in detail and we came to the conclusion that this is not sufficient. 
We may still end up with a broken memory graph: If one use of the store is a load, we create a clone of the store and connect it to the load but this store is not connected to the rest of the memory graph, i.e. the memory effect of the store is not propagated. Although this may not cause incorrect execution (at least we were not able to trigger that), it may cause problems if other optimizations kick in and in some cases we still end up with the same store being executed twice. We agreed to go with the safer version of computing the LCA and move the store there (only the initial store without creating clones) if it's outside of the loop. This is a bit too strict in cases where there's an uncommon trap in the loop (see 'TestMoveStoresOutOfLoops::test_after_6') but it works fine in the other cases exercised by the test. If people agree, I'll file a follow up RFE to improve the optimization but would like to go with the safe fix for now (this also affects JDK 9). Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8080289 From jamsheed.c.m at oracle.com Thu Oct 12 10:33:34 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Thu, 12 Oct 2017 16:03:34 +0530 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> Message-ID: <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> Dean, Thank you for the review, yes there is check for extended sp equality too. made the change http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ Best regards, Jamsheed On Thursday 12 October 2017 03:28 AM, dean.long at oracle.com wrote: > For AARCH64 in templateTable_arm.cpp, how about using the same code as > generate_deopt_entry_for? > > __ restore_sp_after_call(Rtemp); // Restore SP to extended SP > __ restore_stack_top(); > > > dl > > On 10/11/17 5:48 AM, jamsheed wrote: >> Hi Vladimir, >> >> Thank you for pointing this. >> >> revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >> >> Best Regards, >> >> Jamsheed >> >> >> On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >>> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only >>> 32-bit affected? >>> >>> Thanks, >>> Vladimir >>> >>> On 9/13/17 11:54 PM, Dean Long wrote: >>>> It looks like you accidentally dropped >>>> hotspot-compiler-dev at openjdk.java.net when you added runtime. >>>> >>>> dl >>>> >>>> >>>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>>> (adding runtime list for inputs) >>>>> >>>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>>> brief desc: special handling of Object.<init> in >>>>>> TemplateInterpreter::deopt_reexecute_entry >>>>>> >>>>>> required last_sp to be reset explicitly in normal return path >>>>>> >>>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* >>>>>> method, address bcp) { >>>>>> assert(method->contains(bcp), "just checkin'"); >>>>>> Bytecodes::Code code = Bytecodes::java_code_at(method, bcp); >>>>>> if (code == Bytecodes::_return) { >>>>>> // This is used for deopt during registration of finalizers >>>>>> // during Object.<init>. We simply need to resume execution at >>>>>> // the standard return vtos bytecode to pop the frame normally. >>>>>>
// reexecuting the real bytecode would cause double registration >>>>>> // of the finalizable object. >>>>>> return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>>> >>>>> last_sp != null not an issue for this case, so i skip the assert >>>>> in debug build >>>>> >>>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>>> >>>>> Please review. >>>>> >>>>> Best Regards, >>>>> Jamsheed >>>>> >>>>> >>>>> >>>>> >>>> >> > From muthusamy.chinnathambi at oracle.com Thu Oct 12 10:46:04 2017 From: muthusamy.chinnathambi at oracle.com (Muthusamy Chinnathambi) Date: Thu, 12 Oct 2017 03:46:04 -0700 (PDT) Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> Message-ID: May I please get a second review for the change. Regards, Muthusamy C -----Original Message----- From: Vladimir Ivanov Sent: Wednesday, October 11, 2017 5:29 PM To: Muthusamy Chinnathambi Cc: hotspot-gc-dev at openjdk.java.net; hotspot compiler Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev Looks good. Best regards, Vladimir Ivanov On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: >> Hi, >> >> Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev >> >> Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes.
>> >> Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ >> jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 >> Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 >> >> Test: Run jtreg and jprt hotspot testsets. >> >> Regards, >> Muthusamy C >> From rwestrel at redhat.com Thu Oct 12 14:12:10 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 12 Oct 2017 16:12:10 +0200 Subject: RFR(S): 8186125: "DU iteration must converge quickly" assert in split if with unsafe accesses Message-ID: http://cr.openjdk.java.net/~roland/8186125/webrev.00/ Split if is missing support for graph shapes with the Opaque4Node that was introduced for unsafe accesses by JDK-8176506. In the test case, the 2 Unsafe accesses share a single Opaque4Node before the if. When split if encounters the Cmp->Bol->Opaque4->If chain, it only tries to clone Cmp->Bol when it should clone Cmp->Bol->Opaque4 to make one copy for each If. Roland. From tobias.hartmann at oracle.com Thu Oct 12 14:24:42 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 12 Oct 2017 16:24:42 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Message-ID: <65e2a138-cbf9-cff9-ce73-f6caecd86852@oracle.com> I forgot to mention that my fix also re-enables UseSubwordForMaxVector which was disabled due to JDK-8184995 [1] which turned out to be a duplicate of this issue and is not caused by UseSubwordForMaxVector. [1] https://bugs.openjdk.java.net/browse/JDK-8184995 On 12.10.2017 12:04, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8189067 > http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ > > The problem is in the C2 optimization that moves stores out of a loop [1]. > > We move stores out of a loop by creating clones at all loop exit paths that observe the stored value. When walking up > the dominator chain from observers of a store and placing clones of the store at the top (right after the loop), we may > end up placing multiple cloned stores at the same location. This confuses/crashes the SuperWord optimization and also > affects performance of the generated code due to double execution of the same store (see details in the bug comments). > > My initial prototype fix (webrev.00) just checked if there already is a cloned store with the same control input and > reused it instead of creating another one. I've discussed this with Roland in detail and we came to the conclusion that > this is not sufficient. We may still end up with a broken memory graph: If one use of the store is a load, we create a > clone of the store and connect it to the load but this store is not connected to the rest of the memory graph, i.e. the > memory effect of the store is not propagated. Although this may not cause incorrect execution (at least we were not able > to trigger that), it may cause problems if other optimizations kick in and in some cases we still end up with the same > store being executed twice. > > We agreed to go with the safer version of computing the LCA and move the store there (only the initial store without > creating clones) if it's outside of the loop. 
This is a bit too strict in cases where there's an uncommon trap in the > loop (see 'TestMoveStoresOutOfLoops::test_after_6') but it works fine in the other cases exercised by the test. If > people agree, I'll file a follow up RFE to improve the optimization but would like to go with the safe fix for now (this > also affects JDK 9). > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8080289 From vladimir.kozlov at oracle.com Thu Oct 12 17:33:31 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 10:33:31 -0700 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Message-ID: <837b748a-9bcc-00a4-9c42-660bfbf76902@oracle.com> Good. I think we should leave this conservative fix without optimizing it. We should not spend a lot of time optimizing C2 now. Thanks, Vladimir On 10/12/17 3:04 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8189067 > http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ > > The problem is in the C2 optimization that moves stores out of a loop [1]. > > We move stores out of a loop by creating clones at all loop exit paths that observe the stored value. When walking up > the dominator chain from observers of a store and placing clones of the store at the top (right after the loop), we may > end up placing multiple cloned stores at the same location. This confuses/crashes the SuperWord optimization and also > affects performance of the generated code due to double execution of the same store (see details in the bug comments). > > My initial prototype fix (webrev.00) just checked if there already is a cloned store with the same control input and > reused it instead of creating another one. I've discussed this with Roland in detail and we came to the conclusion that > this is not sufficient. We may still end up with a broken memory graph: If one use of the store is a load, we create a > clone of the store and connect it to the load but this store is not connected to the rest of the memory graph, i.e. the > memory effect of the store is not propagated. Although this may not cause incorrect execution (at least we were not able > to trigger that), it may cause problems if other optimizations kick in and in some cases we still end up with the same > store being executed twice. > > We agreed to go with the safer version of computing the LCA and move the store there (only the initial store without > creating clones) if it's outside of the loop. This is a bit too strict in cases where there's an uncommon trap in the > loop (see 'TestMoveStoresOutOfLoops::test_after_6') but it works fine in the other cases exercised by the test. If > people agree, I'll file a follow up RFE to improve the optimization but would like to go with the safe fix for now (this > also affects JDK 9). 
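As an aside on the "computing the LCA" step referred to above: as described, the idea is to find the deepest point that still dominates every use of the store and sink the store there only if that point lies outside the loop. A minimal self-contained sketch of an LCA walk on a dominator tree is below; this is illustrative code, not the C2 implementation.

    #include <cstddef>

    struct DomNode {
      DomNode* idom;    // immediate dominator (NULL only for the root)
      int      depth;   // depth in the dominator tree
    };

    // Lowest common ancestor of two controls on the dominator tree: the
    // deepest block that dominates both uses of the sunk store.
    DomNode* dom_lca(DomNode* a, DomNode* b) {
      while (a != b) {
        if (a->depth >= b->depth) {
          a = a->idom;    // move the deeper node up until the paths meet
        } else {
          b = b->idom;
        }
      }
      return a;
    }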
> > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8080289 From vladimir.kozlov at oracle.com Thu Oct 12 17:49:46 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 10:49:46 -0700 Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> Message-ID: Why do you need to add test explicitly to hotspot_compiler group? It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be used in all other testing. Did you check that the test is executed without you modifying TEST.groups? Thanks, Vladimir K On 10/12/17 3:46 AM, Muthusamy Chinnathambi wrote: > May I please get a second review for the change. > > Regards, > Muthusamy C > > -----Original Message----- > From: Vladimir Ivanov > Sent: Wednesday, October 11, 2017 5:29 PM > To: Muthusamy Chinnathambi > Cc: hotspot-gc-dev at openjdk.java.net; hotspot compiler > Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev > > Looks good. > > Best regards, > Vladimir Ivanov > > On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: >> Hi, >> >> Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev >> >> Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes. >> >> Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ >> jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 >> Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 >> >> Test: Run jtreg and jprt hotspot testsets. >> >> Regards, >> Muthusamy C >> From dean.long at oracle.com Thu Oct 12 18:21:29 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 12 Oct 2017 11:21:29 -0700 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> Message-ID: Looks good. dl On 10/12/17 3:33 AM, jamsheed wrote: > Dean, > > Thank you for the review, yes there is check for extended sp equality > too. made the change > > http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ > > Best regards, > > Jamsheed > > > On Thursday 12 October 2017 03:28 AM, dean.long at oracle.com wrote: >> For AARCH64 in templateTable_arm.cpp, how about using the same code >> as generate_deopt_entry_for? >> >> ? __ restore_sp_after_call(Rtemp);? // Restore SP to extended SP >> ? __ restore_stack_top(); >> >> >> dl >> >> On 10/11/17 5:48 AM, jamsheed wrote: >>> Hi Vladimir, >>> >>> Thank you for pointing this. >>> >>> revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >>> >>> Best Regards, >>> >>> Jamsheed >>> >>> >>> On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >>>> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only >>>> 32-bit affected? 
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 9/13/17 11:54 PM, Dean Long wrote: >>>>> It looks like you accidentally dropped >>>>> hotspot-compiler-dev at openjdk.java.net when you added runtime. >>>>> >>>>> dl >>>>> >>>>> >>>>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>>>> (adding runtime list for inputs) >>>>>> >>>>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>>>> brief desc: special handling of Object. in >>>>>>> TemplateInterpreter::deopt_reexecute_entry >>>>>>> >>>>>>> required last_sp to be reset explicitly in normal return path >>>>>>> >>>>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* >>>>>>> method, address bcp) { >>>>>>> ? assert(method->contains(bcp), "just checkin'"); >>>>>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>>>>> ? if (code == Bytecodes::_return) { >>>>>>> ??? // This is used for deopt during registration of finalizers >>>>>>> ??? // during Object..? We simply need to resume execution at >>>>>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>>>>> ??? // reexecuting the real bytecode would cause double >>>>>>> registration >>>>>>> ??? // of the finalizable object. >>>>>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>>>> >>>>>> last_sp ! = null not an issue for this case, so i skip the assert >>>>>> in debug build >>>>>> >>>>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>>>> >>>>>> Please review. >>>>>> >>>>>> Best Regards, >>>>>> Jamsheed >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>> >> > From vladimir.kozlov at oracle.com Thu Oct 12 18:50:45 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 11:50:45 -0700 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> Message-ID: <8f40236b-c8ff-4ae5-2e9a-89de588f610e@oracle.com> +1 Thanks, Vladimir On 10/12/17 11:21 AM, dean.long at oracle.com wrote: > Looks good. > > dl > > > On 10/12/17 3:33 AM, jamsheed wrote: >> Dean, >> >> Thank you for the review, yes there is check for extended sp equality too. made the change >> >> http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >> >> Best regards, >> >> Jamsheed >> >> >> On Thursday 12 October 2017 03:28 AM, dean.long at oracle.com wrote: >>> For AARCH64 in templateTable_arm.cpp, how about using the same code as generate_deopt_entry_for? >>> >>> ? __ restore_sp_after_call(Rtemp);? // Restore SP to extended SP >>> ? __ restore_stack_top(); >>> >>> >>> dl >>> >>> On 10/11/17 5:48 AM, jamsheed wrote: >>>> Hi Vladimir, >>>> >>>> Thank you for pointing this. >>>> >>>> revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >>>> >>>> Best Regards, >>>> >>>> Jamsheed >>>> >>>> >>>> On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >>>>> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only 32-bit affected? >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 9/13/17 11:54 PM, Dean Long wrote: >>>>>> It looks like you accidentally dropped hotspot-compiler-dev at openjdk.java.net when you added runtime. 
>>>>>> >>>>>> dl >>>>>> >>>>>> >>>>>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>>>>> (adding runtime list for inputs) >>>>>>> >>>>>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>>>>> brief desc: special handling of Object. in TemplateInterpreter::deopt_reexecute_entry >>>>>>>> >>>>>>>> required last_sp to be reset explicitly in normal return path >>>>>>>> >>>>>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* method, address bcp) { >>>>>>>> ? assert(method->contains(bcp), "just checkin'"); >>>>>>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>>>>>> ? if (code == Bytecodes::_return) { >>>>>>>> ??? // This is used for deopt during registration of finalizers >>>>>>>> ??? // during Object..? We simply need to resume execution at >>>>>>>> ??? // the standard return vtos bytecode to pop the frame normally. >>>>>>>> ??? // reexecuting the real bytecode would cause double registration >>>>>>>> ??? // of the finalizable object. >>>>>>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>>>>> >>>>>>> last_sp ! = null not an issue for this case, so i skip the assert in debug build >>>>>>> >>>>>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>>>>> >>>>>>> Please review. >>>>>>> >>>>>>> Best Regards, >>>>>>> Jamsheed >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>> >>> >> > From vladimir.kozlov at oracle.com Thu Oct 12 19:22:08 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 12:22:08 -0700 Subject: RFR(XS): 8188223: IfNode::range_check_trap_proj() should handler dying subgraph with single if proj In-Reply-To: References: Message-ID: Yes, it is reasonable fix. We have other places where we check If's node outcnt(). May be move the check up to the method's beginning above Opcode() call which is virtual. Thanks, Vladimir On 10/2/17 4:46 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8188223/webrev.00/ > > I saw the following crash (that I cannot reproduce anymore having > deleted the replay file by mistake). > > With subgraph shape: > > UNC->Region->IfProj->RangeCheck > > The region has the IfProj as single input. The following code in > RegionNode::Ideal(): > > if (can_reshape && cnt == 1) { > // Is it dead loop? > // If it is LoopNopde it had 2 (+1 itself) inputs and > // one of them was cut. The loop is dead if it was EntryContol. > // Loop node may have only one input because entry path > // is removed in PhaseIdealLoop::Dominators(). > assert(!this->is_Loop() || cnt_orig <= 3, "Loop node should have 3 or less inputs"); > if ((this->is_Loop() && (del_it == LoopNode::EntryControl || > (del_it == 0 && is_unreachable_region(phase)))) || > (!this->is_Loop() && has_phis && is_unreachable_region(phase))) { > > finds that the subgraph is unreachable which causes the IfProj to be > removed. RangeCheckNode::Ideal() is later called on a dominated range > check which walks the graph, hit the RangeCheck that has a single > projection and causes a crash. > > I think it makes sense to make IfNode::range_check_trap_proj() handle > the case of a RangeCheckNode with a single input. > > Roland. 
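For context, the guard being discussed -- bail out when the If has already lost a projection, and do so before any other work -- can be modelled in a few self-contained lines. The types are illustrative stand-ins, not the HotSpot IfNode/ProjNode classes, and this is not Roland's webrev.

    #include <cstddef>

    struct ProjLike { bool leads_to_uncommon_trap; };

    struct IfLike {
      ProjLike* proj[2];   // a healthy If has a true and a false projection
      int       nproj;     // projections still attached
    };

    // If the node sits in a dying subgraph and one projection was already
    // removed, give up immediately instead of walking a half-deleted graph.
    ProjLike* range_check_trap_proj_like(IfLike* iff) {
      if (iff == NULL || iff->nproj < 2) {
        return NULL;                       // dying subgraph: nothing to analyze
      }
      for (int i = 0; i < 2; i++) {
        if (iff->proj[i] != NULL && iff->proj[i]->leads_to_uncommon_trap) {
          return iff->proj[i];             // projection that jumps to the trap
        }
      }
      return NULL;
    }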
> From dean.long at oracle.com Thu Oct 12 19:46:27 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 12 Oct 2017 12:46:27 -0700 Subject: RFR(XXS): 8189244: x86: eliminate frame::adjust_unextended_sp() overhead Message-ID: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8189244 http://cr.openjdk.java.net/~dlong/8189244/webrev The work that frame::adjust_unextended_sp() does is DEBUG_ONLY, but the compiler cannot completely eliminate the code because of the virtual call to is_compiled() that could have side-effects. We can fix the problem by wrapping the whole thing in #ifdef ASSERT. This change reduces the size of libjvm.so by almost 2K, and the size of frame::sender() by 8%. dl -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Thu Oct 12 21:13:15 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Oct 2017 14:13:15 -0700 Subject: RFR(XXS): 8189244: x86: eliminate frame::adjust_unextended_sp() overhead In-Reply-To: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> References: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> Message-ID: <35de2110-96f0-93f0-5444-94c695459e41@oracle.com> Nice find. Thanks, Vladimir On 10/12/17 12:46 PM, dean.long at oracle.com wrote: > https://bugs.openjdk.java.net/browse/JDK-8189244 > > http://cr.openjdk.java.net/~dlong/8189244/webrev > > The work that frame::adjust_unextended_sp() does is DEBUG_ONLY, but the compiler cannot completely eliminate the code > because of the virtual call to is_compiled() that could have side-effects. We can fix the problem by wrapping the whole > thing in #ifdef ASSERT. > > This change reduces the size of libjvm.so by almost 2K, and the size of frame::sender() by 8%. > > dl From dean.long at oracle.com Thu Oct 12 21:25:37 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 12 Oct 2017 14:25:37 -0700 Subject: RFR(XXS): 8189244: x86: eliminate frame::adjust_unextended_sp() overhead In-Reply-To: <35de2110-96f0-93f0-5444-94c695459e41@oracle.com> References: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> <35de2110-96f0-93f0-5444-94c695459e41@oracle.com> Message-ID: Thanks Vladimir. dl On 10/12/17 2:13 PM, Vladimir Kozlov wrote: > Nice find. > > Thanks, > Vladimir > > On 10/12/17 12:46 PM, dean.long at oracle.com wrote: >> https://bugs.openjdk.java.net/browse/JDK-8189244 >> >> http://cr.openjdk.java.net/~dlong/8189244/webrev >> >> The work that frame::adjust_unextended_sp() does is DEBUG_ONLY, but >> the compiler cannot completely eliminate the code because of the >> virtual call to is_compiled() that could have side-effects. We can >> fix the problem by wrapping the whole thing in #ifdef ASSERT. >> >> This change reduces the size of libjvm.so by almost 2K, and the size >> of frame::sender() by 8%. 
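The effect described in this thread is easy to reproduce outside HotSpot: work that only feeds verification code still costs a virtual call in release builds unless the whole block is compiled out. The sketch below uses stand-in types and the portable NDEBUG convention in place of HotSpot's ASSERT macro; it illustrates the pattern, not the actual frame::adjust_unextended_sp() code.

    #include <cassert>
    #include <cstdio>

    struct BlobLike {                       // stand-in for a code blob
      virtual ~BlobLike() { }
      virtual bool is_compiled() const { return true; }
    };

    // Before: the virtual call only feeds an assert, but because a virtual
    // call may have side effects the compiler generally cannot drop it,
    // even when NDEBUG is set.
    void verify_sender_before(const BlobLike* sender) {
      bool ok = (sender == NULL) || sender->is_compiled();
      assert(ok && "sender must be a compiled blob");
      (void)ok;
    }

    // After: the whole block disappears from release (NDEBUG) builds,
    // which is what wrapping the HotSpot code in #ifdef ASSERT achieves.
    void verify_sender_after(const BlobLike* sender) {
    #ifndef NDEBUG
      bool ok = (sender == NULL) || sender->is_compiled();
      assert(ok && "sender must be a compiled blob");
    #endif
      (void)sender;
    }

    int main() {
      BlobLike blob;
      verify_sender_before(&blob);
      verify_sender_after(&blob);
      std::puts("checks passed");
      return 0;
    }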
>> >> dl From jamsheed.c.m at oracle.com Fri Oct 13 05:38:46 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Fri, 13 Oct 2017 11:08:46 +0530 Subject: [10] RFR: [AOT] assert(false) failed: DEBUG MESSAGE: InterpreterMacroAssembler::call_VM_base: last_sp != NULL In-Reply-To: <8f40236b-c8ff-4ae5-2e9a-89de588f610e@oracle.com> References: <8eb949bd-fa14-2722-5eaa-21a0a0c95b26@oracle.com> <069663cb-4b48-483e-7d1b-8619dafe616d@oracle.com> <34517155-bd8d-56ee-86a1-7f9ffe73e2de@oracle.com> <01b62784-332e-8b6b-ac15-5fe4c21f9cd5@oracle.com> <497a7224-6aab-e5b0-4e72-5475b2ab5579@oracle.com> <8f40236b-c8ff-4ae5-2e9a-89de588f610e@oracle.com> Message-ID: <37326ba7-f520-ca96-bdc2-31c8fe99a52d@oracle.com> Thanks for the review, Dean, Vladimir Best regards, Jamsheed On Friday 13 October 2017 12:20 AM, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir > > On 10/12/17 11:21 AM, dean.long at oracle.com wrote: >> Looks good. >> >> dl >> >> >> On 10/12/17 3:33 AM, jamsheed wrote: >>> Dean, >>> >>> Thank you for the review, yes there is check for extended sp >>> equality too. made the change >>> >>> http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >>> >>> Best regards, >>> >>> Jamsheed >>> >>> >>> On Thursday 12 October 2017 03:28 AM, dean.long at oracle.com wrote: >>>> For AARCH64 in templateTable_arm.cpp, how about using the same code >>>> as generate_deopt_entry_for? >>>> >>>> ? __ restore_sp_after_call(Rtemp);? // Restore SP to extended SP >>>> ? __ restore_stack_top(); >>>> >>>> >>>> dl >>>> >>>> On 10/11/17 5:48 AM, jamsheed wrote: >>>>> Hi Vladimir, >>>>> >>>>> Thank you for pointing this. >>>>> >>>>> revised webrev: http://cr.openjdk.java.net/~jcm/8168712/webrev.02/ >>>>> >>>>> Best Regards, >>>>> >>>>> Jamsheed >>>>> >>>>> >>>>> On Tuesday 10 October 2017 08:41 PM, Vladimir Kozlov wrote: >>>>>> Why you added !defined(AARCH64) in templateTable_arm.cpp? Is only >>>>>> 32-bit affected? >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 9/13/17 11:54 PM, Dean Long wrote: >>>>>>> It looks like you accidentally dropped >>>>>>> hotspot-compiler-dev at openjdk.java.net when you added runtime. >>>>>>> >>>>>>> dl >>>>>>> >>>>>>> >>>>>>> On 9/13/2017 11:21 PM, jamsheed wrote: >>>>>>>> (adding runtime list for inputs) >>>>>>>> >>>>>>>> On Monday 11 September 2017 11:43 PM, jamsheed wrote: >>>>>>>>> brief desc: special handling of Object. in >>>>>>>>> TemplateInterpreter::deopt_reexecute_entry >>>>>>>>> >>>>>>>>> required last_sp to be reset explicitly in normal return path >>>>>>>>> >>>>>>>>> address TemplateInterpreter::deopt_reexecute_entry(Method* >>>>>>>>> method, address bcp) { >>>>>>>>> ? assert(method->contains(bcp), "just checkin'"); >>>>>>>>> ? Bytecodes::Code code?? = Bytecodes::java_code_at(method, bcp); >>>>>>>>> ? if (code == Bytecodes::_return) { >>>>>>>>> ??? // This is used for deopt during registration of finalizers >>>>>>>>> ??? // during Object..? We simply need to resume >>>>>>>>> execution at >>>>>>>>> ??? // the standard return vtos bytecode to pop the frame >>>>>>>>> normally. >>>>>>>>> ??? // reexecuting the real bytecode would cause double >>>>>>>>> registration >>>>>>>>> ??? // of the finalizable object. >>>>>>>>> ??? return _normal_table.entry(Bytecodes::_return).entry(vtos); >>>>>>>> >>>>>>>> last_sp ! = null not an issue for this case, so i skip the >>>>>>>> assert in debug build >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~jcm/8168712/webrev.01/ >>>>>>>> >>>>>>>> Please review. 
>>>>>>>> >>>>>>>> Best Regards, >>>>>>>> Jamsheed >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >>> >> From tobias.hartmann at oracle.com Fri Oct 13 06:08:56 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 13 Oct 2017 08:08:56 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: <837b748a-9bcc-00a4-9c42-660bfbf76902@oracle.com> References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> <837b748a-9bcc-00a4-9c42-660bfbf76902@oracle.com> Message-ID: <78d1643c-a89d-7453-01d4-1b3cbd33d5e4@oracle.com> Hi Vladimir, thanks for the review! On 12.10.2017 19:33, Vladimir Kozlov wrote: > Good. I think we should leave this conservative fix without optimizing it. We should not spend a lot of time optimizing > C2 now. Okay, that's fine with me. If anyone wants to optimize that later, he/she can file an RFE. Best regards, Tobias > On 10/12/17 3:04 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8189067 >> http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ >> >> The problem is in the C2 optimization that moves stores out of a loop [1]. >> >> We move stores out of a loop by creating clones at all loop exit paths that observe the stored value. When walking up >> the dominator chain from observers of a store and placing clones of the store at the top (right after the loop), we >> may end up placing multiple cloned stores at the same location. This confuses/crashes the SuperWord optimization and >> also affects performance of the generated code due to double execution of the same store (see details in the bug >> comments). >> >> My initial prototype fix (webrev.00) just checked if there already is a cloned store with the same control input and >> reused it instead of creating another one. I've discussed this with Roland in detail and we came to the conclusion >> that this is not sufficient. We may still end up with a broken memory graph: If one use of the store is a load, we >> create a clone of the store and connect it to the load but this store is not connected to the rest of the memory >> graph, i.e. the memory effect of the store is not propagated. Although this may not cause incorrect execution (at >> least we were not able to trigger that), it may cause problems if other optimizations kick in and in some cases we >> still end up with the same store being executed twice. >> >> We agreed to go with the safer version of computing the LCA and move the store there (only the initial store without >> creating clones) if it's outside of the loop. This is a bit too strict in cases where there's an uncommon trap in the >> loop (see 'TestMoveStoresOutOfLoops::test_after_6') but it works fine in the other cases exercised by the test. If >> people agree, I'll file a follow up RFE to improve the optimization but would like to go with the safe fix for now >> (this also affects JDK 9).
>> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8080289 From rwestrel at redhat.com Fri Oct 13 07:14:08 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 13 Oct 2017 09:14:08 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Message-ID: > http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ That looks good to me. Roland. From tobias.hartmann at oracle.com Fri Oct 13 07:15:06 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 13 Oct 2017 09:15:06 +0200 Subject: [10] RFR(S): 8189067: SuperWord optimization crashes with "assert(out == prev || prev == __null) failed: no branches off of store slice" In-Reply-To: References: <93577652-bac2-9231-8710-1a2c01d58e63@oracle.com> Message-ID: <7dbb7e2e-6f1a-9607-5284-8deab79a647a@oracle.com> Thanks Roland! Best regards, Tobias On 13.10.2017 09:14, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~thartmann/8189067/webrev.01/ > > That looks good to me. > > Roland. > From kevin.walls at oracle.com Fri Oct 13 09:25:01 2017 From: kevin.walls at oracle.com (Kevin Walls) Date: Fri, 13 Oct 2017 10:25:01 +0100 Subject: [8u] RFF(S): 8164954: split_if creates empty phi and region nodes Message-ID: <6620f37e-221b-0533-00d8-3db4f8a63f09@oracle.com> Hi, I'd like to get a review of a backport to 8u. bug: https://bugs.openjdk.java.net/browse/JDK-8164954 9 changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/38f38c10a11d Review thread: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-March/025773.html It doesn't hg import cleanly as the surrounding code is a little different: this change adds a condition in split_if() which may make that method return earlier, but 8u does not have the block after the change, beginning "if (nb_predicate_proj > 1) {", that comes in with 8078426. The 8u change has been through jprt testing and also tested with the testsuite of a Java-based product which was seen hitting the same assert as in this bug.? hg diff of the proposed 8u change is below, I think that's enough but can offer a webrev if anybody needs one. Thanks! Kevin bash-4.2$ hg diff src/share/vm/opto/ifnode.cpp diff -r c89173159237 src/share/vm/opto/ifnode.cpp --- a/src/share/vm/opto/ifnode.cpp????? Thu Sep 07 10:15:21 2017 -0400 +++ b/src/share/vm/opto/ifnode.cpp????? Fri Oct 13 02:03:00 2017 -0700 @@ -234,6 +234,13 @@ ?????? predicate_proj = proj; ???? } ?? } + +? // If all the defs of the phi are the same constant, we already have the desired end state. +? // Skip the split that would create empty phi and region nodes. +? if((r->req() - req_c) == 1) { +??? return NULL; +? } + ?? Node* predicate_c = NULL; ?? Node* predicate_x = NULL; ?? bool counted_loop = r->is_CountedLoop(); -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From muthusamy.chinnathambi at oracle.com Fri Oct 13 10:53:58 2017 From: muthusamy.chinnathambi at oracle.com (Muthusamy Chinnathambi) Date: Fri, 13 Oct 2017 03:53:58 -0700 (PDT) Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> Message-ID: <91ca5a56-8d20-4714-8b09-c767574af4ae@default> Hi Vladimir, > Why do you need to add test explicitly to hotspot_compiler group? > It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be used in all other > testing. You are right, it should get picked implicitly as part of compact1_minimal group. > Did you check that the test is executed without you modifying TEST.groups? Now - yes. Without my TEST.groups modification the test gets executed. I will drop the change in TEST.groups file. Please note, this request is only for 8u. Regards, Muthusamy C -----Original Message----- From: Vladimir Kozlov Sent: Thursday, October 12, 2017 11:20 PM To: Muthusamy Chinnathambi ; hotspot compiler ; hotspot-gc-dev at openjdk.java.net Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev Why do you need to add test explicitly to hotspot_compiler group? It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be used in all other testing. Did you check that the test is executed without you modifying TEST.groups? Thanks, Vladimir K On 10/12/17 3:46 AM, Muthusamy Chinnathambi wrote: > May I please get a second review for the change. > > Regards, > Muthusamy C > > -----Original Message----- > From: Vladimir Ivanov > Sent: Wednesday, October 11, 2017 5:29 PM > To: Muthusamy Chinnathambi > Cc: hotspot-gc-dev at openjdk.java.net; hotspot compiler > Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev > > Looks good. > > Best regards, > Vladimir Ivanov > > On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: >> Hi, >> >> Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev >> >> Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes. >> >> Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ >> jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 >> Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 >> >> Test: Run jtreg and jprt hotspot testsets. >> >> Regards, >> Muthusamy C >> From claes.redestad at oracle.com Fri Oct 13 12:08:19 2017 From: claes.redestad at oracle.com (Claes Redestad) Date: Fri, 13 Oct 2017 14:08:19 +0200 Subject: RFR(XXS): 8189244: x86: eliminate frame::adjust_unextended_sp() overhead In-Reply-To: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> References: <9b6909a2-2881-dd20-0af9-4cefe484b326@oracle.com> Message-ID: <3b0da201-1c3a-fed1-1a10-c59af9476c4e@oracle.com> Hi Dean, you asked me to do a quick check if this helps Exception/stack walk performance:

Benchmark                  Mode  Cnt  Score   Error  Units
Throw.throwSyncException  thrpt  100  0.803 ± 0.029  ops/us
Throw.throwSyncException  thrpt  100  0.867 ± 0.028  ops/us   # 8%

... thus a significant improvement!
Startup is improved on some measures (total #instructions down 500k, significant) but not enough to be statistically significant on wall clock measures. /Claes On 2017-10-12 21:46, dean.long at oracle.com wrote: > > https://bugs.openjdk.java.net/browse/JDK-8189244 > > http://cr.openjdk.java.net/~dlong/8189244/webrev > > The work that frame::adjust_unextended_sp() does is DEBUG_ONLY, but > the compiler cannot completely eliminate the code because of the > virtual call to is_compiled() that could have side-effects. We can fix > the problem by wrapping the whole thing in #ifdef ASSERT. > > This change reduces the size of libjvm.so by almost 2K, and the size > of frame::sender() by 8%. > > dl -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Oct 13 17:53:11 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 13 Oct 2017 10:53:11 -0700 Subject: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev In-Reply-To: <91ca5a56-8d20-4714-8b09-c767574af4ae@default> References: <30dbb109-c259-4529-b846-e4afffc94bd0@default> <19c869b6-d595-1fde-1481-d2fa583eae3d@oracle.com> <91ca5a56-8d20-4714-8b09-c767574af4ae@default> Message-ID: Good. Thanks, Vladimir On 10/13/17 3:53 AM, Muthusamy Chinnathambi wrote: > Hi Vladimir, > >> Why do you need to add test explicitly to hotspot_compiler group? >> It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be used in all other >> testing. > You are right, it should get picked implicitly as part of compact1_minimal group. > >> Did you check that the test is executed without you modifying TEST.groups? > Now - yes. Without my TEST.groups modification the test gets executed. > > I will drop the change in TEST.groups file. > Please note, this request is only for 8u. > > Regards, > Muthusamy C > > -----Original Message----- > From: Vladimir Kozlov > Sent: Thursday, October 12, 2017 11:20 PM > To: Muthusamy Chinnathambi ; hotspot compiler ; hotspot-gc-dev at openjdk.java.net > Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev > > Why do you need to add test explicitly to hotspot_compiler group? > It should be included implicitly into compact1_minimal group as other compiler/ tests. And compact1_minimal should be > used in all other testing. Did you check that the test is executed without you modifying TEST.groups? > > Thanks, > Vladimir K > > On 10/12/17 3:46 AM, Muthusamy Chinnathambi wrote: >> May I please get a second review for the change. >> >> Regards, >> Muthusamy C >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Wednesday, October 11, 2017 5:29 PM >> To: Muthusamy Chinnathambi >> Cc: hotspot-gc-dev at openjdk.java.net; hotspot compiler >> Subject: Re: [8u] RFR for backport of JDK-8148175: C1: G1 barriers don't preserve FP registers to jdk8u-dev >> >> Looks good. >> >> Best regards, >> Vladimir Ivanov >> >> On 10/11/17 12:33 PM, Muthusamy Chinnathambi wrote: >>> Hi, >>> >>> Please review the backport of bug: "JDK-8148175: C1: G1 barriers don't preserve FP registers" to jdk8u-dev >>> >>> Please note that this is not a clean backport due to new entries in TEST.groups and copyright changes. 
>>> >>> Webrev: http://cr.openjdk.java.net/~mchinnathamb/8148175/webrev.00/ >>> jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8148175 >>> Original patch pushed to jdk9: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a9334e785873 >>> >>> Test: Run jtreg and jprt hotspot testsets. >>> >>> Regards, >>> Muthusamy C >>> From vladimir.kozlov at oracle.com Fri Oct 13 17:57:46 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 13 Oct 2017 10:57:46 -0700 Subject: [8u] RFF(S): 8164954: split_if creates empty phi and region nodes In-Reply-To: <6620f37e-221b-0533-00d8-3db4f8a63f09@oracle.com> References: <6620f37e-221b-0533-00d8-3db4f8a63f09@oracle.com> Message-ID: 8u change looks good. Thanks, Vladimir On 10/13/17 2:25 AM, Kevin Walls wrote: > Hi, > > I'd like to get a review of a backport to 8u. > > bug: https://bugs.openjdk.java.net/browse/JDK-8164954 > > 9 changeset: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/38f38c10a11d > > Review thread: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-March/025773.html > > > It doesn't hg import cleanly as the surrounding code is a little different: this change adds a condition in split_if() > which may make that method return earlier, but 8u does not have the block after the change, beginning "if > (nb_predicate_proj > 1) {", that comes in with 8078426. > > The 8u change has been through jprt testing and also tested with the testsuite of a Java-based product which was seen > hitting the same assert as in this bug.? hg diff of the proposed 8u change is below, I think that's enough but can offer > a webrev if anybody needs one. > > Thanks! > Kevin > > > bash-4.2$ hg diff src/share/vm/opto/ifnode.cpp > diff -r c89173159237 src/share/vm/opto/ifnode.cpp > --- a/src/share/vm/opto/ifnode.cpp????? Thu Sep 07 10:15:21 2017 -0400 > +++ b/src/share/vm/opto/ifnode.cpp????? Fri Oct 13 02:03:00 2017 -0700 > @@ -234,6 +234,13 @@ > ?????? predicate_proj = proj; > ???? } > ?? } > + > +? // If all the defs of the phi are the same constant, we already have the desired end state. > +? // Skip the split that would create empty phi and region nodes. > +? if((r->req() - req_c) == 1) { > +??? return NULL; > +? } > + > ?? Node* predicate_c = NULL; > ?? Node* predicate_x = NULL; > ?? bool counted_loop = r->is_CountedLoop(); > > From kevin.walls at oracle.com Fri Oct 13 22:04:16 2017 From: kevin.walls at oracle.com (Kevin Walls) Date: Fri, 13 Oct 2017 23:04:16 +0100 Subject: [8u] RFF(S): 8164954: split_if creates empty phi and region nodes In-Reply-To: References: <6620f37e-221b-0533-00d8-3db4f8a63f09@oracle.com> Message-ID: Thanks Vladimir! On 13/10/2017 18:57, Vladimir Kozlov wrote: > 8u change looks good. > > Thanks, > Vladimir > > On 10/13/17 2:25 AM, Kevin Walls wrote: >> Hi, >> >> I'd like to get a review of a backport to 8u. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8164954 >> >> 9 changeset: >> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/38f38c10a11d >> >> Review thread: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-March/025773.html >> >> >> It doesn't hg import cleanly as the surrounding code is a little >> different: this change adds a condition in split_if() which may make >> that method return earlier, but 8u does not have the block after the >> change, beginning "if (nb_predicate_proj > 1) {", that comes in with >> 8078426. 
>> >> The 8u change has been through jprt testing and also tested with the >> testsuite of a Java-based product which was seen hitting the same >> assert as in this bug.? hg diff of the proposed 8u change is below, I >> think that's enough but can offer a webrev if anybody needs one. >> >> Thanks! >> Kevin >> >> >> bash-4.2$ hg diff src/share/vm/opto/ifnode.cpp >> diff -r c89173159237 src/share/vm/opto/ifnode.cpp >> --- a/src/share/vm/opto/ifnode.cpp????? Thu Sep 07 10:15:21 2017 -0400 >> +++ b/src/share/vm/opto/ifnode.cpp????? Fri Oct 13 02:03:00 2017 -0700 >> @@ -234,6 +234,13 @@ >> ??????? predicate_proj = proj; >> ????? } >> ??? } >> + >> +? // If all the defs of the phi are the same constant, we already >> have the desired end state. >> +? // Skip the split that would create empty phi and region nodes. >> +? if((r->req() - req_c) == 1) { >> +??? return NULL; >> +? } >> + >> ??? Node* predicate_c = NULL; >> ??? Node* predicate_x = NULL; >> ??? bool counted_loop = r->is_CountedLoop(); >> >> From rkennke at redhat.com Sat Oct 14 22:41:05 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 00:41:05 +0200 Subject: RFR: 8171853: Remove Shark compiler Message-ID: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So here comes the big patch to remove it. What I have done: grep -i -R shark src grep -i -R shark make grep -i -R shark doc grep -i -R shark doc and purged any reference to shark. Almost everything was straightforward. The only things I wasn't really sure of: - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope that's good? - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing to do. If not, what *would* be the right thing? Then of course I did: rm -rf src/hotspot/share/shark I also went through the build machinery and removed stuff related to Shark and LLVM libs. Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) I tested by building a regular x86 JVM and running JTREG tests. All looks fine. - I could not build zero because it seems broken because of the recent Atomic::* changes - I could not test any of the other arches that seemed to reference Shark (arm and sparc) Here's the full webrev: http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ Can I get a review on this? Thanks, Roman -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkennke at redhat.com Sun Oct 15 20:20:17 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 22:20:17 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> Message-ID: Hi Adrian, > Please let me look at SPARC next week first before merging this. Thanks! Will wait for your feedback! > And thanks for notifying me that Zero is broken again *sigh*. It seems to be only a little thing. I have a fix that I'm currently testing. Will file another bug and an RFR soon. 
Thanks, Roman From david.holmes at oracle.com Sun Oct 15 20:48:23 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 06:48:23 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> Hi Roman, The build changes must be reviewed on build-dev - now cc'd. Thanks, David On 15/10/2017 8:41 AM, Roman Kennke wrote: > The JEP to remove the Shark compiler has received exclusively positive > feedback (JDK-8189173) on zero-dev. So here comes the big patch to > remove it. > > What I have done: > > grep -i -R shark src > grep -i -R shark make > grep -i -R shark doc > grep -i -R shark doc > > and purged any reference to shark. Almost everything was straightforward. > > The only things I wasn't really sure of: > > - in globals.hpp, I re-arranged the KIND_* bits to account for the gap > that removing KIND_SHARK left. I hope that's good? > - in relocInfo_zero.hpp I put a ShouldNotCallThis() in > pd_address_in_code(), I am not sure it is the right thing to do. If not, > what *would* be the right thing? > > Then of course I did: > > rm -rf src/hotspot/share/shark > > I also went through the build machinery and removed stuff related to > Shark and LLVM libs. > > Now the only references in the whole JDK tree to shark is a 'Shark Bay' > in a timezone file, and 'Wireshark' in some tests ;-) > > I tested by building a regular x86 JVM and running JTREG tests. All > looks fine. > > - I could not build zero because it seems broken because of the recent > Atomic::* changes > - I could not test any of the other arches that seemed to reference > Shark (arm and sparc) > > Here's the full webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ > > > Can I get a review on this? > > Thanks, Roman > From rkennke at redhat.com Sun Oct 15 21:01:42 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:01:42 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> Message-ID: <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Hi David, thanks! I'm uploading a 2nd revision of the patch that excludes the generated-configure.sh part, and adds a smallish Zero-related fix. http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ Thanks, Roman > Hi Roman, > > The build changes must be reviewed on build-dev - now cc'd. > > Thanks, > David > > On 15/10/2017 8:41 AM, Roman Kennke wrote: >> The JEP to remove the Shark compiler has received exclusively >> positive feedback (JDK-8189173) on zero-dev. So here comes the big >> patch to remove it. >> >> What I have done: >> >> grep -i -R shark src >> grep -i -R shark make >> grep -i -R shark doc >> grep -i -R shark doc >> >> and purged any reference to shark. Almost everything was >> straightforward. >> >> The only things I wasn't really sure of: >> >> - in globals.hpp, I re-arranged the KIND_* bits to account for the >> gap that removing KIND_SHARK left. I hope that's good? >> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >> pd_address_in_code(), I am not sure it is the right thing to do. If >> not, what *would* be the right thing? 
>> >> Then of course I did: >> >> rm -rf src/hotspot/share/shark >> >> I also went through the build machinery and removed stuff related to >> Shark and LLVM libs. >> >> Now the only references in the whole JDK tree to shark is a 'Shark >> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >> >> I tested by building a regular x86 JVM and running JTREG tests. All >> looks fine. >> >> - I could not build zero because it seems broken because of the >> recent Atomic::* changes >> - I could not test any of the other arches that seemed to reference >> Shark (arm and sparc) >> >> Here's the full webrev: >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >> >> >> Can I get a review on this? >> >> Thanks, Roman >> From david.holmes at oracle.com Sun Oct 15 21:23:52 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:23:52 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> Message-ID: <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> Hi Roman, I've looked at all the changes for the build and hotspot and everything appears okay to me. Still need someone from compiler team and build team to sign off on this though. One observation in src/hotspot/cpu/zero/sharedRuntime_zero.cpp, these includes would seem to be impossible: 38 #ifdef COMPILER1 39 #include "c1/c1_Runtime1.hpp" 40 #endif 41 #ifdef COMPILER2 42 #include "opto/runtime.hpp" 43 #endif no? In src/hotspot/share/ci/ciEnv.cpp you can just delete the comment entirely as it's obviously C2: if (is_c2_compile(comp_level)) { // C2 Ditto in src/hotspot/share/compiler/compileBroker.cpp ! // C2 make_thread(name_buffer, _c2_compile_queue, counters, _compilers[1], compiler_thread, CHECK); Thanks, David ----- On 16/10/2017 6:48 AM, David Holmes wrote: > Hi Roman, > > The build changes must be reviewed on build-dev - now cc'd. > > Thanks, > David > > On 15/10/2017 8:41 AM, Roman Kennke wrote: >> The JEP to remove the Shark compiler has received exclusively positive >> feedback (JDK-8189173) on zero-dev. So here comes the big patch to >> remove it. >> >> What I have done: >> >> grep -i -R shark src >> grep -i -R shark make >> grep -i -R shark doc >> grep -i -R shark doc >> >> and purged any reference to shark. Almost everything was straightforward. >> >> The only things I wasn't really sure of: >> >> - in globals.hpp, I re-arranged the KIND_* bits to account for the gap >> that removing KIND_SHARK left. I hope that's good? >> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >> pd_address_in_code(), I am not sure it is the right thing to do. If >> not, what *would* be the right thing? >> >> Then of course I did: >> >> rm -rf src/hotspot/share/shark >> >> I also went through the build machinery and removed stuff related to >> Shark and LLVM libs. >> >> Now the only references in the whole JDK tree to shark is a 'Shark >> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >> >> I tested by building a regular x86 JVM and running JTREG tests. All >> looks fine. >> >> - I could not build zero because it seems broken because of the recent >> Atomic::* changes >> - I could not test any of the other arches that seemed to reference >> Shark (arm and sparc) >> >> Here's the full webrev: >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >> >> >> Can I get a review on this? 
>> >> Thanks, Roman >> From david.holmes at oracle.com Sun Oct 15 21:25:04 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:25:04 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: On 16/10/2017 7:01 AM, Roman Kennke wrote: > Hi David, > > thanks! > > I'm uploading a 2nd revision of the patch that excludes the > generated-configure.sh part, and adds a smallish Zero-related fix. > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ > Can you point me to the exact change please as I don't want to re-examine it all. :) I'll pull this in and do a test build run internally. Thanks, David > Thanks, Roman > > >> Hi Roman, >> >> The build changes must be reviewed on build-dev - now cc'd. >> >> Thanks, >> David >> >> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>> The JEP to remove the Shark compiler has received exclusively >>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>> patch to remove it. >>> >>> What I have done: >>> >>> grep -i -R shark src >>> grep -i -R shark make >>> grep -i -R shark doc >>> grep -i -R shark doc >>> >>> and purged any reference to shark. Almost everything was >>> straightforward. >>> >>> The only things I wasn't really sure of: >>> >>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>> gap that removing KIND_SHARK left. I hope that's good? >>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>> pd_address_in_code(), I am not sure it is the right thing to do. If >>> not, what *would* be the right thing? >>> >>> Then of course I did: >>> >>> rm -rf src/hotspot/share/shark >>> >>> I also went through the build machinery and removed stuff related to >>> Shark and LLVM libs. >>> >>> Now the only references in the whole JDK tree to shark is a 'Shark >>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>> >>> I tested by building a regular x86 JVM and running JTREG tests. All >>> looks fine. >>> >>> - I could not build zero because it seems broken because of the >>> recent Atomic::* changes >>> - I could not test any of the other arches that seemed to reference >>> Shark (arm and sparc) >>> >>> Here's the full webrev: >>> >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>> >>> >>> Can I get a review on this? >>> >>> Thanks, Roman >>> > From david.holmes at oracle.com Sun Oct 15 21:29:33 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:29:33 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: Just spotted this: ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ David On 16/10/2017 7:25 AM, David Holmes wrote: > On 16/10/2017 7:01 AM, Roman Kennke wrote: >> Hi David, >> >> thanks! >> >> I'm uploading a 2nd revision of the patch that excludes the >> generated-configure.sh part, and adds a smallish Zero-related fix. >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >> > > Can you point me to the exact change please as I don't want to > re-examine it all. :) > > I'll pull this in and do a test build run internally. 
> > Thanks, > David > >> Thanks, Roman >> >> >>> Hi Roman, >>> >>> The build changes must be reviewed on build-dev - now cc'd. >>> >>> Thanks, >>> David >>> >>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>> The JEP to remove the Shark compiler has received exclusively >>>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>>> patch to remove it. >>>> >>>> What I have done: >>>> >>>> grep -i -R shark src >>>> grep -i -R shark make >>>> grep -i -R shark doc >>>> grep -i -R shark doc >>>> >>>> and purged any reference to shark. Almost everything was >>>> straightforward. >>>> >>>> The only things I wasn't really sure of: >>>> >>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>>> gap that removing KIND_SHARK left. I hope that's good? >>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>> pd_address_in_code(), I am not sure it is the right thing to do. If >>>> not, what *would* be the right thing? >>>> >>>> Then of course I did: >>>> >>>> rm -rf src/hotspot/share/shark >>>> >>>> I also went through the build machinery and removed stuff related to >>>> Shark and LLVM libs. >>>> >>>> Now the only references in the whole JDK tree to shark is a 'Shark >>>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>> >>>> I tested by building a regular x86 JVM and running JTREG tests. All >>>> looks fine. >>>> >>>> - I could not build zero because it seems broken because of the >>>> recent Atomic::* changes >>>> - I could not test any of the other arches that seemed to reference >>>> Shark (arm and sparc) >>>> >>>> Here's the full webrev: >>>> >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>> >>>> >>>> Can I get a review on this? >>>> >>>> Thanks, Roman >>>> >> From rkennke at redhat.com Sun Oct 15 21:31:51 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:31:51 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> Message-ID: <0c0f0e20-86d5-bd3e-06ec-5b4c103eb3e7@redhat.com> Hi David, thanks for reviewing! > > One observation in src/hotspot/cpu/zero/sharedRuntime_zero.cpp, these > includes would seem to be impossible: > > ? 38 #ifdef COMPILER1 > ? 39 #include "c1/c1_Runtime1.hpp" > ? 40 #endif > ? 41 #ifdef COMPILER2 > ? 42 #include "opto/runtime.hpp" > ? 43 #endif > > no? I have no idea. It is at least theoretically possible to have a platform with C1 and/or C2 support based on the Zero interpreter? I'm leaving that in for now as it was pre-existing and not related to Shark removal, ok? > > In src/hotspot/share/ci/ciEnv.cpp you can just delete the comment > entirely as it's obviously C2: > > if (is_c2_compile(comp_level)) { // C2 > > Ditto in src/hotspot/share/compiler/compileBroker.cpp > > !???? // C2 > ????? make_thread(name_buffer, _c2_compile_queue, counters, > _compilers[1], compiler_thread, CHECK); Ok, right. 
For consistency, I also removed the // C1 comment in ciEnv.cpp at the similarly obvious is_c1_compile() call :-) New webrev: http://cr.openjdk.java.net/~rkennke/8171853/webrev.02/ Roman From david.holmes at oracle.com Sun Oct 15 21:33:44 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:33:44 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <0c0f0e20-86d5-bd3e-06ec-5b4c103eb3e7@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> <0c0f0e20-86d5-bd3e-06ec-5b4c103eb3e7@redhat.com> Message-ID: <86c02492-ecf5-197b-7ca1-a411f68000c5@oracle.com> On 16/10/2017 7:31 AM, Roman Kennke wrote: > Hi David, > > thanks for reviewing! > >> >> One observation in src/hotspot/cpu/zero/sharedRuntime_zero.cpp, these >> includes would seem to be impossible: >> >>   38 #ifdef COMPILER1 >>   39 #include "c1/c1_Runtime1.hpp" >>   40 #endif >>   41 #ifdef COMPILER2 >>   42 #include "opto/runtime.hpp" >>   43 #endif >> >> no? > > I have no idea. It is at least theoretically possible to have a platform > with C1 and/or C2 support based on the Zero interpreter? I'm leaving > that in for now as it was pre-existing and not related to Shark removal, > ok? Yep that's fine. Thanks. David >> >> In src/hotspot/share/ci/ciEnv.cpp you can just delete the comment >> entirely as it's obviously C2: >> >> if (is_c2_compile(comp_level)) { // C2 >> >> Ditto in src/hotspot/share/compiler/compileBroker.cpp >> >> !     // C2 >>       make_thread(name_buffer, _c2_compile_queue, counters, >> _compilers[1], compiler_thread, CHECK); > > Ok, right. For consistency, I also removed the // C1 comment in ciEnv.cpp at the similarly > obvious is_c1_compile() call :-) > > New webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.02/ > > > Roman From rkennke at redhat.com Sun Oct 15 21:39:54 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:39:54 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: <7deb690b-8b74-1ca2-948e-d76d0d133814@redhat.com> On 15.10.2017 at 23:25, David Holmes wrote: > On 16/10/2017 7:01 AM, Roman Kennke wrote: >> Hi David, >> >> thanks! >> >> I'm uploading a 2nd revision of the patch that excludes the >> generated-configure.sh part, and adds a smallish Zero-related fix. >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >> > > Can you point me to the exact change please as I don't want to > re-examine it all. :) Oops, sorry. The diff between 00 and 01 is this (apart from generated-configure.sh): diff --git a/src/hotspot/share/utilities/vmError.cpp b/src/hotspot/share/utilities/vmError.cpp --- a/src/hotspot/share/utilities/vmError.cpp +++ b/src/hotspot/share/utilities/vmError.cpp @@ -192,6 +192,7 @@      st->cr();      // Print the frames +    StackFrameStream sfs(jt);      for(int i = 0; !sfs.is_done(); sfs.next(), i++) {        sfs.current()->zero_print_on_error(i, st, buf, buflen);        st->cr(); I.e. I added back the sfs variable that I accidentally removed in webrev.00.
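
For readability, the patched block in the Zero-related frame dump reads roughly as follows once the declaration is back. This is a sketch reconstructed only from the hunk quoted above, not copied from the repository: jt, st, buf and buflen are assumed to come from the enclosing error-reporting code (the faulting JavaThread, the report output stream and a scratch buffer), and StackFrameStream and frame::zero_print_on_error are HotSpot-internal types, so this is a fragment of vmError.cpp rather than a standalone program.

    // Fragment of the Zero frame dump in src/hotspot/share/utilities/vmError.cpp,
    // as implied by the webrev.00 -> webrev.01 hunk quoted above; jt, st, buf and
    // buflen come from the enclosing error-reporting code.
    st->cr();

    // Print the frames
    StackFrameStream sfs(jt);   // the declaration that webrev.01 restores
    for(int i = 0; !sfs.is_done(); sfs.next(), i++) {
      // Each frame prints itself to the error stream, using buf as scratch space.
      sfs.current()->zero_print_on_error(i, st, buf, buflen);
      st->cr();
    }                           // closing brace implied by the surrounding context

Nothing else in the loop changes: the body already used sfs, only its declaration had gone missing in webrev.00, so without this one line the uses of sfs in the loop would not compile.
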
From david.holmes at oracle.com Sun Oct 15 21:44:04 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:44:04 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <7deb690b-8b74-1ca2-948e-d76d0d133814@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <7deb690b-8b74-1ca2-948e-d76d0d133814@redhat.com> Message-ID: On 16/10/2017 7:39 AM, Roman Kennke wrote: > Am 15.10.2017 um 23:25 schrieb David Holmes: >> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>> Hi David, >>> >>> thanks! >>> >>> I'm uploading a 2nd revision of the patch that excludes the >>> generated-configure.sh part, and adds a smallish Zero-related fix. >>> >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>> >> >> Can you point me to the exact change please as I don't want to >> re-examine it all. :) > Oops, sorry. The diff between 00 and 01 is this (apart from > generated-configure.sh): > > diff --git a/src/hotspot/share/utilities/vmError.cpp > b/src/hotspot/share/utilities/vmError.cpp > --- a/src/hotspot/share/utilities/vmError.cpp > +++ b/src/hotspot/share/utilities/vmError.cpp > @@ -192,6 +192,7 @@ > ???? st->cr(); > > ???? // Print the frames > +??? StackFrameStream sfs(jt); > ???? for(int i = 0; !sfs.is_done(); sfs.next(), i++) { > ?????? sfs.current()->zero_print_on_error(i, st, buf, buflen); > ?????? st->cr(); > > I.e. I added back the sfs variable that I accidentally removed in > webrev.00. Looks good! David From rkennke at redhat.com Sun Oct 15 22:00:15 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 00:00:15 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: Ok, I fixed all the comments you mentioned. Differential (against webrev.01): http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ Full webrev: http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ Roman > Just spotted this: > > ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** > {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ > > David > > On 16/10/2017 7:25 AM, David Holmes wrote: >> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>> Hi David, >>> >>> thanks! >>> >>> I'm uploading a 2nd revision of the patch that excludes the >>> generated-configure.sh part, and adds a smallish Zero-related fix. >>> >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>> >> >> Can you point me to the exact change please as I don't want to >> re-examine it all. :) >> >> I'll pull this in and do a test build run internally. >> >> Thanks, >> David >> >>> Thanks, Roman >>> >>> >>>> Hi Roman, >>>> >>>> The build changes must be reviewed on build-dev - now cc'd. >>>> >>>> Thanks, >>>> David >>>> >>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>> The JEP to remove the Shark compiler has received exclusively >>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>>>> patch to remove it. >>>>> >>>>> What I have done: >>>>> >>>>> grep -i -R shark src >>>>> grep -i -R shark make >>>>> grep -i -R shark doc >>>>> grep -i -R shark doc >>>>> >>>>> and purged any reference to shark. Almost everything was >>>>> straightforward. 
>>>>> >>>>> The only things I wasn't really sure of: >>>>> >>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>>>> gap that removing KIND_SHARK left. I hope that's good? >>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>> If not, what *would* be the right thing? >>>>> >>>>> Then of course I did: >>>>> >>>>> rm -rf src/hotspot/share/shark >>>>> >>>>> I also went through the build machinery and removed stuff related >>>>> to Shark and LLVM libs. >>>>> >>>>> Now the only references in the whole JDK tree to shark is a 'Shark >>>>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>> >>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>> All looks fine. >>>>> >>>>> - I could not build zero because it seems broken because of the >>>>> recent Atomic::* changes >>>>> - I could not test any of the other arches that seemed to >>>>> reference Shark (arm and sparc) >>>>> >>>>> Here's the full webrev: >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>> >>>>> >>>>> Can I get a review on this? >>>>> >>>>> Thanks, Roman >>>>> >>> From david.holmes at oracle.com Sun Oct 15 22:08:52 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 08:08:52 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> Looks good. Thanks, David On 16/10/2017 8:00 AM, Roman Kennke wrote: > > Ok, I fixed all the comments you mentioned. > > Differential (against webrev.01): > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ > > Full webrev: > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ > > > Roman > >> Just spotted this: >> >> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >> >> David >> >> On 16/10/2017 7:25 AM, David Holmes wrote: >>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>> Hi David, >>>> >>>> thanks! >>>> >>>> I'm uploading a 2nd revision of the patch that excludes the >>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>> >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>> >>> >>> Can you point me to the exact change please as I don't want to >>> re-examine it all. :) >>> >>> I'll pull this in and do a test build run internally. >>> >>> Thanks, >>> David >>> >>>> Thanks, Roman >>>> >>>> >>>>> Hi Roman, >>>>> >>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>>>>> patch to remove it. >>>>>> >>>>>> What I have done: >>>>>> >>>>>> grep -i -R shark src >>>>>> grep -i -R shark make >>>>>> grep -i -R shark doc >>>>>> grep -i -R shark doc >>>>>> >>>>>> and purged any reference to shark. Almost everything was >>>>>> straightforward. >>>>>> >>>>>> The only things I wasn't really sure of: >>>>>> >>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>>>>> gap that removing KIND_SHARK left. I hope that's good? 
>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>> If not, what *would* be the right thing? >>>>>> >>>>>> Then of course I did: >>>>>> >>>>>> rm -rf src/hotspot/share/shark >>>>>> >>>>>> I also went through the build machinery and removed stuff related >>>>>> to Shark and LLVM libs. >>>>>> >>>>>> Now the only references in the whole JDK tree to shark is a 'Shark >>>>>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>> >>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>> All looks fine. >>>>>> >>>>>> - I could not build zero because it seems broken because of the >>>>>> recent Atomic::* changes >>>>>> - I could not test any of the other arches that seemed to >>>>>> reference Shark (arm and sparc) >>>>>> >>>>>> Here's the full webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>> >>>>>> >>>>>> Can I get a review on this? >>>>>> >>>>>> Thanks, Roman >>>>>> >>>> > From vladimir.kozlov at oracle.com Sun Oct 15 22:14:53 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 15 Oct 2017 15:14:53 -0700 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> Message-ID: <85b68a77-f418-c619-0a51-c7389d7c5a86@oracle.com> +1 Thanks, Vladimir On 10/15/17 3:08 PM, David Holmes wrote: > Looks good. > > Thanks, > David > > On 16/10/2017 8:00 AM, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** {@code CompLevel::CompLevel_full_optimization} -- C2 >>> or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the generated-configure.sh part, and adds a smallish >>>>> Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>> >>>> Can you point me to the exact change please as I don't want to re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So >>>>>>> here comes the big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was straightforward. >>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope >>>>>>> that's good? 
>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing >>>>>>> to do. If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff related to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in >>>>>>> some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> From david.holmes at oracle.com Mon Oct 16 00:31:55 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 10:31:55 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> Message-ID: <331579a0-29de-f152-2dd4-66987896c463@oracle.com> My internal JPRT run went fine. So this just needs a build team signoff from the perspective of the patch. However, as this has had a JEP submitted for it, the code changes can not be pushed until the JEP has been targeted. Thanks, David On 16/10/2017 8:08 AM, David Holmes wrote: > Looks good. > > Thanks, > David > > On 16/10/2017 8:00 AM, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>> >>>> >>>> Can you point me to the exact change please as I don't want to >>>> re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>> big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was >>>>>>> straightforward. 
>>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>>> If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff related >>>>>>> to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>> All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the >>>>>>> recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to >>>>>>> reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> From rkennke at redhat.com Mon Oct 16 05:49:26 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 07:49:26 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <331579a0-29de-f152-2dd4-66987896c463@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> <331579a0-29de-f152-2dd4-66987896c463@oracle.com> Message-ID: Hi David, thanks for reviewing and testing! The interaction between JEPs and patches going in is not really clear to me, nor is it well documented. For example, we're already pushing patches for JEP 304: Garbage Collection Interface, even though it's only in 'candidate' state... In any case, I'll ping Mark Reinhold about moving the Shark JEP forward. Thanks again, Roman > My internal JPRT run went fine. So this just needs a build team > signoff from the perspective of the patch. > > However, as this has had a JEP submitted for it, the code changes can > not be pushed until the JEP has been targeted. > > Thanks, > David > > On 16/10/2017 8:08 AM, David Holmes wrote: >> Looks good. >> >> Thanks, >> David >> >> On 16/10/2017 8:00 AM, Roman Kennke wrote: >>> >>> Ok, I fixed all the comments you mentioned. >>> >>> Differential (against webrev.01): >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >>> >>> Full webrev: >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >>> >>> >>> Roman >>> >>>> Just spotted this: >>>> >>>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>>> >>>> David >>>> >>>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>>> Hi David, >>>>>> >>>>>> thanks! >>>>>> >>>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>>> >>>>> >>>>> Can you point me to the exact change please as I don't want to >>>>> re-examine it all. :) >>>>> >>>>> I'll pull this in and do a test build run internally. 
>>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, Roman >>>>>> >>>>>> >>>>>>> Hi Roman, >>>>>>> >>>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>>> big patch to remove it. >>>>>>>> >>>>>>>> What I have done: >>>>>>>> >>>>>>>> grep -i -R shark src >>>>>>>> grep -i -R shark make >>>>>>>> grep -i -R shark doc >>>>>>>> grep -i -R shark doc >>>>>>>> >>>>>>>> and purged any reference to shark. Almost everything was >>>>>>>> straightforward. >>>>>>>> >>>>>>>> The only things I wasn't really sure of: >>>>>>>> >>>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>>> pd_address_in_code(), I am not sure it is the right thing to >>>>>>>> do. If not, what *would* be the right thing? >>>>>>>> >>>>>>>> Then of course I did: >>>>>>>> >>>>>>>> rm -rf src/hotspot/share/shark >>>>>>>> >>>>>>>> I also went through the build machinery and removed stuff >>>>>>>> related to Shark and LLVM libs. >>>>>>>> >>>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>>> >>>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>>> All looks fine. >>>>>>>> >>>>>>>> - I could not build zero because it seems broken because of the >>>>>>>> recent Atomic::* changes >>>>>>>> - I could not test any of the other arches that seemed to >>>>>>>> reference Shark (arm and sparc) >>>>>>>> >>>>>>>> Here's the full webrev: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>>> >>>>>>>> >>>>>>>> Can I get a review on this? >>>>>>>> >>>>>>>> Thanks, Roman >>>>>>>> >>>>>> >>> From david.holmes at oracle.com Mon Oct 16 06:10:19 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 16:10:19 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> <331579a0-29de-f152-2dd4-66987896c463@oracle.com> Message-ID: <456436e4-955c-75f5-ac92-e2fd4a2fb280@oracle.com> On 16/10/2017 3:49 PM, Roman Kennke wrote: > > Hi David, > > thanks for reviewing and testing! > > The interaction between JEPs and patches going in is not really clear to > me, nor is it well documented. For example, we're already pushing > patches for JEP 304: Garbage Collection Interface, even though it's only > in 'candidate' state... If patches can be separated out into generally useful cleanup or enabling changes then it can be okay to push them independently of the JEP AFAIK. That's obviously a little subjective. In this case though we're talking about the whole thing at once, so AFAIK the JEP has to be targeted before the changes can be pushed. > In any case, I'll ping Mark Reinhold about moving the Shark JEP forward. Thanks. Should be simple enough, I hope. :) Cheers, David > Thanks again, > Roman > >> My internal JPRT run went fine. So this just needs a build team >> signoff from the perspective of the patch. 
>> >> However, as this has had a JEP submitted for it, the code changes can >> not be pushed until the JEP has been targeted. >> >> Thanks, >> David >> >> On 16/10/2017 8:08 AM, David Holmes wrote: >>> Looks good. >>> >>> Thanks, >>> David >>> >>> On 16/10/2017 8:00 AM, Roman Kennke wrote: >>>> >>>> Ok, I fixed all the comments you mentioned. >>>> >>>> Differential (against webrev.01): >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >>>> >>>> Full webrev: >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >>>> >>>> >>>> Roman >>>> >>>>> Just spotted this: >>>>> >>>>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>>>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>>>> >>>>> David >>>>> >>>>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> thanks! >>>>>>> >>>>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>>>> >>>>>> >>>>>> Can you point me to the exact change please as I don't want to >>>>>> re-examine it all. :) >>>>>> >>>>>> I'll pull this in and do a test build run internally. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>>>> >>>>>>>> Hi Roman, >>>>>>>> >>>>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>>>> big patch to remove it. >>>>>>>>> >>>>>>>>> What I have done: >>>>>>>>> >>>>>>>>> grep -i -R shark src >>>>>>>>> grep -i -R shark make >>>>>>>>> grep -i -R shark doc >>>>>>>>> grep -i -R shark doc >>>>>>>>> >>>>>>>>> and purged any reference to shark. Almost everything was >>>>>>>>> straightforward. >>>>>>>>> >>>>>>>>> The only things I wasn't really sure of: >>>>>>>>> >>>>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>>>> pd_address_in_code(), I am not sure it is the right thing to >>>>>>>>> do. If not, what *would* be the right thing? >>>>>>>>> >>>>>>>>> Then of course I did: >>>>>>>>> >>>>>>>>> rm -rf src/hotspot/share/shark >>>>>>>>> >>>>>>>>> I also went through the build machinery and removed stuff >>>>>>>>> related to Shark and LLVM libs. >>>>>>>>> >>>>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>>>> >>>>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>>>> All looks fine. >>>>>>>>> >>>>>>>>> - I could not build zero because it seems broken because of the >>>>>>>>> recent Atomic::* changes >>>>>>>>> - I could not test any of the other arches that seemed to >>>>>>>>> reference Shark (arm and sparc) >>>>>>>>> >>>>>>>>> Here's the full webrev: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Can I get a review on this? 
>>>>>>>>> >>>>>>>>> Thanks, Roman >>>>>>>>> >>>>>>> >>>> > From erik.joelsson at oracle.com Mon Oct 16 08:24:56 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Mon, 16 Oct 2017 10:24:56 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> Hello Roman, In hotspot.m4, I believe the check on line 328 (pre changes) is still relevant for just the zero case. Otherwise build changes look good to me. /Erik On 2017-10-16 00:00, Roman Kennke wrote: > > Ok, I fixed all the comments you mentioned. > > Differential (against webrev.01): > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ > > Full webrev: > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ > > > Roman > >> Just spotted this: >> >> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >> >> David >> >> On 16/10/2017 7:25 AM, David Holmes wrote: >>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>> Hi David, >>>> >>>> thanks! >>>> >>>> I'm uploading a 2nd revision of the patch that excludes the >>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>> >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>> >>> >>> Can you point me to the exact change please as I don't want to >>> re-examine it all. :) >>> >>> I'll pull this in and do a test build run internally. >>> >>> Thanks, >>> David >>> >>>> Thanks, Roman >>>> >>>> >>>>> Hi Roman, >>>>> >>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>> big patch to remove it. >>>>>> >>>>>> What I have done: >>>>>> >>>>>> grep -i -R shark src >>>>>> grep -i -R shark make >>>>>> grep -i -R shark doc >>>>>> grep -i -R shark doc >>>>>> >>>>>> and purged any reference to shark. Almost everything was >>>>>> straightforward. >>>>>> >>>>>> The only things I wasn't really sure of: >>>>>> >>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>> If not, what *would* be the right thing? >>>>>> >>>>>> Then of course I did: >>>>>> >>>>>> rm -rf src/hotspot/share/shark >>>>>> >>>>>> I also went through the build machinery and removed stuff related >>>>>> to Shark and LLVM libs. >>>>>> >>>>>> Now the only references in the whole JDK tree to shark is a >>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>> >>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>> All looks fine. >>>>>> >>>>>> - I could not build zero because it seems broken because of the >>>>>> recent Atomic::* changes >>>>>> - I could not test any of the other arches that seemed to >>>>>> reference Shark (arm and sparc) >>>>>> >>>>>> Here's the full webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>> >>>>>> >>>>>> Can I get a review on this? 
>>>>>> >>>>>> Thanks, Roman >>>>>> >>>> > From magnus.ihse.bursie at oracle.com Mon Oct 16 09:25:59 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 16 Oct 2017 11:25:59 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> Message-ID: On 2017-10-16 10:24, Erik Joelsson wrote: > Hello Roman, > > In hotspot.m4, I believe the check on line 328 (pre changes) is still > relevant for just the zero case. Yes, it is indeed. > > Otherwise build changes look good to me. Agree, looks good. /Magnus > > /Erik > > > On 2017-10-16 00:00, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>> >>>> >>>> Can you point me to the exact change please as I don't want to >>>> re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>> big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was >>>>>>> straightforward. >>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>>> If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff >>>>>>> related to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>> All looks fine. 
>>>>>>> >>>>>>> - I could not build zero because it seems broken because of the >>>>>>> recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to >>>>>>> reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkennke at redhat.com Mon Oct 16 10:26:43 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 12:26:43 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> Message-ID: <6ed7e856-baee-a59d-6710-4ff143277dc9@redhat.com> Hi Erik, You mean like this? http://cr.openjdk.java.net/~rkennke/8171853/webrev.04.diff/ Full webrev here: http://cr.openjdk.java.net/~rkennke/8171853/webrev.04/ Thanks, Roman > Hello Roman, > > In hotspot.m4, I believe the check on line 328 (pre changes) is still > relevant for just the zero case. > > Otherwise build changes look good to me. > > /Erik > > > On 2017-10-16 00:00, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>> >>>> >>>> Can you point me to the exact change please as I don't want to >>>> re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>> big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was >>>>>>> straightforward. >>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>>> If not, what *would* be the right thing? 
>>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff >>>>>>> related to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>> All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the >>>>>>> recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to >>>>>>> reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.joelsson at oracle.com Mon Oct 16 10:55:28 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Mon, 16 Oct 2017 12:55:28 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <6ed7e856-baee-a59d-6710-4ff143277dc9@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> <6ed7e856-baee-a59d-6710-4ff143277dc9@redhat.com> Message-ID: That looks correct. Thanks! /Erik On 2017-10-16 12:26, Roman Kennke wrote: > > Hi Erik, > > You mean like this? > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.04.diff/ > > > Full webrev here: > http://cr.openjdk.java.net/~rkennke/8171853/webrev.04/ > > > Thanks, > Roman > >> Hello Roman, >> >> In hotspot.m4, I believe the check on line 328 (pre changes) is still >> relevant for just the zero case. >> >> Otherwise build changes look good to me. >> >> /Erik >> >> >> On 2017-10-16 00:00, Roman Kennke wrote: >>> >>> Ok, I fixed all the comments you mentioned. >>> >>> Differential (against webrev.01): >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >>> >>> Full webrev: >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >>> >>> >>> Roman >>> >>>> Just spotted this: >>>> >>>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>>> >>>> David >>>> >>>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>>> Hi David, >>>>>> >>>>>> thanks! >>>>>> >>>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>>> >>>>> >>>>> Can you point me to the exact change please as I don't want to >>>>> re-examine it all. :) >>>>> >>>>> I'll pull this in and do a test build run internally. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, Roman >>>>>> >>>>>> >>>>>>> Hi Roman, >>>>>>> >>>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>>> big patch to remove it. 
>>>>>>>> >>>>>>>> What I have done: >>>>>>>> >>>>>>>> grep -i -R shark src >>>>>>>> grep -i -R shark make >>>>>>>> grep -i -R shark doc >>>>>>>> grep -i -R shark doc >>>>>>>> >>>>>>>> and purged any reference to shark. Almost everything was >>>>>>>> straightforward. >>>>>>>> >>>>>>>> The only things I wasn't really sure of: >>>>>>>> >>>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>>> pd_address_in_code(), I am not sure it is the right thing to >>>>>>>> do. If not, what *would* be the right thing? >>>>>>>> >>>>>>>> Then of course I did: >>>>>>>> >>>>>>>> rm -rf src/hotspot/share/shark >>>>>>>> >>>>>>>> I also went through the build machinery and removed stuff >>>>>>>> related to Shark and LLVM libs. >>>>>>>> >>>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>>> >>>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>>> All looks fine. >>>>>>>> >>>>>>>> - I could not build zero because it seems broken because of the >>>>>>>> recent Atomic::* changes >>>>>>>> - I could not test any of the other arches that seemed to >>>>>>>> reference Shark (arm and sparc) >>>>>>>> >>>>>>>> Here's the full webrev: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>>> >>>>>>>> >>>>>>>> Can I get a review on this? >>>>>>>> >>>>>>>> Thanks, Roman >>>>>>>> >>>>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robbin.ehn at oracle.com Mon Oct 16 15:46:07 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 16 Oct 2017 17:46:07 +0200 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> Message-ID: <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> Hi JC, I saw a webrev.12 in the directory, with only test changes(11->12), so I took that version. I had a look and tested the tests, worked fine! First glance at the code (looking at full v12) some minor things below, mostly unused stuff. Thanks, Robbin diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp --- a/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 16:54:06 2017 +0200 +++ b/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 17:42:42 2017 +0200 @@ -211,2 +211,3 @@ void initialize(int max_storage) { + // validate max_storage to sane value ? What would 0 mean ? MutexLocker mu(HeapMonitor_lock); @@ -227,8 +228,4 @@ bool initialized() { return _initialized; } - volatile bool *initialized_address() { return &_initialized; } private: - // Protects the traces currently sampled (below). - volatile intptr_t _stack_storage_lock[1]; - // The traces currently sampled. @@ -313,3 +310,2 @@ _initialized(false) { - _stack_storage_lock[0] = 0; } @@ -532,13 +528,2 @@ -// Delegate the initialization question to the underlying storage system. -bool HeapMonitoring::initialized() { - return StackTraceStorage::storage()->initialized(); -} - -// Delegate the initialization question to the underlying storage system. 
-bool *HeapMonitoring::initialized_address() { - return - const_cast(StackTraceStorage::storage()->initialized_address()); -} - void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) { diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp --- a/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 16:54:06 2017 +0200 +++ b/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 17:42:42 2017 +0200 @@ -35,3 +35,2 @@ static uint64_t _rnd; - static bool _initialized; static jint _monitoring_rate; @@ -92,7 +91,2 @@ - // Is the profiler initialized and where is the address to the initialized - // boolean. - static bool initialized(); - static bool *initialized_address(); - // Called when o is to be sampled from a given thread and a given size. On 10/10/2017 12:57 AM, JC Beyler wrote: > Dear all, > > Thread-safety is back!! Here is the update webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ > > Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ > > In order to really test this, I needed to add this so thought now was a good time. It required a few changes here for the creation to ensure correctness and safety. Now we > keep the static pointer but clear the data internally so on re-initialize, it will be a bit more costly than before. I don't think this is a huge use-case so I did not > think it was a problem. I used the internal MutexLocker, I think I used it well, let me know. > > I also added three tests: > > 1) Stack depth test: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch > > This test shows that the maximum stack depth system is working. > > 2) Thread safety: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch > > The test creates 24 threads and they all allocate at the same time. The test then checks it does find samples from all the threads. > > 3) Thread on/off safety > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch > > The test creates 24 threads that all allocate a bunch of memory. Then another thread turns the sampling on/off. > > Btw, both tests 2 & 3 failed without the locks. > > As I worked on this, I saw a lot of places where the tests are doing very similar things, I'm going to clean up the code a bit and make a HeapAllocator class that all tests > can call directly. This will greatly simplify the code. > > Thanks for any comments/criticisms! > Jc > > > On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler > wrote: > > Dear all, > > Small update to the webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ > > Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ > > I updated a bit of the naming, removed a TODO comment, and I added a test for testing the sampling rate. I also updated the maximum stack depth to 1024, there is no > reason to keep it so small. I did a micro benchmark that tests the overhead and it seems relatively the same. > > I compared allocations from a stack depth of 10 and allocations from a stack depth of 1024 (allocations are from the same helper method in > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java > ): > ? ? ? ? ? 
- For an array of 1 integer allocated in a loop; stack depth 1024 vs stack depth 10: 1% slower > ??????????- For an array of 200k integers allocated in a loop; stack depth 1024 vs stack depth 10: 3% slower > > So basically now moving the maximum stack depth to 1024 but we only copy over the stack depths actually used. > > For the next webrev, I will be adding a stack depth test to show that it works and probably put back the mutex locking so that we can see how difficult it is to keep > thread safe. > > Let me know what you think! > Jc > > > > On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler > wrote: > > Forgot to say that for my numbers: > ?- Not in the test are the actual numbers I got for the various array sizes, I ran the program 30 times and parsed the output; here are the averages and standard > deviation: > ? ? ? 1000:? ? ?1.28% average; 1.13% standard deviation > ? ? ? 10000:? ? 1.59% average; 1.25% standard deviation > ? ? ? 100000:? ?1.26% average; 1.26% standard deviation > > The 1000/10000/100000 are the sizes of the arrays being allocated. These are allocated 100k times and the sampling rate is 111 times the size of the array. > > Thanks! > Jc > > > On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler > wrote: > > Hi all, > > After a bit of a break, I am back working on this :). As before, here are two webrevs: > > - Full change set: http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/ > - Compared to version 8: http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/ > ? ? (This version is compared to version 8 I last showed but ported to the new folder hierarchy) > > In this version I have: > ? - Handled Thomas' comments from his email of 07/03: > ? ? ? ?- Merged the logging to be standard > ? ? ? ?- Fixed up the code a bit where asked > ? ? ? ?- Added some notes about the code not being thread-safe yet > ? ?- Removed additional dead code from the version that modifies interpreter/c1/c2 > ? ?- Fixed compiler issues so that it compiles with --disable-precompiled-header > ? ? ? ? - Tested with ./configure --with-boot-jdk= --with-debug-level=slowdebug --disable-precompiled-headers > > Additionally, I added a test to check the sanity of the sampler: HeapMonitorStatCorrectnessTest > (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch ) > ? ?- This allocates a number of arrays and checks that we obtain the number of samples we want with an accepted error of 5%. I tested it 100 times and it > passed everytime, I can test more if wanted > ? ?- Not in the test are the actual numbers I got for the various array sizes, I ran the program 30 times and parsed the output; here are the averages and > standard deviation: > ? ? ? 1000:? ? ?1.28% average; 1.13% standard deviation > ? ? ? 10000:? ? 1.59% average; 1.25% standard deviation > ? ? ? 100000:? ?1.26% average; 1.26% standard deviation > > What this means is that we were always at about 1~2% of the number of samples the test expected. > > Let me know what you think, > Jc > > On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler > wrote: > > Hi all, > > I apologize, I have not yet handled your remarks but thought this new webrev would also be useful to see and comment on perhaps. 
> > Here is the latest webrev, it is generated slightly different than the others since now I'm using webrev.ksh without the -N option: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ > > And the webrev.07 to webrev.08 diff is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ > > (Let me know if it works well) > > It's a small change between versions but it: > ? - provides a fix that makes the average sample rate correct (more on that below). > ? - fixes the code to actually have it play nicely with the fast tlab refill > ? - cleaned up a bit the JVMTI text and now use jvmtiFrameInfo > - moved the capability to be onload solo > > With this webrev, I've done a small study of the random number generator we use here for the sampling rate. I took a small program and it can be simplified to: > > for (outer loop) > for (inner loop) > int[] tmp = new int[arraySize]; > > - I've fixed the outer and inner loops to being 800 for this experiment, meaning we allocate 640000 times an array of a given array size. > > - Each program provides the average sample size used for the whole execution > > - Then, I ran each variation 30 times and then calculated the average of the average sample size used for various array sizes. I selected the array size to > be one of the following: 1, 10, 100, 1000. > > - When compared to 512kb, the average sample size of 30 runs: > 1: 4.62% of error > 10: 3.09% of error > 100: 0.36% of error > 1000: 0.1% of error > 10000: 0.03% of error > > What it shows is that, depending on the number of samples, the average does become better. This is because with an allocation of 1 element per array, it > will take longer to hit one of the thresholds. This is seen by looking at the sample count statistic I put in. For the same number of iterations (800 * > 800), the different array sizes provoke: > 1: 62 samples > 10: 125 samples > 100: 788 samples > 1000: 6166 samples > 10000: 57721 samples > > And of course, the more samples you have, the more sample rates you pick, which means that your average gets closer using that math. > > Thanks, > Jc > > On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler > wrote: > > Thanks Robbin, > > This seems to have worked. When I have the next webrev ready, we will find out but I'm fairly confident it will work! > > Thanks agian! > Jc > > On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn > wrote: > > Hi JC, > > On 06/29/2017 12:15 AM, JC Beyler wrote: > > B) Incremental changes > > > I guess the most common work flow here is using mq : > hg qnew fix_v1 > edit files > hg qrefresh > hg qnew fix_v2 > edit files > hg qrefresh > > if you do hg log you will see 2 commits > > webrev.ksh -r -2 -o my_inc_v1_v2 > webrev.ksh -o my_full_v2 > > > In? your .hgrc you might need: > [extensions] > mq = > > /Robbin > > > Again another newbiew question here... > > For showing the incremental changes, is there a link that explains how to do that? I apologize for my newbie questions all the time :) > > Right now, I do: > > ? ksh ../webrev.ksh -m -N > > That generates a webrev.zip and send it to Chuck Rasbold. He then uploads it to a new webrev. > > I tried commiting my change and adding a small change. Then if I just do ksh ../webrev.ksh without any options, it seems to produce a similar > page but now with only the changes I had (so the 06-07 comparison you were talking about) and a changeset that has it all. I imagine that is > what you meant. 
> > Which means that my workflow would become: > > 1) Make changes > 2) Make a webrev without any options to show just the differences with the tip > 3) Amend my changes to my local commit so that I have it done with > 4) Go to 1 > > Does that seem correct to you? > > Note that when I do this, I only see the full change of a file in the full change set (Side note here: now the page says change set and not > patch, which is maybe why Serguei was having issues?). > > Thanks! > Jc > > > > On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn >> wrote: > > ? ? Hi, > > ? ? On 06/28/2017 12:04 AM, JC Beyler wrote: > > ? ? ? ? Dear Thomas et al, > > ? ? ? ? Here is the newest webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ > > > > > > ? ? You have some more bits to in there but generally this looks good and really nice with more tests. > ? ? I'll do and deep dive and re-test this when I get back from my long vacation with whatever patch version you have then. > > ? ? Also I think it's time you provide incremental (v06->07 changes) as well as complete change-sets. > > ? ? Thanks, Robbin > > > > > ? ? ? ? Thomas, I "think" I have answered all your remarks. The summary is: > > ? ? ? ? - The statistic system is up and provides insight on what the heap sampler is doing > ? ? ? ? ? ? ?- I've noticed that, though the sampling rate is at the right mean, we are missing some samples, I have not yet tracked out why > (details below) > > ? ? ? ? - I've run a tiny benchmark that is the worse case: it is a very tight loop and allocated a small array > ? ? ? ? ? ? ?- In this case, I see no overhead when the system is off so that is a good start :) > ? ? ? ? ? ? ?- I see right now a high overhead in this case when sampling is on. This is not a really too surprising but I'm going to see if > this is consistent with our > ? ? ? ? internal implementation. The benchmark is really allocation stressful so I'm not too surprised but I want to do the due diligence. > > ? ? ? ? ? ?- The statistic system up is up and I have a new test > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch > > ? ? ? ? > > ? ? ? ? ? ? ? - I did a bit of a study about the random generator here, more details are below but basically it seems to work well > > ? ? ? ? ? ?- I added a capability but since this is the first time doing this, I was not sure I did it right > ? ? ? ? ? ? ?- I did add a test though for it and the test seems to do what I expect (all methods are failing with the > JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). > ? ? ? ? ? ? ? ? ?- > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch > > > > > > ? ? ? ? ? ?- I still need to figure out what to do about the multi-agent vs single-agent issue > > ? ? ? ? ? ?- As far as measurements, it seems I still need to look at: > ? ? ? ? ? ? ?- Why we do the 20 random calls first, are they necessary? > ? ? ? ? ? ? ?- Look at the mean of the sampling rate that the random generator does and also what is actually sampled > ? ? ? ? ? ? ?- What is the overhead in terms of memory/performance when on? > > ? ? ? ? I have inlined my answers, I think I got them all in the new webrev, let me know your thoughts. > > ? ? ? ? Thanks again! > ? ? ? ? Jc > > > ? ? ? ? On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl > > > > ? ? ? ? >>> wrote: > > ? ? ? ? ? ? ?Hi, > > ? ? ? ? ? ? ?On Wed, 2017-06-21 at 13:45 -0700, JC Beyler wrote: > ? 
? ? ? ? ? ?> Hi all, > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> First off: Thanks again to Robbin and Thomas for their reviews :) > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> Next, I've uploaded a new webrev: > ? ? ? ? ? ? ?> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ > > > ? ? ? ? > >> > > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> Here is an update: > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> - @Robbin, I forgot to say that yes I need to look at implementing > ? ? ? ? ? ? ?> this for the other architectures and testing it before it is all > ? ? ? ? ? ? ?> ready to go. Is it common to have it working on all possible > ? ? ? ? ? ? ?> combinations or is there a subset that I should be doing first and we > ? ? ? ? ? ? ?> can do the others later? > ? ? ? ? ? ? ?> - I've tested slowdebug, built and ran the JTreg tests I wrote with > ? ? ? ? ? ? ?> slowdebug and fixed a few more issues > ? ? ? ? ? ? ?> - I've refactored a bit of the code following Thomas' comments > ? ? ? ? ? ? ?>? ? - I think I've handled all the comments from Thomas (I put > ? ? ? ? ? ? ?> comments inline below for the specifics) > > ? ? ? ? ? ? ?Thanks for handling all those. > > ? ? ? ? ? ? ?> - Following Thomas' comments on statistics, I want to add some > ? ? ? ? ? ? ?> quality assurance tests and find that the easiest way would be to > ? ? ? ? ? ? ?> have a few counters of what is happening in the sampler and expose > ? ? ? ? ? ? ?> that to the user. > ? ? ? ? ? ? ?>? ? - I'll be adding that in the next version if no one sees any > ? ? ? ? ? ? ?> objections to that. > ? ? ? ? ? ? ?>? ? - This will allow me to add a sanity test in JTreg about number of > ? ? ? ? ? ? ?> samples and average of sampling rate > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> @Thomas: I had a few questions that I inlined below but I will > ? ? ? ? ? ? ?> summarize the "bigger ones" here: > ? ? ? ? ? ? ?>? ? - You mentioned constants are not using the right conventions, I > ? ? ? ? ? ? ?> looked around and didn't see any convention except normal naming then > ? ? ? ? ? ? ?> for static constants. Is that right? > > ? ? ? ? ? ? ?I looked through https://wiki.openjdk.java.net/display/HotSpot/StyleGui > > > ? ? ? ? > >> > ? ? ? ? ? ? ?de and the rule is to "follow an existing pattern and must have a > ? ? ? ? ? ? ?distinct appearance from other names". Which does not help a lot I > ? ? ? ? ? ? ?guess :/ The GC team started using upper camel case, e.g. > ? ? ? ? ? ? ?SomeOtherConstant, but very likely this is probably not applied > ? ? ? ? ? ? ?consistently throughout. So I am fine with not adding another style > ? ? ? ? ? ? ?(like kMaxStackDepth with the "k" in front with some unknown meaning) > ? ? ? ? ? ? ?is fine. > > ? ? ? ? ? ? ?(Chances are you will find that style somewhere used anyway too, > ? ? ? ? ? ? ?apologies if so :/) > > > ? ? ? ? Thanks for that link, now I know where to look. I used the upper camel case in my code as well then :) I should have gotten them all. > > > ? ? ? ? ? ? ? > PS: I've also inlined my answers to Thomas below: > ? ? ? ? ? ? ? > > ? ? ? ? ? ? ? > On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl ? ? ? ? ? ? ? > e.com > wrote: > ? ? ? ? ? ? ? > > Hi all, > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > > On Mon, 2017-06-12 at 11:11 -0700, JC Beyler wrote: > ? ? ? ? ? ? ? > > > Dear all, > ? ? ? ? ? ? ? > > > > ? ? ? ? ? ? ? > > > I've continued working on this and have done the following > ? ? ? ? ? ? ? > > webrev: > ? ? ? ? ? ? ? > > > http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/ > > > ? ? ? ? > >> > > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > > [...] > ? ? ? ? ? ? 
? > > > Things I still need to do: > ? ? ? ? ? ? ? > > >? ? - Have to fix that TLAB case for the FastTLABRefill > ? ? ? ? ? ? ? > > >? ? - Have to start looking at the data to see that it is > ? ? ? ? ? ? ? > > consistent and does gather the right samples, right frequency, etc. > ? ? ? ? ? ? ? > > >? ? - Have to check the GC elements and what that produces > ? ? ? ? ? ? ? > > >? ? - Run a slowdebug run and ensure I fixed all those issues you > ? ? ? ? ? ? ? > > saw > Robbin > ? ? ? ? ? ? ? > > > > ? ? ? ? ? ? ? > > > Thanks for looking at the webrev and have a great week! > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > >? ?scratching a bit on the surface of this change, so apologies for > ? ? ? ? ? ? ? > > rather shallow comments: > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > > - macroAssembler_x86.cpp:5604: while this is compiler code, and I > ? ? ? ? ? ? ? > > am not sure this is final, please avoid littering the code with > ? ? ? ? ? ? ? > > TODO remarks :) They tend to be candidates for later wtf moments > ? ? ? ? ? ? ? > > only. > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > > Just file a CR for that. > ? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? > Newcomer question: what is a CR and not sure I have the rights to do > ? ? ? ? ? ? ? > that yet ? :) > > ? ? ? ? ? ? ?Apologies. CR is a change request, this suggests to file a bug in the > ? ? ? ? ? ? ?bug tracker. And you are right, you can't just create a new account in > ? ? ? ? ? ? ?the OpenJDK JIRA yourselves. :( > > > ? ? ? ? Ok good to know, I'll continue with my own todo list but I'll work hard on not letting it slip in the webrevs anymore :) > > > ? ? ? ? ? ? ?I was mostly referring to the "... but it is a TODO" part of that > ? ? ? ? ? ? ?comment in macroassembler_x86.cpp. Comments about the why of the code > ? ? ? ? ? ? ?are appreciated. > > ? ? ? ? ? ? ?[Note that I now understand that this is to some degree still work in > ? ? ? ? ? ? ?progress. As long as the final changeset does no contain TODO's I am > ? ? ? ? ? ? ?fine (and it's not a hard objection, rather their use in "final" code > ? ? ? ? ? ? ?is typically limited in my experience)] > > ? ? ? ? ? ? ?5603? ?// Currently, if this happens, just set back the actual end to > ? ? ? ? ? ? ?where it was. > ? ? ? ? ? ? ?5604? ?// We miss a chance to sample here. > > ? ? ? ? ? ? ?Would be okay, if explaining "this" and the "why" of missing a chance > ? ? ? ? ? ? ?to sample here would be best. > > ? ? ? ? ? ? ?Like maybe: > > ? ? ? ? ? ? ?// If we needed to refill TLABs, just set the actual end point to > ? ? ? ? ? ? ?// the end of the TLAB again. We do not sample here although we could. > > ? ? ? ? Done with your comment, it works well in my mind. > > ? ? ? ? ? ? ?I am not sure whether "miss a chance to sample" meant "we could, but > ? ? ? ? ? ? ?consciously don't because it's not that useful" or "it would be > ? ? ? ? ? ? ?necessary but don't because it's too complicated to do.". > > ? ? ? ? ? ? ?Looking at the original comment once more, I am also not sure if that > ? ? ? ? ? ? ?comment shouldn't referring to the "end" variable (not actual_end) > ? ? ? ? ? ? ?because that's the variable that is responsible for taking the sampling > ? ? ? ? ? ? ?path? (Going from the member description of ThreadLocalAllocBuffer). > > > ? ? ? ? I've moved this code and it no longer shows up here but the rationale and answer was: > > ? ? ? ? So.. Yes, end is the variable provoking the sampling. Actual end is the actual end of the TLAB. > > ? ? ? ? 
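As a rough illustration of the _end/_actual_end split being described here, a minimal HotSpot-style fragment follows. This is a sketch only, not the webrev's code: the names (_end, _actual_end, _bytes_until_sample, set_back_actual_end) are the ones mentioned in the thread, but the bodies are assumptions.

    // Sketch: allocation fast paths compare against _end, so pulling _end
    // forward forces a trip into the runtime where a sample can be recorded,
    // while _actual_end keeps the real TLAB limit.
    class ThreadLocalAllocBuffer {
      HeapWord* _top;          // current allocation pointer
      HeapWord* _end;          // limit seen by the inlined fast path
      HeapWord* _actual_end;   // real end of the TLAB
      size_t    _bytes_until_sample;

     public:
      void set_sample_end() {
        HeapWord* sample_end = _top + (_bytes_until_sample / HeapWordSize);
        _end = MIN2(sample_end, _actual_end);   // never past the real end
      }

      // Once a sample has been taken (or sampling is off), restore the real
      // limit so ordinary allocation proceeds untouched.
      void set_back_actual_end() { _end = _actual_end; }
    };
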
What was happening here is that the code is resetting _end to point towards the end of the new TLAB. Because, we now have the end for > sampling and _actual_end for > ? ? ? ? the actual end, we need to update the actual_end as well. > > ? ? ? ? Normally, were we to do the real work here, we would calculate the (end - start) offset, then do: > > ? ? ? ? - Set the new end to : start + (old_end - old_start) > ? ? ? ? - Set the actual end like we do here now where it because it is the actual end. > > ? ? ? ? Why is this not done here now anymore? > ? ? ? ? ? ? - I was still debating which path to take: > ? ? ? ? ? ? ? ?- Do it in the fast refill code, it has its perks: > ? ? ? ? ? ? ? ? ? ?- In a world where fast refills are happening all the time or a lot, we can augment there the code to do the sampling > ? ? ? ? ? ? ? ?- Remember what we had as an end before leaving the slowpath and check on return > ? ? ? ? ? ? ? ? ? ?- This is what I'm doing now, it removes the need to go fix up all fast refill paths but if you remain in fast refill paths, > you won't get sampling. I > ? ? ? ? have to think of the consequences of that, maybe a future change later on? > ? ? ? ? ? ? ? ? ? ? ? - I have the statistics now so I'm going to study that > ? ? ? ? ? ? ? ? ? ? ? ? ?-> By the way, though my statistics are showing I'm missing some samples, if I turn off FastTlabRefill, it is the same > loss so for now, it seems > ? ? ? ? this does not occur in my simple test. > > > > ? ? ? ? ? ? ?But maybe I am only confused and it's best to just leave the comment > ? ? ? ? ? ? ?away. :) > > ? ? ? ? ? ? ?Thinking about it some more, doesn't this not-sampling in this case > ? ? ? ? ? ? ?mean that sampling does not work in any collector that does inline TLAB > ? ? ? ? ? ? ?allocation at the moment? (Or is inline TLAB alloc automatically > ? ? ? ? ? ? ?disabled with sampling somehow?) > > ? ? ? ? ? ? ?That would indeed be a bigger TODO then :) > > > ? ? ? ? Agreed, this remark made me think that perhaps as a first step the new way of doing it is better but I did have to: > ? ? ? ? ? ?- Remove the const of the ThreadLocalBuffer remaining and hard_end methods > ? ? ? ? ? ?- Move hard_end out of the header file to have a bit more logic there > > ? ? ? ? Please let me know what you think of that and if you prefer it this way or changing the fast refills. (I prefer this way now because it > is more incremental). > > > ? ? ? ? ? ? ?> > - calling HeapMonitoring::do_weak_oops() (which should probably be > ? ? ? ? ? ? ?> > called weak_oops_do() like other similar methods) only if string > ? ? ? ? ? ? ?> > deduplication is enabled (in g1CollectedHeap.cpp:4511) seems wrong. > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> The call should be at least around 6 lines up outside the if. > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> Preferentially in a method like process_weak_jni_handles(), including > ? ? ? ? ? ? ?> additional logging. (No new (G1) gc phase without minimal logging > ? ? ? ? ? ? ?> :)). > ? ? ? ? ? ? ?> Done but really not sure because: > ? ? ? ? ? ? ?> > ? ? ? ? ? ? ?> I put for logging: > ? ? ? ? ? ? ?>? ?log_develop_trace(gc, freelist)("G1ConcRegionFreeing [other] : heap > ? ? ? ? ? ? ?> monitoring"); > > ? ? ? ? ? ? ?I would think that "gc, ref" would be more appropriate log tags for > ? ? ? ? ? ? ?this similar to jni handles. > ? ? ? ? ? ? ?(I am als not sure what weak reference handling has to do with > ? ? ? ? ? ? ?G1ConcRegionFreeing, so I am a bit puzzled) > > > ? ? ? ? I was not sure what to put for the tags or really as the message. 
I cleaned it up a bit now to: > ? ? ? ? ? ? ?log_develop_trace(gc, ref)("HeapSampling [other] : heap monitoring processing"); > > > > ? ? ? ? ? ? ?> Since weak_jni_handles didn't have logging for me to be inspired > ? ? ? ? ? ? ?> from, I did that but unconvinced this is what should be done. > > ? ? ? ? ? ? ?The JNI handle processing does have logging, but only in > ? ? ? ? ? ? ?ReferenceProcessor::process_discovered_references(). In > ? ? ? ? ? ? ?process_weak_jni_handles() only overall time is measured (in a G1 > ? ? ? ? ? ? ?specific way, since only G1 supports disabling reference procesing) :/ > > ? ? ? ? ? ? ?The code in ReferenceProcessor prints both time taken > ? ? ? ? ? ? ?referenceProcessor.cpp:254, as well as the count, but strangely only in > ? ? ? ? ? ? ?debug VMs. > > ? ? ? ? ? ? ?I have no idea why this logging is that unimportant to only print that > ? ? ? ? ? ? ?in a debug VM. However there are reviews out for changing this area a > ? ? ? ? ? ? ?bit, so it might be useful to wait for that (JDK-8173335). > > > ? ? ? ? I cleaned it up a bit anyway and now it returns the count of objects that are in the system. > > > ? ? ? ? ? ? ?> > - the change doubles the size of > ? ? ? ? ? ? ?> > CollectedHeap::allocate_from_tlab_slow() above the "small and nice" > ? ? ? ? ? ? ?> > threshold. Maybe it could be refactored a bit. > ? ? ? ? ? ? ?> Done I think, it looks better to me :). > > ? ? ? ? ? ? ?In ThreadLocalAllocBuffer::handle_sample() I think the > ? ? ? ? ? ? ?set_back_actual_end()/pick_next_sample() calls could be hoisted out of > ? ? ? ? ? ? ?the "if" :) > > > ? ? ? ? Done! > > > ? ? ? ? ? ? ?> > - referenceProcessor.cpp:261: the change should add logging about > ? ? ? ? ? ? ?> > the number of references encountered, maybe after the corresponding > ? ? ? ? ? ? ?> > "JNI weak reference count" log message. > ? ? ? ? ? ? ?> Just to double check, are you saying that you'd like to have the heap > ? ? ? ? ? ? ?> sampler to keep in store how many sampled objects were encountered in > ? ? ? ? ? ? ?> the HeapMonitoring::weak_oops_do? > ? ? ? ? ? ? ?>? ? - Would a return of the method with the number of handled > ? ? ? ? ? ? ?> references and logging that work? > > ? ? ? ? ? ? ?Yes, it's fine if HeapMonitoring::weak_oops_do() only returned the > ? ? ? ? ? ? ?number of processed weak oops. > > > ? ? ? ? Done also (but I admit I have not tested the output yet) :) > > > ? ? ? ? ? ? ?>? ? - Additionally, would you prefer it in a separate block with its > ? ? ? ? ? ? ?> GCTraceTime? > > ? ? ? ? ? ? ?Yes. Both kinds of information is interesting: while the time taken is > ? ? ? ? ? ? ?typically more important, the next question would be why, and the > ? ? ? ? ? ? ?number of references typically goes a long way there. > > ? ? ? ? ? ? ?See above though, it is probably best to wait a bit. > > > ? ? ? ? Agreed that I "could" wait but, if it's ok, I'll just refactor/remove this when we get closer to something final. Either, JDK-8173335 > ? ? ? ? has gone in and I will notice it now or it will soon and I can change it then. > > > ? ? ? ? ? ? ?> > - threadLocalAllocBuffer.cpp:331: one more "TODO" > ? ? ? ? ? ? ?> Removed it and added it to my personal todos to look at. > ? ? ? ? ? ? ?>? ? ? > > > ? ? ? ? ? ? ?> > - threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class > ? ? ? ? ? ? ?> > documentation should be updated about the sampling additions. I > ? ? ? ? ? ? ?> > would have no clue what the difference between "actual_end" and > ? ? ? ? ? ? ?> > "end" would be from the given information. > ? 
? ? ? ? ? ?> If you are talking about the comments in this file, I made them more > ? ? ? ? ? ? ?> clear I hope in the new webrev. If it was somewhere else, let me know > ? ? ? ? ? ? ?> where to change. > > ? ? ? ? ? ? ?Thanks, that's much better. Maybe a note in the comment of the class > ? ? ? ? ? ? ?that ThreadLocalBuffer provides some sampling facility by modifying the > ? ? ? ? ? ? ?end() of the TLAB to cause "frequent" calls into the runtime call where > ? ? ? ? ? ? ?actual sampling takes place. > > > ? ? ? ? Done, I think it's better now. Added something about the slow_path_end as well. > > > ? ? ? ? ? ? ?> > - in heapMonitoring.hpp: there are some random comments about some > ? ? ? ? ? ? ?> > code that has been grabbed from "util/math/fastmath.[h|cc]". I > ? ? ? ? ? ? ?> > can't tell whether this is code that can be used but I assume that > ? ? ? ? ? ? ?> > Noam Shazeer is okay with that (i.e. that's all Google code). > ? ? ? ? ? ? ?> Jeremy and I double checked and we can release that as I thought. I > ? ? ? ? ? ? ?> removed the comment from that piece of code entirely. > > ? ? ? ? ? ? ?Thanks. > > ? ? ? ? ? ? ?> > - heapMonitoring.hpp/cpp static constant naming does not correspond > ? ? ? ? ? ? ?> > to Hotspot's. Additionally, in Hotspot static methods are cased > ? ? ? ? ? ? ?> > like other methods. > ? ? ? ? ? ? ?> I think I fixed the methods to be cased the same way as all other > ? ? ? ? ? ? ?> methods. For static constants, I was not sure. I fixed a few other > ? ? ? ? ? ? ?> variables but I could not seem to really see a consistent trend for > ? ? ? ? ? ? ?> constants. I made them as variables but I'm not sure now. > > ? ? ? ? ? ? ?Sorry again, style is a kind of mess. The goal of my suggestions here > ? ? ? ? ? ? ?is only to prevent yet another style creeping in. > > ? ? ? ? ? ? ?> > - in heapMonitoring.cpp there are a few cryptic comments at the top > ? ? ? ? ? ? ?> > that seem to refer to internal stuff that should probably be > ? ? ? ? ? ? ?> > removed. > ? ? ? ? ? ? ?> Sorry about that! My personal todos not cleared out. > > ? ? ? ? ? ? ?I am happy about comments, but I simply did not understand any of that > ? ? ? ? ? ? ?and I do not know about other readers as well. > > ? ? ? ? ? ? ?If you think you will remember removing/updating them until the review > ? ? ? ? ? ? ?proper (I misunderstood the review situation a little it seems). > > ? ? ? ? ? ? ?> > I did not think through the impact of the TLAB changes on collector > ? ? ? ? ? ? ?> > behavior yet (if there are). Also I did not check for problems with > ? ? ? ? ? ? ?> > concurrent mark and SATB/G1 (if there are). > ? ? ? ? ? ? ?> I would love to know your thoughts on this, I think this is fine. I > > ? ? ? ? ? ? ?I think so too now. No objects are made live out of thin air :) > > ? ? ? ? ? ? ?> see issues with multiple threads right now hitting the stack storage > ? ? ? ? ? ? ?> instance. Previous webrevs had a mutex lock here but we took it out > ? ? ? ? ? ? ?> for simplificity (and only for now). > > ? ? ? ? ? ? ?:) When looking at this after some thinking I now assume for this > ? ? ? ? ? ? ?review that this code is not MT safe at all. There seems to be more > ? ? ? ? ? ? ?synchronization missing than just the one for the StackTraceStorage. So > ? ? ? ? ? ? ?no comments about this here. > > > ? ? ? ? I doubled checked a bit (quickly I admit) but it seems that synchronization in StackTraceStorage is really all you need (all methods > lead to a StackTraceStorage one > ? ? ? ? 
and can be multithreaded outside of that). > ? ? ? ? There is a question about the initialization where the method HeapMonitoring::initialize_profiling is not thread safe. > ? ? ? ? It would work (famous last words) and not crash if there was a race but we could add a synchronization point there as well (and > therefore on the stop as well). > > ? ? ? ? But anyway I will really check and do this once we add back synchronization. > > > ? ? ? ? ? ? ?Also, this would require some kind of specification of what is allowed > ? ? ? ? ? ? ?to be called when and where. > > > ? ? ? ? Would we specify this with the methods in the jvmti.xml file? We could start by specifying in each that they are not thread safe but I > saw no mention of that for > ? ? ? ? other methods. > > > ? ? ? ? ? ? ?One potentially relevant observation about locking here: depending on > ? ? ? ? ? ? ?sampling frequency, StackTraceStore::add_trace() may be rather > ? ? ? ? ? ? ?frequently called. I assume that you are going to do measurements :) > > > ? ? ? ? Though we don't have the TLAB implementation in our code, the compiler generated sampler uses 2% of overhead with a 512k sampling rate. > I can do real measurements > ? ? ? ? when the code settles and we can see how costly this is as a TLAB implementation. > ? ? ? ? However, my theory is that if the rate is 512k, the memory/performance overhead should be minimal since it is what we saw with our > code/workloads (though not called > ? ? ? ? the same way, we call it essentially at the same rate). > ? ? ? ? If you have a benchmark you'd like me to test, let me know! > > ? ? ? ? Right now, with my really small test, this does use a bit of overhead even for a 512k sample size. I don't know yet why, I'm going to > see what is going on. > > ? ? ? ? Finally, I think it is not reasonable to suppose the overhead to be negligible if the sampling rate used is too low. The user should > know that the lower the rate, > ? ? ? ? the higher the overhead (documentation TODO?). > > > ? ? ? ? ? ? ?I am not sure what the expected usage of the API is, but > ? ? ? ? ? ? ?StackTraceStore::add_trace() seems to be able to grow without bounds. > ? ? ? ? ? ? ?Only a GC truncates them to the live ones. That in itself seems to be > ? ? ? ? ? ? ?problematic (GCs can be *wide* apart), and of course some of the API > ? ? ? ? ? ? ?methods add to that because they duplicate that unbounded array. Do you > ? ? ? ? ? ? ?have any concerns/measurements about this? > > > ? ? ? ? So, the theory is that yes add_trace can be able to grow without bounds but it grows at a sample per 512k of allocated space. The > stacks it gathers are currently > ? ? ? ? maxed at 64 (I'd like to expand that to an option to the user though at some point). So I have no concerns because: > > ? ? ? ? - If really this is taking a lot of space, that means the job is keeping a lot of objects in memory as well, therefore the entire heap > is getting huge > ? ? ? ? - If this is the case, you will be triggering a GC at some point anyway. > > ? ? ? ? (I'm putting under the rug the issue of "What if we set the rate to 1 for example" because as you lower the sampling rate, we cannot > guarantee low overhead; the > ? ? ? ? idea behind this feature is to have a means of having meaningful allocated samples at a low overhead) > > ? ? ? ? I have no measurements really right now but since I now have some statistics I can poll, I will look a bit more at this question. > > ? ? ? ? 
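To put rough numbers on the growth concern discussed above: with the default ~512 KiB sampling interval and traces capped at 64 frames, the trace storage stays small relative to the heap that produced it. The per-frame size below is a guess for illustration, not a measured value from the webrev.

    // Standalone back-of-envelope estimate (illustrative constants).
    #include <cstdio>
    #include <cstdint>

    int main() {
      const uint64_t sampling_interval = 512 * 1024;   // bytes allocated per sample
      const uint64_t frames_per_trace  = 64;           // max stack depth kept
      const uint64_t bytes_per_frame   = 16;           // assumed: method id + bci, roughly
      const uint64_t live_sampled_heap = 4ULL << 30;   // say, 4 GiB of live sampled allocations

      uint64_t traces = live_sampled_heap / sampling_interval;       // 8192 traces
      uint64_t bytes  = traces * frames_per_trace * bytes_per_frame; // ~8 MiB

      printf("~%llu traces, ~%llu KiB of trace storage\n",
             (unsigned long long)traces, (unsigned long long)(bytes >> 10));
      return 0;
    }

So even several gigabytes of live sampled allocations translate to trace storage in the megabyte range, which matches the argument above that a heap large enough for this to matter will be triggering GCs (and pruning) anyway.
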
I have the same last sentence than above: the user should expect this to happen if the sampling rate is too small. That probably can be > reflected in the > ? ? ? ? StartHeapSampling as a note : careful this might impact your performance. > > > ? ? ? ? ? ? ?Also, these stack traces might hold on to huge arrays. Any > ? ? ? ? ? ? ?consideration of that? Particularly it might be the cause for OOMEs in > ? ? ? ? ? ? ?tight memory situations. > > > ? ? ? ? There is a stack size maximum that is set to 64 so it should not hold huge arrays. I don't think this is an issue but I can double > check with a test or two. > > > ? ? ? ? ? ? ?- please consider adding a safepoint check in > ? ? ? ? ? ? ?HeapMonitoring::weak_oops_do to prevent accidental misuse. > > ? ? ? ? ? ? ?- in struct StackTraceStorage, the public fields may also need > ? ? ? ? ? ? ?underscores. At least some files in the runtime directory have structs > ? ? ? ? ? ? ?with underscored public members (and some don't). The runtime team > ? ? ? ? ? ? ?should probably comment on that. > > > ? ? ? ? Agreed I did not know. I looked around and a lot of structs did not have them it seemed so I left it as is. I will happily change it if > someone prefers (I was not > ? ? ? ? sure if you really preferred or not, your sentence seemed to be more a note of "this might need to change but I don't know if the > runtime team enforces that", let > ? ? ? ? me know if I read that wrongly). > > > ? ? ? ? ? ? ?- In StackTraceStorage::weak_oops_do(), when examining the > ? ? ? ? ? ? ?StackTraceData, maybe it is useful to consider having a non-NULL > ? ? ? ? ? ? ?reference outside of the heap's reserved space an error. There should > ? ? ? ? ? ? ?be no oop outside of the heap's reserved space ever. > > ? ? ? ? ? ? ?Unless you allow storing random values in StackTraceData::obj, which I > ? ? ? ? ? ? ?would not encourage. > > > ? ? ? ? I suppose you are talking about this part: > ? ? ? ? if ((value != NULL && Universe::heap()->is_in_reserved(value)) && > ? ? ? ? ? ? ? ? ? ? (is_alive == NULL || is_alive->do_object_b(value))) { > > ? ? ? ? What you are saying is that I could have something like: > ? ? ? ? if (value != my_non_null_reference && > ? ? ? ? ? ? ? ? ? ? (is_alive == NULL || is_alive->do_object_b(value))) { > > ? ? ? ? Is that what you meant? Is there really a reason to do so? When I look at the code, is_in_reserved seems like a O(1) method call. I'm > not even sure we can have a > ? ? ? ? NULL value to be honest. I might have to study that to see if this was not a paranoid test to begin with. > > ? ? ? ? The is_alive code has now morphed due to the comment below. > > > > ? ? ? ? ? ? ?- HeapMonitoring::weak_oops_do() does not seem to use the > ? ? ? ? ? ? ?passed AbstractRefProcTaskExecutor. > > > ? ? ? ? It did use it: > ? ? ? ? ? ?size_t HeapMonitoring::weak_oops_do( > ? ? ? ? ? ? ? AbstractRefProcTaskExecutor *task_executor, > ? ? ? ? ? ? ? BoolObjectClosure* is_alive, > ? ? ? ? ? ? ? OopClosure *f, > ? ? ? ? ? ? ? VoidClosure *complete_gc) { > ? ? ? ? ? ? assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint"); > > ? ? ? ? ? ? if (task_executor != NULL) { > ? ? ? ? ? ? ? task_executor->set_single_threaded_mode(); > ? ? ? ? ? ? } > ? ? ? ? ? ? return StackTraceStorage::storage()->weak_oops_do(is_alive, f, complete_gc); > ? ? ? ? } > > ? ? ? ? But due to the comment below, I refactored this, so this is no longer here. Now I have an always true closure that is passed. > > > ? ? ? ? ? ? 
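The "always true closure" mentioned here presumably follows the usual HotSpot BoolObjectClosure pattern; a minimal sketch, not necessarily the exact class added in the webrev:

    // Treats every recorded oop as live, for callers that have no is_alive filter.
    class AlwaysTrueClosure : public BoolObjectClosure {
     public:
      bool do_object_b(oop p) { return true; }
    };

    // Assumed usage at the call site:
    //   static AlwaysTrueClosure always_true;
    //   StackTraceStorage::storage()->weak_oops_do(&always_true, f, complete_gc);
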
?- I do not understand allowing to call this method with a NULL > ? ? ? ? ? ? ?complete_gc closure. This would mean that objects referenced from the > ? ? ? ? ? ? ?object that is referenced by the StackTraceData are not pulled, meaning > ? ? ? ? ? ? ?they would get stale. > > ? ? ? ? ? ? ?- same with is_alive parameter value of NULL > > > ? ? ? ? So these questions made me look a bit closer at this code. This code I think was written this way to have a very small impact on the > file but you are right, there > ? ? ? ? is no reason for this here. I've simplified the code by making in referenceProcessor.cpp a process_HeapSampling method that handles > everything there. > > ? ? ? ? The code allowed NULLs because it depended on where you were coming from and how the code was being called. > > ? ? ? ? - I added a static always_true variable and pass that now to be more consistent with the rest of the code. > ? ? ? ? - I moved the complete_gc into process_phaseHeapSampling now (new method) and handle the task_executor and the complete_gc there > ? ? ? ? ? ? ?- Newbie question: in our code we did a set_single_threaded_mode but I see that process_phaseJNI does it right before its call, do > I need to do it for the > ? ? ? ? process_phaseHeapSample? > ? ? ? ? That API is much cleaner (in my mind) and is consistent with what is done around it (again in my mind). > > > ? ? ? ? ? ? ?- heapMonitoring.cpp:590: I do not completely understand the purpose of > ? ? ? ? ? ? ?this code: in the end this results in a fixed value directly dependent > ? ? ? ? ? ? ?on the Thread address anyway? In the end this results in a fixed value > ? ? ? ? ? ? ?directly dependent on the Thread address anyway? > ? ? ? ? ? ? ?IOW, what is special about exactly 20 rounds? > > > ? ? ? ? So we really want a fast random number generator that has a specific mean (512k is the default we use). The code uses the thread > address as the start number of the > ? ? ? ? sequence (why not, it is random enough is rationale). Then instead of just starting there, we prime the sequence and really only start > at the 21st number, it is > ? ? ? ? arbitrary and I have not done a study to see if we could do more or less of that. > > ? ? ? ? As I have the statistics of the system up and running, I'll run some experiments to see if this is needed, is 20 good, or not. > > > ? ? ? ? ? ? ?- also I would consider stripping a few bits of the threads' address as > ? ? ? ? ? ? ?initialization value for your rng. The last three bits (and probably > ? ? ? ? ? ? ?more, check whether the Thread object is allocated on special > ? ? ? ? ? ? ?boundaries) are always zero for them. > ? ? ? ? ? ? ?Not sure if the given "random" value is random enough before/after, > ? ? ? ? ? ? ?this method, so just skip that comment if you think this is not > ? ? ? ? ? ? ?required. > > > ? ? ? ? I don't know is the honest answer. I think what is important is that we tend towards a mean and it is random "enough" to not fall in > pitfalls of only sampling a > ? ? ? ? subset of objects due to their allocation order. I added that as test to do to see if it changes the mean in any way for the 512k > default value and/or if the first > ? ? ? ? 1000 elements look better. > > > ? ? ? ? ? ? ?Some more random nits I did not find a place to put anywhere: > > ? ? ? ? ? ? ?- ThreadLocalAllocBuffer::_extra_space does not seem to be used > ? ? ? ? ? ? ?anywhere? > > > ? ? ? ? Good catch :). > > > ? ? ? ? ? ? 
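Picking up the random-generator point a few paragraphs above (thread address as seed, the first 20 outputs discarded, ~512 KiB mean interval), a standalone sketch of that scheme follows. The generator, the constants, and the function names are illustrative, not the webrev's.

    #include <cstddef>
    #include <cstdint>
    #include <cmath>

    static uint64_t next_random(uint64_t* state) {
      uint64_t x = *state;              // xorshift64: cheap, statistically decent
      x ^= x << 13; x ^= x >> 7; x ^= x << 17;
      return *state = x;
    }

    static uint64_t init_state(const void* thread_addr) {
      uint64_t seed = (uint64_t)(uintptr_t)thread_addr >> 3;  // drop the always-zero low bits
      if (seed == 0) seed = 1;                                // xorshift state must be non-zero
      for (int i = 0; i < 20; i++) next_random(&seed);        // prime the sequence
      return seed;
    }

    static size_t pick_next_sample(uint64_t* state, size_t mean_bytes) {
      // Exponential inter-arrival gives every allocated byte the same chance
      // of being sampled, with the requested mean interval.
      double u = (next_random(state) >> 11) * (1.0 / (double)(1ULL << 53));
      if (u <= 0.0) u = 1e-20;
      return (size_t)(-std::log(u) * (double)mean_bytes) + 1;
    }

Whether the 20 warm-up draws are needed at all is exactly the open question in the thread; with a generator of this shape the warm-up mostly serves to decorrelate seeds that differ only in a few bits.
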
?- Maybe indent the declaration of ThreadLocalAllocBuffer::_bytes_until_sample to align below the other members of that group. > > > ? ? ? ? Done moved it up a bit to have non static members together and static separate. > > ? ? ? ? ? ? ?Thanks, > ? ? ? ? ? ? ? ? Thomas > > > ? ? ? ? Thanks for your review! > ? ? ? ? Jc > > > > > > > > From jcbeyler at google.com Mon Oct 16 16:34:15 2017 From: jcbeyler at google.com (JC Beyler) Date: Mon, 16 Oct 2017 09:34:15 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> References: <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> Message-ID: Hi Robbin, That is because version 11 to 12 was only a test change. I was going to write about it and say here are the webrev links: Incremental: http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ Full webrev: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/ This change focused only on refactoring the tests to be more manageable, readable, maintainable. As all tests are looking at allocations, I moved common code to a java class: http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java.patch And then most tests call into that class to turn on/off the sampling, allocate, etc. This has removed almost 500 lines of test code so I'm happy about that. Thanks for your changes, a bit of relics of previous versions :). I've already integrated them into my code and will make a new webrev end of this week with a bit of refactor of the code handling the tlab slow path. I find it could use a bit of refactoring to make it easier to follow so I'm going to take a stab at it this week. Any other issues/comments? Thanks! Jc On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn wrote: > Hi JC, > > I saw a webrev.12 in the directory, with only test changes(11->12), so I > took that version. > I had a look and tested the tests, worked fine! > > First glance at the code (looking at full v12) some minor things below, > mostly unused stuff. > > Thanks, Robbin > > diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp > --- a/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 > 16:54:06 2017 +0200 > +++ b/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 > 17:42:42 2017 +0200 > @@ -211,2 +211,3 @@ > void initialize(int max_storage) { > + // validate max_storage to sane value ? What would 0 mean ? > MutexLocker mu(HeapMonitor_lock); > @@ -227,8 +228,4 @@ > bool initialized() { return _initialized; } > - volatile bool *initialized_address() { return &_initialized; } > > private: > - // Protects the traces currently sampled (below). > - volatile intptr_t _stack_storage_lock[1]; > - > // The traces currently sampled. > @@ -313,3 +310,2 @@ > _initialized(false) { > - _stack_storage_lock[0] = 0; > } > @@ -532,13 +528,2 @@ > > -// Delegate the initialization question to the underlying storage system. > -bool HeapMonitoring::initialized() { > - return StackTraceStorage::storage()->initialized(); > -} > - > -// Delegate the initialization question to the underlying storage system. 
> -bool *HeapMonitoring::initialized_address() { > - return > - const_cast(StackTraceStorage::storage()->initialized_ > address()); > -} > - > void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) { > diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp > --- a/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 > 16:54:06 2017 +0200 > +++ b/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 > 17:42:42 2017 +0200 > @@ -35,3 +35,2 @@ > static uint64_t _rnd; > - static bool _initialized; > static jint _monitoring_rate; > @@ -92,7 +91,2 @@ > > - // Is the profiler initialized and where is the address to the > initialized > - // boolean. > - static bool initialized(); > - static bool *initialized_address(); > - > // Called when o is to be sampled from a given thread and a given size. > > > > On 10/10/2017 12:57 AM, JC Beyler wrote: > >> Dear all, >> >> Thread-safety is back!! Here is the update webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ >> >> Full webrev is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ >> >> In order to really test this, I needed to add this so thought now was a >> good time. It required a few changes here for the creation to ensure >> correctness and safety. Now we keep the static pointer but clear the data >> internally so on re-initialize, it will be a bit more costly than before. I >> don't think this is a huge use-case so I did not think it was a problem. I >> used the internal MutexLocker, I think I used it well, let me know. >> >> I also added three tests: >> >> 1) Stack depth test: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorStackDepthTest.java.patch >> >> This test shows that the maximum stack depth system is working. >> >> 2) Thread safety: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorThreadTest.java.patch >> >> The test creates 24 threads and they all allocate at the same time. The >> test then checks it does find samples from all the threads. >> >> 3) Thread on/off safety >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorThreadOnOffTest.java.patch >> >> The test creates 24 threads that all allocate a bunch of memory. Then >> another thread turns the sampling on/off. >> >> Btw, both tests 2 & 3 failed without the locks. >> >> As I worked on this, I saw a lot of places where the tests are doing very >> similar things, I'm going to clean up the code a bit and make a >> HeapAllocator class that all tests can call directly. This will greatly >> simplify the code. >> >> Thanks for any comments/criticisms! >> Jc >> >> >> On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler > jcbeyler at google.com>> wrote: >> >> Dear all, >> >> Small update to the webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/> >> >> Full webrev is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/> >> >> I updated a bit of the naming, removed a TODO comment, and I added a >> test for testing the sampling rate. I also updated the maximum stack depth >> to 1024, there is no >> reason to keep it so small. I did a micro benchmark that tests the >> overhead and it seems relatively the same. 
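The locking JC refers to a few paragraphs above presumably looks something like the fragment below. The lock name HeapMonitor_lock comes from Robbin's quoted diff; the storage helpers and the _allocated_traces container are assumptions, not the webrev's members.

    void StackTraceStorage::initialize(int max_storage) {
      MutexLocker mu(HeapMonitor_lock);   // serialize (re)initialization against samplers
      free_storage();                     // drop traces from any previous session
      allocate_storage(max_storage);
      _initialized = true;
    }

    void StackTraceStorage::add_trace(const StackTraceData& trace) {
      MutexLocker mu(HeapMonitor_lock);   // add_trace can race with on/off switching and GC pruning
      _allocated_traces->append(trace);
    }

This is also where the thread-safety tests 2 and 3 described above bite: as noted, both failed when the lock was removed.
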
>> >> I compared allocations from a stack depth of 10 and allocations from >> a stack depth of 1024 (allocations are from the same helper method in >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_fi >> les/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/ >> MyPackage/HeapMonitorStatRateTest.java >> > iles/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor >> /MyPackage/HeapMonitorStatRateTest.java>): >> - For an array of 1 integer allocated in a loop; stack >> depth 1024 vs stack depth 10: 1% slower >> - For an array of 200k integers allocated in a loop; stack >> depth 1024 vs stack depth 10: 3% slower >> >> So basically now moving the maximum stack depth to 1024 but we only >> copy over the stack depths actually used. >> >> For the next webrev, I will be adding a stack depth test to show that >> it works and probably put back the mutex locking so that we can see how >> difficult it is to keep >> thread safe. >> >> Let me know what you think! >> Jc >> >> >> >> On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler > > wrote: >> >> Forgot to say that for my numbers: >> - Not in the test are the actual numbers I got for the various >> array sizes, I ran the program 30 times and parsed the output; here are the >> averages and standard >> deviation: >> 1000: 1.28% average; 1.13% standard deviation >> 10000: 1.59% average; 1.25% standard deviation >> 100000: 1.26% average; 1.26% standard deviation >> >> The 1000/10000/100000 are the sizes of the arrays being >> allocated. These are allocated 100k times and the sampling rate is 111 >> times the size of the array. >> >> Thanks! >> Jc >> >> >> On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler > > wrote: >> >> Hi all, >> >> After a bit of a break, I am back working on this :). As >> before, here are two webrevs: >> >> - Full change set: http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.09/ > asbold/8171119/webrev.09/> >> - Compared to version 8: http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.08_09/ > asbold/8171119/webrev.08_09/> >> (This version is compared to version 8 I last showed but >> ported to the new folder hierarchy) >> >> In this version I have: >> - Handled Thomas' comments from his email of 07/03: >> - Merged the logging to be standard >> - Fixed up the code a bit where asked >> - Added some notes about the code not being >> thread-safe yet >> - Removed additional dead code from the version that >> modifies interpreter/c1/c2 >> - Fixed compiler issues so that it compiles with >> --disable-precompiled-header >> - Tested with ./configure --with-boot-jdk= >> --with-debug-level=slowdebug --disable-precompiled-headers >> >> Additionally, I added a test to check the sanity of the >> sampler: HeapMonitorStatCorrectnessTest >> (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/te >> st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStatCorrectnessTest.java.patch > asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch>) >> - This allocates a number of arrays and checks that we >> obtain the number of samples we want with an accepted error of 5%. 
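The arithmetic behind a correctness check of that kind is straightforward; a standalone sketch, with parameter names that are illustrative rather than the test's:

    #include <cmath>
    #include <cstdint>

    // With one sample expected per interval_bytes allocated on average, the
    // expected count is total allocated bytes / interval, accepted within 5%.
    bool samples_within_tolerance(uint64_t observed_samples,
                                  uint64_t allocations,
                                  uint64_t array_bytes,
                                  uint64_t interval_bytes) {
      double expected = (double)allocations * (double)array_bytes / (double)interval_bytes;
      double error    = std::fabs((double)observed_samples - expected) / expected;
      return error <= 0.05;   // the 5% bound mentioned above
    }
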
I tested >> it 100 times and it >> passed everytime, I can test more if wanted >> - Not in the test are the actual numbers I got for the >> various array sizes, I ran the program 30 times and parsed the output; here >> are the averages and >> standard deviation: >> 1000: 1.28% average; 1.13% standard deviation >> 10000: 1.59% average; 1.25% standard deviation >> 100000: 1.26% average; 1.26% standard deviation >> >> What this means is that we were always at about 1~2% of the >> number of samples the test expected. >> >> Let me know what you think, >> Jc >> >> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler < >> jcbeyler at google.com > wrote: >> >> Hi all, >> >> I apologize, I have not yet handled your remarks but >> thought this new webrev would also be useful to see and comment on perhaps. >> >> Here is the latest webrev, it is generated slightly >> different than the others since now I'm using webrev.ksh without the -N >> option: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/> >> >> And the webrev.07 to webrev.08 diff is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ >> >> >> (Let me know if it works well) >> >> It's a small change between versions but it: >> - provides a fix that makes the average sample rate >> correct (more on that below). >> - fixes the code to actually have it play nicely with >> the fast tlab refill >> - cleaned up a bit the JVMTI text and now use >> jvmtiFrameInfo >> - moved the capability to be onload solo >> >> With this webrev, I've done a small study of the random >> number generator we use here for the sampling rate. I took a small program >> and it can be simplified to: >> >> for (outer loop) >> for (inner loop) >> int[] tmp = new int[arraySize]; >> >> - I've fixed the outer and inner loops to being 800 for >> this experiment, meaning we allocate 640000 times an array of a given array >> size. >> >> - Each program provides the average sample size used for >> the whole execution >> >> - Then, I ran each variation 30 times and then calculated >> the average of the average sample size used for various array sizes. I >> selected the array size to >> be one of the following: 1, 10, 100, 1000. >> >> - When compared to 512kb, the average sample size of 30 >> runs: >> 1: 4.62% of error >> 10: 3.09% of error >> 100: 0.36% of error >> 1000: 0.1% of error >> 10000: 0.03% of error >> >> What it shows is that, depending on the number of >> samples, the average does become better. This is because with an allocation >> of 1 element per array, it >> will take longer to hit one of the thresholds. This is >> seen by looking at the sample count statistic I put in. For the same number >> of iterations (800 * >> 800), the different array sizes provoke: >> 1: 62 samples >> 10: 125 samples >> 100: 788 samples >> 1000: 6166 samples >> 10000: 57721 samples >> >> And of course, the more samples you have, the more sample >> rates you pick, which means that your average gets closer using that math. >> >> Thanks, >> Jc >> >> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler < >> jcbeyler at google.com > wrote: >> >> Thanks Robbin, >> >> This seems to have worked. When I have the next >> webrev ready, we will find out but I'm fairly confident it will work! >> >> Thanks agian! 
>> Jc >> >> On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn < >> robbin.ehn at oracle.com > wrote: >> >> Hi JC, >> >> On 06/29/2017 12:15 AM, JC Beyler wrote: >> >> B) Incremental changes >> >> >> I guess the most common work flow here is using >> mq : >> hg qnew fix_v1 >> edit files >> hg qrefresh >> hg qnew fix_v2 >> edit files >> hg qrefresh >> >> if you do hg log you will see 2 commits >> >> webrev.ksh -r -2 -o my_inc_v1_v2 >> webrev.ksh -o my_full_v2 >> >> >> In your .hgrc you might need: >> [extensions] >> mq = >> >> /Robbin >> >> >> Again another newbiew question here... >> >> For showing the incremental changes, is there >> a link that explains how to do that? I apologize for my newbie questions >> all the time :) >> >> Right now, I do: >> >> ksh ../webrev.ksh -m -N >> >> That generates a webrev.zip and send it to >> Chuck Rasbold. He then uploads it to a new webrev. >> >> I tried commiting my change and adding a >> small change. Then if I just do ksh ../webrev.ksh without any options, it >> seems to produce a similar >> page but now with only the changes I had (so >> the 06-07 comparison you were talking about) and a changeset that has it >> all. I imagine that is >> what you meant. >> >> Which means that my workflow would become: >> >> 1) Make changes >> 2) Make a webrev without any options to show >> just the differences with the tip >> 3) Amend my changes to my local commit so >> that I have it done with >> 4) Go to 1 >> >> Does that seem correct to you? >> >> Note that when I do this, I only see the full >> change of a file in the full change set (Side note here: now the page says >> change set and not >> patch, which is maybe why Serguei was having >> issues?). >> >> Thanks! >> Jc >> >> >> >> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn < >> robbin.ehn at oracle.com > robbin.ehn at oracle.com >> >> wrote: >> >> Hi, >> >> On 06/28/2017 12:04 AM, JC Beyler wrote: >> >> Dear Thomas et al, >> >> Here is the newest webrev: >> http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.07/ > asbold/8171119/webrev.07/> >> > asbold/8171119/webrev.07/ > asbold/8171119/webrev.07/>> >> >> >> >> You have some more bits to in there but >> generally this looks good and really nice with more tests. >> I'll do and deep dive and re-test this >> when I get back from my long vacation with whatever patch version you have >> then. >> >> Also I think it's time you provide >> incremental (v06->07 changes) as well as complete change-sets. >> >> Thanks, Robbin >> >> >> >> >> Thomas, I "think" I have answered >> all your remarks. The summary is: >> >> - The statistic system is up and >> provides insight on what the heap sampler is doing >> - I've noticed that, though the >> sampling rate is at the right mean, we are missing some samples, I have not >> yet tracked out why >> (details below) >> >> - I've run a tiny benchmark that is >> the worse case: it is a very tight loop and allocated a small array >> - In this case, I see no >> overhead when the system is off so that is a good start :) >> - I see right now a high >> overhead in this case when sampling is on. This is not a really too >> surprising but I'm going to see if >> this is consistent with our >> internal implementation. The >> benchmark is really allocation stressful so I'm not too surprised but I >> want to do the due diligence. 
>> >> - The statistic system up is up >> and I have a new test >> http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorStatTest.java.patch >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorStatTest.java.patch> >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorStatTest.java.patch >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorStatTest.java.patch>> >> - I did a bit of a study about >> the random generator here, more details are below but basically it seems to >> work well >> >> - I added a capability but since >> this is the first time doing this, I was not sure I did it right >> - I did add a test though for >> it and the test seems to do what I expect (all methods are failing with the >> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). >> - >> http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch>> >> >> - I still need to figure out what >> to do about the multi-agent vs single-agent issue >> >> - As far as measurements, it >> seems I still need to look at: >> - Why we do the 20 random calls >> first, are they necessary? >> - Look at the mean of the >> sampling rate that the random generator does and also what is actually >> sampled >> - What is the overhead in terms >> of memory/performance when on? >> >> I have inlined my answers, I think I >> got them all in the new webrev, let me know your thoughts. >> >> Thanks again! >> Jc >> >> >> On Fri, Jun 23, 2017 at 3:52 AM, >> Thomas Schatzl > com> >> > thomas.schatzl at oracle.com>> > thomas.schatzl at oracle.com> >> >> > >>> wrote: >> >> Hi, >> >> On Wed, 2017-06-21 at 13:45 >> -0700, JC Beyler wrote: >> > Hi all, >> > >> > First off: Thanks again to >> Robbin and Thomas for their reviews :) >> > >> > Next, I've uploaded a new >> webrev: >> > >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/> >> > asbold/8171119/webrev.06/ > asbold/8171119/webrev.06/>> >> > asbold/8171119/webrev.06/ > asbold/8171119/webrev.06/> >> > asbold/8171119/webrev.06/ > asbold/8171119/webrev.06/>>> >> >> > >> > Here is an update: >> > >> > - @Robbin, I forgot to say >> that yes I need to look at implementing >> > this for the other >> architectures and testing it before it is all >> > ready to go. Is it common to >> have it working on all possible >> > combinations or is there a >> subset that I should be doing first and we >> > can do the others later? >> > - I've tested slowdebug, >> built and ran the JTreg tests I wrote with >> > slowdebug and fixed a few >> more issues >> > - I've refactored a bit of >> the code following Thomas' comments >> > - I think I've handled all >> the comments from Thomas (I put >> > comments inline below for the >> specifics) >> >> Thanks for handling all those. 
>> >> > - Following Thomas' comments >> on statistics, I want to add some >> > quality assurance tests and >> find that the easiest way would be to >> > have a few counters of what >> is happening in the sampler and expose >> > that to the user. >> > - I'll be adding that in >> the next version if no one sees any >> > objections to that. >> > - This will allow me to >> add a sanity test in JTreg about number of >> > samples and average of >> sampling rate >> > >> > @Thomas: I had a few >> questions that I inlined below but I will >> > summarize the "bigger ones" >> here: >> > - You mentioned constants >> are not using the right conventions, I >> > looked around and didn't see >> any convention except normal naming then >> > for static constants. Is that >> right? >> >> I looked through >> https://wiki.openjdk.java.net/display/HotSpot/StyleGui < >> https://wiki.openjdk.java.net/display/HotSpot/StyleGui> >> > /display/HotSpot/StyleGui > /display/HotSpot/StyleGui>> >> > /display/HotSpot/StyleGui > /display/HotSpot/StyleGui> >> > /display/HotSpot/StyleGui > /display/HotSpot/StyleGui>>> >> de and the rule is to "follow >> an existing pattern and must have a >> distinct appearance from other >> names". Which does not help a lot I >> guess :/ The GC team started >> using upper camel case, e.g. >> SomeOtherConstant, but very >> likely this is probably not applied >> consistently throughout. So I >> am fine with not adding another style >> (like kMaxStackDepth with the >> "k" in front with some unknown meaning) >> is fine. >> >> (Chances are you will find that >> style somewhere used anyway too, >> apologies if so :/) >> >> >> Thanks for that link, now I know >> where to look. I used the upper camel case in my code as well then :) I >> should have gotten them all. >> >> >> > PS: I've also inlined my >> answers to Thomas below: >> > >> > On Tue, Jun 13, 2017 at 8:03 >> AM, Thomas Schatzl > > e.com < >> http://e.com> > wrote: >> > > Hi all, >> > > >> > > On Mon, 2017-06-12 at >> 11:11 -0700, JC Beyler wrote: >> > > > Dear all, >> > > > >> > > > I've continued working >> on this and have done the following >> > > webrev: >> > > > >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/ < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/> >> > asbold/8171119/webrev.05/ > asbold/8171119/webrev.05/>> >> > asbold/8171119/webrev.05/ > asbold/8171119/webrev.05/> >> > asbold/8171119/webrev.05/ > asbold/8171119/webrev.05/>>> >> >> > > >> > > [...] >> > > > Things I still need to >> do: >> > > > - Have to fix that >> TLAB case for the FastTLABRefill >> > > > - Have to start >> looking at the data to see that it is >> > > consistent and does gather >> the right samples, right frequency, etc. >> > > > - Have to check the >> GC elements and what that produces >> > > > - Run a slowdebug run >> and ensure I fixed all those issues you >> > > saw > Robbin >> > > > >> > > > Thanks for looking at >> the webrev and have a great week! >> > > >> > > scratching a bit on the >> surface of this change, so apologies for >> > > rather shallow comments: >> > > >> > > - >> macroAssembler_x86.cpp:5604: while this is compiler code, and I >> > > am not sure this is final, >> please avoid littering the code with >> > > TODO remarks :) They tend >> to be candidates for later wtf moments >> > > only. >> > > >> > > Just file a CR for that. >> > > >> > Newcomer question: what is a >> CR and not sure I have the rights to do >> > that yet ? :) >> >> Apologies. 
CR is a change >> request, this suggests to file a bug in the >> bug tracker. And you are right, >> you can't just create a new account in >> the OpenJDK JIRA yourselves. :( >> >> >> Ok good to know, I'll continue with >> my own todo list but I'll work hard on not letting it slip in the webrevs >> anymore :) >> >> >> I was mostly referring to the >> "... but it is a TODO" part of that >> comment in >> macroassembler_x86.cpp. Comments about the why of the code >> are appreciated. >> >> [Note that I now understand >> that this is to some degree still work in >> progress. As long as the final >> changeset does no contain TODO's I am >> fine (and it's not a hard >> objection, rather their use in "final" code >> is typically limited in my >> experience)] >> >> 5603 // Currently, if this >> happens, just set back the actual end to >> where it was. >> 5604 // We miss a chance to >> sample here. >> >> Would be okay, if explaining >> "this" and the "why" of missing a chance >> to sample here would be best. >> >> Like maybe: >> >> // If we needed to refill >> TLABs, just set the actual end point to >> // the end of the TLAB again. >> We do not sample here although we could. >> >> Done with your comment, it works >> well in my mind. >> >> I am not sure whether "miss a >> chance to sample" meant "we could, but >> consciously don't because it's >> not that useful" or "it would be >> necessary but don't because >> it's too complicated to do.". >> >> Looking at the original comment >> once more, I am also not sure if that >> comment shouldn't referring to >> the "end" variable (not actual_end) >> because that's the variable >> that is responsible for taking the sampling >> path? (Going from the member >> description of ThreadLocalAllocBuffer). >> >> >> I've moved this code and it no >> longer shows up here but the rationale and answer was: >> >> So.. Yes, end is the variable >> provoking the sampling. Actual end is the actual end of the TLAB. >> >> What was happening here is that the >> code is resetting _end to point towards the end of the new TLAB. Because, >> we now have the end for >> sampling and _actual_end for >> the actual end, we need to update >> the actual_end as well. >> >> Normally, were we to do the real >> work here, we would calculate the (end - start) offset, then do: >> >> - Set the new end to : start + >> (old_end - old_start) >> - Set the actual end like we do here >> now where it because it is the actual end. >> >> Why is this not done here now >> anymore? >> - I was still debating which >> path to take: >> - Do it in the fast refill >> code, it has its perks: >> - In a world where fast >> refills are happening all the time or a lot, we can augment there the code >> to do the sampling >> - Remember what we had as an >> end before leaving the slowpath and check on return >> - This is what I'm doing >> now, it removes the need to go fix up all fast refill paths but if you >> remain in fast refill paths, >> you won't get sampling. I >> have to think of the consequences of >> that, maybe a future change later on? >> - I have the >> statistics now so I'm going to study that >> -> By the way, >> though my statistics are showing I'm missing some samples, if I turn off >> FastTlabRefill, it is the same >> loss so for now, it seems >> this does not occur in my simple >> test. >> >> >> >> But maybe I am only confused >> and it's best to just leave the comment >> away. 
:) >> >> Thinking about it some more, >> doesn't this not-sampling in this case >> mean that sampling does not >> work in any collector that does inline TLAB >> allocation at the moment? (Or >> is inline TLAB alloc automatically >> disabled with sampling somehow?) >> >> That would indeed be a bigger >> TODO then :) >> >> >> Agreed, this remark made me think >> that perhaps as a first step the new way of doing it is better but I did >> have to: >> - Remove the const of the >> ThreadLocalBuffer remaining and hard_end methods >> - Move hard_end out of the header >> file to have a bit more logic there >> >> Please let me know what you think of >> that and if you prefer it this way or changing the fast refills. (I prefer >> this way now because it >> is more incremental). >> >> >> > > - calling >> HeapMonitoring::do_weak_oops() (which should probably be >> > > called weak_oops_do() like >> other similar methods) only if string >> > > deduplication is enabled >> (in g1CollectedHeap.cpp:4511) seems wrong. >> > >> > The call should be at least >> around 6 lines up outside the if. >> > >> > Preferentially in a method >> like process_weak_jni_handles(), including >> > additional logging. (No new >> (G1) gc phase without minimal logging >> > :)). >> > Done but really not sure >> because: >> > >> > I put for logging: >> > log_develop_trace(gc, >> freelist)("G1ConcRegionFreeing [other] : heap >> > monitoring"); >> >> I would think that "gc, ref" >> would be more appropriate log tags for >> this similar to jni handles. >> (I am als not sure what weak >> reference handling has to do with >> G1ConcRegionFreeing, so I am a >> bit puzzled) >> >> >> I was not sure what to put for the >> tags or really as the message. I cleaned it up a bit now to: >> log_develop_trace(gc, >> ref)("HeapSampling [other] : heap monitoring processing"); >> >> >> >> > Since weak_jni_handles didn't >> have logging for me to be inspired >> > from, I did that but >> unconvinced this is what should be done. >> >> The JNI handle processing does >> have logging, but only in >> ReferenceProcessor::process_discovered_references(). >> In >> process_weak_jni_handles() only >> overall time is measured (in a G1 >> specific way, since only G1 >> supports disabling reference procesing) :/ >> >> The code in ReferenceProcessor >> prints both time taken >> referenceProcessor.cpp:254, as >> well as the count, but strangely only in >> debug VMs. >> >> I have no idea why this logging >> is that unimportant to only print that >> in a debug VM. However there >> are reviews out for changing this area a >> bit, so it might be useful to >> wait for that (JDK-8173335). >> >> >> I cleaned it up a bit anyway and now >> it returns the count of objects that are in the system. >> >> >> > > - the change doubles the >> size of >> > > >> CollectedHeap::allocate_from_tlab_slow() above the "small and nice" >> > > threshold. Maybe it could >> be refactored a bit. >> > Done I think, it looks better >> to me :). >> >> In >> ThreadLocalAllocBuffer::handle_sample() I think the >> set_back_actual_end()/pick_next_sample() >> calls could be hoisted out of >> the "if" :) >> >> >> Done! >> >> >> > > - >> referenceProcessor.cpp:261: the change should add logging about >> > > the number of references >> encountered, maybe after the corresponding >> > > "JNI weak reference count" >> log message. 
>> > Just to double check, are you >> saying that you'd like to have the heap >> > sampler to keep in store how >> many sampled objects were encountered in >> > the >> HeapMonitoring::weak_oops_do? >> > - Would a return of the >> method with the number of handled >> > references and logging that >> work? >> >> Yes, it's fine if >> HeapMonitoring::weak_oops_do() only returned the >> number of processed weak oops. >> >> >> Done also (but I admit I have not >> tested the output yet) :) >> >> >> > - Additionally, would you >> prefer it in a separate block with its >> > GCTraceTime? >> >> Yes. Both kinds of information >> is interesting: while the time taken is >> typically more important, the >> next question would be why, and the >> number of references typically >> goes a long way there. >> >> See above though, it is >> probably best to wait a bit. >> >> >> Agreed that I "could" wait but, if >> it's ok, I'll just refactor/remove this when we get closer to something >> final. Either, JDK-8173335 >> has gone in and I will notice it now >> or it will soon and I can change it then. >> >> >> > > - >> threadLocalAllocBuffer.cpp:331: one more "TODO" >> > Removed it and added it to my >> personal todos to look at. >> > > > >> > > - >> threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class >> > > documentation should be >> updated about the sampling additions. I >> > > would have no clue what the >> difference between "actual_end" and >> > > "end" would be from the >> given information. >> > If you are talking about the >> comments in this file, I made them more >> > clear I hope in the new >> webrev. If it was somewhere else, let me know >> > where to change. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Mon Oct 16 18:06:59 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 16 Oct 2017 11:06:59 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> Message-ID: I looked more on changes. First, please, run RBT testing. May be ask SQE to run testing with AOTed java.base as they did before. I did not look on Graal assuming Labs reviewed it already. JVMCI changes looks fine to me. JAOTC. In addition to previous comment about change in DataPatchProcessor.java. ------ AOTCompiledClass.java - I wish metadataName() was defined in corresponding classes instead of manual checking type. It is fine for now but I would add assert for 'else' case that only HotSpotResolvedObjectType ref is expected there. Should we also unify how we generate method name? We use JavaMethodInfo.uniqueMethodName() in few places. Then we have AOTHotSpotResolvedJavaMethod.getNameAndSignature(). And now you added new metadataName(). Hotspot AOT code. ----------------- aotLoader.hpp - you don't need 2 methods. Move UseAOT check into .cpp code and in .hpp you can do: static bool reconcile_dynamic_invoke(InstanceKlass* holder, int index, Method* adapter_method, Klass *appendix_klass) NOT_AOT({ return true; }); aotCodeHeap.* - I don't like that you have separate reconcile_dynamic_klass() method only for one use. 
Instead of passin [2] array pas it as separate parameter so you can pass NULL when it is not defined. Hotspot fingerprint. ------------------- I am concern that you changed logic when and how klass's fingerprint is generated. With your changes it become more expensive: if (UseAOT && ik->supers_have_passed_fingerprint_checks()) { + uint64_t str_fp = _stream->compute_fingerprint(); Why removing !result->is_anonymous() check is not enough?: if (InstanceKlass::should_store_fingerprint()) { result->store_fingerprint(stream->compute_fingerprint()); Thanks, Vladimir On 10/6/17 12:52 PM, dean.long at oracle.com wrote: > On 10/6/17 12:37 PM, dean.long at oracle.com wrote: > >> On 10/6/17 10:03 AM, Igor Veresov wrote: >> >>> >>> >>>> On Oct 6, 2017, at 9:52 AM, Vladimir Kozlov > wrote: >>>> >>>> On 10/5/17 11:16 PM, Igor Veresov wrote: >>>>>> On Oct 5, 2017, at 10:57 AM, dean.long at oracle.com wrote: >>>>>> >>>>>> On 10/4/17 6:27 PM, Vladimir Kozlov wrote: >>>>>> >>>>>>> Yes, I start looking on it. >>>>>>> >>>>>>> In DataPatchProcessor.java why you removed addDependentKlassData() call?: >>>>>>> >>>>>>> + AOTCompiledClass.addFingerprintKlassData(binaryContainer, type); >>>>>>> + ???????????????targetSymbol = AOTCompiledClass.metadataName(type); >>>>>>> ?????????????????gotName = ((action == HotSpotConstantLoadAction.INITIALIZE) ? "got.init." : "got.") + targetSymbol; >>>>>>> - ???????????????methodInfo.addDependentKlassData(binaryContainer, type); >>>>>>> ?????????????} else if (metaspaceConstant.asResolvedJavaMethod() != null && action == >>>>>>> HotSpotConstantLoadAction.LOAD_COUNTERS) { >>>>>>> >>>>>> >>>>>> It is supposed to be an optimization, to prevent adding dependencies when we don't need them. ?We add dependencies >>>>>> elsewhere if we inline a method or reference a field, etc. ?I don't think we need a dependency just because we >>>>>> reference a constant. >>>>>> Igor, do you agree? >>>>>> >>>>> I suppose you?re right. Field offset seems to be the only place where a dependency would be required and we should >>>>> get it covered. Perhaps this was added before we had field access recording. But I?d test it in case something pops >>>>> up (although nothing come to mind right now). >>>> >>>> What about allocations and runtime guard checks (class checks)? >>> >>> >>> Yes, good point. Allocation will have the size of the object as a constant, which is definitely something we need a >>> dependency for. So either we need to collect all types allocated in the parser, or leave that statement as it were. >>> Perhaps we need a followup RFE to clean this up. >>> >> >> OK let me see if I can simply revert that change.? There may be an ordering problem that I was trying to fix at the >> same time. >> > > I forgot to mention, if it's in the metadata, then there should be a dependency, and we have an assert that checks for > that.? Do all allocations and class checks generate a metadata entry?? But I agree that a followup RFE is safer. > > dl > > >> dl >> >>> igor >>> >>>> >>>> Vladimir >>>> >>>>> igor >>>>>>> Is HotSpotConstantPoolObject is real oop (Java object)? oop_got array is scanned for oops only. >>>>>>> >>>>>> >>>>>> Yes, it's the appendix object. >>>>>> >>>>>>> Can you explain why the same class can have several Metaspace Names? Are all of them correspond to one class (and >>>>>>> its methods)? Should we do more in load_klass_data() in such case. 
>>>>>>> >>>>>> >>>>>> It is a many to one mapping (aliases) for anonymous classes because we can't rely on the temporary name that the >>>>>> JVM creates. Regular classes use load_klass_data but anonymous classes don't. ?I have a TODO in >>>>>> AOTCodeHeap::reconcile_dynamic_klass() for loading code for anonymous classes, but it is disabled because I >>>>>> haven't implemented AOT of anonymous classes yet: >>>>>> >>>>>> ??// TODO: hook up any AOT code >>>>>> ??// load_klass_data(dyno_data, thread); >>>>>> >>>>>>> Please, check consumption of Java heap since you are passing oops for metadata generation instead of strings >>>>>>> through JVMCI. >>>>>>> >>>>>> >>>>>> OK. >>>>>> >>>>>> dl >>>>>> >>>>>>> Vladimir >>>>>>> >>>>>>> On 10/4/17 2:50 PM, dean.long at oracle.com wrote: >>>>>>>> Hi Vladimir, do you have time to review this? >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>> >>>>>>>> On 9/11/17 7:21 PM, Dean Long wrote: >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132547 >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~dlong/8132547/ >>>>>>>>> >>>>>>>>> This enhancement is a first step in supporting invokedynamic instructions in AOT. ?Previously, when we saw an >>>>>>>>> invokedynamic instruction, or any anonymous class, we would generate code to bail out and deoptimize. ?With >>>>>>>>> this changeset we go a little further and call into the runtime to resolve the dynamic constant pool entry, >>>>>>>>> running the bootstrap method, and returning the adapter method and appendix object. ?Like class initialization >>>>>>>>> in AOT, we only do this the first time through. Because AOT double-checks classes using fingerprints and >>>>>>>>> symbolic names, special care was required to handle anonymous class names. ?The solution I chose was to name >>>>>>>>> anonymous types with aliases based on their constant pool location ("adapter" and >>>>>>>>> appendix"). >>>>>>>>> >>>>>>>>> Future work is needed to AOT-compile the anonymous classes and/or inline through them, so this change is not >>>>>>>>> expected to affect AOT performance. ?In my tests I was not able to measure any difference. >>>>>>>>> >>>>>>>>> Upstream Graal changes have already been pushed. ?I broke the JVMCI and hotspot changes into separate webrevs. >>>>>>>>> >>>>>>>>> dl >>> >> > From igor.veresov at oracle.com Mon Oct 16 23:51:48 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 16 Oct 2017 16:51:48 -0700 Subject: RFR(XS) 8189409: [AOT] Fix paths in aot test scripts Message-ID: This fixes paths in a couple of more places after repo consolidation. Webrev: http://cr.openjdk.java.net/~iveresov/8189409/webrev.00/ Thanks, igor From dean.long at oracle.com Tue Oct 17 07:57:00 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 17 Oct 2017 00:57:00 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> Message-ID: <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> New hotspot webrev is here: http://cr.openjdk.java.net/~dlong/8132547/hs.2/ Comments inlined below... On 10/16/17 11:06 AM, Vladimir Kozlov wrote: > I looked more on changes. > > First, please, run RBT testing. May be ask SQE to run testing with > AOTed java.base as they did before. 
> > I did not look on Graal assuming Labs reviewed it already. > JVMCI changes looks fine to me. > > JAOTC. In addition to previous comment about change in > DataPatchProcessor.java. > ------ > > AOTCompiledClass.java - I wish metadataName() was defined in > corresponding classes instead of manual checking type. To do that I would probably need to wrap the JVMCI types in new JAOTC types.? I don't want to pollute the JVMCI types. > It is fine for now but I would add assert for 'else' case that only > HotSpotResolvedObjectType ref is expected there. > Sure, thanks for catching that.? The cast will fail with an exception without the assert, but the assert can give a more informative error message. > Should we also unify how we generate method name? > We use JavaMethodInfo.uniqueMethodName() in few places. > Then we have AOTHotSpotResolvedJavaMethod.getNameAndSignature(). > And now you added new metadataName(). > OK, I filed RFE 8189411. > Hotspot AOT code. > ----------------- > > aotLoader.hpp - you don't need 2 methods. Move UseAOT check into .cpp > code and in .hpp you can do: > > static bool reconcile_dynamic_invoke(InstanceKlass* holder, int index, > Method* adapter_method, Klass *appendix_klass) NOT_AOT({ return true; }); > > aotCodeHeap.* - I don't like that you have separate > reconcile_dynamic_klass() method only for one use. Instead of passin > [2] array pas it as separate parameter so you can pass NULL when it is > not defined. > OK. > Hotspot fingerprint. > ------------------- > > I am concern that you changed logic when and how klass's fingerprint > is generated. With your changes it become more expensive: > > ?? if (UseAOT && ik->supers_have_passed_fingerprint_checks()) { > +??? uint64_t str_fp = _stream->compute_fingerprint(); > You are right, I will revert these changes that were left over from an earlier version. > ?Why removing !result->is_anonymous() check is not enough?: > > ?if (InstanceKlass::should_store_fingerprint()) { > ?? result->store_fingerprint(stream->compute_fingerprint()); > Because InstanceKlass::should_store_fingerprint() will return false for an anonymous class. dl > Thanks, > Vladimir > > On 10/6/17 12:52 PM, dean.long at oracle.com wrote: >> On 10/6/17 12:37 PM, dean.long at oracle.com wrote: >> >>> On 10/6/17 10:03 AM, Igor Veresov wrote: >>> >>>> >>>> >>>>> On Oct 6, 2017, at 9:52 AM, Vladimir Kozlov >>>>> > >>>>> wrote: >>>>> >>>>> On 10/5/17 11:16 PM, Igor Veresov wrote: >>>>>>> On Oct 5, 2017, at 10:57 AM, dean.long at oracle.com >>>>>>> wrote: >>>>>>> >>>>>>> On 10/4/17 6:27 PM, Vladimir Kozlov wrote: >>>>>>> >>>>>>>> Yes, I start looking on it. >>>>>>>> >>>>>>>> In DataPatchProcessor.java why you removed >>>>>>>> addDependentKlassData() call?: >>>>>>>> >>>>>>>> + AOTCompiledClass.addFingerprintKlassData(binaryContainer, type); >>>>>>>> + ???????????????targetSymbol = >>>>>>>> AOTCompiledClass.metadataName(type); >>>>>>>> ?????????????????gotName = ((action == >>>>>>>> HotSpotConstantLoadAction.INITIALIZE) ? "got.init." : "got.") + >>>>>>>> targetSymbol; >>>>>>>> - >>>>>>>> ???????????????methodInfo.addDependentKlassData(binaryContainer, >>>>>>>> type); >>>>>>>> ?????????????} else if >>>>>>>> (metaspaceConstant.asResolvedJavaMethod() != null && action == >>>>>>>> HotSpotConstantLoadAction.LOAD_COUNTERS) { >>>>>>>> >>>>>>> >>>>>>> It is supposed to be an optimization, to prevent adding >>>>>>> dependencies when we don't need them. ?We add dependencies >>>>>>> elsewhere if we inline a method or reference a field, etc. 
?I >>>>>>> don't think we need a dependency just because we reference a >>>>>>> constant. >>>>>>> Igor, do you agree? >>>>>>> >>>>>> I suppose you?re right. Field offset seems to be the only place >>>>>> where a dependency would be required and we should get it >>>>>> covered. Perhaps this was added before we had field access >>>>>> recording. But I?d test it in case something pops up (although >>>>>> nothing come to mind right now). >>>>> >>>>> What about allocations and runtime guard checks (class checks)? >>>> >>>> >>>> Yes, good point. Allocation will have the size of the object as a >>>> constant, which is definitely something we need a dependency for. >>>> So either we need to collect all types allocated in the parser, or >>>> leave that statement as it were. Perhaps we need a followup RFE to >>>> clean this up. >>>> >>> >>> OK let me see if I can simply revert that change.? There may be an >>> ordering problem that I was trying to fix at the same time. >>> >> >> I forgot to mention, if it's in the metadata, then there should be a >> dependency, and we have an assert that checks for that.? Do all >> allocations and class checks generate a metadata entry?? But I agree >> that a followup RFE is safer. >> >> dl >> >> >>> dl >>> >>>> igor >>>> >>>>> >>>>> Vladimir >>>>> >>>>>> igor >>>>>>>> Is HotSpotConstantPoolObject is real oop (Java object)? oop_got >>>>>>>> array is scanned for oops only. >>>>>>>> >>>>>>> >>>>>>> Yes, it's the appendix object. >>>>>>> >>>>>>>> Can you explain why the same class can have several Metaspace >>>>>>>> Names? Are all of them correspond to one class (and its >>>>>>>> methods)? Should we do more in load_klass_data() in such case. >>>>>>>> >>>>>>> >>>>>>> It is a many to one mapping (aliases) for anonymous classes >>>>>>> because we can't rely on the temporary name that the JVM >>>>>>> creates. Regular classes use load_klass_data but anonymous >>>>>>> classes don't. ?I have a TODO in >>>>>>> AOTCodeHeap::reconcile_dynamic_klass() for loading code for >>>>>>> anonymous classes, but it is disabled because I haven't >>>>>>> implemented AOT of anonymous classes yet: >>>>>>> >>>>>>> ??// TODO: hook up any AOT code >>>>>>> ??// load_klass_data(dyno_data, thread); >>>>>>> >>>>>>>> Please, check consumption of Java heap since you are passing >>>>>>>> oops for metadata generation instead of strings through JVMCI. >>>>>>>> >>>>>>> >>>>>>> OK. >>>>>>> >>>>>>> dl >>>>>>> >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 10/4/17 2:50 PM, dean.long at oracle.com >>>>>>>> wrote: >>>>>>>>> Hi Vladimir, do you have time to review this? >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> dl >>>>>>>>> >>>>>>>>> >>>>>>>>> On 9/11/17 7:21 PM, Dean Long wrote: >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132547 >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~dlong/8132547/ >>>>>>>>>> >>>>>>>>>> This enhancement is a first step in supporting invokedynamic >>>>>>>>>> instructions in AOT. ?Previously, when we saw an >>>>>>>>>> invokedynamic instruction, or any anonymous class, we would >>>>>>>>>> generate code to bail out and deoptimize. ?With this >>>>>>>>>> changeset we go a little further and call into the runtime to >>>>>>>>>> resolve the dynamic constant pool entry, running the >>>>>>>>>> bootstrap method, and returning the adapter method and >>>>>>>>>> appendix object. ?Like class initialization in AOT, we only >>>>>>>>>> do this the first time through. 
Because AOT double-checks >>>>>>>>>> classes using fingerprints and symbolic names, special care >>>>>>>>>> was required to handle anonymous class names. ?The solution I >>>>>>>>>> chose was to name anonymous types with aliases based on their >>>>>>>>>> constant pool location ("adapter" and >>>>>>>>>> appendix"). >>>>>>>>>> >>>>>>>>>> Future work is needed to AOT-compile the anonymous classes >>>>>>>>>> and/or inline through them, so this change is not expected to >>>>>>>>>> affect AOT performance. ?In my tests I was not able to >>>>>>>>>> measure any difference. >>>>>>>>>> >>>>>>>>>> Upstream Graal changes have already been pushed. ?I broke the >>>>>>>>>> JVMCI and hotspot changes into separate webrevs. >>>>>>>>>> >>>>>>>>>> dl >>>> >>> >> From george.triantafillou at oracle.com Tue Oct 17 14:59:52 2017 From: george.triantafillou at oracle.com (George Triantafillou) Date: Tue, 17 Oct 2017 10:59:52 -0400 Subject: RFR(XS) 8189409: [AOT] Fix paths in aot test scripts In-Reply-To: References: Message-ID: <0f64832d-0625-fa88-1ae8-55608dc73614@oracle.com> Hi Igor, Looks good. -George On 10/16/2017 7:51 PM, Igor Veresov wrote: > This fixes paths in a couple of more places after repo consolidation. > > Webrev: http://cr.openjdk.java.net/~iveresov/8189409/webrev.00/ > > Thanks, > igor From vladimir.kozlov at oracle.com Tue Oct 17 18:15:23 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Oct 2017 11:15:23 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> Message-ID: <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> On 10/17/17 12:57 AM, dean.long at oracle.com wrote: > New hotspot webrev is here: > > http://cr.openjdk.java.net/~dlong/8132547/hs.2/ > > Comments inlined below... > > > On 10/16/17 11:06 AM, Vladimir Kozlov wrote: > >> I looked more on changes. >> >> First, please, run RBT testing. May be ask SQE to run testing with >> AOTed java.base as they did before. >> >> I did not look on Graal assuming Labs reviewed it already. >> JVMCI changes looks fine to me. >> >> JAOTC. In addition to previous comment about change in >> DataPatchProcessor.java. >> ------ >> >> AOTCompiledClass.java - I wish metadataName() was defined in >> corresponding classes instead of manual checking type. > > To do that I would probably need to wrap the JVMCI types in new JAOTC > types.? I don't want to pollute the JVMCI types. Agree. > >> It is fine for now but I would add assert for 'else' case that only >> HotSpotResolvedObjectType ref is expected there. >> > > Sure, thanks for catching that.? The cast will fail with an exception > without the assert, but the assert can give a more informative error > message. > >> Should we also unify how we generate method name? >> We use JavaMethodInfo.uniqueMethodName() in few places. >> Then we have AOTHotSpotResolvedJavaMethod.getNameAndSignature(). >> And now you added new metadataName(). >> > > OK, I filed RFE 8189411. Okay. > >> Hotspot AOT code. >> ----------------- >> >> aotLoader.hpp - you don't need 2 methods. 
Move UseAOT check into .cpp >> code and in .hpp you can do: >> >> static bool reconcile_dynamic_invoke(InstanceKlass* holder, int index, >> Method* adapter_method, Klass *appendix_klass) NOT_AOT({ return true; }); >> >> aotCodeHeap.* - I don't like that you have separate >> reconcile_dynamic_klass() method only for one use. Instead of passin >> [2] array pas it as separate parameter so you can pass NULL when it is >> not defined. >> > > OK. > >> Hotspot fingerprint. >> ------------------- >> >> I am concern that you changed logic when and how klass's fingerprint >> is generated. With your changes it become more expensive: >> >> ?? if (UseAOT && ik->supers_have_passed_fingerprint_checks()) { >> +??? uint64_t str_fp = _stream->compute_fingerprint(); >> > > You are right, I will revert these changes that were left over from an > earlier version. > >> ?Why removing !result->is_anonymous() check is not enough?: >> >> ?if (InstanceKlass::should_store_fingerprint()) { >> ?? result->store_fingerprint(stream->compute_fingerprint()); >> > > Because InstanceKlass::should_store_fingerprint() will return false for > an anonymous class. should_store_fingerprint() only checks flags. Do you mean it to return 'true' during execution too for anonymous classes? But next code will recalculate fingerprint for all classes!!! when you need compute only for anonymous: + if (result->has_stored_fingerprint()) { + result->store_fingerprint(stream->compute_fingerprint()); } Thanks, Vladimir > > dl > > >> Thanks, >> Vladimir >> >> On 10/6/17 12:52 PM, dean.long at oracle.com wrote: >>> On 10/6/17 12:37 PM, dean.long at oracle.com wrote: >>> >>>> On 10/6/17 10:03 AM, Igor Veresov wrote: >>>> >>>>> >>>>> >>>>>> On Oct 6, 2017, at 9:52 AM, Vladimir Kozlov >>>>>> > >>>>>> wrote: >>>>>> >>>>>> On 10/5/17 11:16 PM, Igor Veresov wrote: >>>>>>>> On Oct 5, 2017, at 10:57 AM, dean.long at oracle.com >>>>>>>> wrote: >>>>>>>> >>>>>>>> On 10/4/17 6:27 PM, Vladimir Kozlov wrote: >>>>>>>> >>>>>>>>> Yes, I start looking on it. >>>>>>>>> >>>>>>>>> In DataPatchProcessor.java why you removed >>>>>>>>> addDependentKlassData() call?: >>>>>>>>> >>>>>>>>> + AOTCompiledClass.addFingerprintKlassData(binaryContainer, type); >>>>>>>>> + ???????????????targetSymbol = >>>>>>>>> AOTCompiledClass.metadataName(type); >>>>>>>>> ?????????????????gotName = ((action == >>>>>>>>> HotSpotConstantLoadAction.INITIALIZE) ? "got.init." : "got.") + >>>>>>>>> targetSymbol; >>>>>>>>> - >>>>>>>>> ???????????????methodInfo.addDependentKlassData(binaryContainer, type); >>>>>>>>> >>>>>>>>> ?????????????} else if >>>>>>>>> (metaspaceConstant.asResolvedJavaMethod() != null && action == >>>>>>>>> HotSpotConstantLoadAction.LOAD_COUNTERS) { >>>>>>>>> >>>>>>>> >>>>>>>> It is supposed to be an optimization, to prevent adding >>>>>>>> dependencies when we don't need them. ?We add dependencies >>>>>>>> elsewhere if we inline a method or reference a field, etc. ?I >>>>>>>> don't think we need a dependency just because we reference a >>>>>>>> constant. >>>>>>>> Igor, do you agree? >>>>>>>> >>>>>>> I suppose you?re right. Field offset seems to be the only place >>>>>>> where a dependency would be required and we should get it >>>>>>> covered. Perhaps this was added before we had field access >>>>>>> recording. But I?d test it in case something pops up (although >>>>>>> nothing come to mind right now). >>>>>> >>>>>> What about allocations and runtime guard checks (class checks)? >>>>> >>>>> >>>>> Yes, good point. 
Allocation will have the size of the object as a >>>>> constant, which is definitely something we need a dependency for. >>>>> So either we need to collect all types allocated in the parser, or >>>>> leave that statement as it were. Perhaps we need a followup RFE to >>>>> clean this up. >>>>> >>>> >>>> OK let me see if I can simply revert that change.? There may be an >>>> ordering problem that I was trying to fix at the same time. >>>> >>> >>> I forgot to mention, if it's in the metadata, then there should be a >>> dependency, and we have an assert that checks for that.? Do all >>> allocations and class checks generate a metadata entry?? But I agree >>> that a followup RFE is safer. >>> >>> dl >>> >>> >>>> dl >>>> >>>>> igor >>>>> >>>>>> >>>>>> Vladimir >>>>>> >>>>>>> igor >>>>>>>>> Is HotSpotConstantPoolObject is real oop (Java object)? oop_got >>>>>>>>> array is scanned for oops only. >>>>>>>>> >>>>>>>> >>>>>>>> Yes, it's the appendix object. >>>>>>>> >>>>>>>>> Can you explain why the same class can have several Metaspace >>>>>>>>> Names? Are all of them correspond to one class (and its >>>>>>>>> methods)? Should we do more in load_klass_data() in such case. >>>>>>>>> >>>>>>>> >>>>>>>> It is a many to one mapping (aliases) for anonymous classes >>>>>>>> because we can't rely on the temporary name that the JVM >>>>>>>> creates. Regular classes use load_klass_data but anonymous >>>>>>>> classes don't. ?I have a TODO in >>>>>>>> AOTCodeHeap::reconcile_dynamic_klass() for loading code for >>>>>>>> anonymous classes, but it is disabled because I haven't >>>>>>>> implemented AOT of anonymous classes yet: >>>>>>>> >>>>>>>> ??// TODO: hook up any AOT code >>>>>>>> ??// load_klass_data(dyno_data, thread); >>>>>>>> >>>>>>>>> Please, check consumption of Java heap since you are passing >>>>>>>>> oops for metadata generation instead of strings through JVMCI. >>>>>>>>> >>>>>>>> >>>>>>>> OK. >>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 10/4/17 2:50 PM, dean.long at oracle.com >>>>>>>>> wrote: >>>>>>>>>> Hi Vladimir, do you have time to review this? >>>>>>>>>> >>>>>>>>>> thanks, >>>>>>>>>> >>>>>>>>>> dl >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 9/11/17 7:21 PM, Dean Long wrote: >>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132547 >>>>>>>>>>> >>>>>>>>>>> http://cr.openjdk.java.net/~dlong/8132547/ >>>>>>>>>>> >>>>>>>>>>> This enhancement is a first step in supporting invokedynamic >>>>>>>>>>> instructions in AOT. ?Previously, when we saw an >>>>>>>>>>> invokedynamic instruction, or any anonymous class, we would >>>>>>>>>>> generate code to bail out and deoptimize. ?With this >>>>>>>>>>> changeset we go a little further and call into the runtime to >>>>>>>>>>> resolve the dynamic constant pool entry, running the >>>>>>>>>>> bootstrap method, and returning the adapter method and >>>>>>>>>>> appendix object. ?Like class initialization in AOT, we only >>>>>>>>>>> do this the first time through. Because AOT double-checks >>>>>>>>>>> classes using fingerprints and symbolic names, special care >>>>>>>>>>> was required to handle anonymous class names. ?The solution I >>>>>>>>>>> chose was to name anonymous types with aliases based on their >>>>>>>>>>> constant pool location ("adapter" and >>>>>>>>>>> appendix"). >>>>>>>>>>> >>>>>>>>>>> Future work is needed to AOT-compile the anonymous classes >>>>>>>>>>> and/or inline through them, so this change is not expected to >>>>>>>>>>> affect AOT performance. ?In my tests I was not able to >>>>>>>>>>> measure any difference. 
>>>>>>>>>>> >>>>>>>>>>> Upstream Graal changes have already been pushed. ?I broke the >>>>>>>>>>> JVMCI and hotspot changes into separate webrevs. >>>>>>>>>>> >>>>>>>>>>> dl >>>>> >>>> >>> > From vladimir.kozlov at oracle.com Tue Oct 17 18:21:22 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Oct 2017 11:21:22 -0700 Subject: RFR(XS) 8189409: [AOT] Fix paths in aot test scripts In-Reply-To: References: Message-ID: <9c3ff9aa-cfcb-1f82-3fa4-c6a5666486ef@oracle.com> Looks good. Thanks, Vladimir On 10/16/17 4:51 PM, Igor Veresov wrote: > This fixes paths in a couple of more places after repo consolidation. > > Webrev: http://cr.openjdk.java.net/~iveresov/8189409/webrev.00/ > > Thanks, > igor > From dean.long at oracle.com Tue Oct 17 20:41:41 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 17 Oct 2017 13:41:41 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> Message-ID: <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> Comment below... On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>> ?Why removing !result->is_anonymous() check is not enough?: >>> >>> ?if (InstanceKlass::should_store_fingerprint()) { >>> result->store_fingerprint(stream->compute_fingerprint()); >>> >> >> Because InstanceKlass::should_store_fingerprint() will return false >> for an anonymous class. > > should_store_fingerprint() only checks flags. Do you mean it to return > 'true' during execution too for anonymous classes? But next code will > recalculate fingerprint for all classes!!! when you need compute only > for anonymous: > > +? if (result->has_stored_fingerprint()) { > + result->store_fingerprint(stream->compute_fingerprint()); > ?? } > It should be for anonymous only (in AOT mode), unless I'm missing something: 1982 bool InstanceKlass::has_stored_fingerprint() const { 1983 #if INCLUDE_AOT 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); 1985 #else 1986 return false; 1987 #endif 1988 } 1960 bool InstanceKlass::should_store_fingerprint(bool is_anonymous) { [...]1971 if (UseAOT && is_anonymous) { 1972 // (3) We are using AOT code from a shared library and see an anonymous class 1973 return true; 1974 } dl > Thanks, > Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.veresov at oracle.com Tue Oct 17 20:49:20 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Tue, 17 Oct 2017 13:49:20 -0700 Subject: RFR(XS) 8189409: [AOT] Fix paths in aot test scripts In-Reply-To: <9c3ff9aa-cfcb-1f82-3fa4-c6a5666486ef@oracle.com> References: <9c3ff9aa-cfcb-1f82-3fa4-c6a5666486ef@oracle.com> Message-ID: <87D1F4CA-C2D3-4A16-B6A1-94E48A989046@oracle.com> Thanks! > On Oct 17, 2017, at 11:21 AM, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir > > On 10/16/17 4:51 PM, Igor Veresov wrote: >> This fixes paths in a couple of more places after repo consolidation. 
>> Webrev: http://cr.openjdk.java.net/~iveresov/8189409/webrev.00/ >> Thanks, >> igor From vladimir.kozlov at oracle.com Tue Oct 17 22:30:34 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Oct 2017 15:30:34 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> Message-ID: On 10/17/17 1:41 PM, dean.long at oracle.com wrote: > Comment below... > > > On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>> ?Why removing !result->is_anonymous() check is not enough?: >>>> >>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>> result->store_fingerprint(stream->compute_fingerprint()); >>>> >>> >>> Because InstanceKlass::should_store_fingerprint() will return false >>> for an anonymous class. >> >> should_store_fingerprint() only checks flags. Do you mean it to return >> 'true' during execution too for anonymous classes? But next code will >> recalculate fingerprint for all classes!!! when you need compute only >> for anonymous: >> >> +? if (result->has_stored_fingerprint()) { >> + result->store_fingerprint(stream->compute_fingerprint()); >> ?? } >> > > It should be for anonymous only (in AOT mode), unless I'm missing something: > > 1982 bool InstanceKlass::has_stored_fingerprint() const { > 1983 #if INCLUDE_AOT > 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); I mean should_store_fingerprint() will return true for all klasses in CDS too. So you recalculating them. Vladimir > 1985 #else > 1986 return false; > 1987 #endif > 1988 } > > 1960 bool InstanceKlass::should_store_fingerprint(bool is_anonymous) { > [...]1971 if (UseAOT && is_anonymous) { > 1972 // (3) We are using AOT code from a shared library and see an > anonymous class > 1973 return true; > 1974 } dl > >> Thanks, >> Vladimir > From dean.long at oracle.com Wed Oct 18 01:36:50 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 17 Oct 2017 18:36:50 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> Message-ID: <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> On 10/17/17 3:30 PM, Vladimir Kozlov wrote: > > On 10/17/17 1:41 PM, dean.long at oracle.com wrote: >> Comment below... >> >> >> On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>>> ?Why removing !result->is_anonymous() check is not enough?: >>>>> >>>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>>> result->store_fingerprint(stream->compute_fingerprint()); >>>>> >>>> >>>> Because InstanceKlass::should_store_fingerprint() will return false >>>> for an anonymous class. >>> >>> should_store_fingerprint() only checks flags. 
Do you mean it to >>> return 'true' during execution too for anonymous classes? But next >>> code will recalculate fingerprint for all classes!!! when you need >>> compute only for anonymous: >>> >>> +? if (result->has_stored_fingerprint()) { >>> + result->store_fingerprint(stream->compute_fingerprint()); >>> ?? } >>> >> >> It should be for anonymous only (in AOT mode), unless I'm missing >> something: >> >> 1982 bool InstanceKlass::has_stored_fingerprint() const { >> 1983 #if INCLUDE_AOT >> 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); > > I mean should_store_fingerprint() will return true for all klasses in > CDS too. So you recalculating them. > I see what you mean now.? New webrev: http://cr.openjdk.java.net/~dlong/8132547//hs.3/ dl > Vladimir > >> 1985 #else >> 1986?? return false; >> 1987 #endif >> 1988 } >> >> 1960 bool InstanceKlass::should_store_fingerprint(bool is_anonymous) >> { [...]1971 if (UseAOT && is_anonymous) { >> 1972 // (3) We are using AOT code from a shared library and see an >> anonymous class >> 1973 return true; >> 1974 } dl >> >>> Thanks, >>> Vladimir >> From igor.ignatyev at oracle.com Wed Oct 18 04:45:56 2017 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 17 Oct 2017 21:45:56 -0700 Subject: RFR(S) : 8186618 : [TESTBUG] Test applications/ctw/Modules.java doesn't have timeout and hang on windows Message-ID: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html > 546 lines changed: 188 ins; 88 del; 270 mod; Hi all, could you please review this fix for ctw test? in some configurations the test takes too much time, it also didn't have timeout, so in case of a hang, e.g. due to JDK-8189604, no one interrupted its execution. the fix splits the test into several tests, one for each jigsaw module, which not only improves the tests' reliability and reportability, but also speeds up the execution via parallel execution. 2 hours timeout, which is enough for each particular module, has been added to all tests. the patch also puts ctw for java.desktop and jdk.jconsole modules in the problem list on windows. webrev: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html testing: applications/ctw/modules tests JBS: https://bugs.openjdk.java.net/browse/JDK-8186618 Thanks, -- Igor From nils.eliasson at oracle.com Wed Oct 18 08:03:19 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 18 Oct 2017 10:03:19 +0200 Subject: Reduced MaxVectorSize and vector type initialization Message-ID: HI, I ran into a problem with the interaction between MaxVectorSize and the UseAVX. For some AMD CPUs we limit the vector size to 16 because it gives the best performance. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. > FLAG_SET_DEFAULT(MaxVectorSize, 16); > } Whenf MaxVecorSize is set to 16 it has the sideeffect that the TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though the platform has the capability. Type.cpp:~660 [...] > if (Matcher::vector_size_supported(T_FLOAT,8)) { > TypeVect::VECTY = TypeVect::make(T_FLOAT,8); > } [...] > mreg2type[Op_VecY] = TypeVect::VECTY; In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has effects on specific vector registers. Here we have a mismatch. On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. 
We will also hit asserts in a few places like: assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), "sanity"); Shouldn't the type initialization in type.cpp be dependent on feature flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector registers are initialized if the platform supports them, but they might not be used if MaxVectorSize is limited.) This is a patch that solves the problem, but I have not convinced myself that it is the right way: http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ Feedback appreciated, Regards, Nils Eliasson http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From christos at zoulas.com Wed Oct 18 12:25:52 2017 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 18 Oct 2017 08:25:52 -0400 Subject: what is the SLA for responding to bugs? Message-ID: <20171018122552.DD4B117FDB6@rebar.astron.com> I am asking because I filed: https://bugs.openjdk.java.net/browse/JDK-8189172 and I have not heard a word since. Thanks, christos From volker.simonis at gmail.com Wed Oct 18 13:54:35 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 18 Oct 2017 13:54:35 +0000 Subject: what is the SLA for responding to bugs? In-Reply-To: <20171018122552.DD4B117FDB6@rebar.astron.com> References: <20171018122552.DD4B117FDB6@rebar.astron.com> Message-ID: Christos, this is an open source project, so you get exactly the SLA you are paying for :) If you want more, you could either kindly ask or get a support contract from Oracle or any other OpenJDK distributor. Regards, Volker Christos Zoulas schrieb am Mi. 18. Okt. 2017 um 14:26: > > I am asking because I filed: > https://bugs.openjdk.java.net/browse/JDK-8189172 > and I have not heard a word since. > > Thanks, > > christos > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Wed Oct 18 14:11:40 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 18 Oct 2017 16:11:40 +0200 Subject: RFR(XS): 8188223: IfNode::range_check_trap_proj() should handler dying subgraph with single if proj In-Reply-To: References: Message-ID: Thanks for the review, Vladimir. I followed your suggestion. Here is a ready to push changeset: http://cr.openjdk.java.net/~roland/8188223/8188223.patch Roland. From rwestrel at redhat.com Wed Oct 18 14:16:14 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 18 Oct 2017 16:16:14 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> Message-ID: Here is an updated webrev with Dean's suggestion: http://cr.openjdk.java.net/~roland/8188151/webrev.01/ Can this be considered reviewed by you, Dean? Roland. From christos at zoulas.com Wed Oct 18 14:47:36 2017 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 18 Oct 2017 10:47:36 -0400 Subject: what is the SLA for responding to bugs? In-Reply-To: from Volker Simonis (Oct 18, 1:54pm) Message-ID: <20171018144736.330FC17FDB6@rebar.astron.com> On Oct 18, 1:54pm, volker.simonis at gmail.com (Volker Simonis) wrote: -- Subject: Re: what is the SLA for responding to bugs? | Christos, | | this is an open source project, so you get exactly the SLA you are paying | for :) | | If you want more, you could either kindly ask or get a support contract | from Oracle or any other OpenJDK distributor. 
| | Regards, | Volker You are right, I should get paid support. I tried and I got 404... The link from http://bugreport.java.com to "Oracle Java SE Support" goes to: https://www.oracle.com/java/java-se-support.html Best, christos From vladimir.x.ivanov at oracle.com Wed Oct 18 14:48:14 2017 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 18 Oct 2017 17:48:14 +0300 Subject: what is the SLA for responding to bugs? In-Reply-To: <20171018122552.DD4B117FDB6@rebar.astron.com> References: <20171018122552.DD4B117FDB6@rebar.astron.com> Message-ID: <3f2cc7c3-b849-e034-c9d2-511d0a0acf66@oracle.com> Thanks for the detailed report, Christos. I'm not aware about any SLA, but development team tries to triage incoming bugs in prompt manner. Unfortunately, the bug was filed w/o subcategory set, so it went unnoticed. I was able to reproduce it and added some root cause analysis, but I can't promise anything about fixing it in 8u. (And the fact that 9 isn't affected makes it less likely.) Best regards, Vladimir Ivanov On 10/18/17 3:25 PM, christos at zoulas.com wrote: > I am asking because I filed: https://bugs.openjdk.java.net/browse/JDK-8189172 > and I have not heard a word since. > > Thanks, > > christos > From christos at zoulas.com Wed Oct 18 14:50:50 2017 From: christos at zoulas.com (Christos Zoulas) Date: Wed, 18 Oct 2017 10:50:50 -0400 Subject: what is the SLA for responding to bugs? In-Reply-To: <3f2cc7c3-b849-e034-c9d2-511d0a0acf66@oracle.com> from Vladimir Ivanov (Oct 18, 5:48pm) Message-ID: <20171018145050.263FC17FDBA@rebar.astron.com> On Oct 18, 5:48pm, vladimir.x.ivanov at oracle.com (Vladimir Ivanov) wrote: -- Subject: Re: what is the SLA for responding to bugs? | Thanks for the detailed report, Christos. | | I'm not aware about any SLA, but development team tries to triage | incoming bugs in prompt manner. Unfortunately, the bug was filed w/o | subcategory set, so it went unnoticed. | | I was able to reproduce it and added some root cause analysis, but I | can't promise anything about fixing it in 8u. (And the fact that 9 isn't | affected makes it less likely.) Thanks you very much! I am trying to get some paid support on my side to see if this can be fixed in 8... Best, christos From dean.long at oracle.com Wed Oct 18 17:01:25 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 18 Oct 2017 10:01:25 -0700 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: References: Message-ID: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> How about initializing TypeVect::VECTY and friends unconditionally?? I am nervous about exchanging one guarding condition for another. dl On 10/18/17 1:03 AM, Nils Eliasson wrote: > > HI, > > I ran into a problem with the interaction between MaxVectorSize and > the UseAVX. For some AMD CPUs we limit the vector size to 16 because > it gives the best performance. > >> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >> ?????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >> ???? } > > Whenf MaxVecorSize is set to 16 it has the sideeffect that the > TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though > the platform has the capability. > > Type.cpp:~660 > > [...] > >?? if (Matcher::vector_size_supported(T_FLOAT,8)) { > >???? TypeVect::VECTY = TypeVect::make(T_FLOAT,8); > >?? } > [...] > >?? mreg2type[Op_VecY] = TypeVect::VECTY; > > > In the ad-files feature flags (UseAVX etc.) 
are used to control what > rules should be matched if it has effects on specific vector > registers. Here we have a mismatch. > > On a platform that supports AVX2 but have MaxVectorSize limited to 16, > the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is > uninitialized. We will also hit asserts in a few places like: > assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), > "sanity"); > > Shouldn't the type initialization in type.cpp be dependent on feature > flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector > registers are initialized if the platform supports them, but they > might not be used if MaxVectorSize is limited.) > > This is a patch that solves the problem, but I have not convinced > myself that it is the right way: > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ > > Feedback appreciated, > > Regards, > Nils Eliasson > > > > > > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Oct 18 17:53:23 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 18 Oct 2017 10:53:23 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> Message-ID: <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> New code is good I think. Thanks, Vladimir On 10/17/17 6:36 PM, dean.long at oracle.com wrote: > On 10/17/17 3:30 PM, Vladimir Kozlov wrote: > >> >> On 10/17/17 1:41 PM, dean.long at oracle.com wrote: >>> Comment below... >>> >>> >>> On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>>>> ?Why removing !result->is_anonymous() check is not enough?: >>>>>> >>>>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>>>> result->store_fingerprint(stream->compute_fingerprint()); >>>>>> >>>>> >>>>> Because InstanceKlass::should_store_fingerprint() will return false >>>>> for an anonymous class. >>>> >>>> should_store_fingerprint() only checks flags. Do you mean it to >>>> return 'true' during execution too for anonymous classes? But next >>>> code will recalculate fingerprint for all classes!!! when you need >>>> compute only for anonymous: >>>> >>>> +? if (result->has_stored_fingerprint()) { >>>> + result->store_fingerprint(stream->compute_fingerprint()); >>>> ?? } >>>> >>> >>> It should be for anonymous only (in AOT mode), unless I'm missing >>> something: >>> >>> 1982 bool InstanceKlass::has_stored_fingerprint() const { >>> 1983 #if INCLUDE_AOT >>> 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); >> >> I mean should_store_fingerprint() will return true for all klasses in >> CDS too. So you recalculating them. >> > > I see what you mean now.? New webrev: > > http://cr.openjdk.java.net/~dlong/8132547//hs.3/ > > dl > >> Vladimir >> >>> 1985 #else >>> 1986?? 
return false; >>> 1987 #endif >>> 1988 } >>> >>> 1960 bool InstanceKlass::should_store_fingerprint(bool is_anonymous) >>> { [...]1971 if (UseAOT && is_anonymous) { >>> 1972 // (3) We are using AOT code from a shared library and see an >>> anonymous class >>> 1973 return true; >>> 1974 } dl >>> >>>> Thanks, >>>> Vladimir >>> > From vladimir.kozlov at oracle.com Wed Oct 18 18:21:55 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 18 Oct 2017 11:21:55 -0700 Subject: RFR(XS): 8188223: IfNode::range_check_trap_proj() should handler dying subgraph with single if proj In-Reply-To: References: Message-ID: Good. Thanks, Vladimir On 10/18/17 7:11 AM, Roland Westrelin wrote: > > Thanks for the review, Vladimir. I followed your suggestion. Here is a > ready to push changeset: > > http://cr.openjdk.java.net/~roland/8188223/8188223.patch > > Roland. > From dean.long at oracle.com Thu Oct 19 01:55:13 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 18 Oct 2017 18:55:13 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> Message-ID: Thanks Vladimir. dl On 10/18/17 10:53 AM, Vladimir Kozlov wrote: > New code is good I think. > > Thanks, > Vladimir > > On 10/17/17 6:36 PM, dean.long at oracle.com wrote: >> On 10/17/17 3:30 PM, Vladimir Kozlov wrote: >> >>> >>> On 10/17/17 1:41 PM, dean.long at oracle.com wrote: >>>> Comment below... >>>> >>>> >>>> On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>>>>> ?Why removing !result->is_anonymous() check is not enough?: >>>>>>> >>>>>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>>>>> result->store_fingerprint(stream->compute_fingerprint()); >>>>>>> >>>>>> >>>>>> Because InstanceKlass::should_store_fingerprint() will return >>>>>> false for an anonymous class. >>>>> >>>>> should_store_fingerprint() only checks flags. Do you mean it to >>>>> return 'true' during execution too for anonymous classes? But next >>>>> code will recalculate fingerprint for all classes!!! when you need >>>>> compute only for anonymous: >>>>> >>>>> +? if (result->has_stored_fingerprint()) { >>>>> + result->store_fingerprint(stream->compute_fingerprint()); >>>>> ?? } >>>>> >>>> >>>> It should be for anonymous only (in AOT mode), unless I'm missing >>>> something: >>>> >>>> 1982 bool InstanceKlass::has_stored_fingerprint() const { >>>> 1983 #if INCLUDE_AOT >>>> 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); >>> >>> I mean should_store_fingerprint() will return true for all klasses >>> in CDS too. So you recalculating them. >>> >> >> I see what you mean now.? New webrev: >> >> http://cr.openjdk.java.net/~dlong/8132547//hs.3/ >> >> dl >> >>> Vladimir >>> >>>> 1985 #else >>>> 1986?? 
return false; >>>> 1987 #endif >>>> 1988 } >>>> >>>> 1960 bool InstanceKlass::should_store_fingerprint(bool >>>> is_anonymous) { [...]1971 if (UseAOT && is_anonymous) { >>>> 1972 // (3) We are using AOT code from a shared library and see an >>>> anonymous class >>>> 1973 return true; >>>> 1974 } dl >>>> >>>>> Thanks, >>>>> Vladimir >>>> >> From dean.long at oracle.com Thu Oct 19 03:19:13 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 18 Oct 2017 20:19:13 -0700 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> Message-ID: <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> Yes, but I'm not a Reviewer. dl On 10/18/17 7:16 AM, Roland Westrelin wrote: > Here is an updated webrev with Dean's suggestion: > > http://cr.openjdk.java.net/~roland/8188151/webrev.01/ > > Can this be considered reviewed by you, Dean? > > Roland. From lutz.schmidt at sap.com Thu Oct 19 08:10:33 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Thu, 19 Oct 2017 08:10:33 +0000 Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction Message-ID: <224FFCB0-0B4A-4B08-8842-03A6004F9193@sap.com> Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8189616 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189616.00/index.html STCK is an ancient instruction to store a CPU timer value. It guarantees strict monotonicity of the stored values across all CPUs in a system. The inherent synchronization has a performance impact which becomes ?considerable? (according to IBM specialists) with the recently announced processor generation (z14). This change removes the STCK instruction from s390 platform code. The intent is to prevent inadvertent use of the instruction. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwestrel at redhat.com Thu Oct 19 09:21:34 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 19 Oct 2017 11:21:34 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> Message-ID: > Yes, but I'm not a Reviewer. Thanks for the review! Anyone for another review? Roland. From goetz.lindenmaier at sap.com Thu Oct 19 11:04:58 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 19 Oct 2017 11:04:58 +0000 Subject: FW: 8188131: [PPC] Increase inlining thresholds to the same as other platforms In-Reply-To: References: Message-ID: <68cb012b5aa2478aba35be73b91b5995@sap.com> Resending this to hotspot-compiler-dev, which is proper list for this. Best regards, Goetz. -----Original Message----- From: Lindenmaier, Goetz Sent: Donnerstag, 19. Oktober 2017 13:03 To: 'Kazunori Ogata' ; Doerr, Martin Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other platforms Hi Kazunori, To me, this seems to be a very large increase. 
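For reference, the proposal boils down to new ppc64 platform defaults along these lines (only a sketch of the shape of the change; the exact file and the current ppc64 values are in Ogata's webrev):

// Raise the ppc64 C2 defaults to the aarch64 values under discussion.
define_pd_global(intx, FreqInlineSize,   325);
define_pd_global(intx, InlineSmallCode, 2500);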
Considering that not only the required code cache size but also the compiler cpu time will increase in this magnitude, this seems to be a rather risky step that should be tested for its benefits on systems that are highly contended. In this case, you probably had enough space in the code cache so that no recompilation etc. happened. To further look at this I could think of 1. finding the minimal code cache size with the old flags where the JIT is not disabled 2. finding the same size for the new flag settings --> How much more is needed for the new settings? Then you should compare the performance with the bigger code cache size for both, and see whether there still is performance improvement, or whether it's eaten up by more compile time. I.e. you should have a setup where compiler threads and application threads compete for the available CPUs. What do you think? Best regards, Goetz. > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf Of Kazunori Ogata > Sent: Donnerstag, 19. Oktober 2017 08:43 > To: Doerr, Martin > Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other > platforms > > Hi Martin, > > Thank you for your comment. I checked the code cache size by running > SPECjbb2015 (composite mode, i.e., single JVM mode, heap size is 31GB). > > The used code cache size was increased by 4.5MB from 41982Kb to 47006Kb > (+12%). Is the increase too large? > > > The raw output of -XX:+PrintCodeCache are: > > === Original === > CodeHeap 'non-profiled nmethods': size=652480Kb used=13884Kb > max_used=13884Kb free=638595Kb > bounds [0x00001000356f0000, 0x0000100036480000, 0x000010005d420000] > CodeHeap 'profiled nmethods': size=652480Kb used=26593Kb > max_used=26593Kb > free=625886Kb > bounds [0x000010000d9c0000, 0x000010000f3c0000, 0x00001000356f0000] > CodeHeap 'non-nmethods': size=5760Kb used=1505Kb max_used=1559Kb > free=4254Kb > bounds [0x000010000d420000, 0x000010000d620000, 0x000010000d9c0000] > total_blobs=16606 nmethods=10265 adapters=653 > compilation: enabled > > > === Modified (webrev.00) === > CodeHeap 'non-profiled nmethods': size=652480Kb used=18516Kb > max_used=18516Kb free=633964Kb > bounds [0x0000100035730000, 0x0000100036950000, 0x000010005d460000] > CodeHeap 'profiled nmethods': size=652480Kb used=26963Kb > max_used=26963Kb > free=625516Kb > bounds [0x000010000da00000, 0x000010000f460000, 0x0000100035730000] > CodeHeap 'non-nmethods': size=5760Kb used=1527Kb max_used=1565Kb > free=4232Kb > bounds [0x000010000d460000, 0x000010000d660000, 0x000010000da00000] > total_blobs=16561 nmethods=10295 adapters=653 > compilation: enabled > > > Regards, > Ogata > > > > > From: "Doerr, Martin" > To: Kazunori Ogata , "hotspot- > dev at openjdk.java.net" > , "ppc-aix-port-dev at openjdk.java.net" > > Date: 2017/10/18 19:43 > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > same as other platforms > > > > Hi Ogata, > > sorry for the delay. I had missed this one. > > The change looks feasible to me. > > It may only impact the utilization of the Code Cache. Can you evaluate > that (e.g. by running large benchmarks with -XX:+PrintCodeCache)? > > Thanks and best regards, > Martin > > > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf > Of Kazunori Ogata > Sent: Freitag, 29. 
September 2017 08:42 > To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as > other platforms > > Hi all, > > Please review a change for JDK-8188131. > > Bug report: > https://urldefense.proofpoint.com/v2/url?u=https- > 3A__bugs.openjdk.java.net_browse_JDK- > 2D8188131&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > 73lAZxkNhGsrlDkk- > YUYORQ&s=ic27Fb2_vyTSsUAPraEI89UDJy9cbodGojvMw9DNHiU&e= > > Webrev: > https://urldefense.proofpoint.com/v2/url?u=http- > 3A__cr.openjdk.java.net_- > 7Ehorii_8188131_webrev.00_&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > 73lAZxkNhGsrlDkk-YUYORQ&s=xS8PbLyuVtbOBRDMIB- > i9r6lTggpGH3Np8kmONkkMAg&e= > > > This change increases the default values of FreqInlineSize and > InlineSmallCode in ppc64 to 325 and 2500, respectively. These values are > the same as aarch64. The performance of TPC-DS Q96 was improved by > about > 6% with this change. > > > Regards, > Ogata > > > From dean.long at oracle.com Fri Oct 20 06:01:58 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 19 Oct 2017 23:01:58 -0700 Subject: [10] RFR(L): 8132547: [AOT] support invokedynamic instructions In-Reply-To: <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> References: <5bd196a3-93f8-b59d-86d8-a87e0cfc8179@oracle.com> <84f1072e-8b36-c82a-2939-de2b13405a8e@oracle.com> <640fb281-c554-694b-29a3-9c038283db75@oracle.com> <55C88A5F-9997-4DDB-9AF7-960BAF05AF96@oracle.com> <32d687f7-2498-f814-3733-9d12c5a6838f@oracle.com> <50119a3c-9806-7d22-fcb5-236a0e7f995e@oracle.com> <6e2fcfd2-207e-f3e8-1fa3-84bbd3170d2b@oracle.com> <526576e3-a44d-333b-84ff-8bb14f3baf1d@oracle.com> <5a7dbd2f-dd4c-f39a-1fd9-92a85b83aa4d@oracle.com> <760e8235-6e06-9e50-15b3-ae374f1c115d@oracle.com> Message-ID: <649e1071-8d59-f56e-d69b-8327667be7ed@oracle.com> Sorry, I need to make one additional change: diff -r 578d216b57ad src/hotspot/share/jvmci/compilerRuntime.cpp --- a/src/hotspot/share/jvmci/compilerRuntime.cpp??? Thu Oct 19 19:23:48 2017 -0700 +++ b/src/hotspot/share/jvmci/compilerRuntime.cpp??? Thu Oct 19 22:59:49 2017 -0700 @@ -24,7 +24,9 @@ ?#include "precompiled.hpp" ?#include "classfile/stringTable.hpp" ?#include "classfile/symbolTable.hpp" +#include "interpreter/linkResolver.hpp" ?#include "jvmci/compilerRuntime.hpp" +#include "oops/oop.inline.hpp" ?#include "runtime/compilationPolicy.hpp" ?#include "runtime/deoptimization.hpp" ?#include "runtime/interfaceSupport.hpp" JPRT caught the missing header files in the open solaris build without precompiled headers. dl On 10/18/17 10:53 AM, Vladimir Kozlov wrote: > New code is good I think. > > Thanks, > Vladimir > > On 10/17/17 6:36 PM, dean.long at oracle.com wrote: >> On 10/17/17 3:30 PM, Vladimir Kozlov wrote: >> >>> >>> On 10/17/17 1:41 PM, dean.long at oracle.com wrote: >>>> Comment below... >>>> >>>> >>>> On 10/17/17 11:15 AM, Vladimir Kozlov wrote: >>>>>>> ?Why removing !result->is_anonymous() check is not enough?: >>>>>>> >>>>>>> ?if (InstanceKlass::should_store_fingerprint()) { >>>>>>> result->store_fingerprint(stream->compute_fingerprint()); >>>>>>> >>>>>> >>>>>> Because InstanceKlass::should_store_fingerprint() will return >>>>>> false for an anonymous class. >>>>> >>>>> should_store_fingerprint() only checks flags. Do you mean it to >>>>> return 'true' during execution too for anonymous classes? 
But next >>>>> code will recalculate fingerprint for all classes!!! when you need >>>>> compute only for anonymous: >>>>> >>>>> +? if (result->has_stored_fingerprint()) { >>>>> + result->store_fingerprint(stream->compute_fingerprint()); >>>>> ?? } >>>>> >>>> >>>> It should be for anonymous only (in AOT mode), unless I'm missing >>>> something: >>>> >>>> 1982 bool InstanceKlass::has_stored_fingerprint() const { >>>> 1983 #if INCLUDE_AOT >>>> 1984 return should_store_fingerprint(is_anonymous()) || is_shared(); >>> >>> I mean should_store_fingerprint() will return true for all klasses >>> in CDS too. So you recalculating them. >>> >> >> I see what you mean now.? New webrev: >> >> http://cr.openjdk.java.net/~dlong/8132547//hs.3/ >> >> dl >> >>> Vladimir >>> >>>> 1985 #else >>>> 1986?? return false; >>>> 1987 #endif >>>> 1988 } >>>> >>>> 1960 bool InstanceKlass::should_store_fingerprint(bool >>>> is_anonymous) { [...]1971 if (UseAOT && is_anonymous) { >>>> 1972 // (3) We are using AOT code from a shared library and see an >>>> anonymous class >>>> 1973 return true; >>>> 1974 } dl >>>> >>>>> Thanks, >>>>> Vladimir >>>> >> From tobias.hartmann at oracle.com Fri Oct 20 08:04:04 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 20 Oct 2017 10:04:04 +0200 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8188785 http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ Since 8186777 [1], we require two loads to retrieve the java mirror from a klass oop: LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) The problem is that now the type of the outermost LoadP does not depend on the inner LoadP (which has a raw pointer type) but on the type of the AddP which is one level up. CPP only propagates the types downwards to the direct users and as a result, the mirror LoadP ends up with an incorrect (too narrow/optimistic) type. I've verified the fix with the failing test and also verified that 8188835 [2] is a duplicate. Gory details: During CCP, we compute the type of a Phi that merges oops of type A and B where B is a subtype of A. Since the type of the A input was not computed yet (it was initialized to TOP at the beginning of CCP), the Phi temporarily ends up with type B (i.e. with a type that is too narrow/optimistic). This type is propagated downwards and is being used to optimize a java mirror load from the klass oop: LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) The mirror load is then folded to TypeInstPtr::make(B) which is not correct because the oop can be of type A at runtime. Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8186777 [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From vladimir.kozlov at oracle.com Fri Oct 20 16:36:29 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Oct 2017 09:36:29 -0700 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: References: Message-ID: Hmm. Is this only LoadP or general problem? May be add code to next lines when m->is_AddP() : 1734 if (m->bottom_type() != type(m)) { // If not already bottomed out 1735 worklist.push(m); // Propagate change to user I think we should do similar to PhaseIterGVN::add_users_to_worklist(). 
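(For illustration only: a minimal, hypothetical Java shape that produces the kind of IR discussed in this thread -- a Phi merging two oop types whose java mirror is then loaded. The class names and the driver below are assumptions made for the sketch, not the actual failing test from the bug report.)

class A { }
class B extends A { }

public class MirrorLoadExample {
    // 'flag ? a : b' becomes a Phi merging the A and B oop types; getClass()
    // turns into the mirror load LoadP(LoadP(AddP(klass, java_mirror_offset))).
    static Class<?> test(boolean flag, A a, B b) {
        A merged = flag ? a : b;
        return merged.getClass();
    }

    public static void main(String[] args) {
        A a = new A();
        B b = new B();
        for (int i = 0; i < 100_000; i++) {
            test((i & 1) == 0, a, b);         // warm up so C2 compiles test()
        }
        System.out.println(test(true, a, b)); // must print "class A", never "class B"
    }
}

If CCP temporarily types the Phi as B (while the A input is still TOP) and that optimistic type reaches the mirror load before the dependent loads are pushed back onto the worklist, the load can be folded to B's mirror, which is the miscompile the webrev above guards against.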
Thanks, Vladimir On 10/20/17 1:04 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8188785 > http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ > > Since 8186777 [1], we require two loads to retrieve the java mirror from > a klass oop: > > LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) > > The problem is that now the type of the outermost LoadP does not depend > on the inner LoadP (which has a raw pointer type) but on the type of the > AddP which is one level up. CPP only propagates the types downwards to > the direct users and as a result, the mirror LoadP ends up with an > incorrect (too narrow/optimistic) type. > > I've verified the fix with the failing test and also verified that > 8188835 [2] is a duplicate. > > Gory details: > During CCP, we compute the type of a Phi that merges oops of type A and > B where B is a subtype of A. Since the type of the A input was not > computed yet (it was initialized to TOP at the beginning of CCP), the > Phi temporarily ends up with type B (i.e. with a type that is too > narrow/optimistic). This type is propagated downwards and is being used > to optimize a java mirror load from the klass oop: > > LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) > > The mirror load is then folded to TypeInstPtr::make(B) which is not > correct because the oop can be of type A at runtime. > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8186777 > [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From vladimir.kozlov at oracle.com Fri Oct 20 16:43:22 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 20 Oct 2017 09:43:22 -0700 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: References: Message-ID: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> On 10/20/17 9:36 AM, Vladimir Kozlov wrote: > Hmm. Is this only LoadP or general problem? > > May be add code to next lines when m->is_AddP() : > > 1734???????? if (m->bottom_type() != type(m)) { // If not already > bottomed out > 1735?????????? worklist.push(m);???? // Propagate change to user > > I think we should do similar to PhaseIterGVN::add_users_to_worklist(). Hmm, PhaseIterGVN::add_users_to_worklist() is not good example - it only puts near loads/stores. Should we fix it too? Do we have other cases when we calculate type based not on immediate inputs but their inputs? Thanks, Vladimir > > Thanks, > Vladimir > > On 10/20/17 1:04 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8188785 >> http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ >> >> Since 8186777 [1], we require two loads to retrieve the java mirror >> from a klass oop: >> >> LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) >> >> The problem is that now the type of the outermost LoadP does not >> depend on the inner LoadP (which has a raw pointer type) but on the >> type of the AddP which is one level up. CPP only propagates the types >> downwards to the direct users and as a result, the mirror LoadP ends >> up with an incorrect (too narrow/optimistic) type. >> >> I've verified the fix with the failing test and also verified that >> 8188835 [2] is a duplicate. >> >> Gory details: >> During CCP, we compute the type of a Phi that merges oops of type A >> and B where B is a subtype of A. 
Since the type of the A input was not >> computed yet (it was initialized to TOP at the beginning of CCP), the >> Phi temporarily ends up with type B (i.e. with a type that is too >> narrow/optimistic). This type is propagated downwards and is being >> used to optimize a java mirror load from the klass oop: >> >> LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) >> >> The mirror load is then folded to TypeInstPtr::make(B) which is not >> correct because the oop can be of type A at runtime. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8186777 >> [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From dmitry.chuyko at bell-sw.com Fri Oct 20 17:45:47 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Fri, 20 Oct 2017 20:45:47 +0300 Subject: [10] RFR (S): 8189177 - AARCH64: Improve _updateBytesCRC32C intrinsic Message-ID: <18544a56-5885-784f-b448-7f412861d916@bell-sw.com> Hello, Please review an improvement of CRC32C calculation on AArch64. It is done pretty similar to a change for JDK-8189176 described in [1]. MacroAssembler::kernel_crc32c gets unused table registers. They can be used to make neighbor loads and CRC calculations independent. Adding prologue and epilogue for main by-64 loop makes it applicable starting from len=128 so additional by-32 loop is added for smaller lengths. rfe: https://bugs.openjdk.java.net/browse/JDK-8189177 webrev: http://cr.openjdk.java.net/~dchuyko/8189177/webrev.00/ benchmark: http://cr.openjdk.java.net/~dchuyko/8189177/crc32c/CRC32CBench.java Results for T88 and A53 [2] are similar to CRC32 change (good), but again splitting pair loads may slow down other CPUs so measurements on different HW are welcome. -Dmitry [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-October/027225.html [2] https://bugs.openjdk.java.net/browse/JDK-8189177?focusedCommentId=14124535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14124535 From riasat.abir at gmail.com Fri Oct 20 21:35:04 2017 From: riasat.abir at gmail.com (Riasat Abir) Date: Fri, 20 Oct 2017 14:35:04 -0700 Subject: Jdk random crashes Message-ID: I can't figure out the problem, on this system jdk is crashing randomly. Attached 3 different logs. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hs_err_pid3437.log Type: text/x-log Size: 58961 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hs_err_pid6466.log Type: text/x-log Size: 57579 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hs_err_pid6471.log Type: text/x-log Size: 58731 bytes Desc: not available URL: From aph at redhat.com Sat Oct 21 08:30:14 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 21 Oct 2017 09:30:14 +0100 Subject: Jdk random crashes In-Reply-To: References: Message-ID: <8d5124bb-7c73-eab8-30fd-cb131ac62f26@redhat.com> On 20/10/17 22:35, Riasat Abir wrote: > I can't figure out the problem, on this system jdk is crashing randomly. > Attached 3 different logs. It's very hard to say. But we can't really diagnose anything because this isn't OpenJDK: it's the Oracle proprietary JDK. It's also out of date. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kirk.pepperdine at gmail.com Sat Oct 21 09:01:14 2017 From: kirk.pepperdine at gmail.com (Kirk Pepperdine) Date: Sat, 21 Oct 2017 11:01:14 +0200 Subject: Jdk random crashes In-Reply-To: References: Message-ID: You may want to look at the bug database as these all appear to be internal errors. Moving to a newer version of the JDK may fix them but then it may not. Can you confirm that you?re environment is not corrupted in any way? Kind regards, Kirk Pepperdine > On Oct 20, 2017, at 11:35 PM, Riasat Abir wrote: > > I can't figure out the problem, on this system jdk is crashing randomly. > Attached 3 different logs. > From tobias.hartmann at oracle.com Mon Oct 23 08:04:06 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 23 Oct 2017 10:04:06 +0200 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> Message-ID: <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> Hi Vladimir, thanks for the review! On 20.10.2017 18:43, Vladimir Kozlov wrote: > On 10/20/17 9:36 AM, Vladimir Kozlov wrote: >> Hmm. Is this only LoadP or general problem? This is a general problem with nodes that compute their type not based on immediate inputs. >> May be add code to next lines when m->is_AddP() : >> >> 1734???????? if (m->bottom_type() != type(m)) { // If not already bottomed out >> 1735?????????? worklist.push(m);???? // Propagate change to user Where should I add that code exactly? My fix already checks for "ut != type(u)". >> I think we should do similar to PhaseIterGVN::add_users_to_worklist(). > > Hmm, PhaseIterGVN::add_users_to_worklist() is not good example - it only puts near loads/stores. Should we fix it too? Yes, I think it makes sense to update add_users_to_worklist() as well: http://cr.openjdk.java.net/~thartmann/8188785/webrev.01/ > Do we have other cases when we calculate type based not on immediate inputs but their inputs? Yes, see code right above my changes: // CmpU nodes can get their type information from two nodes up in the // graph (instead of from the nodes immediately above). Make sure they // are added to the worklist if nodes they depend on are updated, since // they could be missed and get wrong types otherwise. http://hg.openjdk.java.net/jdk10/hs/file/6126617b8508/src/hotspot/share/opto/phaseX.cpp#l1738 The same goes for CallNodes and counted loop exit conditions (see surrounding code). I'm not aware of any other cases. Thanks, Tobias >> On 10/20/17 1:04 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8188785 >>> http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ >>> >>> Since 8186777 [1], we require two loads to retrieve the java mirror from a klass oop: >>> >>> LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) >>> >>> The problem is that now the type of the outermost LoadP does not depend on the inner LoadP (which has a raw pointer >>> type) but on the type of the AddP which is one level up. CPP only propagates the types downwards to the direct users >>> and as a result, the mirror LoadP ends up with an incorrect (too narrow/optimistic) type. >>> >>> I've verified the fix with the failing test and also verified that 8188835 [2] is a duplicate. >>> >>> Gory details: >>> During CCP, we compute the type of a Phi that merges oops of type A and B where B is a subtype of A. 
Since the type >> of the A input was not computed yet (it was initialized to TOP at the beginning of CCP), the Phi temporarily ends up >> with type B (i.e. with a type that is too narrow/optimistic). This type is propagated downwards and is being used to >> optimize a java mirror load from the klass oop: >> >> LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) >> >> The mirror load is then folded to TypeInstPtr::make(B) which is not correct because the oop can be of type A at runtime. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8186777 >> [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From martin.doerr at sap.com Mon Oct 23 08:36:01 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 23 Oct 2017 08:36:01 +0000 Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction In-Reply-To: <224FFCB0-0B4A-4B08-8842-03A6004F9193@sap.com> References: <224FFCB0-0B4A-4B08-8842-03A6004F9193@sap.com> Message-ID: <1f7f471e58414a78a375e321dba08f2a@sap.com> Hi Lutz, looks good. I think there's no reason for using stck since we have stckf, so I'm ok with it. Thanks for removing the z900 code. We only support z10 and newer. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Donnerstag, 19. Oktober 2017 10:11 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8189616 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189616.00/index.html STCK is an ancient instruction to store a CPU timer value. It guarantees strict monotonicity of the stored values across all CPUs in a system. The inherent synchronization has a performance impact which becomes "considerable" (according to IBM specialists) with the recently announced processor generation (z14). This change removes the STCK instruction from s390 platform code. The intent is to prevent inadvertent use of the instruction. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.schmidt at sap.com Mon Oct 23 09:06:09 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 23 Oct 2017 09:06:09 +0000 Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction In-Reply-To: <1f7f471e58414a78a375e321dba08f2a@sap.com> References: <224FFCB0-0B4A-4B08-8842-03A6004F9193@sap.com> <1f7f471e58414a78a375e321dba08f2a@sap.com> Message-ID: <021930F1-D5CD-403D-917D-3B5793F9B7C9@sap.com> Hi Martin, thank you for your review! Regards, Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 On 23.10.2017, 10:36, "Doerr, Martin" > wrote: Hi Lutz, looks good. I think there's no reason for using stck since we have stckf, so I'm ok with it. Thanks for removing the z900 code. We only support z10 and newer. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Donnerstag, 19.
Oktober 2017 10:11 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(S): 8189616: [s390] Remove definition and all uses of STCK instruction Dear all, I would like to request reviews for this s390-only bug fix: Bug: https://bugs.openjdk.java.net/browse/JDK-8189616 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189616.00/index.html STCK is an ancient instruction to store a CPU timer value. It guarantees strict monotonicity of the stored values across all CPUs in a system. The inherent synchronization has a performance impact which becomes "considerable" (according to IBM specialists) with the recently announced processor generation (z14). This change removes the STCK instruction from s390 platform code. The intent is to prevent inadvertent use of the instruction. Thank you! Lutz -------------- next part -------------- An HTML attachment was scrubbed... URL: From nils.eliasson at oracle.com Mon Oct 23 14:16:35 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 23 Oct 2017 16:16:35 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: References: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> Message-ID: <63b109f1-48af-f594-588b-519364ad931f@oracle.com> Hi Roland, Sorry for the delay. First - It's a very impressive work you have done! Currently your patch doesn't apply cleanly. The fix of JDK-8189067 changes loopopts.cpp. I have run your code (based on jdk10 before JDK-8189067) through testing. I encountered a minor build problem on solaris_x64 (patch below), otherwise it was stable with no encountered test failures. I have also run performance testing with the conclusion that no significant regression can be seen. In some benchmarks like scimark.sparse.large that has a known safepointing issue (https://bugs.openjdk.java.net/browse/JDK-8177704), very good results can be seen. scimark.sparse.large using G1: -XX:-UseCountedLoopSafepoints (default) ~86 ops/m -XX:+UseCountedLoopSafepoints ~106 ops/m -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 ~111 ops/m The positive results lead us to the conclusion that we would like UseCountedLoopSafepoints to default to true, and LoopStripMiningIter to default to 1000. c2_globals.hpp: - product(bool, UseCountedLoopSafepoints, false, + product(bool, UseCountedLoopSafepoints, true, - product(uintx, LoopStripMiningIter, 0, + product(uintx, LoopStripMiningIter, 1000, solaris_x64 complained about type conversion: src/hotspot/share/opto/loopopts.cpp: @@ -1729,7 +1729,7 @@ Node* l = cl->outer_loop(); Node* tail = cl->outer_loop_tail(); IfNode* le = cl->outer_loop_end(); - Node* sfpt = cl->outer_safepoint(); + Node* sfpt = (Node*) cl->outer_safepoint(); src/hotspot/share/opto/opaquenode.cpp @@ -144,7 +144,7 @@ assert(iter_estimate > 0, "broken"); if ((jlong)scaled_iters != scaled_iters_long || iter_estimate <= short_scaled_iters) { // Remove outer loop and safepoint (too few iterations) - Node* outer_sfpt = inner_cl->outer_safepoint(); + Node* outer_sfpt = (Node*) inner_cl->outer_safepoint(); In the TraceLoopOpts print out I suggest changing space to underscore to conform with how the other print outs look: "PreMainPost Loop: N153/N130 limit_check predicated counted [0,int),+1 (26 iters) has_sfpt strip mined" loopnode.cpp:1867 - tty->print(" strip mined"); + tty->print(" strip_mined"); When your patch is updated, I will do some additional functional testing. Also, a second reviewer is required.
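(For illustration only: a minimal sketch of the kind of long counted loop these flags target. The class name and sizes are assumptions made for the sketch, not part of the webrev. With counted-loop safepoints disabled, the compiled loop body has no safepoint poll per iteration, so a safepoint request can stall behind a long-running call; with -XX:+UseCountedLoopSafepoints plus strip mining the poll moves to an outer loop that runs after strips of up to LoopStripMiningIter inner iterations.)

public class LongCountedLoop {
    static long blackhole;

    // A simple counted loop: one call over a large array can run for a long
    // time without reaching a safepoint if no poll is emitted inside the loop.
    static long run(int[] data) {
        long s = 0;
        for (int i = 0; i < data.length; i++) {
            s += data[i];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] data = new int[50_000_000];
        java.util.Arrays.fill(data, 1);
        for (int i = 0; i < 40; i++) {
            blackhole += run(data);   // long-running compiled loop
        }
        System.out.println(blackhole);
    }
}

Hypothetical invocation with the proposed defaults made explicit: java -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 LongCountedLoop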
Best regards, Nils Eliasson On 2017-10-11 15:53, Roland Westrelin wrote: >> I have started reviewing and testing I will sponsor your change when the >> full review is completed. > Thanks! > > Roland. From jcbeyler at google.com Mon Oct 23 15:27:50 2017 From: jcbeyler at google.com (JC Beyler) Date: Mon, 23 Oct 2017 08:27:50 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <1497366226.2829.109.camel@oracle.com> <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> Message-ID: Dear all, Small update this week with this new webrev: - http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/ - Incremental is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/ I patched the code changes showed by Robbin last week and I refactored collectedHeap.cpp: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src/hotspot/share/gc/shared/collectedHeap.cpp.patch The original code became a bit too complex in my opinion with the handle_heap_sampling handling too many things. So I subdivided the logic into two smaller methods and moved out a bit of the logic to make it more clear. Hopefully it is :) Let me know if you have any questions/comments :) Jc On Mon, Oct 16, 2017 at 9:34 AM, JC Beyler wrote: > Hi Robbin, > > That is because version 11 to 12 was only a test change. I was going to > write about it and say here are the webrev links: > Incremental: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ > > Full webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/ > > This change focused only on refactoring the tests to be more manageable, > readable, maintainable. As all tests are looking at allocations, I moved > common code to a java class: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ > test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ > HeapMonitor.java.patch > > And then most tests call into that class to turn on/off the sampling, > allocate, etc. This has removed almost 500 lines of test code so I'm happy > about that. > > Thanks for your changes, a bit of relics of previous versions :). I've > already integrated them into my code and will make a new webrev end of this > week with a bit of refactor of the code handling the tlab slow path. I find > it could use a bit of refactoring to make it easier to follow so I'm going > to take a stab at it this week. > > Any other issues/comments? > > Thanks! > Jc > > > On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn wrote: > >> Hi JC, >> >> I saw a webrev.12 in the directory, with only test changes(11->12), so I >> took that version. >> I had a look and tested the tests, worked fine! >> >> First glance at the code (looking at full v12) some minor things below, >> mostly unused stuff. >> >> Thanks, Robbin >> >> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp >> --- a/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 >> 16:54:06 2017 +0200 >> +++ b/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16 >> 17:42:42 2017 +0200 >> @@ -211,2 +211,3 @@ >> void initialize(int max_storage) { >> + // validate max_storage to sane value ? What would 0 mean ? >> MutexLocker mu(HeapMonitor_lock); >> @@ -227,8 +228,4 @@ >> bool initialized() { return _initialized; } >> - volatile bool *initialized_address() { return &_initialized; } >> >> private: >> - // Protects the traces currently sampled (below). 
>> - volatile intptr_t _stack_storage_lock[1]; >> - >> // The traces currently sampled. >> @@ -313,3 +310,2 @@ >> _initialized(false) { >> - _stack_storage_lock[0] = 0; >> } >> @@ -532,13 +528,2 @@ >> >> -// Delegate the initialization question to the underlying storage system. >> -bool HeapMonitoring::initialized() { >> - return StackTraceStorage::storage()->initialized(); >> -} >> - >> -// Delegate the initialization question to the underlying storage system. >> -bool *HeapMonitoring::initialized_address() { >> - return >> - const_cast(StackTraceStorage::storage()->initialized_ >> address()); >> -} >> - >> void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) { >> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp >> --- a/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 >> 16:54:06 2017 +0200 >> +++ b/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16 >> 17:42:42 2017 +0200 >> @@ -35,3 +35,2 @@ >> static uint64_t _rnd; >> - static bool _initialized; >> static jint _monitoring_rate; >> @@ -92,7 +91,2 @@ >> >> - // Is the profiler initialized and where is the address to the >> initialized >> - // boolean. >> - static bool initialized(); >> - static bool *initialized_address(); >> - >> // Called when o is to be sampled from a given thread and a given size. >> >> >> >> On 10/10/2017 12:57 AM, JC Beyler wrote: >> >>> Dear all, >>> >>> Thread-safety is back!! Here is the update webrev: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ >>> >>> Full webrev is here: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ >>> >>> In order to really test this, I needed to add this so thought now was a >>> good time. It required a few changes here for the creation to ensure >>> correctness and safety. Now we keep the static pointer but clear the data >>> internally so on re-initialize, it will be a bit more costly than before. I >>> don't think this is a huge use-case so I did not think it was a problem. I >>> used the internal MutexLocker, I think I used it well, let me know. >>> >>> I also added three tests: >>> >>> 1) Stack depth test: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >>> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >>> eapMonitorStackDepthTest.java.patch >>> >>> This test shows that the maximum stack depth system is working. >>> >>> 2) Thread safety: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >>> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >>> eapMonitorThreadTest.java.patch >>> >>> The test creates 24 threads and they all allocate at the same time. The >>> test then checks it does find samples from all the threads. >>> >>> 3) Thread on/off safety >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >>> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >>> eapMonitorThreadOnOffTest.java.patch >>> >>> The test creates 24 threads that all allocate a bunch of memory. Then >>> another thread turns the sampling on/off. >>> >>> Btw, both tests 2 & 3 failed without the locks. >>> >>> As I worked on this, I saw a lot of places where the tests are doing >>> very similar things, I'm going to clean up the code a bit and make a >>> HeapAllocator class that all tests can call directly. This will greatly >>> simplify the code. >>> >>> Thanks for any comments/criticisms! 
>>> Jc >>> >>> >>> On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler >> jcbeyler at google.com>> wrote: >>> >>> Dear all, >>> >>> Small update to the webrev: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/> >>> >>> Full webrev is here: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/> >>> >>> I updated a bit of the naming, removed a TODO comment, and I added a >>> test for testing the sampling rate. I also updated the maximum stack depth >>> to 1024, there is no >>> reason to keep it so small. I did a micro benchmark that tests the >>> overhead and it seems relatively the same. >>> >>> I compared allocations from a stack depth of 10 and allocations from >>> a stack depth of 1024 (allocations are from the same helper method in >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_fi >>> les/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/ >>> MyPackage/HeapMonitorStatRateTest.java >>> >> iles/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor >>> /MyPackage/HeapMonitorStatRateTest.java>): >>> - For an array of 1 integer allocated in a loop; stack >>> depth 1024 vs stack depth 10: 1% slower >>> - For an array of 200k integers allocated in a loop; >>> stack depth 1024 vs stack depth 10: 3% slower >>> >>> So basically now moving the maximum stack depth to 1024 but we only >>> copy over the stack depths actually used. >>> >>> For the next webrev, I will be adding a stack depth test to show >>> that it works and probably put back the mutex locking so that we can see >>> how difficult it is to keep >>> thread safe. >>> >>> Let me know what you think! >>> Jc >>> >>> >>> >>> On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler >> > wrote: >>> >>> Forgot to say that for my numbers: >>> - Not in the test are the actual numbers I got for the various >>> array sizes, I ran the program 30 times and parsed the output; here are the >>> averages and standard >>> deviation: >>> 1000: 1.28% average; 1.13% standard deviation >>> 10000: 1.59% average; 1.25% standard deviation >>> 100000: 1.26% average; 1.26% standard deviation >>> >>> The 1000/10000/100000 are the sizes of the arrays being >>> allocated. These are allocated 100k times and the sampling rate is 111 >>> times the size of the array. >>> >>> Thanks! >>> Jc >>> >>> >>> On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler >> > wrote: >>> >>> Hi all, >>> >>> After a bit of a break, I am back working on this :). 
As >>> before, here are two webrevs: >>> >>> - Full change set: http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.09/ >> asbold/8171119/webrev.09/> >>> - Compared to version 8: http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.08_09/ >> asbold/8171119/webrev.08_09/> >>> (This version is compared to version 8 I last showed >>> but ported to the new folder hierarchy) >>> >>> In this version I have: >>> - Handled Thomas' comments from his email of 07/03: >>> - Merged the logging to be standard >>> - Fixed up the code a bit where asked >>> - Added some notes about the code not being >>> thread-safe yet >>> - Removed additional dead code from the version that >>> modifies interpreter/c1/c2 >>> - Fixed compiler issues so that it compiles with >>> --disable-precompiled-header >>> - Tested with ./configure --with-boot-jdk= >>> --with-debug-level=slowdebug --disable-precompiled-headers >>> >>> Additionally, I added a test to check the sanity of the >>> sampler: HeapMonitorStatCorrectnessTest >>> (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/te >>> st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >>> HeapMonitorStatCorrectnessTest.java.patch >> asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >>> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch >>> >) >>> - This allocates a number of arrays and checks that we >>> obtain the number of samples we want with an accepted error of 5%. I tested >>> it 100 times and it >>> passed everytime, I can test more if wanted >>> - Not in the test are the actual numbers I got for the >>> various array sizes, I ran the program 30 times and parsed the output; here >>> are the averages and >>> standard deviation: >>> 1000: 1.28% average; 1.13% standard deviation >>> 10000: 1.59% average; 1.25% standard deviation >>> 100000: 1.26% average; 1.26% standard deviation >>> >>> What this means is that we were always at about 1~2% of the >>> number of samples the test expected. >>> >>> Let me know what you think, >>> Jc >>> >>> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler < >>> jcbeyler at google.com > wrote: >>> >>> Hi all, >>> >>> I apologize, I have not yet handled your remarks but >>> thought this new webrev would also be useful to see and comment on perhaps. >>> >>> Here is the latest webrev, it is generated slightly >>> different than the others since now I'm using webrev.ksh without the -N >>> option: >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/> >>> >>> And the webrev.07 to webrev.08 diff is here: >>> http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.07_08/ >> asbold/8171119/webrev.07_08/> >>> >>> (Let me know if it works well) >>> >>> It's a small change between versions but it: >>> - provides a fix that makes the average sample rate >>> correct (more on that below). >>> - fixes the code to actually have it play nicely with >>> the fast tlab refill >>> - cleaned up a bit the JVMTI text and now use >>> jvmtiFrameInfo >>> - moved the capability to be onload solo >>> >>> With this webrev, I've done a small study of the random >>> number generator we use here for the sampling rate. I took a small program >>> and it can be simplified to: >>> >>> for (outer loop) >>> for (inner loop) >>> int[] tmp = new int[arraySize]; >>> >>> - I've fixed the outer and inner loops to being 800 for >>> this experiment, meaning we allocate 640000 times an array of a given array >>> size. 
>>> >>> - Each program provides the average sample size used for >>> the whole execution >>> >>> - Then, I ran each variation 30 times and then >>> calculated the average of the average sample size used for various array >>> sizes. I selected the array size to >>> be one of the following: 1, 10, 100, 1000. >>> >>> - When compared to 512kb, the average sample size of 30 >>> runs: >>> 1: 4.62% of error >>> 10: 3.09% of error >>> 100: 0.36% of error >>> 1000: 0.1% of error >>> 10000: 0.03% of error >>> >>> What it shows is that, depending on the number of >>> samples, the average does become better. This is because with an allocation >>> of 1 element per array, it >>> will take longer to hit one of the thresholds. This is >>> seen by looking at the sample count statistic I put in. For the same number >>> of iterations (800 * >>> 800), the different array sizes provoke: >>> 1: 62 samples >>> 10: 125 samples >>> 100: 788 samples >>> 1000: 6166 samples >>> 10000: 57721 samples >>> >>> And of course, the more samples you have, the more >>> sample rates you pick, which means that your average gets closer using that >>> math. >>> >>> Thanks, >>> Jc >>> >>> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler < >>> jcbeyler at google.com > wrote: >>> >>> Thanks Robbin, >>> >>> This seems to have worked. When I have the next >>> webrev ready, we will find out but I'm fairly confident it will work! >>> >>> Thanks agian! >>> Jc >>> >>> On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn < >>> robbin.ehn at oracle.com > wrote: >>> >>> Hi JC, >>> >>> On 06/29/2017 12:15 AM, JC Beyler wrote: >>> >>> B) Incremental changes >>> >>> >>> I guess the most common work flow here is using >>> mq : >>> hg qnew fix_v1 >>> edit files >>> hg qrefresh >>> hg qnew fix_v2 >>> edit files >>> hg qrefresh >>> >>> if you do hg log you will see 2 commits >>> >>> webrev.ksh -r -2 -o my_inc_v1_v2 >>> webrev.ksh -o my_full_v2 >>> >>> >>> In your .hgrc you might need: >>> [extensions] >>> mq = >>> >>> /Robbin >>> >>> >>> Again another newbiew question here... >>> >>> For showing the incremental changes, is >>> there a link that explains how to do that? I apologize for my newbie >>> questions all the time :) >>> >>> Right now, I do: >>> >>> ksh ../webrev.ksh -m -N >>> >>> That generates a webrev.zip and send it to >>> Chuck Rasbold. He then uploads it to a new webrev. >>> >>> I tried commiting my change and adding a >>> small change. Then if I just do ksh ../webrev.ksh without any options, it >>> seems to produce a similar >>> page but now with only the changes I had (so >>> the 06-07 comparison you were talking about) and a changeset that has it >>> all. I imagine that is >>> what you meant. >>> >>> Which means that my workflow would become: >>> >>> 1) Make changes >>> 2) Make a webrev without any options to show >>> just the differences with the tip >>> 3) Amend my changes to my local commit so >>> that I have it done with >>> 4) Go to 1 >>> >>> Does that seem correct to you? >>> >>> Note that when I do this, I only see the >>> full change of a file in the full change set (Side note here: now the page >>> says change set and not >>> patch, which is maybe why Serguei was having >>> issues?). >>> >>> Thanks! 
>>> Jc >>> >>> >>> >>> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn < >>> robbin.ehn at oracle.com >> robbin.ehn at oracle.com >>> >> wrote: >>> >>> Hi, >>> >>> On 06/28/2017 12:04 AM, JC Beyler wrote: >>> >>> Dear Thomas et al, >>> >>> Here is the newest webrev: >>> http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.07/ >> asbold/8171119/webrev.07/> >>> >> asbold/8171119/webrev.07/ >> asbold/8171119/webrev.07/>> >>> >>> >>> >>> You have some more bits to in there but >>> generally this looks good and really nice with more tests. >>> I'll do and deep dive and re-test this >>> when I get back from my long vacation with whatever patch version you have >>> then. >>> >>> Also I think it's time you provide >>> incremental (v06->07 changes) as well as complete change-sets. >>> >>> Thanks, Robbin >>> >>> >>> >>> >>> Thomas, I "think" I have answered >>> all your remarks. The summary is: >>> >>> - The statistic system is up and >>> provides insight on what the heap sampler is doing >>> - I've noticed that, though >>> the sampling rate is at the right mean, we are missing some samples, I have >>> not yet tracked out why >>> (details below) >>> >>> - I've run a tiny benchmark that is >>> the worse case: it is a very tight loop and allocated a small array >>> - In this case, I see no >>> overhead when the system is off so that is a good start :) >>> - I see right now a high >>> overhead in this case when sampling is on. This is not a really too >>> surprising but I'm going to see if >>> this is consistent with our >>> internal implementation. The >>> benchmark is really allocation stressful so I'm not too surprised but I >>> want to do the due diligence. >>> >>> - The statistic system up is up >>> and I have a new test >>> http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonito >>> r/MyPackage/HeapMonitorStatTest.java.patch >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorStatTest.java.patch> >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorStatTest.java.patch >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorStatTest.java.patch>> >>> - I did a bit of a study >>> about the random generator here, more details are below but basically it >>> seems to work well >>> >>> - I added a capability but since >>> this is the first time doing this, I was not sure I did it right >>> - I did add a test though for >>> it and the test seems to do what I expect (all methods are failing with the >>> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). >>> - >>> http://cr.openjdk.java.net/~ra >>> sbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonito >>> r/MyPackage/HeapMonitorNoCapabilityTest.java.patch >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch> >>> < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >>> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >>> bilityTest.java.patch >>> >> asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >>> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch>> >>> >>> - I still need to figure out >>> what to do about the multi-agent vs single-agent issue >>> >>> - As far as measurements, it >>> seems I still need to look at: >>> - Why we do the 20 random >>> calls first, are they necessary? 
>>> - Look at the mean of the >>> sampling rate that the random generator does and also what is actually >>> sampled >>> - What is the overhead in >>> terms of memory/performance when on? >>> >>> I have inlined my answers, I think >>> I got them all in the new webrev, let me know your thoughts. >>> >>> Thanks again! >>> Jc >>> >>> >>> On Fri, Jun 23, 2017 at 3:52 AM, >>> Thomas Schatzl >> com> >>> >> thomas.schatzl at oracle.com>> >> thomas.schatzl at oracle.com> >>> >>> >> >>> wrote: >>> >>> Hi, >>> >>> On Wed, 2017-06-21 at 13:45 >>> -0700, JC Beyler wrote: >>> > Hi all, >>> > >>> > First off: Thanks again to >>> Robbin and Thomas for their reviews :) >>> > >>> > Next, I've uploaded a new >>> webrev: >>> > >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/> >>> >> asbold/8171119/webrev.06/ >> asbold/8171119/webrev.06/>> >>> >> asbold/8171119/webrev.06/ >> asbold/8171119/webrev.06/> >>> >> asbold/8171119/webrev.06/ >> asbold/8171119/webrev.06/>>> >>> >>> > >>> > Here is an update: >>> > >>> > - @Robbin, I forgot to say >>> that yes I need to look at implementing >>> > this for the other >>> architectures and testing it before it is all >>> > ready to go. Is it common to >>> have it working on all possible >>> > combinations or is there a >>> subset that I should be doing first and we >>> > can do the others later? >>> > - I've tested slowdebug, >>> built and ran the JTreg tests I wrote with >>> > slowdebug and fixed a few >>> more issues >>> > - I've refactored a bit of >>> the code following Thomas' comments >>> > - I think I've handled >>> all the comments from Thomas (I put >>> > comments inline below for >>> the specifics) >>> >>> Thanks for handling all those. >>> >>> > - Following Thomas' comments >>> on statistics, I want to add some >>> > quality assurance tests and >>> find that the easiest way would be to >>> > have a few counters of what >>> is happening in the sampler and expose >>> > that to the user. >>> > - I'll be adding that in >>> the next version if no one sees any >>> > objections to that. >>> > - This will allow me to >>> add a sanity test in JTreg about number of >>> > samples and average of >>> sampling rate >>> > >>> > @Thomas: I had a few >>> questions that I inlined below but I will >>> > summarize the "bigger ones" >>> here: >>> > - You mentioned constants >>> are not using the right conventions, I >>> > looked around and didn't see >>> any convention except normal naming then >>> > for static constants. Is >>> that right? >>> >>> I looked through >>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui < >>> https://wiki.openjdk.java.net/display/HotSpot/StyleGui> >>> >> /display/HotSpot/StyleGui >> /display/HotSpot/StyleGui>> >>> >> /display/HotSpot/StyleGui >> /display/HotSpot/StyleGui> >>> >> /display/HotSpot/StyleGui >> /display/HotSpot/StyleGui>>> >>> de and the rule is to "follow >>> an existing pattern and must have a >>> distinct appearance from other >>> names". Which does not help a lot I >>> guess :/ The GC team started >>> using upper camel case, e.g. >>> SomeOtherConstant, but very >>> likely this is probably not applied >>> consistently throughout. So I >>> am fine with not adding another style >>> (like kMaxStackDepth with the >>> "k" in front with some unknown meaning) >>> is fine. >>> >>> (Chances are you will find >>> that style somewhere used anyway too, >>> apologies if so :/) >>> >>> >>> Thanks for that link, now I know >>> where to look. 
I used the upper camel case in my code as well then :) I >>> should have gotten them all. >>> >>> >>> > PS: I've also inlined my >>> answers to Thomas below: >>> > >>> > On Tue, Jun 13, 2017 at >>> 8:03 AM, Thomas Schatzl >> > e.com < >>> http://e.com> > wrote: >>> > > Hi all, >>> > > >>> > > On Mon, 2017-06-12 at >>> 11:11 -0700, JC Beyler wrote: >>> > > > Dear all, >>> > > > >>> > > > I've continued working >>> on this and have done the following >>> > > webrev: >>> > > > >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/ < >>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/> >>> >> asbold/8171119/webrev.05/ >> asbold/8171119/webrev.05/>> >>> >> asbold/8171119/webrev.05/ >> asbold/8171119/webrev.05/> >>> >> asbold/8171119/webrev.05/ >> asbold/8171119/webrev.05/>>> >>> >>> > > >>> > > [...] >>> > > > Things I still need to >>> do: >>> > > > - Have to fix that >>> TLAB case for the FastTLABRefill >>> > > > - Have to start >>> looking at the data to see that it is >>> > > consistent and does >>> gather the right samples, right frequency, etc. >>> > > > - Have to check the >>> GC elements and what that produces >>> > > > - Run a slowdebug >>> run and ensure I fixed all those issues you >>> > > saw > Robbin >>> > > > >>> > > > Thanks for looking at >>> the webrev and have a great week! >>> > > >>> > > scratching a bit on the >>> surface of this change, so apologies for >>> > > rather shallow comments: >>> > > >>> > > - >>> macroAssembler_x86.cpp:5604: while this is compiler code, and I >>> > > am not sure this is >>> final, please avoid littering the code with >>> > > TODO remarks :) They tend >>> to be candidates for later wtf moments >>> > > only. >>> > > >>> > > Just file a CR for that. >>> > > >>> > Newcomer question: what is >>> a CR and not sure I have the rights to do >>> > that yet ? :) >>> >>> Apologies. CR is a change >>> request, this suggests to file a bug in the >>> bug tracker. And you are >>> right, you can't just create a new account in >>> the OpenJDK JIRA yourselves. :( >>> >>> >>> Ok good to know, I'll continue with >>> my own todo list but I'll work hard on not letting it slip in the webrevs >>> anymore :) >>> >>> >>> I was mostly referring to the >>> "... but it is a TODO" part of that >>> comment in >>> macroassembler_x86.cpp. Comments about the why of the code >>> are appreciated. >>> >>> [Note that I now understand >>> that this is to some degree still work in >>> progress. As long as the final >>> changeset does no contain TODO's I am >>> fine (and it's not a hard >>> objection, rather their use in "final" code >>> is typically limited in my >>> experience)] >>> >>> 5603 // Currently, if this >>> happens, just set back the actual end to >>> where it was. >>> 5604 // We miss a chance to >>> sample here. >>> >>> Would be okay, if explaining >>> "this" and the "why" of missing a chance >>> to sample here would be best. >>> >>> Like maybe: >>> >>> // If we needed to refill >>> TLABs, just set the actual end point to >>> // the end of the TLAB again. >>> We do not sample here although we could. >>> >>> Done with your comment, it works >>> well in my mind. >>> >>> I am not sure whether "miss a >>> chance to sample" meant "we could, but >>> consciously don't because it's >>> not that useful" or "it would be >>> necessary but don't because >>> it's too complicated to do.". 
>>> >>> Looking at the original >>> comment once more, I am also not sure if that >>> comment shouldn't referring to >>> the "end" variable (not actual_end) >>> because that's the variable >>> that is responsible for taking the sampling >>> path? (Going from the member >>> description of ThreadLocalAllocBuffer). >>> >>> >>> I've moved this code and it no >>> longer shows up here but the rationale and answer was: >>> >>> So.. Yes, end is the variable >>> provoking the sampling. Actual end is the actual end of the TLAB. >>> >>> What was happening here is that the >>> code is resetting _end to point towards the end of the new TLAB. Because, >>> we now have the end for >>> sampling and _actual_end for >>> the actual end, we need to update >>> the actual_end as well. >>> >>> Normally, were we to do the real >>> work here, we would calculate the (end - start) offset, then do: >>> >>> - Set the new end to : start + >>> (old_end - old_start) >>> - Set the actual end like we do >>> here now where it because it is the actual end. >>> >>> Why is this not done here now >>> anymore? >>> - I was still debating which >>> path to take: >>> - Do it in the fast refill >>> code, it has its perks: >>> - In a world where fast >>> refills are happening all the time or a lot, we can augment there the code >>> to do the sampling >>> - Remember what we had as an >>> end before leaving the slowpath and check on return >>> - This is what I'm doing >>> now, it removes the need to go fix up all fast refill paths but if you >>> remain in fast refill paths, >>> you won't get sampling. I >>> have to think of the consequences >>> of that, maybe a future change later on? >>> - I have the >>> statistics now so I'm going to study that >>> -> By the way, >>> though my statistics are showing I'm missing some samples, if I turn off >>> FastTlabRefill, it is the same >>> loss so for now, it seems >>> this does not occur in my simple >>> test. >>> >>> >>> >>> But maybe I am only confused >>> and it's best to just leave the comment >>> away. :) >>> >>> Thinking about it some more, >>> doesn't this not-sampling in this case >>> mean that sampling does not >>> work in any collector that does inline TLAB >>> allocation at the moment? (Or >>> is inline TLAB alloc automatically >>> disabled with sampling >>> somehow?) >>> >>> That would indeed be a bigger >>> TODO then :) >>> >>> >>> Agreed, this remark made me think >>> that perhaps as a first step the new way of doing it is better but I did >>> have to: >>> - Remove the const of the >>> ThreadLocalBuffer remaining and hard_end methods >>> - Move hard_end out of the >>> header file to have a bit more logic there >>> >>> Please let me know what you think >>> of that and if you prefer it this way or changing the fast refills. (I >>> prefer this way now because it >>> is more incremental). >>> >>> >>> > > - calling >>> HeapMonitoring::do_weak_oops() (which should probably be >>> > > called weak_oops_do() like >>> other similar methods) only if string >>> > > deduplication is enabled >>> (in g1CollectedHeap.cpp:4511) seems wrong. >>> > >>> > The call should be at least >>> around 6 lines up outside the if. >>> > >>> > Preferentially in a method >>> like process_weak_jni_handles(), including >>> > additional logging. (No new >>> (G1) gc phase without minimal logging >>> > :)). 
>>> > Done but really not sure >>> because: >>> > >>> > I put for logging: >>> > log_develop_trace(gc, >>> freelist)("G1ConcRegionFreeing [other] : heap >>> > monitoring"); >>> >>> I would think that "gc, ref" >>> would be more appropriate log tags for >>> this similar to jni handles. >>> (I am als not sure what weak >>> reference handling has to do with >>> G1ConcRegionFreeing, so I am a >>> bit puzzled) >>> >>> >>> I was not sure what to put for the >>> tags or really as the message. I cleaned it up a bit now to: >>> log_develop_trace(gc, >>> ref)("HeapSampling [other] : heap monitoring processing"); >>> >>> >>> >>> > Since weak_jni_handles >>> didn't have logging for me to be inspired >>> > from, I did that but >>> unconvinced this is what should be done. >>> >>> The JNI handle processing does >>> have logging, but only in >>> ReferenceProcessor::process_discovered_references(). >>> In >>> process_weak_jni_handles() >>> only overall time is measured (in a G1 >>> specific way, since only G1 >>> supports disabling reference procesing) :/ >>> >>> The code in ReferenceProcessor >>> prints both time taken >>> referenceProcessor.cpp:254, as >>> well as the count, but strangely only in >>> debug VMs. >>> >>> I have no idea why this >>> logging is that unimportant to only print that >>> in a debug VM. However there >>> are reviews out for changing this area a >>> bit, so it might be useful to >>> wait for that (JDK-8173335). >>> >>> >>> I cleaned it up a bit anyway and >>> now it returns the count of objects that are in the system. >>> >>> >>> > > - the change doubles the >>> size of >>> > > >>> CollectedHeap::allocate_from_tlab_slow() above the "small and nice" >>> > > threshold. Maybe it could >>> be refactored a bit. >>> > Done I think, it looks >>> better to me :). >>> >>> In >>> ThreadLocalAllocBuffer::handle_sample() I think the >>> set_back_actual_end()/pick_next_sample() >>> calls could be hoisted out of >>> the "if" :) >>> >>> >>> Done! >>> >>> >>> > > - >>> referenceProcessor.cpp:261: the change should add logging about >>> > > the number of references >>> encountered, maybe after the corresponding >>> > > "JNI weak reference count" >>> log message. >>> > Just to double check, are >>> you saying that you'd like to have the heap >>> > sampler to keep in store how >>> many sampled objects were encountered in >>> > the >>> HeapMonitoring::weak_oops_do? >>> > - Would a return of the >>> method with the number of handled >>> > references and logging that >>> work? >>> >>> Yes, it's fine if >>> HeapMonitoring::weak_oops_do() only returned the >>> number of processed weak oops. >>> >>> >>> Done also (but I admit I have not >>> tested the output yet) :) >>> >>> >>> > - Additionally, would you >>> prefer it in a separate block with its >>> > GCTraceTime? >>> >>> Yes. Both kinds of information >>> is interesting: while the time taken is >>> typically more important, the >>> next question would be why, and the >>> number of references typically >>> goes a long way there. >>> >>> See above though, it is >>> probably best to wait a bit. >>> >>> >>> Agreed that I "could" wait but, if >>> it's ok, I'll just refactor/remove this when we get closer to something >>> final. Either, JDK-8173335 >>> has gone in and I will notice it >>> now or it will soon and I can change it then. >>> >>> >>> > > - >>> threadLocalAllocBuffer.cpp:331: one more "TODO" >>> > Removed it and added it to >>> my personal todos to look at. 
>>> > > > >>> > > - >>> threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class >>> > > documentation should be >>> updated about the sampling additions. I >>> > > would have no clue what >>> the difference between "actual_end" and >>> > > "end" would be from the >>> given information. >>> > If you are talking about the >>> comments in this file, I made them more >>> > clear I hope in the new >>> webrev. If it was somewhere else, let me know >>> > where to change. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Mon Oct 23 16:19:16 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 23 Oct 2017 09:19:16 -0700 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> Message-ID: <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> Hi Tobias On 10/23/17 1:04 AM, Tobias Hartmann wrote: > Hi Vladimir, > > thanks for the review! > > On 20.10.2017 18:43, Vladimir Kozlov wrote: >> On 10/20/17 9:36 AM, Vladimir Kozlov wrote: >>> Hmm. Is this only LoadP or general problem? > > This is a general problem with nodes that compute their type not based > on immediate inputs. I think we need to file a bug or rfe to fix other cases too. > >>> May be add code to next lines when m->is_AddP() : >>> >>> 1734???????? if (m->bottom_type() != type(m)) { // If not already >>> bottomed out >>> 1735?????????? worklist.push(m);???? // Propagate change to user > > Where should I add that code exactly? My fix already checks for "ut != > type(u)". My bad - I forgot that raw LoadP may not change its type but you still want to push its users when n (AddP) change its type. > >>> I think we should do similar to PhaseIterGVN::add_users_to_worklist(). >> >> Hmm, PhaseIterGVN::add_users_to_worklist() is not good example - it >> only puts near loads/stores. Should we fix it too? > > Yes, I think it makes sense to update add_users_to_worklist() as well: > http://cr.openjdk.java.net/~thartmann/8188785/webrev.01/ In add_users_to_worklist() you don't need to check type ut != > type(u) - just push node on worklist. Also can you remove ut->isa_instptr() check in *both* cases. And use u->is_Mem() instead of u->Opcode() == Op_LoadP to cover stores too. The motivation is that original LoadP is raw as result memory operations which use it may look for more precise type of the field somewhere so they should be on worklist. > >> Do we have other cases when we calculate type based not on immediate >> inputs but their inputs? > > Yes, see code right above my changes: > ? // CmpU nodes can get their type information from two nodes up in the > ? // graph (instead of from the nodes immediately above). Make sure they > ? // are added to the worklist if nodes they depend on are updated, since > ? // they could be missed and get wrong types otherwise. > > http://hg.openjdk.java.net/jdk10/hs/file/6126617b8508/src/hotspot/share/opto/phaseX.cpp#l1738 > Okay. Thanks, Vladimir > > The same goes for CallNodes and counted loop exit conditions (see > surrounding code). > > I'm not aware of any other cases. 
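For readers following the thread, the Java-level shape behind this issue is a merge of a type and its subtype followed by a mirror load (getClass()). A minimal stand-alone sketch of that shape - the class names here are invented and this is not the regression test from the webrev - looks like:

class A {}
class B extends A {}

public class MirrorLoadShape {
    // The ternary merges A and its subtype B (a Phi in C2's IR); getClass()
    // is the java mirror load whose type must not be narrowed to B only.
    static Class<?> test(boolean flag, A a, B b) {
        A merged = flag ? a : b;
        return merged.getClass();
    }

    public static void main(String[] args) {
        // Exercise both paths so the merged value really can be an A or a B at runtime.
        System.out.println(test(true, new A(), new B()));
        System.out.println(test(false, new A(), new B()));
    }
}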
> > Thanks, > Tobias > >>> On 10/20/17 1:04 AM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8188785 >>>> http://cr.openjdk.java.net/~thartmann/8188785/webrev.00/ >>>> >>>> Since 8186777 [1], we require two loads to retrieve the java mirror >>>> from a klass oop: >>>> >>>> LoadP(LoadP(AddP(klass_oop, java_mirror_offset))) >>>> >>>> The problem is that now the type of the outermost LoadP does not >>>> depend on the inner LoadP (which has a raw pointer type) but on the >>>> type of the AddP which is one level up. CPP only propagates the >>>> types downwards to the direct users and as a result, the mirror >>>> LoadP ends up with an incorrect (too narrow/optimistic) type. >>>> >>>> I've verified the fix with the failing test and also verified that >>>> 8188835 [2] is a duplicate. >>>> >>>> Gory details: >>>> During CCP, we compute the type of a Phi that merges oops of type A >>>> and B where B is a subtype of A. Since the type of the A input was >>>> not computed yet (it was initialized to TOP at the beginning of >>>> CCP), the Phi temporarily ends up with type B (i.e. with a type that >>>> is too narrow/optimistic). This type is propagated downwards and is >>>> being used to optimize a java mirror load from the klass oop: >>>> >>>> LoadP(LoadP(AddP(DecodeNKlass(LoadNKlass(AddP(CastPP(Phi))))))) >>>> >>>> The mirror load is then folded to TypeInstPtr::make(B) which is not >>>> correct because the oop can be of type A at runtime. >>>> >>>> Thanks, >>>> Tobias >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8186777 >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8188835 From dean.long at oracle.com Tue Oct 24 00:27:45 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 23 Oct 2017 17:27:45 -0700 Subject: RFR(XS): 8189649: AOT: assert(caller_frame.cb()->as_nmethod_or_null() == cm) failed: expect top frame nmethod Message-ID: <8692a329-0811-4a78-3937-8a244863737f@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8189649 http://cr.openjdk.java.net/~dlong/8189649/webrev/ We just need to relax the assert to allow any compiled method. dl From vladimir.kozlov at oracle.com Tue Oct 24 02:22:32 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 23 Oct 2017 19:22:32 -0700 Subject: RFR(XS): 8189649: AOT: assert(caller_frame.cb()->as_nmethod_or_null() == cm) failed: expect top frame nmethod In-Reply-To: <8692a329-0811-4a78-3937-8a244863737f@oracle.com> References: <8692a329-0811-4a78-3937-8a244863737f@oracle.com> Message-ID: <5f8da8a8-489e-f9b7-388b-dfe61c359413@oracle.com> Good. Thanks, Vladimir On 10/23/17 5:27 PM, dean.long at oracle.com wrote: > https://bugs.openjdk.java.net/browse/JDK-8189649 > > http://cr.openjdk.java.net/~dlong/8189649/webrev/ > > > We just need to relax the assert to allow any compiled method. > > > dl > From dean.long at oracle.com Tue Oct 24 04:47:17 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 23 Oct 2017 21:47:17 -0700 Subject: RFR(XS): 8189649: AOT: assert(caller_frame.cb()->as_nmethod_or_null() == cm) failed: expect top frame nmethod In-Reply-To: <5f8da8a8-489e-f9b7-388b-dfe61c359413@oracle.com> References: <8692a329-0811-4a78-3937-8a244863737f@oracle.com> <5f8da8a8-489e-f9b7-388b-dfe61c359413@oracle.com> Message-ID: <53eda7fb-bb1e-e432-31fa-747541d60b95@oracle.com> Thanks Vladimir. dl On 10/23/17 7:22 PM, Vladimir Kozlov wrote: > Good. 
> > Thanks, > Vladimir > > On 10/23/17 5:27 PM, dean.long at oracle.com wrote: >> https://bugs.openjdk.java.net/browse/JDK-8189649 >> >> http://cr.openjdk.java.net/~dlong/8189649/webrev/ >> >> >> We just need to relax the assert to allow any compiled method. >> >> >> dl >> From goetz.lindenmaier at sap.com Tue Oct 24 07:24:43 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 24 Oct 2017 07:24:43 +0000 Subject: 8188131: [PPC] Increase inlining thresholds to the same as other platforms In-Reply-To: References: Message-ID: <374596dc72ed4b54bd9e3cc43e221d72@sap.com> Hi Ogata, > It is helpful if you could explain what is the difference of the JIT > behavior when the code cache is large enough and when it is the minimum If the code cache is not large enough, code can get evicted and recompiled. Then the compiler threads keep concurring for cpu with the application threads, assuming the application utilizes all cpus for application threads. Generating bigger code obviously will bring the application faster into this situation. Please, as this is a compiler issue, it should be discussed on hotspot-compiler-dev. Best regards, Goetz. > -----Original Message----- > From: Kazunori Ogata [mailto:OGATAK at jp.ibm.com] > Sent: Freitag, 20. Oktober 2017 08:32 > To: Lindenmaier, Goetz > Cc: hotspot-dev at openjdk.java.net; Doerr, Martin ; > ppc-aix-port-dev at openjdk.java.net > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other > platforms > > Hi Goetz, > > Thank you for your comment. OK, I'll evaluate the patch more by comparing > the minimum code cache sizes and the performance on the cache size. > > It is helpful if you could explain what is the difference of the JIT > behavior when the code cache is large enough and when it is the minimum > size. It seems almost the same to me because all the methods that needed > to be compiled should be compiled in both cases, but I may miss something. > > > By the way, the benchmark I confirmed performance improvement was TPC- > DS > q96, but I measured the code cache size of SPECjbb2015 by my mistake. I'll > compare the minimum code cache sizes and the performance of both > benchmarks, as this patch will affect all applications. > > > Regards, > Ogata > > > > From: "Lindenmaier, Goetz" > To: Kazunori Ogata , "Doerr, Martin" > > Cc: "ppc-aix-port-dev at openjdk.java.net" > , "hotspot-dev at openjdk.java.net" > > Date: 2017/10/19 20:03 > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > same as other platforms > > > > Hi Kazunori, > > To me, this seems to be a very large increase. > Considering that not only the required code cache size but also the > compiler cpu time will increase in this magnitude, this seems to be > a rather risky step that should be tested for its benefits on systems > that are highly contended. > > In this case, you probably had enough space in the code cache so that > no recompilation etc. happened. > > To further look at this I could think of > 1. finding the minimal code cache size with the old flags where > the JIT is not disabled > 2. finding the same size for the new flag settings > --> How much more is needed for the new settings? > > Then you should compare the performance with the bigger > code cache size for both, and see whether there still is performance > improvement, or whether it's eaten up by more compile time. > I.e. you should have a setup where compiler threads and application > threads compete for the available CPUs. > > What do you think? 
> > Best regards, > Goetz. > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > > Behalf Of Kazunori Ogata > > Sent: Donnerstag, 19. Oktober 2017 08:43 > > To: Doerr, Martin > > Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as > other > > platforms > > > > Hi Martin, > > > > Thank you for your comment. I checked the code cache size by running > > SPECjbb2015 (composite mode, i.e., single JVM mode, heap size is 31GB). > > > > The used code cache size was increased by 4.5MB from 41982Kb to 47006Kb > > (+12%). Is the increase too large? > > > > > > The raw output of -XX:+PrintCodeCache are: > > > > === Original === > > CodeHeap 'non-profiled nmethods': size=652480Kb used=13884Kb > > max_used=13884Kb free=638595Kb > > bounds [0x00001000356f0000, 0x0000100036480000, 0x000010005d420000] > > CodeHeap 'profiled nmethods': size=652480Kb used=26593Kb > > max_used=26593Kb > > free=625886Kb > > bounds [0x000010000d9c0000, 0x000010000f3c0000, 0x00001000356f0000] > > CodeHeap 'non-nmethods': size=5760Kb used=1505Kb max_used=1559Kb > > free=4254Kb > > bounds [0x000010000d420000, 0x000010000d620000, 0x000010000d9c0000] > > total_blobs=16606 nmethods=10265 adapters=653 > > compilation: enabled > > > > > > === Modified (webrev.00) === > > CodeHeap 'non-profiled nmethods': size=652480Kb used=18516Kb > > max_used=18516Kb free=633964Kb > > bounds [0x0000100035730000, 0x0000100036950000, 0x000010005d460000] > > CodeHeap 'profiled nmethods': size=652480Kb used=26963Kb > > max_used=26963Kb > > free=625516Kb > > bounds [0x000010000da00000, 0x000010000f460000, 0x0000100035730000] > > CodeHeap 'non-nmethods': size=5760Kb used=1527Kb max_used=1565Kb > > free=4232Kb > > bounds [0x000010000d460000, 0x000010000d660000, 0x000010000da00000] > > total_blobs=16561 nmethods=10295 adapters=653 > > compilation: enabled > > > > > > Regards, > > Ogata > > > > > > > > > > From: "Doerr, Martin" > > To: Kazunori Ogata , "hotspot- > > dev at openjdk.java.net" > > , "ppc-aix-port-dev at openjdk.java.net" > > > > Date: 2017/10/18 19:43 > > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > > same as other platforms > > > > > > > > Hi Ogata, > > > > sorry for the delay. I had missed this one. > > > > The change looks feasible to me. > > > > It may only impact the utilization of the Code Cache. Can you evaluate > > that (e.g. by running large benchmarks with -XX:+PrintCodeCache)? > > > > Thanks and best regards, > > Martin > > > > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > > Behalf > > Of Kazunori Ogata > > Sent: Freitag, 29. September 2017 08:42 > > To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > > Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as > > other platforms > > > > Hi all, > > > > Please review a change for JDK-8188131. 
> > > > Bug report: > > https://urldefense.proofpoint.com/v2/url?u=https- > > 3A__bugs.openjdk.java.net_browse_JDK- > > 2D8188131&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > > 73lAZxkNhGsrlDkk- > > YUYORQ&s=ic27Fb2_vyTSsUAPraEI89UDJy9cbodGojvMw9DNHiU&e= > > > > Webrev: > > https://urldefense.proofpoint.com/v2/url?u=http- > > 3A__cr.openjdk.java.net_- > > 7Ehorii_8188131_webrev.00_&d=DwIFAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=p- > > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > > 73lAZxkNhGsrlDkk-YUYORQ&s=xS8PbLyuVtbOBRDMIB- > > i9r6lTggpGH3Np8kmONkkMAg&e= > > > > > > This change increases the default values of FreqInlineSize and > > InlineSmallCode in ppc64 to 325 and 2500, respectively. These values > are > > the same as aarch64. The performance of TPC-DS Q96 was improved by > > about > > 6% with this change. > > > > > > Regards, > > Ogata > > > > > > > > > From tobias.hartmann at oracle.com Tue Oct 24 07:33:52 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 24 Oct 2017 09:33:52 +0200 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> Message-ID: <364705bf-55af-84de-5cd8-cabe20bb2e57@oracle.com> Hi Vladimir, On 23.10.2017 18:19, Vladimir Kozlov wrote: > I think we need to file a bug or rfe to fix other cases too. Okay, it's difficult to file a bug for a not yet known issue so I've filed an RFE to look into this: https://bugs.openjdk.java.net/browse/JDK-8189856 > In add_users_to_worklist() you don't need to check type ut != > > type(u) - just push node on worklist. Right, fixed: http://cr.openjdk.java.net/~thartmann/8188785/webrev.02/ > Also can you remove ut->isa_instptr() check in *both* cases. And use u->is_Mem() instead of u->Opcode() == Op_LoadP to > cover stores too. > The motivation is that original LoadP is raw as result memory operations which use it may look for more precise type of > the field somewhere so they should be on worklist. Why is that necessary? If the raw LoadP changes its type, all direct users will be added to the worklist anyway. The problem in the failing case is that the type of the AddP changed but the type of the raw LoadP didn't (it stays raw). However, the InstPtr load depends on the type of the AddP: InstPtrLoadP(RawLoadP(AddP(..))) Do you expect other memory users of the raw LoadP to depend on the type of the AddP? I think we should only add handling for known special cases but here's the corresponding webrev: http://cr.openjdk.java.net/~thartmann/8188785/webrev.03/ Thanks, Tobias From rwestrel at redhat.com Tue Oct 24 08:37:01 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 24 Oct 2017 10:37:01 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <63b109f1-48af-f594-588b-519364ad931f@oracle.com> References: <7d9550ce-7987-8487-d345-4838f5e72cbc@oracle.com> <63b109f1-48af-f594-588b-519364ad931f@oracle.com> Message-ID: Hi Nils, Thanks for going over the patch and testing it. 
Here is an updated webrev:

http://cr.openjdk.java.net/~roland/8186027/webrev.01/

I also made the changes you suggested, except for:

> src/hotspot/share/opto/loopopts.cpp:
> @@ -1729,7 +1729,7 @@
>          Node* l = cl->outer_loop();
>          Node* tail = cl->outer_loop_tail();
>          IfNode* le = cl->outer_loop_end();
> -        Node* sfpt = cl->outer_safepoint();
> +        Node* sfpt = (Node*) cl->outer_safepoint();
>
> src/hotspot/share/opto/opaquenode.cpp
> @@ -144,7 +144,7 @@
>      assert(iter_estimate > 0, "broken");
>      if ((jlong)scaled_iters != scaled_iters_long || iter_estimate <=
> short_scaled_iters) {
>        // Remove outer loop and safepoint (too few iterations)
> -      Node* outer_sfpt = inner_cl->outer_safepoint();
> +      Node* outer_sfpt = (Node*) inner_cl->outer_safepoint();

for which I used the patch below instead (I ran the build with precompiled
headers disabled to verify that change).

Roland.

diff --git a/src/hotspot/share/opto/loopopts.cpp b/src/hotspot/share/opto/loopopts.cpp
--- a/src/hotspot/share/opto/loopopts.cpp
+++ b/src/hotspot/share/opto/loopopts.cpp
@@ -26,6 +26,7 @@
 #include "memory/allocation.inline.hpp"
 #include "memory/resourceArea.hpp"
 #include "opto/addnode.hpp"
+#include "opto/callnode.hpp"
 #include "opto/castnode.hpp"
 #include "opto/connode.hpp"
 #include "opto/castnode.hpp"
@@ -845,7 +846,6 @@
       assert(n_loop->_parent == outer_loop, "broken loop tree");
     }
 #endif
-
       int count = phi->replace_edge(n, n->in(MemNode::Memory));
       assert(count > 0, "inconsistent phi");
       // Compute latest point this store can go
diff --git a/src/hotspot/share/opto/opaquenode.cpp b/src/hotspot/share/opto/opaquenode.cpp
--- a/src/hotspot/share/opto/opaquenode.cpp
+++ b/src/hotspot/share/opto/opaquenode.cpp
@@ -24,6 +24,7 @@
 #include "precompiled.hpp"
 #include "opto/addnode.hpp"
+#include "opto/callnode.hpp"
 #include "opto/cfgnode.hpp"
 #include "opto/connode.hpp"
 #include "opto/divnode.hpp"

From ionutb83 at yahoo.com Tue Oct 24 09:05:38 2017
From: ionutb83 at yahoo.com (Ionut)
Date: Tue, 24 Oct 2017 09:05:38 +0000 (UTC)
Subject: Vectorized Loop Unrolling on x64?
References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com>
Message-ID: <1302875736.3225693.1508835938207@mail.yahoo.com>

Hello All,

I want to ask you about https://bugs.openjdk.java.net/browse/JDK-8129920 - Vectorized loop unrolling, which says it is applicable only for x86 targets. Do you plan to port this for x64 as well? Or do I miss something here?

Regards
Ionut

From nils.eliasson at oracle.com Tue Oct 24 09:06:50 2017
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 24 Oct 2017 11:06:50 +0200
Subject: Vectorized Loop Unrolling on x64?
In-Reply-To: <1302875736.3225693.1508835938207@mail.yahoo.com>
References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com>
Message-ID: <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com>

Hi Ionut,

In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64.

Regards,
Nils Eliasson

On 2017-10-24 11:05, Ionut wrote:
> Hello All,
>
> I want to ask you about
> https://bugs.openjdk.java.net/browse/JDK-8129920 - Vectorized loop
> unrolling, which says it is applicable only for x86 targets. Do you
> plan to port this for x64 as well? Or do I miss something here?
>
> Regards
> Ionut
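As a point of reference for the benchmark discussion that follows, the superword optimization targets simple counted loops over arrays with independent iterations. A small illustrative example of such a loop shape - not taken from JDK-8129920 or from any webrev - is:

public class AddArrays {
    // Element-wise loop with no cross-iteration dependence - the kind of
    // counted loop C2's SuperWord pass can typically turn into SIMD code.
    static void add(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] + c[i];
        }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] a = new int[n], b = new int[n], c = new int[n];
        for (int i = 0; i < n; i++) {
            b[i] = i;
            c[i] = 2 * i;
        }
        for (int iter = 0; iter < 100; iter++) {
            add(a, b, c); // repeat so the method gets hot and C2-compiled
        }
        System.out.println(a[n - 1]);
    }
}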
From ionutb83 at yahoo.com Tue Oct 24 10:23:57 2017
From: ionutb83 at yahoo.com (Ionut)
Date: Tue, 24 Oct 2017 10:23:57 +0000 (UTC)
Subject: Vectorized Loop Unrolling on x64?
In-Reply-To: <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com>
References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com>
Message-ID: <1933684779.3254078.1508840637072@mail.yahoo.com>

Hi Nils,

Thanks, it is clear. However, I have tried a simple example (e.g. just iterating through an array and doing the sum, using JMH) on my x64 Linux and it seems not to be vectorized ... Below are the initial source code and assembly. Could you please provide me any hint, am I doing something wrong?

JDK is 9.0.1

Source code:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)
@Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", "-Xbatch", "-XX:+UseSuperWord" })
@State(Scope.Benchmark)
public class Sum1ToNArray {
    private int[] array;

    public static void main(String[] args) {
        Options opt =
            new OptionsBuilder()
                .include(Sum1ToNArray.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }

    @Setup(Level.Trial)
    public void setUp() {
        this.array = new int[100_000_000];
        for (int i = 0; i < array.length; i++)
            array[i] = i + 1;
    }

    @Benchmark
    public long hotMethod() {
        long sum = 0;
        for (int i = 0; i < array.length; i++) {
            sum += array[i];
        }
        return sum;
    }
}

Assembly:

....[Hottest Region 1]..............................................................................
c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes)

                    0x00007f7bf1bff0f9: mov    r8d,r10d
                    0x00007f7bf1bff0fc: add    r8d,0xfffffff9
                    0x00007f7bf1bff100: mov    r11d,0x1
                    0x00007f7bf1bff106: cmp    r8d,0x1
                    0x00007f7bf1bff10a: jg     0x00007f7bf1bff114
                    0x00007f7bf1bff10c: mov    rax,rdx
                    0x00007f7bf1bff10f: jmp    0x00007f7bf1bff15d
                    0x00007f7bf1bff111: mov    rdx,rax            ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)
                    0x00007f7bf1bff114: movsxd rsi,DWORD PTR [r14+r11*4+0x10]
 11.08%    8.55%    0x00007f7bf1bff119: movsxd rbp,DWORD PTR [r14+r11*4+0x14]
  0.30%    0.17%    0x00007f7bf1bff11e: movsxd r13,DWORD PTR [r14+r11*4+0x18]
                    0x00007f7bf1bff123: movsxd rax,DWORD PTR [r14+r11*4+0x2c]
  8.86%    2.85%    0x00007f7bf1bff128: movsxd r9,DWORD PTR [r14+r11*4+0x28]
 10.49%   23.29%    0x00007f7bf1bff12d: movsxd rcx,DWORD PTR [r14+r11*4+0x24]
  0.38%    0.45%    0x00007f7bf1bff132: movsxd rbx,DWORD PTR [r14+r11*4+0x20]
  0.03%    0.06%    0x00007f7bf1bff137: movsxd rdi,DWORD PTR [r14+r11*4+0x1c]
  0.23%    0.22%    0x00007f7bf1bff13c: add    rsi,rdx
 10.58%   18.59%    0x00007f7bf1bff13f: add    rbp,rsi
  0.32%    0.17%    0x00007f7bf1bff142: add    r13,rbp
  0.05%    0.04%    0x00007f7bf1bff145: add    rdi,r13
 26.10%   28.47%    0x00007f7bf1bff148: add    rbx,rdi
  5.55%    5.48%    0x00007f7bf1bff14b: add    rcx,rbx
  5.66%    1.32%    0x00007f7bf1bff14e: add    r9,rcx
  7.85%    3.11%    0x00007f7bf1bff151: add    rax,r9             ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53)
 10.19%    5.67%    0x00007f7bf1bff154: add    r11d,0x8           ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 23 (line 52)
  0.38%    0.12%    0x00007f7bf1bff158: cmp    r11d,r8d
                    0x00007f7bf1bff15b: jl     0x00007f7bf1bff111 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 10 (line 52)
                    0x00007f7bf1bff15d: cmp    r11d,r10d
                    0x00007f7bf1bff160: jge    0x00007f7bf1bff174
                    0x00007f7bf1bff162: xchg   ax,ax              ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)
                    0x00007f7bf1bff164: movsxd r8,DWORD PTR [r14+r11*4+0x10]
                    0x00007f7bf1bff169: add    rax,r8             ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                                                                   ; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53)

Regards

On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson wrote:

Hi Ionut,
In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64.
Regards,
Nils Eliasson

On 2017-10-24 11:05, Ionut wrote:
Hello All,

I want to ask you about https://bugs.openjdk.java.net/browse/JDK-8129920 - Vectorized loop unrolling, which says it is applicable only for x86 targets. Do you plan to port this for x64 as well? Or do I miss something here?

Regards
Ionut

From OGATAK at jp.ibm.com Tue Oct 24 11:11:53 2017
From: OGATAK at jp.ibm.com (Kazunori Ogata)
Date: Tue, 24 Oct 2017 20:11:53 +0900
Subject: 8188131: [PPC] Increase inlining thresholds to the same as other platforms
In-Reply-To:
References:
Message-ID:

Hi Goetz,

Thank you for the clarification and for re-directing the discussion to the hotspot-compiler-dev ML. I understood the intention of the measurement around the lower bound of the code cache size. I'll post the results when I finish the measurements.

Regards,
Ogata

From: "Lindenmaier, Goetz"
To: Kazunori Ogata , "'hotspot-compiler-dev at openjdk.java.net'"
Cc: "Doerr, Martin" , "ppc-aix-port-dev at openjdk.java.net"
Date: 2017/10/24 16:30
Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other platforms

Hi Ogata,

> It is helpful if you could explain what is the difference of the JIT
> behavior when the code cache is large enough and when it is the minimum

If the code cache is not large enough, code can get evicted and recompiled.
Then the compiler threads keep concurring for cpu with the application threads, assuming the application utilizes all cpus for application threads. Generating bigger code obviously will bring the application faster into this situation. Please, as this is a compiler issue, it should be discussed on hotspot-compiler-dev. Best regards, Goetz. > -----Original Message----- > From: Kazunori Ogata [mailto:OGATAK at jp.ibm.com] > Sent: Freitag, 20. Oktober 2017 08:32 > To: Lindenmaier, Goetz > Cc: hotspot-dev at openjdk.java.net; Doerr, Martin ; > ppc-aix-port-dev at openjdk.java.net > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other > platforms > > Hi Goetz, > > Thank you for your comment. OK, I'll evaluate the patch more by comparing > the minimum code cache sizes and the performance on the cache size. > > It is helpful if you could explain what is the difference of the JIT > behavior when the code cache is large enough and when it is the minimum > size. It seems almost the same to me because all the methods that needed > to be compiled should be compiled in both cases, but I may miss something. > > > By the way, the benchmark I confirmed performance improvement was TPC- > DS > q96, but I measured the code cache size of SPECjbb2015 by my mistake. I'll > compare the minimum code cache sizes and the performance of both > benchmarks, as this patch will affect all applications. > > > Regards, > Ogata > > > > From: "Lindenmaier, Goetz" > To: Kazunori Ogata , "Doerr, Martin" > > Cc: "ppc-aix-port-dev at openjdk.java.net" > , "hotspot-dev at openjdk.java.net" > > Date: 2017/10/19 20:03 > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > same as other platforms > > > > Hi Kazunori, > > To me, this seems to be a very large increase. > Considering that not only the required code cache size but also the > compiler cpu time will increase in this magnitude, this seems to be > a rather risky step that should be tested for its benefits on systems > that are highly contended. > > In this case, you probably had enough space in the code cache so that > no recompilation etc. happened. > > To further look at this I could think of > 1. finding the minimal code cache size with the old flags where > the JIT is not disabled > 2. finding the same size for the new flag settings > --> How much more is needed for the new settings? > > Then you should compare the performance with the bigger > code cache size for both, and see whether there still is performance > improvement, or whether it's eaten up by more compile time. > I.e. you should have a setup where compiler threads and application > threads compete for the available CPUs. > > What do you think? > > Best regards, > Goetz. > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > > Behalf Of Kazunori Ogata > > Sent: Donnerstag, 19. Oktober 2017 08:43 > > To: Doerr, Martin > > Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as > other > > platforms > > > > Hi Martin, > > > > Thank you for your comment. I checked the code cache size by running > > SPECjbb2015 (composite mode, i.e., single JVM mode, heap size is 31GB). > > > > The used code cache size was increased by 4.5MB from 41982Kb to 47006Kb > > (+12%). Is the increase too large? 
> > > > > > The raw output of -XX:+PrintCodeCache are: > > > > === Original === > > CodeHeap 'non-profiled nmethods': size=652480Kb used=13884Kb > > max_used=13884Kb free=638595Kb > > bounds [0x00001000356f0000, 0x0000100036480000, 0x000010005d420000] > > CodeHeap 'profiled nmethods': size=652480Kb used=26593Kb > > max_used=26593Kb > > free=625886Kb > > bounds [0x000010000d9c0000, 0x000010000f3c0000, 0x00001000356f0000] > > CodeHeap 'non-nmethods': size=5760Kb used=1505Kb max_used=1559Kb > > free=4254Kb > > bounds [0x000010000d420000, 0x000010000d620000, 0x000010000d9c0000] > > total_blobs=16606 nmethods=10265 adapters=653 > > compilation: enabled > > > > > > === Modified (webrev.00) === > > CodeHeap 'non-profiled nmethods': size=652480Kb used=18516Kb > > max_used=18516Kb free=633964Kb > > bounds [0x0000100035730000, 0x0000100036950000, 0x000010005d460000] > > CodeHeap 'profiled nmethods': size=652480Kb used=26963Kb > > max_used=26963Kb > > free=625516Kb > > bounds [0x000010000da00000, 0x000010000f460000, 0x0000100035730000] > > CodeHeap 'non-nmethods': size=5760Kb used=1527Kb max_used=1565Kb > > free=4232Kb > > bounds [0x000010000d460000, 0x000010000d660000, 0x000010000da00000] > > total_blobs=16561 nmethods=10295 adapters=653 > > compilation: enabled > > > > > > Regards, > > Ogata > > > > > > > > > > From: "Doerr, Martin" > > To: Kazunori Ogata , "hotspot- > > dev at openjdk.java.net" > > , "ppc-aix-port-dev at openjdk.java.net" > > > > Date: 2017/10/18 19:43 > > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > > same as other platforms > > > > > > > > Hi Ogata, > > > > sorry for the delay. I had missed this one. > > > > The change looks feasible to me. > > > > It may only impact the utilization of the Code Cache. Can you evaluate > > that (e.g. by running large benchmarks with -XX:+PrintCodeCache)? > > > > Thanks and best regards, > > Martin > > > > > > -----Original Message----- > > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > > Behalf > > Of Kazunori Ogata > > Sent: Freitag, 29. September 2017 08:42 > > To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > > Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as > > other platforms > > > > Hi all, > > > > Please review a change for JDK-8188131. > > > > Bug report: > > https://urldefense.proofpoint.com/v2/url?u=https- > > 3A__bugs.openjdk.java.net_browse_JDK- > > 2D8188131&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > > 73lAZxkNhGsrlDkk- > > YUYORQ&s=ic27Fb2_vyTSsUAPraEI89UDJy9cbodGojvMw9DNHiU&e= > > > > Webrev: > > https://urldefense.proofpoint.com/v2/url?u=http- > > 3A__cr.openjdk.java.net_- > > 7Ehorii_8188131_webrev.00_&d=DwIFAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=p- > > > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > > 73lAZxkNhGsrlDkk-YUYORQ&s=xS8PbLyuVtbOBRDMIB- > > i9r6lTggpGH3Np8kmONkkMAg&e= > > > > > > This change increases the default values of FreqInlineSize and > > InlineSmallCode in ppc64 to 325 and 2500, respectively. These values > are > > the same as aarch64. The performance of TPC-DS Q96 was improved by > > about > > 6% with this change. 
> > > > > > Regards, > > Ogata > > > > > > > > > From vladimir.kozlov at oracle.com Tue Oct 24 16:43:13 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 09:43:13 -0700 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <364705bf-55af-84de-5cd8-cabe20bb2e57@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> <364705bf-55af-84de-5cd8-cabe20bb2e57@oracle.com> Message-ID: <66832428-3a6a-232d-9c57-7efd01cc97a2@oracle.com> On 10/24/17 12:33 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 23.10.2017 18:19, Vladimir Kozlov wrote: >> I think we need to file a bug or rfe to fix other cases too. > > Okay, it's difficult to file a bug for a not yet known issue so I've filed an RFE to look into this: > https://bugs.openjdk.java.net/browse/JDK-8189856 Good. RFE is fine. > >> In add_users_to_worklist() you don't need to check type ut != >> ?> type(u) - just push node on worklist. > > Right, fixed: > http://cr.openjdk.java.net/~thartmann/8188785/webrev.02/ Okay, you are right, lets use this version for the fix. We can do additional changes for 8189856. Thanks, Vladimir > >> Also can you remove ut->isa_instptr() check in *both* cases. And use u->is_Mem() instead of u->Opcode() == Op_LoadP to >> cover stores too. >> The motivation is that original LoadP is raw as result memory operations which use it may look for more precise type >> of the field somewhere so they should be on worklist. > > Why is that necessary? If the raw LoadP changes its type, all direct users will be added to the worklist anyway. > > The problem in the failing case is that the type of the AddP changed but the type of the raw LoadP didn't (it stays > raw). However, the InstPtr load depends on the type of the AddP: > > ? InstPtrLoadP(RawLoadP(AddP(..))) > > Do you expect other memory users of the raw LoadP to depend on the type of the AddP? I think we should only add handling > for known special cases but here's the corresponding webrev: > http://cr.openjdk.java.net/~thartmann/8188785/webrev.03/ > > Thanks, > Tobias From ionutb83 at yahoo.com Tue Oct 24 16:46:17 2017 From: ionutb83 at yahoo.com (Ionut) Date: Tue, 24 Oct 2017 16:46:17 +0000 (UTC) Subject: Vectorized Loop Unrolling on x64? In-Reply-To: <1933684779.3254078.1508840637072@mail.yahoo.com> References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> Message-ID: <354890084.3509873.1508863577533@mail.yahoo.com> Hello All, ? ?Meanwhile I tested two more other scenarios, as follows: - a[i] = b[i] + c[i]? ? ? ? ? ? ? ? ? ? // where?a, b, c are arrays of ints- a[i] = a[i] + ? ? ? // where might be a constant, etc In both cases they were vectorized, but my initial example (e.g. iterating through the array of ints and computing the sum of elements) is not ... which makes me think this case is currently not supported by JIT. Could you please confirm this? RegardsIonut On Tuesday, October 24, 2017 12:24 PM, Ionut wrote: Hi?Nils, ? Thanks, it is clear. However, I have tried a simple example (e.g.??just iterating through an array and do the sum?using JMH) on my x64 Linux and it seems to not be vectorized ...? Below initial source code and assembly.?Could you please provide me any hint, am I doing something wrong? 
JDK is 9.0.1 Source code: @BenchmarkMode(Mode.AverageTime)@OutputTimeUnit(TimeUnit.NANOSECONDS)@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS)@Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", "-Xbatch", "-XX:+UseSuperWord" })@State(Scope.Benchmark)public class Sum1ToNArray {? ? private int[] array; ? ? public static void main(String[] args) {? ? ? ? Options opt = ? ? ? ? ? ? new OptionsBuilder()? ? ? ? ? ? ? ? .include(Sum1ToNArray.class.getSimpleName())? ? ? ? ? ? ? ? .build();? ? ? ? new Runner(opt).run();? ? } ? ? @Setup(Level.Trial)? ? public void setUp() {? ? ? ? this.array = new int[100_000_000];? ? ? ? for (int i = 0; i < array.length; i++)? ? ? ? ? ? array[i] = i + 1;? ? } ? ? @Benchmark? ? public long hotMethod() { ? ? ? ? long sum = 0;? ? ? ? for (int i = 0; i < array.length; i++) {? ? ? ? ? ? sum += array[i];? ? ? ? }? ? ? ? return sum;? ? }} Assembly:....[Hottest Region 1]..............................................................................c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes)? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0f9: mov? ? r8d,r10d? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0fc: add? ? r8d,0xfffffff9? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff100: mov? ? r11d,0x1? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff106: cmp? ? r8d,0x1? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? 0x00007f7bf1bff10a: jg? ? ?0x00007f7bf1bff114? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? 0x00007f7bf1bff10c: mov? ? rax,rdx? ? ? ? ? ? ? ? ? ? ? ? ? ? ???? ?0x00007f7bf1bff10f: jmp? ? 0x00007f7bf1bff15d? ? ? ? ? ? ? ? ? ? ? ? ? ? ????? 0x00007f7bf1bff111: mov? ? rdx,rax? ? ? ? ? ? ;*lload_1 {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ????? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)? ? ? ? ? ? ? ? ? ? ? ? ? ? ???? 0x00007f7bf1bff114: movsxd rsi,DWORD PTR [r14+r11*4+0x10]?11.08%? ? 8.55%? ? ??? 0x00007f7bf1bff119: movsxd rbp,DWORD PTR [r14+r11*4+0x14]? 0.30%? ? 0.17%? ? ???? 0x00007f7bf1bff11e: movsxd r13,DWORD PTR [r14+r11*4+0x18]? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff123: movsxd rax,DWORD PTR [r14+r11*4+0x2c]? 8.86%? ? 2.85%? ? ???? 0x00007f7bf1bff128: movsxd r9,DWORD PTR? [r14+r11*4+0x28]?10.49%? ?23.29%? ???? 0x00007f7bf1bff12d: movsxd rcx,DWORD PTR [r14+r11*4+0x24]? 0.38%? ? 0.45%? ? ???? 0x00007f7bf1bff132: movsxd rbx,DWORD PTR [r14+r11*4+0x20]? 0.03%? ? 0.06%? ? ???? 0x00007f7bf1bff137: movsxd rdi,DWORD PTR [r14+r11*4+0x1c]? 0.23%? ? 0.22%? ? ???? 0x00007f7bf1bff13c: add? ? rsi,rdx?10.58%? ?18.59%? ???? 0x00007f7bf1bff13f: add? ? rbp,rsi? 0.32%? ? 0.17%? ? ???? 0x00007f7bf1bff142: add? ? r13,rbp? 0.05%? ? 0.04%? ? ???? 0x00007f7bf1bff145: add? ? rdi,r13?26.10%? ?28.47%? ???? 0x00007f7bf1bff148: add? ? rbx,rdi? 5.55%? ? 5.48%? ? ???? 0x00007f7bf1bff14b: add? ? rcx,rbx? 5.66%? ? 1.32%? ? ???? 0x00007f7bf1bff14e: add? ? r9,rcx? 7.85%? ? 3.11%? ? ???? 0x00007f7bf1bff151: add? ? rax,r9? ? ? ? ? ? ?;*ladd {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53)?10.19%? ? 5.67%? ? ??? 0x00007f7bf1bff154: add? ? r11d,0x8? ? ? ? ?;*iinc {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.jpt.Sum1ToNArray::hotMethod at 23 (line 52)? 
0.38%? ? 0.12%? ? ???? 0x00007f7bf1bff158: cmp? ? r11d,r8d? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff15b: jl? ? ? ? 0x00007f7bf1bff111? ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 10 (line 52)? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ?0x00007f7bf1bff15d: cmp? ? r11d,r10d? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff160: jge? ? ? ?0x00007f7bf1bff174? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff162: xchg? ? ax,ax? ? ? ? ? ? ? ? ? ? ? ; *lload_1 {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53)? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0x00007f7bf1bff164: movsxd r8,DWORD PTR [r14+r11*4+0x10]? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff169: add? ? ? ?rax,r8? ? ? ? ? ? ? ? ? ? ;*ladd {reexecute=0 rethrow=0 return_oop=0}? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) Regards On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson wrote: Hi Ionut, In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64. Regards, Nils Eliasson On 2017-10-24 11:05, Ionut wrote: Hello All, ? ? I want to ask you about?https://bugs.openjdk.java.net/browse/JDK-8129920?- Vectorized loop unrolling?which says it is applicable?only?for x86 targets. Do you plan to port this for x64 as well? Or I miss something here? Regards Ionut -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobias.hartmann at oracle.com Tue Oct 24 16:51:56 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 24 Oct 2017 18:51:56 +0200 Subject: [10] RFR(S): 8188785: CCP sets invalid type for java mirror load In-Reply-To: <66832428-3a6a-232d-9c57-7efd01cc97a2@oracle.com> References: <5d17a4a6-76b7-3cbc-a91f-a860807f8e11@oracle.com> <157b3059-3c93-fe56-e407-beda1e13fb6e@oracle.com> <1aacf12c-5f27-694f-151c-df2d699a5e34@oracle.com> <364705bf-55af-84de-5cd8-cabe20bb2e57@oracle.com> <66832428-3a6a-232d-9c57-7efd01cc97a2@oracle.com> Message-ID: Hi Vladimir, On 24.10.2017 18:43, Vladimir Kozlov wrote: > Okay, you are right, lets use this version for the fix. We can do additional changes for 8189856. Okay, thanks for reviewing! I'll push webrev.02. Best regards, Tobias >>> Also can you remove ut->isa_instptr() check in *both* cases. And use u->is_Mem() instead of u->Opcode() == Op_LoadP >>> to cover stores too. >>> The motivation is that original LoadP is raw as result memory operations which use it may look for more precise type >>> of the field somewhere so they should be on worklist. >> >> Why is that necessary? If the raw LoadP changes its type, all direct users will be added to the worklist anyway. >> >> The problem in the failing case is that the type of the AddP changed but the type of the raw LoadP didn't (it stays >> raw). However, the InstPtr load depends on the type of the AddP: >> >> ?? InstPtrLoadP(RawLoadP(AddP(..))) >> >> Do you expect other memory users of the raw LoadP to depend on the type of the AddP? 
I think we should only add >> handling for known special cases but here's the corresponding webrev: >> http://cr.openjdk.java.net/~thartmann/8188785/webrev.03/ >> >> Thanks, >> Tobias From nils.eliasson at oracle.com Tue Oct 24 16:59:44 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 24 Oct 2017 18:59:44 +0200 Subject: Vectorized Loop Unrolling on x64? In-Reply-To: <354890084.3509873.1508863577533@mail.yahoo.com> References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: <5da48501-b7ca-f814-3d19-1d482d9bb337@oracle.com> Hi, Array reduction operations is implemented but are disabled in some settings. See excellent blog post by Richard Startin: http://richardstartin.uk/tricking-java-into-adding-up-arrays-faster/ https://bugs.openjdk.java.net/browse/JDK-8188313 https://bugs.openjdk.java.net/browse/JDK-8078563 Regards, Nils Eliasosn On 2017-10-24 18:46, Ionut wrote: > Hello All, > > ? ?Meanwhile I tested two more other scenarios, as follows: > > - a[i] = b[i] + c[i]? ? ? ? ? ? ? ? ? ? // where?a, b, c are arrays of > ints > - a[i] = a[i] + ? ? ? // where might be a > constant, etc > > In both cases they were vectorized, but my initial example (e.g. > iterating through the array of ints and computing the sum of elements) > is not ... which makes me think this case is currently not supported > by JIT. > > Could you please confirm this? > > Regards > Ionut > > > On Tuesday, October 24, 2017 12:24 PM, Ionut wrote: > > > Hi Nils, > > Thanks, it is clear. However, I have tried a simple example (e.g. just > iterating through an array and do the sum using JMH) on my x64 Linux > and it seems to not be vectorized ...? Below initial source code and > assembly. > Could you please provide me any hint, am I doing something wrong? > > *JDK is 9.0.1* > > *_Source code:_* > > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS) > @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS) > @Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", > "-Xbatch", "-XX:+UseSuperWord" }) > @State(Scope.Benchmark) > public class _Sum1ToNArray _{ > private int[] array; > > ? ? public static void main(String[] args) { > ? ? ? ? Options opt = > ? new OptionsBuilder() > .include(Sum1ToNArray.class.getSimpleName()) > ? ? ? .build(); > new Runner(opt).run(); > ? ? } > > @Setup(Level.Trial) > ? ? public void setUp() { > this.array = new int[100_000_000]; > for (int i = 0; i < array.length; i++) > ? array[i] = i + 1; > ? ? } > > @Benchmark > public long hotMethod() { > long sum = 0; > for (int i = 0; i < array.length; i++) { > ? sum += array[i]; > ? ? ? ? } > return sum; > ? ? } > } > > *_Assembly:_* > ....[Hottest Region > 1].............................................................................. > c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0f9: mov? ? r8d,r10d > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0fc: add? ? r8d,0xfffffff9 > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff100: mov? ? r11d,0x1 > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff106: cmp? ? r8d,0x1 > ? ? ? ? ? ? ? ? ? ??? ? 0x00007f7bf1bff10a: jg? ? ?0x00007f7bf1bff114 > ? ? ? ? ? ? ? ? ? ??? ? ? 0x00007f7bf1bff10c: mov? ? rax,rdx > ? ? ? ? ? ? ? ? ? ???? 
?0x00007f7bf1bff10f: jmp? ? 0x00007f7bf1bff15d > ? ? ? ? ? ? ? ? ? ????? 0x00007f7bf1bff111: mov? ? rdx,rax? ? ? ? ? ? > ;*lload_1 {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ???? ?; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53) > ? ? ? ? ? ? ? ? ? ???? 0x00007f7bf1bff114: movsxd rsi,DWORD PTR > [r14+r11*4+0x10] > ?11.08% 8.55%? ? ??? 0x00007f7bf1bff119: movsxd rbp,DWORD PTR > [r14+r11*4+0x14] > ? 0.30% 0.17%? ? ???? 0x00007f7bf1bff11e: movsxd r13,DWORD PTR > [r14+r11*4+0x18] > ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff123: movsxd rax,DWORD PTR > [r14+r11*4+0x2c] > ? 8.86% 2.85%? ? ???? 0x00007f7bf1bff128: movsxd r9,DWORD PTR? > [r14+r11*4+0x28] > ?10.49% ?23.29%? ???? 0x00007f7bf1bff12d: movsxd rcx,DWORD PTR > [r14+r11*4+0x24] > ? 0.38% 0.45%? ? ???? 0x00007f7bf1bff132: movsxd rbx,DWORD PTR > [r14+r11*4+0x20] > ? 0.03% 0.06%? ? ???? 0x00007f7bf1bff137: movsxd rdi,DWORD PTR > [r14+r11*4+0x1c] > ? 0.23% 0.22%? ? ???? 0x00007f7bf1bff13c: add rsi,rdx > ?10.58% ?18.59%? ???? 0x00007f7bf1bff13f: add rbp,rsi > ? 0.32% 0.17%? ? ???? 0x00007f7bf1bff142: add r13,rbp > ? 0.05% 0.04%? ? ???? 0x00007f7bf1bff145: add rdi,r13 > ?26.10% ?28.47%? ???? 0x00007f7bf1bff148: add rbx,rdi > ? 5.55% 5.48%? ? ???? 0x00007f7bf1bff14b: add rcx,rbx > ? 5.66% 1.32%? ? ???? 0x00007f7bf1bff14e: add r9,rcx > ? 7.85% 3.11%? ? ???? 0x00007f7bf1bff151: add rax,r9? ? ? ? ? ? > ?;*ladd {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; > - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) > ?10.19% 5.67%? ? ??? 0x00007f7bf1bff154: add r11d,0x8? ? ? ? ?;*iinc > {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ?? ; - com.jpt.Sum1ToNArray::hotMethod at 23 (line 52) > ? 0.38% 0.12%? ? ???? 0x00007f7bf1bff158: cmp r11d,r8d > ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff15b: jl? ? ? ? > 0x00007f7bf1bff111? ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 10 > (line 52) > ? ? ? ? ? ? ? ? ? ? ?? ?0x00007f7bf1bff15d: cmp? ? r11d,r10d > ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff160: jge? ? ? ?0x00007f7bf1bff174 > ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff162: xchg? ? ax,ax? ? ? ? ? ? > ? ? ? ? ? ; *lload_1 {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ; - com.jpt.Sum1ToNArray::hotMethod at 13 (line 53) > ? ? ? ? ? ? ? ? ? ? ? ? ? 0x00007f7bf1bff164: movsxd r8,DWORD PTR > [r14+r11*4+0x10] > ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff169: add? ? ? ?rax,r8? ? ? ? > ? ? ? ? ? ? ;*ladd {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ?; - com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) > > Regards > > > On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson > wrote: > > > Hi Ionut, > In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64. > Regards, > Nils Eliasson > > On 2017-10-24 11:05, Ionut wrote: >> Hello All, >> >> ? I want to ask you about >> https://bugs.openjdk.java.net/browse/JDK-8129920*?- Vectorized loop >> unrolling *which says it is applicable _only__for x86 targets_. Do >> you plan to port this for x64 as well? Or I miss something here? >> >> Regards >> Ionut > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Oct 24 17:03:51 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 10:03:51 -0700 Subject: Vectorized Loop Unrolling on x64? 
In-Reply-To: <354890084.3509873.1508863577533@mail.yahoo.com> References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: You are right - your initial examples are not supported by current HotSpot JIT vectorization. Second example (sum/reduction) could be optimized https://bugs.openjdk.java.net/browse/JDK-8074981 but because generated code is very expensive we limited it to cases where benefit overweights expense: https://bugs.openjdk.java.net/browse/JDK-8078563. Regards, Vladimir On 10/24/17 9:46 AM, Ionut wrote: > Hello All, > > ? ?Meanwhile I tested two more other scenarios, as follows: > > - a[i] = b[i] + c[i]? ? ? ? ? ? ? ? ? ? // where?a, b, c are arrays of ints > - a[i] = a[i] + ? ? ? // where might be a constant, etc > > In both cases they were vectorized, but my initial example (e.g. iterating through the array of ints and computing the > sum of elements) is not ... which makes me think this case is currently not supported by JIT. > > Could you please confirm this? > > Regards > Ionut > > > On Tuesday, October 24, 2017 12:24 PM, Ionut wrote: > > > Hi Nils, > > ? Thanks, it is clear. However, I have tried a simple example (e.g. just iterating through an array and do the sum > using JMH) on my x64 Linux and it seems to not be vectorized ...? Below initial source code and assembly. > Could you please provide me any hint, am I doing something wrong? > > *JDK is 9.0.1* > > *_Source code:_* > > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS) > @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.NANOSECONDS) > @Fork(value = 3, jvmArgsAppend = { "-XX:-TieredCompilation", "-Xbatch", "-XX:+UseSuperWord" }) > @State(Scope.Benchmark) > public class _Sum1ToNArray _{ > ? ? private int[] array; > > ? ? public static void main(String[] args) { > ? ? ? ? Options opt = > ? ? ? ? ? ? new OptionsBuilder() > ? ? ? ? ? ? ? ? .include(Sum1ToNArray.class.getSimpleName()) > ? ? ? ? ? ? ? ? .build(); > ? ? ? ? new Runner(opt).run(); > ? ? } > > ? ? @Setup(Level.Trial) > ? ? public void setUp() { > ? ? ? ? this.array = new int[100_000_000]; > ? ? ? ? for (int i = 0; i < array.length; i++) > ? ? ? ? ? ? array[i] = i + 1; > ? ? } > > ? ? @Benchmark > ? ? public long hotMethod() { > ? ? ? ? long sum = 0; > ? ? ? ? for (int i = 0; i < array.length; i++) { > ? ? ? ? ? ? sum += array[i]; > ? ? ? ? } > ? ? ? ? return sum; > ? ? } > } > > *_Assembly:_* > ....[Hottest Region 1].............................................................................. > c2, com.jpt.Sum1ToNArray::hotMethod, version 139 (63 bytes) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0f9: mov? ? r8d,r10d > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff0fc: add? ? r8d,0xfffffff9 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff100: mov? ? r11d,0x1 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff106: cmp? ? r8d,0x1 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? 0x00007f7bf1bff10a: jg? ? ?0x00007f7bf1bff114 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? 0x00007f7bf1bff10c: mov? ? rax,rdx > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???? ?0x00007f7bf1bff10f: jmp? ? 0x00007f7bf1bff15d > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ????? 0x00007f7bf1bff111: mov? ? rdx,rax? ? ? ? ? ? ;*lload_1 {reexecute=0 rethrow=0 > return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? 
? ? ????? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - > com.jpt.Sum1ToNArray::hotMethod at 13 (line 53) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???? 0x00007f7bf1bff114: movsxd rsi,DWORD PTR [r14+r11*4+0x10] > ?11.08%? ? 8.55%? ? ??? 0x00007f7bf1bff119: movsxd rbp,DWORD PTR [r14+r11*4+0x14] > ? 0.30%? ? 0.17%? ? ???? 0x00007f7bf1bff11e: movsxd r13,DWORD PTR [r14+r11*4+0x18] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff123: movsxd rax,DWORD PTR [r14+r11*4+0x2c] > ? 8.86%? ? 2.85%? ? ???? 0x00007f7bf1bff128: movsxd r9,DWORD PTR? [r14+r11*4+0x28] > ?10.49%? ?23.29%? ???? 0x00007f7bf1bff12d: movsxd rcx,DWORD PTR [r14+r11*4+0x24] > ? 0.38%? ? 0.45%? ? ???? 0x00007f7bf1bff132: movsxd rbx,DWORD PTR [r14+r11*4+0x20] > ? 0.03%? ? 0.06%? ? ???? 0x00007f7bf1bff137: movsxd rdi,DWORD PTR [r14+r11*4+0x1c] > ? 0.23%? ? 0.22%? ? ???? 0x00007f7bf1bff13c: add? ? rsi,rdx > ?10.58%? ?18.59%? ???? 0x00007f7bf1bff13f: add? ? rbp,rsi > ? 0.32%? ? 0.17%? ? ???? 0x00007f7bf1bff142: add? ? r13,rbp > ? 0.05%? ? 0.04%? ? ???? 0x00007f7bf1bff145: add? ? rdi,r13 > ?26.10%? ?28.47%? ???? 0x00007f7bf1bff148: add? ? rbx,rdi > ? 5.55%? ? 5.48%? ? ???? 0x00007f7bf1bff14b: add? ? rcx,rbx > ? 5.66%? ? 1.32%? ? ???? 0x00007f7bf1bff14e: add? ? r9,rcx > ? 7.85%? ? 3.11%? ? ???? 0x00007f7bf1bff151: add? ? rax,r9? ? ? ? ? ? ?;*ladd {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - > com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) > ?10.19%? ? 5.67%? ? ??? 0x00007f7bf1bff154: add? ? r11d,0x8? ? ? ? ?;*iinc {reexecute=0 rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > com.jpt.Sum1ToNArray::hotMethod at 23 (line 52) > ? 0.38%? ? 0.12%? ? ???? 0x00007f7bf1bff158: cmp? ? r11d,r8d > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??? 0x00007f7bf1bff15b: jl? ? ? ? 0x00007f7bf1bff111? ;*if_icmpge {reexecute=0 rethrow=0 > return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - > com.jpt.Sum1ToNArray::hotMethod at 10 (line 52) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ?0x00007f7bf1bff15d: cmp? ? r11d,r10d > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff160: jge? ? ? ?0x00007f7bf1bff174 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff162: xchg? ? ax,ax? ? ? ? ? ? ? ? ? ? ? ; *lload_1 {reexecute=0 > rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > com.jpt.Sum1ToNArray::hotMethod at 13 (line 53) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0x00007f7bf1bff164: movsxd r8,DWORD PTR [r14+r11*4+0x10] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?0x00007f7bf1bff169: add? ? ? ?rax,r8? ? ? ? ? ? ? ? ? ? ;*ladd {reexecute=0 > rethrow=0 return_oop=0} > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?; - > com.jpt.Sum1ToNArray::hotMethod at 21 (line 53) > > Regards > > > On Tuesday, October 24, 2017 11:22 AM, Nils Eliasson wrote: > > > Hi Ionut, > In this case x86 refers to both x86_32/ia32 and x86_64/amd64/x64. > Regards, > Nils Eliasson > > On 2017-10-24 11:05, Ionut wrote: >> Hello All, >> >> ? ? I want to ask you about https://bugs.openjdk.java.net/browse/JDK-8129920*?- Vectorized loop unrolling *which says >> it is applicable _only__for x86 targets_. Do you plan to port this for x64 as well? Or I miss something here? 
>> >> Regards >> Ionut > > > > > From sitnikov.vladimir at gmail.com Tue Oct 24 17:20:41 2017 From: sitnikov.vladimir at gmail.com (Vladimir Sitnikov) Date: Tue, 24 Oct 2017 17:20:41 +0000 Subject: Vectorized Loop Unrolling on x64? In-Reply-To: References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: Just in case, here's Vladimir Ivanov's vectorization talk: *http://2017.jpoint.ru/en/talks/vector-programming-in-java/ * Slide 89 describes sum misundervectorization. Vladimir -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Oct 24 21:08:11 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 14:08:11 -0700 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> Message-ID: <91ffc3a8-3c02-16f4-9e5b-051169ec19a7@oracle.com> It looks good to me too. The only issue is test's placement - /c1 subdir is nothing to do with C1 compiler. I think test should be put into compiler/exceptions/ directory. I submitted pre-integration testing. Thanks, Vladimir On 10/18/17 8:19 PM, dean.long at oracle.com wrote: > Yes, but I'm not a Reviewer. > > dl > > > On 10/18/17 7:16 AM, Roland Westrelin wrote: >> Here is an updated webrev with Dean's suggestion: >> >> http://cr.openjdk.java.net/~roland/8188151/webrev.01/ >> >> Can this be considered reviewed by you, Dean? >> >> Roland. > From vladimir.kozlov at oracle.com Tue Oct 24 22:02:37 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 15:02:37 -0700 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> Message-ID: We can't use platform specific UseAVX flag in shared code in type.cpp. I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 and corresponding vectors 32 and 64 bytes. If AMD's Instructions Set before 17h does not support whole 32 bytes vectors we can't call it AVX. Thanks, Vladimir On 10/18/17 10:01 AM, dean.long at oracle.com wrote: > How about initializing TypeVect::VECTY and friends unconditionally?? I am nervous about exchanging one guarding condition for another. > > dl > > > On 10/18/17 1:03 AM, Nils Eliasson wrote: >> >> HI, >> >> I ran into a problem with the interaction between MaxVectorSize and the UseAVX. For some AMD CPUs we limit the vector size to 16 because it gives the best performance. >> >>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>> ?????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> ???? } >> >> Whenf MaxVecorSize is set to 16 it has the sideeffect that the TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though the platform has the capability. >> >> Type.cpp:~660 >> >> [...] >> >?? if (Matcher::vector_size_supported(T_FLOAT,8)) { >> >???? TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >> >?? } >> [...] >> >?? 
mreg2type[Op_VecY] = TypeVect::VECTY; >> >> >> In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has effects on specific vector registers. Here we have a mismatch. >> >> On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will also hit asserts in a few places like: >> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), "sanity"); >> >> Shouldn't the type initialization in type.cpp be dependent on feature flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector registers are initialized if the platform supports them, >> but they might not be used if MaxVectorSize is limited.) >> >> This is a patch that solves the problem, but I have not convinced myself that it is the right way: >> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >> >> Feedback appreciated, >> >> Regards, >> Nils Eliasson >> >> >> >> >> >> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >> > From vladimir.kozlov at oracle.com Tue Oct 24 23:02:46 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 24 Oct 2017 16:02:46 -0700 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: References: Message-ID: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> Roland, Did you consider less intrusive approach by adding branch over SafePoint with masking on index variable? int mask = LoopStripMiningMask * inc; // simplified for (int i = start; i < stop; i += inc) { // body if (i & mask != 0) continue; safepoint; } Or may be doing it inside .ad file in new SafePoint node implementation so that ideal graph is not affected. I am concern that suggested changes may affect Range Check elimination (you changed limit to variable value/flag) in addition to complexity of changes which may affect stability of C2. Thanks, Vladimir On 10/3/17 6:19 AM, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8186027/webrev.00/ > > This converts loop: > > for (int i = start; i < stop; i += inc) { > // body > } > > to a loop nest: > > i = start; > if (i < stop) { > do { > int next = MIN(stop, i+LoopStripMiningIter*inc); > do { > // body > i += inc; > } while (i < next); > safepoint(); > } while (i < stop); > } > > (It's actually: > int next = MIN(stop - i, LoopStripMiningIter*inc) + i; > to protect against overflows) > > This should bring the best of running with UseCountedLoopSafepoints on > and running with it off: low time to safepoint with little to no impact > on throughput. That change was first pushed to the shenandoah repo > several months ago and we've been running with it enabled since. > > The command line argument LoopStripMiningIter is the number of > iterations between safepoints. In practice, with an arbitrary > LoopStripMiningIter=1000, we observe time to safepoint on par with the > current -XX:+UseCountedLoopSafepoints and most performance regressions > due to -XX:+UseCountedLoopSafepoints gone. The exception is when an > inner counted loop runs for a low number of iterations on average (and > the compiler doesn't have an upper bound on the number of iteration). > > This is enabled on the command line with: > -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 > > In PhaseIdealLoop::is_counted_loop(), when loop strip mining is enabled, > for an inner loop, the compiler builds a skeleton outer loop around the > the counted loop. 
> The outer loop is kept as simple as possible, so
> required adjustments to the existing loop optimizations are not too
> intrusive. The reason the outer loop is inserted early in the
> optimization process is so that optimizations are not disrupted: an
> alternate implementation could have kept the safepoint in the counted
> loop until loop opts are over and then only have added the outer loop
> and moved the safepoint to the outer loop. That would have prevented
> nodes that are referenced in the safepoint from being sunk out of the
> loop, for instance.
>
> The outer loop is a LoopNode with a backedge to a loop exit test and a
> safepoint. The loop exit test is a CmpI with a new Opaque5Node. The
> skeleton loop is populated with all required Phis after loop opts are
> over, during macro expansion. Only at that point are the loop exit tests
> adjusted so the inner loop runs for at most LoopStripMiningIter. If the
> compiler can prove the inner loop runs for no more than
> LoopStripMiningIter then, during macro expansion, the outer loop is
> removed. The safepoint is removed only if the inner loop executes for
> less than LoopStripMiningIterShortLoop, so that if there are several
> counted loops in a row, we still poll for safepoints regularly.
>
> Until macro expansion, there can be only a few extra nodes in the outer
> loop: nodes that would have sunk out of the inner loop and are kept in
> the outer loop by the safepoint.
>
> PhaseIdealLoop::clone_loop(), which is used by most loop opts, now has
> several ways of cloning a counted loop. For loop unswitching, both inner
> and outer loops need to be cloned. For unrolling, only the inner loop
> needs to be cloned. For pre/post loop insertion, only the inner loop
> needs to be cloned, but the control flow must connect one of the inner
> loop copies to the outer loop of the other copy.
>
> Beyond verifying performance results with the usual benchmarks, when I
> implemented that change, I wrote test cases for (hopefully) every loop
> optimization and verified by inspection of the generated code that the
> loop opts trigger correctly with loop strip mining.
>
> Roland.
>

From igor.veresov at oracle.com Wed Oct 25 03:52:42 2017
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 24 Oct 2017 20:52:42 -0700
Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter
Message-ID: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com>

This is a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn't completely solve the problem of the interpreter-C1 profiling style discrepancy, it speeds up profiling of statically bindable call sites and we'd like to push that. I also added a bit of code to JVMCI to do the profile fixup analogous to what happens in CI.
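(For illustration only - a minimal sketch, in plain Java, of the kind of fixup being described; it
is not code from the webrev, and the names below - ReceiverProfileSketch, fixupStaticallyBindable -
are invented for the example. The idea: for a statically bindable call site the profiling code may
only bump an untyped total counter, so the consumer of the profile attributes that total to the
single recorded receiver; using '+=' rather than '=' also preserves counts already attributed, e.g.
by the interpreter - compare Doug Simon's follow-up further down in this archive.)

    // Hypothetical, simplified model of a per-call-site receiver profile.
    final class ReceiverProfileSketch {
        String[] types;      // recorded receiver types, at most a few rows
        long[] counts;       // per-row counts
        long totalCount;     // count not attributed to any row

        // Fix up a statically bindable call site: attribute the untyped total
        // to the single recorded receiver. '+=' keeps counts that were already
        // attributed to that row (e.g. by the interpreter).
        void fixupStaticallyBindable() {
            if (types.length == 1) {
                counts[0] += totalCount;
                totalCount = 0;
            }
        }

        public static void main(String[] args) {
            ReceiverProfileSketch p = new ReceiverProfileSketch();
            p.types = new String[] { "java.lang.String" };
            p.counts = new long[] { 10 };   // e.g. recorded by the interpreter
            p.totalCount = 990;             // e.g. recorded by a cheaper total counter
            p.fixupStaticallyBindable();
            System.out.println(p.counts[0]); // prints 1000
        }
    }
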
Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 Thanks, igor From robbin.ehn at oracle.com Wed Oct 25 07:30:38 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 25 Oct 2017 09:30:38 +0200 Subject: Low-Overhead Heap Profiling In-Reply-To: References: <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> Message-ID: <68d73f67-1113-0997-8f5a-0baa23151397@oracle.com> Hi, 325 HeapWord *tlab_old_end = thread->tlab().return end(); Should be something like: 325 HeapWord *tlab_old_end = thread->tlab().end(); Thanks, Robbin On 2017-10-23 17:27, JC Beyler wrote: > Dear all, > > Small update this week with this new webrev: > ? - http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/ > ? - Incremental is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/ > > I patched the code changes showed by Robbin last week and I refactored > collectedHeap.cpp: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src/hotspot/share/gc/shared/collectedHeap.cpp.patch > > The original code became a bit too complex in my opinion with the > handle_heap_sampling handling too many things. So I subdivided the logic into > two smaller methods and moved out a bit of the logic to make it more clear. > Hopefully it is :) > > Let me know if you have any questions/comments :) > Jc > > On Mon, Oct 16, 2017 at 9:34 AM, JC Beyler > wrote: > > Hi Robbin, > > That is because version 11 to 12 was only a test change. I was going to > write about it and say here are the webrev links: > Incremental: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ > > > Full webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/ > > > This change focused only on refactoring the tests to be more manageable, > readable, maintainable. As all tests are looking at allocations, I moved > common code to a java class: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java.patch > > > And then most tests call into that class to turn on/off the sampling, > allocate, etc. This has removed almost 500 lines of test code so I'm happy > about that. > > Thanks for your changes, a bit of relics of previous versions :). I've > already integrated them into my code and will make a new webrev end of this > week with a bit of refactor of the code handling the tlab slow path. I find > it could use a bit of refactoring to make it easier to follow so I'm going > to take a stab at it this week. > > Any other issues/comments? > > Thanks! > Jc > > > On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn > wrote: > > Hi JC, > > I saw a webrev.12 in the directory, with only test changes(11->12), so I > took that version. > I had a look and tested the tests, worked fine! > > First glance at the code (looking at full v12) some minor things below, > mostly unused stuff. > > Thanks, Robbin > > diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp > --- a/src/hotspot/share/runtime/heapMonitoring.cpp? ? ? Mon Oct 16 > 16:54:06 2017 +0200 > +++ b/src/hotspot/share/runtime/heapMonitoring.cpp? ? ? Mon Oct 16 > 17:42:42 2017 +0200 > @@ -211,2 +211,3 @@ > ? ?void initialize(int max_storage) { > +? ? // validate max_storage to sane value ? What would 0 mean ? > ? ? ?MutexLocker mu(HeapMonitor_lock); > @@ -227,8 +228,4 @@ > ? 
?bool initialized() { return _initialized; } > -? volatile bool *initialized_address() { return &_initialized; } > > ? private: > -? // Protects the traces currently sampled (below). > -? volatile intptr_t _stack_storage_lock[1]; > - > ? ?// The traces currently sampled. > @@ -313,3 +310,2 @@ > ? ?_initialized(false) { > -? ? _stack_storage_lock[0] = 0; > ?} > @@ -532,13 +528,2 @@ > > -// Delegate the initialization question to the underlying storage system. > -bool HeapMonitoring::initialized() { > -? return StackTraceStorage::storage()->initialized(); > -} > - > -// Delegate the initialization question to the underlying storage system. > -bool *HeapMonitoring::initialized_address() { > -? return > - > const_cast(StackTraceStorage::storage()->initialized_address()); > -} > - > ?void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) { > diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp > --- a/src/hotspot/share/runtime/heapMonitoring.hpp? ? ? Mon Oct 16 > 16:54:06 2017 +0200 > +++ b/src/hotspot/share/runtime/heapMonitoring.hpp? ? ? Mon Oct 16 > 17:42:42 2017 +0200 > @@ -35,3 +35,2 @@ > ? ?static uint64_t _rnd; > -? static bool _initialized; > ? ?static jint _monitoring_rate; > @@ -92,7 +91,2 @@ > > -? // Is the profiler initialized and where is the address to the > initialized > -? // boolean. > -? static bool initialized(); > -? static bool *initialized_address(); > - > ? ?// Called when o is to be sampled from a given thread and a given size. > > > > On 10/10/2017 12:57 AM, JC Beyler wrote: > > Dear all, > > Thread-safety is back!! Here is the update webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ > > > Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ > > > In order to really test this, I needed to add this so thought now > was a good time. It required a few changes here for the creation to > ensure correctness and safety. Now we keep the static pointer but > clear the data internally so on re-initialize, it will be a bit more > costly than before. I don't think this is a huge use-case so I did > not think it was a problem. I used the internal MutexLocker, I think > I used it well, let me know. > > I also added three tests: > > 1) Stack depth test: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch > > > This test shows that the maximum stack depth system is working. > > 2) Thread safety: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch > > > The test creates 24 threads and they all allocate at the same time. > The test then checks it does find samples from all the threads. > > 3) Thread on/off safety > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch > > > The test creates 24 threads that all allocate a bunch of memory. > Then another thread turns the sampling on/off. > > Btw, both tests 2 & 3 failed without the locks. > > As I worked on this, I saw a lot of places where the tests are doing > very similar things, I'm going to clean up the code a bit and make a > HeapAllocator class that all tests can call directly. This will > greatly simplify the code. > > Thanks for any comments/criticisms! > Jc > > > On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler >> wrote: > > ? ? Dear all, > > ? ? 
Small update to the webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ > > > > > ? ? Full webrev is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ > > > > > ? ? I updated a bit of the naming, removed a TODO comment, and I > added a test for testing the sampling rate. I also updated the > maximum stack depth to 1024, there is no > ? ? reason to keep it so small. I did a micro benchmark that tests > the overhead and it seems relatively the same. > > ? ? I compared allocations from a stack depth of 10 and allocations > from a stack depth of 1024 (allocations are from the same helper > method in > http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java > > > >): > ? ? ?? ? ? ? ? - For an array of 1 integer allocated in a loop; > stack depth 1024 vs stack depth 10: 1% slower > ? ? ???????????- For an array of 200k integers allocated in a loop; > stack depth 1024 vs stack depth 10: 3% slower > > ? ? So basically now moving the maximum stack depth to 1024 but we > only copy over the stack depths actually used. > > ? ? For the next webrev, I will be adding a stack depth test to > show that it works and probably put back the mutex locking so that > we can see how difficult it is to keep > ? ? thread safe. > > ? ? Let me know what you think! > ? ? Jc > > > > ? ? On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler >> wrote: > > ? ? ? ? Forgot to say that for my numbers: > ? ? ? ? ??- Not in the test are the actual numbers I got for the > various array sizes, I ran the program 30 times and parsed the > output; here are the averages and standard > ? ? ? ? deviation: > ? ? ? ? ?? ? ? 1000:? ? ?1.28% average; 1.13% standard deviation > ? ? ? ? ?? ? ? 10000:? ? 1.59% average; 1.25% standard deviation > ? ? ? ? ?? ? ? 100000:? ?1.26% average; 1.26% standard deviation > > ? ? ? ? The 1000/10000/100000 are the sizes of the arrays being > allocated. These are allocated 100k times and the sampling rate is > 111 times the size of the array. > > ? ? ? ? Thanks! > ? ? ? ? Jc > > > ? ? ? ? On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler > > >> wrote: > > ? ? ? ? ? ? Hi all, > > ? ? ? ? ? ? After a bit of a break, I am back working on this :). > As before, here are two webrevs: > > ? ? ? ? ? ? - Full change set: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/ > > > > ? ? ? ? ? ? - Compared to version 8: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/ > > > > ? ? ? ? ? ? ?? ? (This version is compared to version 8 I last > showed but ported to the new folder hierarchy) > > ? ? ? ? ? ? In this version I have: > ? ? ? ? ? ? ?? - Handled Thomas' comments from his email of 07/03: > ? ? ? ? ? ? ?? ? ? ?- Merged the logging to be standard > ? ? ? ? ? ? ?? ? ? ?- Fixed up the code a bit where asked > ? ? ? ? ? ? ?? ? ? ?- Added some notes about the code not being > thread-safe yet > ? ? ? ? ? ? ?? ?- Removed additional dead code from the version > that modifies interpreter/c1/c2 > ? ? ? ? ? ? ?? ?- Fixed compiler issues so that it compiles with > --disable-precompiled-header > ? ? ? ? ? ? ?? ? ? ? - Tested with ./configure > --with-boot-jdk= --with-debug-level=slowdebug > --disable-precompiled-headers > > ? ? ? ? ? ? 
Additionally, I added a test to check the sanity of the > sampler: HeapMonitorStatCorrectnessTest > > (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch > > >) > ? ? ? ? ? ? ?? ?- This allocates a number of arrays and checks that > we obtain the number of samples we want with an accepted error of > 5%. I tested it 100 times and it > ? ? ? ? ? ? passed everytime, I can test more if wanted > ? ? ? ? ? ? ?? ?- Not in the test are the actual numbers I got for > the various array sizes, I ran the program 30 times and parsed the > output; here are the averages and > ? ? ? ? ? ? standard deviation: > ? ? ? ? ? ? ?? ? ? 1000:? ? ?1.28% average; 1.13% standard deviation > ? ? ? ? ? ? ?? ? ? 10000:? ? 1.59% average; 1.25% standard deviation > ? ? ? ? ? ? ?? ? ? 100000:? ?1.26% average; 1.26% standard deviation > > ? ? ? ? ? ? What this means is that we were always at about 1~2% of > the number of samples the test expected. > > ? ? ? ? ? ? Let me know what you think, > ? ? ? ? ? ? Jc > > ? ? ? ? ? ? On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler > > >> wrote: > > ? ? ? ? ? ? ? ? Hi all, > > ? ? ? ? ? ? ? ? I apologize, I have not yet handled your remarks > but thought this new webrev would also be useful to see and comment > on perhaps. > > ? ? ? ? ? ? ? ? Here is the latest webrev, it is generated slightly > different than the others since now I'm using webrev.ksh without the > -N option: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ > > > > > ? ? ? ? ? ? ? ? And the webrev.07 to webrev.08 diff is here: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ > > > > > ? ? ? ? ? ? ? ? (Let me know if it works well) > > ? ? ? ? ? ? ? ? It's a small change between versions but it: > ? ? ? ? ? ? ? ? ?? - provides a fix that makes the average sample > rate correct (more on that below). > ? ? ? ? ? ? ? ? ?? - fixes the code to actually have it play nicely > with the fast tlab refill > ? ? ? ? ? ? ? ? ?? - cleaned up a bit the JVMTI text and now use > jvmtiFrameInfo > ? ? ? ? ? ? ? ? - moved the capability to be onload solo > > ? ? ? ? ? ? ? ? With this webrev, I've done a small study of the > random number generator we use here for the sampling rate. I took a > small program and it can be simplified to: > > ? ? ? ? ? ? ? ? for (outer loop) > ? ? ? ? ? ? ? ? for (inner loop) > ? ? ? ? ? ? ? ? int[] tmp = new int[arraySize]; > > ? ? ? ? ? ? ? ? - I've fixed the outer and inner loops to being 800 > for this experiment, meaning we allocate 640000 times an array of a > given array size. > > ? ? ? ? ? ? ? ? - Each program provides the average sample size > used for the whole execution > > ? ? ? ? ? ? ? ? - Then, I ran each variation 30 times and then > calculated the average of the average sample size used for various > array sizes. I selected the array size to > ? ? ? ? ? ? ? ? be one of the following: 1, 10, 100, 1000. > > ? ? ? ? ? ? ? ? - When compared to 512kb, the average sample size > of 30 runs: > ? ? ? ? ? ? ? ? 1: 4.62% of error > ? ? ? ? ? ? ? ? 10: 3.09% of error > ? ? ? ? ? ? ? ? 100: 0.36% of error > ? ? ? ? ? ? ? ? 1000: 0.1% of error > ? ? ? ? ? ? ? ? 10000: 0.03% of error > > ? ? ? ? ? ? ? ? What it shows is that, depending on the number of > samples, the average does become better. This is because with an > allocation of 1 element per array, it > ? ? ? ? ? ? ? ? will take longer to hit one of the thresholds. This > is seen by looking at the sample count statistic I put in. 
For the > same number of iterations (800 * > ? ? ? ? ? ? ? ? 800), the different array sizes provoke: > ? ? ? ? ? ? ? ? 1: 62 samples > ? ? ? ? ? ? ? ? 10: 125 samples > ? ? ? ? ? ? ? ? 100: 788 samples > ? ? ? ? ? ? ? ? 1000: 6166 samples > ? ? ? ? ? ? ? ? 10000: 57721 samples > > ? ? ? ? ? ? ? ? And of course, the more samples you have, the more > sample rates you pick, which means that your average gets closer > using that math. > > ? ? ? ? ? ? ? ? Thanks, > ? ? ? ? ? ? ? ? Jc > > ? ? ? ? ? ? ? ? On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler > > >> wrote: > > ? ? ? ? ? ? ? ? ? ? Thanks Robbin, > > ? ? ? ? ? ? ? ? ? ? This seems to have worked. When I have the next > webrev ready, we will find out but I'm fairly confident it will work! > > ? ? ? ? ? ? ? ? ? ? Thanks agian! > ? ? ? ? ? ? ? ? ? ? Jc > > ? ? ? ? ? ? ? ? ? ? On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn > > >> wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? Hi JC, > > ? ? ? ? ? ? ? ? ? ? ? ? On 06/29/2017 12:15 AM, JC Beyler wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? B) Incremental changes > > > ? ? ? ? ? ? ? ? ? ? ? ? I guess the most common work flow here is > using mq : > ? ? ? ? ? ? ? ? ? ? ? ? hg qnew fix_v1 > ? ? ? ? ? ? ? ? ? ? ? ? edit files > ? ? ? ? ? ? ? ? ? ? ? ? hg qrefresh > ? ? ? ? ? ? ? ? ? ? ? ? hg qnew fix_v2 > ? ? ? ? ? ? ? ? ? ? ? ? edit files > ? ? ? ? ? ? ? ? ? ? ? ? hg qrefresh > > ? ? ? ? ? ? ? ? ? ? ? ? if you do hg log you will see 2 commits > > ? ? ? ? ? ? ? ? ? ? ? ? webrev.ksh -r -2 -o my_inc_v1_v2 > ? ? ? ? ? ? ? ? ? ? ? ? webrev.ksh -o my_full_v2 > > > ? ? ? ? ? ? ? ? ? ? ? ? In? your .hgrc you might need: > ? ? ? ? ? ? ? ? ? ? ? ? [extensions] > ? ? ? ? ? ? ? ? ? ? ? ? mq = > > ? ? ? ? ? ? ? ? ? ? ? ? /Robbin > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Again another newbiew question here... > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? For showing the incremental changes, is > there a link that explains how to do that? I apologize for my newbie > questions all the time :) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Right now, I do: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ksh ../webrev.ksh -m -N > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? That generates a webrev.zip and send it > to Chuck Rasbold. He then uploads it to a new webrev. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? I tried commiting my change and adding > a small change. Then if I just do ksh ../webrev.ksh without any > options, it seems to produce a similar > ? ? ? ? ? ? ? ? ? ? ? ? ? ? page but now with only the changes I > had (so the 06-07 comparison you were talking about) and a changeset > that has it all. I imagine that is > ? ? ? ? ? ? ? ? ? ? ? ? ? ? what you meant. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Which means that my workflow would become: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1) Make changes > ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2) Make a webrev without any options to > show just the differences with the tip > ? ? ? ? ? ? ? ? ? ? ? ? ? ? 3) Amend my changes to my local commit > so that I have it done with > ? ? ? ? ? ? ? ? ? ? ? ? ? ? 4) Go to 1 > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Does that seem correct to you? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Note that when I do this, I only see > the full change of a file in the full change set (Side note here: > now the page says change set and not > ? ? ? ? ? ? ? ? ? ? ? ? ? ? patch, which is maybe why Serguei was > having issues?). > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Thanks! > ? ? ? ? ? ? ? ? ? ? ? ? ? ? Jc > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? On Wed, Jun 28, 2017 at 1:12 AM, Robbin > Ehn > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? >>> wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? Hi, > > ? ? ? ? ? ? ? 
? ? ? ? ? ? ? ?? ? On 06/28/2017 12:04 AM, JC Beyler > wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Dear Thomas et al, > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Here is the newest webrev: > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ > > > > > > >> > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? You have some more bits to in > there but generally this looks good and really nice with more tests. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? I'll do and deep dive and re-test > this when I get back from my long vacation with whatever patch > version you have then. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? Also I think it's time you provide > incremental (v06->07 changes) as well as complete change-sets. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? Thanks, Robbin > > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Thomas, I "think" I have > answered all your remarks. The summary is: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? - The statistic system is up > and provides insight on what the heap sampler is doing > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- I've noticed that, > though the sampling rate is at the right mean, we are missing some > samples, I have not yet tracked out why > ? ? ? ? ? ? ? ? ? ? ? ? ? ? (details below) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? - I've run a tiny benchmark > that is the worse case: it is a very tight loop and allocated a > small array > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- In this case, I see no > overhead when the system is off so that is a good start :) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- I see right now a high > overhead in this case when sampling is on. This is not a really too > surprising but I'm going to see if > ? ? ? ? ? ? ? ? ? ? ? ? ? ? this is consistent with our > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? internal implementation. The > benchmark is really allocation stressful so I'm not too surprised > but I want to do the due diligence. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- The statistic system up > is up and I have a new test > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch > > > > > > > > >> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? - I did a bit of a study > about the random generator here, more details are below but > basically it seems to work well > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- I added a capability but > since this is the first time doing this, I was not sure I did it right > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- I did add a test though > for it and the test seems to do what I expect (all methods are > failing with the > ? ? ? ? ? ? ? ? ? ? ? ? ? ? JVMTI_ERROR_MUST_POSSESS_CAPABILITY error). > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ?- > http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch > > > > > > > > >> > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- I still need to figure > out what to do about the multi-agent vs single-agent issue > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- As far as measurements, > it seems I still need to look at: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- Why we do the 20 random > calls first, are they necessary? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- Look at the mean of the > sampling rate that the random generator does and also what is > actually sampled > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?- What is the overhead in > terms of memory/performance when on? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? 
I have inlined my answers, I > think I got them all in the new webrev, let me know your thoughts. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Thanks again! > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Jc > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? On Fri, Jun 23, 2017 at 3:52 > AM, Thomas Schatzl > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? >> > > > > > > > >>>> wrote: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Hi, > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?On Wed, 2017-06-21 at > 13:45 -0700, JC Beyler wrote: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Hi all, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> First off: Thanks again > to Robbin and Thomas for their reviews :) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Next, I've uploaded a > new webrev: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ > > > > > > >> > > > > > > > >>> > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Here is an update: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> - @Robbin, I forgot to > say that yes I need to look at implementing > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> this for the other > architectures and testing it before it is all > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> ready to go. Is it > common to have it working on all possible > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> combinations or is > there a subset that I should be doing first and we > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> can do the others later? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> - I've tested > slowdebug, built and ran the JTreg tests I wrote with > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> slowdebug and fixed a > few more issues > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> - I've refactored a bit > of the code following Thomas' comments > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - I think I've > handled all the comments from Thomas (I put > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> comments inline below > for the specifics) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Thanks for handling all > those. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> - Following Thomas' > comments on statistics, I want to add some > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> quality assurance tests > and find that the easiest way would be to > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> have a few counters of > what is happening in the sampler and expose > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> that to the user. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - I'll be adding > that in the next version if no one sees any > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> objections to that. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - This will allow me > to add a sanity test in JTreg about number of > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> samples and average of > sampling rate > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> @Thomas: I had a few > questions that I inlined below but I will > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> summarize the "bigger > ones" here: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - You mentioned > constants are not using the right conventions, I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> looked around and > didn't see any convention except normal naming then > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? 
? ?> for static constants. > Is that right? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I looked through > https://wiki.openjdk.java.net/display/HotSpot/StyleGui > > > > > > >> > > > > > > > >>> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?de and the rule is to > "follow an existing pattern and must have a > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?distinct appearance from > other names". Which does not help a lot I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?guess :/ The GC team > started using upper camel case, e.g. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?SomeOtherConstant, but > very likely this is probably not applied > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?consistently throughout. > So I am fine with not adding another style > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?(like kMaxStackDepth with > the "k" in front with some unknown meaning) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?is fine. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?(Chances are you will > find that style somewhere used anyway too, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?apologies if so :/) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Thanks for that link, now I > know where to look. I used the upper camel case in my code as well > then :) I should have gotten them all. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > PS: I've also inlined > my answers to Thomas below: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > On Tue, Jun 13, 2017 > at 8:03 AM, Thomas Schatzl ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > e.com > > wrote: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > Hi all, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > On Mon, 2017-06-12 > at 11:11 -0700, JC Beyler wrote: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > Dear all, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > I've continued > working on this and have done the following > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > webrev: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > > http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/ > > > > > > >> > > > > > > > >>> > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > [...] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > Things I still > need to do: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > >? ? - Have to fix > that TLAB case for the FastTLABRefill > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > >? ? - Have to start > looking at the data to see that it is > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > consistent and does > gather the right samples, right frequency, etc. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > >? ? - Have to check > the GC elements and what that produces > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > >? ? - Run a > slowdebug run and ensure I fixed all those issues you > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > saw > Robbin > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > Thanks for looking > at the webrev and have a great week! > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > >? ?scratching a bit > on the surface of this change, so apologies for > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > rather shallow comments: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? 
> > - > macroAssembler_x86.cpp:5604: while this is compiler code, and I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > am not sure this is > final, please avoid littering the code with > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > TODO remarks :) They > tend to be candidates for later wtf moments > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > only. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > Just file a CR for that. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > Newcomer question: > what is a CR and not sure I have the rights to do > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? > that yet ? :) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Apologies. CR is a change > request, this suggests to file a bug in the > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?bug tracker. And you are > right, you can't just create a new account in > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?the OpenJDK JIRA > yourselves. :( > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Ok good to know, I'll continue > with my own todo list but I'll work hard on not letting it slip in > the webrevs anymore :) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I was mostly referring to > the "... but it is a TODO" part of that > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?comment in > macroassembler_x86.cpp. Comments about the why of the code > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?are appreciated. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?[Note that I now > understand that this is to some degree still work in > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?progress. As long as the > final changeset does no contain TODO's I am > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?fine (and it's not a hard > objection, rather their use in "final" code > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?is typically limited in > my experience)] > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?5603? ?// Currently, if > this happens, just set back the actual end to > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?where it was. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?5604? ?// We miss a > chance to sample here. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Would be okay, if > explaining "this" and the "why" of missing a chance > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?to sample here would be best. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Like maybe: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?// If we needed to refill > TLABs, just set the actual end point to > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?// the end of the TLAB > again. We do not sample here although we could. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Done with your comment, it > works well in my mind. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I am not sure whether > "miss a chance to sample" meant "we could, but > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?consciously don't because > it's not that useful" or "it would be > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?necessary but don't > because it's too complicated to do.". > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Looking at the original > comment once more, I am also not sure if that > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?comment shouldn't > referring to the "end" variable (not actual_end) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?because that's the > variable that is responsible for taking the sampling > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?path? (Going from the > member description of ThreadLocalAllocBuffer). 
> > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? I've moved this code and it no > longer shows up here but the rationale and answer was: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? So.. Yes, end is the variable > provoking the sampling. Actual end is the actual end of the TLAB. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? What was happening here is > that the code is resetting _end to point towards the end of the new > TLAB. Because, we now have the end for > ? ? ? ? ? ? ? ? ? ? ? ? ? ? sampling and _actual_end for > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? the actual end, we need to > update the actual_end as well. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Normally, were we to do the > real work here, we would calculate the (end - start) offset, then do: > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? - Set the new end to : start + > (old_end - old_start) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? - Set the actual end like we > do here now where it because it is the actual end. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Why is this not done here now > anymore? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? - I was still debating > which path to take: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ?- Do it in the fast > refill code, it has its perks: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ?- In a world where > fast refills are happening all the time or a lot, we can augment > there the code to do the sampling > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ?- Remember what we had > as an end before leaving the slowpath and check on return > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ?- This is what I'm > doing now, it removes the need to go fix up all fast refill paths > but if you remain in fast refill paths, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? you won't get sampling. I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? have to think of the > consequences of that, maybe a future change later on? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? - I have the > statistics now so I'm going to study that > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ? ? ? ? ? ? ?-> By the > way, though my statistics are showing I'm missing some samples, if I > turn off FastTlabRefill, it is the same > ? ? ? ? ? ? ? ? ? ? ? ? ? ? loss so for now, it seems > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? this does not occur in my > simple test. > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?But maybe I am only > confused and it's best to just leave the comment > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?away. :) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Thinking about it some > more, doesn't this not-sampling in this case > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?mean that sampling does > not work in any collector that does inline TLAB > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?allocation at the moment? > (Or is inline TLAB alloc automatically > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?disabled with sampling > somehow?) > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?That would indeed be a > bigger TODO then :) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Agreed, this remark made me > think that perhaps as a first step the new way of doing it is better > but I did have to: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- Remove the const of the > ThreadLocalBuffer remaining and hard_end methods > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ?- Move hard_end out of the > header file to have a bit more logic there > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Please let me know what you > think of that and if you prefer it this way or changing the fast > refills. 
(I prefer this way now because it > ? ? ? ? ? ? ? ? ? ? ? ? ? ? is more incremental). > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - calling > HeapMonitoring::do_weak_oops() (which should probably be > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > called weak_oops_do() > like other similar methods) only if string > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > deduplication is > enabled (in g1CollectedHeap.cpp:4511) seems wrong. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> The call should be at > least around 6 lines up outside the if. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Preferentially in a > method like process_weak_jni_handles(), including > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> additional logging. (No > new (G1) gc phase without minimal logging > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> :)). > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Done but really not > sure because: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> I put for logging: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ?log_develop_trace(gc, > freelist)("G1ConcRegionFreeing [other] : heap > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> monitoring"); > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I would think that "gc, > ref" would be more appropriate log tags for > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?this similar to jni handles. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?(I am als not sure what > weak reference handling has to do with > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?G1ConcRegionFreeing, so I > am a bit puzzled) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? I was not sure what to put for > the tags or really as the message. I cleaned it up a bit now to: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?log_develop_trace(gc, > ref)("HeapSampling [other] : heap monitoring processing"); > > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Since weak_jni_handles > didn't have logging for me to be inspired > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> from, I did that but > unconvinced this is what should be done. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?The JNI handle processing > does have logging, but only in > > ?ReferenceProcessor::process_discovered_references(). In > > ?process_weak_jni_handles() only overall time is measured (in a G1 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?specific way, since only > G1 supports disabling reference procesing) :/ > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?The code in > ReferenceProcessor prints both time taken > > ?referenceProcessor.cpp:254, as well as the count, but strangely > only in > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?debug VMs. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?I have no idea why this > logging is that unimportant to only print that > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?in a debug VM. However > there are reviews out for changing this area a > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?bit, so it might be > useful to wait for that (JDK-8173335). > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? I cleaned it up a bit anyway > and now it returns the count of objects that are in the system. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - the change doubles > the size of > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > > CollectedHeap::allocate_from_tlab_slow() above the "small and nice" > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > threshold. Maybe it > could be refactored a bit. > ? 
? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Done I think, it looks > better to me :). > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?In > ThreadLocalAllocBuffer::handle_sample() I think the > > ?set_back_actual_end()/pick_next_sample() calls could be hoisted out of > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?the "if" :) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Done! > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - > referenceProcessor.cpp:261: the change should add logging about > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > the number of > references encountered, maybe after the corresponding > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > "JNI weak reference > count" log message. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Just to double check, > are you saying that you'd like to have the heap > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> sampler to keep in > store how many sampled objects were encountered in > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> the > HeapMonitoring::weak_oops_do? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - Would a return of > the method with the number of handled > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> references and logging > that work? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Yes, it's fine if > HeapMonitoring::weak_oops_do() only returned the > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?number of processed weak > oops. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Done also (but I admit I have > not tested the output yet) :) > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? - Additionally, > would you prefer it in a separate block with its > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> GCTraceTime? > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?Yes. Both kinds of > information is interesting: while the time taken is > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?typically more important, > the next question would be why, and the > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?number of references > typically goes a long way there. > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?See above though, it is > probably best to wait a bit. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? Agreed that I "could" wait > but, if it's ok, I'll just refactor/remove this when we get closer > to something final. Either, JDK-8173335 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? has gone in and I will notice > it now or it will soon and I can change it then. > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - > threadLocalAllocBuffer.cpp:331: one more "TODO" > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> Removed it and added it > to my personal todos to look at. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?>? ? ? > > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > - > threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > documentation should > be updated about the sampling additions. I > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > would have no clue > what the difference between "actual_end" and > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> > "end" would be from > the given information. > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> If you are talking > about the comments in this file, I made them more > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> clear I hope in the new > webrev. If it was somewhere else, let me know > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ? ? ?> where to change. 
> > > From jamsheed.c.m at oracle.com Wed Oct 25 09:35:35 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Wed, 25 Oct 2017 15:05:35 +0530 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly Message-ID: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> Hi, request for review, webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ jbs: https://bugs.openjdk.java.net/browse/JDK-6523512 desc: removed the is_deopt_suspend() from has_special_runtime_exit_condition checks Best regards, Jamsheed From lutz.schmidt at sap.com Wed Oct 25 10:01:41 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 25 Oct 2017 10:01:41 +0000 Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions Message-ID: Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8189793 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189793.00/index.html Vector instructions, which have been available on System z for a while (since z13), promise noticeable performance improvements. This enhancement improves the String Compress and String Inflate intrinsics by exploiting vector instructions, when available. For long strings, up to 2x performance improvement has been observed in micro-benchmarks. Special care was taken to preserve good performance for short strings. All examined workloads showed a high ratio of short and very short strings. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug.simon at oracle.com Wed Oct 25 10:07:57 2017 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 25 Oct 2017 12:07:57 +0200 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> Message-ID: <8150BCA2-3C50-4624-98ED-D914D51A5C6B@oracle.com> 581 582 // Fixup the case of C1's inability to optimize profiling of a statically bindable call site 583 if (entries == 1) { 584 counts[0] = totalCount; 585 } 586 But what happens if we're looking at a profile from the interpreter? In that case, won't totalCount == 0 && counts[0] have the right value? In which case, the above fixup will lose this information. Maybe it should be: counts[0] += totalCount; -Doug > On 25 Oct 2017, at 05:52, Igor Veresov wrote: > > This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. > > Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 > > Thanks, > igor -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From thomas.schatzl at oracle.com Wed Oct 25 12:43:08 2017
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 25 Oct 2017 14:43:08 +0200
Subject: Low-Overhead Heap Profiling
In-Reply-To: 
References: <1497366226.2829.109.camel@oracle.com>
 <1498215147.2741.34.camel@oracle.com>
 <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com>
 <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com>
 <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com>
Message-ID: <1508935388.13554.11.camel@oracle.com>

Hi Jc,

  sorry for taking a bit long to respond... ;)

On Mon, 2017-10-23 at 08:27 -0700, JC Beyler wrote:
> Dear all,
>
> Small update this week with this new webrev:
>   - http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/
>   - Incremental is here: http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/
>
> I patched the code changes shown by Robbin last week and I
> refactored collectedHeap.cpp:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src/hotspot/share/gc/shared/collectedHeap.cpp.patch
>
> The original code became a bit too complex in my opinion with the
> handle_heap_sampling handling too many things. So I subdivided the
> logic into two smaller methods and moved out a bit of the logic to
> make it more clear. Hopefully it is :)
>
> Let me know if you have any questions/comments :)
> Jc

A few minor issues:

- weak reference handling has been factored out in JDK-8189359; now you
only need to add the additions required for this change to one place. :)
Please update the webrev :)

- the one issue Robbin noticed.

- in the declaration of CollectedHeap::sample_allocation, it would be
nice if the fix_sample_rate parameter were described - it takes some time
to figure out what it's used for. I.e., in case an allocation goes beyond
the sampling watermark, this value, which represents the amount of
overallocation, is used to adjust the next sampling watermark to sample
at the correct rate. Something like this - and if what I wrote is
incorrect, there is even more reason to document it. Or maybe just rename
"fix_sample_rate" to something more descriptive - but I have no good idea
about that. Given the lack of units in the type, it would also be nice to
have the unit in the identifier name, as done elsewhere.

- some (or most, actually) of the new setters and getters in the
ThreadLocalAllocBuffer class could be private, I think. Also, we
typically do not use "simple" getters that just return a member in the
class where they are defined.

- ThreadLocalAllocBuffer::set_sample_end(): please use pointer_delta()
for pointer subtractions.

- ThreadLocalAllocBuffer::pick_next_sample() - I recommend making the
first check an assert - it seems that it is only useful to call this with
heap monitoring enabled, as is done right now.

- ThreadLocalAllocBuffer::pick_next_sample() - please use "PTR_FORMAT"
(or INTPTR_FORMAT - they are the same) as the format string for printing
pointer values, as is customary within HotSpot. %p output is OS
dependent; e.g. I heard that on Ubuntu it prints "null" instead of
0x0...0, which is kind of annoying.

- personal preference: do not allocate HeapMonitoring::AlwaysTrueClosure
globally, but only locally when it's used. Setting it up seems to be very
cheap.

- HeapMonitoring::next_random() - the different names for the constants
use different formatting. Preferable (to me) is UpperCamelCase, but at
least make them uniform.

- in HeapMonitoring::next_random(), you might want to use right_n_bits()
to create your mask.
- not really convinced that it is a good idea to not somehow guard StartHeapSampling() and StopHeapSampling() against being called by multiple threads. Otherwise looks okay from what I can see. Thanks, Thomas From nils.eliasson at oracle.com Wed Oct 25 13:13:08 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 25 Oct 2017 15:13:08 +0200 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> Message-ID: <3ee0024d-6af5-afe9-8127-0dc2cc3a1711@oracle.com> Deans suggestion with making the TypeVect initialization unconditional also removes all platform dependencies on type.cpp: http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev.02/ Regards, Nils On 2017-10-25 00:02, Vladimir Kozlov wrote: > We can't use platform specific UseAVX flag in shared code in type.cpp. > > I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. > And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 > and corresponding vectors 32 and 64 bytes. > If AMD's Instructions Set before 17h does not support whole 32 bytes > vectors we can't call it AVX. > > Thanks, > Vladimir > > On 10/18/17 10:01 AM, dean.long at oracle.com wrote: >> How about initializing TypeVect::VECTY and friends unconditionally? >> I am nervous about exchanging one guarding condition for another. >> >> dl >> >> >> On 10/18/17 1:03 AM, Nils Eliasson wrote: >>> >>> HI, >>> >>> I ran into a problem with the interaction between MaxVectorSize and >>> the UseAVX. For some AMD CPUs we limit the vector size to 16 because >>> it gives the best performance. >>> >>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>> >>> Whenf MaxVecorSize is set to 16 it has the sideeffect that the >>> TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even >>> though the platform has the capability. >>> >>> Type.cpp:~660 >>> >>> [...] >>> > if (Matcher::vector_size_supported(T_FLOAT,8)) { >>> > TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >>> > } >>> [...] >>> > mreg2type[Op_VecY] = TypeVect::VECTY; >>> >>> >>> In the ad-files feature flags (UseAVX etc.) are used to control what >>> rules should be matched if it has effects on specific vector >>> registers. Here we have a mismatch. >>> >>> On a platform that supports AVX2 but have MaxVectorSize limited to >>> 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is >>> uninitialized. We will also hit asserts in a few places like: >>> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), >>> "sanity"); >>> >>> Shouldn't the type initialization in type.cpp be dependent on >>> feature flag (UseAVX etc.) instead of MaxVectorSize? (The type for >>> the vector registers are initialized if the platform supports them, >>> but they might not be used if MaxVectorSize is limited.) 
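For context on the two approaches being discussed: Dean's "initialize unconditionally" suggestion, which Nils picked up for webrev.02 above, amounts to dropping the vector_size_supported() guard around the type setup in type.cpp. A simplified sketch, not the actual webrev.02 hunk:

    // Guarded form quoted above: VECTY is only created when the current
    // MaxVectorSize admits 32-byte vectors, so an AVX2 machine running with
    // MaxVectorSize=16 leaves TypeVect::VECTY and mreg2type[Op_VecY] unset.
    if (Matcher::vector_size_supported(T_FLOAT,8)) {
      TypeVect::VECTY = TypeVect::make(T_FLOAT,8);
    }
    mreg2type[Op_VecY] = TypeVect::VECTY;

    // Unconditional variant: always create the type; MaxVectorSize then only
    // decides whether such vectors are actually generated, not whether the
    // type exists.
    TypeVect::VECTY = TypeVect::make(T_FLOAT,8);
    mreg2type[Op_VecY] = TypeVect::VECTY;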
>>> >>> This is a patch that solves the problem, but I have not convinced >>> myself that it is the right way: >>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>> >>> Feedback appreciated, >>> >>> Regards, >>> Nils Eliasson >>> >>> >>> >>> >>> >>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>> >> From rwestrel at redhat.com Wed Oct 25 14:29:03 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 25 Oct 2017 16:29:03 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> Message-ID: Hi Vladimir, Thanks for looking at this. > Did you consider less intrusive approach by adding branch over > SafePoint with masking on index variable? > > int mask = LoopStripMiningMask * inc; // simplified > for (int i = start; i < stop; i += inc) { > // body > if (i & mask != 0) continue; > safepoint; > } > > Or may be doing it inside .ad file in new SafePoint node > implementation so that ideal graph is not affected. We're looking for the best trade off between latency and thoughput: we want the safepoint poll overhead to be entirely eliminated even when the safepoint doesn't trigger. > I am concern that suggested changes may affect Range Check elimination > (you changed limit to variable value/flag) in addition to complexity > of changes which may affect stability of C2. The CountedLoop that is created with my patch is strictly identical to the CountedLoop created today with -UseCountedLoopSafepoints. Bounds are not changed at that time. They are left as they are today. The difference, with loop strip mining, is that the counted loop has a skeleton outer loop. The bounds of the counted loop are adjusted once loop opts are over. If the counted loop has a predicate, the predicate is moved out of loop just as it is today. The only difference with today, is that the predicate should be moved out of the outer loop. If a pre and post loop needs to be created, then the only difference with today is that the clones need to be moved out of the outer loop and logic that locate the pre from the main loop need to account for the outer loop. It's obviously a complex change so if your primary concern is stability then loop strip mining can be disabled by default. Assuming strip mining off, then that patch is mostly some code refactoring and some logic that never triggers. Roland. From ionutb83 at yahoo.com Wed Oct 25 15:30:25 2017 From: ionutb83 at yahoo.com (Ionut) Date: Wed, 25 Oct 2017 15:30:25 +0000 (UTC) Subject: Vectorized Loop Unrolling on x64? In-Reply-To: References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: <473261957.4194696.1508945425413@mail.yahoo.com> Hello All, ? ?Thanks for you input and useful links. Indeed, it confirms my initial guess. RegardsIonut On Tuesday, October 24, 2017 8:20 PM, Vladimir Sitnikov wrote: Just in case, here's Vladimir Ivanov's vectorization talk:?http://2017.jpoint.ru/en/talks/vector-programming-in-java/Slide 89 describes sum misundervectorization. Vladimir -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From igor.veresov at oracle.com Wed Oct 25 16:13:47 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 25 Oct 2017 09:13:47 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <8150BCA2-3C50-4624-98ED-D914D51A5C6B@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <8150BCA2-3C50-4624-98ED-D914D51A5C6B@oracle.com> Message-ID: <4D25A442-F0BD-41D1-837B-1115A165EA78@oracle.com> > On Oct 25, 2017, at 3:07 AM, Doug Simon wrote: > > 581 > 582 // Fixup the case of C1's inability to optimize profiling of a statically bindable call site > 583 if (entries == 1) { > 584 counts[0] = totalCount; > 585 } > 586 > But what happens if we're looking at a profile from the interpreter? In that case, won't totalCount == 0 && counts[0] have the right value? In which case, the above fixup will lose this information. Maybe it should be: > > counts[0] += totalCount; If it?s pure interpreter you?d have entries == 0, so this fixup won?t fire. Also totalCount at the point of the fixup is a sum of every counter in profile (for all the types + the counter for types that weren?t recorded). So what the fixup does is that it attributes all the counts to the first type (if it?s a monomorphic call site). igor > > -Doug > >> On 25 Oct 2017, at 05:52, Igor Veresov > wrote: >> >> This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. >> >> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >> >> Thanks, >> igor > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcbeyler at google.com Wed Oct 25 17:03:18 2017 From: jcbeyler at google.com (JC Beyler) Date: Wed, 25 Oct 2017 10:03:18 -0700 Subject: Low-Overhead Heap Profiling In-Reply-To: <68d73f67-1113-0997-8f5a-0baa23151397@oracle.com> References: <1498215147.2741.34.camel@oracle.com> <044f8c75-72f3-79fd-af47-7ee875c071fd@oracle.com> <23f4e6f5-c94e-01f7-ef1d-5e328d4823c8@oracle.com> <5ec70351-910a-96bb-eb03-43ca88bd6259@oracle.com> <68d73f67-1113-0997-8f5a-0baa23151397@oracle.com> Message-ID: Clearly a last minute clean-up gone awry... Fixed for next webrev :) On Wed, Oct 25, 2017 at 12:30 AM, Robbin Ehn wrote: > Hi, > > 325 HeapWord *tlab_old_end = thread->tlab().return end(); > > Should be something like: > > 325 HeapWord *tlab_old_end = thread->tlab().end(); > > Thanks, Robbin > > On 2017-10-23 17:27, JC Beyler wrote: > >> Dear all, >> >> Small update this week with this new webrev: >> - http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/ >> - Incremental is here: http://cr.openjdk.java.net/~ra >> sbold/8171119/webrev.12_13/ >> >> I patched the code changes showed by Robbin last week and I refactored >> collectedHeap.cpp: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src >> /hotspot/share/gc/shared/collectedHeap.cpp.patch >> >> The original code became a bit too complex in my opinion with the >> handle_heap_sampling handling too many things. So I subdivided the logic >> into two smaller methods and moved out a bit of the logic to make it more >> clear. 
Hopefully it is :) >> >> Let me know if you have any questions/comments :) >> Jc >> >> On Mon, Oct 16, 2017 at 9:34 AM, JC Beyler > jcbeyler at google.com>> wrote: >> >> Hi Robbin, >> >> That is because version 11 to 12 was only a test change. I was going >> to >> write about it and say here are the webrev links: >> Incremental: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/ >> >> >> Full webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/ >> >> >> This change focused only on refactoring the tests to be more >> manageable, >> readable, maintainable. As all tests are looking at allocations, I >> moved >> common code to a java class: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitor.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitor.java.patch> >> >> And then most tests call into that class to turn on/off the sampling, >> allocate, etc. This has removed almost 500 lines of test code so I'm >> happy >> about that. >> >> Thanks for your changes, a bit of relics of previous versions :). I've >> already integrated them into my code and will make a new webrev end >> of this >> week with a bit of refactor of the code handling the tlab slow path. >> I find >> it could use a bit of refactoring to make it easier to follow so I'm >> going >> to take a stab at it this week. >> >> Any other issues/comments? >> >> Thanks! >> Jc >> >> >> On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn > > wrote: >> >> Hi JC, >> >> I saw a webrev.12 in the directory, with only test >> changes(11->12), so I >> took that version. >> I had a look and tested the tests, worked fine! >> >> First glance at the code (looking at full v12) some minor things >> below, >> mostly unused stuff. >> >> Thanks, Robbin >> >> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp >> --- a/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct >> 16 >> 16:54:06 2017 +0200 >> +++ b/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct >> 16 >> 17:42:42 2017 +0200 >> @@ -211,2 +211,3 @@ >> void initialize(int max_storage) { >> + // validate max_storage to sane value ? What would 0 mean ? >> MutexLocker mu(HeapMonitor_lock); >> @@ -227,8 +228,4 @@ >> bool initialized() { return _initialized; } >> - volatile bool *initialized_address() { return &_initialized; } >> >> private: >> - // Protects the traces currently sampled (below). >> - volatile intptr_t _stack_storage_lock[1]; >> - >> // The traces currently sampled. >> @@ -313,3 +310,2 @@ >> _initialized(false) { >> - _stack_storage_lock[0] = 0; >> } >> @@ -532,13 +528,2 @@ >> >> -// Delegate the initialization question to the underlying >> storage system. >> -bool HeapMonitoring::initialized() { >> - return StackTraceStorage::storage()->initialized(); >> -} >> - >> -// Delegate the initialization question to the underlying >> storage system. 
>> -bool *HeapMonitoring::initialized_address() { >> - return >> - const_cast(StackTraceS >> torage::storage()->initialized_address()); >> -} >> - >> void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) >> { >> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp >> --- a/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct >> 16 >> 16:54:06 2017 +0200 >> +++ b/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct >> 16 >> 17:42:42 2017 +0200 >> @@ -35,3 +35,2 @@ >> static uint64_t _rnd; >> - static bool _initialized; >> static jint _monitoring_rate; >> @@ -92,7 +91,2 @@ >> >> - // Is the profiler initialized and where is the address to the >> initialized >> - // boolean. >> - static bool initialized(); >> - static bool *initialized_address(); >> - >> // Called when o is to be sampled from a given thread and a >> given size. >> >> >> >> On 10/10/2017 12:57 AM, JC Beyler wrote: >> >> Dear all, >> >> Thread-safety is back!! Here is the update webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/ >> >> >> Full webrev is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/ >> >> >> In order to really test this, I needed to add this so thought >> now >> was a good time. It required a few changes here for the >> creation to >> ensure correctness and safety. Now we keep the static pointer >> but >> clear the data internally so on re-initialize, it will be a >> bit more >> costly than before. I don't think this is a huge use-case so >> I did >> not think it was a problem. I used the internal MutexLocker, >> I think >> I used it well, let me know. >> >> I also added three tests: >> >> 1) Stack depth test: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorStackDepthTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStackDepthTest.java.patch> >> >> This test shows that the maximum stack depth system is >> working. >> >> 2) Thread safety: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorThreadTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorThreadTest.java.patch> >> >> The test creates 24 threads and they all allocate at the same >> time. >> The test then checks it does find samples from all the >> threads. >> >> 3) Thread on/off safety >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/tes >> t/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/H >> eapMonitorThreadOnOffTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorThreadOnOffTest.java.patch> >> >> The test creates 24 threads that all allocate a bunch of >> memory. >> Then another thread turns the sampling on/off. >> >> Btw, both tests 2 & 3 failed without the locks. >> >> As I worked on this, I saw a lot of places where the tests >> are doing >> very similar things, I'm going to clean up the code a bit and >> make a >> HeapAllocator class that all tests can call directly. This >> will >> greatly simplify the code. >> >> Thanks for any comments/criticisms! 
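The MutexLocker change JC describes above boils down to a pattern like the following sketch. StackTraceStorage, initialize(int max_storage), HeapMonitor_lock and _initialized appear in the quoted diff; the two helper calls are placeholders, not the webrev code:

    // (Re)initialization keeps the static storage pointer but clears its
    // contents under the lock, so toggling sampling on/off cannot race with
    // threads that are currently recording samples.
    void StackTraceStorage::initialize(int max_storage) {
      MutexLocker mu(HeapMonitor_lock);
      release_old_entries();           // placeholder: drop data from a previous session
      allocate_storage(max_storage);   // placeholder: set up the new sample arrays
      _initialized = true;
    }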
>> Jc >> >> >> On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler < >> jcbeyler at google.com >> > >> wrote: >> >> Dear all, >> >> Small update to the webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/ >> >> > > >> >> Full webrev is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/ >> >> > > >> >> I updated a bit of the naming, removed a TODO comment, >> and I >> added a test for testing the sampling rate. I also updated the >> maximum stack depth to 1024, there is no >> reason to keep it so small. I did a micro benchmark that >> tests >> the overhead and it seems relatively the same. >> >> I compared allocations from a stack depth of 10 and >> allocations >> from a stack depth of 1024 (allocations are from the same >> helper >> method in >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_fi >> les/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/ >> MyPackage/HeapMonitorStatRateTest.java >> > iles/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor >> /MyPackage/HeapMonitorStatRateTest.java> >> > asbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/se >> rviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java >> > iles/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor >> /MyPackage/HeapMonitorStatRateTest.java>>): >> - For an array of 1 integer allocated in a >> loop; >> stack depth 1024 vs stack depth 10: 1% slower >> - For an array of 200k integers allocated in >> a loop; >> stack depth 1024 vs stack depth 10: 3% slower >> >> So basically now moving the maximum stack depth to 1024 >> but we >> only copy over the stack depths actually used. >> >> For the next webrev, I will be adding a stack depth test >> to >> show that it works and probably put back the mutex locking so >> that >> we can see how difficult it is to keep >> thread safe. >> >> Let me know what you think! >> Jc >> >> >> >> On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler < >> jcbeyler at google.com >> > >> wrote: >> >> Forgot to say that for my numbers: >> - Not in the test are the actual numbers I got for >> the >> various array sizes, I ran the program 30 times and parsed the >> output; here are the averages and standard >> deviation: >> 1000: 1.28% average; 1.13% standard >> deviation >> 10000: 1.59% average; 1.25% standard >> deviation >> 100000: 1.26% average; 1.26% standard >> deviation >> >> The 1000/10000/100000 are the sizes of the arrays >> being >> allocated. These are allocated 100k times and the sampling >> rate is >> 111 times the size of the array. >> >> Thanks! >> Jc >> >> >> On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler >> >> >> >> wrote: >> >> Hi all, >> >> After a bit of a break, I am back working on >> this :). 
>> As before, here are two webrevs: >> >> - Full change set: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/ >> >> > > >> - Compared to version 8: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/ >> >> > > >> (This version is compared to version 8 I >> last >> showed but ported to the new folder hierarchy) >> >> In this version I have: >> - Handled Thomas' comments from his email of >> 07/03: >> - Merged the logging to be standard >> - Fixed up the code a bit where asked >> - Added some notes about the code not >> being >> thread-safe yet >> - Removed additional dead code from the >> version >> that modifies interpreter/c1/c2 >> - Fixed compiler issues so that it compiles >> with >> --disable-precompiled-header >> - Tested with ./configure >> --with-boot-jdk= --with-debug-level=slowdebug >> --disable-precompiled-headers >> >> Additionally, I added a test to check the sanity >> of the >> sampler: HeapMonitorStatCorrectnessTest >> (http://cr.openjdk.java.net/~r >> asbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceabilit >> y/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStatCorrectnessTest.java.patch> >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStatCorrectnessTest.java.patch >> > st/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/ >> HeapMonitorStatCorrectnessTest.java.patch>>) >> - This allocates a number of arrays and >> checks that >> we obtain the number of samples we want with an accepted >> error of >> 5%. I tested it 100 times and it >> passed everytime, I can test more if wanted >> - Not in the test are the actual numbers I >> got for >> the various array sizes, I ran the program 30 times and >> parsed the >> output; here are the averages and >> standard deviation: >> 1000: 1.28% average; 1.13% standard >> deviation >> 10000: 1.59% average; 1.25% standard >> deviation >> 100000: 1.26% average; 1.26% standard >> deviation >> >> What this means is that we were always at about >> 1~2% of >> the number of samples the test expected. >> >> Let me know what you think, >> Jc >> >> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler >> >> >> >> wrote: >> >> Hi all, >> >> I apologize, I have not yet handled your >> remarks >> but thought this new webrev would also be useful to see and >> comment >> on perhaps. >> >> Here is the latest webrev, it is generated >> slightly >> different than the others since now I'm using webrev.ksh >> without the >> -N option: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/ >> >> > > >> >> And the webrev.07 to webrev.08 diff is here: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/ >> >> > > >> >> (Let me know if it works well) >> >> It's a small change between versions but it: >> - provides a fix that makes the average >> sample >> rate correct (more on that below). >> - fixes the code to actually have it play >> nicely >> with the fast tlab refill >> - cleaned up a bit the JVMTI text and now >> use >> jvmtiFrameInfo >> - moved the capability to be onload solo >> >> With this webrev, I've done a small study of >> the >> random number generator we use here for the sampling rate. I >> took a >> small program and it can be simplified to: >> >> for (outer loop) >> for (inner loop) >> int[] tmp = new int[arraySize]; >> >> - I've fixed the outer and inner loops to >> being 800 >> for this experiment, meaning we allocate 640000 times an >> array of a >> given array size. 
>> >> - Each program provides the average sample >> size >> used for the whole execution >> >> - Then, I ran each variation 30 times and >> then >> calculated the average of the average sample size used for >> various >> array sizes. I selected the array size to >> be one of the following: 1, 10, 100, 1000. >> >> - When compared to 512kb, the average sample >> size >> of 30 runs: >> 1: 4.62% of error >> 10: 3.09% of error >> 100: 0.36% of error >> 1000: 0.1% of error >> 10000: 0.03% of error >> >> What it shows is that, depending on the >> number of >> samples, the average does become better. This is because with >> an >> allocation of 1 element per array, it >> will take longer to hit one of the >> thresholds. This >> is seen by looking at the sample count statistic I put in. >> For the >> same number of iterations (800 * >> 800), the different array sizes provoke: >> 1: 62 samples >> 10: 125 samples >> 100: 788 samples >> 1000: 6166 samples >> 10000: 57721 samples >> >> And of course, the more samples you have, >> the more >> sample rates you pick, which means that your average gets >> closer >> using that math. >> >> Thanks, >> Jc >> >> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler >> >> >> >> wrote: >> >> Thanks Robbin, >> >> This seems to have worked. When I have >> the next >> webrev ready, we will find out but I'm fairly confident it >> will work! >> >> Thanks agian! >> Jc >> >> On Wed, Jun 28, 2017 at 11:46 PM, Robbin >> Ehn >> >> >> >> wrote: >> >> Hi JC, >> >> On 06/29/2017 12:15 AM, JC Beyler >> wrote: >> >> B) Incremental changes >> >> >> I guess the most common work flow >> here is >> using mq : >> hg qnew fix_v1 >> edit files >> hg qrefresh >> hg qnew fix_v2 >> edit files >> hg qrefresh >> >> if you do hg log you will see 2 >> commits >> >> webrev.ksh -r -2 -o my_inc_v1_v2 >> webrev.ksh -o my_full_v2 >> >> >> In your .hgrc you might need: >> [extensions] >> mq = >> >> /Robbin >> >> >> Again another newbiew question >> here... >> >> For showing the incremental >> changes, is >> there a link that explains how to do that? I apologize for my >> newbie >> questions all the time :) >> >> Right now, I do: >> >> ksh ../webrev.ksh -m -N >> >> That generates a webrev.zip and >> send it >> to Chuck Rasbold. He then uploads it to a new webrev. >> >> I tried commiting my change and >> adding >> a small change. Then if I just do ksh ../webrev.ksh without >> any >> options, it seems to produce a similar >> page but now with only the >> changes I >> had (so the 06-07 comparison you were talking about) and a >> changeset >> that has it all. I imagine that is >> what you meant. >> >> Which means that my workflow >> would become: >> >> 1) Make changes >> 2) Make a webrev without any >> options to >> show just the differences with the tip >> 3) Amend my changes to my local >> commit >> so that I have it done with >> 4) Go to 1 >> >> Does that seem correct to you? >> >> Note that when I do this, I only >> see >> the full change of a file in the full change set (Side note >> here: >> now the page says change set and not >> patch, which is maybe why >> Serguei was >> having issues?). >> >> Thanks! 
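One note on the sampling-rate numbers earlier in this message: the averages converging towards the configured rate as the sample count grows is what you would expect if every sampling interval is drawn independently around that rate. A typical way such samplers pick the next interval is an exponential draw, for example (purely illustrative, not the webrev code):

    #include <cmath>
    #include <cstdlib>

    // Pick the next sampling point so that intervals average 'rate' bytes.
    // With few samples the empirical mean is noisy; it tightens as more
    // intervals are drawn, which matches the 1-element vs 10000-element
    // array numbers reported above.
    static size_t next_sample_interval(size_t rate) {
      double u = (std::rand() + 1.0) / ((double)RAND_MAX + 2.0);  // uniform in (0,1)
      return (size_t)(-std::log(u) * (double)rate);
    }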
>> Jc >> >> >> >> On Wed, Jun 28, 2017 at 1:12 AM, >> Robbin >> Ehn >> > >> >> > >>> wrote: >> >> Hi, >> >> On 06/28/2017 12:04 AM, JC >> Beyler >> wrote: >> >> Dear Thomas et al, >> >> Here is the newest >> webrev: >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ >> >> > > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/ >> >> > >> >> >> >> >> You have some more bits to >> in >> there but generally this looks good and really nice with more >> tests. >> I'll do and deep dive and >> re-test >> this when I get back from my long vacation with whatever patch >> version you have then. >> >> Also I think it's time you >> provide >> incremental (v06->07 changes) as well as complete change-sets. >> >> Thanks, Robbin >> >> >> >> >> Thomas, I "think" I have >> answered all your remarks. The summary is: >> >> - The statistic system >> is up >> and provides insight on what the heap sampler is doing >> - I've noticed >> that, >> though the sampling rate is at the right mean, we are missing >> some >> samples, I have not yet tracked out why >> (details below) >> >> - I've run a tiny >> benchmark >> that is the worse case: it is a very tight loop and allocated >> a >> small array >> - In this case, I >> see no >> overhead when the system is off so that is a good start :) >> - I see right now >> a high >> overhead in this case when sampling is on. This is not a >> really too >> surprising but I'm going to see if >> this is consistent with our >> internal >> implementation. The >> benchmark is really allocation stressful so I'm not too >> surprised >> but I want to do the due diligence. >> >> - The statistic >> system up >> is up and I have a new test >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> >> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test >> /serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatT >> est.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch >> >>> >> - I did a bit of >> a study >> about the random generator here, more details are below but >> basically it seems to work well >> >> - I added a >> capability but >> since this is the first time doing this, I was not sure I did >> it right >> - I did add a test >> though >> for it and the test seems to do what I expect (all methods are >> failing with the >> JVMTI_ERROR_MUST_POSSESS_CAPABILITY >> error). 
>> - >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch>> >> >> > asbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonit >> or/MyPackage/HeapMonitorNoCapabilityTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/ >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch >> > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapa >> bilityTest.java.patch>>> >> >> - I still need to >> figure >> out what to do about the multi-agent vs single-agent issue >> >> - As far as >> measurements, >> it seems I still need to look at: >> - Why we do the 20 >> random >> calls first, are they necessary? >> - Look at the mean >> of the >> sampling rate that the random generator does and also what is >> actually sampled >> - What is the >> overhead in >> terms of memory/performance when on? >> >> I have inlined my >> answers, I >> think I got them all in the new webrev, let me know your >> thoughts. >> >> Thanks again! >> Jc >> >> >> On Fri, Jun 23, 2017 at >> 3:52 >> AM, Thomas Schatzl > > thomas.schatzl at oracle.com >> > >> > com >> > thomas.schatzl at oracle.com >> >> >> > thomas.schatzl at oracle.com> >> > thomas.schatzl at oracle.com>> >> >> > thomas.schatzl at oracle.com >> > >>>> wrote: >> >> Hi, >> >> On Wed, 2017-06-21 >> at >> 13:45 -0700, JC Beyler wrote: >> > Hi all, >> > >> > First off: >> Thanks again >> to Robbin and Thomas for their reviews :) >> > >> > Next, I've >> uploaded a >> new webrev: >> > >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >> >> > > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >> >> > >> >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >> >> > > >> < >> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/ >> >> > >>> >> >> > >> > Here is an >> update: >> > >> > - @Robbin, I >> forgot to >> say that yes I need to look at implementing >> > this for the >> other >> architectures and testing it before it is all >> > ready to go. Is >> it >> common to have it working on all possible >> > combinations or >> is >> there a subset that I should be doing first and we > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug.simon at oracle.com Wed Oct 25 17:23:16 2017 From: doug.simon at oracle.com (Doug Simon) Date: Wed, 25 Oct 2017 19:23:16 +0200 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <4D25A442-F0BD-41D1-837B-1115A165EA78@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <8150BCA2-3C50-4624-98ED-D914D51A5C6B@oracle.com> <4D25A442-F0BD-41D1-837B-1115A165EA78@oracle.com> Message-ID: Thanks for the explanation - maybe you could add it to code as a comment. 
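For reference, the comment Doug is asking for would essentially capture Igor's explanation from earlier in the thread. Spelled out next to the quoted lines it might read roughly as follows (a sketch, not the committed HotSpotMethodData.java text):

    // totalCount is the sum of every counter in the profile (all recorded
    // types plus the not-recorded bucket). For a statically bindable call
    // site C1 only bumps the total counter, so a single-entry profile gets
    // the whole total attributed to its one recorded type.
    if (entries == 1) {
      counts[0] = totalCount;
    }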
Sent from my iPhone > On 25 Oct 2017, at 6:13 pm, Igor Veresov wrote: > > > >> On Oct 25, 2017, at 3:07 AM, Doug Simon wrote: >> >> 581 >> 582 // Fixup the case of C1's inability to optimize profiling of a statically bindable call site >> 583 if (entries == 1) { >> 584 counts[0] = totalCount; >> 585 } >> 586 >> But what happens if we're looking at a profile from the interpreter? In that case, won't totalCount == 0 && counts[0] have the right value? In which case, the above fixup will lose this information. Maybe it should be: >> >> counts[0] += totalCount; > > > If it?s pure interpreter you?d have entries == 0, so this fixup won?t fire. Also totalCount at the point of the fixup is a sum of every counter in profile (for all the types + the counter for types that weren?t recorded). So what the fixup does is that it attributes all the counts to the first type (if it?s a monomorphic call site). > > igor > >> >> -Doug >> >>> On 25 Oct 2017, at 05:52, Igor Veresov wrote: >>> >>> This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. >>> >>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>> >>> Thanks, >>> igor >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Oct 25 17:29:36 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Oct 2017 10:29:36 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> Message-ID: Igor Can you factor out checks into boolean function in shared place? May be move some surrounding code into it too - I see the same code on all platforms. Thanks, Vladimir On 10/24/17 8:52 PM, Igor Veresov wrote: > This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. > > Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 > > Thanks, > igor > From vladimir.kozlov at oracle.com Wed Oct 25 19:07:26 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Oct 2017 12:07:26 -0700 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: <3ee0024d-6af5-afe9-8127-0dc2cc3a1711@oracle.com> References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> <3ee0024d-6af5-afe9-8127-0dc2cc3a1711@oracle.com> Message-ID: <43716769-5d0a-f663-ef8e-c6da60346ac8@oracle.com> Hi Nils, "On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will also hit asserts in a few places like" MaxVectorSize was designed to limit vector size for testing purpose. 
I just run compiler/codegen jtreg tests, which includes vector tests, on avx2 Intel machine with -XX:MaxVectorSize=16 and did not hit any problems. I looked and did not find what mismatch you are talking about: "In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has effects on specific vector registers. Here we have a mismatch." C2 should not generate vector with size > MaxVectorSize so they should not be any instructions in .ad file which conflict with it. Can you show output of -Xlog:os+cpu on your machine? vector_size_supported() takes into account MaxVectorSize: static const bool vector_size_supported(const BasicType bt, int size) { return (Matcher::max_vector_size(bt) >= size && Matcher::min_vector_size(bt) <= size); } const int Matcher::max_vector_size(const BasicType bt) { return vector_width_in_bytes(bt)/type2aelembytes(bt); } const int Matcher::vector_width_in_bytes(BasicType bt) { ... // Use flag to limit vector size. size = MIN2(size,(int)MaxVectorSize); Thanks, Vladimir On 10/25/17 6:13 AM, Nils Eliasson wrote: > Deans suggestion with making the TypeVect initialization unconditional also removes all platform dependencies on type.cpp: > > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev.02/ > > Regards, > Nils > > On 2017-10-25 00:02, Vladimir Kozlov wrote: >> We can't use platform specific UseAVX flag in shared code in type.cpp. >> >> I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. >> And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 and corresponding vectors 32 and 64 bytes. >> If AMD's Instructions Set before 17h does not support whole 32 bytes vectors we can't call it AVX. >> >> Thanks, >> Vladimir >> >> On 10/18/17 10:01 AM, dean.long at oracle.com wrote: >>> How about initializing TypeVect::VECTY and friends unconditionally? I am nervous about exchanging one guarding condition for another. >>> >>> dl >>> >>> >>> On 10/18/17 1:03 AM, Nils Eliasson wrote: >>>> >>>> HI, >>>> >>>> I ran into a problem with the interaction between MaxVectorSize and the UseAVX. For some AMD CPUs we limit the vector size to 16 because it gives the best performance. >>>> >>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>> ?????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> ???? } >>>> >>>> Whenf MaxVecorSize is set to 16 it has the sideeffect that the TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though the platform has the capability. >>>> >>>> Type.cpp:~660 >>>> >>>> [...] >>>> >?? if (Matcher::vector_size_supported(T_FLOAT,8)) { >>>> >???? TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >>>> >?? } >>>> [...] >>>> >?? mreg2type[Op_VecY] = TypeVect::VECTY; >>>> >>>> >>>> In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has effects on specific vector registers. Here we have a mismatch. >>>> >>>> On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will also hit asserts in a few places like: >>>> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), "sanity"); >>>> >>>> Shouldn't the type initialization in type.cpp be dependent on feature flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector registers are initialized if the platform supports them, >>>> but they might not be used if MaxVectorSize is limited.) 
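The mismatch Nils describes can be pictured as the difference between a feature-keyed check and a width-keyed check. A simplified illustration, not actual x86.ad predicates:

    // Feature-keyed: true on an AVX2 machine even after MaxVectorSize has
    // been clamped to 16, so anything guarded this way can still fire while
    // TypeVect::VECTY was never initialized.
    bool vecy_by_feature = (UseAVX > 1);

    // Width-keyed: folds the MaxVectorSize clamp in via
    // vector_width_in_bytes(), so it is false once vectors are limited to
    // 16 bytes - this is the guard type.cpp uses.
    bool vecy_by_width = Matcher::vector_size_supported(T_FLOAT, 8);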
>>>> >>>> This is a patch that solves the problem, but I have not convinced myself that it is the right way: >>>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>>> >>>> Feedback appreciated, >>>> >>>> Regards, >>>> Nils Eliasson >>>> >>>> >>>> >>>> >>>> >>>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>>> >>> > From martin.doerr at sap.com Wed Oct 25 19:08:59 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 25 Oct 2017 19:08:59 +0000 Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions In-Reply-To: References: Message-ID: <18ddb703d81a4a22bc97f134dd276eff@sap.com> Hi Lutz, thanks for working on vector-based enhancements and for providing this webrev. assembler_s390: -The changes in the assembler look good. s390.ad: -It doesn't make sense to load constant len to a register and generate complex compare instructions for it and still to emit code for all cases. I assume that e.g. the 4 characters cases usually have a constant length. If so, much better code could be generated for them by omitting all the stuff around the simple instructions. (ppc64.ad already contains nodes for constant length of needle in indexOf rules.) macroAssembler_s390: -Are you sure the prefetch instructions improve performance? I remember that we had them in other String intrinsics but removed them again as they showed absolutely no performance gain. -Comment: Using hardcoded vector registers is ok for now, but may need to get changed e.g. when using them for C2's SuperWord optimization. -Comment: You could use the vperm instruction instead of vo+vn, but I'm ok with the current implementation because loading a mask is much more convenient than getting the permutation vector loaded (e.g. from constant pool or pc relative). -So the new vector loop looks good to me. -In my opinion, the size of all the generated cases should be in relationship to their performance benefit. As intrinsics are not like stubs and may get inlined often, I can't get rid of the impression that generating so large code wastes valuable code cache space with questionable performance gain in real world scenarios. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Mittwoch, 25. Oktober 2017 12:02 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8189793 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189793.00/index.html Vector instructions, which have been available on System z for a while (since z13), promise noticeable performance improvements. This enhancement improves the String Compress and String Inflate intrinsics by exploiting vector instructions, when available. For long strings, up to 2x performance improvement has been observed in micro-benchmarks. Special care was taken to preserve good performance for short strings. All examined workloads showed a high ratio of short and very short strings. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dean.long at oracle.com Wed Oct 25 20:32:22 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 25 Oct 2017 13:32:22 -0700 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly In-Reply-To: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> References: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> Message-ID: <34bf5c2b-0b8e-1d9c-1b3e-e152d7e3cbf4@oracle.com> Looks OK.? It appears that only Sparc uses is_deopt_suspend(), and then only when we exit native. dl On 10/25/17 2:35 AM, jamsheed wrote: > Hi, > > request for review, > > webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ > > jbs: https://bugs.openjdk.java.net/browse/JDK-6523512 > > desc: removed the is_deopt_suspend() from > has_special_runtime_exit_condition checks > > Best regards, > > Jamsheed > From igor.veresov at oracle.com Wed Oct 25 22:16:47 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 25 Oct 2017 15:16:47 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> Message-ID: <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> Sure. I?ve updated the webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ Also added a comment in HotSpotMethodData.java per Doug?s request. igor > On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov wrote: > > Igor > > Can you factor out checks into boolean function in shared place? May be move some surrounding code into it too - I see the same code on all platforms. > > Thanks, > Vladimir > > On 10/24/17 8:52 PM, Igor Veresov wrote: >> This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. >> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >> Thanks, >> igor -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Oct 25 23:04:09 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Oct 2017 16:04:09 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> Message-ID: <83757fa4-fed9-3d65-dd91-e547e3bcac05@oracle.com> Looks good. Thanks, Vladimir On 10/25/17 3:16 PM, Igor Veresov wrote: > Sure. I?ve updated the webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ > Also added a comment in HotSpotMethodData.java per Doug?s request. > > igor > >> On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov > wrote: >> >> Igor >> >> Can you factor out checks into boolean function in shared place? May be move some surrounding code into it too - I see the same code on all platforms. >> >> Thanks, >> Vladimir >> >> On 10/24/17 8:52 PM, Igor Veresov wrote: >>> This a fix from Tom that I ported to all architectures and the new repo structure. 
While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it >>> speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. >>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>> Thanks, >>> igor > From vladimir.kozlov at oracle.com Wed Oct 25 23:21:36 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 25 Oct 2017 16:21:36 -0700 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <91ffc3a8-3c02-16f4-9e5b-051169ec19a7@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> <91ffc3a8-3c02-16f4-9e5b-051169ec19a7@oracle.com> Message-ID: <59e5b4ff-1fb3-db98-264c-ea6b6f98e526@oracle.com> Hi Roland, Tests passed. Please, send changeset with test moved into compiler/exceptions/ directory. Thanks, Vladimir On 10/24/17 2:08 PM, Vladimir Kozlov wrote: > It looks good to me too. The only issue is test's placement - /c1 subdir is nothing to do with C1 compiler. I think test should be put into compiler/exceptions/ directory. > I submitted pre-integration testing. > > Thanks, > Vladimir > > On 10/18/17 8:19 PM, dean.long at oracle.com wrote: >> Yes, but I'm not a Reviewer. >> >> dl >> >> >> On 10/18/17 7:16 AM, Roland Westrelin wrote: >>> Here is an updated webrev with Dean's suggestion: >>> >>> http://cr.openjdk.java.net/~roland/8188151/webrev.01/ >>> >>> Can this be considered reviewed by you, Dean? >>> >>> Roland. >> From igor.veresov at oracle.com Wed Oct 25 23:25:48 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 25 Oct 2017 16:25:48 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <83757fa4-fed9-3d65-dd91-e547e3bcac05@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> <83757fa4-fed9-3d65-dd91-e547e3bcac05@oracle.com> Message-ID: <147B8B6A-1BDE-42C8-BEC5-9BD6538625DC@oracle.com> Thanks! igor > On Oct 25, 2017, at 4:04 PM, Vladimir Kozlov wrote: > > Looks good. > > Thanks, > Vladimir > > On 10/25/17 3:16 PM, Igor Veresov wrote: >> Sure. I?ve updated the webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ >> Also added a comment in HotSpotMethodData.java per Doug?s request. >> igor >>> On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov > wrote: >>> >>> Igor >>> >>> Can you factor out checks into boolean function in shared place? May be move some surrounding code into it too - I see the same code on all platforms. >>> >>> Thanks, >>> Vladimir >>> >>> On 10/24/17 8:52 PM, Igor Veresov wrote: >>>> This a fix from Tom that I ported to all architectures and the new repo structure. While that fix doesn?t not solve the problem of the interpreter-C1 profiling style discrepancy completely it speeds up profiling of the statically bindable call sites and we?d like to push that. I also added a bit of a code to JVMCI to do the profile fix up analogous to what happens in CI. 
>>>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>>> Thanks, >>>> igor From rohitarulraj at gmail.com Thu Oct 26 04:48:35 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Thu, 26 Oct 2017 10:18:35 +0530 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> Message-ID: Hello Vladimir, Please find the requested details: AVX/AVX2 support availability on AMD Processors: Family 14h and earlier ? No AVX support Family 15h - (1st-gen), (2nd-gen), (3rd-gen) AVX support available, max vector width is 32 bytes (we limit the vector size to 16 bytes in openJDK). Family 16h ? AVX support available, max vector width is 32 bytes (we limit the vector size to 16 bytes in openJDK). Family 15h - (4th-gen) AVX, AVX2 support available, max vector width is 32 bytes (we limit the vector size to 16 bytes in openJDK). Family 17h ? AVX, AVX2 support available, max vector width is 32 bytes (our proposed changes have vector size set to 32 bytes in openJDK). AVX3 support is not available on AMD processors yet. >From the comments below, Dean's suggestions seems reasonable. Regards, Rohit On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Kozlov wrote: > We can't use platform specific UseAVX flag in shared code in type.cpp. > > I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. > And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 and > corresponding vectors 32 and 64 bytes. > If AMD's Instructions Set before 17h does not support whole 32 bytes > vectors we can't call it AVX. > > Thanks, > Vladimir > > On 10/18/17 10:01 AM, dean.long at oracle.com wrote: > >> How about initializing TypeVect::VECTY and friends unconditionally? I am >> nervous about exchanging one guarding condition for another. >> >> dl >> >> >> On 10/18/17 1:03 AM, Nils Eliasson wrote: >> >>> >>> HI, >>> >>> I ran into a problem with the interaction between MaxVectorSize and the >>> UseAVX. For some AMD CPUs we limit the vector size to 16 because it gives >>> the best performance. >>> >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> } >>>> >>> >>> Whenf MaxVecorSize is set to 16 it has the sideeffect that the >>> TypeVect::VECTY and mreg2type[Op_VecY] won't be initalized even though the >>> platform has the capability. >>> >>> Type.cpp:~660 >>> >>> [...] >>> > if (Matcher::vector_size_supported(T_FLOAT,8)) { >>> > TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >>> > } >>> [...] >>> > mreg2type[Op_VecY] = TypeVect::VECTY; >>> >>> >>> In the ad-files feature flags (UseAVX etc.) are used to control what >>> rules should be matched if it has effects on specific vector registers. >>> Here we have a mismatch. >>> >>> On a platform that supports AVX2 but have MaxVectorSize limited to 16, >>> the VM will fail when the TypeVect::VECTY/mreg2type[Op_VecY] is >>> uninitialized. We will also hit asserts in a few places like: >>> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), >>> "sanity"); >>> >>> Shouldn't the type initialization in type.cpp be dependent on feature >>> flag (UseAVX etc.) instead of MaxVectorSize? (The type for the vector >>> registers are initialized if the platform supports them, but they might not >>> be used if MaxVectorSize is limited.) 
>>> >>> This is a patch that solves the problem, but I have not convinced myself >>> that it is the right way: >>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>> >>> Feedback appreciated, >>> >>> Regards, >>> Nils Eliasson >>> >>> >>> >>> >>> >>> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jamsheed.c.m at oracle.com Thu Oct 26 07:18:10 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Thu, 26 Oct 2017 12:48:10 +0530 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly In-Reply-To: <34bf5c2b-0b8e-1d9c-1b3e-e152d7e3cbf4@oracle.com> References: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> <34bf5c2b-0b8e-1d9c-1b3e-e152d7e3cbf4@oracle.com> Message-ID: <112bc062-2bb4-25e4-dfc5-546e2b740de4@oracle.com> Thank you for the review, Dean Best regards, Jamsheed On Thursday 26 October 2017 02:02 AM, dean.long at oracle.com wrote: > Looks OK.? It appears that only Sparc uses is_deopt_suspend(), and > then only when we exit native. > > dl > > > On 10/25/17 2:35 AM, jamsheed wrote: >> Hi, >> >> request for review, >> >> webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ >> >> jbs: https://bugs.openjdk.java.net/browse/JDK-6523512 >> >> desc: removed the is_deopt_suspend() from >> has_special_runtime_exit_condition checks >> >> Best regards, >> >> Jamsheed >> > From tobias.hartmann at oracle.com Thu Oct 26 08:00:35 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 26 Oct 2017 10:00:35 +0200 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly In-Reply-To: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> References: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> Message-ID: <373f160f-0cd9-9bd8-e89e-7320bd342977@oracle.com> Hi Jamsheed, On 25.10.2017 11:35, jamsheed wrote: > webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ Looks good to me. Best regards, Tobias From jamsheed.c.m at oracle.com Thu Oct 26 08:57:06 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Thu, 26 Oct 2017 14:27:06 +0530 Subject: RFR [10]: 6523512 : has_special_runtime_exit_condition checks for is_deopt_suspend needlessly In-Reply-To: <373f160f-0cd9-9bd8-e89e-7320bd342977@oracle.com> References: <51f5ac65-2bbc-bac9-8671-cdb422f97dc6@oracle.com> <373f160f-0cd9-9bd8-e89e-7320bd342977@oracle.com> Message-ID: <5942a7f2-20be-d35e-3d25-c7cf599228fd@oracle.com> Thank you for the review, Tobias Best regards, Jamsheed On Thursday 26 October 2017 01:30 PM, Tobias Hartmann wrote: > Hi Jamsheed, > > On 25.10.2017 11:35, jamsheed wrote: >> webrev: http://cr.openjdk.java.net/~jcm/6523512/webrev.00/ > > Looks good to me. > > Best regards, > Tobias From jamsheed.c.m at oracle.com Thu Oct 26 13:09:49 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Thu, 26 Oct 2017 18:39:49 +0530 Subject: RFR [10]: 8185989: overview.html files should be deleted? 
Message-ID: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> Hi, request for review, jbs: https://bugs.openjdk.java.net/browse/JDK-8185989 webrev: http://cr.openjdk.java.net/~jcm/8185989/webrev.00/ desc: src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/overview.html src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/overview.html deleted Best regards, Jamsheed From tobias.hartmann at oracle.com Thu Oct 26 13:20:18 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 26 Oct 2017 15:20:18 +0200 Subject: RFR [10]: 8185989: overview.html files should be deleted? In-Reply-To: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> References: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> Message-ID: Hi Jamsheed, looks good. Best regards, Tobias On 26.10.2017 15:09, jamsheed wrote: > Hi, > > request for review, > > jbs: https://bugs.openjdk.java.net/browse/JDK-8185989 > > webrev: http://cr.openjdk.java.net/~jcm/8185989/webrev.00/ > > desc: > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/overview.html > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/overview.html > > deleted > > Best regards, > > Jamsheed > From rwestrel at redhat.com Thu Oct 26 13:59:10 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 26 Oct 2017 15:59:10 +0200 Subject: RFR(S): 8188151: "assert(entry_for(catch_pco, handler_bcis->at(i), scope_depth)->pco() == handler_pcos->at(i))" failure with C1 In-Reply-To: <59e5b4ff-1fb3-db98-264c-ea6b6f98e526@oracle.com> References: <6018da87-a940-1289-774d-f5729a5399b0@oracle.com> <8da25565-120f-cad1-98a2-eda33b8f6220@oracle.com> <91ffc3a8-3c02-16f4-9e5b-051169ec19a7@oracle.com> <59e5b4ff-1fb3-db98-264c-ea6b6f98e526@oracle.com> Message-ID: > Tests passed. Please, send changeset with test moved into compiler/exceptions/ directory. Thanks for the review and testing. Here is the changeset: http://cr.openjdk.java.net/~roland/8188151/8188151.changeset Roland. From tom.rodriguez at oracle.com Thu Oct 26 16:48:03 2017 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Thu, 26 Oct 2017 09:48:03 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> Message-ID: <59F211C3.90104@oracle.com> Sorry I'm late to this, but I don't think the HotSpotMethodData changes are correct. If you run with -XX:TypeProfileWidth=1 you'll get incorrect profiles for non-statically bindable call sites. Shouldn't it be entries == 1 && methods[0].canBeStaticallyBound()? I think the ciMethod workaround for this problem has the same issue. Also I think it would make sense to null out the entry so it looks the same as a properly profiled vfinal call site. tom Igor Veresov wrote: > Sure. I?ve updated the webrev: > http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ > Also added a comment in HotSpotMethodData.java per Doug?s request. > > igor > >> On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov >> > wrote: >> >> Igor >> >> Can you factor out checks into boolean function in shared place? May >> be move some surrounding code into it too - I see the same code on all >> platforms. >> >> Thanks, >> Vladimir >> >> On 10/24/17 8:52 PM, Igor Veresov wrote: >>> This a fix from Tom that I ported to all architectures and the new >>> repo structure. 
While that fix doesn?t not solve the problem of the >>> interpreter-C1 profiling style discrepancy completely it speeds up >>> profiling of the statically bindable call sites and we?d like to push >>> that. I also added a bit of a code to JVMCI to do the profile fix up >>> analogous to what happens in CI. >>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>> Thanks, >>> igor > From vladimir.kozlov at oracle.com Thu Oct 26 17:28:10 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 26 Oct 2017 10:28:10 -0700 Subject: RFR [10]: 8185989: overview.html files should be deleted? In-Reply-To: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> References: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> Message-ID: <2a3bb4f1-07c8-fac2-69d2-2ad7853bfd6a@oracle.com> Good. Thanks, Vladimir On 10/26/17 6:09 AM, jamsheed wrote: > Hi, > > request for review, > > jbs: https://bugs.openjdk.java.net/browse/JDK-8185989 > > webrev: http://cr.openjdk.java.net/~jcm/8185989/webrev.00/ > > desc: > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/overview.html > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/overview.html > > deleted > > Best regards, > > Jamsheed > From vladimir.kozlov at oracle.com Thu Oct 26 17:36:38 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 26 Oct 2017 10:36:38 -0700 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> Message-ID: <0bf320d1-aa35-e05c-0959-3ffa09110499@oracle.com> Thank you, Rohit Do you plan to propose changes to increase vector size to 32 for 15h and 16h? Or AMD is fine with current settings? Thanks, Vladimir On 10/25/17 9:48 PM, Rohit Arul Raj wrote: > Hello Vladimir, > > > Please find the requested details: > > > AVX/AVX2 support availability on AMD Processors: > > Family 14h and earlier ? No AVX support > > Family 15h -? (1^st -gen), (2nd-gen), (3rd-gen) AVX support available, max vector width is 32 bytes (we limit the vector > size to 16 bytes in openJDK). > > Family 16h ? AVX support available, max vector width is 32 bytes (we limit the vector size to 16 bytes in openJDK). > > Family 15h -? (4^th -gen) AVX, AVX2 support available, max vector width is 32 bytes (we limit the vector size to 16 > bytes in openJDK). > > Family 17h ? AVX, AVX2 support available, max vector width is 32 bytes (our proposed changes have vector size set to 32 > bytes in openJDK). > > AVX3 support is not available on AMD processors yet. > > > From the comments below, Dean's suggestions seems reasonable. > > Regards, > Rohit > > > On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Kozlov > wrote: > > We can't use platform specific UseAVX flag in shared code in type.cpp. > > I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. > And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 and corresponding vectors 32 and 64 bytes. > If AMD's Instructions Set before 17h does not support whole 32 bytes vectors we can't call it AVX. > > Thanks, > Vladimir > > On 10/18/17 10:01 AM, dean.long at oracle.com wrote: > > How about initializing TypeVect::VECTY and friends unconditionally?? I am nervous about exchanging one guarding > condition for another. > > dl > > > On 10/18/17 1:03 AM, Nils Eliasson wrote: > > > HI, > > I ran into a problem with the interaction between MaxVectorSize and the UseAVX. 
For some AMD CPUs we limit > the vector size to 16 because it gives the best performance. > > +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { > +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. > ?????? FLAG_SET_DEFAULT(MaxVectorSize, 16); > ???? } > > > Whenf MaxVecorSize is set to 16 it has the sideeffect that the TypeVect::VECTY and mreg2type[Op_VecY] won't > be initalized even though the platform has the capability. > > Type.cpp:~660 > > [...] > >?? if (Matcher::vector_size_supported(T_FLOAT,8)) { > >???? TypeVect::VECTY = TypeVect::make(T_FLOAT,8); > >?? } > [...] > >?? mreg2type[Op_VecY] = TypeVect::VECTY; > > > In the ad-files feature flags (UseAVX etc.) are used to control what rules should be matched if it has > effects on specific vector registers. Here we have a mismatch. > > On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail when the > TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will also hit asserts in a few places like: > assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), "sanity"); > > Shouldn't the type initialization in type.cpp be dependent on feature flag (UseAVX etc.) instead of > MaxVectorSize? (The type for the vector registers are initialized if the platform supports them, but they > might not be used if MaxVectorSize is limited.) > > This is a patch that solves the problem, but I have not convinced myself that it is the right way: > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ > > > Feedback appreciated, > > Regards, > Nils Eliasson > > > > > > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ > > > > From igor.veresov at oracle.com Thu Oct 26 19:42:52 2017 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 26 Oct 2017 12:42:52 -0700 Subject: RFR(S) 8166750: profiling handles statically bindable call sites differently than the interpreter In-Reply-To: <59F211C3.90104@oracle.com> References: <3A763D2B-7F00-4189-ADEB-084EE55F5AED@oracle.com> <1E4747CD-60A0-4254-B8CE-F88058D22281@oracle.com> <59F211C3.90104@oracle.com> Message-ID: <4B298FB6-326A-4C7F-BBF2-147DA4C4B8F2@oracle.com> Good points, since I already push it, I?ll file a new bug. igor > On Oct 26, 2017, at 9:48 AM, Tom Rodriguez wrote: > > Sorry I'm late to this, but I don't think the HotSpotMethodData changes are correct. If you run with -XX:TypeProfileWidth=1 you'll get incorrect profiles for non-statically bindable call sites. Shouldn't it be entries == 1 && methods[0].canBeStaticallyBound()? I think the ciMethod workaround for this problem has the same issue. Also I think it would make sense to null out the entry so it looks the same as a properly profiled vfinal call site. > > tom > > Igor Veresov wrote: >> Sure. I?ve updated the webrev: >> http://cr.openjdk.java.net/~iveresov/8166750/webrev.02/ >> Also added a comment in HotSpotMethodData.java per Doug?s request. >> >> igor >> >>> On Oct 25, 2017, at 10:29 AM, Vladimir Kozlov >>> > wrote: >>> >>> Igor >>> >>> Can you factor out checks into boolean function in shared place? May >>> be move some surrounding code into it too - I see the same code on all >>> platforms. >>> >>> Thanks, >>> Vladimir >>> >>> On 10/24/17 8:52 PM, Igor Veresov wrote: >>>> This a fix from Tom that I ported to all architectures and the new >>>> repo structure. While that fix doesn?t not solve the problem of the >>>> interpreter-C1 profiling style discrepancy completely it speeds up >>>> profiling of the statically bindable call sites and we?d like to push >>>> that. 
I also added a bit of a code to JVMCI to do the profile fix up >>>> analogous to what happens in CI. >>>> Webrev: http://cr.openjdk.java.net/~iveresov/8166750/webrev.01/ >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166750 >>>> Thanks, >>>> igor >> From ekaterina.pavlova at oracle.com Fri Oct 27 00:40:24 2017 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 26 Oct 2017 17:40:24 -0700 Subject: RFR(S) : 8186618 : [TESTBUG] Test applications/ctw/Modules.java doesn't have timeout and hang on windows In-Reply-To: References: Message-ID: <75f3096e-b9f6-ca2e-c336-ada8c519db3b@oracle.com> Looks good. Thanks for fixing it, -katya On 10/17/17 9:45 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >> 546 lines changed: 188 ins; 88 del; 270 mod; > > Hi all, > > could you please review this fix for ctw test? > in some configurations the test takes too much time, it also didn't have timeout, so in case of a hang, e.g. due to JDK-8189604, no one interrupted its execution. > > the fix splits the test into several tests, one for each jigsaw module, which not only improves the tests' reliability and reportability, but also speeds up the execution via parallel execution. 2 hours timeout, which is enough for each particular module, has been added to all tests. the patch also puts ctw for java.desktop and jdk.jconsole modules in the problem list on windows. > > webrev: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html > testing: applications/ctw/modules tests > JBS: https://bugs.openjdk.java.net/browse/JDK-8186618 > > Thanks, > -- Igor > From vladimir.kozlov at oracle.com Fri Oct 27 02:02:15 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 26 Oct 2017 19:02:15 -0700 Subject: [10] RFR(S) 8189064: Crash with compiler/codegen/*Vect.java on Solaris-sparc Message-ID: <8ce768c1-5bd5-f811-49ad-8b584873ca8b@oracle.com> webrev: http://cr.openjdk.java.net/~kvn/8189064/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8189064 New code from JDK-8187601 triggers an other round of loopopts to try to unroll more loops which were not vectorized. But that also trigger second round of vectorization. To avoid vectorization of already vectorized loops there is cl->is_vectorized_loop() check in SuperWord::transform_loop(). Unfortunately cl->mark_loop_vectorized() is called in SuperWord::output() under several conditions and one of them (compare vector length with unroll count) is not true on SPARC because it has very small vectors (8 bytes) as result cl->mark_loop_vectorized() is not called. The fix is unconditionally call cl->mark_loop_vectorized() when vectors are generated. I also modified JDK-8187601 changes to trigger an other round of loopopts only when main loop is not vectorized. Failed vector tests from bug report passed. I submitted pre-integration testing. Thanks, Vladimir From igor.ignatyev at oracle.com Fri Oct 27 02:44:32 2017 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 26 Oct 2017 19:44:32 -0700 Subject: RFR(S) : 8186618 : [TESTBUG] Test applications/ctw/Modules.java doesn't have timeout and hang on windows In-Reply-To: <75f3096e-b9f6-ca2e-c336-ada8c519db3b@oracle.com> References: <75f3096e-b9f6-ca2e-c336-ada8c519db3b@oracle.com> Message-ID: <79997CB7-FF94-4354-BC7E-8CE5B73BDC10@oracle.com> Katya, thank you reviewing it. can I have another review for this patch from a Reviewer? Thanks, -- Igor > On Oct 26, 2017, at 5:40 PM, Ekaterina Pavlova wrote: > > Looks good. 
> > Thanks for fixing it, > > -katya > > On 10/17/17 9:45 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >>> 546 lines changed: 188 ins; 88 del; 270 mod; >> Hi all, >> could you please review this fix for ctw test? >> in some configurations the test takes too much time, it also didn't have timeout, so in case of a hang, e.g. due to JDK-8189604, no one interrupted its execution. >> the fix splits the test into several tests, one for each jigsaw module, which not only improves the tests' reliability and reportability, but also speeds up the execution via parallel execution. 2 hours timeout, which is enough for each particular module, has been added to all tests. the patch also puts ctw for java.desktop and jdk.jconsole modules in the problem list on windows. >> webrev: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >> testing: applications/ctw/modules tests >> JBS: https://bugs.openjdk.java.net/browse/JDK-8186618 >> Thanks, >> -- Igor > From jamsheed.c.m at oracle.com Fri Oct 27 05:24:45 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Fri, 27 Oct 2017 10:54:45 +0530 Subject: RFR [10]: 8185989: overview.html files should be deleted? In-Reply-To: <2a3bb4f1-07c8-fac2-69d2-2ad7853bfd6a@oracle.com> References: <6866d765-a827-e5e0-80f2-bbf62c0a5a52@oracle.com> <2a3bb4f1-07c8-fac2-69d2-2ad7853bfd6a@oracle.com> Message-ID: Thank you for the review, Tobias, Vladimir Best regards, Jamsheed On Thursday 26 October 2017 10:58 PM, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 10/26/17 6:09 AM, jamsheed wrote: >> Hi, >> >> request for review, >> >> jbs: https://bugs.openjdk.java.net/browse/JDK-8185989 >> >> webrev: http://cr.openjdk.java.net/~jcm/8185989/webrev.00/ >> >> desc: >> >> src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/overview.html >> >> src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/overview.html >> >> deleted >> >> Best regards, >> >> Jamsheed >> From tobias.hartmann at oracle.com Fri Oct 27 07:35:58 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 27 Oct 2017 09:35:58 +0200 Subject: [10] RFR(S) 8189064: Crash with compiler/codegen/*Vect.java on Solaris-sparc In-Reply-To: <8ce768c1-5bd5-f811-49ad-8b584873ca8b@oracle.com> References: <8ce768c1-5bd5-f811-49ad-8b584873ca8b@oracle.com> Message-ID: Hi Vladimir, On 27.10.2017 04:02, Vladimir Kozlov wrote: > webrev: http://cr.openjdk.java.net/~kvn/8189064/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8189064 Looks good to me! Best regards, Tobias From vladimir.kozlov at oracle.com Fri Oct 27 08:00:45 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 27 Oct 2017 01:00:45 -0700 Subject: [10] RFR(S) 8189064: Crash with compiler/codegen/*Vect.java on Solaris-sparc In-Reply-To: References: <8ce768c1-5bd5-f811-49ad-8b584873ca8b@oracle.com> Message-ID: Thank you, Tobias Vladimir On 10/27/17 12:35 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 27.10.2017 04:02, Vladimir Kozlov wrote: >> webrev: http://cr.openjdk.java.net/~kvn/8189064/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8189064 > > Looks good to me! 
> > Best regards, > Tobias From rohitarulraj at gmail.com Fri Oct 27 11:03:54 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Fri, 27 Oct 2017 16:33:54 +0530 Subject: Reduced MaxVectorSize and vector type initialization In-Reply-To: <0bf320d1-aa35-e05c-0959-3ffa09110499@oracle.com> References: <00bae75d-e386-123d-b8b5-e7b9b3892088@oracle.com> <0bf320d1-aa35-e05c-0959-3ffa09110499@oracle.com> Message-ID: Hello Vladimir, We are fine with the current settings. Thanks, Rohit On Thu, Oct 26, 2017 at 11:06 PM, Vladimir Kozlov < vladimir.kozlov at oracle.com> wrote: > Thank you, Rohit > > Do you plan to propose changes to increase vector size to 32 for 15h and > 16h? Or AMD is fine with current settings? > > Thanks, > Vladimir > > On 10/25/17 9:48 PM, Rohit Arul Raj wrote: > >> Hello Vladimir, >> >> >> Please find the requested details: >> >> >> AVX/AVX2 support availability on AMD Processors: >> >> Family 14h and earlier ? No AVX support >> >> Family 15h - (1^st -gen), (2nd-gen), (3rd-gen) AVX support available, >> max vector width is 32 bytes (we limit the vector size to 16 bytes in >> openJDK). >> >> Family 16h ? AVX support available, max vector width is 32 bytes (we >> limit the vector size to 16 bytes in openJDK). >> >> Family 15h - (4^th -gen) AVX, AVX2 support available, max vector width >> is 32 bytes (we limit the vector size to 16 bytes in openJDK). >> >> Family 17h ? AVX, AVX2 support available, max vector width is 32 bytes >> (our proposed changes have vector size set to 32 bytes in openJDK). >> >> AVX3 support is not available on AMD processors yet. >> >> >> From the comments below, Dean's suggestions seems reasonable. >> >> Regards, >> Rohit >> >> >> On Wed, Oct 25, 2017 at 3:32 AM, Vladimir Kozlov < >> vladimir.kozlov at oracle.com > wrote: >> >> We can't use platform specific UseAVX flag in shared code in type.cpp. >> >> I would say we should not support AVX (set UseAVX to 0) on AMD < 17h. >> And we need to ask AMD which AMD cpus are supporting AVX, AVX2, AVX3 >> and corresponding vectors 32 and 64 bytes. >> If AMD's Instructions Set before 17h does not support whole 32 bytes >> vectors we can't call it AVX. >> >> Thanks, >> Vladimir >> >> On 10/18/17 10:01 AM, dean.long at oracle.com > dean.long at oracle.com> wrote: >> >> How about initializing TypeVect::VECTY and friends >> unconditionally? I am nervous about exchanging one guarding >> condition for another. >> >> dl >> >> >> On 10/18/17 1:03 AM, Nils Eliasson wrote: >> >> >> HI, >> >> I ran into a problem with the interaction between >> MaxVectorSize and the UseAVX. For some AMD CPUs we limit >> the vector size to 16 because it gives the best performance. >> >> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> + // Limit vectors size to 16 bytes on AMD cpus < >> 17h. >> FLAG_SET_DEFAULT(MaxVectorSize, 16); >> } >> >> >> Whenf MaxVecorSize is set to 16 it has the sideeffect that >> the TypeVect::VECTY and mreg2type[Op_VecY] won't >> be initalized even though the platform has the capability. >> >> Type.cpp:~660 >> >> [...] >> > if (Matcher::vector_size_supported(T_FLOAT,8)) { >> > TypeVect::VECTY = TypeVect::make(T_FLOAT,8); >> > } >> [...] >> > mreg2type[Op_VecY] = TypeVect::VECTY; >> >> >> In the ad-files feature flags (UseAVX etc.) are used to >> control what rules should be matched if it has >> effects on specific vector registers. Here we have a mismatch. 
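To put the two guards side by side (the type.cpp lines are the ones quoted above; the predicate line is a typical x86.ad guard given only as an example, not quoted from the webrev):

    // type.cpp - the 32-byte vector type is only created when MaxVectorSize allows it
    if (Matcher::vector_size_supported(T_FLOAT, 8)) {   // false once MaxVectorSize is capped at 16
      TypeVect::VECTY = TypeVect::make(T_FLOAT, 8);
    }
    // ...
    mreg2type[Op_VecY] = TypeVect::VECTY;               // stays NULL if the branch above was skipped

    // x86.ad - VecY rules are instead guarded by the CPU feature, e.g.
    //   predicate(UseAVX > 0 && n->as_Vector()->length() == 8);

so on an AVX2-capable CPU with MaxVectorSize limited to 16, the feature-flag side and the type-initialization side disagree.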
>> >> On a platform that supports AVX2 but have MaxVectorSize >> limited to 16, the VM will fail when the >> TypeVect::VECTY/mreg2type[Op_VecY] is uninitialized. We will >> also hit asserts in a few places like: >> assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), >> "sanity"); >> >> Shouldn't the type initialization in type.cpp be dependent on >> feature flag (UseAVX etc.) instead of >> MaxVectorSize? (The type for the vector registers are >> initialized if the platform supports them, but they >> might not be used if MaxVectorSize is limited.) >> >> This is a patch that solves the problem, but I have not >> convinced myself that it is the right way: >> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >> >> >> Feedback appreciated, >> >> Regards, >> Nils Eliasson >> >> >> >> >> >> http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From lutz.schmidt at sap.com Fri Oct 27 11:06:50 2017 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 27 Oct 2017 11:06:50 +0000 Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions In-Reply-To: <18ddb703d81a4a22bc97f134dd276eff@sap.com> References: <18ddb703d81a4a22bc97f134dd276eff@sap.com> Message-ID: <6F73CAE2-2FEC-4BC0-9F3A-FEE9748EB694@sap.com> Hi Martin, Thanks for reviewing my change! This is a preliminary response just to let you know I?m working on the change. I?m putting a lot of effort in producing reliable performance measurement data. Turns out this is not easy (to be more honest: almost impossible). s390.ad: You are absolutely right, the sequence load_const/string_compress makes no sense at all. But it does not hurt either ? I could not find one match in all tests I ran. -> Match rule deleted. macroAssembler_s390: prefetch: did not see impact, neither positive nor negative. Artificial micro benchmarks will not benefit (data is in cache anyway). More complex benchmarks show measurement noise which covers the possible prefetch benefit. -> prefetch deleted. Hardcoded vector registers: you are right. There are some design decisions pending, e.g. how many vector scratch registers? Vperm instruction: using that is just another implementation variant that could save the vn vector instruction. On the other hand, loading the index vector is a (compared to vgmh) costly memory access. Given the fact that we mostly deal with short strings, initialization effort is relevant. Code size vs. performance: the old, well known, often discussed tradeoff. Starting from the existing implementation, I invested quite some time in optimizing the (len <= 8) cases. With every refinement step I saw (or believed to see (measurement noise)) some improvement ? or discarded it. Is the overall improvement worth the larger code size? -> tradeoff, discussion. Best Regards, Lutz On 25.10.2017, 21:08, "Doerr, Martin" > wrote: Hi Lutz, thanks for working on vector-based enhancements and for providing this webrev. assembler_s390: -The changes in the assembler look good. s390.ad: -It doesn't make sense to load constant len to a register and generate complex compare instructions for it and still to emit code for all cases. I assume that e.g. the 4 characters cases usually have a constant length. If so, much better code could be generated for them by omitting all the stuff around the simple instructions. (ppc64.ad already contains nodes for constant length of needle in indexOf rules.) 
macroAssembler_s390: -Are you sure the prefetch instructions improve performance? I remember that we had them in other String intrinsics but removed them again as they showed absolutely no performance gain. -Comment: Using hardcoded vector registers is ok for now, but may need to get changed e.g. when using them for C2's SuperWord optimization. -Comment: You could use the vperm instruction instead of vo+vn, but I'm ok with the current implementation because loading a mask is much more convenient than getting the permutation vector loaded (e.g. from constant pool or pc relative). -So the new vector loop looks good to me. -In my opinion, the size of all the generated cases should be in relationship to their performance benefit. As intrinsics are not like stubs and may get inlined often, I can't get rid of the impression that generating so large code wastes valuable code cache space with questionable performance gain in real world scenarios. Best regards, Martin From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Schmidt, Lutz Sent: Mittwoch, 25. Oktober 2017 12:02 To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(L): 8189793: [s390]: Improve String compress/inflate by exploiting vector instructions Dear all, I would like to request reviews for this s390-only enhancement: Bug: https://bugs.openjdk.java.net/browse/JDK-8189793 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8189793.00/index.html Vector instructions, which have been available on System z for a while (since z13), promise noticeable performance improvements. This enhancement improves the String Compress and String Inflate intrinsics by exploiting vector instructions, when available. For long strings, up to 2x performance improvement has been observed in micro-benchmarks. Special care was taken to preserve good performance for short strings. All examined workloads showed a high ratio of short and very short strings. Thank you! Lutz Dr. Lutz Schmidt | SAP JVM | PI SAP CP Core | T: +49 (6227) 7-42834 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Oct 27 15:50:16 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 27 Oct 2017 08:50:16 -0700 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> Message-ID: <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> I ran pre-integration testing with latest webrev.01 and it passed. But, give me more time to look though changes. Thanks, Vladimir On 10/25/17 7:29 AM, Roland Westrelin wrote: > > Hi Vladimir, > > Thanks for looking at this. > >> Did you consider less intrusive approach by adding branch over >> SafePoint with masking on index variable? >> >> int mask = LoopStripMiningMask * inc; // simplified >> for (int i = start; i < stop; i += inc) { >> // body >> if (i & mask != 0) continue; >> safepoint; >> } >> >> Or may be doing it inside .ad file in new SafePoint node >> implementation so that ideal graph is not affected. > > We're looking for the best trade off between latency and thoughput: we > want the safepoint poll overhead to be entirely eliminated even when the > safepoint doesn't trigger. > >> I am concern that suggested changes may affect Range Check elimination >> (you changed limit to variable value/flag) in addition to complexity >> of changes which may affect stability of C2. 
> > The CountedLoop that is created with my patch is strictly identical to > the CountedLoop created today with -UseCountedLoopSafepoints. Bounds are > not changed at that time. They are left as they are today. The > difference, with loop strip mining, is that the counted loop has a > skeleton outer loop. The bounds of the counted loop are adjusted once > loop opts are over. If the counted loop has a predicate, the predicate > is moved out of loop just as it is today. The only difference with > today, is that the predicate should be moved out of the outer loop. If a > pre and post loop needs to be created, then the only difference with > today is that the clones need to be moved out of the outer loop and > logic that locate the pre from the main loop need to account for the > outer loop. > > It's obviously a complex change so if your primary concern is stability > then loop strip mining can be disabled by default. Assuming strip mining > off, then that patch is mostly some code refactoring and some logic that > never triggers. > > Roland. > From rwestrel at redhat.com Fri Oct 27 16:09:10 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 27 Oct 2017 18:09:10 +0200 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> Message-ID: > I ran pre-integration testing with latest webrev.01 and it passed. > But, give me more time to look though changes. Sure. Thanks for testing it. Roland. From maaartinus at gmail.com Fri Oct 27 19:46:06 2017 From: maaartinus at gmail.com (Martin Grajcar) Date: Fri, 27 Oct 2017 21:46:06 +0200 Subject: Vectorized Loop Unrolling on x64? In-Reply-To: References: <1302875736.3225693.1508835938207.ref@mail.yahoo.com> <1302875736.3225693.1508835938207@mail.yahoo.com> <40d0c98b-e946-404e-8506-9e031fe469b4@oracle.com> <1933684779.3254078.1508840637072@mail.yahoo.com> <354890084.3509873.1508863577533@mail.yahoo.com> Message-ID: IIUIC the code on slide 90 is slow due to data dependencies as the only accumulator sum is the bottleneck. Some very long time ago, I played with unrolling it manually using multiple accumulators and gained a factor of maybe 3. But this is well-known, so I wonder what am I missing? IMHO there's no reason why sum += A[i] should be slower than B[i] += A[i] assuming a sufficient iteration count. On Tue, Oct 24, 2017 at 7:20 PM, Vladimir Sitnikov < sitnikov.vladimir at gmail.com> wrote: > Just in case, here's Vladimir Ivanov's vectorization talk: *http://2017.jpoint.ru/en/talks/vector-programming-in-java/ > * > Slide 89 describes sum misundervectorization. > > Vladimir > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug.simon at oracle.com Fri Oct 27 21:05:17 2017 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 27 Oct 2017 23:05:17 +0200 Subject: RFR: 8188102: [JVMCI] Convert special JVMCI oops in nmethod to jweak values Message-ID: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> Please review this change that converts the JVMCI-specific object references in nmethod from oops to weak values. This removes GC API extensions added purely for these fields (e.g. so that G1 can insert it into the right remembered set, and when unloading an nmethod, to go and remove the nmethod from that remembered set). 
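To make the mechanics concrete, a minimal sketch of what the conversion amounts to (the setter name below is made up for illustration and is not lifted from the webrev; JNIHandles::make_weak_global, destroy_weak_global and resolve are the existing JNI handle APIs):

    // The nmethod field becomes a weak JNI handle instead of a strong oop:
    //   jweak _jvmci_installed_code;    // was: oop _jvmci_installed_code;

    void nmethod::set_jvmci_installed_code(oop code) {  // illustrative helper, not from the webrev
      if (_jvmci_installed_code != NULL) {
        JNIHandles::destroy_weak_global(_jvmci_installed_code);
        _jvmci_installed_code = NULL;
      }
      if (code != NULL) {
        _jvmci_installed_code = JNIHandles::make_weak_global(Handle(Thread::current(), code));
      }
    }

Readers of the field then go through JNIHandles::resolve(_jvmci_installed_code), which returns NULL once the referent has been collected, so the field no longer needs its own oops_do or remembered-set handling.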
Testing: I've run the Graal unit tests (mx unittest --verbose --gc-after-test -Xlog:class+unload=trace) which trigger a lot of nmethod unloading. https://bugs.openjdk.java.net/browse/JDK-8188102 http://cr.openjdk.java.net/~dnsimon/8188102/ From vladimir.kozlov at oracle.com Fri Oct 27 21:19:45 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 27 Oct 2017 14:19:45 -0700 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> Message-ID: <0972a0db-2115-daa3-9990-7d58915a74a5@oracle.com> First observations. src/hotspot/share/opto/c2_globals.hpp We have uint and int types for flags now. Don't use uintx, which is 64-bit. src/hotspot/share/runtime/arguments.cpp I agree that UseCountedLoopSafepoints should enable strip mining by default. I am concern about enabling UseCountedLoopSafepoints by default. I will look on performance data late. But for regular/nightly testing we need to add special testing with it on and off. src/hotspot/share/opto/loopnode.hpp Should we just make _loop_flags field type uint (32-bit) since we hit 16-bit limit? There is confusion (because you did not have enough bits?) about which loops are marked as strip_mined. I thought it is only inner loop but it looks like out (skeleton) loop also marked as such. I would suggest to mark them differently. I was thinking may be we should create new Loop node subclass for outer loop. Then you don't need special flag for it and it will be obvious what they are in Ideal Graph. The same for outer loop end node. src/hotspot/share/opto/superword.cpp Where next change come from? + if (t2->Opcode() == Op_AddI && t2 == _lp->as_CountedLoop()->incr()) continue; // don't mess with the iv Thanks, Vladimir On 10/27/17 8:50 AM, Vladimir Kozlov wrote: > I ran pre-integration testing with latest webrev.01 and it passed. > But, give me more time to look though changes. > > Thanks, > Vladimir > > On 10/25/17 7:29 AM, Roland Westrelin wrote: >> >> Hi Vladimir, >> >> Thanks for looking at this. >> >>> Did you consider less intrusive approach by adding branch over >>> SafePoint with masking on index variable? >>> >>> ??? int mask = LoopStripMiningMask * inc; // simplified >>> ??? for (int i = start; i < stop; i += inc) { >>> ?????? // body >>> ?????? if (i & mask != 0) continue; >>> ?????? safepoint; >>> ??? } >>> >>> Or may be doing it inside .ad file in new SafePoint node >>> implementation so that ideal graph is not affected. >> >> We're looking for the best trade off between latency and thoughput: we >> want the safepoint poll overhead to be entirely eliminated even when the >> safepoint doesn't trigger. >> >>> I am concern that suggested changes may affect Range Check elimination >>> (you changed limit to variable value/flag) in addition to complexity >>> of changes which may affect stability of C2. >> >> The CountedLoop that is created with my patch is strictly identical to >> the CountedLoop created today with -UseCountedLoopSafepoints. Bounds are >> not changed at that time. They are left as they are today. The >> difference, with loop strip mining, is that the counted loop has a >> skeleton outer loop. The bounds of the counted loop are adjusted once >> loop opts are over. If the counted loop has a predicate, the predicate >> is moved out of loop just as it is today. The only difference with >> today, is that the predicate should be moved out of the outer loop. 
If a >> pre and post loop needs to be created, then the only difference with >> today is that the clones need to be moved out of the outer loop and >> logic that locate the pre from the main loop need to account for the >> outer loop. >> >> It's obviously a complex change so if your primary concern is stability >> then loop strip mining can be disabled by default. Assuming strip mining >> off, then that patch is mostly some code refactoring and some logic that >> never triggers. >> >> Roland. >> From vladimir.kozlov at oracle.com Fri Oct 27 21:31:07 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 27 Oct 2017 14:31:07 -0700 Subject: RFR: 8188102: [JVMCI] Convert special JVMCI oops in nmethod to jweak values In-Reply-To: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> References: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> Message-ID: CCing to GC group too. Would be nice to run Hotspot testing with Graal as JIT. Katya, can you help with it? Thanks, Vladimir On 10/27/17 2:05 PM, Doug Simon wrote: > Please review this change that converts the JVMCI-specific object references in nmethod from oops to weak values. This removes GC API extensions added purely for these fields (e.g. so that G1 can insert it into the right remembered set, and when unloading an nmethod, to go and remove the nmethod from that remembered set). > > Testing: I've run the Graal unit tests (mx unittest --verbose --gc-after-test -Xlog:class+unload=trace) which trigger a lot of nmethod unloading. > > https://bugs.openjdk.java.net/browse/JDK-8188102 > http://cr.openjdk.java.net/~dnsimon/8188102/ > From Derek.White at cavium.com Fri Oct 27 22:31:30 2017 From: Derek.White at cavium.com (White, Derek) Date: Fri, 27 Oct 2017 22:31:30 +0000 Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic In-Reply-To: References: Message-ID: Hi Dmitry, The code looks good. I have one suggestion for MacroAssembler::kernel_crc32(). It's a matter of taste, so it really is just a suggestion: - The use of temp registers in the UseCRC32 case is kind of muddled, using tmp, and table0..table3 as temp registers, and the name "table" is confusing in this case. - Maybe it would be cleaner to refactor the UseCRC32 code into a separate kernel_crc32_using_crc32() subroutine (static or macro?). This would accept the main args and 4 registers for temps. The caller can supply some combination of table or tmp registers. - This would shrink the size of kernel_crc32() by a lot too. - The next person to touch the UseNeon code could factor that out as well ?? This obviously would apply to kernel_crc32c as well. Thanks! - Derek > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev- > bounces at openjdk.java.net] On Behalf Of Dmitry Chuyko > Sent: Wednesday, October 11, 2017 12:31 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic > > Hello, > > Please review an improvement of CRC32 calculation on AArch64. > > MacroAssembler::kernel_crc32 gets table registers that are not used on > -XX:+UseCRC32 path. They can be used to make neighbor loads and CRC > calculations independent. Adding prologue and epilogue for main by-64 loop > makes it applicable starting from len=128 so additional by-32 loop is added > for smaller lengths. 
> > rfe: https://bugs.openjdk.java.net/browse/JDK-8189176 > webrev: http://cr.openjdk.java.net/~dchuyko/8189176/webrev.00/ > benchmark: > http://cr.openjdk.java.net/~dchuyko/8189176/crc32/CRC32Bench.java > > Results for T88 and A53 are good, but splitting pair loads may slow down > other CPUs so measurements on different HW are highly welcome. > > -Dmitry From aph at redhat.com Sat Oct 28 07:51:57 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 28 Oct 2017 08:51:57 +0100 Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic In-Reply-To: References: Message-ID: <035eedd5-8385-5f22-0316-0df784140442@redhat.com> On 11/10/17 17:30, Dmitry Chuyko wrote: > Results for T88 and A53 are good, but splitting pair loads may slow down > other CPUs so measurements on different HW are highly welcome. Ah, yes. OK, so I should do some measurements here. Please remind me offlist if I don't respond in a few days. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Sat Oct 28 07:52:33 2017 From: aph at redhat.com (Andrew Haley) Date: Sat, 28 Oct 2017 08:52:33 +0100 Subject: [10] RFR: 8189176 - AARCH64: Improve _updateBytesCRC32 intrinsic In-Reply-To: References: Message-ID: On 27/10/17 23:31, White, Derek wrote: > - The use of temp registers in the UseCRC32 case is kind of muddled, using tmp, and table0..table3 as temp registers, and the name "table" is confusing in this case. > - Maybe it would be cleaner to refactor the UseCRC32 code into a separate kernel_crc32_using_crc32() subroutine (static or macro?). This would accept the main args and 4 registers for temps. The caller can supply some combination of table or tmp registers. > - This would shrink the size of kernel_crc32() by a lot too. That would be nice. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kim.barrett at oracle.com Sun Oct 29 22:29:51 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 29 Oct 2017 18:29:51 -0400 Subject: RFR: 8188102: [JVMCI] Convert special JVMCI oops in nmethod to jweak values In-Reply-To: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> References: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> Message-ID: [added hotspot-gc-dev to cc list] > On Oct 27, 2017, at 5:05 PM, Doug Simon wrote: > > Please review this change that converts the JVMCI-specific object references in nmethod from oops to weak values. This removes GC API extensions added purely for these fields (e.g. so that G1 can insert it into the right remembered set, and when unloading an nmethod, to go and remove the nmethod from that remembered set). > > Testing: I've run the Graal unit tests (mx unittest --verbose --gc-after-test -Xlog:class+unload=trace) which trigger a lot of nmethod unloading. > > https://bugs.openjdk.java.net/browse/JDK-8188102 > http://cr.openjdk.java.net/~dnsimon/8188102/ I didn't look at the .java, .py, or project files. ------------------------------------------------------------------------------ src/hotspot/share/jvmci/jvmciCompilerToVM.cpp 1061 nmethod* nm = cb->as_nmethod_or_null(); This appears to be dead code now. ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.cpp 1023 assert(Universe::heap()->is_gc_active(), "should only be called during gc"); ... 
1036 if (!Universe::heap()->is_gc_active() && cause != NULL) 1037 cause->klass()->print_on(&ls); I was going to mention that lines 1036-1037 are missing braces around the if-body. However, those lines appear to be dead code, given the assertion on line 1023. ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.cpp 1504 bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { ... 1506 oop installed_code = JNIHandles::resolve(_jvmci_installed_code); Resolving a weak reference can keep an otherwise dead referent alive. See JDK-8188055 for a discussion of the corresponding problem for j.l.r.Reference. Right now, I think JNIHandles doesn't provide a (public) solution to what I think is being attempted here that works for all collectors. There is in-progress work toward a solution, but it's just that, "in progress". As a (possibly interim) solution, a function like the following might be added to JNIHandles (put the definition near resolve_jweak). bool JNIHandles::is_global_weak_cleared(jweak handle) { assert(is_jweak(handle), "not a weak handle"); return guard_value(jweak_ref(handle)) == NULL; } (That's completely untested, and I haven't thought carefully about the name. And should get input from other GC folks on how to deal with this.) I *think* do_unloading_jvmci then becomes something like the following (again, completely untested) bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { if (_jvmci_installed_code != NULL) { if (JNIHandles::is_global_weak_cleared(_jvmci_installed_code)) { if (_jvmci_installed_code_triggers_unloading) { make_unloaded(is_alive, NULL); return true; } else { clear_jvmci_installed_code(); } } } return false; } ------------------------------------------------------------------------------ From doug.simon at oracle.com Mon Oct 30 11:14:18 2017 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 30 Oct 2017 12:14:18 +0100 Subject: RFR: 8188102: [JVMCI] Convert special JVMCI oops in nmethod to jweak values In-Reply-To: References: <94B50A35-9898-47BA-8F20-A829FD7547AC@oracle.com> Message-ID: Hi Kim, Thanks for the detailed review. > On 29 Oct 2017, at 23:29, Kim Barrett wrote: > > [added hotspot-gc-dev to cc list] > >> On Oct 27, 2017, at 5:05 PM, Doug Simon wrote: >> >> Please review this change that converts the JVMCI-specific object references in nmethod from oops to weak values. This removes GC API extensions added purely for these fields (e.g. so that G1 can insert it into the right remembered set, and when unloading an nmethod, to go and remove the nmethod from that remembered set). >> >> Testing: I've run the Graal unit tests (mx unittest --verbose --gc-after-test -Xlog:class+unload=trace) which trigger a lot of nmethod unloading. >> >> https://bugs.openjdk.java.net/browse/JDK-8188102 >> http://cr.openjdk.java.net/~dnsimon/8188102/ > > I didn't look at the .java, .py, or project files. > > ------------------------------------------------------------------------------ > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp > 1061 nmethod* nm = cb->as_nmethod_or_null(); > > This appears to be dead code now. Indeed. > ------------------------------------------------------------------------------ > src/hotspot/share/code/nmethod.cpp > 1023 assert(Universe::heap()->is_gc_active(), "should only be called during gc"); > ... 
> 1036 if (!Universe::heap()->is_gc_active() && cause != NULL) > 1037 cause->klass()->print_on(&ls); > > I was going to mention that lines 1036-1037 are missing braces around > the if-body. However, those lines appear to be dead code, given the > assertion on line 1023. Good catch. That problem pre-dates this webrev but I will clean it up here. > ------------------------------------------------------------------------------ > src/hotspot/share/code/nmethod.cpp > 1504 bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { > ... > 1506 oop installed_code = JNIHandles::resolve(_jvmci_installed_code); > > Resolving a weak reference can keep an otherwise dead referent alive. > See JDK-8188055 for a discussion of the corresponding problem for > j.l.r.Reference. > > Right now, I think JNIHandles doesn't provide a (public) solution to > what I think is being attempted here that works for all collectors. > There is in-progress work toward a solution, but it's just that, "in > progress". > > As a (possibly interim) solution, a function like the following might > be added to JNIHandles (put the definition near resolve_jweak). > > bool JNIHandles::is_global_weak_cleared(jweak handle) { > assert(is_jweak(handle), "not a weak handle"); > return guard_value(jweak_ref(handle)) == NULL; > } Adding JNIHandles::is_global_weak_cleared makes sense. I've put it the public section near destroy_weak_global instead of the private section where resolve_jweak is declared. > (That's completely untested, and I haven't thought carefully about the > name. And should get input from other GC folks on how to deal with > this.) I *think* do_unloading_jvmci then becomes something like the > following (again, completely untested) > > bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { > if (_jvmci_installed_code != NULL) { > if (JNIHandles::is_global_weak_cleared(_jvmci_installed_code)) { > if (_jvmci_installed_code_triggers_unloading) { > make_unloaded(is_alive, NULL); > return true; > } else { > clear_jvmci_installed_code(); > } > } > } > return false; > } I think your change works but comes at the cost of potentially preventing nmethod unloading for 1 extra (full?) GC cycle. It assumes that jweak clearing occurs before nmethod scanning. Is that guaranteed? If not, then I think what we want is: bool nmethod::do_unloading_jvmci(BoolObjectClosure* is_alive, bool unloading_occurred) { if (_jvmci_installed_code != NULL) { bool cleared = JNIHandles::is_global_weak_cleared(_jvmci_installed_code); if (_jvmci_installed_code_triggers_unloading) { if (cleared) { // jweak reference processing has already cleared the referent make_unloaded(is_alive, NULL); return true; } else { oop installed_code = JNIHandles::resolve(_jvmci_installed_code); if (can_unload(is_alive, (oop*)&installed_code, unloading_occurred)) { return true; } } } else { if (cleared || !is_alive->do_object_b(JNIHandles::resolve(_jvmci_installed_code))) { clear_jvmci_installed_code(); } } } return false; } I've created a new webrev at http://cr.openjdk.java.net/~dnsimon/8188102_2. 
-Doug From tobias.hartmann at oracle.com Mon Oct 30 11:49:09 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 30 Oct 2017 12:49:09 +0100 Subject: [10] RFR(S): 8190351: InitialAndMaxUsageTest does not free allocated blob Message-ID: <9402723c-02eb-72de-9fd5-87cc64ec628e@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8190351 http://cr.openjdk.java.net/~thartmann/8190351/webrev.00/ If the fillWithSize method bails out because bean.getUsage().getUsed() > CACHE_USAGE_COEF * maxSize, it does not add the just allocated blob to the list. Also, we start with allocating blobs of size 368 Mb which is too large for a default code cache size of 256 Mb. I've refactored the test and changed the allocation loop to start with blobs of size ~36 Mb. Thanks, Tobias From vladimir.kozlov at oracle.com Mon Oct 30 15:02:43 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 30 Oct 2017 08:02:43 -0700 Subject: [10] RFR(S): 8190351: InitialAndMaxUsageTest does not free allocated blob In-Reply-To: <9402723c-02eb-72de-9fd5-87cc64ec628e@oracle.com> References: <9402723c-02eb-72de-9fd5-87cc64ec628e@oracle.com> Message-ID: Looks good. Thanks, Vladimir On 10/30/17 4:49 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8190351 > http://cr.openjdk.java.net/~thartmann/8190351/webrev.00/ > > If the fillWithSize method bails out because bean.getUsage().getUsed() > > CACHE_USAGE_COEF * maxSize, it does not add the just allocated blob to > the list. Also, we start with allocating blobs of size 368 Mb which is > too large for a default code cache size of 256 Mb. > > I've refactored the test and changed the allocation loop to start with > blobs of size ~36 Mb. > > Thanks, > Tobias From tobias.hartmann at oracle.com Mon Oct 30 15:07:49 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 30 Oct 2017 16:07:49 +0100 Subject: [10] RFR(S): 8190351: InitialAndMaxUsageTest does not free allocated blob In-Reply-To: References: <9402723c-02eb-72de-9fd5-87cc64ec628e@oracle.com> Message-ID: <93d15054-61dc-1bfc-3503-23081521e49c@oracle.com> Thanks Vladimir! Best regards, Tobias On 30.10.2017 16:02, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 10/30/17 4:49 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8190351 >> http://cr.openjdk.java.net/~thartmann/8190351/webrev.00/ >> >> If the fillWithSize method bails out because bean.getUsage().getUsed() > CACHE_USAGE_COEF * maxSize, it does not add >> the just allocated blob to the list. Also, we start with allocating blobs of size 368 Mb which is too large for a >> default code cache size of 256 Mb. >> >> I've refactored the test and changed the allocation loop to start with blobs of size ~36 Mb. >> >> Thanks, >> Tobias From dmitrij.pochepko at bell-sw.com Mon Oct 30 15:42:35 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 30 Oct 2017 18:42:35 +0300 Subject: [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays Message-ID: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> Hi, as part of JEP ?Improve performance of String and Array operations on AArch64? I wanted to send out a pre-review for some of the improved intrinsics to get early feedback. This is the first in a row. Please pre-review patch for 8187472 - ?AARCH64: array_equals intrinsic doesn't use prefetch for large arrays? 
which improves large array handling (small arrays are unaffected). In short, this patch uses large (64 byte) loop with prefetch instruction to handle large arrays, which is done in a stub. I can observe performance boost on systems without h/w prefetcher up to x6. System with hardware prefetching (Cortex A53 and some very modern ones) also benefit from this patch (15% improvement). I've tried a number of different versions (attached to JDK-8187472) with different load instructions (ldr/ldp/), slightly different code shapes, different data dependencies across registers, alignments, e.t.c. Version presented in webrev (version 2.6d from JDK-8187472 attachments) is the simplest from the fast ones (as measured on 3 systems available for testing). I've used this simple benchmark to measure performance: http://cr.openjdk.java.net/~dpochepk/8187472/ArrayEqualsBench.java Chart for ThunderX: http://cr.openjdk.java.net/~dpochepk/8187472/ThunderX.png Chart for Cortex A53(R-Pi): http://cr.openjdk.java.net/~dpochepk/8187472/R-Pi.png Raw numbers for ThunderX: http://cr.openjdk.java.net/~dpochepk/8187472/ThunderX.results.txt Raw numbers for R-Pi: http://cr.openjdk.java.net/~dpochepk/8187472/R-Pi.results.txt webrev: http://cr.openjdk.java.net/~dpochepk/8187472/webrev.01/ Testing: I've run existing jtreg test (java/util/Arrays/ArraysEqCmpTest.java) in both Xmixed and Xcomp and found no regressions. Any additional numbers on other systems are welcome, as well as early feedback on the code. Thanks, Dmitrij From aph at redhat.com Mon Oct 30 16:13:06 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 30 Oct 2017 16:13:06 +0000 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> Message-ID: On 30/10/17 15:42, Dmitrij Pochepko wrote: > Any additional numbers on other systems are welcome, as well as early > feedback on the code. I take it that the small comparisons are unaffected. The small comparisons are very common, so they shouldn't be ignored. The patch seems unobjectionable, but it's extremely hard to test this stuff. Why is this change: @@ -16154,7 +16154,7 @@ ins_pipe(pipe_class_memory); %} -instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegI_R4 cnt, +instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegP_R4 cnt, iRegI_R0 result, rFlagsReg cr) %{ predicate(((StrEqualsNode*)n)->encoding() == StrIntrinsicNode::LL); It seems very odd to me. Was a vertor-based implementation considered? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Mon Oct 30 16:43:30 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 30 Oct 2017 19:43:30 +0300 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> Message-ID: On 30.10.2017 19:13, Andrew Haley wrote: > On 30/10/17 15:42, Dmitrij Pochepko wrote: >> Any additional numbers on other systems are welcome, as well as early >> feedback on the code. > I take it that the small comparisons are unaffected. The small > comparisons are very common, so they shouldn't be ignored. > > The patch seems unobjectionable, but it's extremely hard to test > this stuff. 
Well, I've actually used small brute force test which generates all cases for arrays length from 1 to N(parameter) to test it, because I couldn't find better way. i.e.: case 0: equal arrays case 1: arrays different in 1st symbol ... case N: arrays different in (N-1)th symbol And this test passed. However, I don't think such test should be added to jtreg testbase, because it takes long time to run, so, I assume existing array equals test is enough. > > Why is this change: > > @@ -16154,7 +16154,7 @@ > ins_pipe(pipe_class_memory); > %} > > -instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegI_R4 cnt, > +instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegP_R4 cnt, > iRegI_R0 result, rFlagsReg cr) > %{ > predicate(((StrEqualsNode*)n)->encoding() == StrIntrinsicNode::LL); > > It seems very odd to me. You're right. It's leftover from previous versions. It can be reverted back to iRegI_R4. > > Was a vertor-based implementation considered? > Yes. I've tried simd loads(even aligned ones to be sure that alignment is not an issue). simd versions were attached into JDK-8187472 as ?- v5.0(simd loads, 16-byte address alignment, 64 bytes per 1 loop iteration) ?- v7.0(simd loads, 16-byte alignment, 64 bytes per 1 loop iteration) ?- v9.0(simd loads, 64 byte alignment, 128 bytes per 1 loop iteration). I've measured it on ThunderX and found while best non-simd version handles 1000000 bytes arrays in ~295 microseconds, simd versions had numbers about ~355 microseconds. Thanks, Dmitrij From jamsheed.c.m at oracle.com Mon Oct 30 16:45:19 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Mon, 30 Oct 2017 22:15:19 +0530 Subject: [10] RFR: 8167409: Invalid value passed to critical JNI function Message-ID: Hi, request for review, jbs: https://bugs.openjdk.java.net/browse/JDK-8167409 webrev: http://cr.openjdk.java.net/~jcm/8167409/webrev.00/ (contributed by Ioannis Tsakpinis) desc: the tmp? reg used to break the shuffling cycle (handled in ComputeMoveOrder) is set to 64 bit. Best regards, Jamsheed From jamsheed.c.m at oracle.com Mon Oct 30 16:45:53 2017 From: jamsheed.c.m at oracle.com (jamsheed) Date: Mon, 30 Oct 2017 22:15:53 +0530 Subject: [10] JBS: 8167408: Invalid critical JNI function lookup Message-ID: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com> Hi, request for review, jbs : https://bugs.openjdk.java.net/browse/JDK-8167408 webrev: http://cr.openjdk.java.net/~jcm/8167408/webrev.00/ (contributed by Ioannis Tsakpinis) desc: -- it starts with JavaCritical_ instead of Java_; -- it does not have extra JNIEnv* and jclass arguments; -- Java arrays are passed in two arguments: the first is an array length, and the second is a pointer to raw array data. That is, no need to call GetArrayElements and friends, you can instantly use a direct array pointer. updated arg_size calculation wrt above points. Best regards, Jamsheed From rwestrel at redhat.com Mon Oct 30 17:02:03 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 30 Oct 2017 18:02:03 +0100 Subject: RFR(L): 8186027: C2: loop strip mining In-Reply-To: <0972a0db-2115-daa3-9990-7d58915a74a5@oracle.com> References: <35a024b5-6632-0488-a001-14f960257fc7@oracle.com> <8b9f8985-daae-b3b4-ba48-3d9c6185ed95@oracle.com> <0972a0db-2115-daa3-9990-7d58915a74a5@oracle.com> Message-ID: Hi Vladimir, > Should we just make _loop_flags field type uint (32-bit) since we hit 16-bit limit? We don't hit the limit with this change. I have some other changes for which I had to change _loop_flags to uint. 
That's where the int -> uint tweaks are coming from. I can remove them if you like as they are not required. Sorry for the confusion. > There is confusion (because you did not have enough bits?) about which loops are marked as > strip_mined. I thought it is only inner loop but it looks like out (skeleton) loop also marked as > such. I would suggest to mark them differently. The way it works currently is: Opcode() == Op_Loop && is_strip_mined() => outer loop Opcode() == Op_CountedLoop && is_strip_mined() => inner loop The outer loop can't be transformed to a counted loop so that scheme shouldn't break. > I was thinking may be we should create new Loop node subclass for outer loop. Then you don't need > special flag for it and it will be obvious what they are in Ideal Graph. The same for outer loop end > node. Ok. That sounds like it could clean up the code a bit. Do you want me to look into that? > src/hotspot/share/opto/superword.cpp > > Where next change come from? > > + if (t2->Opcode() == Op_AddI && t2 == _lp->as_CountedLoop()->incr()) continue; // don't mess > with the iv I saw a few cases where t2 is the increment of the CountedLoop iv. SuperWord::opnd_positions_match() then swaps the edges of the AddI and later CountedLoopEndNode::phi() fails because the edges of the iv's AddI are not in the expected order anymore. Roland. From glaubitz at physik.fu-berlin.de Sun Oct 15 06:09:19 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Sun, 15 Oct 2017 06:09:19 -0000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> Hi Roman! Please let me look at SPARC next week first before merging this. And thanks for notifying me that Zero is broken again *sigh*. People, please test your changes. Yes, I know you all just care about Hotspot. But please understand that there are many people out there who rely on Zero, i.e. they are using it. Breaking code that people actively use is not nice and should not happen in a project like OpenJDK. Building Zero takes maybe 5 minutes on a fast x86 machine, so I would like to ask everyone to please test their changes against Zero as well. These tests will keep the headaches for people relying on Zero low and also avoids that distributions have to ship many patches on top of OpenJDK upstream. If you cannot test your patch on a given platform X, please let me know. I have access to every platform supported by OpenJDK except AIX/PPC. Thanks, Adrian > On Oct 15, 2017, at 12:41 AM, Roman Kennke wrote: > > The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So here comes the big patch to remove it. > > What I have done: > > grep -i -R shark src > grep -i -R shark make > grep -i -R shark doc > grep -i -R shark doc > > and purged any reference to shark. Almost everything was straightforward. > > The only things I wasn't really sure of: > > - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope that's good? > - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing to do. If not, what *would* be the right thing? > > Then of course I did: > > rm -rf src/hotspot/share/shark > > I also went through the build machinery and removed stuff related to Shark and LLVM libs. 
> > Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) > > I tested by building a regular x86 JVM and running JTREG tests. All looks fine. > > - I could not build zero because it seems broken because of the recent Atomic::* changes > - I could not test any of the other arches that seemed to reference Shark (arm and sparc) > > Here's the full webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ > > Can I get a review on this? > > Thanks, Roman From vparfinenko at excelsior-usa.com Fri Oct 27 09:26:04 2017 From: vparfinenko at excelsior-usa.com (Vladimir Parfinenko) Date: Fri, 27 Oct 2017 16:26:04 +0700 Subject: Bug in HS interpreter: invokeinterface calls non-public method Message-ID: Hi all, I think I have found a bug in HotSpot interpreter. The problems happens while invokeinterface of public method from java.lang.Object (e.g. hashCode()) in case when the actual method implementation is non-public (e.g. protected). JVMS tells the following about invokeinterface instruction: Otherwise, if step 1 or step 2 of the lookup procedure selects a method that is not public, invokeinterface throws an IllegalAccessError. However in some cases HS interpreter ignores this access check and invokes non-public method. Minimal example using jasm from asmtools is attached below. Compiling and running it gives the following: $ jasm BadImpl.jasm && javac Caller.java $ java -Xint Caller Should pass: Should throw IAE: Exception in thread "main" java.lang.RuntimeException: protected hashCode was called at BadImpl.hashCode(BadImpl.jasm) at Caller.main(Caller.java:11) $ java -Xcomp Caller Should pass: Should throw IAE: Exception in thread "main" java.lang.IllegalAccessError: BadImpl.hashCode()I at Caller.main(Caller.java:11) Note that first invocation ("Should pass") is necessary to reproduce the problem. If you remove it everything works as expected. Regards, Vladimir Parfinenko ----------------------- Caller.java ----------------------- public class Caller { public static void main(String[] args) { Interf x; System.out.println("Should pass:"); x = new GoodImpl(); x.hashCode(); System.out.println("Should throw IAE:"); x = new BadImpl(); x.hashCode(); } } interface Interf { @Override int hashCode(); } class GoodImpl implements Interf { } ----------------------- Caller.java ----------------------- ----------------------- BadImpl.jasm ----------------------- super class BadImpl implements Interf { Method "":"()V" stack 1 locals 1 { aload_0; invokespecial Method java/lang/Object."":"()V"; return; } // override of Object method with protected one, javac doesn't allow this protected Method hashCode:"()I" stack 3 locals 1 { new class java/lang/RuntimeException; dup; ldc String "protected hashCode was called"; invokespecial Method java/lang/RuntimeException."":"(Ljava/lang/String;)V"; athrow; } } ----------------------- BadImpl.jasm ----------------------- From rwestrel at redhat.com Mon Oct 30 17:02:55 2017 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 30 Oct 2017 18:02:55 +0100 Subject: RFR(S): 8186125: "DU iteration must converge quickly" assert in split if with unsafe accesses In-Reply-To: References: Message-ID: Anyone to review this fix? Roland. > http://cr.openjdk.java.net/~roland/8186125/webrev.00/ > > Split if is missing support for graph shapes with the Opaque4Node that > was introduced for unsafe accesses by JDK-8176506. > > In the test case, the 2 Unsafe accesses share a single Opaque4Node > before the if. 
When split if encounters the Cmp->Bol->Opaque4->If chain, > it only tries to clone Cmp->Bol when it should clone Cmp->Bol->Opaque4 > to make one copy for each If. > > Roland. From aph at redhat.com Mon Oct 30 17:30:36 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 30 Oct 2017 17:30:36 +0000 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> Message-ID: <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> On 30/10/17 16:43, Dmitrij Pochepko wrote: > I've tried simd loads(even aligned ones to be sure that alignment is not > an issue). simd versions were attached into JDK-8187472 as > ?- v5.0(simd loads, 16-byte address alignment, 64 bytes per 1 loop > iteration) > ?- v7.0(simd loads, 16-byte alignment, 64 bytes per 1 loop iteration) > ?- v9.0(simd loads, 64 byte alignment, 128 bytes per 1 loop iteration). > > I've measured it on ThunderX and found while best non-simd version > handles 1000000 bytes arrays in ~295 microseconds, simd versions had > numbers about ~355 microseconds. I'm rather reluctant to accept non-SIMD intrinsics because I expect SIMD performance to improve, and I expect SIMD to be the future. The same is true of implementations which avoid the use of ldp. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Mon Oct 30 18:03:54 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 30 Oct 2017 21:03:54 +0300 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> Message-ID: <8144c663-ea6b-8d21-384b-baeb79f596c4@bell-sw.com> On 30.10.2017 20:30, Andrew Haley wrote: > On 30/10/17 16:43, Dmitrij Pochepko wrote: >> I've tried simd loads(even aligned ones to be sure that alignment is not >> an issue). simd versions were attached into JDK-8187472 as >> ?- v5.0(simd loads, 16-byte address alignment, 64 bytes per 1 loop >> iteration) >> ?- v7.0(simd loads, 16-byte alignment, 64 bytes per 1 loop iteration) >> ?- v9.0(simd loads, 64 byte alignment, 128 bytes per 1 loop iteration). >> >> I've measured it on ThunderX and found while best non-simd version >> handles 1000000 bytes arrays in ~295 microseconds, simd versions had >> numbers about ~355 microseconds. > I'm rather reluctant to accept non-SIMD intrinsics because I expect > SIMD performance to improve, and I expect SIMD to be the future. The > same is true of implementations which avoid the use of ldp. > I also expected NEON to be faster on very new designs. Since I have a SIMD version of this intrinsic that I can merge into stub under an if with new option (like UseSIMDForArrayEquals with default value set to false, almost the same as existing UseSIMDForMemoryOps, which is used in array copy intrinsic) if you want, but it is slower for the CPUs we have access to and likely not going to be the default. This way we'll have a fast version and a SIMD version. I am hesitant if it is best to do this, or keep a single, simple, and fastest version for now for this intrinsic, and get back to it when SVE becomes widely available. What do you think? Note that other intrinsics that are in the works will use SIMD. 
Thanks, Dmitrij From aph at redhat.com Mon Oct 30 18:06:40 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 30 Oct 2017 18:06:40 +0000 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: <8144c663-ea6b-8d21-384b-baeb79f596c4@bell-sw.com> References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> <8144c663-ea6b-8d21-384b-baeb79f596c4@bell-sw.com> Message-ID: <47fb00b1-c51a-03d8-83f8-9c7cbd436f74@redhat.com> On 30/10/17 18:03, Dmitrij Pochepko wrote: > I am hesitant if it is best to do this, or keep a single, simple, and > fastest version for now for this intrinsic, and get back to it when SVE > becomes widely available. > > What do you think? Do it now, or we'll have merge problems later. > Note that other intrinsics that are in the works will use SIMD. OK, thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Mon Oct 30 18:20:06 2017 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 30 Oct 2017 21:20:06 +0300 Subject: [10] RFR: 8189101 - AARCH64: AARCH64: string compare intrinsic doesn't use prefetch Message-ID: Hi, this is a second pre-review as part of JEP ?Improve performance of String and Array operations on AArch64? for another improved intrinsics to get early feedback. Please pre-review patch for 8189101 - ?AARCH64: AARCH64: string compare intrinsic doesn't use prefetch? This patch moves code for long string processing to a stub and reorganize it. For large strings code was re-organized, added large 64-byte unrolled loops and prefetch. Webrev is available at [1]. Surpisingly, it helps a bit for small strings, because code for string comparison node is now shorter, so, less icache lines needed to be populated to execute it. A benchmark was developed to measure performance [2], which contains 4 cases with various sizes: LL (latin1 vs latin1), LU (latin1 vs utf), UL (utf vs latin1) and UU (utf vs utf). I can see up to x5 performance on systems without h/w prefetcher (ThunderX) and up to 40% improvement on system with h/w prefetcher(Cortex A53). Raw performance numbers are at [3]. Charts for performance numbers above are: Cortex A53 [4] and ThunderX [5]. Testing: I've run java/lang/String (contains test for String::compareTo method) jtreg tests with both Xmixed and Xcomp modes and found no regressions. Any additional numbers on other systems are welcome, as well as early feedback on the code. [1] http://cr.openjdk.java.net/~dpochepk/8189101/webrev/ [2] http://cr.openjdk.java.net/~dpochepk/8189101/StringCompareBench.java [3] http://cr.openjdk.java.net/~dpochepk/8189101/strCmp_T88.txt and http://cr.openjdk.java.net/~dpochepk/8189101/strCmp_RPi.txt [4] http://cr.openjdk.java.net/~dpochepk/8189101/R_Pi_LL.png http://cr.openjdk.java.net/~dpochepk/8189101/R_Pi_LU.png http://cr.openjdk.java.net/~dpochepk/8189101/R_Pi_UL.png and http://cr.openjdk.java.net/~dpochepk/8189101/R_Pi_UU.png [5] http://cr.openjdk.java.net/~dpochepk/8189101/ThunderX_LL.png http://cr.openjdk.java.net/~dpochepk/8189101/ThunderX_UL.png http://cr.openjdk.java.net/~dpochepk/8189101/ThunderX_LU.png and http://cr.openjdk.java.net/~dpochepk/8189101/ThunderX_UU.png Thanks, Dmitrij -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dmitrij.pochepko at bell-sw.com Mon Oct 30 19:18:45 2017 From: dmitrij.pochepko at bell-sw.com (dmitrij.pochepko at bell-sw.com) Date: Mon, 30 Oct 2017 22:18:45 +0300 Subject: [aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays In-Reply-To: <47fb00b1-c51a-03d8-83f8-9c7cbd436f74@redhat.com> References: <99ecb097-c382-47a0-48db-be85310c1d9d@bell-sw.com> <8e14d691-8edd-27fd-4687-4f1971daf2ea@redhat.com> <8144c663-ea6b-8d21-384b-baeb79f596c4@bell-sw.com> <47fb00b1-c51a-03d8-83f8-9c7cbd436f74@redhat.com> Message-ID: <47181509391125@web22j.yandex.ru> An HTML attachment was scrubbed... URL: From dean.long at oracle.com Mon Oct 30 20:48:46 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 30 Oct 2017 13:48:46 -0700 Subject: [10] JBS: 8167408: Invalid critical JNI function lookup In-Reply-To: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com> References: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com> Message-ID: <2e195876-97cf-bc1f-5e3b-e19c5c5f240d@oracle.com> I think you need a native test for Windows x86 that defines JavaCritical methods with various signatures (especially arrays) to make sure this is working correctly. dl On 10/30/17 9:45 AM, jamsheed wrote: > Hi, > > request for review, > > jbs : https://bugs.openjdk.java.net/browse/JDK-8167408 > > webrev: http://cr.openjdk.java.net/~jcm/8167408/webrev.00/ > > (contributed by Ioannis Tsakpinis) > > desc: > > -- it starts with JavaCritical_ instead of Java_; > -- it does not have extra JNIEnv* and jclass arguments; > -- Java arrays are passed in two arguments: the first is an array > length, and the second is a pointer to raw array data. That is, no > need to call GetArrayElements and friends, you can instantly use a > direct array pointer. > > updated arg_size calculation wrt above points. > > Best regards, > > Jamsheed > From dean.long at oracle.com Mon Oct 30 21:30:37 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 30 Oct 2017 14:30:37 -0700 Subject: [10] RFR: 8167409: Invalid value passed to critical JNI function In-Reply-To: References: Message-ID: Hi Jamsheed.? Do you have a test for this? dl On 10/30/17 9:45 AM, jamsheed wrote: > Hi, > > request for review, > > jbs: https://bugs.openjdk.java.net/browse/JDK-8167409 > > webrev: http://cr.openjdk.java.net/~jcm/8167409/webrev.00/ > > (contributed by Ioannis Tsakpinis) > > desc: the tmp? reg used to break the shuffling cycle (handled in > ComputeMoveOrder) > > is set to 64 bit. > > Best regards, > > Jamsheed > > From dmitry.chuyko at bell-sw.com Tue Oct 31 16:01:09 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 31 Oct 2017 19:01:09 +0300 Subject: [10] RFR: 8189745 - AARCH64: Use CRC32C intrinsic code in interpreter and C1 Message-ID: Hello, Please review an improvement of CRC32C calculation on AArch64. The implementation is based on JDK-8155162 [1] and the code for CRC32. Intrinsics for array / byte buffer and direct byte buffer are enabled in C1 on AArch64, LIRGenerator::do_update_CRC32C calculates parameters and calls StubRoutines::updateBytesCRC32C(). Template interpreter now also generates TemplateInterpreterGenerator::generate_CRC32C_updateBytes_entry where it calculates parameters and jumps to StubRoutines::updateBytesCRC32C(). 
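[Editor's illustration, not part of the original mail or the linked benchmark: the Java-level code that reaches these intrinsified entries is plain java.util.zip.CRC32C usage; the buffer sizes and names below are arbitrary.]

    import java.nio.ByteBuffer;
    import java.util.zip.CRC32C;

    public class Crc32cSketch {
        public static void main(String[] args) {
            byte[] data = new byte[4096];              // heap byte[] path (array intrinsic)
            CRC32C crc = new CRC32C();
            crc.update(data, 0, data.length);
            long arrayCrc = crc.getValue();

            crc.reset();
            ByteBuffer direct = ByteBuffer.allocateDirect(4096);
            crc.update(direct);                        // direct ByteBuffer path (separate intrinsic)
            long directCrc = crc.getValue();

            System.out.println(arrayCrc + " " + directCrc);
        }
    }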
rfe: https://bugs.openjdk.java.net/browse/JDK-8189745 webrev: http://cr.openjdk.java.net/~dchuyko/8189745/webrev.00/ benchmark: http://cr.openjdk.java.net/~dchuyko/8189745/crc32c/CRC32CBench.java Performance results for T88 [2] show ~7x boost in C1 and ~30-50x boost in interpreter. For testing I made comparison of CRC32C result sets in C1 and interpreter for both array and direct byte buffer with zero and non-zero offset. -Dmitry [1] https://bugs.openjdk.java.net/browse/JDK-8155162 [2] https://bugs.openjdk.java.net/browse/JDK-8189745?focusedCommentId=14127141&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14127141 From aph at redhat.com Tue Oct 31 17:25:58 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 31 Oct 2017 17:25:58 +0000 Subject: [10] RFR: 8189745 - AARCH64: Use CRC32C intrinsic code in interpreter and C1 In-Reply-To: References: Message-ID: Hi, On 31/10/17 16:01, Dmitry Chuyko wrote: > Please review an improvement of CRC32C calculation on AArch64. The > implementation is based on JDK-8155162 [1] and the code for CRC32. > > Intrinsics for array / byte buffer and direct byte buffer are enabled in > C1 on AArch64, LIRGenerator::do_update_CRC32C calculates parameters and > calls StubRoutines::updateBytesCRC32C(). > Template interpreter now also generates > TemplateInterpreterGenerator::generate_CRC32C_updateBytes_entry where it > calculates parameters and jumps to StubRoutines::updateBytesCRC32C(). > > rfe: https://bugs.openjdk.java.net/browse/JDK-8189745 > webrev: http://cr.openjdk.java.net/~dchuyko/8189745/webrev.00/ > benchmark: > http://cr.openjdk.java.net/~dchuyko/8189745/crc32c/CRC32CBench.java > > Performance results for T88 [2] show ~7x boost in C1 and ~30-50x boost > in interpreter. > > For testing I made comparison of CRC32C result sets in C1 and > interpreter for both array and direct byte buffer with zero and non-zero > offset. That looks good to me, thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vitalyd at gmail.com Tue Oct 31 18:08:44 2017 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 31 Oct 2017 14:08:44 -0400 Subject: 8u144 hotspot fails to reach safepoint due to compiler thread - VM frozen Message-ID: Hi guys, I have some colleagues who appear to be running into https://bugs.openjdk.java.net/browse/JDK-8059128 on Oracle JDK 8u144 (Linux, x86-64). Naturally, there's no reproducer but they've seen this happen several times in the last couple of months. The symptom is the JVM becomes unresponsive - the application is not servicing any traffic, and jstack doesn't work without the force option. jstack output (with native frames) captured some time apart shows the compiler thread either in Parse::do_all_blocks -> do_one_block -> do_one_bytecode -> ... InstanceKlass::has_finalizable_subclass -> Dependencies::find_finalizable_subclass or ... Dependencies::has_finalizable_subclass() -> Klass::next_sibling() I see that 8059128 was closed as Incomplete, but it does look like there's a real issue here. Has anyone looked into this further or has any new thoughts/ideas? My understanding is the working theory is it's related to some data race between class unloading and the compiler thread observing an inconsistent (corrupt?) type hierarchy. I see https://bugs.openjdk.java.net/browse/JDK-8114823 is also noted as possibly related - the app we're having trouble with is using G1, but class unloading isn't disabled of course. 
Is there some workaround to reduce the likelihood of having the compiler thread and GC cross paths like this?

Let me know if you need more info.

Thanks

From jamsheed.c.m at oracle.com  Tue Oct 31 19:37:51 2017
From: jamsheed.c.m at oracle.com (jamsheed)
Date: Wed, 1 Nov 2017 01:07:51 +0530
Subject: [10] JBS: 8167408: Invalid critical JNI function lookup
In-Reply-To: <2e195876-97cf-bc1f-5e3b-e19c5c5f240d@oracle.com>
References: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com>
 <2e195876-97cf-bc1f-5e3b-e19c5c5f240d@oracle.com>
Message-ID:

Hi Dean,

Thank you for the review.

Tested with a test case: previously it was not working for windows-x86, and now it works.

Revised webrev with test case: http://cr.openjdk.java.net/~jcm/8167408/webrev.01/

Best regards,

Jamsheed

On Tuesday 31 October 2017 02:18 AM, dean.long at oracle.com wrote:
> I think you need a native test for Windows x86 that defines
> JavaCritical methods with various signatures (especially arrays) to
> make sure this is working correctly.
>
> dl
>
>
> On 10/30/17 9:45 AM, jamsheed wrote:
>> Hi,
>>
>> request for review,
>>
>> jbs : https://bugs.openjdk.java.net/browse/JDK-8167408
>>
>> webrev: http://cr.openjdk.java.net/~jcm/8167408/webrev.00/
>>
>> (contributed by Ioannis Tsakpinis)
>>
>> desc:
>>
>> -- it starts with JavaCritical_ instead of Java_;
>> -- it does not have extra JNIEnv* and jclass arguments;
>> -- Java arrays are passed in two arguments: the first is an array
>> length, and the second is a pointer to raw array data. That is, no
>> need to call GetArrayElements and friends, you can instantly use a
>> direct array pointer.
>>
>> updated arg_size calculation wrt above points.
>>
>> Best regards,
>>
>> Jamsheed
>>
>

From ionutb83 at yahoo.com  Tue Oct 31 21:59:07 2017
From: ionutb83 at yahoo.com (Ionut)
Date: Tue, 31 Oct 2017 21:59:07 +0000 (UTC)
Subject: Sum of integers optimization
References: <345880303.38177.1509487147494.ref@mail.yahoo.com>
Message-ID: <345880303.38177.1509487147494@mail.yahoo.com>

Hello All,

I am playing with the example below (very trivial, just computing a sum of the integers 1...N):

@Benchmark
public long sum() {
    long sum = 0;
    for (int i = 1; i <= N; i++) {
        sum += i;
    }
    return sum;
}

Generated asm on my machine (snapshot from the main scalar loop):

                    ......................................................
                    0x00007f4779bff060: movsxd r10,r11d
                    0x00007f4779bff063: add    rax,r10
  7.67%    24.83%   0x00007f4779bff066: add    rax,r10
  6.11%     3.64%   0x00007f4779bff069: add    rax,r10
  4.54%     3.71%   0x00007f4779bff06c: add    rax,r10
  6.12%     5.85%   0x00007f4779bff06f: add    rax,r10
  5.75%     4.21%   0x00007f4779bff072: add    rax,r10
  5.96%     4.38%   0x00007f4779bff075: add    rax,r10
  4.23%     3.63%   0x00007f4779bff078: add    rax,r10
  6.70%     6.32%   0x00007f4779bff07b: add    rax,r10
  7.40%     4.56%   0x00007f4779bff07e: add    rax,r10
  4.61%     3.31%   0x00007f4779bff081: add    rax,r10
  5.45%     5.24%   0x00007f4779bff084: add    rax,r10
  5.99%     5.14%   0x00007f4779bff087: add    rax,r10
  7.70%     5.36%   0x00007f4779bff08a: add    rax,r10
  5.17%     4.16%   0x00007f4779bff08d: add    rax,r10
  3.97%     3.83%   0x00007f4779bff090: add    rax,r10
  4.80%     3.97%   0x00007f4779bff093: add    rax,0x78
  5.92%     5.97%   0x00007f4779bff097: add    r11d,0x10
            0.01%   0x00007f4779bff09b: cmp    r11d,0x5f5e0f2
                    0x00007f4779bff0a2: jl     0x00007f4779bff060
                    ......................................................

Questions:
- Would it be possible for JIT C2 to perform a better optimization in this context, for example replacing the main loop (which might be costly) with a reduction formula such as N*(N-1)/2 (in this specific case)?
- Is there any context where JIT C2 can perform such an optimization that I am missing?
- If not, what prevents it from doing this?

Thanks
Ionut
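[Editor's note, not from the original mail: for i = 1..N inclusive the sum is N*(N+1)/2 (N*(N-1)/2 would be the sum of 1..N-1), so the closed-form reduction asked about would look like the sketch below, where n stands for the same bound the benchmark reads from its field N.]

    // Illustrative only: the closed form of what the loop computes.
    // The cast to long keeps the intermediate product from overflowing int.
    static long sumClosedForm(int n) {
        return (long) n * (n + 1) / 2;
    }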
From vladimir.x.ivanov at oracle.com  Tue Oct 31 22:28:55 2017
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 1 Nov 2017 01:28:55 +0300
Subject: [10] JBS: 8167408: Invalid critical JNI function lookup
In-Reply-To:
References: <34e7e1e6-09bd-a0c3-d3de-23a825474dbb@oracle.com>
 <2e195876-97cf-bc1f-5e3b-e19c5c5f240d@oracle.com>
Message-ID:

Jamsheed, nice test! Two suggestions:

(1) Enable the test on all platforms: though the bug is platform-specific, it doesn't mean the test should be. I don't see any platform-specific code there, and it's beneficial to test other platforms as well.

(2) Add some test cases with multiple array parameters.

Otherwise, looks good.

Best regards,
Vladimir Ivanov

On 10/31/17 10:37 PM, jamsheed wrote:
> Hi Dean,
>
> Thank you for the review,
>
> tested with a test case, previously it was not working for windows-x86,
> now it works.
>
> revised webrev with test
> case: http://cr.openjdk.java.net/~jcm/8167408/webrev.01/
>
> Best regards,
>
> Jamsheed
>
>
> On Tuesday 31 October 2017 02:18 AM, dean.long at oracle.com wrote:
>> I think you need a native test for Windows x86 that defines
>> JavaCritical methods with various signatures (especially arrays) to
>> make sure this is working correctly.
>>
>> dl
>>
>>
>> On 10/30/17 9:45 AM, jamsheed wrote:
>>> Hi,
>>>
>>> request for review,
>>>
>>> jbs : https://bugs.openjdk.java.net/browse/JDK-8167408
>>>
>>> webrev: http://cr.openjdk.java.net/~jcm/8167408/webrev.00/
>>>
>>> (contributed by Ioannis Tsakpinis)
>>>
>>> desc:
>>>
>>> -- it starts with JavaCritical_ instead of Java_;
>>> -- it does not have extra JNIEnv* and jclass arguments;
>>> -- Java arrays are passed in two arguments: the first is an array
>>> length, and the second is a pointer to raw array data. That is, no
>>> need to call GetArrayElements and friends, you can instantly use a
>>> direct array pointer.
>>>
>>> updated arg_size calculation wrt above points.
>>>
>>> Best regards,
>>>
>>> Jamsheed
>>>
>>
>

From igor.ignatyev at oracle.com  Tue Oct 31 23:30:33 2017
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 31 Oct 2017 16:30:33 -0700
Subject: RFR(S) : 8186618 : [TESTBUG] Test applications/ctw/Modules.java doesn't have timeout and hang on windows
In-Reply-To: <79997CB7-FF94-4354-BC7E-8CE5B73BDC10@oracle.com>
References: <75f3096e-b9f6-ca2e-c336-ada8c519db3b@oracle.com>
 <79997CB7-FF94-4354-BC7E-8CE5B73BDC10@oracle.com>
Message-ID:

Got an off-list review from Jesper (cc'ed). Thank you, Jesper!

-- Igor

> On Oct 26, 2017, at 7:44 PM, Igor Ignatyev wrote:
>
> Katya, thank you for reviewing it.
>
> can I have another review for this patch from a Reviewer?
>
> Thanks,
> -- Igor
>> On Oct 26, 2017, at 5:40 PM, Ekaterina Pavlova wrote:
>>
>> Looks good.
>> >> Thanks for fixing it, >> >> -katya >> >> On 10/17/17 9:45 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >>>> 546 lines changed: 188 ins; 88 del; 270 mod; >>> Hi all, >>> could you please review this fix for ctw test? >>> in some configurations the test takes too much time, it also didn't have timeout, so in case of a hang, e.g. due to JDK-8189604, no one interrupted its execution. >>> the fix splits the test into several tests, one for each jigsaw module, which not only improves the tests' reliability and reportability, but also speeds up the execution via parallel execution. 2 hours timeout, which is enough for each particular module, has been added to all tests. the patch also puts ctw for java.desktop and jdk.jconsole modules in the problem list on windows. >>> webrev: http://cr.openjdk.java.net/~iignatyev//8186618/webrev.00/index.html >>> testing: applications/ctw/modules tests >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8186618 >>> Thanks, >>> -- Igor >> > From rednaxelafx at gmail.com Tue Oct 31 23:42:56 2017 From: rednaxelafx at gmail.com (Krystal Mok) Date: Tue, 31 Oct 2017 16:42:56 -0700 Subject: Sum of integers optimization In-Reply-To: <345880303.38177.1509487147494@mail.yahoo.com> References: <345880303.38177.1509487147494.ref@mail.yahoo.com> <345880303.38177.1509487147494@mail.yahoo.com> Message-ID: Hi Ionut, tl;dr: C2's infrastructure for optimizing loops can be made a lot stronger, but from the current directions we can see around the OpenJDK community, it's very unlikely for C2 to receive a major infrastructural upgrade in the future. If you'd like to contribute to Graal to help optimize this kind of code, I'm sure a lot of us in the community would love that. You're right about the code produced by C2. Just ran your example on JDK9/macOS and the main loop produced by C2 is: 0x0000000118ee6640: movslq %r11d,%r10 ;*i2l {reexecute=0 rethrow=0 return_oop=0} ; - XYZ::sum at 12 (line 7) 0x0000000118ee6643: add %r10,%rax 0x0000000118ee6646: add %r10,%rax 0x0000000118ee6649: add %r10,%rax 0x0000000118ee664c: add %r10,%rax 0x0000000118ee664f: add %r10,%rax 0x0000000118ee6652: add %r10,%rax 0x0000000118ee6655: add %r10,%rax 0x0000000118ee6658: add %r10,%rax 0x0000000118ee665b: add %r10,%rax 0x0000000118ee665e: add %r10,%rax 0x0000000118ee6661: add %r10,%rax 0x0000000118ee6664: add %r10,%rax 0x0000000118ee6667: add %r10,%rax 0x0000000118ee666a: add %r10,%rax 0x0000000118ee666d: add %r10,%rax 0x0000000118ee6670: add %r10,%rax 0x0000000118ee6673: add $0x78,%rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} ; - XYZ::sum at 13 (line 7) 0x0000000118ee6677: add $0x10,%r11d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - XYZ::sum at 15 (line 6) Pretty much the same as what you saw. It's certainly possible to tweak C2 or some other JIT compiler to make it more optimized for this test case. I don't have a copy of Zing right now but I believe its Falcon compiler will compile this down to the N*(N-1)/2 form that you expected, since the LLVM it's based on can compile this piece of C code: #include int64_t sum(int n) { int64_t sum = 0; for (int32_t i = 1; i <= n; i++) { sum += i; } return sum; } Down to: sum: test edi, edi jle .LBB0_1 lea eax, [rdi - 1] add edi, -2 imul rdi, rax shr rdi lea rax, [rdi + 2*rax + 1] ret .LBB0_1: xor eax, eax ret For this test case, C2 could at least do a few things to generate better code: 1. A better expression canonicalizer that flattens expression trees. 
The chain of adds you see in the resulting code is there because the 16x-unrolled sum += i is turned into:

// 120 == 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15
sum = ((((((((((((((((sum + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + i) + 120

See how the additions involving i are skewed to the left, effectively degenerating the expression tree into a "linked list of additions". C2's value numbering, on its own, doesn't recognize that it can reassociate the expression into a flatter tree, e.g.

((((i + i) + (i + i)) + ((i + i) + (i + i))) + (((i + i) + (i + i)) + ((i + i) + (i + i)))) + sum + 120

in which case C2's value numbering would be able to turn it into:

t1 = i + i
t2 = t1 + t1
t3 = t2 + t2
t4 = t3 + t3
sum = t4 + sum + 120

and then into sum = (i << 4) + sum + 120. This kind of reassociation will at least make the loop body better, without involving any complicated loop optimizations.

The "tree flattening" reassociation can actually be implemented by directly linearizing an expression tree into a C0*X + C1*Y + ... + C2 form. To get to the end goal of optimizing the whole loop into the N*(N-1)/2 form, you'd need more advanced loop analysis, e.g. something akin to LLVM's SCEV, to recognize how "sum" is related to the loop induction variable.
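[Editor's illustration, not from the original mail: a minimal, compiler-agnostic sketch of the "linearize into C0*X + C1*Y + ... + C2" idea. The Node shape and names are invented for the example and are not C2 code; a real pass would also handle Sub/Mul and negative coefficients, but the shape is the same.]

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy IR: a node is either a constant, a leaf variable, or an Add of two nodes.
    final class Node {
        final String var;        // non-null for leaf variables
        final long con;          // used when var == null and left/right are null
        final Node left, right;  // non-null for Add nodes

        private Node(String v, long c, Node l, Node r) { var = v; con = c; left = l; right = r; }
        static Node con(long c)         { return new Node(null, c, null, null); }
        static Node var(String name)    { return new Node(name, 0, null, null); }
        static Node add(Node a, Node b) { return new Node(null, 0, a, b); }
        boolean isAdd() { return left != null; }
        boolean isCon() { return var == null && left == null; }
    }

    public class Linearize {
        // Flatten an Add tree into coefficient form: a sum of coeff*var terms plus a constant.
        static void walk(Node n, Map<String, Long> coeffs, long[] constant) {
            if (n.isAdd()) {               // recurse into both operands of an Add
                walk(n.left, coeffs, constant);
                walk(n.right, coeffs, constant);
            } else if (n.isCon()) {
                constant[0] += n.con;      // fold constants
            } else {
                coeffs.merge(n.var, 1L, Long::sum);  // count occurrences of each variable
            }
        }

        public static void main(String[] args) {
            // Build the left-skewed chain from the mail: ((((sum + i) + i) + ... + i) + 120, 16 i's.
            Node e = Node.var("sum");
            for (int k = 0; k < 16; k++) e = Node.add(e, Node.var("i"));
            e = Node.add(e, Node.con(120));

            Map<String, Long> coeffs = new LinkedHashMap<>();
            long[] constant = {0};
            walk(e, coeffs, constant);
            // Prints {sum=1, i=16} + 120, i.e. sum + 16*i + 120 == sum + (i << 4) + 120.
            System.out.println(coeffs + " + " + constant[0]);
        }
    }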
BTW, Graal from graalvm-0.22 generates a straightforward loop for this case:

XYZ.sum (null) [0x000000010cc091e0, 0x000000010cc09230] 80 bytes
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000000011ebfd768} 'sum' '()J' in 'XYZ'
  #           [sp+0x10]  (sp of caller)
  0x000000010cc091e0: nopl   0x0(%rax,%rax,1)
  0x000000010cc091e5: mov    $0x1,%r10d
  0x000000010cc091eb: mov    $0x0,%rax
  0x000000010cc091f2: jmpq   0x000000010cc0920f
  0x000000010cc091f7: nopw   0x0(%rax,%rax,1)    ;*if_icmpgt {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 7 (line 6)
  0x000000010cc09200: mov    %r10d,%r11d
  0x000000010cc09203: inc    %r11d               ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 15 (line 6)
  0x000000010cc09206: movslq %r10d,%r10          ;*i2l {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 12 (line 7)
  0x000000010cc09209: add    %r10,%rax           ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 13 (line 7)
  0x000000010cc0920c: mov    %r11d,%r10d         ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 18 (line 6)
  0x000000010cc0920f: cmp    $0x186a1,%r10d
  0x000000010cc09216: jl     0x000000010cc09200  ;*if_icmpgt {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - XYZ::sum at 7 (line 6)
  0x000000010cc09218: test   %eax,-0x1d69218(%rip)  # 0x000000010aea0006
                                                 ;   {poll_return}
  0x000000010cc0921e: vzeroupper
  0x000000010cc09221: retq

- Kris