From christian.thalinger at oracle.com Wed Apr 1 00:36:28 2015
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Tue, 31 Mar 2015 17:36:28 -0700
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <551A0B72.8090208@oracle.com>
References: <55133E1C.7070803@oracle.com> <5519FC8B.6070602@oracle.com> <551A0B72.8090208@oracle.com>
Message-ID: <275EB0FD-4915-494F-ABAF-B680AF06E8D6@oracle.com>

> On Mar 30, 2015, at 7:50 PM, Vladimir Kozlov wrote:
>
> On 3/30/15 7:20 PM, Berg, Michael C wrote:
>> Almost, it's more than that, there are missing components in long support in AVX2, so we only allow what superword can currently process safely and bypass the question of long support for reductions until AVX3, where support is complete enough to allow those forms of reductions.
>
> Okay.
>
>> Nils was the initial reviewer and sponsor, so Nils can you make another pass and comment on the current webrev for the review.
>
> Nils is out for few days. Christian looked on this too, let him do second review.

The only comment I have is this opening brace should be on the same line:

+ void SuperWord::packset_sort(int n)
+ {

Otherwise this looks good. Thanks.

>
> Thanks,
> Vladimir
>
>>
>> Thanks,
>>
>> -Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Monday, March 30, 2015 6:47 PM
>> To: Berg, Michael C
>> Cc: 'hotspot-compiler-dev at openjdk.java.net'
>> Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>>
>> Here is updated webrev which addressed these and other issues:
>>
>> http://cr.openjdk.java.net/~kvn/8074981/webrev.01/
>>
>> Michael, I noticed that .ad file does not have matched instructions for AddReductionVL. I assume it is because there is no avx3 yet. Right?
>>
>> Otherwise this look good to me. You need second review from an other Reviewer since changes are big.
>>
>> Thanks,
>> Vladimir
>>
>> On 3/25/15 4:00 PM, Vladimir Kozlov wrote:
>>> Please, ignore previous email. I screwed up Michael's email address.
>>>
>>> Hi Michael,
>>>
>>> I have few major concerns which you need to address.
>>>
>>> Adding new field _attr to Node class should be avoided - it will
>>> increase significantly memory footprint of graph and not be used
>>> frequently (vectorization is rare case).
>>>
>>> NodeFlags has only 16 bits and you used 2. And I don't see how
>>> Flag_is_loop_carried_dep is used.
>>>
>>> All above goes to one question: why mark_reductions() is executed in
>>> loopopts before each unroll and not during superword processing?
>>> If you do mark_reductions() in superword you can use VectorSet to
>>> indicate nodes which are reduction nodes.
>>>
>>> And the same for _attr. Why to store alignment in Node and not use
>>> _node_info in packset_eval()?
>>>
>>> Small note. Instead of:
>>> + Node *defNode = n->in(len - 1);
>>> use:
>>> + Node *defNode = n->in(LoopNode::LoopBackControl);
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>> On 3/25/15 1:09 PM, Berg, Michael C wrote:
>>>> Christian/Nils: Any additional comments for the review, if not
>>>> Thursday I will upload the final webrev with the requested change.
>>>>
>>>> Thanks,
>>>>
>>>> -Michael
>>>>
>>>> *From:*Berg, Michael C
>>>> *Sent:* Thursday, March 19, 2015 5:55 PM
>>>> *To:* Christian Thalinger
>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>> *Subject:* RE: RFR(L): 8074981 (Integer/FP scalar reduction
>>>> optimization )
>>>>
>>>> Christian, yes we could rely on the base class definitions instead,
>>>> since we are not augmenting arguments.
>>>>
>>>> I will remove the file changes after the review concludes in case
>>>> there are any other modifications.
>>>>
>>>> Thanks,
>>>>
>>>> -Michael
>>>>
>>>> *From:*Christian Thalinger [mailto:christian.thalinger at oracle.com]
>>>> *Sent:* Thursday, March 19, 2015 3:52 PM
>>>> *To:* Berg, Michael C
>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>>
>>>> *Subject:* Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>> optimization )
>>>>
>>>> On Mar 19, 2015, at 3:23 PM, Berg, Michael C wrote:
>>>>
>>>> I have updated the webrev contents after some feedback(with no code
>>>> changes), and Vladimir has placed it in location everyone can access.
>>>> Anyone should be able to apply the patch or review the code from
>>>> this info:
>>>>
>>>> http://cr.openjdk.java.net/~kvn/8074981/webrev.00/
>>>>
>>>> src/cpu/x86/vm/macroAssembler_x86.hpp:
>>>>
>>>> Why do we need these methods? MacroAssembler extends Assembler.
>>>>
>>>>
>>>> this replaces the JBS version of the webrev files for 8074981.
>>>>
>>>> Thanks,
>>>>
>>>> -Michael
>>>>
>>>> -----Original Message-----
>>>> From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>> Sent: Thursday, March 19, 2015 12:55 AM
>>>> To: Berg, Michael C
>>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>>>
>>>> Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>> optimization )
>>>>
>>>> Michael,
>>>>
>>>> I've got it, thank you for explanation.
>>>>
>>>> Regards,
>>>> Filipp.
>>>>
>>>> On Wed, Mar 18, 2015 at 5:53 PM, Berg, Michael C wrote:
>>>>
>>>> Filipp, for large iteration loops, if I am taking your meaning
>>>> correctly, you could not do that without splitting the loop and
>>>> re-architecting it into a loop nest pair to manage the reduction
>>>> components. Seems like the overhead from that scenario could
>>>> create cost issues where reductions could actually hamper
>>>> performance in small vector expressions. Right now we never
>>>> degrade and generally benefit with the implementation as it
>>>> stands with the reductions stitched into the vector unit
>>>> computations directly.
>>>>
>>>> Regarding sub/div/etc:
>>>> For now we have waived off on non-commuting operations like sub
>>>> and div, they would have to be very strictly managed via
>>>> pack-set placement. But the answer is yes we could support them.
>>>>
>>>> Thanks,
>>>> -Michael
>>>>
>>>> -----Original Message-----
>>>> From: Filipp Zhinkin [mailto:filipp.zhinkin at gmail.com]
>>>> Sent: Wednesday, March 18, 2015 2:20 AM
>>>> To: Berg, Michael C
>>>> Cc: hotspot-compiler-dev at openjdk.java.net
>>>>
>>>> Subject: Re: RFR(L): 8074981 (Integer/FP scalar reduction
>>>> optimization )
>>>>
>>>> Hi Michael,
>>>>
>>>> thank you for contributing such a great improvement!
>>>>
>>>> Sorry if my question is silly, but I'm curious wouldn't it be
>>>> better to replace integer scalar reduction variable with a
>>>> vector "Rv" in loop's prologue, compile loop's body as a regular
>>>> vectorized addition/multiplication, and reduce "Rv" to a scalar
>>>> in loop's epilogue?
>>>>
>>>> Why you didn't add SubReduction* nodes?
>>>>
>>>> Best regards,
>>>> Filipp.
>>>>
>>>>
>>>> On Tue, Mar 17, 2015 at 12:40 AM, Berg, Michael C wrote:
>>>>
>>>> Hi All,
>>>>
>>>> We would like to contribute the Integer/FP scalar reduction
>>>> optimization from Intel.
>>>>
>>>> The contribution is referenced as Bug ID 8074981 as a performance
>>>> enhancement.
>>>>
>>>> Please review this patch:
>>>>
>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>>>>
>>>> webrev:
>>>> https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>>>>
>>>> The optimization achieves as much as 2.3x on integer reductions and
>>>> supports float and double precision optimizations
>>>> which also have significant optimization uplift and obey strict fp
>>>> constraints.
>>>>
>>>> Nils Eliasson has offered to sponsor this patch.
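
For readers following the reduction discussion quoted above, here is a minimal Java sketch of the two loop shapes being compared: a scalar reduction with a single loop-carried accumulator, and the lane-wise variant Filipp asks about, where a vector accumulator ("Rv") is combined into a scalar after the loop. This is plain Java and not the HotSpot implementation; the class name ReductionSketch, the method names, and the lane count of 8 (eight 32-bit ints in a 256-bit register) are assumptions made for illustration only.

```java
class ReductionSketch {
    // Scalar reduction: every iteration depends on the previous value of sum.
    static int scalarSum(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Lane-wise variant: LANES independent partial sums stand in for a
    // vector accumulator ("Rv"); they are reduced to a scalar after the loop.
    static int laneSum(int[] a) {
        final int LANES = 8; // hypothetical vector width, chosen for illustration
        int[] rv = new int[LANES];
        int i = 0;
        // Main loop: one "vector add" per step, modelled as LANES scalar adds.
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                rv[l] += a[i + l];
            }
        }
        // Epilogue: fold the partial sums into a single scalar.
        int sum = 0;
        for (int l = 0; l < LANES; l++) {
            sum += rv[l];
        }
        // Tail: elements that did not fill a whole group of LANES.
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1003];
        for (int i = 0; i < a.length; i++) {
            a[i] = i * 31 - 7;
        }
        System.out.println(scalarSum(a) == laneSum(a));
    }
}
```

For int addition the two methods agree on every input, because two's-complement addition stays associative and commutative even when it overflows. Floating-point addition is not associative, so the same regrouping can change a float or double result, which is presumably why the original post stresses that the FP reductions obey strict fp constraints.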
>>>>
>>>> Thanks,
>>>>
>>>> -Michael
>>>>

From vladimir.kozlov at oracle.com Wed Apr 1 00:37:42 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 31 Mar 2015 17:37:42 -0700
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <275EB0FD-4915-494F-ABAF-B680AF06E8D6@oracle.com>
References: <55133E1C.7070803@oracle.com> <5519FC8B.6070602@oracle.com> <551A0B72.8090208@oracle.com> <275EB0FD-4915-494F-ABAF-B680AF06E8D6@oracle.com>
Message-ID: <551B3DD6.9060807@oracle.com>

On 3/31/15 5:36 PM, Christian Thalinger wrote:
>
>> On Mar 30, 2015, at 7:50 PM, Vladimir Kozlov wrote:
>>
>> On 3/30/15 7:20 PM, Berg, Michael C wrote:
>>> Almost, it's more than that, there are missing components in long support in AVX2, so we only allow what superword can currently process safely and bypass the question of long support for reductions until AVX3, where support is complete enough to allow those forms of reductions.
>>
>> Okay.
>>
>>> Nils was the initial reviewer and sponsor, so Nils can you make another pass and comment on the current webrev for the review.
>>
>> Nils is out for few days. Christian looked on this too, let him do second review.
>
> The only comment I have is this opening brace should be on the same line:
>
> + void SuperWord::packset_sort(int n)
> + {

I will fix it before push.

Thanks,
Vladimir

>
> Otherwise this looks good. Thanks.

From roland.westrelin at oracle.com Wed Apr 1 07:39:53 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 1 Apr 2015 09:39:53 +0200
Subject: RFR(S): 8075587: Compilation of constant array containing different sub classes crashes the JVM
In-Reply-To: <55198DBE.2040203@oracle.com>
References: <55198DBE.2040203@oracle.com>
Message-ID: <64C6CBF7-5437-4B58-819B-780C32127A76@oracle.com>

Thanks for the review, Vladimir.

Roland.

> On Mar 30, 2015, at 7:54 PM, Vladimir Kozlov wrote:
>
> Good.
>
> Thanks,
> Vladimir
>
> On 3/27/15 6:05 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8075587/webrev.00/
>>
>> The bug was introduced by:
>>
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/5231c2210388
>>
>> which causes the meet of 2 constant arrays to result in an array of elements of type bottom.
>>
>> Roland.
>>

From zoltan.majo at oracle.com Wed Apr 1 13:38:27 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Wed, 01 Apr 2015 15:38:27 +0200
Subject: [9] RFR(S): 8068945: Use RBP register as proper frame pointer in JIT compiled code on x86
In-Reply-To: <5519C29D.8080200@oracle.com>
References: <55156A87.1070607@oracle.com> <1427706703.1606.22.camel@mylittlepony.linaroharston> <55196C2C.8080106@oracle.com> <5519B1AE.8070901@oracle.com> <5519BC6E.1090504@oracle.com> <5519C29D.8080200@oracle.com>
Message-ID: <551BF4D3.90805@oracle.com>

Hi Vladimir,

On 03/30/2015 11:39 PM, Vladimir Kozlov wrote:
> On 3/30/15 2:13 PM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> thank you for the feedback!
>>
>> On 03/30/2015 10:27 PM, Vladimir Kozlov wrote:
>>> How about PreserveFramePointer instead of simple FramePointer?
>>>
>>> PreserveFramePointer will mean that compiled (or other) code will use
>>> that register only as Frame pointer.
>>
>> I will change the flag's name to PreserveFramePointer and will also
>> update the description.
>>
>>> Zoltan, x86 flags setting should be in general globals_x86.hpp. You
>>> can #ifdef _LP64 there too. I don't understand why you only set it to
>>> true on linux-x64.
>>
>> I remembered that the original discussion with Brendan Gregg mentioned
>> only Linux's perf tool as a possible use case for "proper" frame
>> pointers. So I was unsure whether to enable proper frame pointers by
>> default on other x64 platforms as well.
>>
>> But if you think it would be better to have proper frame pointers on all
>> x64 platforms, I will change the code to set PreserveFramePointer to
>> true for all x64 platforms. Just please let me know.
>
> Currently compiled code for all x86 platforms is almost the same
> (win64 has difference in registers usage) and we should keep it that way.
>
> Also the original request was to have flag to enable such behavior
> (use RBP only as FP). So to have it off by default is acceptable. If
> performance group or someone find a regression (or bug) due to this
> change we can switch the flag off by default before jdk9 release.
>
> Try to run pstack on Solaris and jstack on OSX to make sure they
> report correct call stack with compiled java methods. And JFR.
> Also it would be nice to run SunStudio analyzer to verify that it works.

I ran all tools you've suggested. JFR and jstack is unaffected, pstack produces nice stack traces (it did not always do so before).

However, I've encountered a problem with SunStudio: Two asserts fail in the fastdebug build. Both of them "soft" failures, as neither the VM nor SunStudio crash with the product build. I worked on the problem today and have a partial understanding of the issue, but more investigation is needed to have a patch that preserves the correct behavior of SunStudio as well. So that will put this RFR on hold for a while, unfortunately.

Thank you for the feedback and suggestions so far!

Best regards,


Zoltan

>
> Thanks,
> Vladimir
>
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/30/15 8:30 AM, Zoltán Majó wrote:
>>>> Hi Ed,
>>>>
>>>>
>>>> thank you for your feedback! Please see comments below.
>>>>
>>>> On 03/30/2015 11:11 AM, Edward Nevill wrote:
>>>>> Hi Zoltán,
>>>>>
>>>>> On Fri, 2015-03-27 at 15:34 +0100, Zoltán Majó wrote:
>>>>>> Full JPRT run, all tests pass.
>>>>>> I also ran all hotspot compiler tests and
>>>>>> the jdk tests in java/lang/invoke on both x86_64 and x86_32. All tests
>>>>>> that pass without the patch pass also with the patch.
>>>>>>
>>>>>> I ran the SPEC JVM 2008 benchmarks on our performance infrastructure for
>>>>>> x86_64. The performance evaluation suggests that there is no
>>>>>> statistically significant performance degradation due to having proper
>>>>>> frame pointers. Therefore I propose to have OmitFramePointer set to
>>>>>> false by default on x86_64 (and set to true on all other platforms).
>>>>> This patch looks good, however I think there is a problem with the
>>>>> logic of OmitFramePointer.
>>>>>
>>>>> Here is my test case.
>>>>>
>>>>> --- CUT HERE ---
>>>>> // $Id: fibo.java,v 1.2 2000/12/24 19:10:50 doug Exp $
>>>>> // http://www.bagley.org/~doug/shootout/
>>>>>
>>>>> public class fibo {
>>>>>     public static void main(String args[]) {
>>>>>         int N = Integer.parseInt(args[0]);
>>>>>         System.out.println(fib(N));
>>>>>     }
>>>>>     public static int fib(int n) {
>>>>>         if (n < 2) return(1);
>>>>>         return( fib(n-2) + fib(n-1) );
>>>>>     }
>>>>> }
>>>>> --- CUT HERE ---
>>>>>
>>>>> If I run it as follows on my x86 64 bit linux.
>>>>>
>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation -XX:+PrintCompilation
>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>> -XX:-OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>
>>>>> I get
>>>>>
>>>>> # {method} {0x00007fc62c97f388} 'fib' '(I)I' in 'fibo'
>>>>> # parm0: rsi = int
>>>>> # [sp+0x30] (sp of caller)
>>>>> 0x00007fc625071100: mov %eax,-0x14000(%rsp)
>>>>> 0x00007fc625071107: push %rbp
>>>>> 0x00007fc625071108: mov %rsp,%rbp
>>>>> 0x00007f836907110b: sub $0x20,%rsp ;*synchronization entry
>>>>>
>>>>> which is correct, it is NOT(-) OmitFramePointer, therefore it is using
>>>>> the frame pointer
>>>>>
>>>>> Now if I try just changing -XX:-OmitFramePointer to
>>>>> -XX:+OmitFramePointer in the above I get
>>>>>
>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation -XX:+PrintCompilation
>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>> -XX:+OmitFramePointer -XX:+PrintAssembly fibo 43
>>>>>
>>>>> I get
>>>>>
>>>>> # {method} {0x00007f14d3c00388} 'fib' '(I)I' in 'fibo'
>>>>> # parm0: rsi = int
>>>>> # [sp+0x30] (sp of caller)
>>>>> 0x00007f14e1071100: mov %eax,-0x14000(%rsp)
>>>>> 0x00007f14e1071107: push %rbp
>>>>> 0x00007f14e1071108: sub $0x20,%rsp ;*synchronization entry
>>>>>
>>>>> which is correct, it is IS(+) OmitFramePointer, therefore it does not
>>>>> use a frame pointer.
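
The two runs above set the flag explicitly on the command line. When the question is what value a flag has without any command-line setting, a small probe can report the effective value and where it came from. The following is a minimal sketch, assuming a HotSpot-based JDK that provides com.sun.management.HotSpotDiagnosticMXBean; the class name FlagProbe and its describe method are hypothetical, and OmitFramePointer is the flag from the patch under review, so it is only known to a VM built with that patch.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;

class FlagProbe {
    // Returns "name=value (origin: ...)" for a flag the running VM knows,
    // or a short message when the VM has no flag with that name.
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption opt = bean.getVMOption(flagName);
            return opt.getName() + "=" + opt.getValue()
                + " (origin: " + opt.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // getVMOption rejects names that are not flags of this VM.
            return flagName + " is not a flag of this VM";
        }
    }

    public static void main(String[] args) {
        // Flag name from the thread; pass another name as the first argument.
        String flag = args.length > 0 ? args[0] : "OmitFramePointer";
        System.out.println(describe(flag));
    }
}
```

An origin of DEFAULT means the value was not set on the command line, which is the case that matters when a platform default is in doubt. The standard -XX:+PrintFlagsFinal option prints the final value of every flag and can be used for the same check without writing any code.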
>>>>>
>>>>> However, if I now delete the -XX:+/-OmitFramePointer altogether, IE
>>>>>
>>>>> /work/images/jdk/bin/java -XX:-TieredCompilation -XX:+PrintCompilation
>>>>> -XX:CompileOnly=fibo::fib -XX:+UnlockDiagnosticVMOptions
>>>>> -XX:+PrintAssembly fibo 43
>>>>>
>>>>> I get
>>>>>
>>>>> # {method} {0x00007f0c4b730388} 'fib' '(I)I' in 'fibo'
>>>>> # parm0: rsi = int
>>>>> # [sp+0x30] (sp of caller)
>>>>> 0x00007f0c75071100: mov %eax,-0x14000(%rsp)
>>>>> 0x00007f0c75071107: push %rbp
>>>>> 0x00007f0c75071108: sub $0x20,%rsp ;*synchronization entry
>>>>>
>>>>> It is not using a frame pointer which is the equivalent of
>>>>> -XX:+OmitFramePointer. However in your description above you say
>>>>>
>>>>>> Therefore I propose to have OmitFramePointer set to false by default
>>>>>> on x86_64 (and set to true on all other platforms).
>>>>> whereas OmitFramePointer actually seems to be set to true on x86_64
>>>>>
>>>>> I think the problem may be with the declaration and definition of
>>>>> OmitFramePointer in globals.hpp and globals_x86.hpp
>>>>>
>>>>> In globals.hpp it does
>>>>>
>>>>> product(bool, OmitFramePointer, true,
>>>>>
>>>>> In globals_x86.hpp it does
>>>>>
>>>>> LP64_ONLY(define_pd_global(bool, OmitFramePointer, false););
>>>>>
>>>>> I am not sure that you can mix product(...) and product_pd(...) like
>>>>> this, so I think it just ends up getting the default from the
>>>>> product(...).
>>>>
>>>> You are right, mixing product and product_pd does not make sense at all.
>>>> Thank you for doing additional testing and for drawing attention to the
>>>> problem.
>>>>
>>>> I updated the code to use product_pd and define_pd_global on all
>>>> relevant platforms.
>>>>
>>>>> Aside: In general, I do not like options which include a negative in
>>>>> them because I have to do a double think when I see something like,
>>>>> -XX:-OmitFramePointer, as in, it is not omitting the frame pointer,
>>>>> therefore it is using a frame pointer.
>>>>> How about FramePointer so we
>>>>> have -XX:+FramePointer to say I want frame pointers and
>>>>> -XX:-FramePointer to say I don't.
>>>>
>>>> That is a good idea. Double negation is an unnecessary complication, so
>>>> I changed the name of the flag to FramePointer, just as you suggested.
>>>>
>>>>>
>>>>> I did some timing on the above 'fibo' test
>>>>>
>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>> -XX:-OmitFramePointer fibo 43
>>>>> 701408733
>>>>>
>>>>> real 0m1.545s
>>>>> user 0m1.571s
>>>>> sys 0m0.015s
>>>>> [ed at mylittlepony java]$ time /work/images/jdk/bin/java
>>>>> -XX:+OmitFramePointer fibo 43
>>>>> 701408733
>>>>>
>>>>> real 0m1.504s
>>>>> user 0m1.527s
>>>>> sys 0m0.019s
>>>>>
>>>>> which is ~3% difference on this test case. On aarch64, I see ~7%
>>>>> difference on this test case.
>>>>
>>>> Thank you for the performance measurements!
>>>>
>>>>> With the above change to fix the logic of OmitFramePointer (and
>>>>> possibly change its name) the patch looks good to me.
>>>>
>>>> Here is the updated webrev (the same webrev that was already included
>>>> into my reply to Roland):
>>>>
>>>> http://cr.openjdk.java.net/~zmajo/8068945/webrev.01/
>>>>
>>>>> I will prepare a mirror patch for aarch64.
>>>>
>>>> That would be great!
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltán
>>>>
>>>>>
>>>>> All the best,
>>>>> Ed.
>>>>>
>>>>
>>

From vladimir.kozlov at oracle.com Wed Apr 1 17:40:33 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 01 Apr 2015 10:40:33 -0700
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <551B3DD6.9060807@oracle.com>
References: <55133E1C.7070803@oracle.com> <5519FC8B.6070602@oracle.com> <551A0B72.8090208@oracle.com> <275EB0FD-4915-494F-ABAF-B680AF06E8D6@oracle.com> <551B3DD6.9060807@oracle.com>
Message-ID: <551C2D91.4010806@oracle.com>

The push failed testing on Sparc with 64-bit fastdebug VM.
Michael, could you look what could go wrong there? I will try to reproduce it on sparc.

hotspot/test/compiler/codegen/7100757/Test7100757.java

# Internal Error (/opt/jprt/T/P1/031135.vkozlov/s/hotspot/src/share/vm/opto/superword.cpp:1742), pid=9157, tid=23
# assert(_stk.length() == 0) failed: stk is empty
#

Stack: [0x0007fffeccc00000,0x0007fffeccd00000], sp=0x0007fffecccf6780, free space=985k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x15e78d4] void VMError::report_and_die()+0x6c4
V [libjvm.so+0xa762d0] void report_vm_error(const char*,int,const char*,const char*)+0x70
V [libjvm.so+0x14acd8c] bool SuperWord::construct_bb()+0x4c
V [libjvm.so+0x14a2704] void SuperWord::SLP_extract()+0xc
V [libjvm.so+0x10d8b6c] void PhaseIdealLoop::build_and_optimize(bool,bool)+0x1374
V [libjvm.so+0x9d9c70] void Compile::Optimize()+0x24c8
V [libjvm.so+0x9d0fdc] Compile::Compile #Nvariant 1(ciEnv*,C2Compiler*,ciMethod*,int,bool,bool,bool)+0x12fc
V [libjvm.so+0x87595c] void C2Compiler::compile_method(ciEnv*,ciMethod*,int)+0xf4

Current CompileTask:
C2: 1177 156 4 Test7100757::test (274 bytes)

Thanks,
Vladimir

On 3/31/15 5:37 PM, Vladimir Kozlov wrote:
>
> On 3/31/15 5:36 PM, Christian Thalinger wrote:
>>
>>> On Mar 30, 2015, at 7:50 PM, Vladimir Kozlov wrote:
>>>
>>> On 3/30/15 7:20 PM, Berg, Michael C wrote:
>>>> Almost, it's more than that, there are missing components in long
>>>> support in AVX2, so we only allow what superword can currently
>>>> process safely and bypass the question of long support for
>>>> reductions until AVX3, where support is complete enough to allow
>>>> those forms of reductions.
>>>
>>> Okay.
>>>
>>>> Nils was the initial reviewer and sponsor, so Nils can you make
>>>> another pass and comment on the current webrev for the review.
>>>
>>> Nils is out for few days. Christian looked on this too, let him do
>>> second review.

From vladimir.x.ivanov at oracle.com Wed Apr 1 20:56:50 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 01 Apr 2015 23:56:50 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
Message-ID: <551C5B92.8060500@oracle.com>

http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8057967

HotSpot JITs inline very aggressively through CallSites.
They optimistically treat CallSite target as constant, but record a nmethod dependency to invalidate the compiled code once CallSite target changes.

Right now, such dependencies have call site class as a context. This context is too coarse and it leads to context pollution: if some CallSite target changes, VM needs to enumerate all nmethods which depend on call sites of such type.

As performance analysis in the bug report shows, it can add up to a significant amount of work.

While working on the fix, I investigated 3 approaches:
  (1) unique context per call site
  (2) use CallSite target class
  (3) use a class the CallSite instance is linked to

Considering call sites are ubiquitous (e.g. 10,000s on some octane benchmarks), loading a dedicated class for every call site is overkill (even VM anonymous).

CallSite target class (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is also not satisfactory, since it is a compiled LambdaForm VM anonymous class, which is heavily shared. It gets context pollution down, but still the overhead is quite high.

So, I decided to focus on (3) and ended up with a mixture of (2) & (3).

Compared to other options, the complications of (3) are:
  - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so there should be some default context VM can use

  - CallSite instances can be shared and it shouldn't keep the context class from unloading;

It motivated a scheme where CallSite context is initialized lazily and can change during lifetime. When CallSite is linked with an indy instruction, its context is initialized. Usually, JIT sees CallSite instances with initialized context (since it reaches them through indy), but if it's not the case and there's no context yet, JIT sets it to "default context", which means "use call site target class".

I introduced CallSite$DependencyContext, which represents a nmethod dependency context and points (indirectly) to a Class used as a context.

Context class is referenced through a phantom reference (sun.misc.Cleaner to simplify cleanup). Though it's impossible to extract referent using Reference.get(), VM can access it directly by reading corresponding field. Unlike other types of references, phantom references aren't cleared automatically. It allows VM to access context class until cleanup is performed. And cleanup resets the context to NULL, in addition to invalidating all relevant dependencies.

There are 3 context states a CallSite instance can be in:
  (1) NULL: no dependencies
  (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in call site target class
  (3) DependencyContext for some class: dependencies are stored on the class DependencyContext instance points to

Every CallSite starts w/o a context (1) and then lazily gets one ((2) or (3) depending on the situation).

State transitions:
  (1->3): When a CallSite w/o a context (1) is linked with some indy call site, its owner is recorded as a context (3).

  (1->2): When JIT needs to record a dependency on a target of a CallSite w/o a context (1), it sets the context to DEFAULT_CONTEXT and uses target class to store the dependency.

  (3->1): When context class becomes unreachable, a cleanup hook invalidates all dependencies on that CallSite and resets the context to NULL (1).

Only (3->1) requires dependency invalidation, because there are no dependencies in (1) and (2->1) isn't performed.

(1->3) is done in Java code (CallSite.initContext) and (1->2) is performed in VM (ciCallSite::get_context()). The updates are performed by CAS, so there's no need for additional synchronization. Other operations on VM side are volatile (to play well with Java code) and performed with Compile_lock held (to avoid races between VM operations).
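
For illustration only (this is not code from the webrev): a minimal Java sketch of the three context states and the CAS transitions described above. Apart from initContext and DEFAULT_CONTEXT, every name here is hypothetical. An AtomicReference stands in for the real CallSite.context field, a plain Class stands in for the Cleaner-wrapped context class, and nmethod invalidation and the Compile_lock are left out.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the CallSite context state machine; not JDK code.
final class CallSiteContextSketch {
    // Stand-in for DependencyContext.DEFAULT_CONTEXT, i.e. state (2).
    static final Object DEFAULT_CONTEXT = new Object();

    // null models state (1), DEFAULT_CONTEXT models (2), a Class models (3).
    private final AtomicReference<Object> context = new AtomicReference<>();

    // (1->3): linking with an indy instruction records the owner class.
    // The CAS only succeeds if no context has been set yet.
    boolean initContext(Class<?> owner) {
        return context.compareAndSet(null, owner);
    }

    // (1->2): what the JIT side would do before recording a dependency.
    // If a context already exists, the CAS fails and the existing one is used.
    // (The real VM does this with Compile_lock held; this sketch ignores
    // a concurrent cleanup.)
    Object contextForDependency() {
        context.compareAndSet(null, DEFAULT_CONTEXT);
        return context.get();
    }

    // (3->1): cleanup hook for an unreachable context class. The real code
    // would also invalidate the dependent nmethods; here only the reset is shown.
    boolean cleanup(Class<?> owner) {
        return context.compareAndSet(owner, null);
    }

    Object current() {
        return context.get();
    }
}
```

Because every transition starts from an expected value, the transitions the description rules out (2->3 and 2->1) simply fail their CAS in this model and leave the context unchanged.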

Some statistics:
  Box2D, latest jdk9-dev
    - CallSite instances: ~22000

    - invalidated nmethods due to CallSite target changes: ~60

    - checked call_site_target_value dependencies:
      - before the fix: ~1,600,000
      - after the fix: ~600

Testing:
  - dedicated test which exercises different state transitions
  - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn

Thanks!

Best regards,
Vladimir Ivanov

From john.r.rose at oracle.com Thu Apr 2 02:27:00 2015
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 1 Apr 2015 19:27:00 -0700
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
In-Reply-To: <551C5B92.8060500@oracle.com>
References: <551C5B92.8060500@oracle.com>
Message-ID:

On Apr 1, 2015, at 1:56 PM, Vladimir Ivanov wrote:
>
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
> https://bugs.openjdk.java.net/browse/JDK-8057967

Impressive work. Question: How common is state 2 (context-free CS) compared to state 3 (indy-bound CS)? And is state 2 well tested by Box2D?

I recommend putting CONTEXT_OFFSET into CallSite, not the nested class.
For one thing, your getDeclaredField call will fail (I think) with a security manager installed.
You can load it up where TARGET_OFFSET is initialized.

I haven't looked at the JVM changes yet, and I don't understand the cleaner, yet.

Can a call site target class change as a result of LF recompiling or customization?
If so, won't that cause a risk of dropped dependencies?

- John

From roland.westrelin at oracle.com Thu Apr 2 07:39:36 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Thu, 2 Apr 2015 09:39:36 +0200
Subject: RFR(XS): 8076094: CheckCastPPNode::Value() has outdated logic for constants
In-Reply-To: <55142D20.3040407@oracle.com>
References: <55142D20.3040407@oracle.com>
Message-ID:

Thanks for the reviews Vladimir & Vladimir.

Roland.
> On Mar 26, 2015, at 5:00 PM, Vladimir Ivanov wrote: > > Looks good. > > Best regards, > Vladimir Ivanov > > On 3/26/15 6:02 PM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8076094/webrev.00/ >> >> I noticed this logic in CheckCastPPNode::Value() that doesn?t seem to make sense and asked John about it. He said it might be outdated. I removed it and had it go through testing and saw no problem. I propose we remove it. >> >> Roland. >> From peter.levart at gmail.com Thu Apr 2 08:02:17 2015 From: peter.levart at gmail.com (Peter Levart) Date: Thu, 02 Apr 2015 10:02:17 +0200 Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly In-Reply-To: <551C5B92.8060500@oracle.com> References: <551C5B92.8060500@oracle.com> Message-ID: <551CF789.9080607@gmail.com> Hi Vladimir, Would it be possible for CallSite.context to hold the Cleaner instance itself (without indirection through DependencyContext)? DEFAULT_CONTEXT would then be a Cleaner instance that references some "default" Class object (for example DefaultContext.class that serves no other purpose). Regards, Peter On 04/01/2015 10:56 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/ > https://bugs.openjdk.java.net/browse/JDK-8057967 > > HotSpot JITs inline very aggressively through CallSites. The > optimistically treat CallSite target as constant, but record a nmethod > dependency to invalidate the compiled code once CallSite target changes. > > Right now, such dependencies have call site class as a context. This > context is too coarse and it leads to context pollution: if some > CallSite target changes, VM needs to enumerate all nmethods which > depends on call sites of such type. > > As performance analysis in the bug report shows, it can sum to > significant amount of work. 
> > While working on the fix, I investigated 3 approaches: > (1) unique context per call site > (2) use CallSite target class > (3) use a class the CallSite instance is linked to > > Considering call sites are ubiquitous (e.g. 10,000s on some octane > benchmarks), loading a dedicated class for every call site is an > overkill (even VM anonymous). > > CallSite target class > (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is > also not satisfactory, since it is a compiled LambdaForm VM anonymous > class, which is heavily shared. It gets context pollution down, but > still the overhead is quite high. > > So, I decided to focus on (3) and ended up with a mixture of (2) & (3). > > Comparing to other options, the complications of (3) are: > - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so > there should be some default context VM can use > > - CallSite instances can be shared and it shouldn't keep the context > class from unloading; > > It motivated a scheme where CallSite context is initialized lazily and > can change during lifetime. When CallSite is linked with an indy > instruction, it's context is initialized. Usually, JIT sees CallSite > instances with initialized context (since it reaches them through > indy), but if it's not the case and there's no context yet, JIT sets > it to "default context", which means "use target call site". > > I introduced CallSite$DependencyContext, which represents a nmethod > dependency context and points (indirectly) to a Class used as a > context. > > Context class is referenced through a phantom reference > (sun.misc.Cleaner to simplify cleanup). Though it's impossible to > extract referent using Reference.get(), VM can access it directly by > reading corresponding field. Unlike other types of references, phantom > references aren't cleared automatically. It allows VM to access > context class until cleanup is performed. 
And cleanup resets the > context to NULL, in addition to invalidating all relevant dependencies. > > There are 3 context states a CallSite instance can be in: > (1) NULL: no depedencies > (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in > call site target class > (3) DependencyContext for some class: dependencies are stored on the > class DependencyContext instance points to > > Every CallSite starts w/o a context (1) and then lazily gets one ((2) > or (3) depending on the situation). > > State transitions: > (1->3): When a CallSite w/o a context (1) is linked with some indy > call site, it's owner is recorded as a context (3). > > (1->2): When JIT needs to record a dependency on a target of a > CallSite w/o a context(1), it sets the context to DEFAULT_CONTEXT and > uses target class to store the dependency. > > (3->1): When context class becomes unreachable, a cleanup hook > invalidates all dependencies on that CallSite and resets the context > to NULL (1). > > Only (3->1) requires dependency invalidation, because there are no > depedencies in (1) and (2->1) isn't performed. > > (1->3) is done in Java code (CallSite.initContext) and (1->2) is > performed in VM (ciCallSite::get_context()). The updates are performed > by CAS, so there's no need in additional synchronization. Other > operations on VM side are volatile (to play well with Java code) and > performed with Compile_lock held (to avoid races between VM operations). > > Some statistics: > Box2D, latest jdk9-dev > - CallSite instances: ~22000 > > - invalidated nmethods due to CallSite target changes: ~60 > > - checked call_site_target_value dependencies: > - before the fix: ~1,600,000 > - after the fix: ~600 > > Testing: > - dedicated test which excercises different state transitions > - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn > > Thanks! 
>
> Best regards,
> Vladimir Ivanov

From aleksey.shipilev at oracle.com Thu Apr 2 16:10:53 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Thu, 02 Apr 2015 19:10:53 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
In-Reply-To: <551C5B92.8060500@oracle.com>
References: <551C5B92.8060500@oracle.com>
Message-ID: <551D6A0D.8090500@oracle.com>

On 04/01/2015 11:56 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
> https://bugs.openjdk.java.net/browse/JDK-8057967

Glad to see this finally addressed, thanks!

I did not look through the code changes, but ran Octane on my configuration. As expected, Typescript improved substantially. Other benchmarks are not affected much. This is in line with the performance analysis done for the original bug report.

Baseline:

Benchmark          Mode  Cnt     Score     Error  Units
Box2D.test           ss   20  4454.677 ± 345.807  ms/op
CodeLoad.test        ss   20  4784.299 ± 370.658  ms/op
Crypto.test          ss   20   878.395 ±  87.918  ms/op
DeltaBlue.test       ss   20   502.182 ±  52.362  ms/op
EarleyBoyer.test     ss   20  2250.508 ± 273.924  ms/op
Gbemu.test           ss   20  5893.102 ± 656.036  ms/op
Mandreel.test        ss   20  9323.484 ± 825.801  ms/op
NavierStokes.test    ss   20   657.608 ±  41.212  ms/op
PdfJS.test           ss   20  3829.534 ± 353.702  ms/op
Raytrace.test        ss   20  1202.826 ± 166.795  ms/op
Regexp.test          ss   20   156.782 ±  20.992  ms/op
Richards.test        ss   20   324.256 ±  35.874  ms/op
Splay.test           ss   20   179.660 ±  34.120  ms/op
Typescript.test      ss   20    40.537 ±   2.457   s/op

Patched:

Benchmark          Mode  Cnt     Score     Error  Units
Box2D.test           ss   20  4306.198 ± 376.030  ms/op
CodeLoad.test        ss   20  4881.635 ± 395.585  ms/op
Crypto.test          ss   20   823.551 ± 106.679  ms/op
DeltaBlue.test       ss   20   490.557 ±  41.705  ms/op
EarleyBoyer.test     ss   20  2299.763 ± 270.961  ms/op
Gbemu.test           ss   20  5612.868 ± 414.052  ms/op
Mandreel.test        ss   20  8616.735 ± 825.813  ms/op
NavierStokes.test    ss   20   640.722 ±  28.035  ms/op
PdfJS.test           ss   20  4139.396 ± 373.580  ms/op
Raytrace.test        ss   20  1227.632 ± 151.088  ms/op
Regexp.test          ss   20   169.246 ±  34.055  ms/op
Richards.test        ss   20   331.824 ±  32.706  ms/op
Splay.test           ss   20   168.479 ±  23.512  ms/op
Typescript.test      ss   20    31.181 ±   1.790   s/op

The offending profile branch (Universe::flush_dependents_on) is also gone, which explains the performance improvement.

Thanks,
-Aleksey.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL:

From vladimir.x.ivanov at oracle.com Thu Apr 2 16:17:25 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 02 Apr 2015 19:17:25 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
In-Reply-To:
References: <551C5B92.8060500@oracle.com>
Message-ID: <551D6B95.5030109@oracle.com>

John, Peter,

Thanks a lot for the feedback!

Updated webrev:
http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/
http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/

> Question: How common is state 2 (context-free CS) compared to state 3 (indy-bound CS)?
It's quite rare (<2%). For Box2D the stats are:
  total # of call sites instantiated: 22000
  (1): ~1800 (stay uninitialized)
  (2): ~19900
  (3): ~300

> And is state 2 well tested by Box2D?
No, it's not. But: (1) I wrote a focused test on different context state transitions (see test/compiler/jsr292/CallSiteDepContextTest.java); and (2) artificially stressed the logic by eagerly initializing the context to DEFAULT_CONTEXT.

I had (2)->(3) transition (DEF_CTX => bound Class context) at some point, but decided to get rid of it. IMO the price of recompilation (recorded dependencies should be invalidated during context migration) is too much for reduced number of dependencies enumerated.

> I recommend putting CONTEXT_OFFSET into CallSite, not the nested class.
> For one thing, your getDeclaredField call will fail (I think) with a security manager installed.
> You can load it up where TARGET_OFFSET is initialized.
Since I removed DependencyContext, I moved CONTEXT_OFFSET to CallSite.

BTW why do you think security manager was the problem? (1) Class.getDeclaredField() is caller-sensitive; and (2) DependencyContext was eagerly initialized with CallSite (see UNSAFE.ensureClassInitialized() in original version).

>
> I haven't looked at the JVM changes yet, and I don't understand the cleaner, yet.
> Can a call site target class change as a result of LF recompiling or customization?
> If so, won't that cause a risk of dropped dependencies?
Good point! It's definitely a problem I haven't envisioned. Ok, I completely removed call site target class logic and use DefaultContext class instead.

On 4/2/15 11:02 AM, Peter Levart wrote:
> Hi Vladimir,
>
> Would it be possible for CallSite.context to hold the Cleaner instance
> itself (without indirection through DependencyContext)?
>
> DEFAULT_CONTEXT would then be a Cleaner instance that references some
> "default" Class object (for example DefaultContext.class that serves no
> other purpose).
Good idea! I eliminated the indirection as you suggest.

Best regards,
Vladimir Ivanov

From vladimir.x.ivanov at oracle.com Thu Apr 2 16:26:20 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 02 Apr 2015 19:26:20 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
In-Reply-To: <551D6A0D.8090500@oracle.com>
References: <551C5B92.8060500@oracle.com> <551D6A0D.8090500@oracle.com>
Message-ID: <551D6DAC.8030607@oracle.com>

Aleksey, thanks a lot for the performance evaluation of the fix!
Best regards, Vladimir Ivanov On 4/2/15 7:10 PM, Aleksey Shipilev wrote: > On 04/01/2015 11:56 PM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/ >> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/ >> https://bugs.openjdk.java.net/browse/JDK-8057967 > > Glad to see this finally addressed, thanks! > > I did not look through the code changes, but ran Octane on my > configuration. As expected, Typescript had improved substantially. Other > benchmarks are not affected much. This in line with the performance > analysis done for the original bug report. > > Baseline: > > Benchmark Mode Cnt Score Error Units > Box2D.test ss 20 4454.677 ? 345.807 ms/op > CodeLoad.test ss 20 4784.299 ? 370.658 ms/op > Crypto.test ss 20 878.395 ? 87.918 ms/op > DeltaBlue.test ss 20 502.182 ? 52.362 ms/op > EarleyBoyer.test ss 20 2250.508 ? 273.924 ms/op > Gbemu.test ss 20 5893.102 ? 656.036 ms/op > Mandreel.test ss 20 9323.484 ? 825.801 ms/op > NavierStokes.test ss 20 657.608 ? 41.212 ms/op > PdfJS.test ss 20 3829.534 ? 353.702 ms/op > Raytrace.test ss 20 1202.826 ? 166.795 ms/op > Regexp.test ss 20 156.782 ? 20.992 ms/op > Richards.test ss 20 324.256 ? 35.874 ms/op > Splay.test ss 20 179.660 ? 34.120 ms/op > Typescript.test ss 20 40.537 ? 2.457 s/op > > Patched: > > Benchmark Mode Cnt Score Error Units > Box2D.test ss 20 4306.198 ? 376.030 ms/op > CodeLoad.test ss 20 4881.635 ? 395.585 ms/op > Crypto.test ss 20 823.551 ? 106.679 ms/op > DeltaBlue.test ss 20 490.557 ? 41.705 ms/op > EarleyBoyer.test ss 20 2299.763 ? 270.961 ms/op > Gbemu.test ss 20 5612.868 ? 414.052 ms/op > Mandreel.test ss 20 8616.735 ? 825.813 ms/op > NavierStokes.test ss 20 640.722 ? 28.035 ms/op > PdfJS.test ss 20 4139.396 ? 373.580 ms/op > Raytrace.test ss 20 1227.632 ? 151.088 ms/op > Regexp.test ss 20 169.246 ? 34.055 ms/op > Richards.test ss 20 331.824 ? 32.706 ms/op > Splay.test ss 20 168.479 ? 23.512 ms/op > Typescript.test ss 20 31.181 ? 
1.790 s/op > > The offending profile branch (Universe::flush_dependents_on) is also > gone, which explains the performance improvement. > > Thanks, > -Aleksey. > From john.r.rose at oracle.com Thu Apr 2 20:21:03 2015 From: john.r.rose at oracle.com (John Rose) Date: Thu, 2 Apr 2015 13:21:03 -0700 Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly In-Reply-To: <551D6B95.5030109@oracle.com> References: <551C5B92.8060500@oracle.com> <551D6B95.5030109@oracle.com> Message-ID: <606B777D-C99C-4F28-9E0B-A0A032659B71@oracle.com> On Apr 2, 2015, at 9:17 AM, Vladimir Ivanov wrote: > >> >> I recommend putting CONTEXT_OFFSET into CallSite, not the nested class. >> For one thing, your getDeclaredField call will fail (I think) with a security manager installed. >> You can load it up where TARGET_OFFSET is initialized. > Since I removed DependencyContext, I moved CONTEXT_OFFSET to CallSite. > > BTW why do you think security manager was the problem? (1) Class.getDeclaredField() is caller-sensitive; and (2) DependencyContext was eagerly initialized with CallSite (see UNSAFE.ensureClassInitialized() in original version). CallSite$DependencyContext and CallSite are distinct classes. At the JVM level they cannot access each others' private members. So if DependencyContext wants to reflect a private field from CallSite, there will be extra security checks. These sometimes fail, as in: https://bugs.openjdk.java.net/browse/JDK-7050328 ? John -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From vladimir.x.ivanov at oracle.com Thu Apr 2 22:08:10 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 03 Apr 2015 01:08:10 +0300
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
In-Reply-To: <606B777D-C99C-4F28-9E0B-A0A032659B71@oracle.com>
References: <551C5B92.8060500@oracle.com> <551D6B95.5030109@oracle.com> <606B777D-C99C-4F28-9E0B-A0A032659B71@oracle.com>
Message-ID: <551DBDCA.4020100@oracle.com>

John,

Thanks for the clarification!

>> BTW why do you think security manager was the problem? (1)
>> Class.getDeclaredField() is caller-sensitive; and (2)
>> DependencyContext was eagerly initialized with CallSite (see
>> UNSAFE.ensureClassInitialized() in original version).
>
> CallSite$DependencyContext and CallSite are distinct classes.
> At the JVM level they cannot access each others' private members.
> So if DependencyContext wants to reflect a private field from CallSite,
> there will be extra security checks. These sometimes fail, as in:

Member access permission check isn't performed if caller and member owner class are loaded by the same class loader (which is the case with CallSite$DependencyContext and CallSite classes).

jdk/src/java.base/share/classes/java/lang/Class.java:
    @CallerSensitive
    public Field getDeclaredField(String name)
        throws NoSuchFieldException, SecurityException {
        checkMemberAccess(Member.DECLARED, Reflection.getCallerClass(), true);
    ...
    private void checkMemberAccess(int which, Class<?> caller, boolean checkProxyInterfaces) {
        final SecurityManager s = System.getSecurityManager();
        if (s != null) {
            final ClassLoader ccl = ClassLoader.getClassLoader(caller);
            final ClassLoader cl = getClassLoader0();
            if (which != Member.PUBLIC) {
                if (ccl != cl) {
                    s.checkPermission(SecurityConstants.CHECK_MEMBER_ACCESS_PERMISSION);
                }

Best regards,
Vladimir Ivanov

From vladimir.kozlov at oracle.com Thu Apr 2 22:33:45 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 02 Apr 2015 15:33:45 -0700
Subject: RFR(XS) 8076523: assert(((ABS(iv_adjustment_in_bytes) % elt_size) == 0)) fails in superword.cpp
Message-ID: <551DC3C9.7030704@oracle.com>

http://cr.openjdk.java.net/~kvn/8076523/webrev/
https://bugs.openjdk.java.net/browse/JDK-8076523

The problem was caused by JDK-8026049 changes. Vectorization assumes that offset in array is aligned to size of memory operations (which access element of array). With UseUnalignedAccesses Long load/store operations could be used to access byte[] array without alignment to sizeof(jlong).

Vectorization has code which verifies alignment - it should be adjusted to check that offset % mem_oper_size == 0.

Fix tested with failed tests and JPRT.

Thanks,
Vladimir

From john.r.rose at oracle.com Thu Apr 2 22:57:41 2015
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 2 Apr 2015 15:57:41 -0700
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
In-Reply-To: <551DBDCA.4020100@oracle.com>
References: <551C5B92.8060500@oracle.com> <551D6B95.5030109@oracle.com> <606B777D-C99C-4F28-9E0B-A0A032659B71@oracle.com> <551DBDCA.4020100@oracle.com>
Message-ID: <27F0D8DE-FF78-4B41-B5DF-2C985705CC96@oracle.com>

On Apr 2, 2015, at 3:08 PM, Vladimir Ivanov wrote:
>
> Member access permission check isn't performed if caller and member owner class are loaded by the same class loader (which is the case with CallSite$DependencyContext and CallSite classes).

Heh!
And I thought I had compiled the reflection logic to gray matter.

- John
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From igor.veresov at oracle.com Thu Apr 2 22:59:08 2015
From: igor.veresov at oracle.com (Igor Veresov)
Date: Thu, 2 Apr 2015 15:59:08 -0700
Subject: RFR(XS) 8076523: assert(((ABS(iv_adjustment_in_bytes) % elt_size) == 0)) fails in superword.cpp
In-Reply-To: <551DC3C9.7030704@oracle.com>
References: <551DC3C9.7030704@oracle.com>
Message-ID: <623F272A-2ACF-4401-BCBC-71B99394C533@oracle.com>

Looks good.

igor

> On Apr 2, 2015, at 3:33 PM, Vladimir Kozlov wrote:
>
> http://cr.openjdk.java.net/~kvn/8076523/webrev/
> https://bugs.openjdk.java.net/browse/JDK-8076523
>
> The problem was caused by JDK-8026049 changes. Vectorization assumes that offset in array is aligned to size of memory operations (which access element of array). With UseUnalignedAccesses Long load/store operations could be used to access byte[] array without alignment to sizeof(jlong).
>
> Vectorization has code which verifies alignment - it should be adjusted to check that offset % mem_oper_size == 0.
>
> Fix tested with failed tests and JPRT.
>
> Thanks,
> Vladimir

From vladimir.kozlov at oracle.com Thu Apr 2 23:01:37 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 02 Apr 2015 16:01:37 -0700
Subject: RFR(XS) 8076523: assert(((ABS(iv_adjustment_in_bytes) % elt_size) == 0)) fails in superword.cpp
In-Reply-To: <623F272A-2ACF-4401-BCBC-71B99394C533@oracle.com>
References: <551DC3C9.7030704@oracle.com> <623F272A-2ACF-4401-BCBC-71B99394C533@oracle.com>
Message-ID: <551DCA51.5020708@oracle.com>

Thank you, Igor

Vladimir

On 4/2/15 3:59 PM, Igor Veresov wrote:
> Looks good.
>
> igor
>
>> On Apr 2, 2015, at 3:33 PM, Vladimir Kozlov wrote:
>>
>> http://cr.openjdk.java.net/~kvn/8076523/webrev/
>> https://bugs.openjdk.java.net/browse/JDK-8076523
>>
>> The problem was caused by JDK-8026049 changes.
Vectorization assumes that offset in array is aligned to size of memory operations (which access element of array). With UseUnalignedAccesses Long load/store operations could be used to access byte[] array without alignment to sizeof(jlong).
>>
>> Vectorization has code which verifies alignment - it should be adjusted to check that offset % mem_oper_size == 0.
>>
>> Fix tested with failed tests and JPRT.
>>
>> Thanks,
>> Vladimir
>

From igor.veresov at oracle.com Mon Apr 6 22:46:02 2015
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 6 Apr 2015 15:46:02 -0700
Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size on some SPARC systems is incorrect
Message-ID: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>

The L2 data cache line size property can be named either "l2-cache-line-size" or "l2-dcache-line-size". We have to try them both.

Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/

Thanks,
igor

From vladimir.kozlov at oracle.com Mon Apr 6 22:59:47 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 06 Apr 2015 15:59:47 -0700
Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size on some SPARC systems is incorrect
In-Reply-To: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>
References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com>
Message-ID: <55230FE3.2070301@oracle.com>

Good.

Thanks,
Vladimir

On 4/6/15 3:46 PM, Igor Veresov wrote:
> The L2 data cache line size property can be named either "l2-cache-line-size" or "l2-dcache-line-size". We have to try them both.
> > Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/ > > Thanks, > igor > From igor.veresov at oracle.com Mon Apr 6 23:59:35 2015 From: igor.veresov at oracle.com (Igor Veresov) Date: Mon, 6 Apr 2015 16:59:35 -0700 Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size on some SPARC systems is incorrect In-Reply-To: <55230FE3.2070301@oracle.com> References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com> <55230FE3.2070301@oracle.com> Message-ID: Thanks, Vladimir! igor > On Apr 6, 2015, at 3:59 PM, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir > > On 4/6/15 3:46 PM, Igor Veresov wrote: >> The L2 data cache line size property can be name either "l2-cache-line-size? or ?l2-dcache-line-size?. We have to try them both. >> >> Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/ >> >> Thanks, >> igor >> From vitalyd at gmail.com Tue Apr 7 00:13:30 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 6 Apr 2015 20:13:30 -0400 Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size on some SPARC systems is incorrect In-Reply-To: References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com> <55230FE3.2070301@oracle.com> Message-ID: Hi Igor, // One the first visit determine the name of the l2 cache line size property and memoize it Typo - should be "On the first ..." I presume. Also, that code doesn't just memoize the property name, it also appears to actually probe for and set the value - worthwhile to update the comment? sent from my phone On Apr 6, 2015 8:00 PM, "Igor Veresov" wrote: > Thanks, Vladimir! > > igor > > > On Apr 6, 2015, at 3:59 PM, Vladimir Kozlov > wrote: > > > > Good. > > > > Thanks, > > Vladimir > > > > On 4/6/15 3:46 PM, Igor Veresov wrote: > >> The L2 data cache line size property can be name either > "l2-cache-line-size? or ?l2-dcache-line-size?. We have to try them both. 
> >>
> >> Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/
> >>
> >> Thanks,
> >> igor
> >>
>

From michael.c.berg at intel.com Tue Apr 7 01:35:37 2015
From: michael.c.berg at intel.com (Berg, Michael C)
Date: Tue, 7 Apr 2015 01:35:37 +0000
Subject: RFR 8076276 support for AVX512
Message-ID:

Hi Folks,

We (Intel) would like to contribute initial support for AVX512 (EVEX encoding, new register support, new ISA support, etc) for EVEX enabled microarchitectures.
The contribution is referenced as Bug ID 8076276 as a performance enhancement.

Please review this patch and comment as needed:

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276

webrev: http://cr.openjdk.java.net/~kvn/8076276/webrev

Superword optimizations covered on the vectorization path experience as much as 50% reduction in loop trace instruction count which make up the path length of EVEX encoded SIMD optimized loops.

Vladimir Kozlov has offered to sponsor this patch.

From igor.veresov at oracle.com Tue Apr 7 01:53:20 2015
From: igor.veresov at oracle.com (Igor Veresov)
Date: Mon, 6 Apr 2015 18:53:20 -0700
Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size on some SPARC systems is incorrect
In-Reply-To:
References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com> <55230FE3.2070301@oracle.com>
Message-ID: <6D50B83F-FA80-48B4-BBD8-6A4EC41A3D17@oracle.com>

Vitaly,

Thanks, others already noted the typo.
The comment means that we memoize the result of bruteforcing of the name of the property. I'm quite sure what is the confusion? What would you like it to say?

igor

> On Apr 6, 2015, at 5:13 PM, Vitaly Davidovich wrote:
>
> Hi Igor,
>
> // One the first visit determine the name of the l2 cache line size property and memoize it
>
> Typo - should be "On the first ..." I presume.
> > Also, that code doesn't just memoize the property name, it also appears to actually probe for and set the value - worthwhile to update the comment? > > sent from my phone > > On Apr 6, 2015 8:00 PM, "Igor Veresov" > wrote: > Thanks, Vladimir! > > igor > > > On Apr 6, 2015, at 3:59 PM, Vladimir Kozlov > wrote: > > > > Good. > > > > Thanks, > > Vladimir > > > > On 4/6/15 3:46 PM, Igor Veresov wrote: > >> The L2 data cache line size property can be name either "l2-cache-line-size? or ?l2-dcache-line-size?. We have to try them both. > >> > >> Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/ > >> > >> Thanks, > >> igor > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Tue Apr 7 02:12:41 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 6 Apr 2015 22:12:41 -0400 Subject: RFR(S) 8076968: PICL based initialization of L2 cache line size on some SPARC systems is incorrect In-Reply-To: <6D50B83F-FA80-48B4-BBD8-6A4EC41A3D17@oracle.com> References: <0CE3FCF6-E68E-4011-B102-D041536370AB@oracle.com> <55230FE3.2070301@oracle.com> <6D50B83F-FA80-48B4-BBD8-6A4EC41A3D17@oracle.com> Message-ID: I initially read the comment as meaning it memoizes the name of the property (and not the value) but re-reading it again, I think it's fine. sent from my phone On Apr 6, 2015 9:53 PM, "Igor Veresov" wrote: > Vitaly, > > Thanks, others already noted the typo. > The comment means that we memoize the result of bruteforcing of the name > of the property. I?m quite sure what is the confusion? What would you like > it to say? > > igor > > On Apr 6, 2015, at 5:13 PM, Vitaly Davidovich wrote: > > Hi Igor, > > // One the first visit determine the name of the l2 cache line size > property and memoize it > > Typo - should be "On the first ..." I presume. > > Also, that code doesn't just memoize the property name, it also appears to > actually probe for and set the value - worthwhile to update the comment? 
> > sent from my phone > On Apr 6, 2015 8:00 PM, "Igor Veresov" wrote: > >> Thanks, Vladimir! >> >> igor >> >> > On Apr 6, 2015, at 3:59 PM, Vladimir Kozlov >> wrote: >> > >> > Good. >> > >> > Thanks, >> > Vladimir >> > >> > On 4/6/15 3:46 PM, Igor Veresov wrote: >> >> The L2 data cache line size property can be name either >> "l2-cache-line-size? or ?l2-dcache-line-size?. We have to try them both. >> >> >> >> Webrev: http://cr.openjdk.java.net/~iveresov/8076968/webrev.00/ >> >> >> >> Thanks, >> >> igor >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.c.berg at intel.com Tue Apr 7 18:07:42 2015 From: michael.c.berg at intel.com (Berg, Michael C) Date: Tue, 7 Apr 2015 18:07:42 +0000 Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization ) In-Reply-To: References: Message-ID: Please ignore this one its already checked in... From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Berg, Michael C Sent: Monday, March 16, 2015 2:18 PM To: hotspot-compiler-dev at openjdk.java.net Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization ) Hi All, We would like to contribute the Integer/FP scalar reduction optimization from Intel. The contribution is referenced as Bug ID 8074981 as a performance enhancement. Please review this patch: Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981 webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip The optimization achieves as much as 2.3x on integer reductions and supports float and double precision optimizations which also have significant optimization uplift an obey strict fp constraints. Nils Eliasson has offered to sponsor this patch. Thanks, -Michael -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vitalyd at gmail.com Tue Apr 7 18:30:37 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 7 Apr 2015 14:30:37 -0400 Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization ) In-Reply-To: References: Message-ID: Hi Michael/Vladimir, Out of curiosity, is this change and the out-for-review avx512 one going to be (or planned on being) backported to java 8? Thanks On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C wrote: > Please ignore this one its already checked in? > > > > *From:* hotspot-compiler-dev [mailto: > hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of *Berg, > Michael C > *Sent:* Monday, March 16, 2015 2:18 PM > *To:* hotspot-compiler-dev at openjdk.java.net > *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization ) > > > > Hi All, > > > > We would like to contribute the Integer/FP scalar reduction optimization from > Intel. > > The contribution is referenced as Bug ID 8074981 as a performance > enhancement. > > > > Please review this patch: > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981 > > webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip > > > > The optimization achieves as much as 2.3x on integer reductions and > supports float and double precision optimizations > > which also have significant optimization uplift an obey strict fp > constraints. > > > > Nils Eliasson has offered to sponsor this patch. > > > > Thanks, > > > > -Michael > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Apr 7 18:38:42 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 07 Apr 2015 11:38:42 -0700 Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization ) In-Reply-To: References: Message-ID: <55242432.9010607@oracle.com> Currently it is only jdk9. There are no plans to backport to 8u. The thinking is that we will get jdk9 released when this hardware will be widely available. 
Regards, Vladimir On 4/7/15 11:30 AM, Vitaly Davidovich wrote: > Hi Michael/Vladimir, > > Out of curiosity, is this change and the out-for-review avx512 one going to be (or planned on being) backported to java 8? > > Thanks > > On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C > wrote: > > Please ignore this one its already checked in?____ > > __ __ > > *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net > ] *On Behalf Of *Berg, Michael C > *Sent:* Monday, March 16, 2015 2:18 PM > *To:* hotspot-compiler-dev at openjdk.java.net > *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization )____ > > __ __ > > Hi All,____ > > __ __ > > We would like to contribute the Integer/FP scalar reduction optimization from Intel.____ > > The contribution is referenced as Bug ID 8074981 as a performance enhancement. ____ > > __ __ > > Please review this patch:____ > > Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981 ____ > > webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip ____ > > __ __ > > The optimization achieves as much as 2.3x on integer reductions and supports float and double precision > optimizations____ > > which also have significant optimization uplift an obey strict fp constraints.____ > > __ __ > > Nils Eliasson has offered to sponsor this patch.____ > > __ __ > > Thanks,____ > > __ __ > > -Michael____ > > __ __ > > From vitalyd at gmail.com Tue Apr 7 18:55:17 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 7 Apr 2015 14:55:17 -0400 Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization ) In-Reply-To: <55242432.9010607@oracle.com> References: <55242432.9010607@oracle.com> Message-ID: Ok, thanks. That makes sense for avx512 support, but I think having Michael's changes from this thread sooner would be nice as it's quite likely that users are already running java 8 on hardware where this may have benefit. 
Java 9 is still ways away, and even when it's released, the migration process is not always quick (depending on the nature of major changes). But, if backporting it is messy, it's probably not worth it. On Tue, Apr 7, 2015 at 2:38 PM, Vladimir Kozlov wrote: > Currently it is only jdk9. There are no plans to backport to 8u. > The thinking is that we will get jdk9 released when this hardware will be > widely available. > > Regards, > Vladimir > > On 4/7/15 11:30 AM, Vitaly Davidovich wrote: > >> Hi Michael/Vladimir, >> >> Out of curiosity, is this change and the out-for-review avx512 one going >> to be (or planned on being) backported to java 8? >> >> Thanks >> >> On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C > > wrote: >> >> Please ignore this one its already checked in?____ >> >> __ __ >> >> *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev- >> bounces at openjdk.java.net >> ] *On Behalf >> Of *Berg, Michael C >> *Sent:* Monday, March 16, 2015 2:18 PM >> *To:* hotspot-compiler-dev at openjdk.java.net > hotspot-compiler-dev at openjdk.java.net> >> *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization >> )____ >> >> __ __ >> >> Hi All,____ >> >> __ __ >> >> We would like to contribute the Integer/FP scalar reduction >> optimization from Intel.____ >> >> The contribution is referenced as Bug ID 8074981 as a performance >> enhancement. 
____ >> >> __ __ >> >> Please review this patch:____ >> >> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981 ____ >> >> webrev: https://bugs.openjdk.java.net/secure/attachment/26101/ >> webrev.zip ____ >> >> __ __ >> >> The optimization achieves as much as 2.3x on integer reductions and >> supports float and double precision >> optimizations____ >> >> which also have significant optimization uplift an obey strict fp >> constraints.____ >> >> __ __ >> >> Nils Eliasson has offered to sponsor this patch.____ >> >> __ __ >> >> Thanks,____ >> >> __ __ >> >> -Michael____ >> >> __ __ >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Tue Apr 7 19:04:38 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 07 Apr 2015 12:04:38 -0700 Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization ) In-Reply-To: References: <55242432.9010607@oracle.com> Message-ID: <55242A46.7020108@oracle.com> We want to motivate people to migrate to new releases :) If you mean loop reduction vectorization we can consider it after it is tested for some time in jdk9. Vladimir On 4/7/15 11:55 AM, Vitaly Davidovich wrote: > Ok, thanks. That makes sense for avx512 support, but I think having Michael's changes from this thread sooner would be > nice as it's quite likely that users are already running java 8 on hardware where this may have benefit. Java 9 is > still ways away, and even when it's released, the migration process is not always quick (depending on the nature of > major changes). But, if backporting it is messy, it's probably not worth it. > > On Tue, Apr 7, 2015 at 2:38 PM, Vladimir Kozlov > wrote: > > Currently it is only jdk9. There are no plans to backport to 8u. > The thinking is that we will get jdk9 released when this hardware will be widely available. 
> > Regards, > Vladimir > > On 4/7/15 11:30 AM, Vitaly Davidovich wrote: > > Hi Michael/Vladimir, > > Out of curiosity, is this change and the out-for-review avx512 one going to be (or planned on being) backported > to java 8? > > Thanks > > On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C > >> wrote: > > Please ignore this one its already checked in?____ > > __ __ > > *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-__bounces at openjdk.java.net > > >] *On Behalf Of *Berg, Michael C > *Sent:* Monday, March 16, 2015 2:18 PM > *To:* hotspot-compiler-dev at openjdk.__java.net > > > *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization )____ > > __ __ > > Hi All,____ > > __ __ > > We would like to contribute the Integer/FP scalar reduction optimization from Intel.____ > > The contribution is referenced as Bug ID 8074981 as a performance enhancement. ____ > > __ __ > > Please review this patch:____ > > Bug-id: https://bugs.openjdk.java.net/__browse/JDK-8074981 > ____ > > webrev: https://bugs.openjdk.java.net/__secure/attachment/26101/__webrev.zip > ____ > > __ __ > > The optimization achieves as much as 2.3x on integer reductions and supports float and double precision > optimizations____ > > which also have significant optimization uplift an obey strict fp constraints.____ > > __ __ > > Nils Eliasson has offered to sponsor this patch.____ > > __ __ > > Thanks,____ > > __ __ > > -Michael____ > > __ __ > > > From michael.haupt at oracle.com Tue Apr 7 19:11:30 2015 From: michael.haupt at oracle.com (Michael Haupt) Date: Tue, 7 Apr 2015 21:11:30 +0200 Subject: RFR (S): 8076461: JSR292: remove unused native and constants Message-ID: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> Dear all, please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. 
This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class. RFE: https://bugs.openjdk.java.net/browse/JDK-8076461 Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/ Tested with JPRT, HotSpot testset. Thanks, Michael -- Dr. Michael Haupt | Principal Member of Technical Staff Phone: +49 331 200 7277 | Fax: +49 331 200 7561 Oracle Java Platform Group | HotSpot Compiler Team Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany Oracle is committed to developing practices and products that help protect the environment -------------- next part -------------- An HTML attachment was scrubbed... URL: From vitalyd at gmail.com Tue Apr 7 19:12:16 2015 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 7 Apr 2015 15:12:16 -0400 Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization ) In-Reply-To: <55242A46.7020108@oracle.com> References: <55242432.9010607@oracle.com> <55242A46.7020108@oracle.com> Message-ID: Oh, the motivation is there! :) However, it's not always a quick process even if everyone's motivated as there may be changes of consequence. As a small example, java 8 virtual memory charge is significantly higher than java 7 due to metaspace vs permgen differences. In some cases, this now requires tweaking java 8 settings in order to keep things running smoothly. With a big enough codebase, such migrations are never as quick as one would hope. At any rate, yes, I meant loop reduction vectorization. It seems like a fairly self-contained change which should be relatively painless to backport, hence my inquiry. On Tue, Apr 7, 2015 at 3:04 PM, Vladimir Kozlov wrote: > We want to motivate people to migrate to new releases :) > If you mean loop reduction vectorization we can consider it after it is > tested for some time in jdk9. > > Vladimir > > On 4/7/15 11:55 AM, Vitaly Davidovich wrote: > >> Ok, thanks. 
That makes sense for avx512 support, but I think having >> Michael's changes from this thread sooner would be >> nice as it's quite likely that users are already running java 8 on >> hardware where this may have benefit. Java 9 is >> still ways away, and even when it's released, the migration process is >> not always quick (depending on the nature of >> major changes). But, if backporting it is messy, it's probably not worth >> it. >> >> On Tue, Apr 7, 2015 at 2:38 PM, Vladimir Kozlov < >> vladimir.kozlov at oracle.com > wrote: >> >> Currently it is only jdk9. There are no plans to backport to 8u. >> The thinking is that we will get jdk9 released when this hardware >> will be widely available. >> >> Regards, >> Vladimir >> >> On 4/7/15 11:30 AM, Vitaly Davidovich wrote: >> >> Hi Michael/Vladimir, >> >> Out of curiosity, is this change and the out-for-review avx512 >> one going to be (or planned on being) backported >> to java 8? >> >> Thanks >> >> On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C < >> michael.c.berg at intel.com >> > com>>> wrote: >> >> Please ignore this one its already checked in?____ >> >> __ __ >> >> *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-_ >> _bounces at openjdk.java.net >> >> > >] *On >> Behalf Of *Berg, Michael C >> *Sent:* Monday, March 16, 2015 2:18 PM >> *To:* hotspot-compiler-dev at openjdk.__java.net > hotspot-compiler-dev at openjdk.java.net> >> > hotspot-compiler-dev at openjdk.java.net>> >> *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction >> optimization )____ >> >> __ __ >> >> Hi All,____ >> >> __ __ >> >> We would like to contribute the Integer/FP scalar reduction >> optimization from Intel.____ >> >> The contribution is referenced as Bug ID 8074981 as a >> performance enhancement. 
____ >> >> __ __ >> >> Please review this patch:____ >> >> Bug-id: https://bugs.openjdk.java.net/__browse/JDK-8074981 >> ____ >> >> webrev: https://bugs.openjdk.java.net/ >> __secure/attachment/26101/__webrev.zip >> >> ____ >> >> __ __ >> >> The optimization achieves as much as 2.3x on integer >> reductions and supports float and double precision >> optimizations____ >> >> which also have significant optimization uplift an obey >> strict fp constraints.____ >> >> __ __ >> >> Nils Eliasson has offered to sponsor this patch.____ >> >> __ __ >> >> Thanks,____ >> >> __ __ >> >> -Michael____ >> >> __ __ >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.haupt at oracle.com Tue Apr 7 20:10:53 2015 From: michael.haupt at oracle.com (Michael Haupt) Date: Tue, 7 Apr 2015 22:10:53 +0200 Subject: RFR (S): 8076461: JSR292: remove unused native and constants In-Reply-To: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> Message-ID: <8D74B563-185A-40E8-AEA2-A6688E819377@oracle.com> Hello, in case anyone was wondering about the empty changeset in the webrev: that's fixed now. Thanks to Vladimir I. for pointing out the glitch in my webrev creation approach. :-) Best, Michael > Am 07.04.2015 um 21:11 schrieb Michael Haupt : > > Dear all, > > please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class. > > RFE: https://bugs.openjdk.java.net/browse/JDK-8076461 > Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/ > > Tested with JPRT, HotSpot testset. > > Thanks, > > Michael -- Dr. 
Michael Haupt | Principal Member of Technical Staff Phone: +49 331 200 7277 | Fax: +49 331 200 7561 Oracle Java Platform Group | HotSpot Compiler Team Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany Oracle is committed to developing practices and products that help protect the environment -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.r.rose at oracle.com Tue Apr 7 21:49:59 2015 From: john.r.rose at oracle.com (John Rose) Date: Tue, 7 Apr 2015 14:49:59 -0700 Subject: RFR (S): 8076461: JSR292: remove unused native and constants In-Reply-To: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> Message-ID: On Apr 7, 2015, at 12:11 PM, Michael Haupt wrote: > > Dear all, > > please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class. The class MethodHandleNatives.Constants exists to enumerate and cross-check any constants which the JVM and JDK code need to agree about. Removing a constant from MethodHandleNatives.Constants (moving to MemberName) may cause failures when MHN.verifyConstants is run (via "java -esa" on a debug build of Java). If there are no failures, I wonder what would happen if the JVM and JDK got out of sync. in their notion of the value of a constant like MN_CALLER_SENSITIVE. It's important that some part of our release testing detect if MN_CALLER_SENSITIVE (etc.) gets out of sync. If there is some reason why this testing is no longer needed, I'd like to see the whole Constants class go away, since that's all it's really good for. But I don't see that reason yet, and moving the constants somewhere either will cause a test failure, or *should* cause a test failure. 
I'm happy to see the "GC" guys go away. They were artifacts of a quickly moving 292 implementation that spanned two repositories with unsynchronized change streams. ? John > > RFE: https://bugs.openjdk.java.net/browse/JDK-8076461 > Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/ > > Tested with JPRT, HotSpot testset. > > Thanks, > > Michael > > -- > > > Dr. Michael Haupt | Principal Member of Technical Staff > Phone: +49 331 200 7277 | Fax: +49 331 200 7561 > Oracle Java Platform Group | HotSpot Compiler Team > Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany > Oracle is committed to developing practices and products that help protect the environment > From vladimir.x.ivanov at oracle.com Wed Apr 8 13:06:41 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 08 Apr 2015 16:06:41 +0300 Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly In-Reply-To: <551C5B92.8060500@oracle.com> References: <551C5B92.8060500@oracle.com> Message-ID: <552527E1.5060102@oracle.com> Any volunteers to review VM part? Latest webrev: http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/ http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/ Best regards, Vladimir Ivanov On 4/1/15 11:56 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/ > http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/ > https://bugs.openjdk.java.net/browse/JDK-8057967 > > HotSpot JITs inline very aggressively through CallSites. The > optimistically treat CallSite target as constant, but record a nmethod > dependency to invalidate the compiled code once CallSite target changes. > > Right now, such dependencies have call site class as a context. This > context is too coarse and it leads to context pollution: if some > CallSite target changes, VM needs to enumerate all nmethods which > depends on call sites of such type. 
> > As performance analysis in the bug report shows, it can sum to > significant amount of work. > > While working on the fix, I investigated 3 approaches: > (1) unique context per call site > (2) use CallSite target class > (3) use a class the CallSite instance is linked to > > Considering call sites are ubiquitous (e.g. 10,000s on some octane > benchmarks), loading a dedicated class for every call site is an > overkill (even VM anonymous). > > CallSite target class > (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is > also not satisfactory, since it is a compiled LambdaForm VM anonymous > class, which is heavily shared. It gets context pollution down, but > still the overhead is quite high. > > So, I decided to focus on (3) and ended up with a mixture of (2) & (3). > > Comparing to other options, the complications of (3) are: > - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so > there should be some default context VM can use > > - CallSite instances can be shared and it shouldn't keep the context > class from unloading; > > It motivated a scheme where CallSite context is initialized lazily and > can change during lifetime. When CallSite is linked with an indy > instruction, it's context is initialized. Usually, JIT sees CallSite > instances with initialized context (since it reaches them through indy), > but if it's not the case and there's no context yet, JIT sets it to > "default context", which means "use target call site". > > I introduced CallSite$DependencyContext, which represents a nmethod > dependency context and points (indirectly) to a Class used as a context. > > Context class is referenced through a phantom reference > (sun.misc.Cleaner to simplify cleanup). Though it's impossible to > extract referent using Reference.get(), VM can access it directly by > reading corresponding field. Unlike other types of references, phantom > references aren't cleared automatically. 
It allows VM to access context class until cleanup is performed. And cleanup resets the context to
> NULL, in addition to invalidating all relevant dependencies.
>
> There are 3 context states a CallSite instance can be in:
>   (1) NULL: no dependencies
>   (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
>       the call site target class
>   (3) DependencyContext for some class: dependencies are stored on the
>       class the DependencyContext instance points to
>
> Every CallSite starts w/o a context (1) and then lazily gets one ((2) or
> (3) depending on the situation).
>
> State transitions:
>   (1->3): When a CallSite w/o a context (1) is linked with some indy
>   call site, its owner is recorded as a context (3).
>
>   (1->2): When JIT needs to record a dependency on a target of a
>   CallSite w/o a context (1), it sets the context to DEFAULT_CONTEXT and
>   uses the target class to store the dependency.
>
>   (3->1): When the context class becomes unreachable, a cleanup hook
>   invalidates all dependencies on that CallSite and resets the context to
>   NULL (1).
>
> Only (3->1) requires dependency invalidation, because there are no
> dependencies in (1) and (2->1) isn't performed.
>
> (1->3) is done in Java code (CallSite.initContext) and (1->2) is
> performed in VM (ciCallSite::get_context()). The updates are performed
> by CAS, so there's no need for additional synchronization. Other
> operations on VM side are volatile (to play well with Java code) and
> performed with Compile_lock held (to avoid races between VM operations).
>
> Some statistics:
>   Box2D, latest jdk9-dev
>     - CallSite instances: ~22000
>
>     - invalidated nmethods due to CallSite target changes: ~60
>
>     - checked call_site_target_value dependencies:
>       - before the fix: ~1,600,000
>       - after the fix: ~600
>
> Testing:
>   - dedicated test which exercises different state transitions
>   - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn
>
> Thanks!
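
The three context states and the CAS-based transitions listed above can be modelled in a few lines. The following is a minimal sketch under stated assumptions, not the code from the webrev: Klass, DEFAULT_CONTEXT and CallSiteContextSketch are hypothetical stand-ins, std::atomic replaces the VM's own CAS primitives, and the dependency invalidation that accompanies (3->1) is omitted.

```cpp
#include <atomic>
#include <cassert>

struct Klass { const char* name; };  // hypothetical stand-in for a context class

// Sentinel standing in for DependencyContext.DEFAULT_CONTEXT.
static Klass DEFAULT_CONTEXT = { "<default>" };

class CallSiteContextSketch {
 public:
  CallSiteContextSketch() : _context(nullptr) {}  // state (1): no context

  // (1->3): linking with an indy call site records the owner, if no context is set yet.
  // Returns the context in effect after the attempt.
  Klass* init_context(Klass* owner) {
    assert(owner != nullptr && owner != &DEFAULT_CONTEXT);
    return install(owner);
  }

  // (1->2): the JIT needs a context but none is set, so fall back to the default.
  Klass* get_context() { return install(&DEFAULT_CONTEXT); }

  // (3->1): the context class became unreachable; reset to "no context".
  // (2->1) is never performed, so the default context is rejected here.
  bool cleanup(Klass* dead) {
    assert(dead != nullptr && dead != &DEFAULT_CONTEXT);
    Klass* expected = dead;
    return _context.compare_exchange_strong(expected, nullptr);
  }

 private:
  // CAS from "no context"; on failure `expected` holds the value that won the race.
  Klass* install(Klass* desired) {
    Klass* expected = nullptr;
    if (_context.compare_exchange_strong(expected, desired)) return desired;
    return expected;
  }

  std::atomic<Klass*> _context;
};
```

Because every update is a compare-and-swap from an expected value, a linker thread and a compiler thread that race on the same call site agree on whichever context was installed first, which is the property the description relies on when it says no additional synchronization is needed.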
>
> Best regards,
> Vladimir Ivanov

From vladimir.kozlov at oracle.com Wed Apr 8 19:36:23 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 08 Apr 2015 12:36:23 -0700
Subject: RFR 8076276 support for AVX512
In-Reply-To:
References:
Message-ID: <55258337.2050605@oracle.com>

I would suggest removing MoveK and RegK from these changes since they are not used. We can add them later when you have the use case.

sharedRuntime_x86_64.*
You should have code and not a comment:
// TODO: add ZMM save code

vm_version_x86.cpp
Add code to verify that the system preserves Z registers during an interrupt. See the code after the comment:
// Some OSs have a bug when upper 128bits of YMM

I see the following pattern repeated in C1 code. It should be moved to a function in FrameMap:

+ int num_caller_save_xmm_regs = FrameMap::nof_caller_save_xmm_regs;
+#if _LP64
+ if (UseAVX < 3) {
+   num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2;
+ }
+#endif

In general we should avoid using #ifdef X86 in shared code: matcher.cpp. This file will not be an issue if you remove RegK from the changes.

c2compiler.cpp - can you move that code to Compile::pd_compiler2_init(), which is platform specific?

matcher.cpp - typo 'eno':
+ // For VecZ we need eno alignment and 64 bytes (16 slots) for spills.

Thanks,
Vladimir

On 4/6/15 6:35 PM, Berg, Michael C wrote:
> Hi Folks,
>
> We (Intel) would like to contribute initial support for AVX512 (EVEX encoding, new register support, new ISA support,
> etc) for EVEX enabled microarchitectures.
> The contribution is referenced as Bug ID 8076276 as a performance enhancement.
>
> Please review this patch and comment as needed:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276
>
> webrev:
> http://cr.openjdk.java.net/~kvn/8076276/webrev
>
> Superword optimizations covered on the vectorization path experience as much as 50% reduction in loop trace instruction
> count which make up the path length of EVEX encoded SIMD optimized loops.
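
Regarding the review comment above about the pattern repeated in C1 code, one possible shape for a single shared helper is sketched below. It is an assumption-laden illustration rather than the eventual HotSpot change: FrameMapSketch and its parameters are hypothetical, and the register count, the UseAVX level and the LP64 setting are passed in explicitly so the snippet compiles on its own instead of reading the real FrameMap constant, VM flag and preprocessor macro.

```cpp
#include <cassert>

// Hypothetical stand-in for FrameMap; the real count is FrameMap::nof_caller_save_xmm_regs
// and the real AVX level comes from the UseAVX flag.
struct FrameMapSketch {
  // Mirrors the quoted pattern: on LP64, halve the count when UseAVX < 3;
  // otherwise return the count unchanged.
  static int num_caller_save_xmm_regs(int nof_caller_save_xmm_regs, int use_avx, bool lp64) {
    assert(nof_caller_save_xmm_regs >= 0);
    int n = nof_caller_save_xmm_regs;
    if (lp64 && use_avx < 3) {
      n = n / 2;
    }
    return n;
  }
};
```

Each C1 call site that currently repeats the #if block would then reduce to one call to such a function, which keeps the AVX level check in a single place.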
> > Vladimir Koslov has offered to sponsor this patch. > From vladimir.kozlov at oracle.com Wed Apr 8 19:41:38 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 08 Apr 2015 12:41:38 -0700 Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization ) In-Reply-To: References: <55242432.9010607@oracle.com> <55242A46.7020108@oracle.com> Message-ID: <55258472.4030106@oracle.com> Note, that if we backport loop reduction vectorization, we backport only 8074981 changes as they are: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/6fff5df5f3d2 There will be no support for MulVL in it which requires avx512. Regards, Vladimir On 4/7/15 12:12 PM, Vitaly Davidovich wrote: > Oh, the motivation is there! :) However, it's not always a quick process even if everyone's motivated as there may be > changes of consequence. As a small example, java 8 virtual memory charge is significantly higher than java 7 due to > metaspace vs permgen differences. In some cases, this now requires tweaking java 8 settings in order to keep things > running smoothly. With a big enough codebase, such migrations are never as quick as one would hope. > > At any rate, yes, I meant loop reduction vectorization. It seems like a fairly self-contained change which should be > relatively painless to backport, hence my inquiry. > > On Tue, Apr 7, 2015 at 3:04 PM, Vladimir Kozlov > wrote: > > We want to motivate people to migrate to new releases :) > If you mean loop reduction vectorization we can consider it after it is tested for some time in jdk9. > > Vladimir > > On 4/7/15 11:55 AM, Vitaly Davidovich wrote: > > Ok, thanks. That makes sense for avx512 support, but I think having Michael's changes from this thread sooner > would be > nice as it's quite likely that users are already running java 8 on hardware where this may have benefit. Java 9 is > still ways away, and even when it's released, the migration process is not always quick (depending on the nature of > major changes). 
But, if backporting it is messy, it's probably not worth it.
>
> On Tue, Apr 7, 2015 at 2:38 PM, Vladimir Kozlov wrote:
>
> Currently it is only jdk9. There are no plans to backport to 8u.
> The thinking is that we will get jdk9 released when this hardware will be widely available.
>
> Regards,
> Vladimir
>
> On 4/7/15 11:30 AM, Vitaly Davidovich wrote:
>
> Hi Michael/Vladimir,
>
> Out of curiosity, is this change and the out-for-review avx512 one going to be (or planned on being)
> backported
> to java 8?
>
> Thanks
>
> On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C wrote:
>
> Please ignore this one, it's already checked in.
>
> *From:* hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of *Berg, Michael C
> *Sent:* Monday, March 16, 2015 2:18 PM
> *To:* hotspot-compiler-dev at openjdk.java.net
> *Subject:* RFR(L): 8074981 (Integer/FP scalar reduction optimization )
>
> Hi All,
>
> We would like to contribute the Integer/FP scalar reduction optimization from Intel.
>
> The contribution is referenced as Bug ID 8074981 as a performance enhancement.
>
> Please review this patch:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>
> webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>
> The optimization achieves as much as 2.3x on integer reductions and supports float and double
> precision optimizations
> which also have significant optimization uplift and obey strict fp constraints.
>
> Nils Eliasson has offered to sponsor this patch.
>
> Thanks,
>
> -Michael
>

From vitalyd at gmail.com Wed Apr 8 19:57:46 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 8 Apr 2015 15:57:46 -0400
Subject: RFR(L): 8074981 (Integer/FP scalar reduction optimization )
In-Reply-To: <55258472.4030106@oracle.com>
References: <55242432.9010607@oracle.com> <55242A46.7020108@oracle.com> <55258472.4030106@oracle.com>
Message-ID:

Sounds good to me, I'll take what I can get :)

Thanks

On Wed, Apr 8, 2015 at 3:41 PM, Vladimir Kozlov wrote:

> Note, that if we backport loop reduction vectorization, we backport only
> 8074981 changes as they are:
>
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/6fff5df5f3d2
>
> There will be no support for MulVL in it which requires avx512.
>
> Regards,
> Vladimir
>
> On 4/7/15 12:12 PM, Vitaly Davidovich wrote:
>
>> Oh, the motivation is there! :) However, it's not always a quick process
>> even if everyone's motivated as there may be
>> changes of consequence. As a small example, java 8 virtual memory charge
>> is significantly higher than java 7 due to
>> metaspace vs permgen differences. In some cases, this now requires
>> tweaking java 8 settings in order to keep things
>> running smoothly. With a big enough codebase, such migrations are never
>> as quick as one would hope.
>>
>> At any rate, yes, I meant loop reduction vectorization.
It seems like a >> fairly self-contained change which should be >> relatively painless to backport, hence my inquiry. >> >> On Tue, Apr 7, 2015 at 3:04 PM, Vladimir Kozlov < >> vladimir.kozlov at oracle.com > wrote: >> >> We want to motivate people to migrate to new releases :) >> If you mean loop reduction vectorization we can consider it after it >> is tested for some time in jdk9. >> >> Vladimir >> >> On 4/7/15 11:55 AM, Vitaly Davidovich wrote: >> >> Ok, thanks. That makes sense for avx512 support, but I think >> having Michael's changes from this thread sooner >> would be >> nice as it's quite likely that users are already running java 8 >> on hardware where this may have benefit. Java 9 is >> still ways away, and even when it's released, the migration >> process is not always quick (depending on the nature of >> major changes). But, if backporting it is messy, it's probably >> not worth it. >> >> On Tue, Apr 7, 2015 at 2:38 PM, Vladimir Kozlov < >> vladimir.kozlov at oracle.com >> > oracle.com>>> wrote: >> >> Currently it is only jdk9. There are no plans to backport to >> 8u. >> The thinking is that we will get jdk9 released when this >> hardware will be widely available. >> >> Regards, >> Vladimir >> >> On 4/7/15 11:30 AM, Vitaly Davidovich wrote: >> >> Hi Michael/Vladimir, >> >> Out of curiosity, is this change and the out-for-review >> avx512 one going to be (or planned on being) >> backported >> to java 8? 
>>
>> Thanks
>>
>> On Tue, Apr 7, 2015 at 2:07 PM, Berg, Michael C <
>> michael.c.berg at intel.com> wrote:
>>
>> Please ignore this one, it's already checked in.
>>
>> *From:* hotspot-compiler-dev [mailto:
>> hotspot-compiler-dev-bounces at openjdk.java.net] *On
>> Behalf Of *Berg, Michael C
>> *Sent:* Monday, March 16, 2015 2:18 PM
>> *To:* hotspot-compiler-dev at openjdk.java.net
>> *Subject:* RFR(L): 8074981 (Integer/FP scalar
>> reduction optimization )
>>
>> Hi All,
>>
>> We would like to contribute the Integer/FP scalar
>> reduction optimization from Intel.
>>
>> The contribution is referenced as Bug ID 8074981 as
>> a performance enhancement.
>>
>> Please review this patch:
>>
>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8074981
>>
>> webrev: https://bugs.openjdk.java.net/secure/attachment/26101/webrev.zip
>>
>> The optimization achieves as much as 2.3x on
>> integer reductions and supports float and double
>> precision optimizations
>> which also have significant optimization uplift and
>> obey strict fp constraints.
>>
>> Nils Eliasson has offered to sponsor this patch.
>>
>> Thanks,
>>
>> -Michael
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From vladimir.kozlov at oracle.com Wed Apr 8 20:32:56 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 08 Apr 2015 13:32:56 -0700 Subject: RFR 8076276 support for AVX512 In-Reply-To: References: <55258337.2050605@oracle.com> Message-ID: <55259078.1080309@oracle.com> Michael, please, make sure to include mailing lists in replies - it is review process. I understand that K register may be important but I don't see the need to include it in these changes which are huge already. We can do it as separate changes unless you point me where they are critical needed for avx512 instructions. I don't see the use of it in current changes which simple widen vectors to 512 bits. I am concern that K reg implementation is incomplete but it is hard to see and review it in current changes. Regards, Vladimir On 4/8/15 1:09 PM, Berg, Michael C wrote: > Vladimir, RegK is needed as it frames the kmov instructions which utilize KRegister and > the enumerated k registers, which are critically needed and used, although not yet matched (we use k1 and k0 now). I will look into to the rest of > the comments. The plan is to register allocate the k registers at some point though. > > Thanks, > Michael > > -----Original Message----- > From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov > Sent: Wednesday, April 08, 2015 12:36 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR 8076276 support for AVX512 > > I would suggest to remove MoveK and RegK from these changes since they are not used. > We can add them later when you have the use case. > > sharedRuntime_x86_64.* You should have code and not comment: > // TODO: add ZMM save code > > vm_version_x86.cpp Add code to verify that system preserve Z registers during interrupt. See code after comment : > > // Some OSs have a bug when upper 128bits of YMM > > > I see repeated next pattern in C1 code. 
It should be moved to a function in FrameMap: > > + int num_caller_save_xmm_regs = > +FrameMap::nof_caller_save_xmm_regs; > +#if _LP64 > + if (UseAVX < 3) { > + num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2; > + } > +#endif > > > In general we should avoid using #ifdef X86 in shared code: matcher.cpp. This file will not be issue if you remove RegK from changes. > > c2compiler.cpp - can you move that code to Compile::pd_compiler2_init() which is platform specific? > > matcher.cpp - typo 'eno': > > + // For VecZ we need eno alignment and 64 bytes (16 slots) for spills. > > > Thanks, > Vladimir > > > On 4/6/15 6:35 PM, Berg, Michael C wrote: >> Hi Folks, >> >> We (Intel) would like to contribute initial support for AVX512 (EVEX >> encoding, new register support, new ISA support, >> etc) for EVEX enabled microarchitectures. >> The contribution is referenced as Bug ID 8076276 as a performance enhancement. >> >> Please review this patch and comment as needed: >> >> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276 >> >> webrev: >> http://cr.openjdk.java.net/~kvn/8076276/webrev >> >> Superword optimizations covered on the vectorization path experience >> as much as 50% reduction in loop trace instruction count which make up the path length of EVEX encoded SIMD optimized loops. >> >> Vladimir Koslov has offered to sponsor this patch. 
>>

From mark.reinhold at oracle.com Wed Apr 8 23:09:25 2015
From: mark.reinhold at oracle.com (mark.reinhold at oracle.com)
Date: Wed, 8 Apr 2015 16:09:25 -0700 (PDT)
Subject: JEP 243: Java-Level JVM Compiler Interface
Message-ID: <20150408230925.99BBD553C1@eggemoggin.niobe.net>

New JEP Candidate: http://openjdk.java.net/jeps/243

- Mark

From duncan.macgregor at ge.com Thu Apr 9 09:46:21 2015
From: duncan.macgregor at ge.com (MacGregor, Duncan (GE Energy Management))
Date: Thu, 9 Apr 2015 09:46:21 +0000
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
In-Reply-To: <551D6DAC.8030607@oracle.com>
References: <551C5B92.8060500@oracle.com> <551D6A0D.8090500@oracle.com> <551D6DAC.8030607@oracle.com>
Message-ID:

Now I'm back from my Easter break I've done some testing with our code.
Hs-comp is looking good in general, and this code does appear to give a
nice little extra boost. My results are showing a difference at peak
performance, which I found slightly surprising, so I'll need to take a look
at just how often targets are being reset and for what reasons.

Anyway, in general I'm getting about 10% better performance with hs-comp
than 8u40, and that's in code which spends a substantial amount of its time
down in some C libraries.

Keep up the good work Vladimir!

Duncan.

On 02/04/2015 17:26, "Vladimir Ivanov" wrote:

>Aleksey, thanks a lot for the performance evaluation of the fix!
>
>Best regards,
>Vladimir Ivanov
>
>On 4/2/15 7:10 PM, Aleksey Shipilev wrote:
>> On 04/01/2015 11:56 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
>>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
>>> https://bugs.openjdk.java.net/browse/JDK-8057967
>>
>> Glad to see this finally addressed, thanks!
>>
>> I did not look through the code changes, but ran Octane on my
>> configuration. As expected, Typescript had improved substantially. Other
>> benchmarks are not affected much.
This is in line with the performance
>> analysis done for the original bug report.
>>
>> Baseline:
>>
>> Benchmark          Mode  Cnt     Score     Error  Units
>> Box2D.test           ss   20  4454.677 ± 345.807  ms/op
>> CodeLoad.test        ss   20  4784.299 ± 370.658  ms/op
>> Crypto.test          ss   20   878.395 ±  87.918  ms/op
>> DeltaBlue.test       ss   20   502.182 ±  52.362  ms/op
>> EarleyBoyer.test     ss   20  2250.508 ± 273.924  ms/op
>> Gbemu.test           ss   20  5893.102 ± 656.036  ms/op
>> Mandreel.test        ss   20  9323.484 ± 825.801  ms/op
>> NavierStokes.test    ss   20   657.608 ±  41.212  ms/op
>> PdfJS.test           ss   20  3829.534 ± 353.702  ms/op
>> Raytrace.test        ss   20  1202.826 ± 166.795  ms/op
>> Regexp.test          ss   20   156.782 ±  20.992  ms/op
>> Richards.test        ss   20   324.256 ±  35.874  ms/op
>> Splay.test           ss   20   179.660 ±  34.120  ms/op
>> Typescript.test      ss   20    40.537 ±   2.457   s/op
>>
>> Patched:
>>
>> Benchmark          Mode  Cnt     Score     Error  Units
>> Box2D.test           ss   20  4306.198 ± 376.030  ms/op
>> CodeLoad.test        ss   20  4881.635 ± 395.585  ms/op
>> Crypto.test          ss   20   823.551 ± 106.679  ms/op
>> DeltaBlue.test       ss   20   490.557 ±  41.705  ms/op
>> EarleyBoyer.test     ss   20  2299.763 ± 270.961  ms/op
>> Gbemu.test           ss   20  5612.868 ± 414.052  ms/op
>> Mandreel.test        ss   20  8616.735 ± 825.813  ms/op
>> NavierStokes.test    ss   20   640.722 ±  28.035  ms/op
>> PdfJS.test           ss   20  4139.396 ± 373.580  ms/op
>> Raytrace.test        ss   20  1227.632 ± 151.088  ms/op
>> Regexp.test          ss   20   169.246 ±  34.055  ms/op
>> Richards.test        ss   20   331.824 ±  32.706  ms/op
>> Splay.test           ss   20   168.479 ±  23.512  ms/op
>> Typescript.test      ss   20    31.181 ±   1.790   s/op
>>
>> The offending profile branch (Universe::flush_dependents_on) is also
>> gone, which explains the performance improvement.
>>
>> Thanks,
>> -Aleksey.
>>
>_______________________________________________
>mlvm-dev mailing list
>mlvm-dev at openjdk.java.net
>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

From tobias.hartmann at oracle.com Thu Apr 9 12:10:22 2015
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 09 Apr 2015 14:10:22 +0200
Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in HeapByteBufferTest.java
Message-ID: <55266C2E.3050207@oracle.com>

Hi,

please review the following patch.

https://bugs.openjdk.java.net/browse/JDK-8076625
http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/

Problem:
A random offset to access in a byte array is computed by

    int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) {
        return abs(r.nextInt()) % (buf.capacity() - size);
    }

The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative.

Solution:
Use nextInt(int n) to set the limit of the random value.

Testing:
Failing testcase and JPRT.

Thanks,
Tobias

From vladimir.x.ivanov at oracle.com Thu Apr 9 14:33:48 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 09 Apr 2015 17:33:48 +0300
Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in HeapByteBufferTest.java
In-Reply-To: <55266C2E.3050207@oracle.com>
References: <55266C2E.3050207@oracle.com>
Message-ID: <55268DCC.70909@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

On 4/9/15 3:10 PM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch.
> > https://bugs.openjdk.java.net/browse/JDK-8076625 > http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/ > > Problem: > A random offset to access in a byte array is computed by > > int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) { > return abs(r.nextInt()) % (buf.capacity() - size); > } > > The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative. > > Solution: > Use nextInt(int n) to set the limit of the random value. > > Testing: > Failing testcase and JPRT. > > Thanks, > Tobias > From tobias.hartmann at oracle.com Thu Apr 9 14:35:05 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 09 Apr 2015 16:35:05 +0200 Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in HeapByteBufferTest.java In-Reply-To: <55268DCC.70909@oracle.com> References: <55266C2E.3050207@oracle.com> <55268DCC.70909@oracle.com> Message-ID: <55268E19.8030404@oracle.com> Thanks, Vladimir. Best, Tobias On 09.04.2015 16:33, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 4/9/15 3:10 PM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8076625 >> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/ >> >> Problem: >> A random offset to access in a byte array is computed by >> >> int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) { >> return abs(r.nextInt()) % (buf.capacity() - size); >> } >> >> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative. >> >> Solution: >> Use nextInt(int n) to set the limit of the random value. >> >> Testing: >> Failing testcase and JPRT. 
>> >> Thanks, >> Tobias >> From vladimir.kozlov at oracle.com Thu Apr 9 16:22:43 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 09 Apr 2015 09:22:43 -0700 Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in HeapByteBufferTest.java In-Reply-To: <55266C2E.3050207@oracle.com> References: <55266C2E.3050207@oracle.com> Message-ID: <5526A753.8040606@oracle.com> Tobias, I also asked to use Utils::getRandomInstance() to get reproducible results. Thanks, Vladimir On 4/9/15 5:10 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch. > > https://bugs.openjdk.java.net/browse/JDK-8076625 > http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/ > > Problem: > A random offset to access in a byte array is computed by > > int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) { > return abs(r.nextInt()) % (buf.capacity() - size); > } > > The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative. > > Solution: > Use nextInt(int n) to set the limit of the random value. > > Testing: > Failing testcase and JPRT. > > Thanks, > Tobias > From michael.c.berg at intel.com Thu Apr 9 23:02:57 2015 From: michael.c.berg at intel.com (Berg, Michael C) Date: Thu, 9 Apr 2015 23:02:57 +0000 Subject: RFR 8076276 support for AVX512 In-Reply-To: <55259078.1080309@oracle.com> References: <55258337.2050605@oracle.com> <55259078.1080309@oracle.com> Message-ID: Vladimir, some explanation of the EVEX encoding model is needed: Some instructions are agnostic to vector length and can take the implicit k0 definition in encoding. Some instructions must have predication definitions for their mask application to SIMD, which explicitly exclude k0. The range usage of predication mask registers must be k1..k7 as a real definition which code must provide with a mask value. 
The EVEX enabled machine environment does not automatically initialize any of the mask assignable registers (k1..k7), so we must emit kmov instructions which gather an immediate value from a gpr register. You will see code such as this in the review. This effectively means KRegister must stay in the implementation, but I can accommodate the lion share of what you have indicated. The places where KRegister is used via the assembler layer are: src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265, src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it needs one too" src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046 This is in place of formal register allocation for now as well as when we do more extravagant things with SIMD masks. I will keep the webrev around so I can easily add these pieces back in as we are going to need them. Also there are many other mask register instructions in the ISA which we will need to make use of in the future. If this is amenable I will look into the other changes and resend the webrev accordingly modified. Thanks, Michael -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, April 08, 2015 1:33 PM To: Berg, Michael C Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR 8076276 support for AVX512 Michael, please, make sure to include mailing lists in replies - it is review process. I understand that K register may be important but I don't see the need to include it in these changes which are huge already. We can do it as separate changes unless you point me where they are critical needed for avx512 instructions. I don't see the use of it in current changes which simple widen vectors to 512 bits. I am concern that K reg implementation is incomplete but it is hard to see and review it in current changes. 
Regards, Vladimir On 4/8/15 1:09 PM, Berg, Michael C wrote: > Vladimir, RegK is needed as it frames the kmov instructions which > utilize KRegister and the enumerated k registers, which are critically > needed and used, although not yet matched (we use k1 and k0 now). I will look into to the rest of the comments. The plan is to register allocate the k registers at some point though. > > Thanks, > Michael > > -----Original Message----- > From: hotspot-compiler-dev > [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of > Vladimir Kozlov > Sent: Wednesday, April 08, 2015 12:36 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR 8076276 support for AVX512 > > I would suggest to remove MoveK and RegK from these changes since they are not used. > We can add them later when you have the use case. > > sharedRuntime_x86_64.* You should have code and not comment: > // TODO: add ZMM save code > > vm_version_x86.cpp Add code to verify that system preserve Z registers during interrupt. See code after comment : > > // Some OSs have a bug when upper 128bits of YMM > > > I see repeated next pattern in C1 code. It should be moved to a function in FrameMap: > > + int num_caller_save_xmm_regs = > +FrameMap::nof_caller_save_xmm_regs; > +#if _LP64 > + if (UseAVX < 3) { > + num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2; > + } > +#endif > > > In general we should avoid using #ifdef X86 in shared code: matcher.cpp. This file will not be issue if you remove RegK from changes. > > c2compiler.cpp - can you move that code to Compile::pd_compiler2_init() which is platform specific? > > matcher.cpp - typo 'eno': > > + // For VecZ we need eno alignment and 64 bytes (16 slots) for spills. > > > Thanks, > Vladimir > > > On 4/6/15 6:35 PM, Berg, Michael C wrote: >> Hi Folks, >> >> We (Intel) would like to contribute initial support for AVX512 (EVEX >> encoding, new register support, new ISA support, >> etc) for EVEX enabled microarchitectures. 
>> The contribution is referenced as Bug ID 8076276 as a performance enhancement. >> >> Please review this patch and comment as needed: >> >> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276 >> >> webrev: >> http://cr.openjdk.java.net/~kvn/8076276/webrev >> >> Superword optimizations covered on the vectorization path experience >> as much as 50% reduction in loop trace instruction count which make up the path length of EVEX encoded SIMD optimized loops. >> >> Vladimir Koslov has offered to sponsor this patch. >> From vladimir.kozlov at oracle.com Thu Apr 9 23:53:36 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 09 Apr 2015 16:53:36 -0700 Subject: RFR 8076276 support for AVX512 In-Reply-To: References: <55258337.2050605@oracle.com> <55259078.1080309@oracle.com> Message-ID: <55271100.8080203@oracle.com> Michael, Thank you for detail explanation. I need to clarify by request: 1. I am fine with kmov amd Kregister definitions and usage in assembler, macroassembler and stubs. 2. I don't want KRegister and Kmove in C2 code (opto/ and .ad files) until we have full support for them in RA and signal processing. Thanks, Vladimir On 4/9/15 4:02 PM, Berg, Michael C wrote: > Vladimir, some explanation of the EVEX encoding model is needed: > > Some instructions are agnostic to vector length and can take the implicit k0 definition in encoding. Some instructions must have predication definitions for their mask application to SIMD, which explicitly exclude k0. The range usage of predication mask registers must be k1..k7 as a real definition which code must provide with a mask value. The EVEX enabled machine environment does not automatically initialize any of the mask assignable registers (k1..k7), so we must emit kmov instructions which gather an immediate value from a gpr register. You will see code such as this in the review. 
This effectively means KRegister must stay in the > implementation, but I can accommodate the lion share of what you have indicated. The places where KRegister is used via the assembler layer are: > > src/cpu/x86/vm/stubGenerator_x86_64.cpp: 265, > src/cpu/x86/vm/stubGenerator_x86_32.cpp: 169 "not there yet, but it needs one too" > src/cpu/x86/vm/macroAssembler_x86.cpp: 4550, 7046 > > This is in place of formal register allocation for now as well as when we do more extravagant things with SIMD masks. I will keep the webrev around so I can easily add these pieces back in as we are going to need them. > Also there are many other mask register instructions in the ISA which we will need to make use of in the future. If this is amenable I will look into the other changes and resend the webrev accordingly modified. > > Thanks, > Michael > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, April 08, 2015 1:33 PM > To: Berg, Michael C > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR 8076276 support for AVX512 > > Michael, please, make sure to include mailing lists in replies - it is review process. > > I understand that K register may be important but I don't see the need to include it in these changes which are huge already. We can do it as separate changes unless you point me where they are critical needed for avx512 instructions. > I don't see the use of it in current changes which simple widen vectors to 512 bits. > > I am concern that K reg implementation is incomplete but it is hard to see and review it in current changes. > > Regards, > Vladimir > > On 4/8/15 1:09 PM, Berg, Michael C wrote: >> Vladimir, RegK is needed as it frames the kmov instructions which >> utilize KRegister and the enumerated k registers, which are critically >> needed and used, although not yet matched (we use k1 and k0 now). I will look into to the rest of the comments. 
The plan is to register allocate the k registers at some point though. >> >> Thanks, >> Michael >> >> -----Original Message----- >> From: hotspot-compiler-dev >> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of >> Vladimir Kozlov >> Sent: Wednesday, April 08, 2015 12:36 PM >> To: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR 8076276 support for AVX512 >> >> I would suggest to remove MoveK and RegK from these changes since they are not used. >> We can add them later when you have the use case. >> >> sharedRuntime_x86_64.* You should have code and not comment: >> // TODO: add ZMM save code >> >> vm_version_x86.cpp Add code to verify that system preserve Z registers during interrupt. See code after comment : >> >> // Some OSs have a bug when upper 128bits of YMM >> >> >> I see repeated next pattern in C1 code. It should be moved to a function in FrameMap: >> >> + int num_caller_save_xmm_regs = >> +FrameMap::nof_caller_save_xmm_regs; >> +#if _LP64 >> + if (UseAVX < 3) { >> + num_caller_save_xmm_regs = num_caller_save_xmm_regs / 2; >> + } >> +#endif >> >> >> In general we should avoid using #ifdef X86 in shared code: matcher.cpp. This file will not be issue if you remove RegK from changes. >> >> c2compiler.cpp - can you move that code to Compile::pd_compiler2_init() which is platform specific? >> >> matcher.cpp - typo 'eno': >> >> + // For VecZ we need eno alignment and 64 bytes (16 slots) for spills. >> >> >> Thanks, >> Vladimir >> >> >> On 4/6/15 6:35 PM, Berg, Michael C wrote: >>> Hi Folks, >>> >>> We (Intel) would like to contribute initial support for AVX512 (EVEX >>> encoding, new register support, new ISA support, >>> etc) for EVEX enabled microarchitectures. >>> The contribution is referenced as Bug ID 8076276 as a performance enhancement. 
>>> >>> Please review this patch and comment as needed: >>> >>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076276 >>> >>> webrev: >>> http://cr.openjdk.java.net/~kvn/8076276/webrev >>> >>> Superword optimizations covered on the vectorization path experience >>> as much as 50% reduction in loop trace instruction count which make up the path length of EVEX encoded SIMD optimized loops. >>> >>> Vladimir Koslov has offered to sponsor this patch. >>> From tobias.hartmann at oracle.com Fri Apr 10 08:52:18 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 10 Apr 2015 10:52:18 +0200 Subject: [8u60] RFR of backport for 8066875: VirtualSpace does not use large pages In-Reply-To: <1427374516.3149.37.camel@oracle.com> References: <1427374516.3149.37.camel@oracle.com> Message-ID: <55278F42.9030703@oracle.com> Hi Thomas, the code cache related changes look good to me (not a reviewer). Best, Tobias On 26.03.2015 13:55, Thomas Schatzl wrote: > Hi all, > > can I have reviews for the backport of "8066875: VirtualSpace does not > use large pages" for 8u60? I also would like to have one review from the > compiler team (cc'ed) since the change touches some compiler files. > > It did only apply with minor changes, so I need re-reviews. The problem > is that in jdk9 the code cache sizing has been changed. In particular: > - dropped the hunk in code/codeCache.cpp because the code to determine > memory sizes in 8u60 is much simpler i.e. . > E.g. this change: > --- a/src/share/vm/code/codeCache.cpp Thu Jan 15 16:05:20 2015 +0100 > +++ b/src/share/vm/code/codeCache.cpp Fri Jan 16 10:29:12 2015 +0100 > @@ -233,8 +233,8 @@ > ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) { > // Determine alignment > const size_t page_size = os::can_execute_large_page_memory() ? 
> - MIN2(os::page_size_for_region(InitialCodeCacheSize, 8), > - os::page_size_for_region(size, 8)) : > + MIN2(os::page_size_for_region_aligned(InitialCodeCacheSize, 8), > + os::page_size_for_region_aligned(size, 8)) : > os::vm_page_size(); > const size_t granularity = os::vm_allocation_granularity(); > const size_t r_align = MAX2(page_size, granularity); > > - fixed the code in heap.cpp because of the same change (JDK-8015774: > Add support for multiple code heaps) is not in 8u60. > > Note that this change is based on "8049864: TestParallelHeapSizeFlags > fails with unexpected heap size" > which is also out for review (on hotspot-gc-dev), and "8053995: Add > method to WhiteBox to get vm_pagesize" which applies cleanly. > > Full 8u60 changeset: > http://cr.openjdk.java.net/~tschatzl/8066875-8u60/webrev.8u60/ > Fix changeset: > http://cr.openjdk.java.net/~tschatzl/8066875-8u60/webrev.8u60-fix/ > > CR: > https://bugs.openjdk.java.net/browse/JDK-8066875 > Original change: > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4321214d5dbc > > Testing: jprt > > With that changeset in place, JDK-8058354 can be merged relatively > easily, which is the goal of most of the recent backports. > > Thanks, > Thomas > > > > > > From tobias.hartmann at oracle.com Fri Apr 10 10:42:45 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 10 Apr 2015 12:42:45 +0200 Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in HeapByteBufferTest.java In-Reply-To: <5526A753.8040606@oracle.com> References: <55266C2E.3050207@oracle.com> <5526A753.8040606@oracle.com> Message-ID: <5527A925.5060103@oracle.com> Hi Vladimir, On 09.04.2015 18:22, Vladimir Kozlov wrote: > I also asked to use Utils::getRandomInstance() to get reproducible results. Sorry, I missed that. Here is the new webrev: http://cr.openjdk.java.net/~thartmann/8076625/webrev.01/ I also noticed that the sizes of short and char reads passed to 'randomOffset' are too large (4 instead of 2). Fixed it. 
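
For readers following the 8076625 discussion, the overflow and the bounded alternative can be illustrated with a minimal sketch. This is not the code from the webrev: the class and method names (RandomOffsetSketch, overflowingOffset, boundedOffset) and the capacity of 1024 are hypothetical, and the sketch seeds SplittableRandom directly instead of using the test library's Utils helper mentioned in the review.

```java
import java.util.SplittableRandom;

public class RandomOffsetSketch {

    // Mirrors the shape of the failing computation: Math.abs(Integer.MIN_VALUE)
    // is still Integer.MIN_VALUE, so the remainder can come out negative.
    static int overflowingOffset(int randomValue, int capacity, int size) {
        return Math.abs(randomValue) % (capacity - size);
    }

    // Bounded draw: nextInt(bound) returns a value in [0, bound), so no abs()
    // call is needed. Assumes capacity > size (hypothetical precondition).
    static int boundedOffset(SplittableRandom r, int capacity, int size) {
        return r.nextInt(capacity - size);
    }

    public static void main(String[] args) {
        int capacity = 1024; // hypothetical buffer capacity

        // abs() cannot represent +2147483648 in an int.
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE);

        // The resulting offset is negative for this input.
        System.out.println(overflowingOffset(Integer.MIN_VALUE, capacity, Integer.BYTES) < 0);

        // A fixed seed keeps the run reproducible; short and char reads are
        // 2 bytes wide (Short.BYTES), not 4.
        SplittableRandom r = new SplittableRandom(42L);
        int offset = boundedOffset(r, capacity, Short.BYTES);
        System.out.println(offset >= 0 && offset + Short.BYTES <= capacity);
    }
}
```

Each of the three lines printed by main is true. Passing the element width (Short.BYTES, Integer.BYTES) rather than a literal makes it harder to hand a 4-byte size to a 2-byte read.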
Best, Tobias > Thanks, > Vladimir > > > On 4/9/15 5:10 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch. >> >> https://bugs.openjdk.java.net/browse/JDK-8076625 >> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/ >> >> Problem: >> A random offset to access in a byte array is computed by >> >> int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) { >> return abs(r.nextInt()) % (buf.capacity() - size); >> } >> >> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative. >> >> Solution: >> Use nextInt(int n) to set the limit of the random value. >> >> Testing: >> Failing testcase and JPRT. >> >> Thanks, >> Tobias >> From bengt.rutisson at oracle.com Fri Apr 10 10:57:37 2015 From: bengt.rutisson at oracle.com (Bengt Rutisson) Date: Fri, 10 Apr 2015 12:57:37 +0200 Subject: [8u60] RFR of backport for 8066875: VirtualSpace does not use large pages In-Reply-To: <55278F42.9030703@oracle.com> References: <1427374516.3149.37.camel@oracle.com> <55278F42.9030703@oracle.com> Message-ID: <5527ACA1.5000907@oracle.com> Hi Thomas, On 2015-04-10 10:52, Tobias Hartmann wrote: > Hi Thomas, > > the code cache related changes look good to me (not a reviewer). The change looks good to me too. Bengt > > Best, > Tobias > > On 26.03.2015 13:55, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for the backport of "8066875: VirtualSpace does not >> use large pages" for 8u60? I also would like to have one review from the >> compiler team (cc'ed) since the change touches some compiler files. >> >> It did only apply with minor changes, so I need re-reviews. The problem >> is that in jdk9 the code cache sizing has been changed. In particular: >> - dropped the hunk in code/codeCache.cpp because the code to determine >> memory sizes in 8u60 is much simpler i.e. . >> E.g. 
this change: >> --- a/src/share/vm/code/codeCache.cpp Thu Jan 15 16:05:20 2015 +0100 >> +++ b/src/share/vm/code/codeCache.cpp Fri Jan 16 10:29:12 2015 +0100 >> @@ -233,8 +233,8 @@ >> ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) { >> // Determine alignment >> const size_t page_size = os::can_execute_large_page_memory() ? >> - MIN2(os::page_size_for_region(InitialCodeCacheSize, 8), >> - os::page_size_for_region(size, 8)) : >> + MIN2(os::page_size_for_region_aligned(InitialCodeCacheSize, 8), >> + os::page_size_for_region_aligned(size, 8)) : >> os::vm_page_size(); >> const size_t granularity = os::vm_allocation_granularity(); >> const size_t r_align = MAX2(page_size, granularity); >> >> - fixed the code in heap.cpp because of the same change (JDK-8015774: >> Add support for multiple code heaps) is not in 8u60. >> >> Note that this change is based on "8049864: TestParallelHeapSizeFlags >> fails with unexpected heap size" >> which is also out for review (on hotspot-gc-dev), and "8053995: Add >> method to WhiteBox to get vm_pagesize" which applies cleanly. >> >> Full 8u60 changeset: >> http://cr.openjdk.java.net/~tschatzl/8066875-8u60/webrev.8u60/ >> Fix changeset: >> http://cr.openjdk.java.net/~tschatzl/8066875-8u60/webrev.8u60-fix/ >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8066875 >> Original change: >> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4321214d5dbc >> >> Testing: jprt >> >> With that changeset in place, JDK-8058354 can be merged relatively >> easily, which is the goal of most of the recent backports. 
>> >> Thanks, >> Thomas >> >> >> >> >> >> From kirill.zhaldybin at oracle.com Fri Apr 10 13:15:12 2015 From: kirill.zhaldybin at oracle.com (Kirill Zhaldybin) Date: Fri, 10 Apr 2015 16:15:12 +0300 Subject: RFR(XS): JDK-8071546: hotspot/test/compiler/codecache/jmx/PoolsIndependenceTest.java has been fixed, but still is in the exclude list Message-ID: <5527CCE0.2060301@oracle.com> Dear all, Could you please review this really small fix? CR: https://bugs.openjdk.java.net/browse/JDK-8071546? Webrev: http://cr.openjdk.java.net/~ppunegov/kzhaldybin/8071546/webrev/ 1 line changed: 0 ins; 1 del; 0 mod Thank you. Regards, Kirill From igor.ignatyev at oracle.com Fri Apr 10 13:22:34 2015 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 10 Apr 2015 16:22:34 +0300 Subject: RFR(XS): JDK-8071546: hotspot/test/compiler/codecache/jmx/PoolsIndependenceTest.java has been fixed, but still is in the exclude list In-Reply-To: <5527CCE0.2060301@oracle.com> References: <5527CCE0.2060301@oracle.com> Message-ID: <5527CE9A.6040502@oracle.com> Hi Kirill, looks good to me. Igor On 04/10/2015 04:15 PM, Kirill Zhaldybin wrote: > Dear all, > > Could you please review this really small fix? > > CR: > https://bugs.openjdk.java.net/browse/JDK-8071546? > > Webrev: > http://cr.openjdk.java.net/~ppunegov/kzhaldybin/8071546/webrev/ > 1 line changed: 0 ins; 1 del; 0 mod > > Thank you. > > Regards, Kirill From evgeniya.stepanova at oracle.com Fri Apr 10 14:01:37 2015 From: evgeniya.stepanova at oracle.com (Evgeniya Stepanova) Date: Fri, 10 Apr 2015 17:01:37 +0300 Subject: [8u60] RFR(s): 8038098: [TESTBUG] remove explicit set build flavor from hotspot/test/compiler/* tests Message-ID: <5527D7C1.9050704@oracle.com> Hi, Could you please review back-port of 8038098 to the 8udev repo? Diff applies cleanly to the all tests except of the test/compiler/IntegerArithmetic/TestIntegerComparison.java test, which does not exist in 8u60 repo. After fix tests pass with 8u60 b09 with the client vm. 
webrev for 8u60: http://cr.openjdk.java.net/~eistepan/8038098/8u60/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8038098 Original webrev: http://cr.openjdk.java.net/~iignatyev/eistepan/8038098/webrev.02/ mail thread for 9: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015540.html Original change: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/662499384b32 http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/662499384b32 Thanks, Jane -- /Evgeniya Stepanova/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.schatzl at oracle.com Fri Apr 10 14:30:19 2015 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 10 Apr 2015 16:30:19 +0200 Subject: [8u60] RFR of backport for 8066875: VirtualSpace does not use large pages In-Reply-To: <5527ACA1.5000907@oracle.com> References: <1427374516.3149.37.camel@oracle.com> <55278F42.9030703@oracle.com> <5527ACA1.5000907@oracle.com> Message-ID: <1428676219.3364.14.camel@oracle.com> Hi Tobias and Bengt, On Fri, 2015-04-10 at 12:57 +0200, Bengt Rutisson wrote: > Hi Thomas, > > On 2015-04-10 10:52, Tobias Hartmann wrote: > > Hi Thomas, > > > > the code cache related changes look good to me (not a reviewer). > > The change looks good to me too. Thanks for the reviews. Thomas From daniel.daugherty at oracle.com Fri Apr 10 15:12:37 2015 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 10 Apr 2015 09:12:37 -0600 Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop In-Reply-To: <5527E6E6.5000707@redhat.com> References: <5527E6E6.5000707@redhat.com> Message-ID: <5527E865.4060709@oracle.com> Adding in the Compiler team since this is the MacroAssembler... Dan On 4/10/15 9:06 AM, Andrew Haley wrote: > This is for x86: > > addl (idx, 0x2); > andl (idx, 0x1); > subl(idx, 1); > jcc(Assembler::negative, L_post_third_loop_done); > > I'm trying to guess what the "addl (idx, 0x2)" instruction was supposed to do. 
> I don't think it has any effect now. > > Andrew. From vladimir.kozlov at oracle.com Fri Apr 10 15:17:54 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 10 Apr 2015 08:17:54 -0700 Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in HeapByteBufferTest.java In-Reply-To: <5527A925.5060103@oracle.com> References: <55266C2E.3050207@oracle.com> <5526A753.8040606@oracle.com> <5527A925.5060103@oracle.com> Message-ID: <5527E9A2.7050709@oracle.com> Looks good. Thanks, Vladimir On 4/10/15 3:42 AM, Tobias Hartmann wrote: > Hi Vladimir, > > On 09.04.2015 18:22, Vladimir Kozlov wrote: >> I also asked to use Utils::getRandomInstance() to get reproducible results. > > Sorry, I missed that. Here is the new webrev: > > http://cr.openjdk.java.net/~thartmann/8076625/webrev.01/ > > I also noticed that the sizes of short and char reads passed to 'randomOffset' are too large (4 instead of 2). Fixed it. > > Best, > Tobias > > >> Thanks, >> Vladimir >> >> >> On 4/9/15 5:10 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch. >>> >>> https://bugs.openjdk.java.net/browse/JDK-8076625 >>> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/ >>> >>> Problem: >>> A random offset to access in a byte array is computed by >>> >>> int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) { >>> return abs(r.nextInt()) % (buf.capacity() - size); >>> } >>> >>> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative. >>> >>> Solution: >>> Use nextInt(int n) to set the limit of the random value. >>> >>> Testing: >>> Failing testcase and JPRT. 
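The overflow described in the quoted problem statement can be reproduced in isolation. The following is a minimal sketch, not the test's actual code: the class name and the plain `int` capacity parameter (standing in for `buf.capacity()`) are assumptions made for illustration, and the bounded variant only mirrors the "use nextInt(int n)" idea from the quoted solution.

```java
import java.util.SplittableRandom;

// Hypothetical stand-in for the helper discussed in the thread.
final class RandomOffsetSketch {

    // Original pattern: Math.abs(Integer.MIN_VALUE) overflows and stays
    // negative, so the remainder (and thus the offset) can be negative.
    static int unsafeOffset(int randomValue, int capacity, int size) {
        return Math.abs(randomValue) % (capacity - size);
    }

    // Bounded pattern: nextInt(bound) returns a value in [0, bound),
    // so no abs() call and no modulo are needed.
    static int boundedOffset(SplittableRandom r, int capacity, int size) {
        return r.nextInt(capacity - size);
    }

    public static void main(String[] args) {
        // Feed the worst-case value directly instead of waiting for the
        // random generator to produce it.
        int bad = unsafeOffset(Integer.MIN_VALUE, 1024, 4);
        System.out.println("unsafe offset is negative: " + (bad < 0));

        SplittableRandom r = new SplittableRandom(42L);
        int good = boundedOffset(r, 1024, 4);
        System.out.println("bounded offset in range: " + (good >= 0 && good < 1020));
    }
}
```

Passing `Integer.MIN_VALUE` explicitly makes the failure deterministic, which is the same motivation as the request for reproducible random values earlier in this thread.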
>>> >>> Thanks, >>> Tobias >>> From vladimir.kozlov at oracle.com Fri Apr 10 15:28:11 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 10 Apr 2015 08:28:11 -0700 Subject: RFR(XS): JDK-8071546: hotspot/test/compiler/codecache/jmx/PoolsIndependenceTest.java has been fixed, but still is in the exclude list In-Reply-To: <5527CCE0.2060301@oracle.com> References: <5527CCE0.2060301@oracle.com> Message-ID: <5527EC0B.5080102@oracle.com> Good. Vladimir On 4/10/15 6:15 AM, Kirill Zhaldybin wrote: > Dear all, > > Could you please review this really small fix? > > CR: > https://bugs.openjdk.java.net/browse/JDK-8071546? > > Webrev: > http://cr.openjdk.java.net/~ppunegov/kzhaldybin/8071546/webrev/ > 1 line changed: 0 ins; 1 del; 0 mod > > Thank you. > > Regards, Kirill From vladimir.kozlov at oracle.com Fri Apr 10 16:30:45 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 10 Apr 2015 09:30:45 -0700 Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop In-Reply-To: <5527E865.4060709@oracle.com> References: <5527E6E6.5000707@redhat.com> <5527E865.4060709@oracle.com> Message-ID: <5527FAB5.6090802@oracle.com> It restores counter after: subl(idx, 2); jcc(Assembler::negative, L_check_1); The value could be 1,2,3 before this point (idx & 3 before). After 'sub 2': -1, 0, 1. So we have to restore to positive values before subtracting 1. Vladimir On 4/10/15 8:12 AM, Daniel D. Daugherty wrote: > Adding in the Compiler team since this is the MacroAssembler... > > Dan > > > On 4/10/15 9:06 AM, Andrew Haley wrote: >> This is for x86: >> >> addl (idx, 0x2); >> andl (idx, 0x1); >> subl(idx, 1); >> jcc(Assembler::negative, L_post_third_loop_done); >> >> I'm trying to guess what the "addl (idx, 0x2)" instruction was supposed to do. >> I don't think it has any effect now. >> >> Andrew. 
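The arithmetic behind the question about `addl (idx, 0x2)` can be checked with a small sketch. It models the 32-bit register `idx` as a Java `int` (an assumption; the real code is x86 assembly emitted by the MacroAssembler), and the class and method names are hypothetical. Adding 2 never changes bit 0, so masking with 1 gives the same value with or without the add.

```java
// Hypothetical model of the quoted instruction sequence.
final class LowBitSketch {

    // Models: addl idx, 2; andl idx, 1
    static int withAdd(int idx) {
        return (idx + 2) & 1;
    }

    // Models: andl idx, 1 (the add omitted)
    static int withoutAdd(int idx) {
        return idx & 1;
    }

    public static void main(String[] args) {
        // -1, 0 and 1 are the values named in the thread after 'sub 2'.
        for (int idx = -1; idx <= 1; idx++) {
            System.out.println(idx + " -> " + withAdd(idx) + " / " + withoutAdd(idx));
        }
    }
}
```

Because `int` addition wraps the same way as a 32-bit register, the equality holds for every input, not only for the three values listed.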
> From aph at redhat.com Fri Apr 10 16:47:02 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 10 Apr 2015 17:47:02 +0100 Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop In-Reply-To: <5527FAB5.6090802@oracle.com> References: <5527E6E6.5000707@redhat.com> <5527E865.4060709@oracle.com> <5527FAB5.6090802@oracle.com> Message-ID: <5527FE86.3060302@redhat.com> On 04/10/2015 05:30 PM, Vladimir Kozlov wrote: > It restores counter after: > > subl(idx, 2); > jcc(Assembler::negative, L_check_1); > > The value could be 1,2,3 before this point (idx & 3 before). > After 'sub 2': -1, 0, 1. > > So we have to restore to positive values before subtracting 1. But after > andl (idx, 0x1); the only possible values are 0 and 1, and this is true regardless of the 'sub 2'. >>> addl (idx, 0x2); >>> andl (idx, 0x1); >>> subl(idx, 1); >>> jcc(Assembler::negative, L_post_third_loop_done); Andrew. From vladimir.kozlov at oracle.com Fri Apr 10 16:47:13 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 10 Apr 2015 09:47:13 -0700 Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop In-Reply-To: <5527FAB5.6090802@oracle.com> References: <5527E6E6.5000707@redhat.com> <5527E865.4060709@oracle.com> <5527FAB5.6090802@oracle.com> Message-ID: <5527FE91.1040606@oracle.com> On 4/10/15 9:30 AM, Vladimir Kozlov wrote: > It restores counter after: > > subl(idx, 2); > jcc(Assembler::negative, L_check_1); > > The value could be 1,2,3 before this point (idx & 3 before). > After 'sub 2': -1, 0, 1. > > So we have to restore to positive values before subtracting 1. And you are right, it is not needed. We get the same (correct) result from 'and idx,1' regardless executing 'add'. Vladimir > > Vladimir > > On 4/10/15 8:12 AM, Daniel D. Daugherty wrote: >> Adding in the Compiler team since this is the MacroAssembler... 
>> >> Dan >> >> >> On 4/10/15 9:06 AM, Andrew Haley wrote: >>> This is for x86: >>> >>> addl (idx, 0x2); >>> andl (idx, 0x1); >>> subl(idx, 1); >>> jcc(Assembler::negative, L_post_third_loop_done); >>> >>> I'm trying to guess what the "addl (idx, 0x2)" instruction was supposed to do. >>> I don't think it has any effect now. >>> >>> Andrew. >> From aph at redhat.com Fri Apr 10 16:49:41 2015 From: aph at redhat.com (Andrew Haley) Date: Fri, 10 Apr 2015 17:49:41 +0100 Subject: A strange bit of code in MacroAssembler::multiply_128_x_128_loop In-Reply-To: <5527FE91.1040606@oracle.com> References: <5527E6E6.5000707@redhat.com> <5527E865.4060709@oracle.com> <5527FAB5.6090802@oracle.com> <5527FE91.1040606@oracle.com> Message-ID: <5527FF25.5030508@redhat.com> On 04/10/2015 05:47 PM, Vladimir Kozlov wrote: > On 4/10/15 9:30 AM, Vladimir Kozlov wrote: >> It restores counter after: >> >> subl(idx, 2); >> jcc(Assembler::negative, L_check_1); >> >> The value could be 1,2,3 before this point (idx & 3 before). >> After 'sub 2': -1, 0, 1. >> >> So we have to restore to positive values before subtracting 1. > > And you are right, it is not needed. We get the same (correct) result from 'and idx,1' regardless executing 'add'. OK, cool. It's not important then: I was just wondering if I'd found a latent bug. Andrew. From tobias.hartmann at oracle.com Mon Apr 13 04:51:06 2015 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 13 Apr 2015 06:51:06 +0200 Subject: [9] RFR(XS): 8076625: IndexOutOfBoundsException in HeapByteBufferTest.java In-Reply-To: <5527E9A2.7050709@oracle.com> References: <55266C2E.3050207@oracle.com> <5526A753.8040606@oracle.com> <5527A925.5060103@oracle.com> <5527E9A2.7050709@oracle.com> Message-ID: <552B4B3A.4010609@oracle.com> Thanks, Vladimir. Best, Tobias On 10.04.2015 17:17, Vladimir Kozlov wrote: > Looks good. 
> > Thanks, > Vladimir > > On 4/10/15 3:42 AM, Tobias Hartmann wrote: >> Hi Vladimir, >> >> On 09.04.2015 18:22, Vladimir Kozlov wrote: >>> I also asked to use Utils::getRandomInstance() to get reproducible results. >> >> Sorry, I missed that. Here is the new webrev: >> >> http://cr.openjdk.java.net/~thartmann/8076625/webrev.01/ >> >> I also noticed that the sizes of short and char reads passed to 'randomOffset' are too large (4 instead of 2). Fixed it. >> >> Best, >> Tobias >> >> >>> Thanks, >>> Vladimir >>> >>> >>> On 4/9/15 5:10 AM, Tobias Hartmann wrote: >>>> Hi, >>>> >>>> please review the following patch. >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8076625 >>>> http://cr.openjdk.java.net/~thartmann/8076625/webrev.00/ >>>> >>>> Problem: >>>> A random offset to access in a byte array is computed by >>>> >>>> int randomOffset(SplittableRandom r, MyByteBuffer buf, int size) { >>>> return abs(r.nextInt()) % (buf.capacity() - size); >>>> } >>>> >>>> The call to r.nextInt() may return Integer.MIN_VALUE (-2147483648) and the corresponding absolute value (+2147483648) does not fit into an int and will overflow back to -2147483648. As a result the returned offset is negative. >>>> >>>> Solution: >>>> Use nextInt(int n) to set the limit of the random value. >>>> >>>> Testing: >>>> Failing testcase and JPRT. >>>> >>>> Thanks, >>>> Tobias >>>> From igor.ignatyev at oracle.com Mon Apr 13 09:22:13 2015 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 13 Apr 2015 12:22:13 +0300 Subject: [8u60] RFR(s): 8038098: [TESTBUG] remove explicit set build flavor from hotspot/test/compiler/* tests In-Reply-To: <5527D7C1.9050704@oracle.com> References: <5527D7C1.9050704@oracle.com> Message-ID: <552B8AC5.6040404@oracle.com> Evgeniya, looks good to me. Igor On 04/10/2015 05:01 PM, Evgeniya Stepanova wrote: > Hi, > > Could you please review back-port of 8038098 to the 8udev repo? 
> Diff applies cleanly to the all tests except of the > test/compiler/IntegerArithmetic/TestIntegerComparison.java test, which > does not exist in 8u60 repo. > After fix tests pass with 8u60 b09 with the client vm. > > webrev for 8u60: > http://cr.openjdk.java.net/~eistepan/8038098/8u60/webrev.00/ > bug: https://bugs.openjdk.java.net/browse/JDK-8038098 > > Original webrev: > http://cr.openjdk.java.net/~iignatyev/eistepan/8038098/webrev.02/ > mail thread for 9: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015540.html > Original change: > http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/662499384b32 > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/662499384b32 > > Thanks, > Jane > -- > /Evgeniya Stepanova/ From evgeniya.stepanova at oracle.com Mon Apr 13 09:28:28 2015 From: evgeniya.stepanova at oracle.com (Evgeniya Stepanova) Date: Mon, 13 Apr 2015 12:28:28 +0300 Subject: [8u60] RFR(s): 8038098: [TESTBUG] remove explicit set build flavor from hotspot/test/compiler/* tests In-Reply-To: <552B8AC5.6040404@oracle.com> References: <5527D7C1.9050704@oracle.com> <552B8AC5.6040404@oracle.com> Message-ID: <552B8C3C.2080403@oracle.com> Hi Igor, Thank you for the review! Jane On 13.04.2015 12:22, Igor Ignatyev wrote: > Evgeniya, > > looks good to me. > > Igor > > On 04/10/2015 05:01 PM, Evgeniya Stepanova wrote: >> Hi, >> >> Could you please review back-port of 8038098 to the 8udev repo? >> Diff applies cleanly to the all tests except of the >> test/compiler/IntegerArithmetic/TestIntegerComparison.java test, which >> does not exist in 8u60 repo. >> After fix tests pass with 8u60 b09 with the client vm. 
>> >> webrev for 8u60: >> http://cr.openjdk.java.net/~eistepan/8038098/8u60/webrev.00/ >> bug: https://bugs.openjdk.java.net/browse/JDK-8038098 >> >> Original webrev: >> http://cr.openjdk.java.net/~iignatyev/eistepan/8038098/webrev.02/ >> mail thread for 9: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-September/015540.html >> >> Original change: >> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/662499384b32 >> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/662499384b32 >> >> Thanks, >> Jane >> -- >> /Evgeniya Stepanova/ -- /Evgeniya Stepanova/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.haupt at oracle.com Mon Apr 13 11:40:08 2015 From: michael.haupt at oracle.com (Michael Haupt) Date: Mon, 13 Apr 2015 13:40:08 +0200 Subject: RFR (S): 8076461: JSR292: remove unused native and constants In-Reply-To: References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> Message-ID: <3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com> Hi John, thank you very much for your review; keeping the Constants class around for VM/JDK constant value agreement certainly makes sense. I have undone most of the removal work and verified in a slowdebug build that MHN.verifyConstants() works. I've also added a comment on the Constants class to clarify its role a bit. Local tests and JPRT are still happy with this. Updated webrev: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.01/ Best, Michael > Am 07.04.2015 um 23:49 schrieb John Rose : > > On Apr 7, 2015, at 12:11 PM, Michael Haupt wrote: >> >> Dear all, >> >> please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class. 
> > The class MethodHandleNatives.Constants exists to enumerate and cross-check any constants which the JVM and JDK code need to agree about. Removing a constant from MethodHandleNatives.Constants (moving to MemberName) may cause failures when MHN.verifyConstants is run (via "java -esa" on a debug build of Java). If there are no failures, I wonder what would happen if the JVM and JDK got out of sync in their notion of the value of a constant like MN_CALLER_SENSITIVE. It's important that some part of our release testing detect if MN_CALLER_SENSITIVE (etc.) gets out of sync. > > If there is some reason why this testing is no longer needed, I'd like to see the whole Constants class go away, since that's all it's really good for. But I don't see that reason yet, and moving the constants somewhere either will cause a test failure, or *should* cause a test failure. > > I'm happy to see the "GC" guys go away. They were artifacts of a quickly moving 292 implementation that spanned two repositories with unsynchronized change streams. > > -- John > >> >> RFE: https://bugs.openjdk.java.net/browse/JDK-8076461 >> Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/ >> >> Tested with JPRT, HotSpot testset. >> >> Thanks, >> >> Michael -- Dr. Michael Haupt | Principal Member of Technical Staff Phone: +49 331 200 7277 | Fax: +49 331 200 7561 Oracle Java Platform Group | HotSpot Compiler Team Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany Oracle is committed to developing practices and products that help protect the environment From zoltan.majo at oracle.com Mon Apr 13 11:51:43 2015 From: zoltan.majo at oracle.com (Zoltán Majó) Date: Mon, 13 Apr 2015 13:51:43 +0200 Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal Message-ID: <552BADCF.80109@oracle.com> Hi, please review the following patch.
Bug: https://bugs.openjdk.java.net/browse/JDK-8067648 Problem: On architectures with hardware support for AES operations, the Java version (the version in the JDK sources) of the com.sun.crypto.provider.AESCrypt::encryptBlock(byte[], int, byte[], int) method is replaced with an intrinsic that uses the CPU's AES instructions. The Java version of encryptBlock operates on arrays of size AES_BLOCK_SIZE=16 and it consequently performs a number of "implicit" checks (e.g., null checks and range checks) as required by the Java VM specification. The intrinsified version of encryptBlock, however, does not perform any of these checks. Omitting checks results in a VM crash if invalid parameters (e.g., a null pointer, as reported in the current case) are passed to the method. Solution: The failure reported in the current issue appears in the com.sun.crypto.provider.GCTR class that calls the intrinsified version of encryptBlock. None of the methods of the class are accessible from packages other than com.sun.crypto.provider. So, after a private discussion with John Rose, Vladimir Kozlov, and Roland Westrelin, I propose to solve this problem at the Java level. The GCTR::counter field is supposed to be initialized with an array of size AES_BLOCK_SIZE so that it is safe to call encryptBlock. The 'counter' field is never supposed to become NULL during the lifetime of a GCTR object (so that encryptBlock can always be called safely). The GCTR class supports saving and restoring the value of the 'counter' field (via the save() and restore() methods). For saving/restoring, the class uses the 'counterSave' field as temporary storage. It is also possible to reset a GCTR object to its initial state by calling reset(). Reset sets both the 'counter' and 'counterSave' fields to their initial values. If a call to the method reset() is followed by a call to restore(), the field 'counter' is not restored to its original value, but it becomes NULL.
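The reset()-then-restore() sequence just described can be modeled with a simplified stand-in class. This is a sketch under stated assumptions, not the JDK's GCTR source: the class name, the two restore variants, and the assumption that the initial value of 'counterSave' is null are all introduced here for illustration. The guarded variant follows the null-check idea and the length check that this message's patch description proposes.

```java
// Hypothetical, simplified model of the counter/counterSave state handling.
final class CounterStateSketch {
    static final int AES_BLOCK_SIZE = 16;

    private final byte[] initialCounter;
    private byte[] counter;
    private byte[] counterSave; // assumed to be null until save() is called

    CounterStateSketch(byte[] initial) {
        // Enforce the size the block-encryption routine relies on.
        if (initial.length != AES_BLOCK_SIZE) {
            throw new IllegalArgumentException("counter must be " + AES_BLOCK_SIZE + " bytes");
        }
        initialCounter = initial.clone();
        counter = initial.clone();
    }

    void save() {
        counterSave = counter.clone();
    }

    void reset() {
        counter = initialCounter.clone();
        counterSave = null; // back to the assumed initial value
    }

    // Unguarded restore: after reset() this sets counter to null.
    void restoreUnguarded() {
        counter = counterSave;
    }

    // Guarded restore: only overwrite counter if something was saved.
    void restoreGuarded() {
        if (counterSave != null) {
            counter = counterSave;
        }
    }

    boolean hasCounter() {
        return counter != null;
    }

    public static void main(String[] args) {
        CounterStateSketch a = new CounterStateSketch(new byte[AES_BLOCK_SIZE]);
        a.reset();
        a.restoreUnguarded();
        System.out.println("unguarded keeps counter: " + a.hasCounter());

        CounterStateSketch b = new CounterStateSketch(new byte[AES_BLOCK_SIZE]);
        b.reset();
        b.restoreGuarded();
        System.out.println("guarded keeps counter: " + b.hasCounter());
    }
}
```

In this model, a null 'counter' would later be handed to the block-encryption call, which is the situation the intrinsic does not check for.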
This is an invalid state, because a GCTR object should always contain a valid 'counter' array. This problem has been also described (in part) by Chris Ellis. https://intrbiz.com/post/blog/development/java_8_aes_gcm_nullpointerexception This patch proposes to restore the contents of 'counter' from 'counterSave' only if some data has been saved into 'counterSave' before (i.e., counterSave is not NULL). The patch also adds a check to the constructor of GCTR to verify if the length of 'counter' is AES_BLOCK_SIZE. (I checked and JDK code uses this class only with arrays of size AES_BLOCK_SIZE, but it is good if the required size is documented and enforced by GCTR.) The array to store the output of the encryptBlock method (the third parameter) should be also of length AES_BLOCK_SIZE. That is ensured by the GCTR class (both in the doFinal and update methods). The input and output offsets (the second and fourth parameters) are 0, as required by encryptBlock. Webrev: http://cr.openjdk.java.net/~zmajo/8067648/webrev.00/ Testing: - JPRT (both with 9 and 8u), all tests in the testsets hotspot pass; - JTREG tests in jdk_security[1-4] executed locally with the sources built with --enable-openjdk-only; all tests that pass without the patch pass with the patch as well; - failure reported in 8067648 can be reproduced with 8u, failure is not triggered with patch applied. Thank you and best regards, Zoltan From roland.westrelin at oracle.com Mon Apr 13 14:39:41 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 13 Apr 2015 16:39:41 +0200 Subject: RFR(S): 8069191: moving predicate out of loops may cause array accesses to bypass null check In-Reply-To: References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com> <55086F20.9020305@oracle.com> <2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com> <550893F0.9050608@oracle.com> Message-ID: <6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com> Vladimir, Do you think this webrev looks ok? Roland. 
> On Mar 24, 2015, at 1:55 PM, Roland Westrelin wrote: > > See inlined. > >>> Thanks for looking at this. >>> >>>> This is what I waited for long time! Thank you for doing this, Roland. >>>> >>>> How you handle case when CastPP is input of Phi node? >>> >>> The Phi has a control so any memory node that depends on the Phi output is guaranteed to be "after" the CastPP, right? >> >> Yes. So you simple remove CastPP in such case. Okay. >> >>> >>>> I am worried how you separate cases when precedence edge added from CastPP and other precedence cases. Can you explain? May be there are no problems. >>> >>> The >>> >>> if (m->is_block_proj()) { >>> >>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something? >> >> Only if control edge came from CastPP. I know it is additional work but can you run something (CTW? jvm98) and look what types of precedence edges GCM can see? Unfortunately I don't remember what we have there. >> There are a lot of places where we use add_prec(), mostly add pointers to memory nodes. >> If control nodes come only from CastPP then I am fine with your code. > > I added debugging code (that I didn't keep in the webrev below) that added (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching and checked that all nodes that gcm sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that: > > - CastPP nodes don't always have a control > - some CastPP nodes depend on a Region because the test was moved to the branch of a dominating If > - the test for some CastPP's nodes are removed during escape analysis > > I updated the code to reflect those cases. > > http://cr.openjdk.java.net/~roland/8069191/webrev.01/ > > Roland.
> >> >> Thanks, >> Vladimir >> >>> >>> Roland. >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 3/17/15 3:54 AM, Roland Westrelin wrote: >>>>> http://cr.openjdk.java.net/~roland/8069191/webrev.00/ >>>>> >>>>> In the test (that needs to be run with StressGCM to cause incorrect code generation), a dependency carried by a CastPP is lost when CastPPs are removed after CCP. Detailed description of the bug is in: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8069191 >>>>> >>>>> Vladimir suggested investigating the performance impact of keeping the CastPPs for the entire compilation. I found that this still causes performance regressions as documented in: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8039999 >>>>> >>>>> The fix I suggest is to keep CastPPs during optimizations and remove them during final graph reshaping. To not lose the dependencies they carry, precedence edges are added to memory operations that depend on them. During GCM, the control of the memory operations is adjusted to take into account the current control and the precedence edges. >>>>> >>>>> Experiments show that this scheme doesn't cause performance regressions (I ran promotion testing on x64 and sparc). >>>>> >>>>> Roland. From vladimir.kozlov at oracle.com Mon Apr 13 15:26:28 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 13 Apr 2015 08:26:28 -0700 Subject: RFR(S): 8069191: moving predicate out of loops may cause array accesses to bypass null check In-Reply-To: <6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com> References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com> <55086F20.9020305@oracle.com> <2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com> <550893F0.9050608@oracle.com> <6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com> Message-ID: <552BE024.7080509@oracle.com> Yes, changes look good. Thanks, Vladimir On 4/13/15 7:39 AM, Roland Westrelin wrote: > Vladimir, > > Do you think this webrev looks ok? > > Roland.
> > >> On Mar 24, 2015, at 1:55 PM, Roland Westrelin wrote: >> >> See inlined. >> >>>> Thanks for looking at this. >>>> >>>>> This is what I waited for long time! Thank you for doing this, Roland. >>>>> >>>>> How you handle case when CastPP is input of Phi node? >>>> >>>> The Phi has a control so any memory node that depends on the Phi output is guaranteed to be "after" the CastPP, right? >>> >>> Yes. So you simple remove CastPP in such case. Okay. >>> >>>> >>>>> I am worried how you separate cases when precedence edge added from CastPP and other precedence cases. Can you explain? May be there are no problems. >>>> >>>> The >>>> >>>> if (m->is_block_proj()) { >>>> >>>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something? >>> >>> Only if control edge came from CastPP. I know it is additional work but can you run something (CTW? jvm98) and look what types of precedence edges GCM can see? Unfortunately I don't remember what we have there. >>> There are a lot of places where we use add_prec(), mostly add pointers to memory nodes. >>> If control nodes come only from CastPP then I am fine with your code. >> >> I added debugging code (that I didn't keep in the webrev below) that added (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching and checked that all nodes that gcm sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that: >> >> - CastPP nodes don't always have a control >> - some CastPP nodes depend on a Region because the test was moved to the branch of a dominating If >> - the test for some CastPP's nodes are removed during escape analysis >> >> I updated the code to reflect those cases.
>> >> http://cr.openjdk.java.net/~roland/8069191/webrev.01/ >> >> Roland. >> >>> >>> Thanks, >>> Vladimir >>> >>>> >>>> Roland. >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 3/17/15 3:54 AM, Roland Westrelin wrote: >>>>>> http://cr.openjdk.java.net/~roland/8069191/webrev.00/ >>>>>> >>>>>> In the test (that needs to be run with StressGCM to cause incorrect code generation), a dependency carried by a CastPP is lost when CastPPs are removed after CCP. Detailed description of the bug is in: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8069191 >>>>>> >>>>>> Vladimir suggested investigating the performance impact of keeping the CastPPs for the entire compilation. I found that this still causes performance regressions as documented in: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8039999 >>>>>> >>>>>> The fix I suggest is to keep CastPPs during optimizations and remove then during final graph reshaping. To not loose the dependencies they carry, precedence edges are added to memory operations that depend on them. During GCM, the control of the memory operations to take the current control and the precedence edges. >>>>>> >>>>>> Experiments show that this scheme doesn?t cause performance regressions (I ran promotion testing on x64 and sparc). >>>>>> >>>>>> Roland. > From roland.westrelin at oracle.com Mon Apr 13 15:27:57 2015 From: roland.westrelin at oracle.com (Roland Westrelin) Date: Mon, 13 Apr 2015 17:27:57 +0200 Subject: RFR(S): 8069191: moving predicate out of loops may cause array accesses to bypass null check In-Reply-To: <552BE024.7080509@oracle.com> References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com> <55086F20.9020305@oracle.com> <2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com> <550893F0.9050608@oracle.com> <6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com> <552BE024.7080509@oracle.com> Message-ID: > Yes, changes look good. Thanks for the review. Do I need another review for this? Roland. 
> > Thanks, > Vladimir > > On 4/13/15 7:39 AM, Roland Westrelin wrote: >> Vladimir, >> >> Do you think this webrev looks ok? >> >> Roland. >> >> >>> On Mar 24, 2015, at 1:55 PM, Roland Westrelin wrote: >>> >>> See inlined. >>> >>>>> Thanks for looking at this. >>>>> >>>>>> This is what I waited for long time! Thank you for doing this, Roland. >>>>>> >>>>>> How you handle case when CastPP is input of Phi node? >>>>> >>>>> The Phi has a control so any memory node that depends on the Phi output is guaranteed to be ?after? the CastPP, right? >>>> >>>> Yes. So you simple remove CastPP in such case. Okay. >>>> >>>>> >>>>>> I am worried how you separate cases when precedence edge added from CastPP and other precedence cases. Can you explain? May be there are no problems. >>>>> >>>>> The >>>>> >>>>> if (m->is_block_proj()) { >>>>> >>>>> test guarantees that the precedence edge is a control node. And I assume it?s always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something? >>>> >>>> Only if control edge came from CastPP. I know it is additional work but can you run something (CTW? jvm98) and look what types of precedence edges GCM can see? Unfortunately I don't remember what we have there. >>>> There are a lot of places where we use add_prec(), mostly add pointers to memory nodes. >>>> If control nodes come only from CastPP then I am fine with your code. >>> >>> I added debugging code (that I didn?t keep in the webrev below) that added (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching and checked that all nodes that gcm sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. 
During that testing, I noticed that: >>> >>> - CastPP nodes don?t always have a control >>> - some CastPP nodes depend on a Region because the test was moved to the branch of a dominating If >>> - the test for some CastPP?s nodes are removed during escape analysis >>> >>> I updated the code to reflect those cases. >>> >>> http://cr.openjdk.java.net/~roland/8069191/webrev.01/ >>> >>> Roland. >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> >>>>> Roland. >>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 3/17/15 3:54 AM, Roland Westrelin wrote: >>>>>>> http://cr.openjdk.java.net/~roland/8069191/webrev.00/ >>>>>>> >>>>>>> In the test (that needs to be run with StressGCM to cause incorrect code generation), a dependency carried by a CastPP is lost when CastPPs are removed after CCP. Detailed description of the bug is in: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8069191 >>>>>>> >>>>>>> Vladimir suggested investigating the performance impact of keeping the CastPPs for the entire compilation. I found that this still causes performance regressions as documented in: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8039999 >>>>>>> >>>>>>> The fix I suggest is to keep CastPPs during optimizations and remove then during final graph reshaping. To not loose the dependencies they carry, precedence edges are added to memory operations that depend on them. During GCM, the control of the memory operations to take the current control and the precedence edges. >>>>>>> >>>>>>> Experiments show that this scheme doesn?t cause performance regressions (I ran promotion testing on x64 and sparc). >>>>>>> >>>>>>> Roland. 
>> From vladimir.kozlov at oracle.com Mon Apr 13 15:30:15 2015 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 13 Apr 2015 08:30:15 -0700 Subject: RFR(S): 8069191: moving predicate out of loops may cause array accesses to bypass null check In-Reply-To: References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com> <55086F20.9020305@oracle.com> <2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com> <550893F0.9050608@oracle.com> <6572CBFF-8B3F-45CD-A016-FF85D324AC04@oracle.com> <552BE024.7080509@oracle.com> Message-ID: <552BE107.5050408@oracle.com> On 4/13/15 8:27 AM, Roland Westrelin wrote: >> Yes, changes look good. > > Thanks for the review. Do I need another review for this? Yes, please. Ask someone directly. Vladimir > > Roland. > >> >> Thanks, >> Vladimir >> >> On 4/13/15 7:39 AM, Roland Westrelin wrote: >>> Vladimir, >>> >>> Do you think this webrev looks ok? >>> >>> Roland. >>> >>> >>>> On Mar 24, 2015, at 1:55 PM, Roland Westrelin wrote: >>>> >>>> See inlined. >>>> >>>>>> Thanks for looking at this. >>>>>> >>>>>>> This is what I waited for long time! Thank you for doing this, Roland. >>>>>>> >>>>>>> How you handle case when CastPP is input of Phi node? >>>>>> >>>>>> The Phi has a control so any memory node that depends on the Phi output is guaranteed to be ?after? the CastPP, right? >>>>> >>>>> Yes. So you simple remove CastPP in such case. Okay. >>>>> >>>>>> >>>>>>> I am worried how you separate cases when precedence edge added from CastPP and other precedence cases. Can you explain? May be there are no problems. >>>>>> >>>>>> The >>>>>> >>>>>> if (m->is_block_proj()) { >>>>>> >>>>>> test guarantees that the precedence edge is a control node. And I assume it?s always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something? >>>>> >>>>> Only if control edge came from CastPP. I know it is additional work but can you run something (CTW? 
jvm98) and look what types of precedence edges GCM can see? Unfortunately I don't remember what we have there. >>>>> There are a lot of places where we use add_prec(), mostly add pointers to memory nodes. >>>>> If control nodes come only from CastPP then I am fine with your code. >>>> >>>> I added debugging code (that I didn?t keep in the webrev below) that added (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching and checked that all nodes that gcm sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that: >>>> >>>> - CastPP nodes don?t always have a control >>>> - some CastPP nodes depend on a Region because the test was moved to the branch of a dominating If >>>> - the test for some CastPP?s nodes are removed during escape analysis >>>> >>>> I updated the code to reflect those cases. >>>> >>>> http://cr.openjdk.java.net/~roland/8069191/webrev.01/ >>>> >>>> Roland. >>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>> >>>>>> Roland. >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 3/17/15 3:54 AM, Roland Westrelin wrote: >>>>>>>> http://cr.openjdk.java.net/~roland/8069191/webrev.00/ >>>>>>>> >>>>>>>> In the test (that needs to be run with StressGCM to cause incorrect code generation), a dependency carried by a CastPP is lost when CastPPs are removed after CCP. Detailed description of the bug is in: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8069191 >>>>>>>> >>>>>>>> Vladimir suggested investigating the performance impact of keeping the CastPPs for the entire compilation. I found that this still causes performance regressions as documented in: >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8039999 >>>>>>>> >>>>>>>> The fix I suggest is to keep CastPPs during optimizations and remove then during final graph reshaping. 
To not loose the dependencies they carry, precedence edges are added to memory operations that depend on them. During GCM, the control of the memory operations to take the current control and the precedence edges. >>>>>>>> >>>>>>>> Experiments show that this scheme doesn?t cause performance regressions (I ran promotion testing on x64 and sparc). >>>>>>>> >>>>>>>> Roland. >>> > From john.r.rose at oracle.com Mon Apr 13 19:38:55 2015 From: john.r.rose at oracle.com (John Rose) Date: Mon, 13 Apr 2015 12:38:55 -0700 Subject: RFR (S): 8076461: JSR292: remove unused native and constants In-Reply-To: <3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com> References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> <3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com> Message-ID: <0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com> That's much better; thanks. Glad to hear the verifyC's still works. The MN_* constants are a private interface between C++ and Java code. Those are the most important to verify. You can get rid of these lines; we don't look at vtable indexes any more: // The JVM uses values of -2 and above for vtable indexes. // Field values are simple positive offsets. // Ref: src/share/vm/oops/methodOop.hpp // This value is negative enough to avoid such numbers, // but not too negative. The other constants are publicly defined in various standards docs (except T_ILLEGAL). I don't think these constants are used any more, except the MN_* and REF_* ones. (The REF_* ones are in the JVM standard, so are in some sense pre-verified.) I suggest also removing the ACC_*, T_*, and CONSTANT_* names, if you can. We probably stopped using any of those when we started using ASM. Thanks! ? John On Apr 13, 2015, at 4:40 AM, Michael Haupt wrote: > > Hi John, > > thank you very much for your review; keeping the Constants class around for VM/JDK constant value agreement certainly makes sense. 
I have undone most of the removal work and verified in a slowdebug build that MHN.verifyConstants() works. I've also added a comment on the Constants class to clarify its role a bit. Local tests and JPRT are still happy with this. > > Updated webrev: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.01/ > > Best, > > Michael > >> Am 07.04.2015 um 23:49 schrieb John Rose : >> >> On Apr 7, 2015, at 12:11 PM, Michael Haupt wrote: >>> >>> Dear all, >>> >>> please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class. >> >> The class MethodHandleNatives.Constants exists to enumerate and cross-check any constants which the JVM and JDK code need to agree about. Removing a constant from MethodHandleNatives.Constants (moving to MemberName) may cause failures when MHN.verifyConstants is run (via "java -esa" on a debug build of Java). If there are no failures, I wonder what would happen if the JVM and JDK got out of sync. in their notion of the value of a constant like MN_CALLER_SENSITIVE. It's important that some part of our release testing detect if MN_CALLER_SENSITIVE (etc.) gets out of sync. >> >> If there is some reason why this testing is no longer needed, I'd like to see the whole Constants class go away, since that's all it's really good for. But I don't see that reason yet, and moving the constants somewhere either will cause a test failure, or *should* cause a test failure. >> >> I'm happy to see the "GC" guys go away. They were artifacts of a quickly moving 292 implementation that spanned two repositories with unsynchronized change streams. >> >> ? John >> >>> >>> RFE: https://bugs.openjdk.java.net/browse/JDK-8076461 >>> Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/ >>> >>> Tested with JPRT, HotSpot testset. 
>>>
>>> Thanks,
>>>
>>> Michael
>
>
> --
>
>
> Dr. Michael Haupt | Principal Member of Technical Staff
> Phone: +49 331 200 7277 | Fax: +49 331 200 7561
> Oracle Java Platform Group | HotSpot Compiler Team
> Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
> Oracle is committed to developing practices and products that help protect the environment
>

From john.r.rose at oracle.com Mon Apr 13 19:50:33 2015
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 13 Apr 2015 12:50:33 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To: <552BADCF.80109@oracle.com>
References: <552BADCF.80109@oracle.com>
Message-ID: <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>

On Apr 13, 2015, at 4:51 AM, Zoltán Majó wrote:
>
> please review the following patch.

Good. This line has a typo ("encrypBlock" = gang member induction party foul?):

+ * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM

- John

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zoltan.majo at oracle.com Mon Apr 13 20:55:06 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Mon, 13 Apr 2015 22:55:06 +0200
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To:
References: <552BADCF.80109@oracle.com> <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
Message-ID: <28A5E5B5-4B00-45B8-96D9-1D28B0522319@oracle.com>

Hi Tony,

> On 13 Apr 2015, at 22:09, Anthony Scarpino wrote:
>
> Hi,
>
> Could you forward the whole message, with the patch, to the security list. I have only received John's response, but not the webrev.

please find the original RFR below. I've sent it to security-dev at openjdk.java.net at the same time as I did send it to hotspot-compiler-dev at openjdk.java.net. But as security-dev seems to be moderated for non-members, the original message is most likely awaiting moderator approval.
Thank you and best regards,

Zoltan

================

Hi,

please review the following patch.

Bug: https://bugs.openjdk.java.net/browse/JDK-8067648

Problem: On architectures with hardware support for AES operations, the Java version (the version in the JDK sources) of the com.sun.crypto.provider.AESCrypt::encryptBlock(byte[], int, byte[], int) method is replaced with an intrinsic that uses the CPU's AES instructions. The Java version of encryptBlock operates on arrays of size AES_BLOCK_SIZE=16 and consequently performs a number of "implicit" checks (e.g., null checks and range checks) as required by the Java VM specification. The intrinsified version of encryptBlock, however, does not perform any of these checks. Omitting the checks results in a VM crash if invalid parameters (e.g., a null pointer, as reported in the current case) are passed to the method.

Solution: The failure reported in the current issue appears in the com.sun.crypto.provider.GCTR class, which calls the intrinsified version of encryptBlock. None of the methods of the class are accessible from packages other than com.sun.crypto.provider. So, after a private discussion with John Rose, Vladimir Kozlov, and Roland Westrelin, I propose to solve this problem at the Java level.

The GCTR::counter field is supposed to be initialized with an array of size AES_BLOCK_SIZE so that it is safe to call encryptBlock. The 'counter' field is never supposed to become NULL during the lifetime of a GCTR object (so that encryptBlock can always be called safely).

The GCTR class supports saving and restoring the value of the 'counter' field (via the save() and restore() methods). For saving/restoring, the class uses the 'counterSave' field as temporary storage. It is also possible to reset a GCTR object to its initial state by calling reset(). Reset sets both the 'counter' and 'counterSave' fields to their initial values.
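
The counter lifecycle described in this message can be modeled in a few lines. This is only a simplified sketch, not the JDK source: the class name CounterState, the increment() helper, and the counter() accessor are hypothetical and exist purely for illustration. It assumes that the initial value of 'counterSave' is null, and it includes a null guard in restore() so that reset() followed by restore() leaves 'counter' valid, plus a length check in the constructor.

```java
// Simplified model of the save/restore/reset state discussed in this thread.
// CounterState, increment() and counter() are hypothetical names; this is
// NOT the JDK's com.sun.crypto.provider.GCTR implementation.
final class CounterState {
    static final int BLOCK_SIZE = 16;  // stands in for AES_BLOCK_SIZE

    private final byte[] initial;      // initial counter block
    private byte[] counter;            // must never become null
    private byte[] counterSave;        // null until save() is called (assumption)

    CounterState(byte[] initialCounterBlock) {
        // Enforce the required block length up front.
        if (initialCounterBlock == null || initialCounterBlock.length != BLOCK_SIZE) {
            throw new IllegalArgumentException("counter block must be " + BLOCK_SIZE + " bytes");
        }
        this.initial = initialCounterBlock.clone();
        this.counter = initialCounterBlock.clone();
    }

    // Stand-in for the cipher advancing the counter (big-endian increment).
    void increment() {
        for (int i = BLOCK_SIZE - 1; i >= 0; i--) {
            counter[i]++;
            if (counter[i] != 0) break;
        }
    }

    void save() {
        counterSave = counter.clone();
    }

    void restore() {
        // Guard: only restore if something was saved before.
        if (counterSave != null) {
            counter = counterSave.clone();
        }
    }

    void reset() {
        counter = initial.clone();
        counterSave = null;
    }

    byte[] counter() {
        return counter.clone();
    }

    public static void main(String[] args) {
        CounterState s = new CounterState(new byte[BLOCK_SIZE]);
        s.reset();
        s.restore(); // without the guard, counter would be null here
        System.out.println("counter length after reset+restore: " + s.counter().length);
    }
}
```

Without the guard in restore(), the reset()/restore() sequence in main would leave 'counter' null, which is the state that is unsafe to pass to a method that skips null checks.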
If a call to the method reset() is followed by a call to restore(), the field 'counter' is not restored to its original value, but it becomes NULL. This is an invalid state, because a GCTR object should always contain a valid 'counter' array. This problem has also been described (in part) by Chris Ellis.

https://intrbiz.com/post/blog/development/java_8_aes_gcm_nullpointerexception

This patch proposes to restore the contents of 'counter' from 'counterSave' only if some data has been saved into 'counterSave' before (i.e., counterSave is not NULL).

The patch also adds a check to the constructor of GCTR to verify that the length of 'counter' is AES_BLOCK_SIZE. (I checked and JDK code uses this class only with arrays of size AES_BLOCK_SIZE, but it is good if the required size is documented and enforced by GCTR.)

The array to store the output of the encryptBlock method (the third parameter) should also be of length AES_BLOCK_SIZE. That is ensured by the GCTR class (both in the doFinal and update methods). The input and output offsets (the second and fourth parameters) are 0, as required by encryptBlock.

Webrev: http://cr.openjdk.java.net/~zmajo/8067648/webrev.00/

Testing:
- JPRT (both with 9 and 8u), all tests in the testsets hotspot pass;
- JTREG tests in jdk_security[1-4] executed locally with the sources built with --enable-openjdk-only; all tests that pass without the patch pass with the patch as well;
- failure reported in 8067648 can be reproduced with 8u, failure is not triggered with patch applied.

Thank you and best regards,

Zoltan

> Thanks
>
> Tony
>
>
>
> On Apr 13, 2015, at 12:50 PM, John Rose wrote:
>
>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó wrote:
>>>
>>> please review the following patch.
>>
>> Good. This line has a typo ("encrypBlock" = gang member induction party foul?):
>> + * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
>>
>> -
John

From jan.civlin at intel.com Mon Apr 13 22:06:44 2015
From: jan.civlin at intel.com (Civlin, Jan)
Date: Mon, 13 Apr 2015 22:06:44 +0000
Subject: RFR(S): 8076284: Improve vectorization of parallel streams
In-Reply-To: <39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C450E3E586@ORSMSX104.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B63334516@FMSMSX112.amr.corp.intel.com> <39F83597C33E5F408096702907E6C450E3E5A4@ORSMSX104.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com> <39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>
Message-ID: <39F83597C33E5F408096702907E6C450E3E839@ORSMSX104.amr.corp.intel.com>

Hi All,

We would like to contribute the improvement of vectorization of parallel streams from Intel. The contribution Bug ID: 8076284.

Please review this patch:

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076284
webrev: http://cr.openjdk.java.net/~kvn/8076284/webrev/

Description

Improve vectorization of the unordered parallel streams (by vectorizing forEachRemaining method). For example, this forEach will be vectorized:

    java.util.stream.IntStream iStream = java.util.stream.IntStream.range(0, RANGE - 1).parallel();
    iStream.forEach( id -> c[id] = c[id] + c[id+1] );

It also enables on-demand loop vectorization in a given method (by providing more hints to SuperWord optimization). For example, use -XX:CompileCommand=option,computeCall,Vectorize to vectorize this loop

    void computeCall(double [] Call, double puByDf, double pdByDf) {
        for(int i = timeStep; i > 0; i--)
            for(int j = 0; j <= i - 1; j++)
                Call[j] = puByDf * Call[j + 1] + pdByDf * Call[j];
    }

This enhancement is contributed by Intel and sponsored by the hotspot compiler team.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From john.r.rose at oracle.com Tue Apr 14 03:31:26 2015
From: john.r.rose at oracle.com (John Rose)
Date: Mon, 13 Apr 2015 20:31:26 -0700
Subject: RFR(S): 8069191: moving predicate out of loops may cause array accesses to bypass null check
In-Reply-To:
References: <100419DB-199E-489C-B3EA-F104BF0EB203@oracle.com> <55086F20.9020305@oracle.com> <2ACAAB95-8175-48DB-8BD9-F5BF168A6666@oracle.com> <550893F0.9050608@oracle.com>
Message-ID: <88169234-01DE-470C-B56A-D96AD7C53D50@oracle.com>

Reviewed.

On Mar 24, 2015, at 5:55 AM, Roland Westrelin wrote:
>
>>>
>>> test guarantees that the precedence edge is a control node. And I assume it's always ok to remove the precedence edge and adjust the control when the precedence edge is a control node. Do you think that could break something?
>>
>> Only if control edge came from CastPP. I know it is additional work but can you run something (CTW? jvm98) and look what types of precedence edges GCM can see? Unfortunately I don't remember what we have there.
>> There are a lot of places where we use add_prec(), mostly add pointers to memory nodes.
>> If control nodes come only from CastPP then I am fine with your code.
>
> I added debugging code (that I didn't keep in the webrev below) that added (memory operation, control from CastPP) pairs in a side table during final graph reshaping, updated the pairs during matching and checked that all nodes that gcm sees with a control precedence got it from a CastPP. I ran CTW and other tests with that code and all tests passed. During that testing, I noticed that:

That's a good testing method. Precedence edges are a simple way to add miscellaneous node relations but it is easy to forget they are there. I guess the gcm.cpp code picks them up completely. And after the extra edges are added, not much happens that could "forget" (drop) an edge. (Note that copying a node to make a better one has a risk to "forget" precedence edges.)
But, if this technique were to be used in any more expansive way, or if you have lingering doubts about using precedence edges here, I would recommend creating an explicit new node type that captures multiple control dependency edges. As we have a MergeMem node we could have a MergeControl node, whose input edges (after in(0)) would act like the precedence edges you are adding now.

Two minor comments on code style in compile.cpp:

The new 'switch' is hard to untangle. Wouldn't it be simpler to put the 'wq.push(use)' call before the 'break', and drop the 'default' case completely?

Also, I really dislike it when block structure ({...}) cuts across #ifdef structure. This hack would be slightly better:

    #ifdef _LP64
      if (n->in(1)->is_DecodeNarrowPtr() || n->in(2)->is_DecodeNarrowPtr()) {
        ...
      } else
    #endif //_LP64
      {
        ...
      }

Better yet, you could also just delete the #ifdef _LP64 and let the tests go forward. Or incorporate a manifest constant:

    const bool is_LP64 = LP64_ONLY(true) NOT_LP64(false);
    if (is_LP64 && (...)) {
      ...
    } else {
      ...
    }

The code in gcm.cpp treats precedence edges asymmetrically. (The expression is 'n = is_dominator(bn, bm) ? m : n'.) Do we want to assert that one of them dominates the other, perhaps using 'assert_dom'?

It's great to see all that mysterious old code go away.

- John

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zoltan.majo at oracle.com Tue Apr 14 07:44:30 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Tue, 14 Apr 2015 09:44:30 +0200
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To: <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
References: <552BADCF.80109@oracle.com> <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
Message-ID: <552CC55E.4010702@oracle.com>

Hi John,

thank you for the review!

On 04/13/2015 09:50 PM, John Rose wrote:
> On Apr 13, 2015, at 4:51 AM, Zoltán Majó
> wrote:
>>
>> please review the following patch.
>
> Good. This line has a typo ("encrypBlock" = gang member induction
> party foul?):
> + * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM

Thanks for catching that. Here is the new webrev:

http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/

Best regards,

Zoltan

>
> - John

From wolfgang.pedot at finkzeit.at Tue Apr 14 10:00:56 2015
From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot)
Date: Tue, 14 Apr 2015 12:00:56 +0200
Subject: Java 8 TieredCompilation Blacklist?
Message-ID: <552CE558.5040602@finkzeit.at>

Hello,

I have recently migrated a big-ish application from 7u40 to 8u40 and I noticed a quite substantial increase in CPU utilisation. After doing some research I figured out that the cause is TieredCompilation, which is now on by default. I have deactivated that feature and now CPU utilisation is back to normal. I tested TieredCompilation before on 7u and also had an increase in CPU, up to the point where the application actually slowed down, so I ended that test.

A part of the application uses BIRT and that tends to generate a lot of short-lived classes to optimize Javascript-code. My guess is that the tiered compiler compiles those classes in an attempt to optimize them and, depending on the usage of the system, that increases CPU without really accelerating anything (according to statistics).

I have found "CompileOnly", which seems to be something to be used for test and development. Is there something like a Blacklist I can use to tell the compiler NOT to compile classes in a specific package?

The system had been running for ~13h on 8u40 and used 1.5h of CPU-time for compilation; the previous version running on 7u40 had been up for ~62.5 days and only used 36min for compilation. I did notice the much quicker warmup in the response-times after the switch to 8u40, but I don't want the system to spend so much time compiling stuff that does not really improve performance.
any help would be appreciated Wolfgang From michael.haupt at oracle.com Tue Apr 14 11:33:42 2015 From: michael.haupt at oracle.com (Michael Haupt) Date: Tue, 14 Apr 2015 13:33:42 +0200 Subject: RFR (S): 8076461: JSR292: remove unused native and constants In-Reply-To: <0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com> References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> <3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com> <0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com> Message-ID: <46742670-8A71-4026-8ED5-25BE82DD2698@oracle.com> Hi John, thanks again; I've applied your suggestions, re-tested as before and uploaded the revision to http://cr.openjdk.java.net/~mhaupt/8076461/webrev.02/. Best, Michael > Am 13.04.2015 um 21:38 schrieb John Rose : > > That's much better; thanks. Glad to hear the verifyC's still works. > > The MN_* constants are a private interface between C++ and Java code. Those are the most important to verify. > > You can get rid of these lines; we don't look at vtable indexes any more: > // The JVM uses values of -2 and above for vtable indexes. > // Field values are simple positive offsets. > // Ref: src/share/vm/oops/methodOop.hpp > // This value is negative enough to avoid such numbers, > // but not too negative. > > The other constants are publicly defined in various standards docs (except T_ILLEGAL). > > I don't think these constants are used any more, except the MN_* and REF_* ones. (The REF_* ones are in the JVM standard, so are in some sense pre-verified.) > > I suggest also removing the ACC_*, T_*, and CONSTANT_* names, if you can. We probably stopped using any of those when we started using ASM. > > Thanks! > > ? John > > On Apr 13, 2015, at 4:40 AM, Michael Haupt wrote: >> >> Hi John, >> >> thank you very much for your review; keeping the Constants class around for VM/JDK constant value agreement certainly makes sense. I have undone most of the removal work and verified in a slowdebug build that MHN.verifyConstants() works. 
>> I've also added a comment on the Constants class to clarify its role a bit. Local tests and JPRT are still happy with this.
>>
>> Updated webrev: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.01/
>>
>> Best,
>>
>> Michael
>>
>>> On 07.04.2015 at 23:49, John Rose wrote:
>>>
>>> On Apr 7, 2015, at 12:11 PM, Michael Haupt wrote:
>>>>
>>>> Dear all,
>>>>
>>>> please review and sponsor this change. Cross-posted to hs-comp and core-lib as this is at the JVM/libraries boundary. This is a straightforward refactoring change that removes many constants and unused API from MHNatives, and places some constants used only in MemberName in that class.
>>>
>>> The class MethodHandleNatives.Constants exists to enumerate and cross-check any constants which the JVM and JDK code need to agree about. Removing a constant from MethodHandleNatives.Constants (moving to MemberName) may cause failures when MHN.verifyConstants is run (via "java -esa" on a debug build of Java). If there are no failures, I wonder what would happen if the JVM and JDK got out of sync in their notion of the value of a constant like MN_CALLER_SENSITIVE. It's important that some part of our release testing detect if MN_CALLER_SENSITIVE (etc.) gets out of sync.
>>>
>>> If there is some reason why this testing is no longer needed, I'd like to see the whole Constants class go away, since that's all it's really good for. But I don't see that reason yet, and moving the constants somewhere either will cause a test failure, or *should* cause a test failure.
>>>
>>> I'm happy to see the "GC" guys go away. They were artifacts of a quickly moving 292 implementation that spanned two repositories with unsynchronized change streams.
>>>
>>> - John
>>>
>>>>
>>>> RFE: https://bugs.openjdk.java.net/browse/JDK-8076461
>>>> Changes: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.00/
>>>>
>>>> Tested with JPRT, HotSpot testset.
>>>>
>>>> Thanks,
>>>>
>>>> Michael

-- 
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | HotSpot Compiler Team
Oracle Deutschland B.V. & Co. KG, Schiffbauergasse 14 | 14467 Potsdam, Germany
Oracle is committed to developing practices and products that help protect the environment

From vladimir.x.ivanov at oracle.com Tue Apr 14 11:47:51 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 14 Apr 2015 14:47:51 +0300
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
In-Reply-To: <46742670-8A71-4026-8ED5-25BE82DD2698@oracle.com>
References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> <3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com> <0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com> <46742670-8A71-4026-8ED5-25BE82DD2698@oracle.com>
Message-ID: <552CFE67.1070208@oracle.com>

Looks good.

I'll push it for you.

Best regards,
Vladimir Ivanov

On 4/14/15 2:33 PM, Michael Haupt wrote:
> Hi John,
>
> thanks again; I've applied your suggestions, re-tested as before and uploaded the revision to http://cr.openjdk.java.net/~mhaupt/8076461/webrev.02/.
>
> Best,
>
> Michael
>
>> On 13.04.2015 at 21:38, John Rose wrote:
>>
>> That's much better; thanks. Glad to hear the verifyC's still works.
>>
>> The MN_* constants are a private interface between C++ and Java code. Those are the most important to verify.
>>
>> You can get rid of these lines; we don't look at vtable indexes any more:
>> // The JVM uses values of -2 and above for vtable indexes.
>> // Field values are simple positive offsets.
>> // Ref: src/share/vm/oops/methodOop.hpp
>> // This value is negative enough to avoid such numbers,
>> // but not too negative.
>>
>> The other constants are publicly defined in various standards docs (except T_ILLEGAL).
>>
>> I don't think these constants are used any more, except the MN_* and REF_* ones. (The REF_* ones are in the JVM standard, so are in some sense pre-verified.)
>>
>> I suggest also removing the ACC_*, T_*, and CONSTANT_* names, if you can. We probably stopped using any of those when we started using ASM.
>>
>> Thanks!
>>
>> - John
>>
>> On Apr 13, 2015, at 4:40 AM, Michael Haupt wrote:
>>>
>>> Hi John,
>>>
>>> thank you very much for your review; keeping the Constants class around for VM/JDK constant value agreement certainly makes sense. I have undone most of the removal work and verified in a slowdebug build that MHN.verifyConstants() works. I've also added a comment on the Constants class to clarify its role a bit. Local tests and JPRT are still happy with this.
>>>
>>> Updated webrev: http://cr.openjdk.java.net/~mhaupt/8076461/webrev.01/
>>>
>>> Best,
>>>
>>> Michael

From michael.haupt at oracle.com Tue Apr 14 11:53:57 2015
From: michael.haupt at oracle.com (Michael Haupt)
Date: Tue, 14 Apr 2015 13:53:57 +0200
Subject: RFR (S): 8076461: JSR292: remove unused native and constants
In-Reply-To: <552CFE67.1070208@oracle.com>
References: <4EB3C4DA-C382-4795-A676-6147E863DFF1@oracle.com> <3083F107-6D99-4C4F-948C-9326C0E843CE@oracle.com> <0C16CFAC-EFD5-41E8-840E-3421FA96F3E8@oracle.com> <46742670-8A71-4026-8ED5-25BE82DD2698@oracle.com> <552CFE67.1070208@oracle.com>
Message-ID: <5F0E415F-6453-48E8-923F-25B835EDC08E@oracle.com>

... thank you, Vladimir!

Best,

Michael
-------------- next part --------------
An HTML attachment was scrubbed...
From vladimir.kozlov at oracle.com Tue Apr 14 15:59:06 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 14 Apr 2015 08:59:06 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To: <552CC55E.4010702@oracle.com>
References: <552BADCF.80109@oracle.com> <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com> <552CC55E.4010702@oracle.com>
Message-ID: <552D394A.10309@oracle.com>

Sorry for the late notice. Can you also list the initialCounterBlk.length value in the exception message?

Thanks,
Vladimir

On 4/14/15 12:44 AM, Zoltán Majó wrote:
> Hi John,
>
> thank you for the review!
>
> On 04/13/2015 09:50 PM, John Rose wrote:
>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó wrote:
>>>
>>> please review the following patch.
>>
>> Good. This line has a typo ("encrypBlock" = gang member induction party foul?):
>> + * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
>
> Thanks for catching that. Here is the new webrev:
>
> http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/
>
> Best regards,
>
> Zoltan
>
>> - John

From zoltan.majo at oracle.com Tue Apr 14 17:54:08 2015
From: zoltan.majo at oracle.com (Zoltán Majó)
Date: Tue, 14 Apr 2015 19:54:08 +0200
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To: <552D394A.10309@oracle.com>
References: <552BADCF.80109@oracle.com> <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com> <552CC55E.4010702@oracle.com> <552D394A.10309@oracle.com>
Message-ID: <552D5440.4090005@oracle.com>

Hi Vladimir,

On 04/14/2015 05:59 PM, Vladimir Kozlov wrote:
> Sorry for the late notice. Can you also list the initialCounterBlk.length
> value in the exception message?

thank you for the feedback!

I extended the error message in the exception; here is the updated webrev:

http://cr.openjdk.java.net/~zmajo/8067648/webrev.02/

Best regards,

Zoltan

From vladimir.kozlov at oracle.com Tue Apr 14 17:59:03 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 14 Apr 2015 10:59:03 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To: <552D5440.4090005@oracle.com>
References: <552BADCF.80109@oracle.com> <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com> <552CC55E.4010702@oracle.com> <552D394A.10309@oracle.com> <552D5440.4090005@oracle.com>
Message-ID: <552D5567.0@oracle.com>

Good.

Thanks,
Vladimir

From zoltan.majo at oracle.com Tue Apr 14 19:09:19 2015
From: zoltan.majo at oracle.com (Zoltan Majo)
Date: Tue, 14 Apr 2015 20:09:19 +0100
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To: <552D5567.0@oracle.com>
References: <552BADCF.80109@oracle.com> <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com> <552CC55E.4010702@oracle.com> <552D394A.10309@oracle.com> <552D5440.4090005@oracle.com> <552D5567.0@oracle.com>
Message-ID: <552D65DF.4020306@oracle.com>

Thank you, John and Vladimir, for the review!

Best regards,

Zoltan

From zoltan.majo at oracle.com Tue Apr 14 19:13:22 2015
From: zoltan.majo at oracle.com (Zoltan Majo)
Date: Tue, 14 Apr 2015 20:13:22 +0100
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To: <552D584E.50201@oracle.com>
References: <552BADCF.80109@oracle.com> <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com> <552CC55E.4010702@oracle.com> <552D584E.50201@oracle.com>
Message-ID: <552D66D2.2040805@oracle.com>

Thank you, Tony, for the review!

Best regards,

Zoltan

From anthony.scarpino at oracle.com Mon Apr 13 20:09:10 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Mon, 13 Apr 2015 13:09:10 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To: <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
References: <552BADCF.80109@oracle.com> <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com>
Message-ID:

Hi,

Could you forward the whole message, with the patch, to the security list? I have only received John's response, but not the webrev.

Thanks

Tony

From anthony.scarpino at oracle.com Tue Apr 14 18:11:26 2015
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Tue, 14 Apr 2015 11:11:26 -0700
Subject: [9] RFR(S): 8067648: JVM crashes reproducable with GCM cipher suites in GCTR doFinal
In-Reply-To: <552CC55E.4010702@oracle.com>
References: <552BADCF.80109@oracle.com> <4E2B097B-D807-428A-B7FB-DFC63F1A7B63@oracle.com> <552CC55E.4010702@oracle.com>
Message-ID: <552D584E.50201@oracle.com>

The updated changes look good to me.

Tony

On 04/14/2015 12:44 AM, Zoltán Majó wrote:
> Hi John,
>
> thank you for the review!
>
> On 04/13/2015 09:50 PM, John Rose wrote:
>> On Apr 13, 2015, at 4:51 AM, Zoltán Majó wrote:
>>>
>>> please review the following patch.
>>
>> Good. This line has a typo ("encrypBlock" = gang member induction
>> party foul?):
>> + * AESCrypt.encrypBlock method can be intrinsified on the HotSpot VM
>
> Thanks for catching that. Here is the new webrev:
>
> http://cr.openjdk.java.net/~zmajo/8067648/webrev.01/
>
> Best regards,
>
> Zoltan
>
>>
>> -
John
>

From jan.civlin at intel.com Mon Apr 13 10:33:09 2015
From: jan.civlin at intel.com (Civlin, Jan)
Date: Mon, 13 Apr 2015 10:33:09 +0000
Subject: RFR(S): 8076284: Improve vectorization of parallel streams
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com>
References: <39F83597C33E5F408096702907E6C450E3E586@ORSMSX104.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B63334516@FMSMSX112.amr.corp.intel.com> <39F83597C33E5F408096702907E6C450E3E5A4@ORSMSX104.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2B63334531@FMSMSX112.amr.corp.intel.com>
Message-ID: <39F83597C33E5F408096702907E6C450E3E734@ORSMSX104.amr.corp.intel.com>

Hi All,

We would like to contribute an improvement to the vectorization of parallel streams, from Intel. The bug ID for the contribution is 8076284.

Please review this patch:

Bug-id: https://bugs.openjdk.java.net/browse/JDK-8076284
webrev: http://cr.openjdk.java.net/~kvn/8076284/webrev/

Description

Improve vectorization of the unordered parallel streams (by vectorizing the forEachRemaining method). For example, this forEach will be vectorized:

    java.util.stream.IntStream iStream = java.util.stream.IntStream.range(0, RANGE - 1).parallel();
    iStream.forEach( id -> c[id] = c[id] + c[id+1] );

It also enables on-demand loop vectorization in a given method (by providing more hints to the SuperWord optimization). For example, use

    -XX:CompileCommand=option,computeCall,Vectorize

to vectorize this loop:

    void computeCall(double [] Call, double puByDf, double pdByDf) {
        for(int i = timeStep; i > 0; i--)
            for(int j = 0; j <= i - 1; j++)
                Call[j] = puByDf * Call[j + 1] + pdByDf * Call[j];
    }

This enhancement is contributed by Intel and sponsored by the hotspot compiler team.
-------------- next part --------------
An HTML attachment was scrubbed...
From roland.westrelin at oracle.com Wed Apr 15 09:17:24 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 15 Apr 2015 11:17:24 +0200
Subject: RFR(XS): 8074676: java.lang.invoke.PermuteArgsTest.java fails with "assert(is_Initialize()) failed: invalid node class"
Message-ID: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>

http://cr.openjdk.java.net/~roland/8074676/webrev.00/

The guards that I added in the Arrays.copyOf() intrinsic can cause the control to become top. The code is missing a check for stopped().

Roland.

From roland.westrelin at oracle.com Wed Apr 15 09:48:49 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 15 Apr 2015 11:48:49 +0200
Subject: RFR(S): 8077832: SA's dumpreplaydata, dumpcfg and buildreplayjars are broken
Message-ID:

http://cr.openjdk.java.net/~roland/8077832/webrev.00/

I found 3 locations where the SA code is out of sync with the hotspot code.

Roland.

From roland.westrelin at oracle.com Wed Apr 15 10:16:59 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 15 Apr 2015 12:16:59 +0200
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
In-Reply-To: <552527E1.5060102@oracle.com>
References: <551C5B92.8060500@oracle.com> <552527E1.5060102@oracle.com>
Message-ID:

Hi Vladimir,

> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/

In ciCallSite::get_context(), is it safe to manipulate a raw oop the way you do it (with 2 different oops). Can't it be moved concurrently by the GC?

Roland.

> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/
>
> Best regards,
> Vladimir Ivanov
>
> On 4/1/15 11:56 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/
>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/
>> https://bugs.openjdk.java.net/browse/JDK-8057967
>>
>> HotSpot JITs inline very aggressively through CallSites.
>> They optimistically treat the CallSite target as constant, but record an nmethod dependency to invalidate the compiled code once the CallSite target changes.
>>
>> Right now, such dependencies have the call site class as a context. This context is too coarse and it leads to context pollution: if some CallSite target changes, the VM needs to enumerate all nmethods which depend on call sites of that type.
>>
>> As the performance analysis in the bug report shows, it can sum to a significant amount of work.
>>
>> While working on the fix, I investigated 3 approaches:
>>   (1) unique context per call site
>>   (2) use CallSite target class
>>   (3) use a class the CallSite instance is linked to
>>
>> Considering call sites are ubiquitous (e.g. 10,000s on some octane benchmarks), loading a dedicated class for every call site is overkill (even VM anonymous).
>>
>> CallSite target class (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is also not satisfactory, since it is a compiled LambdaForm VM anonymous class, which is heavily shared. It gets context pollution down, but still the overhead is quite high.
>>
>> So, I decided to focus on (3) and ended up with a mixture of (2) & (3).
>>
>> Compared to the other options, the complications of (3) are:
>>   - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so there should be some default context the VM can use
>>
>>   - CallSite instances can be shared and it shouldn't keep the context class from unloading;
>>
>> It motivated a scheme where the CallSite context is initialized lazily and can change during its lifetime. When a CallSite is linked with an indy instruction, its context is initialized. Usually, JIT sees CallSite instances with initialized context (since it reaches them through indy), but if it's not the case and there's no context yet, JIT sets it to "default context", which means "use target call site".
>>
>> I introduced CallSite$DependencyContext, which represents an nmethod dependency context and points (indirectly) to a Class used as a context.
>>
>> Context class is referenced through a phantom reference (sun.misc.Cleaner to simplify cleanup). Though it's impossible to extract the referent using Reference.get(), the VM can access it directly by reading the corresponding field. Unlike other types of references, phantom references aren't cleared automatically. It allows the VM to access the context class until cleanup is performed. And cleanup resets the context to NULL, in addition to invalidating all relevant dependencies.
>>
>> There are 3 context states a CallSite instance can be in:
>>   (1) NULL: no dependencies
>>   (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in call site target class
>>   (3) DependencyContext for some class: dependencies are stored on the class DependencyContext instance points to
>>
>> Every CallSite starts w/o a context (1) and then lazily gets one ((2) or (3) depending on the situation).
>>
>> State transitions:
>>   (1->3): When a CallSite w/o a context (1) is linked with some indy call site, its owner is recorded as a context (3).
>>
>>   (1->2): When JIT needs to record a dependency on a target of a CallSite w/o a context (1), it sets the context to DEFAULT_CONTEXT and uses target class to store the dependency.
>>
>>   (3->1): When context class becomes unreachable, a cleanup hook invalidates all dependencies on that CallSite and resets the context to NULL (1).
>>
>> Only (3->1) requires dependency invalidation, because there are no dependencies in (1) and (2->1) isn't performed.
>>
>> (1->3) is done in Java code (CallSite.initContext) and (1->2) is performed in VM (ciCallSite::get_context()). The updates are performed by CAS, so there's no need for additional synchronization. Other operations on VM side are volatile (to play well with Java code) and performed with Compile_lock held (to avoid races between VM operations).
>>
>> Some statistics:
>>   Box2D, latest jdk9-dev
>>     - CallSite instances: ~22000
>>
>>     - invalidated nmethods due to CallSite target changes: ~60
>>
>>     - checked call_site_target_value dependencies:
>>       - before the fix: ~1,600,000
>>       - after the fix: ~600
>>
>> Testing:
>>   - dedicated test which exercises different state transitions
>>   - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov

From vitalyd at gmail.com Wed Apr 15 14:10:41 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 10:10:41 -0400
Subject: CHA for interfaces in C2 compiler
Message-ID:

Hi guys,

So CHA on classes works nicely in the case of only one subtype loaded. What about interfaces? Currently, it looks like no such optimization/analysis is done. In my experience, there's a substantial amount of code that exposes an interface via some API, but then loads only one implementation of it. The interface is used instead of an abstract class to allow more flexibility in the future.

I fully realize that lots of interfaces have more than 1 implementer loaded at runtime, but I also think it's worthwhile to attempt CHA for them.

Is this something that's feasible to do? It would require more class loading dependencies to be tracked, but I'm also fine with having this be an extra flag that I can use to enable/disable this optimization.

Thoughts?

Thanks

From dmitry.samersoff at oracle.com Wed Apr 15 14:24:11 2015
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Wed, 15 Apr 2015 17:24:11 +0300
Subject: RFR(S): 8077832: SA's dumpreplaydata, dumpcfg and buildreplayjars are broken
In-Reply-To:
References:
Message-ID: <552E748B.4090202@oracle.com>

Roland,

Looks good to me.
-Dmitry

-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From forax at univ-mlv.fr Wed Apr 15 14:24:22 2015
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 15 Apr 2015 16:24:22 +0200
Subject: CHA for interfaces in C2 compiler
In-Reply-To:
References:
Message-ID: <552E7496.3080304@univ-mlv.fr>

I've implemented something like this in a language (which has a special syntax for calling Java objects). To avoid having too much metadata, I used a simple heuristic: the idea is that an interface with a lot of methods does not have a lot of implementations, so the runtime only tried to do CHA, using a SwitchPoint, if there were at least 3 methods in the interface.
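
A rough sketch of how such a SwitchPoint-guarded fast path could look in plain Java. This is not the runtime described above: the names InterfaceChaSketch, Shape, Square, Circle, worthTracking, areaCallSite, callArea and MIN_METHODS are made up for illustration, and the threshold of 3 is simply taken from the heuristic. It assumes the runtime knows the single loaded implementation and invalidates the SwitchPoint itself when a second one shows up.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

final class InterfaceChaSketch {
    // Hypothetical threshold: only attempt CHA for interfaces that
    // declare at least this many methods.
    static final int MIN_METHODS = 3;

    interface Shape {
        double area();
        double perimeter();
        String name();
    }

    static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
        public double perimeter() { return 4 * side; }
        public String name() { return "square"; }
    }

    static final class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
        public double perimeter() { return 2 * Math.PI * radius; }
        public String name() { return "circle"; }
    }

    // The heuristic: is this interface "big enough" to be worth tracking?
    static boolean worthTracking(Class<?> iface) {
        return iface.isInterface() && iface.getDeclaredMethods().length >= MIN_METHODS;
    }

    // Builds a (Shape)double handle for area(): a direct call to the single
    // known implementation while the SwitchPoint is valid, plain interface
    // dispatch once it has been invalidated.
    static MethodHandle areaCallSite(SwitchPoint onlyOneImpl, Class<? extends Shape> knownImpl) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType noArgsToDouble = MethodType.methodType(double.class);
            MethodHandle generic = lookup.findVirtual(Shape.class, "area", noArgsToDouble);
            if (!worthTracking(Shape.class)) {
                return generic; // too small an interface, skip the optimization
            }
            // Bind to the implementation class and cast the receiver type back
            // to Shape so that both paths have the same method type.
            MethodHandle direct = lookup.findVirtual(knownImpl, "area", noArgsToDouble)
                                        .asType(generic.type());
            return onlyOneImpl.guardWithTest(direct, generic);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Small helper so callers do not have to deal with Throwable.
    static double callArea(MethodHandle site, Shape receiver) {
        try {
            return (double) site.invokeExact(receiver);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        SwitchPoint onlyOneImpl = new SwitchPoint();
        MethodHandle area = areaCallSite(onlyOneImpl, Square.class);
        System.out.println(callArea(area, new Square(2)));   // guarded fast path

        // A second implementation gets loaded: invalidate the guard so that
        // every call site built on it falls back to interface dispatch.
        SwitchPoint.invalidateAll(new SwitchPoint[] { onlyOneImpl });
        System.out.println(callArea(area, new Circle(1)));   // fallback path
    }
}
```

Before the invalidation, passing a Circle to the guarded handle would fail with a ClassCastException from the receiver cast, which is why the SwitchPoint has to be invalidated as soon as a second implementation is loaded.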
cheers,
Rémi

From vitalyd at gmail.com Wed Apr 15 14:26:51 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 10:26:51 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <552E7496.3080304@univ-mlv.fr>
References: <552E7496.3080304@univ-mlv.fr>
Message-ID:

A heuristic like that would work for most of my cases as well :).

From staffan.larsen at oracle.com Wed Apr 15 14:28:06 2015
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 15 Apr 2015 16:28:06 +0200
Subject: RFR(S): 8077832: SA's dumpreplaydata, dumpcfg and buildreplayjars are broken
In-Reply-To:
References:
Message-ID: <04E2ECF6-B209-48C8-8B8D-7B28FB1ABCEC@oracle.com>

Looks good!

Thanks,
/Staffan

From vladimir.x.ivanov at oracle.com Wed Apr 15 14:35:50 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 15 Apr 2015 17:35:50 +0300
Subject: RFR(XS): 8074676: java.lang.invoke.PermuteArgsTest.java fails with "assert(is_Initialize()) failed: invalid node class"
In-Reply-To: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>
References: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>
Message-ID: <552E7746.7050303@oracle.com>

Looks good.

Best regards,
Vladimir Ivanov

From gustav.r.akesson at gmail.com Wed Apr 15 14:53:46 2015
From: gustav.r.akesson at gmail.com (Gustav Åkesson)
Date: Wed, 15 Apr 2015 16:53:46 +0200
Subject: CHA for interfaces in C2 compiler
In-Reply-To:
References: <552E7496.3080304@univ-mlv.fr>
Message-ID:

Hi,

I was surprised by this finding after reading Shipilev's blog. In the huge Java code base I'm currently working in, we have a significant number of interfaces with a single implementing class, and hardly any abstract classes. From a use-case perspective I would gladly welcome an attempt to improve the CHA for interfaces.

Best regards,
Gustav Åkesson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From vladimir.x.ivanov at oracle.com Wed Apr 15 15:55:24 2015 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 15 Apr 2015 18:55:24 +0300 Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly In-Reply-To: References: <551C5B92.8060500@oracle.com> <552527E1.5060102@oracle.com> Message-ID: <552E89EC.7080900@oracle.com> Roland, thanks for looking into the fix! You are right. I moved VM_ENTRY_MARK to the beginning of the method [1]. Updated webrev in place. http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/ Best regards, Vladimir Ivanov [1] diff --git a/src/share/vm/ci/ciCallSite.cpp b/src/share/vm/ci/ciCallSite.cpp --- a/src/share/vm/ci/ciCallSite.cpp +++ b/src/share/vm/ci/ciCallSite.cpp @@ -55,6 +55,8 @@ // Return the target MethodHandle of this CallSite. ciKlass* ciCallSite::get_context() { assert(!is_constant_call_site(), ""); + + VM_ENTRY_MARK; oop call_site_oop = get_oop(); InstanceKlass* ctxk = MethodHandles::get_call_site_context(call_site_oop); if (ctxk == NULL) { @@ -63,7 +65,6 @@ java_lang_invoke_CallSite::set_context_cas(call_site_oop, def_context_oop, /*expected=*/NULL); ctxk = MethodHandles::get_call_site_context(call_site_oop); } - VM_ENTRY_MARK; return (CURRENT_ENV->get_metadata(ctxk))->as_klass(); } On 4/15/15 1:16 PM, Roland Westrelin wrote: > Hi Vladimir, > >> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/hotspot/ > > In ciCallSite::get_context(), is it safe to manipulate a raw oop the way you do it (with 2 different oops). Can?t it be moved concurrently by the GC? > > Roland. > >> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/jdk/ >> >> Best regards, >> Vladimir Ivanov >> >> On 4/1/15 11:56 PM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/hotspot/ >>> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.00/jdk/ >>> https://bugs.openjdk.java.net/browse/JDK-8057967 >>> >>> HotSpot JITs inline very aggressively through CallSites. 
The >>> optimistically treat CallSite target as constant, but record a nmethod >>> dependency to invalidate the compiled code once CallSite target changes. >>> >>> Right now, such dependencies have call site class as a context. This >>> context is too coarse and it leads to context pollution: if some >>> CallSite target changes, VM needs to enumerate all nmethods which >>> depends on call sites of such type. >>> >>> As performance analysis in the bug report shows, it can sum to >>> significant amount of work. >>> >>> While working on the fix, I investigated 3 approaches: >>> (1) unique context per call site >>> (2) use CallSite target class >>> (3) use a class the CallSite instance is linked to >>> >>> Considering call sites are ubiquitous (e.g. 10,000s on some octane >>> benchmarks), loading a dedicated class for every call site is an >>> overkill (even VM anonymous). >>> >>> CallSite target class >>> (MethodHandle.form->LambdaForm.vmentry->MemberName.clazz->Class) is >>> also not satisfactory, since it is a compiled LambdaForm VM anonymous >>> class, which is heavily shared. It gets context pollution down, but >>> still the overhead is quite high. >>> >>> So, I decided to focus on (3) and ended up with a mixture of (2) & (3). >>> >>> Comparing to other options, the complications of (3) are: >>> - CallSite can stay unlinked (e.g. CallSite.dynamicInvoker()), so >>> there should be some default context VM can use >>> >>> - CallSite instances can be shared and it shouldn't keep the context >>> class from unloading; >>> >>> It motivated a scheme where CallSite context is initialized lazily and >>> can change during lifetime. When CallSite is linked with an indy >>> instruction, it's context is initialized. Usually, JIT sees CallSite >>> instances with initialized context (since it reaches them through indy), >>> but if it's not the case and there's no context yet, JIT sets it to >>> "default context", which means "use target call site". 
>>>
>>> I introduced CallSite$DependencyContext, which represents an nmethod
>>> dependency context and points (indirectly) to a Class used as a context.
>>>
>>> The context class is referenced through a phantom reference
>>> (sun.misc.Cleaner to simplify cleanup). Though it's impossible to
>>> extract the referent using Reference.get(), the VM can access it directly by
>>> reading the corresponding field. Unlike other types of references, phantom
>>> references aren't cleared automatically. That allows the VM to access the context
>>> class until cleanup is performed. And cleanup resets the context to
>>> NULL, in addition to invalidating all relevant dependencies.
>>>
>>> There are 3 context states a CallSite instance can be in:
>>> (1) NULL: no dependencies
>>> (2) DependencyContext.DEFAULT_CONTEXT: dependencies are stored in
>>>     call site target class
>>> (3) DependencyContext for some class: dependencies are stored on the
>>>     class DependencyContext instance points to
>>>
>>> Every CallSite starts w/o a context (1) and then lazily gets one ((2) or
>>> (3) depending on the situation).
>>>
>>> State transitions:
>>> (1->3): When a CallSite w/o a context (1) is linked with some indy
>>> call site, its owner is recorded as a context (3).
>>>
>>> (1->2): When JIT needs to record a dependency on a target of a
>>> CallSite w/o a context (1), it sets the context to DEFAULT_CONTEXT and
>>> uses the target class to store the dependency.
>>>
>>> (3->1): When the context class becomes unreachable, a cleanup hook
>>> invalidates all dependencies on that CallSite and resets the context to
>>> NULL (1).
>>>
>>> Only (3->1) requires dependency invalidation, because there are no
>>> dependencies in (1) and (2->1) isn't performed.
>>>
>>> (1->3) is done in Java code (CallSite.initContext) and (1->2) is
>>> performed in VM (ciCallSite::get_context()). The updates are performed
>>> by CAS, so there's no need for additional synchronization.
>>> Other operations on the VM side are volatile (to play well with Java code) and
>>> performed with Compile_lock held (to avoid races between VM operations).
>>>
>>> Some statistics:
>>> Box2D, latest jdk9-dev
>>> - CallSite instances: ~22000
>>>
>>> - invalidated nmethods due to CallSite target changes: ~60
>>>
>>> - checked call_site_target_value dependencies:
>>>   - before the fix: ~1,600,000
>>>   - after the fix: ~600
>>>
>>> Testing:
>>> - dedicated test which exercises different state transitions
>>> - jdk/java/lang/invoke, hotspot/test/compiler/jsr292, nashorn
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>

From vladimir.x.ivanov at oracle.com Wed Apr 15 16:26:15 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 15 Apr 2015 19:26:15 +0300
Subject: CHA for interfaces in C2 compiler
In-Reply-To:
References:
Message-ID: <552E9127.9030908@oracle.com>

Vitaly,

Type profiling reliably detects single interface implementation cases and type check overhead is completely eliminated in most of the cases (type checks are aggressively commoned).

Do you still think it is worth the effort?

Best regards,
Vladimir Ivanov

On 4/15/15 5:10 PM, Vitaly Davidovich wrote:
> Hi guys,
>
> So CHA on classes works nicely in the case of only one subtype loaded.
> What about interfaces? Currently, it looks like no such
> optimization/analysis is done. In my experience, there's a substantial
> amount of code that exposes an interface via some API, but then loads
> only one implementation of it. The interface is used instead of an abstract
> class to allow more flexibility in the future.
>
> I fully realize that lots of interfaces have more than 1 implementer
> loaded at runtime, but I also think it's worthwhile to attempt CHA for them.
>
> Is this something that's feasible to do? It would require more class
> loading dependencies to be tracked, but I'm also fine with having this
> be an extra flag that I can use to enable/disable this optimization.
>
> Thoughts?
>
> Thanks

From vitalyd at gmail.com Wed Apr 15 16:37:50 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 12:37:50 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <552E9127.9030908@oracle.com>
References: <552E9127.9030908@oracle.com>
Message-ID:

Hi Vladimir,

Here's what I see on 7u60:

private static int doIt(final Foo f) {
    return f.num();
}

interface Foo
{
    int num();
}

final class FooImpl implements Foo
{
    @Override
    public int num() {
        return 1;
    }
}

Running a simple test where only FooImpl is loaded (in fact, it's the only impl period) produces the following asm (stripped down to essentials):

  0x00007f0b31e14a6c: mov    0x8(%rsi),%r10d    ; implicit exception: dispatches to 0x00007f0b31e14a9d
  0x00007f0b31e14a70: cmp    $0x71c9e068,%r10d  ;   {oop('FooImpl')}
  0x00007f0b31e14a77: jne    0x00007f0b31e14a8a
  0x00007f0b31e14a79: mov    $0x1,%eax
  0x00007f0b31e14a7e: add    $0x10,%rsp
  0x00007f0b31e14a82: pop    %rbp

If I change Foo to be an abstract class, we get this:

  0x00007f0209deb18c: test   %rsi,%rsi
  0x00007f0209deb18f: je     0x00007f0209deb1a2
  0x00007f0209deb191: mov    $0x1,%eax
  0x00007f0209deb196: add    $0x10,%rsp
  0x00007f0209deb19a: pop    %rbp

So there's an explicit null check but no type check.

Did something change in java 8 or 9 that leads you to say "completely eliminated"?

Thanks

From roland.westrelin at oracle.com Wed Apr 15 16:43:04 2015
From: roland.westrelin at oracle.com (Roland Westrelin)
Date: Wed, 15 Apr 2015 18:43:04 +0200
Subject: [9] RFR (M): 8057967: CallSite dependency tracking scales devastatingly poorly
In-Reply-To: <552E89EC.7080900@oracle.com>
References: <551C5B92.8060500@oracle.com> <552527E1.5060102@oracle.com> <552E89EC.7080900@oracle.com>
Message-ID:

> http://cr.openjdk.java.net/~vlivanov/8057967/webrev.01/

That looks good to me.

Roland.

From vladimir.x.ivanov at oracle.com Wed Apr 15 17:02:23 2015
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 15 Apr 2015 20:02:23 +0300
Subject: CHA for interfaces in C2 compiler
In-Reply-To:
References: <552E9127.9030908@oracle.com>
Message-ID: <552E999F.1080707@oracle.com>

Nothing changed in 8 & 9 in this respect.

You are looking at a microbenchmark, where you have a trivial method which contains just a single call. My point is that it's a corner case and you shouldn't notice the difference in a larger application.

Null checks are pervasive at the Java level, but for the JIT compiler it is enough to perform the check only once on a value to know the value is non-null afterwards.
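
As a source-level analogy only (C2 does this on its intermediate representation, not on Java source), the effect of a single dominating check can be pictured as one hand-written guard that makes every later check on the same value redundant. The class and method names below are made up for the illustration.

```java
import java.util.Objects;

// Hypothetical illustration: one check up front covers all later uses.
public class DominatingCheckSketch {
    static int sumLengths(String s, int times) {
        // The single dominating null check. Every use of 's' below is
        // reached only after this line, so 's' is known to be non-null there.
        Objects.requireNonNull(s, "s");
        int total = 0;
        for (int i = 0; i < times; i++) {
            total += s.length();   // no further null check is needed here
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumLengths("abc", 4)); // prints 12
    }
}
```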

The same applies to exact type checks: a dominating exact type check eliminates the need to repeat the type check. It is recorded in the C2 type system and propagated to all usages.

At every place where type profiling for that interface happens, a single exact type will be recorded.

Please note that CHA is more generic and covers the cases when numerous classes have a single method implementation. Type profiling is usually useless in such a case.

But in your example there's a single implementing class, so type profiling works fine.

Best regards,
Vladimir Ivanov

From vladimir.kozlov at oracle.com Wed Apr 15 17:23:02 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Apr 2015 10:23:02 -0700
Subject: RFR(XS): 8074676: java.lang.invoke.PermuteArgsTest.java fails with "assert(is_Initialize()) failed: invalid node class"
In-Reply-To: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>
References: <4D0538B9-FA75-46FE-8466-C6F75791DE0C@oracle.com>
Message-ID: <552E9E76.1080006@oracle.com>

Good.

Thanks,
Vladimir

On 4/15/15 2:17 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8074676/webrev.00/
>
> The guards that I added in the Arrays.copyOf() intrinsic can cause the control to become top. The code is missing a check for stopped().
>
> Roland.
>

From vladimir.kozlov at oracle.com Wed Apr 15 17:24:01 2015
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Apr 2015 10:24:01 -0700
Subject: RFR(S): 8077832: SA's dumpreplaydata, dumpcfg and buildreplayjars are broken
In-Reply-To:
References:
Message-ID: <552E9EB1.6070607@oracle.com>

Looks good.

Thanks,
Vladimir

On 4/15/15 2:48 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8077832/webrev.00/
>
> I found 3 locations where the SA code is out of sync with the hotspot code.
>
> Roland.
>

From vitalyd at gmail.com Wed Apr 15 17:40:18 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 13:40:18 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To: <552E999F.1080707@oracle.com>
References: <552E9127.9030908@oracle.com> <552E999F.1080707@oracle.com>
Message-ID:

So I'm not worried about null checks because they're actually handled really well. They're also typically a quick test against a register if not using implicit checking via trap.

As for propagating type information, I'm assuming this information is propagated into the inlined code only -- if anything fails to inline, it will not receive this information and will perform the same type check, is that right? It's hard to argue against "this is a microbenchmark, larger code won't notice the difference", but when you have code that's "scattered" around (i.e. not all inlined in the same place) then it sounds like this check will still be performed at each of those places. In a complex call graph, it's not realistic to expect the entire thing to inline (for good reason) -- there are going to be islands. My thinking here is that given this analysis exists for classes (and works really well), extending it to interfaces (using a heuristic like Remi's, a flag, etc) would be profitable in some places.

From vitalyd at gmail.com Wed Apr 15 18:39:00 2015
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 15 Apr 2015 14:39:00 -0400
Subject: CHA for interfaces in C2 compiler
In-Reply-To:
References: <552E9127.9030908@oracle.com> <552E999F.1080707@oracle.com>
Message-ID:

By the way, just to be clear - my main gripe with the type check is that it loads memory (the class pointer in the header), which can take an unnecessary cache miss if no instance data is used or the instance data to be used is at least a cache line away from the header; the cmp+jmp is not ideal but secondary.
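
The type check discussed in this thread (the load of the class pointer followed by cmp and jne) can be looked at with a small harness along the following lines. It is only a guess at the kind of "simple test" mentioned earlier, not the original test; the class name ChaInterfaceTest and the iteration count are arbitrary. Running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which needs the hsdis disassembler library) prints the compiled code.

```java
// Hypothetical harness: only FooImpl is ever loaded as an implementer of Foo.
public class ChaInterfaceTest {
    interface Foo {
        int num();
    }

    static final class FooImpl implements Foo {
        @Override
        public int num() {
            return 1;
        }
    }

    // The method of interest; it may also end up inlined into run().
    private static int doIt(final Foo f) {
        return f.num();
    }

    // Calls doIt() often enough for the JIT to consider it hot.
    static int run(int iterations) {
        Foo f = new FooImpl();
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += doIt(f);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```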

From christian.thalinger at oracle.com Wed Apr 15 19:47:54 2015
From: christian.thalinger at oracle.com (Christian Thalinger)
Date: Wed, 15 Apr 2015 12:47:54 -0700
Subject: Java 8 TieredCompilation Blacklist?
In-Reply-To: <552CE558.5040602@finkzeit.at>
References: <552CE558.5040602@finkzeit.at>
Message-ID: <55FB246C-621D-4C5E-AB04-193F4A6BA457@oracle.com>

exclude is what you want:

$ java -XX:CompileCommand=help
  The CompileCommand option enables the user of the JVM to control specific
  behavior of the dynamic compilers. Many commands require a pattern that defines
  the set of methods the command shall be applied to. The CompileCommand
  option provides the following commands:

  break,<pattern>       - debug breakpoint in compiler and in generated code
  print,<pattern>       - print assembly
  exclude,<pattern>     - don't compile or inline
  inline,<pattern>      - always inline
  dontinline,<pattern>  - don't inline
  compileonly,<pattern> - compile only
  log,<pattern>         - log compilation
  option,,